For more than a century, the term reinforcement has referred to the emergence of a reliable, learned change in behavior produced when a response or stimulus is differentially followed by a rewarding or punishing event (a reinforcer). Early examples of instrumental reinforcement included Edward Lee Thorndike’s (1874-1949) experiments with cats learning to escape what he called a puzzle box by manipulating a release mechanism for the door; B. F. Skinner’s (1904-1990) training of rats to press a lever for food; and the work of Willard Stanton Small (1870-1943) and Edward Chace Tolman (1886-1959) showing that rats could learn a maze path with or without food in the goal box. The dominant example of early Pavlovian reinforcement—from the work of the Russian physiologist Ivan Petrovich Pavlov (1849-1936)—showed that a dog could be conditioned to salivate to a signal predicting the imminent arrival of either meat powder or a weak acid solution in its mouth.
Each of the above examples provided an answer to a critical conceptual and practical question: What conditions must be added to the simple repetition of a stimulus and/or a response to produce learning? However, each answer also left notable theoretical questions unresolved. As a result, more than half of the twentieth century was spent on attempts to develop a general theory of learning. These theories differed in: (1) how learning was represented (for example, as stimulus-response or stimulus-stimulus associations, as cognitive maps, memories, or transmitted information); (2) the necessary and sufficient qualities of a reinforcer (for example, a drive reducer vs. a drive inducer); and (3) the necessary temporal and spatial relations between the response and/or stimulus and the reinforcer (for example, contiguity vs. contingency). For additional differences see the analyses in William K. Estes’s Modern Learning Theory: A Critical Analysis of Five Examples (1954) of Thorndike’s concept of the strengthening of neural connections, Edwin Ray Guthrie’s (1886-1959) emphasis on temporal contiguity and repetition, Tolman’s cognitive maps and expectancies, and Clark L. Hull’s (1884-1952) multiple learning variables and performance equations relating them to each other and to behavior.
The theory wars were both exciting and frustrating. Excitement came from the development of extensive frameworks like Hull’s hypothetico-deductive system, which was applied to behaviors ranging from approach-withdrawal reactions (Neal E. Miller, 1944), to curiosity (Harry Fowler, 1967), to personality and psychotherapy (John Dollard and Neal E. Miller, 1950). Difficulties arose from the inability of researchers to agree on critical tests to distinguish alternative theories and concepts, and from the continued failure of general theories to account for important empirical phenomena like partial reinforcement and incentive contrast. For example, Abram Amsel (1922-2006; 1992) and E. J. Capaldi (1966) showed that relatively unpredictable reinforcers produced greater resistance to extinction than more frequent but predictable reinforcers. Similarly, Charles F. Flaherty (c. 1937-2004; 1996) showed that the effects of presenting two different reinforcers depended on their order. The absence of a satisfying general theoretical account of diverse phenomena led many researchers to abandon the search for a general theory and focus on the empirical effectiveness and applications of specific reinforcement procedures.
Thorndike claimed that all organisms were susceptible to the effects of reinforcement. Skinner supported the empirical generality of this claim by showing how reinforcement could be used to readily control the laboratory behavior of rats and pigeons. Marian Breland Bailey and Keller Breland, early students of Skinner, extended his reinforcement approach to the training of animals in shows and zoos by using “bridge stimuli” (secondary reinforcers). During his lengthy career, Skinner focused increasingly on the control of human behavior by reinforcers, at first by using reinforcement schedules to produce human-like behaviors in pigeons, including “superstitions” and “assembly-line” response patterns. In a recent continuation of this comparative approach, Brian Lau and Paul W. Glimcher (2005) developed dynamic response-by-response models of the well-known operant behavior of matching response rates to reinforcer frequency in rhesus monkeys.
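The matching relation mentioned above can be sketched numerically. The strict form below is a textbook simplification of the matching law (relative response rate equals relative reinforcer rate), not Lau and Glimcher’s dynamic model; the numbers are illustrative assumptions.

```python
# Strict matching law (a textbook simplification; illustrative numbers only):
# the fraction of responses on an alternative matches its fraction of
# obtained reinforcers across two concurrent schedules.

def matching_share(r1, r2):
    """Predicted fraction of responses allocated to alternative 1,
    given obtained reinforcer rates r1 and r2."""
    return r1 / (r1 + r2)

# With 40 vs. 10 reinforcers per hour, strict matching predicts 80% of
# responses on the richer alternative.
print(matching_share(40, 10))  # 0.8
```

Real choice data typically show systematic deviations (undermatching, bias), which is why response-by-response models of the kind Lau and Glimcher developed are needed.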
Skinner also added a specific human focus, emphasizing the potential effects of operant conditioning on everyday human behavior in his writing of a utopian novel, Walden Two (1948), followed by several books of essays: Science and Human Behavior (1953), Beyond Freedom and Dignity (1971), and About Behaviorism (1974). Further, Skinner helped develop specific reinforcement applications in the form of programmed learning in education, and token economies for treating the mentally ill and deficient. Many examples of such procedures are described in an extensive applied literature, e.g., the Journal of Applied Behavior Analysis, and in edited volumes (such as William O’Donohue’s Learning and Behavior Therapy, 1998; see also work of Leonard Krasner).
Reinforcers, in terms of their ability to produce learning and behavior, were at first limited largely to incentives that reduced physiological imbalances of the body; but reinforcers also can be novel stimuli, social or sexual cues, or simply access to a higher-probability response. For example, William Timberlake and James W. Allison showed that reinforcement depends on the use of operant schedules that restrict access to a response or reinforcer relative to its free (unconstrained) baseline level. William Timberlake and Valerie Farmer-Dougan reviewed how such a general regulatory framework could be translated to applied settings, and James W. Allison showed the relevance of regulatory work-income trade-offs to economic models (see Behavioral Economics, 1983). J. E. R. Staddon (see Adaptive Dynamics: The Theoretical Analysis of Behavior, 2001) linked operant behavior to general optimization models, and David J. C. MacKay (see Information Theory, Inference, and Learning Algorithms, 2003) and other scientists used computer modeling to produce optimizing algorithms leading to adaptive learning in machines.
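The restriction-relative-to-baseline idea can be put in a small sketch. This is the response-deprivation condition associated with Timberlake and Allison’s regulatory account; the function name and the numerical baselines are illustrative assumptions, not values from their studies.

```python
# Response-deprivation condition (illustrative sketch): a schedule is
# predicted to reinforce the instrumental response when performing it at
# the scheduled ratio would leave the contingent response below its
# free-baseline level.

def response_deprived(instr_req, conting_allowed, base_instr, base_conting):
    """True if the schedule restricts the contingent response below baseline.
    instr_req / conting_allowed: minutes required and earned per cycle.
    base_instr / base_conting: free-baseline minutes of each response."""
    return (conting_allowed / instr_req) < (base_conting / base_instr)

# Assumed baseline: 10 min of lever pressing and 60 min of wheel running
# when both are freely available. A schedule where 5 min of pressing earns
# only 10 min of running restricts running below baseline, so running
# should reinforce pressing.
print(response_deprived(5, 10, 10, 60))  # True
```

Note the symmetry of the account: with a generous enough ratio (say, 5 min of pressing earning 40 min of running) the condition fails, and no reinforcement effect is predicted.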
Beginning with work on information processing and interaction among predictive cues in the 1960s, Pavlovian conditioning became an important focus of reinforcement research and theory. Psychologists Robert A. Rescorla and Allan R. Wagner modeled the interaction of cues in Pavlovian conditioning using a variant of a linear operator model in which stimuli predicting a reinforcer compete for its incremental strengthening effect. Their model correctly predicts that when two predictive stimuli are trained together, the more salient stimulus typically overshadows learning about the less salient one, and a stimulus trained first most often interferes with (blocks) learning about a stimulus added later (although both facilitation and configural learning can occur in similar circumstances).
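The competitive, error-correcting character of the Rescorla-Wagner model can be sketched as follows. The update rule (cues share a common prediction error, weighted by salience) is the standard published form; the parameter values and trial counts are illustrative assumptions.

```python
# Rescorla-Wagner error-correction update (illustrative parameters).
# V maps each cue to its associative strength; cues presented together
# share one prediction error, so they compete for strengthening.

def rw_trial(V, present, alpha, beta=0.1, lam=1.0):
    """One reinforced trial: update strengths of all cues in `present`."""
    total = sum(V[c] for c in present)   # combined prediction of the US
    error = lam - total                  # shared prediction error
    for c in present:
        V[c] += alpha[c] * beta * error  # salience-weighted increment

# Blocking demonstration: train A alone, then the A+X compound.
V = {"A": 0.0, "X": 0.0}
alpha = {"A": 0.3, "X": 0.3}            # equal saliences (assumed)
for _ in range(100):
    rw_trial(V, ["A"], alpha)           # A alone comes to predict the US
for _ in range(100):
    rw_trial(V, ["A", "X"], alpha)      # little error remains, so X gains little
print(V)                                # V["A"] near 1.0; V["X"] near 0 (blocked)
```

Overshadowing falls out of the same rule: with unequal saliences (say, alpha 0.5 vs. 0.1) trained in compound from the start, the more salient cue absorbs most of the limited total strength.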
Subsequent researchers, like Ralph Miller and Mark E. Bouton, added a role for context conditioning, especially in application-related phenomena like the reinstatement of extinguished fear. Similar models of reinforcement have focused either on aspects of the interaction of Pavlovian and instrumental contingencies (see Geoffrey Hall, 2002), or on the scalar effect on acquisition speed and performance of the length of the CS relative to the length of the inter-reinforcer interval (see a summary by Russell M. Church, 2002). Finally, modelers such as Nestor A. Schmajuk (1997) have turned to the use of multilayer connectionist models, based on error-correction algorithms, to simulate interactions among multiple predictive cues.
To complete the picture, current investigators interested in the neurophysiological substrates of learning often use Pavlovian and operant reinforcement procedures to clarify the effects of brain stimulation, lesions, drugs (see Shepherd Siegel), neural transmitters, and transgene manipulations (see Louis D. Matzel). Finally, considerable progress has been made in deciphering the contribution of simple sensory, motor, and Pavlovian mechanisms to learning related to species-typical behavior. See the admirable book by Thomas J. Carew (2000), which includes accounts of the neurophysiological work by Richard F. Thompson (see also Thompson, 2005) on conditioning of the nictitating membrane in rabbits, and by Eric Kandel (see also Frank Krasne, 2002) on gill withdrawal in Aplysia (sea slugs).
A final approach to reinforcement began with twentieth-century researchers interested in evolution and learning. For example, T. C. Schneirla and M. E. Bitterman examined similarities and differences among species in reinforcement learning (see accounts in Comparative Psychology: A Handbook, edited by Gary Greenberg and Maury M. Haraway). More recent investigators have explored the extent to which phenomena of human cognition can be produced by selective reinforcement of the perceptual choices of nonhuman animals. See Edward A. Wasserman and Thomas R. Zentall’s edited book on categorization and short-term memory mechanisms in animals, and the edited book by Cecilia M. Heyes and Bennett G. Galef focused on imitation and social learning (Social Learning in Animals: The Roots of Culture).
More ecologically oriented researchers like John Garcia, Alan Kamil, David Sherry, and Charles R. Gallistel have clarified how ecologically specialized perceptual-motor learning mechanisms evolved to meet requirements for survival (summarized in Sara J. Shettleworth’s Cognition, Evolution, and Behavior, 1998). The relevance of such mechanisms to all laboratory research was pointed out by William Timberlake (2002) in examples of how the ecologically based abilities of common laboratory species have affected the design and results of standard laboratory apparatus and procedures. Finally, work on human information-processing abilities, like that of John Tooby and Leda Cosmides on causal reasoning, and of Paul Rozin and Carol Nemeroff on contamination and disgust reactions to foodstuffs, applies a similar ecological and evolutionary logic to the study of humans.
In short, given the diversity of reinforcement phenomena, mechanisms, and results, it appears that further increases in the effectiveness and generality of reinforcement theories and models will depend on considering evolutionary, functional, applied, and neurophysiological contexts, rather than depending solely on general learning principles and theories.
SEE ALSO Classical Conditioning; Cognition; Economics, Behavioral; Hull, Clark; Learned Helplessness; Neuroscience; Operant Conditioning; Pavlov, Ivan; Skinner, B. F.; Thorndike, Edward; Tolman, Edward
BIBLIOGRAPHY
Allison, James W. 1983. Behavioral Economics. New York: Praeger.
Capaldi, E. J. 1966. Partial Reinforcement: A Hypothesis of Sequential Effects. Psychological Review 73: 459-477.
Church, Russell M. 2002. Temporal Learning. In Stevens’ Handbook of Experimental Psychology, vol. 3, eds. Hal Pashler and Randy Gallistel, 365-394. 3rd ed. New York: John Wiley and Sons.
Dollard, John, and Neal E. Miller. 1950. Personality and Psychotherapy: An Analysis in Terms of Learning, Thinking, and Culture. New York: McGraw-Hill.
Domjan, Michael. 2006. The Principles of Learning and Behavior. 5th ed. Belmont, CA: Thomson/Wadsworth.
Estes, William K. 1954. Modern Learning Theory: A Critical Analysis of Five Examples. New York: Appleton-Century-Crofts.
Flaherty, Charles F. 1996. Incentive Relativity. New York: Cambridge University Press.
Fowler, Harry. 1965. Curiosity and Exploratory Behavior. New York: Macmillan.
Gallistel, Charles R. 1990. The Organization of Learning. Cambridge, MA: MIT Press.
Garcia, John, and Rodrigo Garcia y Robertson. 1985. Evolution of Learning Mechanisms. In Psychology and Learning, ed. Barbara L. Hammonds, 187-243. Washington, DC: American Psychological Association.
Greenberg, Gary, and Maury M. Haraway. 1998. Comparative Psychology: A Handbook. New York: Garland.
Hull, Clark L. 1943. Principles of Behavior: An Introduction to Behavior Theory. New York: Appleton-Century.
Krasne, Frank. 2002. Neural Analysis of Learning in Simple Systems. In Stevens’ Handbook of Experimental Psychology, vol. 3, eds. Hal Pashler and Randy Gallistel, 131-200. 3rd ed. New York: John Wiley and Sons.
Krasner, Leonard. 1985. Applications of Learning Theory in the Environment. In Psychology and Learning, ed. Barbara L. Hammonds, 49-94. Washington, DC: American Psychological Association.
Lau, Brian, and Paul W. Glimcher. 2005. Dynamic Response-by-Response Models of Matching Behavior in Rhesus Monkeys. Journal of the Experimental Analysis of Behavior 84 (3): 555-579.
MacKay, David J. C. 2003. Information Theory, Inference, and Learning Algorithms. Cambridge, U.K.: Cambridge University Press.
Mackintosh, Nicholas John. 1983. Conditioning and Associative Learning. Oxford: Clarendon Press.
Matzel, Louis D. 2002. Learning Mutants. In Stevens’ Handbook of Experimental Psychology, vol. 3, eds. Hal Pashler and Randy Gallistel, 201-238. 3rd ed. New York: John Wiley and Sons.
Miller, Neal E. 1944. Experimental Studies of Conflict. In Personality and the Behavior Disorders, ed. J. McV. Hunt, 1044. New York: Ronald.
Miller, Ralph, and Martha Escobar. 2002. Learning: Laws and Models of Basic Conditioning. In Stevens’ Handbook of Experimental Psychology, vol. 3, eds. Hal Pashler and Randy Gallistel, 47-102. 3rd ed. New York: John Wiley and Sons.
O’Donohue, William. 1998. Learning and Behavior Therapy. Boston: Allyn and Bacon.
Rozin, Paul, and Carol Nemeroff. 2002. Sympathetic Magical Thinking: The Contagion and Similarity “Heuristics.” In Heuristics and Biases: The Psychology of Intuitive Judgement, eds. Thomas Gilovich, Dale Griffin, and Daniel Kahneman, 201-216. Cambridge, U.K.: Cambridge University Press.
Schmajuk, Nestor A. 1997. Animal Learning and Cognition: A Neural Network Approach. Cambridge, U.K.: Cambridge University Press.
Shettleworth, Sara J. 1998. Cognition, Evolution, and Behavior. New York: Oxford University Press.
Siegel, Shepherd. 2005. Drug Tolerance, Drug Addiction, and Drug Anticipation. Current Directions in Psychological Science 14 (6): 296-300.
Staddon, J. E. R. 2001. Adaptive Dynamics: The Theoretical Analysis of Behavior. Cambridge, MA: MIT Press.
Thompson, Richard F. 2005. In Search of Memory Traces. Annual Review of Psychology 56: 1-23.
Timberlake, William D. 2002. Niche-related Learning in Laboratory Paradigms: The Case of Maze Behavior in Norway Rats. Behavioural Brain Research 134 (1): 355-374.
Timberlake, William D., and Valeri A. Farmer-Dougan. 1991. Reinforcement in Applied Settings: Figuring Out ahead of Time What Will Work. Psychological Bulletin 110 (3): 379-391.
Tooby, John, and Leda Cosmides. 2005. Conceptual Foundations of Evolutionary Psychology. In The Handbook of Evolutionary Psychology, ed. David M. Buss, 5-67. Hoboken, NJ: John Wiley and Sons.
Wasserman, Edward A., and Thomas R. Zentall, eds. 2006. Comparative Cognition: Experimental Explorations of Animal Intelligence. Oxford: Oxford University Press.