In the 1930s B. F. Skinner developed a new methodology for the study of animal learning and behavior. He called the behavior studied with this method operant behavior, to reflect the fact that the animal "operated" on the environment to produce a reward, or reinforcer. The Behavior of Organisms, published in 1938, was the principal document in which he presented his findings and his conceptual approach to the study of animal learning and behavior.
In the method that Skinner developed, the animal (most often a rat, pigeon, or monkey) emits particular behaviors, called instrumental responses (or behaviors), to gain a reinforcer. Most often, these responses involve an operandum (formerly called manipulandum) that is suited to the subject's motor abilities. Rats, monkeys, and other mammals press a horizontal bar (or lever) in the experimental chamber (often called a Skinner box), while pigeons peck at a vertical disk (or key); fish can be taught to swim through a ring. Normally, the reinforcer immediately follows the response.
Animals learn to emit particular instrumental responses because the reinforcers shape behavior. Behaviors that are followed by a reinforcer increase in frequency, and behaviors that are not followed by a reinforcer decrease in frequency. For example, to train a rat to press a lever, the experimenter may first reinforce the animal every time it approaches the lever. When the rat is reliably approaching the lever, reinforcers are provided only if it actually touches the lever. Finally, only pressing the lever is reinforced. This shaping of behavior by progressively narrowing the range of behaviors that are reinforced (the operant class) is known as the method of successive approximation. If reinforcement for a behavior is discontinued, the behavior will decrease in frequency and may stop completely. This process is known as extinction.
In discrete-trial procedures, the trial ends with a single response, and the probability, latency, or force of that response is recorded as the measure of behavior. Skinner developed another method of studying behavior that he called free-operant procedures. Here, the subject has access to the operandum for extended periods—sometimes an extended trial, on other occasions an entire experimental session—and can respond repeatedly during that period. Therefore, the rate of responding becomes the primary measure of behavior. Skinner developed an ingenious method for displaying the rate with a cumulative record (see Figure 1). Each response displaces a pen upward by a small amount on a moving strip of paper. This makes the rate of responding immediately visible as the measure of behavior. The higher the rate of responding, the steeper the slope of the cumulative record. However, in most current experimental applications, counters and computers are used to record and analyze response output. These measures allow for more quantitative analyses of behavior.
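The relation between a cumulative record and the rate of responding can be sketched in a few lines of Python. This is only an illustration: the function names and the response times are hypothetical, and the record is represented as a list of (time, cumulative count) points rather than a pen trace.

```python
def cumulative_record(response_times):
    """Return (time, cumulative count) points, one step up per response."""
    times = sorted(response_times)
    return [(t, i) for i, t in enumerate(times, start=1)]


def response_rate(response_times, start, end):
    """Responses per unit time in [start, end): the slope of the record there."""
    if end <= start:
        raise ValueError("end must be greater than start")
    count = sum(1 for t in response_times if start <= t < end)
    return count / (end - start)


# Hypothetical session (times in seconds): slow responding, then a faster run.
times = [2.0, 6.0, 10.0, 11.0, 12.0, 13.0, 14.0]
record = cumulative_record(times)
slow = response_rate(times, 0.0, 10.0)   # 2 responses in 10 s
fast = response_rate(times, 10.0, 15.0)  # 5 responses in 5 s
print(record[-1], slow, fast)
```

The steeper second segment of the record corresponds to the higher value returned by `response_rate` for that window.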
Schedules of Reinforcement
The designated instrumental response is followed on at least some occasions by a reinforcer such as a food pellet or liquid refreshment for the rat or monkey, grain for the bird, or money, tokens, or "points" for a human subject. Skinner designed schedules of reinforcement that provided reward only intermittently, in contrast with continuous reinforcement, where each response is reinforced. The subject may be reinforced only after emitting a number of responses, on a ratio schedule, or for a response after a period of time has elapsed, on an interval schedule. The required ratio may be constant on all occasions; this is a fixed-ratio schedule. Or it may vary from trial to trial; this is a variable-ratio schedule.
Likewise, in an interval schedule the interval may be fixed or variable. Skinner found that each of these schedules produced distinctive cumulative records. For example, in fixed-ratio schedules, animals frequently do not respond immediately after a reinforcer; this is called a post-reinforcement pause. Then they emit responses in a high-rate "burst" to obtain the reinforcer. In fixed-interval schedules, the subject typically does not respond immediately after the reinforcer, and the rate of responding steadily accelerates as the end of the interval approaches. Variable-interval and variable-ratio schedules usually generate steady rates of responding. Ratio schedules generally produce high rates of responding because the rate of reinforcement depends entirely on the rate of responding. However, ratio schedules requiring a large number of responses for each reinforcer may induce ratio strain in the form of extended periods of no responding.
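The ratio and interval rules described above can be sketched as small Python classes. This is a minimal illustration under stated assumptions, not a description of any laboratory software: the class names are hypothetical, a variable schedule is represented by a repeating list of requirements rather than random draws, and each interval is assumed to be timed from the previous reinforced response.

```python
from itertools import cycle, repeat


class RatioSchedule:
    """Reinforce the response that completes the current response requirement."""

    def __init__(self, requirements):
        self._requirements = iter(requirements)
        self._required = next(self._requirements)
        self._count = 0

    def respond(self, time):
        """Register a response; return True if it is reinforced.

        `time` is unused here; it keeps the interface the same as
        IntervalSchedule.
        """
        self._count += 1
        if self._count >= self._required:
            self._count = 0
            self._required = next(self._requirements)
            return True
        return False


class IntervalSchedule:
    """Reinforce the first response made after the current interval elapses."""

    def __init__(self, intervals, start=0.0):
        self._intervals = iter(intervals)
        self._available_at = start + next(self._intervals)

    def respond(self, time):
        """Register a response at `time`; return True if it is reinforced."""
        if time >= self._available_at:
            # Assumption: the next interval starts at the reinforced response.
            self._available_at = time + next(self._intervals)
            return True
        return False


fixed_ratio = RatioSchedule(repeat(3))            # fixed ratio of 3 responses
variable_ratio = RatioSchedule(cycle([1, 5, 3]))  # hypothetical sequence, mean 3
fixed_interval = IntervalSchedule(repeat(10.0))   # fixed interval of 10 s

fr_outcomes = [fixed_ratio.respond(t) for t in range(6)]
vr_outcomes = [variable_ratio.respond(t) for t in range(9)]
fi_outcomes = [fixed_interval.respond(t)
               for t in (2.0, 9.0, 10.5, 12.0, 20.0, 20.5)]
print(fr_outcomes, vr_outcomes, fi_outcomes)
```

A variable-interval schedule follows from the same class by passing a varying sequence of intervals, for example `IntervalSchedule(cycle([5.0, 15.0, 10.0]))`. Note that on the ratio schedules every response counts toward the next reinforcer, whereas on the interval schedule responses made before the interval elapses have no effect.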
These simple schedules of reinforcement can be combined into more complex schedules. In a chain schedule, completing one schedule produces another schedule rather than a reinforcer, and the reinforcer is given only when the final schedule is completed. In a multiple schedule, two schedules regularly alternate on one operandum. In both cases, distinctive stimuli signal which particular schedule is currently in effect. In a mixed schedule, the component schedules alternate, but they are not signaled by an external cue.
In concurrent schedules, two (or more) schedules are simultaneously in effect and the subject can choose between them. These schedules can be arranged on separate operanda or on one operandum. In the latter procedure the subject can choose between schedules by performing a switching response to a different operandum. It has been found that animals distribute the time spent responding to each schedule in proportion to the rate of reinforcement obtained from each. This relation is known as the matching law. Type of schedule, magnitude of the reinforcers, and type of reinforcement are also important determinants of choice. For example, studies of self-control have shown that animals are "impulsive"; they choose small, immediate reinforcers over delayed, but much larger, reinforcers.
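As a worked illustration of the matching relation in its simplest form, the proportion of time allocated to each schedule equals that schedule's share of the total obtained reinforcement rate: T1 / (T1 + T2) = r1 / (r1 + r2). The sketch below assumes strict matching; the function name and the reinforcement rates are hypothetical.

```python
def matching_allocation(reinforcement_rates):
    """Predicted proportion of time on each schedule under strict matching."""
    total = sum(reinforcement_rates)
    if total <= 0:
        raise ValueError("at least one schedule must deliver reinforcers")
    return [rate / total for rate in reinforcement_rates]


# Hypothetical obtained rates: 40 and 20 reinforcers per hour.
allocation = matching_allocation([40, 20])
print(allocation)
```

With these rates the prediction is that about two thirds of the time is spent on the richer schedule and one third on the leaner one.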
Discriminative stimuli can signal the effective schedule of reinforcement. For rats, these can be different tones or the presence or absence of a "house light" in the chamber. For pigeons, different colors or patterns may be projected onto the response key. Monkeys are often presented with complex visual patterns. The discriminative stimuli come to control the rates of responding. For example, a pigeon will respond at the same rate to a key lit red or green if both colors signal a variable-interval (VI) schedule. However, if the VI schedule during the green-light component is removed, then the rate of responding to this negative stimulus rapidly decreases. The response rate to the red light, the positive stimulus, will actually increase over its previous level, a phenomenon called behavioral contrast. New stimuli from the same stimulus dimension can be presented in a generalization test. For example, if the discriminative stimuli used in training are two tones, then a rat may be tested with a range of tonal frequencies. Gradients of generalization (or discrimination) are readily obtained; that is, the amount of responding to each new stimulus is an orderly function of its similarity to the positive training stimulus.
If the stimuli are more complex, such as pictures, this provides an opportunity for the study of concept attainment when the stimuli belong to different classes. Pigeons, for example, readily learn to discriminate between pictures containing images of one or more people and pictures without a person.
Stimulus control is also studied using discrete-trial choice procedures. A stimulus is presented as a sample, and then the animal must choose which of two response alternatives is correct for that particular stimulus. Correct choices are reinforced. Such methods are analogous to signal detection experiments with human subjects and have provided precise measurements of animal perception. If a delay intervenes between the sample stimulus and the choice, the short-term memory or working memory of animals can be studied. Generally, the accuracy of choice decreases markedly with delays of even a few seconds.
Control with Aversive Stimuli
Positive reinforcers are normally appetitive stimuli. Aversive stimuli, such as electric shock or loud noise, are also effective in the control of behavior. If aversive stimuli are consequences for responding, they are punishers, and they reduce the rate of responding, which is otherwise maintained by positive reinforcement. Animals are very sensitive to both the strength and the frequency of the punishers. Aversive stimuli are also used in the study of escape and avoidance. The latter is most often studied in a free-operant situation. The subject, most often a rat, is subjected to brief, intermittent shocks. By emitting a required response, such as bar pressing or crossing a hurdle, the subject can postpone or cancel the shock. This procedure generates consistent rates of avoidance behavior in rats, monkeys, and other organisms, especially when each response guarantees a shock-free interval.
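One way to arrange free-operant avoidance so that each response guarantees a shock-free interval can be sketched as follows. This is an illustrative model, not a procedure taken from the text: it assumes that, without responding, shocks recur at a fixed interval, and that each response postpones the next shock by a fixed amount. The function name, parameter names, and values are hypothetical.

```python
def shock_times(response_times, session_length,
                shock_shock=5.0, response_shock=20.0):
    """Return the times at which shocks are delivered during a session.

    shock_shock:    assumed interval between shocks when no response occurs.
    response_shock: assumed shock-free interval guaranteed by each response.
    """
    shocks = []
    next_shock = shock_shock
    for t in sorted(response_times):
        if t >= session_length:
            break
        # Deliver every shock that falls due up to this response.
        while next_shock <= t:
            shocks.append(next_shock)
            next_shock += shock_shock
        # The response postpones the next shock.
        next_shock = t + response_shock
    # Deliver the shocks that fall due in the remainder of the session.
    while next_shock <= session_length:
        shocks.append(next_shock)
        next_shock += shock_shock
    return shocks


no_responding = shock_times([], 60.0)
steady_responding = shock_times([4.0, 19.0, 34.0, 49.0], 60.0)
single_response = shock_times([4.0], 60.0)
print(len(no_responding), len(steady_responding), len(single_response))
```

Under these assumptions, a subject that responds at least once per guaranteed interval avoids every shock, while a subject that stops responding again receives shocks at the shorter interval.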
Operant methodology has shown that animal behavior is an orderly function of its antecedents (discriminative stimuli) and its consequences (reinforcement and punishment). It has also enabled experimenters to explore various areas of animal perception, cognition, and choice. Furthermore, the principles of operant behavior have application to humans. Operant techniques have been employed in personal instruction and in the treatment of dysfunctional human behavior.
Catania, A. C. (1979). Learning. Englewood Cliffs, NJ: Prentice-Hall.
Domjan, M. P., and Burkhard, B. (1985). The principles of learning and behavior, 2nd edition. San Francisco: Brooks/Cole.
Flaherty, C. F. (1985). Animal learning and cognition. New York: Knopf.
Schwartz, B., and Reisberg, D. (1991). Learning and memory. New York: Norton.
Skinner, B. F. (1938). The behavior of organisms. New York: Appleton-Century.