Reinforcement or Reward in Learning: Cerebellum
Researchers have extensively studied the neural substrates mediating reinforcement during classical conditioning of discrete motor responses. In a seminal 1986 paper in Science, Richard F. Thompson proposed a cerebellar model of associative learning in which the convergence of conditioned stimulus (CS) and unconditioned stimulus (UCS) information in the cerebellar cortex and the deep cerebellar nuclei forms the basis of the memory trace in classical conditioning. One of the central tenets of this model, which is based on electrophysiological, lesion, and stimulation data, is that reinforcement in the form of the UCS is conveyed to the cerebellum via climbing fibers originating in the inferior olive.
The Nature of Reinforcement
In its simplest and most widely accepted definition, the term reinforcement applies to any stimulus or consequence that increases the likelihood that the immediately antecedent behavior will occur again in an operant learning task. As regards classical conditioning, however, the term is often applied to the UCS, which elicits a reflex (unconditioned response, or UCR) and, when paired with a neutral CS, strengthens or confirms the predictive relationship between the two stimuli. Conceptually, the UCS may be thought of as a teaching input.
Electrophysiology of the Inferior Olive
Rescorla and Wagner (1972; Rescorla, 1988) have proposed a behavioral model of classical conditioning in which the increment in associative strength produced by each CS-UCS pairing is greatest early in training and declines with each successive pairing. If that model is correct, then neural activity within the reinforcement pathway ought to decrease across training. Sears and Steinmetz (1991) tested this hypothesis and reported that UCS-evoked activity in the rabbit dorsal accessory olive was large during the initial training trials but decreased concomitantly with the development of conditioned responses (CRs) later in training. On UCS-alone trials, or on paired trials in which the rabbit failed to give a CR, evoked activity within the olive reappeared, suggesting that olivary activity may be an error-detection signal. Sears and Steinmetz proposed that such a partial reinforcement mechanism maintains continued CR expression. Single-unit recordings of Purkinje cells by Foy and Thompson (1986) indicated that the decrement in olivary activity with continued training was reflected in cerebellar cortical activity. At the beginning of training, 61 percent of the 118 neurons studied displayed complex spikes in response to UCS onset; at the end of training, only 27 percent of the Purkinje cells did so.
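To make the Rescorla-Wagner prediction concrete, the following is a minimal Python sketch of the model's standard update rule, ΔV = αβ(λ − V), applied to repeated pairings of a single CS with the UCS. The parameter values (α = 0.3, β = 1.0, λ = 1.0) are arbitrary assumptions chosen for illustration, not values fitted to rabbit data. The sketch shows the point made above: the change in associative strength on each trial, which is proportional to the prediction error (λ − V), is largest on the first pairing and shrinks as V approaches λ.

```python
"""Minimal Rescorla-Wagner sketch (assumed, illustrative parameters only)."""


def rescorla_wagner(n_trials, alpha=0.3, beta=1.0, lam=1.0, v0=0.0):
    """Return (strengths, increments) for n_trials of CS-UCS pairings.

    strengths[i]  -- associative strength V after trial i
    increments[i] -- change in V on trial i, proportional to the
                     prediction error (lam - V) before that trial
    """
    v = v0
    strengths, increments = [], []
    for _ in range(n_trials):
        delta = alpha * beta * (lam - v)  # prediction error scaled by learning-rate terms
        v += delta
        strengths.append(v)
        increments.append(delta)
    return strengths, increments


if __name__ == "__main__":
    v, dv = rescorla_wagner(10)
    for trial, (s, d) in enumerate(zip(v, dv), start=1):
        print(f"trial {trial:2d}: V = {s:.3f}, dV = {d:.3f}")
```

If the olivary response is read as the error term (λ − V), its decline across training parallels the shrinking increments printed here; that mapping is an interpretive assumption suggested by the Sears and Steinmetz data, not part of the behavioral model itself.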
The foregoing data indicate that once the association between the CS and UCS has been formed, no further activity in the UCS pathway is necessary except in the case of a performance error. A significant body of behavioral data supports this view. For example, Kamin (1968) demonstrated that once a CS-UCS association has been formed, adding a second CS in compound with the first yields no behavioral learning to the new CS. Kamin termed this phenomenon blocking: the second CS adds no new information about the UCS, so the animal essentially ignores it. If activity in the UCS pathway were to continue, however, the animal should attach the same associative properties to the second CS. Kim, Krupa, and Thompson (1998) confirmed this prediction by injecting the GABA antagonist picrotoxin into the olivary complex after acquisition of the first CS-UCS (tone-air puff) association. The antagonist prevented the normal diminution of olivary activity that occurs with training. Subsequent training with both the original CS and a second CS (light) was accompanied by evoked complex-spike activity in the cerebellar cortex to both CSs, and CS-alone test trials indicated that the animal had acquired comparable associations between each CS and the UCS.
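The Rescorla-Wagner rule accounts for blocking because all CSs present on a trial share a single error term computed from their summed associative strength. The Python sketch below, again using arbitrary parameter values, compares a hypothetical blocking group (tone pretrained, then tone plus light) with a control group that receives only the compound. It models behavior only; it does not simulate picrotoxin or olivary activity.

```python
"""Rescorla-Wagner sketch of blocking (assumed, illustrative parameters only)."""


def rw_trial(v, cues, lam, alpha=0.3, beta=1.0):
    """Apply one Rescorla-Wagner trial in place; all present cues share one error."""
    error = lam - sum(v[c] for c in cues)  # single teaching signal for this trial
    for c in cues:
        v[c] += alpha * beta * error
    return error


def run_group(pretrain_tone, n_phase1=50, n_phase2=50):
    """Return final associative strengths for a blocking group or its control."""
    v = {"tone": 0.0, "light": 0.0}
    if pretrain_tone:
        for _ in range(n_phase1):
            rw_trial(v, ["tone"], lam=1.0)           # phase 1: tone -> UCS
    for _ in range(n_phase2):
        rw_trial(v, ["tone", "light"], lam=1.0)      # phase 2: tone + light -> UCS
    return v


if __name__ == "__main__":
    blocked = run_group(pretrain_tone=True)
    control = run_group(pretrain_tone=False)
    print("blocking group:", {k: round(x, 3) for k, x in blocked.items()})
    print("control group: ", {k: round(x, 3) for k, x in control.items()})
```

In the blocking group the tone already predicts the UCS when compound training begins, so the shared error is near zero and the light gains almost nothing; in the control group the two cues split the available associative strength. The Kim, Krupa, and Thompson manipulation can be read, in this framework, as preventing the error term from shrinking, which would let the light keep learning.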
Inferior Olive Lesions
If the inferior olive is the source of reinforcing input to the cerebellum, then destruction of this nucleus should produce behavioral phenomena identical to those observed when the exteroceptive UCS is omitted from training trials. Lesions of the rostromedial portions of the inferior olive block CR acquisition in rabbits trained with paired tone (CS) and air-puff (UCS) presentations in a standard delay-conditioning procedure (McCormick, Steinmetz, and Thompson, 1985). Further, in previously trained animals, lesions of this nucleus gradually abolish learned responses in a manner similar to that observed in intact control animals undergoing extinction (i.e., CS-alone trials).
Electrical Stimulation of the Inferior Olive
Substitution of an exteroceptive UCS with electrical stimulation of the inferior olive or climbing fiber afferents to the cerebellum provides, perhaps, the most stringent test of whether this structure is the locus of reinforcement during classical conditioning. Electrical stimulation must be able to recreate all of the behavioral phenomena normally associated with peripheral UCS presentations during conditioning.
Stimulation of the inferior olive produces a variety of movements, depending on the location of the stimulating electrode. These movements may include eye blinks and movements of the head, neck, facial muscles, or limbs. Pairing a tone CS with electrical stimulation of the dorsal accessory nucleus of the inferior olive as the UCS produces normal rates of eye-blink conditioning (Mauk, Steinmetz, and Thompson, 1986). Furthermore, the range of effective interstimulus intervals (ISIs) with electrical stimulation as the UCS was identical to that with peripheral UCSs. Conditioning was maximal when the CS-UCS interval was held at 150 or 250 msec; a shorter interval of 50 msec prevented acquisition.
Electrical Stimulation of Cerebellar White Matter
One criticism leveled against the preceding experiments is that eye blinks elicited by olivary stimulation as the UCS might reflect antidromic activation of spinal trigeminal neurons and spread of activation to mossy-fiber collaterals that project to lobule HVI of the rabbit cerebellum, the site of the facial map (Moore and Blazis, 1989). To explore this issue, Swain, Shinkman, Nordholm, and Thompson (1992) performed a parametric investigation of cerebellar white-matter stimulation. They chose as the stimulation site the white matter immediately underlying cerebellar lobule HVI because it is remote from brain stem and spinal reflex pathways; stimulation at this site would activate climbing fibers and other cerebellar afferents. In their study, white-matter stimulation elicited eye blinks as well as movements of the face or neck and, when paired with a tone CS, produced learning comparable to that seen with peripheral UCSs or with olivary stimulation. Learning rates did not differ significantly across the various movements evoked by stimulation. When switched to CS-alone presentations, the animals extinguished rapidly, and upon reinstatement of paired training they showed reacquisition with substantial savings. Gormezano and colleagues (1983) have reported similar findings in paradigms using a peripheral UCS. The correspondence between learning in the Swain et al. experiment and conditioning with a peripheral UCS was remarkable, extending even to small details. For example, the percentage of CRs on the last day of reacquisition training was smaller than that on the last day of acquisition training. Although this difference was not significant, it is consistent with reports by several investigators that the CS may acquire inhibitory properties during extinction training that become evident upon retraining.
The experiment included control rabbits that received either randomly or explicitly unpaired presentations of the tone CS and white-matter stimulation as the UCS. Animals that received explicitly unpaired presentations of the conditioning stimuli were profoundly impaired when they were subsequently switched to a paired CS-UCS training procedure. Previous behavioral work by Rescorla (1969) has demonstrated that the explicitly unpaired control procedure may result in the CS acquiring inhibitory properties such that subsequent acquisition training is retarded.
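One way the explicitly unpaired procedure could give a CS inhibitory properties can be captured by the Rescorla-Wagner rule if the experimental context is treated as an additional cue; that treatment is an assumption of this sketch, not a claim about the Swain et al. data. In the Python sketch below, UCS-alone trials make the context excitatory, and tone-alone trials in that same context then drive the tone's associative strength below zero, which is the model's representation of a conditioned inhibitor. Parameter values are arbitrary.

```python
"""Rescorla-Wagner sketch of the explicitly unpaired procedure.

Assumption: the experimental context acts as a cue present on every trial.
Parameter values are illustrative only.
"""


def rw_trial(v, cues, lam, alpha=0.2, beta=1.0):
    """One Rescorla-Wagner update; cues present on the trial share one error term."""
    error = lam - sum(v[c] for c in cues)
    for c in cues:
        v[c] += alpha * beta * error
    return error


def explicitly_unpaired(n_cycles=100):
    """Alternate UCS-alone and tone-alone trials; the context is present on both."""
    v = {"context": 0.0, "tone": 0.0}
    for _ in range(n_cycles):
        rw_trial(v, ["context"], lam=1.0)          # UCS alone, delivered in the context
        rw_trial(v, ["context", "tone"], lam=0.0)  # tone alone, never followed by the UCS
    return v


if __name__ == "__main__":
    final = explicitly_unpaired()
    print({k: round(x, 3) for k, x in final.items()})
```

With these settings the context settles near +1 and the tone near -1, so the tone comes to cancel the context's prediction of the UCS; in this framework, a negative associative strength is one way to express the inhibitory properties Rescorla (1969) described.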
A subsequent study by the same group (Swain et al., 1999) found that exposure to as few as 108 UCS-alone trials was sufficient to produce a potent UCS preexposure effect. Animals exposed to the UCS before training typically required more than 600 trials to learn, whereas animals that received no preexposure learned at a normal rate (100-200 trials). Exposure to UCSs of fixed duration also increased the amplitude and decreased the latency of the stimulus-evoked reflex as trial presentations progressed. Similar findings for UCS preexposure and reflex augmentation have been reported with exteroceptive UCSs (Mis and Moore, 1973).
Swain et al. (1992) also reported anecdotal observations that further support the hypothesis that cerebellar white-matter stimulation as a UCS results in normal behavioral learning. One animal exhibited an ipsilateral lip movement in response to cerebellar stimulation; after it had been trained and had completed the experiment, an impromptu test was conducted in its home cage. When a whistle of about 1 kHz (the frequency of the CS tone) was sounded, the rabbit responded with a conditioned lip movement; when a whistle of a different pitch was sounded, the rabbit did not respond. These observations suggest that conditioning was specific to the tone CS rather than to the training context, and that the CR showed a stimulus generalization gradient across whistle frequency.
Destruction of the olive or its efferent climbing fibers blocks learning in naïve animals and extinguishes it in experienced ones. Physiological recordings indicate that olivary neurons respond strongly to UCS presentation at the beginning of training but that this response subsides as learning occurs. Stimulation of the olivary climbing-fiber system, or manipulation of the inhibition that normally suppresses its activity, can recreate a host of behavioral phenomena, including conditioned inhibition, the UCS preexposure effect, UR augmentation, and blocking. Together these data indicate that the inferior olive and its efferent axons, the climbing fibers, constitute the neural pathway that conveys information about reinforcement to the cerebellum.
Foy, M. R., and Thompson, R. F. (1986). Single unit analysis of Purkinje cell discharge in classically conditioned and untrained rabbits. Society for Neuroscience Abstracts 12, 518.
Gormezano, I., Kehoe, E. J., and Marshall, B. S. (1983). Twenty years of classical conditioning research with the rabbit. In J. M. Sprague and A. N. Epstein, eds., Progress in psychobiology and physiological psychology, Vol. 10. New York: Academic Press.
Kamin, L. J. (1968). Attention-like processes in classical conditioning. In M. R. Jones, ed., Miami symposium on the prediction of behavior: Aversive stimulation. Miami: University of Miami Press.
Kim, J. J., Krupa, D. J., and Thompson, R. F. (1998). Inhibitory cerebello-olivary projections and blocking effect in classical conditioning. Science 279, 570-573.
Mauk, M. D., Steinmetz, J. E., and Thompson, R. F. (1986). Classical conditioning using stimulation of the inferior olive as the unconditioned stimulus. Proceedings of the National Academy of Sciences of the United States of America 83, 5349-5353.
McCormick, D. A., Steinmetz, J. E., and Thompson, R. F. (1985). Lesions of the inferior olivary complex cause extinction of the classically conditioned eyelid response. Brain Research 359, 120-130.
Mis, R. W., and Moore, J. W. (1973). Effects of preacquisition US exposure on classical conditioning of the rabbit's nictitating membrane response. Learning and Motivation 4, 108-114.
Moore, J. W., and Blazis, D. E. J. (1989). Simulation of a classically conditioned response: A cerebellar network implementation of the Sutton-Barto-Desmond model. In J. H. Byrne and W. O. Berry, eds., Neural models of plasticity. New York: Academic Press.
Rescorla, R. A. (1969). Pavlovian conditioned inhibition. Psychological Bulletin 72, 77-94.
—— (1988). Behavioral studies of Pavlovian conditioning. Annual Review of Neuroscience 11, 329-352.
Rescorla, R. A., and Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black and W. F. Prokasy, eds., Classical conditioning II: Current research and theory. New York: Appleton-Century-Crofts.
Sears, L. L., and Steinmetz, J. E. (1991). Dorsal accessory inferior olive activity diminishes during acquisition of the rabbit classically conditioned eyelid response. Brain Research 545, 114-122.
Swain, R. A., Shinkman, P. G., Nordholm, A. F., and Thompson, R. F. (1992). Cerebellar stimulation as an unconditioned stimulus in classical conditioning. Behavioral Neuroscience 106, 739-750.
Swain, R. A., Shinkman, P. G., Thompson, J. K., Grethe, J. S., and Thompson, R. F. (1999). Essential neuronal pathways for reflex and conditioned response initiation in an intracerebellar stimulation paradigm and the impact of unconditioned stimulus preexposure on learning rate. Neurobiology of Learning and Memory 71, 167-193.
Thompson, R. F. (1986). The neurobiology of learning and memory. Science 233, 941-947.