Reinforcement or Reward in Learning: Electrical Self-Stimulation, Brain

Electrical Self-Stimulation, Brain

For physiological psychology, the discovery that electrical stimulation of certain brain regions is so powerfully rewarding that laboratory rats eagerly self-administer it was an earthquake, the shock waves of which rippled through the popular press in hyperbolic recountings. "It may prove the key to human behavior," trumpeted a Montreal newspaper. Some reports even went so far as to fuel fears that brain stimulation reward (BSR) could be used as an agent for social control. However amusing in retrospect, much of this hype was an understandable reflection of the amazement, shared by scientists and laypeople alike, at the powerful and immediate impact of electrical stimulation on behavior. The stimulation electrode, injecting a meaningful signal into the neural circuitry mediating goal-directed behavior, can turn the average lab rat into a craven voluptuary, willing to press a lever to the point of starvation or exhaustion.

Even though the initial hype has subsided (Talwar et al., 2002), BSR has become a model system for the study of positive reinforcement. Because organisms acquire new responses to obtain electrical stimulation, the effect qualifies as reinforcement according to the law of effect. Early evidence suggested that brain stimulation reward differs in some respects from conventional rewards such as food and water, whereas later research emphasized their similarities. Researchers have linked the rewarding effect of electrical stimulation to the mechanisms governing choice between competing goals and activities.

The Relationship Between BSR and Natural Rewards

Researchers explored the relationship between BSR and gustatory rewards in an extensive series of experiments in which, under various physiological states, rats had to choose between lateral hypothalamic (LH) stimulation varying in strength and either gustatory reward alone or a compound reward consisting of a gustatory reward plus fixed BSR. The researchers estimated preference by varying the strength of the stimulation triggered by one spout while holding constant the reward triggered by a second spout. They measured the stimulation strength required to produce isopreference, that is, the strength at which the rats chose the two spouts equally often. The experimental protocol sought to minimize the difference between the gustatory reward and the BSR. Thus, the gustatory reward was intraorally infused in small volumes via a catheter, while an intragastric cannula minimized the accumulation of gustatory reward in the gut to simulate the instantaneous and nonsatiating nature of BSR.
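The isopreference measurement described above can be illustrated with a minimal sketch. It assumes that preference is summarized as the proportion of choices directed at the stimulation spout at each stimulation strength, and that the isopreference point is read off by linear interpolation where that proportion crosses 0.5. The function name, the interpolation method, and the numbers are hypothetical illustrations, not the procedure or data of the original experiments.

```python
def isopreference_point(strengths, p_choose_stim):
    """Estimate the stimulation strength at which the rat is indifferent.

    strengths      -- stimulation strengths tested (arbitrary units, hypothetical)
    p_choose_stim  -- proportion of choices for the stimulation spout at each strength
    Returns the strength where the proportion crosses 0.5, found by linear
    interpolation between neighbouring test points, or None if it never crosses.
    """
    pairs = sorted(zip(strengths, p_choose_stim))
    for (s0, p0), (s1, p1) in zip(pairs, pairs[1:]):
        if p0 == 0.5:
            return s0
        # A sign change around 0.5 means the crossing lies between s0 and s1.
        if (p0 - 0.5) * (p1 - 0.5) < 0:
            return s0 + (0.5 - p0) * (s1 - s0) / (p1 - p0)
    if pairs and pairs[-1][1] == 0.5:
        return pairs[-1][0]
    return None


if __name__ == "__main__":
    # Made-up choice proportions: weak stimulation loses to the fixed reward,
    # strong stimulation wins.
    strengths = [10, 20, 30, 40]
    proportions = [0.0, 0.25, 0.75, 1.0]
    print(isopreference_point(strengths, proportions))
```

A stronger fixed reward on the second spout would shift the whole choice curve toward higher strengths, and with it the estimated isopreference point.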

Initially, when given a choice between a gustatory stimulus (an intraoral infusion of sucrose) and LH stimulation, the rats chose the sucrose infusion if the brain stimulation was weak, but they shifted their preference as the LH stimulation became stronger. With sucrose available, rats eschewed suprathreshold levels of brain stimulation in favor of the sucrose reward. Clearly, the availability of sucrose alters the preference for BSR when the two rewards are evaluated in a common system of measurement wherein the larger reward is selected. In a subsequent experiment, researchers pitted LH stimulation against a compound reward consisting of sucrose and an equi-preferred train of brain stimulation to determine whether the results of the common evaluation can summate. Indeed, the rats assigned a higher value to the compound reward than to its sucrose component alone. The strength of LH stimulation required to balance the compound reward exceeded the stimulation strength required to balance the sucrose reward alone. The finding that BSR and sucrose reward can be combined lends further support to the idea of evaluation in a common system of measurement.
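The logic of the summation experiment can be sketched with a toy model. It assumes, purely for illustration, that the value of stimulation grows with strength along a saturating curve and that the values of sucrose and BSR simply add on a common scale. The text says only that the rewards summate, so strict additivity, the shape of the curve, and every number below are hypothetical.

```python
def stim_value(strength, half_max=50.0, slope=4.0):
    """Hypothetical saturating value of stimulation on a 0-to-1 common scale."""
    return strength**slope / (strength**slope + half_max**slope)


def balancing_strength(target_value, half_max=50.0, slope=4.0):
    """Invert stim_value: the strength whose value equals target_value (0 < value < 1)."""
    return half_max * (target_value / (1.0 - target_value)) ** (1.0 / slope)


if __name__ == "__main__":
    sucrose_value = 0.3            # assumed value of the sucrose infusion
    fixed_bsr_value = 0.3          # equi-preferred train, so same assumed value
    compound_value = sucrose_value + fixed_bsr_value  # additivity is an assumption

    alone = balancing_strength(sucrose_value)
    compound = balancing_strength(compound_value)
    # Under these assumptions, balancing the compound takes stronger stimulation.
    print(alone, compound, compound > alone)
```

Any monotonically increasing value function would give the same qualitative result: if the compound is worth more than sucrose alone on the common scale, the stimulation needed to match it must be stronger.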

The competition and summation experiments support the hypothesis that BSR and natural rewards have something important in common: they are evaluated on a common currency scale. As McFarland and Sibley (1975) pointed out, orderly choice between mutually exclusive behaviours must be based on a common currency; the influences contributing to the choice of each activity seem to converge on a "behavioural final common path" (p. 265). There is considerable evidence that thresholds for BSR remain remarkably stable across a variety of states that are associated with changes in choice of goal object. Prominent among these states are changes in energy balance or nutrient requirements. For example, neither the increase in salt appetite that accompanies sodium depletion nor the suppression of appetite that accompanies postingestive feedback is associated with substantial changes in BSR. These findings suggest that BSR mimics a global currency of goal evaluation.

In contrast to these findings, other research has shown that BSR is sensitive to changes in energy balance when the electrode is in a specific region of the LH (perifornical LH). Moreover, leptin, a hormone that communicates the state of the fat stores to the brain, exerts opposite influences on the rewarding effect of stimulating food restriction-sensitive and -insensitive sites. These results suggest that functionally distinct pathways, differentially responsive to the state of the organism, exist within the neural substrate for BSR. One way of interpreting the contrast between the results obtained at different stimulation sites is to propose that stimulation at the restriction-insensitive sites mimics a global currency, whereas stimulation at restriction-sensitive sites mimics a local one, a currency related to long-term energy stores.

The Neural Circuitry Subserving Reward: Lessons from Studies on BSR

The Directly Stimulated Stage

Electrical stimulation provides the most effective means of introducing a reward signal into the brain. The strength of the electrically induced rewarding effect and the ease with which it can be controlled render electrical stimulation a powerful tool for studying the reward substrate. Consider the task of linking BSR to the activity of identified neurons. The first step toward accomplishing this task is to identify the neurons in the immediate vicinity of the electrode tip whose direct activation gives rise to the rewarding effect of the stimulation. Finding these cells and tracing their inputs and outputs is likely to shed light on the mechanisms in the brain that underlie the behavioral effects of positive reinforcement.

The initial strategy employed to investigate the BSR system was to map the brain sites that support self-stimulation. Stimulation of numerous regions evokes BSR. An especially effective site is the medial forebrain bundle (MFB), which allows easy shaping of self-stimulation behavior. Accordingly, finding the directly stimulated stage (first stage) of MFB self-stimulation has been the focus of intensive research. For decades researchers sought the quantitative characteristics of the first-stage neurons through psychophysical techniques. Behavioral measurements of recovery from refractoriness, collision block, and anodal hyperpolarization block have led to the idea that the directly stimulated neurons subserving the powerfully rewarding effect produced by electrical stimulation of the MFB originate in the basal forebrain and give rise to fine, myelinated axons that descend through the MFB toward the tegmentum. Further research has supported this idea. Excitotoxic lesions of a region in the lateral basal forebrain that includes the lateral preoptic area, the substantia innominata, and parts of the bed nucleus of the stria terminalis increase the threshold for self-stimulation of more caudal sites along the MFB. Fos immunostaining has shown that rewarding stimulation of the MFB activates neurons in the region of the effective lesions.

The Role of Dopamine Neurons

BSR is similar in certain respects to the self-administration of psychomotor stimulants. Both are self-administered in a compulsive fashion, both may activate the same brain circuitry, and both are dopamine-dependent. Microdialysis and voltammetry studies indicate that both psychomotor stimulant reward and BSR elevate dopamine levels in the nucleus accumbens. But the most compelling evidence suggesting dopaminergic involvement in BSR comes from pharmacological studies. Dopamine receptor blockers increase self-stimulation thresholds, whereas dopamine agonists have the opposite effect.

Researchers believed that direct activation of MFB dopamine fibers initiated rewarding signals in this region until psychophysical and electrophysiological findings ruled out this possibility. Dopamine neurons are not myelinated, and their thresholds of activation are above the level of the stimulation parameters commonly used in BSR studies. The question then arises: What place does the dopamine system occupy in the neural substrate of the rewarding effect?

According to the simplest hypothesis, the dopamine neurons are in series with the first-stage ones and thus carry the reward signal. An alternative hypothesis is that the dopamine neurons do not actually relay the reward signal but rather play a permissive role at some stage of the BSR substrate. One example of a permissive role would be to gate signal processing in the pathway responsible for the rewarding effect. Perhaps increased activity in the dopamine neurons enhances transmission of the reward signal, whereas decreased activity reduces or even blocks its transmission.

The permissive-role hypothesis finds substantial corroboration in a microdialysis study designed to measure the levels of dopamine released in the nucleus accumbens during self-stimulation for equi-rewarding pulses that consisted of different combinations of stimulus parameters of the electrical stimulation. Some had argued that if the reward signal travels along dopamine neurons, the release of dopamine should not depend on the stimulus parameters because equi-rewarding stimuli should produce a constant output in all neural stages carrying the reward signal, irrespective of the spatio-temporal nature of the signal. However, the results showed that the magnitude of dopamine release differed between sets of self-stimulation parameters that, nevertheless, produced the same rewarding effect. The finding that the rewarding effect of the stimulation is not mirrored in the magnitude of dopamine release as measured by microdialysis contravenes the hypothesis that the reward signal elicited by MFB stimulation is relayed by the mesolimbic dopamine neurons.

Another study measuring dopamine release from dopamine terminals in the nucleus accumbens by voltammetry provided additional evidence that dopamine is not a neural substrate for reward per se. The study showed that although dopamine release correlated with the ability of animals to learn the self-stimulation behavior, extracellular dopamine was nonetheless absent during self-stimulation itself. These results are consistent with two hypotheses: that dopamine is related to the novelty and predictability of rewards and that dopamine neurons provide teaching signals for appetitive learning and reinforcement. Dopamine could thus mediate plasticity changes at sites in the brain that subserve the learning of the self-stimulation behavior. Indeed, there appears to be a correlation between dopamine-dependent potentiation in corticostriatal circuits and the rate of acquisition of lever-pressing behavior.

Future studies might clarify the relationship between the first stage of the BSR substrate and dopamine neurons while shedding light on the circuitry involved in both the processing of reward information and reinforcement learning.

Bibliography

Arvanitogiannis, A., Flores, C., Pfaus, J. G., and Shizgal, P. (1996). Increased ipsilateral expression of Fos following lateral hypothalamic self-stimulation. Brain Research 720, 148-154.

Arvanitogiannis, A., Tzschentke, T. M., Riscaldino, L., Wise, R. A., and Shizgal, P. (2000). Fos expression following self-stimulation of the medial prefrontal cortex. Behavioural Brain Research 107, 123-132.

Arvanitogiannis, A., Waraczynski, M., and Shizgal, P. (1996). Effects of excitotoxic lesions of the basal forebrain on MFB self-stimulation. Physiology and Behavior 59, 795-806.

Carr, K. D. (1996). Feeding, drug abuse, and the sensitization of reward by metabolic need. Neurochemical Research 21, 1455-1467.

Fulton, S., Woodside, B., and Shizgal, P. (2000). Modulation of brain reward circuitry by leptin. Science 287, 125-128.

Gallistel, C. R., Shizgal, P., and Yeomans, J. S. (1981). A portrait of the substrate for self-stimulation. Psychological Review 88, 228-273.

Garris, P. A., Kilpatrick, M., Bunin, M. A., Michael, D., Walker, Q. D., and Wightman, R. M. (1999). Dissociation of dopamine release in the nucleus accumbens from intracranial self-stimulation. Nature 398, 67-69.

McFarland, D. J., and Sibley, R. M. (1975). The behavioural final common path. Philosophical Transactions of the Royal Society of London B 270, 265-293.

Miliaressis, E., Emond, C., and Merali, Z. (1991). Re-evaluation of the role of dopamine in intracranial self-stimulation using in vivo microdialysis. Behavioural Brain Research 46, 43-48.

Olds, J., and Milner, P. M. (1954). Positive reinforcement produced by electrical stimulation of septal area and other regions of rat brain. Journal of Comparative and Physiological Psychology 47, 419-427.

Reynolds, J. N. J., Hyland, B. I., and Wickens, J. R. (2001). A cellular mechanism of reward-related learning. Nature 413, 67-70.

Shizgal, P. (1997). Neural basis of utility estimation. Current Opinion in Neurobiology 7, 198-208.

Shizgal, P. (1999). On the neural computation of utility: Implications from studies of brain stimulation reward. In D. Kahneman, E. Diener, and N. Schwarz, eds., Well-being: The foundations of hedonic psychology. New York: Russell Sage Foundation.

Talwar, S. K., Xu, S., Hawley, E. S., Weiss, S. A., Moxon, K. A., and Chapin, J. K. (2002). Rat navigation guided by remote control. Nature 417, 37-38.

Wise, R. A. (1996). Addictive drugs and brain stimulation reward. Annual Review of Neuroscience 19, 319-340.

Andreas Arvanitogiannis

Learning and Memory