Imagery, Mental

In many ways, mental imagery has been a fundamental issue in the history of philosophy. At least since Aristotle, philosophers have argued that knowledge is often represented in the form of mental images, taken to be inner pictures of some sort. However, questions have frequently been raised about the capacity of such images to play roles in thinking, remembering, and imagining; for instance, in George Berkeley's well-known doubts about the possibility of general or abstract images. Debates about mental imagery have been important in the history of psychology as well. Because the images in question are the bearers of conscious experience, claims about them have often been made on the basis of introspection, and the rejection of introspection in favor of behavioral studies was central to the emergence of psychology as a science. However, with the rise of cognitive science, quantified behavioral research has put mental images back on the map.

For example, Roger Shepard and his colleagues (1982) asked subjects to determine whether two geometrical figures matched when one of them was tilted relative to the orientation of the other. Reaction times were a linear function of the angle of tilt: The greater the angular displacement between the two otherwise identical figures, the longer subjects took to respond. The implication is that responding depends on an operation such as mentally rotating one of the perceived figures through space. Assuming a constant rate of rotation, the time to respond will depend on the distance through which the figure is rotated. One conclusion that can be drawn is that imaging is like perceiving, because matching rotating objects or figures in perception is governed by a similar time-to-distance law.
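The time-to-distance reasoning can be put as a simple linear model. This is a sketch under stated assumptions rather than Shepard's own formulation: $t_0$ stands for the time taken by everything other than rotation (encoding the figures, comparing them, responding), $\theta$ for the angular disparity between the figures, and $\omega$ for an assumed constant rate of mental rotation.

$$\mathrm{RT}(\theta) = t_0 + \frac{\theta}{\omega}$$

With purely hypothetical values $t_0 = 1\,\mathrm{s}$ and $\omega = 60^\circ/\mathrm{s}$, a $60^\circ$ disparity predicts $\mathrm{RT} = 1 + 60/60 = 2\,\mathrm{s}$ and a $120^\circ$ disparity predicts $\mathrm{RT} = 1 + 120/60 = 3\,\mathrm{s}$. Each additional $60^\circ$ adds the same $1\,\mathrm{s}$, which is what it means for reaction time to be a linear function of the angle, with slope $1/\omega$.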

This perceptual similitude thesis is an important part of pictorialist theories, according to which images are like mental pictures. It is particularly important on Stephen Kosslyn's account, the most fully developed version of pictorialism. In one well-known experiment, Kosslyn asked subjects to visualize a map they had previously studied and to focus on one of several items represented on it (e.g., a hut, a pond, a tree). Subjects were then asked to say whether various items were located on the map. Reaction times were a linear function of the distance between the original focal point and the identified item, which suggests that subjects were scanning a mental map rather than simply accessing a description or list. Reaction times might instead reflect the position of items on a list; but because subjects studied the map freely, there is no reason to think that the locations would be listed systematically by their proximity to the focal point, with the nearest first and the farthest last.
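The contrast between scanning an array and searching a list can be sketched in Python. The map coordinates, the extra item "well", the scanning rate, and the function names `scan_time` and `list_time` are hypothetical illustrations, not Kosslyn's materials or model; the sketch only shows why a depictive map predicts reaction times that grow with distance, whereas a list predicts times that track list position.

```python
import math

# Hypothetical map: item -> (x, y) position in arbitrary grid units.
MAP = {"hut": (0, 0), "well": (1, 1), "pond": (3, 4), "tree": (6, 8)}

# Assumed constant scanning rate, in grid units per unit of time.
SCAN_RATE = 1.0


def scan_time(focus, target, rate=SCAN_RATE):
    """Scanning an array: time is proportional to the distance traversed."""
    return math.dist(MAP[focus], MAP[target]) / rate


def list_time(target, items):
    """Searching a list: time is proportional to list position, not distance."""
    return items.index(target) + 1


for item in ["well", "pond", "tree"]:
    print(item, round(scan_time("hut", item), 2))
# well 1.41
# pond 5.0
# tree 10.0

print(list_time("tree", ["tree", "well", "pond"]))  # 1
```

With the hypothetical list order `["tree", "well", "pond"]`, `list_time` finds the tree, the farthest item, first. A list yields distance-dependent times only if it happens to be ordered by proximity, which is the possibility the free-study design makes unlikely.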

In Kosslyn's Image and Mind (1980), scanning and other operations, such as panning and zooming, are defined as functions that could be performed by a digital computer. This use of a computer model illustrates why theories of mental imagery need not treat the mind as an immaterial substance or posit a homunculus to view the inner pictures. Forming and accessing images can be explained in terms of more basic operations, which can themselves be decomposed into elementary processes that a machine could perform.

The same is true of the mental sentences posited by descriptionalist theories of imagery, which constitute the opposing camp. The best known of these has been developed by Zenon Pylyshyn in Computation and Cognition (1984). His argument has two parts. First, he claims that the evidence shows imaging to be cognitively penetrable, that is, influenced by background knowledge and belief. Therefore, he argues, imaging cannot rest on perception-like processes of the sort that pictorialism requires: generic operations, such as scanning or rotating at a standard rate, that belong to a fixed functional architecture and are employed in the same way across imaging tasks. Second, he maintains that what is a vice for the pictorialist is a virtue on the descriptionalist account. The reaction-time data can be explained, he argues, precisely in terms of the effects of tacit knowledge in the face of experimental task demands. This tacit knowledge is expressed in language-like representations and operates through the production of the descriptions in which imaging consists.

For instance, when four-year-old children were shown a tilted beaker containing colored liquid and later asked to draw it, they typically drew the fluid level perpendicular to the sides of the beaker rather than parallel to the ground. The implication is that the children's memory images, on which the drawings are based, are not simply pictures that reproduce the perceived object; instead, the images reflect the fact that young children do not yet understand that the surface of a liquid stays level relative to the earth (the geocentric horizontal). Extending this analysis to rotation and scanning studies, Pylyshyn argues that the results can be explained in terms of task demands: Subjects are led to believe that, in visualizing objects, they are to replicate the process of perceiving them. Knowing, at least tacitly, that perceived object rotations obey a time-to-distance law, they reproduce the relevant reaction times, although not necessarily with conscious intent.

However, a number of objections have been made to these claims. First, it is sometimes argued that Pylyshyn's descriptionalist view makes images epiphenomenal, giving them no role in causal explanations of behavior. Of course, if such images can be identified with the underlying data structures that take a descriptive form, the charge is not strictly correct. Nonetheless, such a construal will not explain the phenomenal properties of conscious imagery, which are thus excluded from scientific accounts.

Second, not all reaction-time studies can be explained in terms of task demands, a point that Pylyshyn now concedes. Moreover, the fact that imaging is affected by background knowledge need not be taken to undercut explanations in terms of a basic set of perceptual operations. Kosslyn agrees that imaging is cognitively penetrable. He notes, for example, that the rate of scanning may vary across individuals or tasks. However, that does not mean that scanning cannot be defined in terms of standard operations (such as shifting attention incrementally) or that the use of those operations is not governed by law-like generalizations. Scanning might be one of a fixed set of operations available to everyone, even if it is not always used, and it can exhibit regularities despite the effects of knowledge and belief. For example, it can be assumed to occur at a constant rate within an individual subject on a given task.

Nonetheless, a positive account must be given of the knowledge effects that imaging does display, and this requires more than an appeal to perceptual similitude. Thus Kosslyn argues that imaging occurs in a visual buffer, a distinctively spatial medium analogous to an internal computer monitor. Although the representations on such a screen will be composed of distinct elements—such as cells in a matrix that can be labeled—the images are said to be pictorial, in the sense that spatial properties of objects are represented by the spatial properties of the medium.

Originally posited as part of Kosslyn's computational model, this visual buffer is identified in his Image and Brain (1994) with topographically organized areas of visual cortex. In topographic representations, the features of an object can be distorted. Nonetheless, spatially defined regions of the medium will correspond systematically to spatial regions of the object. Moreover, unlike descriptions, such images have the property that the farther apart two points appear to be on an object, the more representational elements there will be between the representations of those points. Although these elements need not be closely contiguous, they cannot be just anywhere: If two points appear to be adjacent in a represented object, then the elements that represent them must also be adjacent, at least in an extended sense. Several types of evidence from brain research can now be cited in support of the pictorialist view; for instance, certain lesions to the visual system cause subjects to be unaware of one side of the visual field in imaging, just as they do in perception.
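The property just described can be sketched in Python with a grid standing in for the visual buffer. This is a toy under stated assumptions, not Kosslyn's model: the function names are hypothetical, points on an object are represented by labeled cells, and "elements between two points" is measured as the cells along a shortest path in which each step may move to any of the eight neighboring cells.

```python
def make_buffer(width, height):
    """Hypothetical visual buffer: a grid of cells, each unlabeled (None) or labeled."""
    return [[None] * width for _ in range(height)]


def place(buffer, label, x, y):
    """Represent a point on an object by labeling the cell at column x, row y."""
    buffer[y][x] = label


def locate(buffer, label):
    """Return the (x, y) position of the cell carrying a label."""
    for y, row in enumerate(buffer):
        for x, cell in enumerate(row):
            if cell == label:
                return (x, y)
    raise KeyError(label)


def intervening_cells(buffer, a, b):
    """Count the cells strictly between two labeled points.

    A shortest path that may step to any of the 8 neighboring cells takes
    max(|dx|, |dy|) steps, so the count grows with represented distance;
    adjacent points have no cells between them.
    """
    (x0, y0), (x1, y1) = locate(buffer, a), locate(buffer, b)
    steps = max(abs(x1 - x0), abs(y1 - y0))
    return max(steps - 1, 0)


buf = make_buffer(10, 10)
place(buf, "nose", 2, 2)
place(buf, "eye", 3, 1)   # adjacent to the nose
place(buf, "tail", 9, 6)  # far from the nose
print(intervening_cells(buf, "nose", "eye"))   # 0
print(intervening_cells(buf, "nose", "tail"))  # 6
```

A descriptive representation of the same scene, such as a list of relations like `("left-of", "nose", "tail")`, has no analogue of these intervening cells: its size does not depend on how far apart the represented points are.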

One objection often made to pictorialism is that mental pictures lack the syntactic regularities that would allow them to express thoughts precisely. Sentences can be used to single out certain types of information while ignoring others, but pictures will inevitably represent features that are irrelevant to the task at hand. Thus Daniel Dennett (1981) has argued that imagining cannot be mental picturing, because the former can be more indeterminate than the latter. On the one hand, it is possible to imagine a striped tiger without envisioning it as having a definite number of stripes. On the other hand, it is impossible to depict a striped tiger without showing the number of stripes that it has. However, this line of argument commits what Ned Block (1983) has called the "photographic fallacy": It assumes that pictures cannot employ selective devices. In fact, Block argues, there are several ways in which pictures can omit details (e.g., by virtue of viewpoint, occlusion, atmospheric blurriness, or schematization). Moreover, the argument from indeterminacy can be turned around. In The Imagery Debate (1991), Michael Tye has argued that there are certain kinds of corollary or implicit information that both pictures and images inevitably carry. For instance, any perceptual or imagistic representation of two objects, A and B, will necessarily represent an apparent direction from one to the other. Descriptions of A and B need not contain information of that sort. Thus, Tye argues, images cannot be construed as descriptions alone.

One way to capture picture-like properties in descriptionalist terms, however, has been proposed by Geoffrey Hinton (1979). On his view, imagery does not occur in a special medium, a visual buffer of the sort that Kosslyn describes. Nor does it depend on the same format and processes as higher-order thought, that is, on descriptions in Pylyshyn's sense. Rather, it involves a distinctive format and set of operations, albeit defined over descriptions of a more elaborate kind. Egocentric coordinates for objects in a scene are attached to object-centered descriptions of their shapes; these coordinates add spatial information and allow for operations of a special sort (e.g., a gradual alteration of the coordinates, in terms of which rotation and scanning can be described). This account explains why subjects find it hard to identify figures embedded in complex geometrical shapes (e.g., a triangle in a Star of David) or to reinterpret ambiguous figures once an original interpretation has been made (e.g., to see the rabbit in the duck-rabbit image if it was initially seen as a duck). Interpretations not included in or derivable from the original descriptions will be hard to come by, and this may be particularly so if they require a revision of the coordinate reference frame. The problem is that Kosslyn's evidence shows that subjects are able to reinterpret images even when the new interpretation is incompatible with the original reference frame, which Hinton's account would not predict.
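The idea that rotation can be described as a gradual alteration of coordinates attached to an unchanging shape description can be sketched in Python. The class and function names, the 15-degree increment, and the use of a single orientation angle as the egocentric coordinate are hypothetical simplifications, not Hinton's formalism. Because only the attached angle changes, in fixed increments, the number of steps grows with the angle of rotation, which is one way a description-based account could reproduce a time-to-distance pattern.

```python
from dataclasses import dataclass, field


@dataclass
class ShapeDescription:
    """Hypothetical object-centered description: parts specified relative to the object."""
    name: str
    parts: list = field(default_factory=list)


@dataclass
class SceneEntry:
    """A shape description with egocentric coordinates attached."""
    shape: ShapeDescription
    x: float = 0.0
    y: float = 0.0
    angle: float = 0.0  # orientation relative to the viewer, in degrees


def rotate_gradually(entry, target_angle, step=15.0):
    """Rotate by altering only the attached angle, one small increment at a time.

    Returns the number of increments; the object-centered description is untouched.
    """
    increments = 0
    while entry.angle != target_angle:
        delta = target_angle - entry.angle
        if abs(delta) <= step:
            entry.angle = target_angle  # final (possibly partial) increment lands exactly
        else:
            entry.angle += step if delta > 0 else -step
        increments += 1
    return increments


letter = ShapeDescription("F", ["vertical stroke", "top bar", "middle bar"])
entry = SceneEntry(letter)
print(rotate_gradually(entry, 90.0))  # 6
print(rotate_gradually(entry, 0.0))   # 6 (back to the starting orientation)
print(entry.shape.parts)              # the parts list is unchanged
```

Nothing in the sketch allows a new interpretation of the shape: the parts list is fixed, which mirrors the account's prediction that interpretations not derivable from the original description are hard to come by.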

Tye has proposed a hybrid theory, according to which an image consists of an array combined with an interpretive description. The descriptive components are limited, consisting primarily of part descriptions that are produced whenever a part is scrutinized. Thus the array is not rendered irrelevant by a complex description that could simply take its place. However, it is unclear exactly how arrays and interpretations are combined on this account: why certain descriptions are generated for an array on certain tasks, and precisely how the properties of the array are used to perform the task. On the one hand, one may ask why basic part descriptions are not simply activated directly on a visual memory task, which would make the array unnecessary. On the other hand, if the array serves to support the discovery of previously unnoticed features, then there is no guarantee that ambiguities will not appear in descriptions of basic shapes themselves.

One promising avenue for research is suggested by Tye's argument that part descriptions are generated only as needed. That claim is consistent with Kosslyn's current view that imagery and perception are governed by a principle of opportunistic processing: Representational resources can be deployed in diverse and sometimes limited ways, as required by a task. In his 1994 book and subsequent research, Kosslyn argues that imaging is not a single capacity but comprises a set of subsystems distributed in the brain. Although these subsystems are functionally specialized, they can interact, and they can be employed strategically in various combinations. This approach implies that the interpretation of images in the visual buffer need not always consist in inferences over language-like representations. Instead, the assignment of content to an image is constrained by the particular operations and strategies in which aspects of the image are incorporated. For instance, image scanning consists in enhancing activity in various parts of the visual buffer, thus priming specific features and making them easier to encode. In that sense, scanning patterns constitute interpretations, because they bias the content that the visual system can ascribe to the image. This account has the potential to explain individual differences in image interpretation in terms of variations in perception-like strategies. Arising from Kosslyn's turn to neural networks and connectionist modeling, this emphasis on imaging strategies tracks the ongoing development of cognitive science and philosophy of mind.

See also Images.