Personality Measurement


I. Overview (Jack Block)

II. Personality Inventories (Wayne H. Holtzman)

III. The Minnesota Multiphasic Personality Inventory (W. Grant Dahlstrom)

IV. Situational Tests (Sebastiano Santostefano)



The field of personality measurement, viewed in the perspective of its short but busy history, has progressed in a fumbling, digressive manner. It may now, because of hard-won recognitions, be ready for significant and cumulative advances.

The problem and strategy

The reason for the disappointing disparity between the energies already expended and the accomplishments which may be certified lies in the uncertainty which has surrounded the concept of personality and the naïveté in psychology about the logic and justification of measurement.

In the physical sciences and in some subareas of psychology where intuitions are strong and widely held, one need not be acutely concerned with the distinction between a concept and the way in which that concept happens to be measured. By and large, measuring length by a ruler and weight by a scale does not generate controversy; the relation of these concepts to their respective methods of measurement seems obvious and beyond dispute. In the study of personality, however, there is no escaping an immediate, insistent, incessant preoccupation with the problem of linking concepts and empirical operations. Personality measurements, if they discriminate at all, always express, in a pure or impure form, explicitly or implicitly, a personality concept in terms of which the differential behavior may be understood. Personality concepts, if they are not hopelessly vague or inconsistent, always imply specifiable behavioral differences in people. These differences may be impractical to study or they may be judged as trivial, but the concept requires that they exist.

The distinction between, yet interdependence of, concept and measurement in the study of personality requires a spiraling interplay of these levels of analysis—concepts should suggest approaches to measurement, and measurement should refine conceptual formulations. This reciprocal improvement of theory and method has come to be called the process of “construct validation” (Cronbach & Meehl 1955). Unhappily, in the adolescent and polyglot field of personality, progress has been more often circular than helical. And yet, if a science of personality is to be formed, the responsibilities of coupling concept and measure must be met.

The domain of personality psychology is sprawling and fuzzily delineated. With so many definitions of the field and with a plethora of vaguely redundant but not readily integrated concepts, it is difficult to know where to try to begin measuring the nebulous notion of personality and toward what ends. [See Personality.]

In order not to be immobilized by the endless conceptual possibilities, two tactics have been adopted. The first of these requires the courage to be arbitrary: to propose, essentially by fiat, that a small set of concepts comprehends the important, necessary ways of differentiating people. These concepts need not constitute, and thus far have never constituted, a formal model of personality. Rather, they represent simply a set of dimensions or ideas in terms of which their expounder finds it convenient and congenial to conceptualize his view of “personality.” Selection of these concepts may be supported by observational, introspectional, clinical, test, or experimental data, of varying degrees of quality and persuasiveness, but often is not. These concepts serve, or should serve, a heuristic purpose; they provide a way to begin and, subsequently, a way of indicating and integrating diverse phenomena that may fall under the rubric of this conceptual assertion. Presumably, the usefulness of the concepts will be tested, and they will be reshaped under the impact of the empirical consequences they entail. The reader may wish to consider, as an instance of this general approach, the attempt of Lazarus (1966) to construct and delineate a concept of stress [see Stress].

The second tactic, now increasingly employed, eschews the seeming arrogance of arbitrary conceptualization. It begins with, and even encourages, diversity but rests its fate in analytical methods believed capable of identifying the lawfulness and conceptual structure of the personality phenomena which have been collected for study. Patternings that are found to be stable when these means are applied to variegated data are presumptive evidence of functional entities. These functional entities may be named and employed thereafter as integrative and consequential concepts. By and large, this approach has employed the method of factor analysis, with all of its strengths and weaknesses. A good illustration of the approach is the work of R. B. Cattell, recently presented in an integrating, summary volume (1965). [See Factor analysis and Traits.]

However an investigator chooses to proceed in articulating and delimiting the sense in which he will define “personality,” he will confront the dismaying problem of “measurement.” What is measured or recorded is behavior, and behavior has many determinants and many meanings. Identical behaviors may be the expression of fundamentally different mediating sequences or, conversely, different behaviors may be essentially equivalent in the functions they serve or the processes from which they eventuate. Behavior is susceptible to many transient and otherwise conceptually irrelevant conditions; it is susceptible to being organized in many ways and to many levels of interpretation. In all this flux, how are anchor points to be established?

By way of illustration, consider the concept of anxiety and its measurement. Perhaps all conceptualizations of personality leave room for or try to encompass the notion of anxiety; by focusing on the measurement of anxiety, most of the problems and logic of personality measurement may be illustrated.

The term “anxiety” takes on scientific meaning only by its empirical concomitants and consequences. But, for the present, the sense with which the term should be understood here is perhaps sufficiently conveyed by resort to the tautology of a dictionary definition (Webster’s New International Dictionary, second edition). Thus, anxiety means “painful uneasiness of mind respecting an impending or anticipated ill; a state of restlessness and agitation, with a distressing sense of oppression about the heart; expectancy of evil or danger without adequate ground … ; concern, dread, fear, foreboding, misgiving, worry, solicitude, uneasiness, apprehension.” [See Anxiety.]

Clearly, the notion of anxiety refers to subjective experience. And because anxiety is important regardless of whether it is theoretically conceptualized as a prime mover, as derivative, or as epiphenomenon, psychologists wish to measure it.

Four basic procedures

There are four classes of indicators that may be used: (1) a subject may be observed in his everyday life and actions (presumably without his actions’ being affected by these observations) and from his behaviors a judgment is made as to whether he is anxious; (2) a subject may be asked, by means of a questionnaire, to state directly (or in ways he may not completely understand) whether he is anxious; (3) a subject may be placed in a controlled or test situation designed to elicit special behaviors or products relevant to anxiety; (4) a subject’s physiological reactions may be assessed by various instruments, to determine whether he shows certain responses or changes presumed to be indicants of anxiety.

Observation


We all form impressions of other people in the course of observing them. If these impressions can be objectified and systematized, then there is no reason why observation cannot be employed as one kind of measurement of personality. Observation has an appeal because of its closeness to experience and because of its apparent simplicity. It has been criticized for the other side of these virtues: the data generated are subject to observer idiosyncrasies and often do not meet scientific standards of reproducibility. The method depends heavily on the quality of the observers, the range of contexts in which judges observe the subject, and the way in which the observations are codified so as to formulate a dimensional index. With close attention to these concerns this approach becomes cumbersome, but it still appears to be the method of choice when complex concepts are being considered or when a problem area is being approached for the first time. There are not many instances of appropriate usage of this approach. For consideration of the ways and means of employing it effectively, see, for example, Guilford (1954, chapter 11), Peak (1953), and Block (1961). [See Observation.]

When this method of measuring is used, anxiety might be indexed as some composite of the judgments of several psychologists or psychiatrists, each judge having observed the subject in a different but evocative situation and each judge referring the concept of anxiety to his professional (and personal) understanding of the term. Each judge having placed the subject at a (numerical) point along an anxiety continuum, some average value of these numbers can serve as a score or measurement of the subject with respect to anxiety level.
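The averaging step just described can be sketched in a few lines. This is only an illustration under stated assumptions: the 1-to-9 rating continuum, the four ratings, and the function name composite_rating are hypothetical, and the text leaves open which kind of average is used (an arithmetic mean is assumed here).

```python
from statistics import mean

def composite_rating(judge_ratings):
    """Average several judges' numerical placements of one subject."""
    if not judge_ratings:
        raise ValueError("at least one judge rating is required")
    return mean(judge_ratings)

# Hypothetical placements by four judges on a 1-to-9 anxiety continuum.
ratings_for_subject = [6, 7, 5, 6]
anxiety_score = composite_rating(ratings_for_subject)
print(anxiety_score)
```

Averaging over judges is meant to dampen the idiosyncrasies of any single observer; it does nothing to correct a bias the judges share.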

Questionnaires, or inventories

By far the most frequent technique invoked to measure personality has been the questionnaire. Typically, the questionnaire, or personality inventory, will contain a wide variety of questions on highly personal matters, which the subject is asked to answer. Such devices are extremely convenient for the investigator; explicit answers are provided by the subject regarding a host of matters, and these responses lend themselves immediately to a variety of analytical techniques. However, there are a number of hazards in the use of inventories, and research has for a long time been concerned with identifying and coping with these problems.

In the initial usages of personality inventories, it was presumed that the responses of subjects could be believed and that the interpretative or conceptual significance of a response was clearly apparent. This presumption is by no means wholly untrue, but its applicability is conditioned by the motivation of the respondent vis-à-vis the nature and context of the queries and by the complexity of the relations through which an inventory item relates to a personality dimension.

People are astonishingly open and cooperative when approached in an interested, supportive, and nondemeaning way. Too often, however, psychologists have impersonally handed out highly personal questionnaires for reasons not appreciated or believed by the recipients. It is not surprising, under such circumstances, that deliberate innocuity of response has been and often still is the order of the day. Indeed, failure to cover up and appear blandly conventional in such settings appears to be a sign of psychopathology. But in the context of seeking psychiatric help, it is far less likely that someone will dissimulate in his responses to an inventory. In any event, many questionnaires incorporate items which, separately or in conjunction, provide an indication of the frankness of the respondent and hence of the validity of his responses. There have been increasingly successful attempts to construct questions or combinations of questions the full implications of which almost no respondent could be expected to understand.

A more interesting problem is the establishment of the psychological import of questionnaire responses. The statements to which a subject is asked to respond may or may not be ambiguous. However, the implications of a single response, offered without a context, almost invariably are obscure. Just what is meant by an affirmative answer to the statement, “I know who is responsible for my troubles”? Is the respondent acknowledging that he is his own worst enemy or is he projecting an accusation of persecution upon someone else? The item “I like tall women” is a consistent empirical indicator of impulsivity in American males, but why? Seemingly equivocal items and items only deviously related to personality dimensions abound in contemporary inventories; and indeed, herein lies their strength, since even a sophisticated subject finds it difficult to anticipate or control the dimensional implications of his response.

It was the great conceptual contribution of the Minnesota Multiphasic Personality Inventory, still the dominant questionnaire used in the United States, to turn away explicitly from the apparent cogency of items and turn instead toward a requirement that items (and the scales subsequently developed from these items) discriminate between behaviorally or socially different criterion groups. This attitude toward test construction is not so blandly empirical as has been believed; there are still conjectures regarding fruitful questionnaire statements, and there is recognition that much can be learned from thoughtful reflection on the processes that might underlie the emergence of an unanticipated discriminator. But first there is the insistence that beyond question the items separate groups. With this approach, well applied, the problem of dissimulation largely vanishes, since respondents cannot foretell the implications of their answers. To date, because of the formidable labors involved in developing a widely ranging inventory, applying it to large and variegated samples, and then exploring for discriminating items, the full potential of this approach has not been realized.

In particular it has often been the case that criterion groups are selected on the basis of poor or uninteresting or confused criteria. Thus, subsequent empirical analyses can provide only weak or trivial or ambiguous response to the particular research inquiry. There has been little systematic consideration given to the problem of which criterion groups might, should, or must be discriminated. However, with the advent of the computer, which can perform the vast clerical labors of comparison, it may be expected that the multiple contrasts required to build more than a narrowly specified validity will be forthcoming. For a discussion of some of these problems, see Block (1965) and Campbell (1957).

With regard to the concept of anxiety, it would be necessary first to contrast the inventory responses of a group of individuals considered by experts to be anxious with responses provided by a comparable group of nonanxious individuals. Those statements affirmed or denied with differential frequency by the members of the anxious group and corroborated on other samples would constitute an anxiety scale. Once refined and established as a valid discriminator, this scale could be employed to identify individuals responding to items in a way characteristic of the anxious individuals initially studied. Although the significance of a subject’s response to any one item relating to anxiety can readily be questioned, it is reasoned that the closer the correspondence between an individual’s response pattern and the response pattern typifying the anxious group, the more likely it is that he is anxious.
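The logic of contrasting criterion groups may be made concrete with a small sketch. It is an illustration only: the response data are invented, the function names and the 0.30 threshold on the difference in endorsement proportions are hypothetical, and the corroboration on other samples that the procedure requires is omitted.

```python
def build_scale(anxious, controls, min_diff=0.30):
    """Select items whose endorsement rates separate the two groups.

    Each group is a list of response vectors (True = statement affirmed).
    Returns {item_index: response typical of the anxious group}.
    min_diff is a hypothetical cutoff, not a value from the literature.
    """
    key = {}
    for i in range(len(anxious[0])):
        p_anx = sum(r[i] for r in anxious) / len(anxious)
        p_ctl = sum(r[i] for r in controls) / len(controls)
        if abs(p_anx - p_ctl) >= min_diff:
            key[i] = p_anx > p_ctl
    return key

def scale_score(responses, key):
    """Count responses given in the direction typical of the anxious group."""
    return sum(1 for i, keyed in key.items() if responses[i] == keyed)

# Invented responses of four anxious and four nonanxious subjects to four items.
anxious = [[True, True, False, True], [True, False, False, True],
           [True, True, False, False], [True, True, True, True]]
controls = [[False, True, True, True], [False, False, True, True],
            [True, True, True, False], [False, True, True, True]]

key = build_scale(anxious, controls)
print(key)                                          # items kept, with keyed direction
print(scale_score([True, False, False, True], key))
```

Note that an item is retained solely because it discriminates; nothing in the selection step asks whether its content appears related to anxiety.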

Interviewing. It may be noted that the frequently used technique of the interview falls between the two categories of personality measurement so far discussed. If the interview employs standardized questions, systematically asked of the subject, then the interview may be regarded as a verbally administered questionnaire. If the interview is unstructured and simply provides the interviewer with an opportunity to observe subjects in a social (and tense) situation so that the interviewer can formulate some impressions, then the interview falls into the classification of observations as a basis for personality measurement. [See Interviewing, article on personality appraisal.]

Test situations

Observations are heavily and cumbersomely dependent on the quality of the judges used, their number, and the significance of the subject’s behaviors sampled; questionnaires are convenient enough but often appear too distant from actual behaviors to earn a trust in their relevance. Because of the deficiencies or constraints in these approaches, psychologists often turn to specially devised test situations as a basis for eliciting behaviors to serve as indicants of personality variables. This orientation has special promise, as yet only partially achieved because the special premises on which it depends are not sufficiently understood.

Test situations are constructed by the researcher to present to the participating subject a psychological context or influence or choice formulated in a way that makes it relevant to the personality variable under study. The response of the subject to the environmental demand upon him can, provided the subject appropriately interprets the test situation, serve as an indicator of the selected personality characteristic the test situation was designed to elicit. The crucial requirement of this approach, upon which all of its usefulness depends, is signified by the proviso in the preceding sentence, “provided the subject appropriately interprets the test situation”: if his response is to have the desired interpretive significance and if the responses of different subjects are to be considered as comparable, the subject must apprehend the test situation in the way the experimenter prefers.

More often than psychologists have wished to acknowledge, subjects react to experimental situations in ways that foil interpretation or that are idiosyncratic. The usefulness of the test situation may be voided in two ways. First, unrealized by the experimenter, different subjects may enter the situation with different attitudes or sets or expectations, which because of their heterogeneity cause behavior within the test situation to have equivocal implications. Thus, an experimental situation concerned with competition and cooperation, employing actual money as a reward, might fail to recognize the different premises with which, say, a desperately poor subject, a wealthy subject, and a devout Quaker might conceive of their participation in the experiment. Similar responses could well have different bases; different responses could be open to multiple interpretations. Of course, these different premises can themselves become the focus of study, but if they go unrecognized and if subjects are unwarrantedly presumed to be homogeneous, then the results derived from the experiment can have little significance.

Second, also without realizing it, the experimenter may have in fact defined the test situation in ways different from those he had intended, so that the behavior of subjects, although they may be homogeneous enough, responds to quite another (and unrecognized) issue or influence than the experimenter has conceptualized. Thus, an experimenter may seek to convey in his test situation the false bit of information that the subject’s performance on some task is unusually poor. The experimenter’s goal in communicating this “fact” may be to motivate the subject intensely. However, the subject’s own view of his performance and general competence, amplified perhaps by ineffective acting by the experimenter, may cause him (and all other subjects) to disbelieve the experimenter and to behave on quite different grounds. For reasons of tact and tacitness the experimenter perhaps may not be confronted with the knowledge of this state of affairs, and he may therefore presume that the subjects responded in the way his situation directed them. Failure to check appropriately on the congruence between experimenters’ and subjects’ conceptions of the test situation is a frequent error.

To exploit situational tests maximally for the purposes of measuring personality, a firmer recognition is required of the interpersonal nature of psychological experimentation and of the manifold structurings of a test situation which can be developed by the facile minds of human beings. An experimenter’s presumption about the processes operating within his experimental situation is insufficient alone—it must be supported by other views congruent with it. A consensus derived from the judgments of a group of psychologists is helpful, but the most supportable basis for evaluating a test situation is the consensus perception developed by actual subjects or individuals equivalent to the subjects being studied. Chein (1954), Gibson (1960), and Jessor (1956), each from a different viewpoint, have written thoughtfully on the matters involved in environmental specification.

Further, the presence and the potency of the experimenter’s effects upon his results must be assessed if generalizations beyond the immediately obtained results are to be made. In direct and indirect ways experimenters or examiners influence the behaviors of the subjects they encounter—a friendly, warm experimenter will elicit higher levels of aspiration from subjects than will a stern, aloof experimenter; subjects are often exquisitely sensitive to (and cooperative with) the hypotheses of an experimenter. The review by Kintz and his associates (1965) summarizes to date much of what has been discovered regarding unacknowledged and unrecognized experimenter influences on subject performance.

Yet, if experimenters prove to be essentially interchangeable and if the psychological definition of the test situation is personally involving and essentially the same for all subjects and if, therefore, a subject is faced with the necessity of a behavioral choice, his decision must be formed on the basis of intrapsychic considerations. If the environmental directives are not overwhelming, and thereby compelling of a uniformity of response, then differential behavior will result, and this differential behavior will reflect differences in personality. The trick is to be incisive in the behaviors studied, so that the manifestations of personality thus developed are nontrivial.

To exemplify a test situation relevant to the measuring of anxiety, consider a rote-learning task involving occasional unforewarned, extremely painful electric shocks. Some subjects will learn the designated material quickly, and others will learn slowly. Provided there is convincing information, obtained separately, that the subjects are of equivalent intelligence and background vis-à-vis the learning task and that they have equal thresholds of pain sensitivity, similar reactions to pain, etc., the hypothesis may be advanced that differential behavior in the learning task is attributable to differences in anxiety level. To help support this interpretation, our research design might call for the same subjects to experience an equivalent learning task without the threat of shock and its anxiety-evoking potential. In this second context the subjects should, according to theory, certainly manifest a much smaller range of efficiency in the learning situation and, again according to the concept of anxiety held, perhaps an effectiveness that on the average is superior to the average they achieved when threatened by shock. When an individual’s performance under both conditions is compared, it becomes possible to infer his level of anxiety.
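One simple way to carry out the comparison of the two conditions is a difference score. The sketch below assumes, purely for illustration, that performance is recorded as trials needed to reach a learning criterion and that the extra trials required under threat of shock index anxiety; the subjects, the numbers, and the name shock_decrement are hypothetical.

```python
def shock_decrement(trials_under_shock, trials_without_shock):
    """Extra trials to criterion when the task carries the threat of shock."""
    return trials_under_shock - trials_without_shock

# Invented trials-to-criterion: (with threat of shock, without threat of shock).
subjects = {"A": (14, 10), "B": (25, 11), "C": (12, 12)}

decrements = {name: shock_decrement(*trials) for name, trials in subjects.items()}
ranked = sorted(decrements, key=decrements.get, reverse=True)
print(decrements)
print(ranked)  # subjects ordered from largest to smallest decrement
```

The inference holds only under the provisos stated above: equivalent intelligence, background, pain thresholds, and reactions to pain across subjects.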

Projective methods. Projective methods, such as the Rorschach ink blots or the Thematic Apperception Test, may be viewed either as test situations or as providing a basis in observation for personality formulations by judges. If responses are scored and these scores are taken as indicants of personality, then the projective technique is being employed as a test situation. If projective stimuli are conceived of simply as the occasion for a complex, semistandardized interview of a kind, eliciting many behaviors, from which an examiner, by virtue of his background and experience can, without scoring responses, evaluate the responding subject, then the projective method serves to provide behavioral grist for the integrative mill of the observer. [See Projective methods.]

Physiological responses

A great deficiency in the three approaches to personality measurement just described is that the behavior of the subject is to a greater or lesser extent under his volitional control—the subject can, more than a little, decide how he will respond when under observation or when faced with an inventory or when functioning in a test situation. He may be caught unawares, and he may not know how to direct his behavior, but he can, often appreciably, shape his manifest behavior. Since so many dimensions of personality relate to behaviors regarding which a subject may be motivated to be—deliberately or without awareness—misleading, it is important to escape this limitation.

The autonomic nervous system, however, is not, except in a most unusual and sophisticated individual, under the control of intentions. An increase in heart rate, a hypermotility of the stomach, a rise in circulating epinephrine, cannot in general be volitionally prevented. Accordingly, the use of measures of such functions is attractive to psychologists.

The psychophysiological method has enjoyed cyclic popularity in personality psychology. Despite the appreciable investments of time and thought put into the formidable problems of instrumentation, the contributions of psychophysiology to personality measurement have thus far been few and often without reliable usefulness. The reasons for this are several.

One problem is that of recording in sufficient detail and without artifact or the intrusion of irrelevancies the physiological phenomena going on underneath the subject’s skin. Although such technical matters have inhibited growth of the field, recently great strides have been made and impressive sophistication has developed with regard to transducing and recording various physiological changes in the individual. While heretofore all psychophysiological measurements required carefully prepared rooms and carefully wired subjects, now the time is near when many of these measures will be telemetrically recorded from unhampered subjects carrying miniature transmitters as they go about their everyday lives. It may be expected that more representative and hence more cogent samplings of psychophysiological changes will be gathered as these technical advances reach maturity and acceptance.

However, even with this kind of technological progress, the psychophysiological field (and the possibilities of the field) cannot be advanced unless there is further insight regarding such matters as what physiological channels to record and what facets of the multifaceted physiological process to fix upon as informative and dependable. A second problem of the psychophysiological approach, then, is attributable to the slowness with which truly incisive measures are being found. Quite properly there has been a reluctance to standardize any single set of physiological indexes, because of insufficient orderliness of the measures thus far studied and dissatisfaction regarding their ultimate behavioral relevance. Different investigators continue to employ different measures, and there have been few multivariate studies oriented toward establishing a reduced number of dependable measures. Much cumbersome, trying, and undramatic work must still be done before the necessary systemization and convergence on fundamental measures is accomplished.

A third problem affecting the psychophysiological field stems, not from technical lags or deficiencies of discovery, but rather from an attitude underlying much of the research being done. The orientation of many investigators who have chosen to work in the psychophysiological realm is more toward the physiological than the psychological. Physiological reactions or changes, although occurring within the individual’s life-span, have often been looked upon simply as end products of physical stimuli or as cause and effect of other physiological changes. This kind of focus has its merits and its successes. A more psychological approach, however, would use psychophysiological reactions, given meaning by the context of their occurrence, as signs or tokens of the existence of a subject’s motivational state or characterological tendencies. Polygraph recordings of sexual arousal and of sympathy may well look alike; it is the environmental context in which the recordings were made that defines the significance of the interior changes that have been identified. The psychophysiological approach perhaps may realize its potential contribution to personality measurement when it is conjoined with a psychology of context. [See Learning, article on neurophysiological aspects.]

In applying the psychophysiological method to the measuring of anxiety, we might choose to place the subject in a stressful situation and then measure the level and changes in the electrical conductance of his skin. Despite various problems of artifact and technique, changes in skin conductivity appear to mirror changes in the emotionality of the subject. As he relaxes, his skin conductance drops; as he becomes tense or is placed under explicit stress, his level of skin conductance rises, usually by a series of rapid, superimposed changes. We may judge a subject as anxious if his conductance level has risen greatly from its normal or basal conductance level. Alternatively, we may measure anxiety as a function of the time required for the subject’s conductance level to revert from its peak to its basal level. Or we may define anxiety in terms of the number of abrupt conductance changes (galvanic skin responses) a subject emits in a unit of time during which there is no clear external or situational stimulus. In this last definition of an anxiety measure, it is presumed that “nonspecific” or nonattributable galvanic skin responses are tripped by internal, private, even unconscious anxiety stimuli. Additional conductance measures of anxiety of course can be proposed, but it is not yet clear which, if any, may be used with assurance.
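The three conductance-based definitions just given can be written out as small functions. This is a sketch under assumptions: the traces are invented and evenly sampled, and the tolerance for "return to basal level" and the minimum jump that counts as a galvanic skin response are hypothetical parameters, not established values.

```python
def conductance_rise(samples, basal):
    """Peak conductance minus the basal level."""
    return max(samples) - basal

def recovery_time(samples, basal, dt, tolerance=0.05):
    """Seconds from the peak until conductance is back near the basal level.

    dt is the sampling interval in seconds. Returns None if the trace
    ends before the level comes within `tolerance` of basal.
    """
    peak_idx = samples.index(max(samples))
    for i in range(peak_idx, len(samples)):
        if samples[i] <= basal + tolerance:
            return (i - peak_idx) * dt
    return None

def count_responses(samples, min_jump=0.5):
    """Count onsets of abrupt rises (sample-to-sample increase >= min_jump)."""
    count = 0
    rising = False
    for previous, current in zip(samples, samples[1:]):
        jump = (current - previous) >= min_jump
        if jump and not rising:
            count += 1
        rising = jump
    return count

# Invented traces sampled once per second, in arbitrary conductance units.
stress_trace = [2.0, 2.0, 3.0, 5.0, 4.0, 3.0, 2.0, 2.0]
resting_trace = [2.0, 2.0, 3.0, 2.5, 2.0, 2.0, 3.0, 3.5, 2.5, 2.0]

print(conductance_rise(stress_trace, basal=2.0))
print(recovery_time(stress_trace, basal=2.0, dt=1.0))
print(count_responses(resting_trace))
```

The three indexes need not rank subjects in the same order, which is one reason it remains unclear which of them may be used with assurance.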

States and traits

Concern with physiological indexes, such as conductance, brings forward with particular clarity a distinction we have thus far neglected but which in fact exists, however we propose to measure personality, and which confounds many interpretations of personality-relevant behavior. This is the “state-trait” separation, a conceptual attempt to distinguish between relatively transient, actually existing internal conditions that an individual experiences (states) and the general tendency of an individual to find himself enveloped by certain classes of circumstances (traits). Are these conductance rises or accommodation differences reflective of subject differences in anxiety level, or do they reflect instead subject differences in susceptibility to stress? Furthermore, when we conceptualize anxiety in general, are we talking of a particular, existential condition at the moment affecting a given subject (i.e., the subject’s state) or are we out to measure the likelihood that the subject will experience often the phenomenal state of anxiety, i.e., that the subject has the trait, however achieved, of being anxiety-prone? What we measure and how it will be measured depend upon the conceptual position explicitly reached in regard to the state-trait distinction. Much unrewarding research and discussion in the psychological literature has been due to a failure to make this separation or to accept its consequences. In particular, state notions have been measured in ways appropriate only to traits, without recognition of the error: the correlates of anxiety proneness have been presumed to be the correlates of anxiety states (e.g., Taylor 1956). As has been shown by Mandler (1959) and others, there is no logical necessity for this presumption to be true and there are abundant empirical instances of its falsity.

Short of special subject selection, prolonged measurement, and perhaps even longitudinal study, the empirical separation of the concepts of state and trait is a difficult task indeed. The recent review by Martin (1961) points up well both the difficulties and the attractiveness of the physiological approach to personality measurement.

The necessity of convergence

Having assessed a personality concept—in the present instance, anxiety—in multiple ways, it is necessary to evaluate the extent to which these operationally diverse but conceptually equivalent indexes are related to each other. It is embarrassing if the several measures of a concept do not constitute a congruent ensemble. It is also bothersome when a measure proves to relate too closely to an index or behavior conceptually irrelevant to the dimension of immediate interest. Thus, a small or moderate correlation between a measure of anxiety and scores on an established intelligence test is conceptually supportable from a variety of theoretical viewpoints. But an extremely high connection between anxiety and intelligence might cause us to question whether our proposed anxiety index is not, instead, simply a camouflaged measure of intellect.

By and large, personality measures to date have not performed well when evaluated against the criteria of convergence and discrimination. Campbell and Fiske (1959), after an elegant presentation of the logic which should underlie the development of a personality measure, evaluate a variety of commonly used indexes. They find that rather few of the measures psychologists employ meet the criteria that would justify the interpretive labels these measures have been awarded. Their analysis and survey is a significant one, which those interested in personality measurement will wish to read.

Humphreys (1960) and Cattell (1961) have offered some useful psychometric tactics for dealing with the problem of excessively low correlations of a measure with conceptually similar measures and excessively high correlations of a measure with conceptually dissimilar indexes. As these and related suggestions gain currency and take hold, we may expect that personality measures will develop a substantiality they do not yet enjoy. The general principle for progress in personality measurement appears to involve aggregating a number of conceptually equivalent but operationally diverse measures and, through the use of standard scoring and factor analysis, constructing an average or expression of the dimension common to the phenotypically heterogeneous indexes being brought together.
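The aggregation principle just described can be illustrated with a minimal sketch. It covers only the standard-scoring step and uses a simple unweighted average in place of factor analysis; the function names and the three sets of anxiety scores are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Convert raw scores to standard (z) scores: mean 0, SD 1 across subjects."""
    m = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("a measure with no variance cannot be standardized")
    return [(x - m) / sd for x in scores]

def composite(measures):
    """Average the standard scores of several measures, subject by subject."""
    z = [standardize(scores) for scores in measures.values()]
    return [mean(subject) for subject in zip(*z)]

# Hypothetical data: three operationally diverse anxiety indexes for five
# subjects, each recorded on its own arbitrary scale.
anxiety_measures = {
    "questionnaire": [12, 30, 21, 8, 25],
    "observer_rating": [2, 5, 4, 1, 4],
    "skin_conductance": [3.1, 6.0, 4.2, 2.5, 5.8],
}
anxiety_composite = composite(anxiety_measures)
```

Because each index is first placed on a common scale, no single measure dominates the composite merely because its raw units are larger. A factor-analytic composite would instead weight each index by its loading on the common dimension.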

In sum, the goal of a scientific understanding of personality and its development requires recognition of the inevitable and conducive role of personality measurement. There has been extensive and increasingly sophisticated research oriented toward the development of conceptually univocal and psychometrically sound measures of personality. The basic approaches to personality measurement include the use of observation, questionnaires, specially constructed test or social situations, and physiological indicators. A significant continuing problem in the field is the poor agreement between ostensibly equivalent measures. This problem is being solved by greater care and refinement in the construction and identification of personality measures.

Jack Block

[See also Psychometrics.]


Block, Jack 1961 The Q-sort Method in Personality Assessment and Psychiatric Research. Springfield, Ill.: Thomas.

Block, Jack 1965 The Challenge of Response Sets. New York: Appleton.

Campbell, Donald T. 1957 Factors Relevant to the Validity of Experiments in Social Settings. Psychological Bulletin 54:297–312.

Campbell, Donald T.; and Fiske, Donald W. 1959 Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix. Psychological Bulletin 56:81–105.

Cattell, Raymond B. 1961 Theory of Situational, Instrument, Second Order, and Refraction Factors in Personality Structure Research. Psychological Bulletin 58:160–174.

Cattell, Raymond B. 1965 The Scientific Analysis of Personality. Baltimore: Penguin.

Chein, Isidor 1954 The Environment as a Determinant of Behavior. Journal of Social Psychology 39:115–127.

Cronbach, Lee J.; and Meehl, P. E. (1955) 1956 Construct Validity in Psychological Tests. Pages 174–204 in Herbert Feigl and Michael Scriven (editors), The Foundations of Science and the Concepts of Psychology and Psychoanalysis. Minneapolis: Univ. of Minnesota Press. → First published in Psychological Bulletin.

Gibson, James J. 1960 The Concept of the Stimulus in Psychology. American Psychologist 15:694–703.

Guilford, Joy P. 1954 Psychometric Methods. 2d ed. New York: McGraw-Hill. → First published in 1936.

Humphreys, Lloyd G. 1960 Note on the Multitrait-Multimethod Matrix. Psychological Bulletin 57:86–88.

Jessor, Richard 1956 Phenomenological Personality Theories and the Data Language of Psychology. Psychological Review 63:173–180.

Kintz, B. L. et al. 1965 The Experimenter Effect. Psychological Bulletin 63:223–232.

Lazarus, R. S. 1966 Psychological Stress and the Coping Process. New York: McGraw-Hill.

Mandler, George 1959 Stimulus Variables and Subject Variables: A Caution. Psychological Review 66:145–149.

Martin, Barclay 1961 The Assessment of Anxiety by Physiological Behavioral Measures. Psychological Bulletin 58:234–255.

Peak, Helen 1953 Problems of Objective Observation. Pages 243–299 in Leon Festinger and Daniel Katz (editors), Research Methods in the Behavioral Sciences. New York: Dryden.

Taylor, Janet A. (1956) 1963 Drive Theory and Manifest Anxiety. Pages 205–222 in Martha T. Mednick and Sarnoff A. Mednick (editors), Research in Personality. New York: Holt. → First published in Psychological Bulletin.


One of the most popular ways of studying personality is to construct a questionnaire, or inventory, containing short statements about a person’s behaviors and feelings that he can check as true or false. This deceptively simple idea of a self-report inventory for personality assessment was first employed on a large scale in World War I, for initial screening of army recruits. Containing items similar to questions a psychiatrist would ask, the Woodworth Personal Data Sheet served as the prototype for numerous personality inventories which have flourished since. [See Woodworth.]

The modern self-report paper-and-pencil personality inventory bears only a slight resemblance to the straightforward, simple approach of Woodworth. Numerous, well-developed scales may be embedded in a single instrument, including some especially designed to detect the individual who is faking or who is failing to read the items carefully. Objectively scored answer sheets, standardized scales with normative reference groups, and actuarial predictions from statistical tables are now commonplace. And yet, most critics would agree that even the more widely used personality inventories today are still limited in validity and general application.

The self-inventory approach to personality is appealing on several grounds. Most individuals are more candid and objective in their self-appraisal when responding to statements in an impersonal, printed inventory than they would be in a direct interview or written autobiography. The method is objective, repeatable, economical, and efficient. The results are amenable to examination by multivariate statistical methods and psychometric theory and to rapid analysis by computers. [See Factor analysis and Psychometrics.]

Major approaches

Three major approaches have been employed in the development of personality inventories. The most obvious starts with a definition of the trait being measured, such as extroversion–introversion or neuroticism. A large number of statements are written and gradually refined until the manifest content “looks right,” on the face of it, with respect to the trait in question. Woodworth’s Personal Data Sheet and many of the personal adjustment inventories that have followed it, such as the Bell Adjustment Inventory, the Bernreuter Personality Inventory, and the Cornell Medical Index, have relied chiefly upon face validity for evidence of their value. Unfortunately, the use of these a priori, direct methods has been almost uniformly disappointing. Such information as they provide is superficial and misleading at best.

The second approach, involving strictly empirical methods, arose partly in reaction to the failure of the earlier “common-sense” approach. Starting with a large pool of a few hundred statements, culled from a variety of sources, experimental forms of the inventory are given to samples of individuals drawn from known criterion groups. Those items which discriminate the criterion groups are placed in a provisional scale, while the remainder are discarded or employed for other purposes. The provisional scales are cross-validated by administering a refined version of the inventory to fresh criterion samples. The most widely used instrument of this type is the Minnesota Multiphasic Personality Inventory (MMPI). Ten clinical and four validity scales were originally developed for the MMPI, by using criterion groups drawn from clinical populations. The MMPI pool of 550 items has proved convenient for the construction of many other empirical scales employing similar methods. The possible number of such empirical scales in a large pool of items is almost infinite, being bound only by the number of different specific criterion groups to be discriminated.
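The item-selection step of this empirical approach can be sketched as follows. This is an illustration under simplifying assumptions, not the procedure used for the MMPI: the difference threshold `min_diff`, the function names, and the tiny response sets are all hypothetical, and an actual study would apply a significance test and then cross-validate the provisional scale on fresh criterion samples.

```python
def endorsement_rate(responses, item):
    """Proportion of respondents who answered True to one item."""
    return sum(r[item] for r in responses) / len(responses)

def select_items(criterion, comparison, min_diff=0.25):
    """Keep items whose endorsement rates differ between the two groups.

    Returns {item_index: keyed_answer}, where the keyed answer is the one
    given more often by the criterion group.
    """
    key = {}
    for item in range(len(criterion[0])):
        diff = endorsement_rate(criterion, item) - endorsement_rate(comparison, item)
        if abs(diff) >= min_diff:
            key[item] = diff > 0
    return key

def scale_score(response, key):
    """Count answers given in the keyed direction."""
    return sum(response[item] == keyed for item, keyed in key.items())

# Hypothetical true-false answers to four items (rows are respondents).
criterion_group = [
    [True, False, True, True],
    [True, False, False, True],
    [True, True, True, False],
    [False, False, True, True],
]
comparison_group = [
    [False, True, True, False],
    [False, True, False, True],
    [True, True, True, True],
    [False, False, True, True],
]
provisional_key = select_items(criterion_group, comparison_group)
```

Here items 0 and 1 separate the groups and enter the provisional scale, while items 2 and 3, endorsed equally often by both groups, are discarded regardless of how relevant their content may look.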

A third major approach to the problem starts with a large pool of items, as in the MMPI, but sets as a goal the development of homogeneous scales that are logically independent and internally consistent. Ideally, one would first like to know the major dimensions underlying variations in response to every conceivable statement that can be imagined about an individual’s personality. Then scales could be constructed to represent these dimensions by appropriate combinations of items. The dimensions of personality emerge only after a long process of internal item analysis, scale construction and revision, and the use of factor analysis to clarify the existing structure. The Guilford–Zimmerman Temperament Survey (Guilford & Zimmerman 1955) and Cattell’s Sixteen Personality Factor Questionnaire (16 PF) (Cattell et al. 1957) are two well-known examples of personality inventories constructed by this method.

The Guilford–Zimmerman Temperament Survey is appropriate for adolescents and adults. The inventory grew out of work by Guilford and his colleagues (Guilford & Zimmerman 1956), in which 13 primary dimensions of personality were found by factor analysis of 69 item clusters, where three or more clusters per factor had been hypothesized in advance. With some exceptions, intercorrelations of the 13 factor scores are generally close to zero. Ten of these dimensions are included in the temperament survey.

Cattell’s 16 PF was the result of an ambitious plan to sample the entire “personality sphere” (Cattell 1946) more completely than ever before. A total of 171 personality variables was compiled, and large numbers of items were written. Experimental test forms were given to college students, and a series of factor analyses was carried out, extracting more factors than most other investigators would judge to be meaningful and using oblique rather than orthogonal rotations. The 16 factors believed by Cattell to be most significant form the basis for the 16 PF. While the major scales represented in the questionnaire closely resemble dimensions found by other investigators using similar approaches, the short scales have very low reliability and, if they are to be used at all, are recommended only for research purposes. The overlap of the scales permits the extraction of second-order factors (Cattell 1956).

Cattell and his associates have also tried to extend the use of paper-and-pencil personality inventories down the age scale to six-year-olds (Cattell & Coan 1958). For the most part, however, these attempts have not been very successful below young adolescence.

Still a different point of view among factor analysts working with personality inventories is that of Eysenck (1953). Where Cattell and Guilford postulate many primary dimensions, Eysenck insists upon a higher level of abstraction, with only three basic dimensions—neuroticism, introversion-extroversion, and psychotic behavior. The first two of these are represented in the Maudsley Personality Inventory, developed by Eysenck and used widely in England.

Disagreements as to the number of personality dimensions which can be measured reliably by questionnaire methods arise largely because of disputes over the best way to discover and define such dimensions. Employing different methods of factor analysis, Eysenck conservatively sticks to only very general factors. Moreover, he fails to sample systematically throughout the universe of all personality statements that can possibly be put in questionnaire form. In reviewing this dilemma Coan (1964) has pointed out that behavioral organization or personality structure can be conceptualized on four different levels—specific response, habitual response, trait, and type. The selection of subjects, number of factors extracted, kind of rotation employed, and density of personality-variable sampling affect the degree of referent generality or level within the hierarchy at which inferences about dimensions are made. When viewed in this manner, Eysenck, preferring types, has a much higher referent generality than have Cattell and Guilford, who focus on traits and even habitual responses. It is interesting to note that most second-order factor analyses of primary dimensions in personality inventories yield general factors similar to the three proposed by Eysenck, pointing to a basis for resolving the dispute.

The California Psychological Inventory (CPI), developed by Gough (1957), is a hybrid of the strict empirical approach, exemplified by the MMPI, and the dimensional approach of Guilford and Cattell. Instead of using factor analysis as a basis for defining dimensions, Gough emphasizes concepts drawn from the folk culture or common language as criteria for scale construction. The CPI covers 15 traits, plus three additional scales for detecting invalid or deviant records. The scales were developed by item analysis in terms of behavioral criteria (usually ratings), without regard for their internal homogeneity. However, Gough did take into account dimensional studies of normal personality before selecting the behavioral ratings to be predicted from questionnaire scales.

If major dimensions in personality inventories are really stable, they should appear consistently in different inventories, provided the same aspects of the personality sphere are sampled. A number of studies have been conducted comparing two or more personality questionnaires given to the same subjects. Typical of such studies are the one by Mitchell (1963) comparing the CPI with the 16 PF and the one by LaForge (1962) comparing the MMPI with the 16 PF. Although Mitchell found little relationship between the primary factors in the CPI and the 16 PF, the congruence of second-order factors in the two inventories was quite striking. Six second-order factors were found among the 34 scales, the largest two of which were easily identified as neuroticism and extroversion-introversion, verifying Eysenck’s contention that these two general factors run through most personality questionnaires and are of greater importance than the lower-order traits. LaForge also found neuroticism to be the first factor running through both the 16 PF and the MMPI, although his other factors are complex. Because the MMPI comprises clinically derived scales of an overlapping, empirical nature, dimensional analysis using factorial methods is not very satisfactory unless one works directly with the individual item responses rather than the derived scales.

Spurious factors in test taking

A major problem in personality inventories is the control of response set, stylistic variables, and spurious factors that may be confounded with the obtained scores on personality scales. Self-report tests are subject to falsification by the individual who wishes to make a good impression. Quite aside from any intentional faking, people do not know themselves well enough to give factual answers to most personality or attitude questions. Since some personality inventories allow the individual a range of responses, some subjects give many extreme answers, while others are characteristically cautious. Some will tend to agree with a statement or check “true” rather than “false,” regardless of the item content. While these confounding variables have been recognized for many years, only recently has a great deal of interest been generated in their psychological meaning and in methods for their measurement and control.

In his critical survey of personality assessment, Vernon (1964) reviews the history of this problem. An exhaustive review has also been made by Rorer (1963), who concludes that response style has been greatly overrated as a factor in personality measurement. Other recent reviews of interest have been published by Jackson and Messick (1958), McGee (1962a; 1962b), Christie and Lindauer (1963), and Holtzman (1965). [See Response sets.]

Social desirability and acquiescence

The two response styles upon which the greatest amount of research has been done are social desirability and acquiescence. Socially desirable responding is the tendency to give answers that are independently judged by social consensus to be desirable. Acquiescence is the tendency to agree with a statement, regardless of content. Many writers have taken strong positions concerning the significance of these two response styles, positions from which they retreat only with great reluctance.

Using the method of successive intervals for scaling items from the MMPI, Edwards (1957a) developed his Social Desirability Scale as a measure of the tendency to respond in the socially desirable fashion. The 39 items in the scale are keyed so that the higher the score, the more the individual endorses the socially desirable alternative. The correlations between this scale and the first general factor on the MMPI are so high that some investigators have argued that most of the variance in the MMPI can be accounted for in terms of this single dimension of social desirability–undesirability (Edwards 1962). It is clear to most observers, however, that the Social Desirability Scale alone is no match for the full MMPI profile (Marlowe & Gottesman 1964; Wiggins 1962; Elvekrog & Vestre 1963; Jackson & Messick 1962). The standard scales in the MMPI cover much more than social desirability.

As in the case of social desirability, considerable debate continues over the importance of acquiescence in personality inventories. Jackson and Messick (1962) computed separate scores for true- or false-keyed items for a large number of MMPI scales, as well as for five social-desirability scales, and performed a factor analysis of the intercorrelations of the 40 variables separately for three large, diverse samples of individuals. The three samples yielded identical results. More than half of the total reliable variance could be attributed to the two stylistic dimensions—acquiescence and social desirability. On the other hand, Dahlstrom (1962) takes Edwards and Jackson and Messick to task for interpreting social desirability or tendency to agree primarily as response style and for not considering the possibility that true-or-false answers may nonetheless be veridical, may be attempts at various impression formations, or may be acquiescences to a particular pressure conveyed by the instructions, the examiner, or the test setting. Block (1962) presents data in support of the idea that social desirability and adjustment are basically different concepts even though they obviously are related.
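Why separate scoring of true-keyed and false-keyed items bears on acquiescence can be shown with a small sketch. It is not a reconstruction of the Jackson and Messick analysis; the six-item key, the two respondents, and the function name are hypothetical.

```python
def keyed_scores(answers, key):
    """Score true-keyed and false-keyed items separately.

    answers: list of True/False responses, one per item.
    key:     list of keyed directions for the same items.
    Returns (true_keyed_score, false_keyed_score, proportion_answered_true).
    """
    true_keyed = sum(a for a, k in zip(answers, key) if k)
    false_keyed = sum(not a for a, k in zip(answers, key) if not k)
    agree_rate = sum(answers) / len(answers)
    return true_keyed, false_keyed, agree_rate

# Hypothetical six-item scale: three items keyed true, three keyed false.
scale_key = [True, True, True, False, False, False]

yea_sayer = [True, True, True, True, True, True]        # agrees with everything
content_high = [True, True, True, False, False, False]  # answers as keyed

yea_scores = keyed_scores(yea_sayer, scale_key)
content_scores = keyed_scores(content_high, scale_key)
```

On the true-keyed items alone the two respondents are indistinguishable. Only the false-keyed items, on which indiscriminate agreement lowers the score, separate a tendency to say "true" from a response to item content.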

With these many conflicting points of view, it is difficult to reach a firm conclusion concerning the relative importance of response styles as confounding variables in personality inventories. Remaining largely unsettled and for the most part ignored is the more crucial question of external validity. Do these response styles measure anything important beyond their interesting correlates with other variables from paper-and-pencil questionnaires?

In general, most attempts to demonstrate the personality correlates of stylistic variables by comparing them with independently derived personality measures have yielded negative results (McGee 1962a; 1962b; Foster & Grigg 1963; Appley & Moeller 1963). The behavioral correlates of response styles are few and of questionable significance. While their existence cannot be denied in at least some personality inventories, their interpretation as important variables in their own right is debatable.

Variations among personality inventories

Most contemporary personality inventories employ a dichotomous response format of true–false or agree–disagree and are designed as omnibus instruments, with multidimensional scales covering different traits. A few have departed from this standard form in one way or another.

Edwards (1957b) developed the Edwards Personal Preference Schedule by pairing statements having equal social desirability values, requiring the subject to choose the member of each pair which is most like him. This forced-choice method makes it more difficult for an individual to fake a socially desirable picture, although response styles are not really eliminated (Borislow 1958; Corah et al. 1958). The 225 paired comparisons lead to scores on 15 scales, patterned after Henry Murray’s list of needs. Low intercorrelations between the scales and moderate reliabilities (.60–.87) suggest that the inventory has promise as a research instrument, although its external correlates have been disappointing. [See Personality: contemporary viewpoints, article on components of an evolving personological system.]

The forced-choice format for responding to personality items usually involves pairs or triplets of statements, from which the subject chooses the alternative most like or least like himself. A more extensive variation of this approach is the Q-sort method, introduced by Stephenson (1953) and popularized by Rogers and his associates (Rogers & Dymond 1954). Typically, a number of statements describing traits are printed on small cards, which can then be sorted by the subject into a forced normal distribution, using a limited set of between 7 and 11 categories arranged on a continuum from “most like me” to “least like me.” The same statements can be reshuffled and sorted again, using different instructional sets, such as “how I want to become” or “how others view me,” instead of “how I really am.” Special statistical problems arise in the treatment of such data, because of the ipsative nature of the distributions. The method has been reviewed thoroughly by Wittenborn (1961).

Another variation involves changing the response format from a simple dichotomy of yes–no or true–false to a continuum represented by five to nine gradations of response. Comrey (1964) found that he could improve item reliability by using a nine-point response format ranging from “always” to “never” for some items and from “definitely” to “absolutely not” for others. Still in experimental form, Comrey’s personality inventory consists of three validity scales plus 30 short scales of six items each, derived by repeated factor analysis of item intercorrelations and construction of new items. The nine-point response format yields more information per item than the usual three-point scale, making it possible to achieve split-half reliabilities ranging from .62 to .93 (median .84) for the 30 short scales.
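A split-half reliability of the kind reported for these short scales can be computed along the following lines. The sketch assumes an odd-even split and the Spearman-Brown step-up, which are conventional choices; the source does not say how Comrey divided his items, and the responses below are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation of two halves."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores):
    """item_scores: one row per subject, one column per item."""
    first_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5
    second_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6
    return spearman_brown(pearson(first_half, second_half))

# Hypothetical nine-point responses of five subjects to one six-item scale.
responses = [
    [9, 8, 7, 9, 8, 8],
    [2, 3, 1, 2, 3, 2],
    [5, 5, 6, 4, 5, 6],
    [7, 6, 8, 7, 6, 7],
    [3, 4, 2, 4, 3, 3],
]
reliability = split_half_reliability(responses)
```

The step-up is needed because each half contains only three items, so the raw correlation between halves understates the reliability of the full six-item scale.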

A five-point response format was developed by Moore and Holtzman (1965) by combining the usual true–false dichotomy with an intensity scale for measuring the degree of concern or worry reported by the subject. In addition to the lowest point, “false, or does not apply to me,” four levels of personal concern were linked with a response of true, ranging from “true but of no concern to me” to “true and of great concern to me.” As in the study by Comrey, the use of a multiple-response continuum yielded better results than a dichotomous true–false, this time in a personal-problem inventory containing eight well-defined scales, used in a study of thirteen thousand high school youth.

The adjective check list, or trait list, represents an inventory technique at the other extreme. Instead of a short sentence or statement, each item consists of a single trait adjective, such as “friendly” or “timid.” Occasionally, bipolar traits are employed rather than single words. A long list of such adjectives is given to the subject, who is asked to mark quickly the items most like him (and perhaps those least like him). An example of a well-developed adjective check list for use as a personality inventory is the one constructed by Gough and Heilbrun (1965).

A rather novel approach to the format of an inventory has been successfully developed by Endler and his associates (1962) in their S-R (stimulus-response) Inventory of Anxiousness. Fourteen different five-point scales are rated separately for each of 11 different situations, making a total of 154 items. The scales contain such content as “My heart beats faster,” and “I get an uneasy feeling,” while the situations define common events: “You are just starting off on a long automobile trip.” Designed as a research instrument, the S-R Inventory permits systematic study of responses, situations, and individual differences with respect to self-reported anxiety, as well as interactions between them.

Of all the dimensions revealed in personality inventories, self-reported anxiety, or “neuroticism,” is one of the best defined and most widely used. Consequently, a number of attempts have been made to devise questionnaires to measure anxiety. The Taylor Manifest Anxiety Scale was developed, using items from the MMPI (Taylor 1953). The IPAT (Institute for Personality and Ability Testing) Anxiety Scale was constructed by Cattell from the 40 items (20 subtle and 20 obvious) most highly correlated with the second-order anxiety factor in the 16 PF. And Eysenck’s neuroticism scale in the Maudsley Personality Inventory is basically a measure of self-reported anxiety. Correlations between these similar measures of anxiety are sufficiently high to indicate a large common core, although each has its own unique variance as well. [See Anxiety and Neurosis.]

Variations of anxiety questionnaires have also been developed for use with children. Patterned after his Test Anxiety Questionnaire, Sarason’s Test Anxiety Scale for Children (Sarason et al. 1960) deals specifically with anxiety aroused by events in the classroom, particularly being tested or challenged. The complete children’s inventory also includes a general anxiety scale, a lie scale, and a defensiveness scale, which can be used as control measures. A somewhat similar inventory, developed independently by a different approach, is the Children’s Manifest Anxiety Scale (Castaneda et al. 1956).

Other variations in format, content, and purpose of inventories are described in A Study of Values (Allport et al. 1960); the Mooney Problem Check List (Mooney & Gordon 1950); and the Minnesota Counseling Inventory (Berdie & Layton 1957). Mental Measurements Yearbook, edited by O. K. Buros, is a standard compendium of most published personality inventories, as well as of other kinds of psychological tests (e.g., Buros 1959).


A major problem with personality inventories has been the establishment of their validity. All too frequently an inventory has been published without sufficient evidence that it either has predictive validity with respect to some future behavioral criterion or has concurrent validity in terms of external correlates. In an exhaustive review of earlier work on this problem, Ellis (1946) concluded that personality questionnaires are of dubious value because they fail to correlate with either practical criteria or independent measures of presumably the same personality traits. While some of the same criticisms can still be leveled at many current personality inventories, the thrust of the argument has been blunted somewhat by the successful efforts of more recent investigators to demonstrate the validity, as well as limitations, of contemporary methods.

The most extensive research has been done on various empirically derived scales of the MMPI. This work demonstrates the usefulness of the MMPI for certain kinds of actuarial prediction, screening, and group discrimination, although its value for individual clinical diagnosis is still debatable and depends partly on the skill of the clinician.

Since most personality inventories combine both the rational and empirical approaches and involve explicitly derived dimensions with trait names to describe them, validation attempts are usually concurrent rather than predictive. Properly conducted concurrent validation must involve two or more independent methods of measuring the same traits in the same individuals. Comparing the Guilford–Zimmerman Temperament Survey with Cattell’s 16 PF, for example, does not validate either inventory, since both sets of traits are derived by the same method, the self-report personality questionnaire. Personality scores on the inventory must be compared with ratings of behavior, with clinical judgments of the traits, or with trait scores from some other independent method of assessment.

The most extensive cross-method correlational studies have been undertaken by Cattell and his associates (Cattell 1950; 1957; Cattell & Scheier 1961). The sheer magnitude of Cattell’s work represents a tour de force unequaled by previous investigators, even though he is vulnerable to serious criticism (Becker 1960). Generally speaking, the highest concurrent validities are obtained when inventory scores are compared with peer-group ratings for the same traits, regardless of the particular inventory in question. Ratings by trained observers also correlate to some extent with inventory scores. Validity coefficients of this type on the CPI, for example, range from .21 to .57 (Gough 1957). The lowest validity coefficients arise when cross-media comparisons involving inventory scores and laboratory-type measures or objective performance tests of personality are made. More often than not, such correlations are close to zero.

A crucial design for examining validity by the cross-method technique has been proposed by Campbell and Fiske (1959). Usually an investigator is content to show that the correlation for measures of a given trait across two methods is significantly better than chance. A much more rigorous test of validity requires that the cross-method correlation for a given trait be higher than the correlation between any two different traits, whether within the same method or across the two methods. Insisting that both discriminant and convergent validity must be considered, Campbell and Fiske recommend that all the within-method and cross-method intercorrelations be arranged in a multitrait, multimethod matrix. This matrix can be examined in a number of ways, to determine both the convergent and discriminant validity. An example of this method applied to five traits purportedly measured by the CPI is reported by Dicken (1963).
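The more rigorous test described above can be sketched for the simplest case of two traits measured by two methods. The sketch is illustrative only: the trait and method labels, the scores, and the function names are hypothetical, and comparing absolute values of the different-trait correlations is a choice made here, not a rule stated by Campbell and Fiske.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mtmm(scores):
    """scores maps (trait, method) to subject scores; returns all pairwise r."""
    return {(a, b): pearson(scores[a], scores[b])
            for a, b in combinations(sorted(scores), 2)}

def passes_rigorous_test(scores):
    """For each trait, is its same-trait, cross-method correlation larger than
    every correlation between one of its measures and a different trait?"""
    r = mtmm(scores)
    verdict = {}
    for trait in {t for t, _ in scores}:
        convergent = [v for (a, b), v in r.items()
                      if a[0] == trait and b[0] == trait]
        rivals = [abs(v) for (a, b), v in r.items()
                  if (a[0] == trait) != (b[0] == trait)]
        verdict[trait] = min(convergent) > max(rivals)
    return verdict

# Hypothetical scores of six subjects: two traits, each measured two ways.
scores = {
    ("anxiety", "inventory"): [1, 2, 3, 4, 5, 6],
    ("anxiety", "peer"): [2, 1, 3, 5, 4, 6],
    ("dominance", "inventory"): [3, 6, 1, 5, 2, 4],
    ("dominance", "peer"): [4, 6, 2, 5, 1, 3],
}
matrix = mtmm(scores)
verdict = passes_rigorous_test(scores)
```

In these invented data both traits pass, since each same-trait correlation is about .89 and no different-trait correlation exceeds .49 in absolute value. With real inventories the within-method correlations between different traits are often the ones that defeat the test.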

A more successful application of the Campbell-Fiske method in the validation of a personality inventory has been carried out by Jackson (1965), working with his newly developed Personality Research Form (PRF). Using four different methods of assessing the same 20 traits in a sample of 51 college students, Jackson demonstrated high convergent and discriminant validity for most of the scales in the PRF. Individual validity coefficients for the PRF, using peer ratings of behavior, ranged from .17 to .71 (median .52). The PRF appears to be a most promising instrument, largely because it is one of the first in a new generation of personality inventories that capitalize heavily upon modern test-construction theory and upon high-speed computers for item analysis. Two parallel forms of the PRF, with precise equivalence, are available. Derived originally from Murray’s system of needs, the 20 scales contain 20 items each and are augmented by two control scales, one for social desirability and one for acquiescence.

The next few years should see considerable improvement in the power of the inventory approach to personality assessment. While much of the research of the past ten years has been methodological in nature, sufficiently advanced technology is now at hand to overcome some of the more glaring weaknesses in existing personality inventories. It is unlikely, however, that paper-and-pencil self-report methods of assessing personality will ever be much more than just that, namely, a systematic way of measuring how a person judges himself.

Wayne H. Holtzman

[Other relevant material may be found in Interviewing, article on Personality appraisal; Observation; Projective methods; Psychometrics.]


Allport, Gordon W.; Vernon, Philip E.; and Lindzey, Gardner 1960 A Study of Values. 3d ed. Boston: Houghton Mifflin. → Allport and Vernon were the authors of the first edition, published in 1931.

Appley, Mortimer H.; and Moeller, George 1963 Conforming Behavior and Personality Variables in College Women. Journal of Abnormal and Social Psychology 66:284–290.

Becker, Wesley C. 1960 The Matching of Behavior Rating and Questionnaire Personality Factors. Psychological Bulletin 57:201–212.

Berdie, Ralph F.; and Layton, Wilbur L. 1957 Minnesota Counseling Inventory. New York: Psychological Corporation.

Block, Jack 1962 Some Differences Between the Concepts of Social Desirability and Adjustment. Journal of Consulting Psychology 26:527–530.

Borislow, Bernard 1958 The Edwards Personal Preference Schedule (EPPS) and Fakability. Journal of Applied Psychology 42:22–27.

Buros, Oscar K. (editor) 1959 The Fifth Mental Measurements Yearbook. Highland Park, N.J.: Gryphon.

Campbell, Donald T.; and Fiske, Donald W. 1959 Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix. Psychological Bulletin 56:81–105.

Castaneda, Alfred; McCandless, B. R.; and Palermo, David S. 1956 The Children’s Form of the Manifest Anxiety Scale. Child Development 27:317–326.

Cattell, Raymond B. 1946 The Description and Measurement of Personality. New York: World.

Cattell, Raymond B. 1950 Personality: A Systematic Theoretical and Factual Study. New York: McGraw-Hill.

Cattell, Raymond B. 1956 Second Order Personality Factors in the Questionnaire Realm. Journal of Consulting Psychology 20:411–418.

Cattell, Raymond B. 1957 Personality and Motivation Structure and Measurement. New York: World.

Cattell, Raymond B.; and Coan, Richard W. 1958 Personality Dimensions in the Questionnaire Responses of Six and Seven-year-olds. British Journal of Educational Psychology 28:232–242.

Cattell, Raymond B.; Saunders, D. R.; and Stice, G. 1957 Sixteen Personality Factor Questionnaire. Champaign, Ill.: Institute for Personality and Ability Testing.

Cattell, Raymond B.; and Scheier, Ivan H. 1961 The Meaning and Measurement of Neuroticism and Anxiety. New York: Ronald Press.

Christie, Richard; and Lindauer, Florence 1963 Personality Structure. Annual Review of Psychology 14:201–230.

Coan, Richard W. 1964 Facts, Factors, and Artifacts: The Quest for Psychological Meaning. Psychological Review 71:123–140.

Comrey, Andrew L. 1964 Personality Factors, Compulsion, Dependence, Hostility, and Neuroticism. Educational and Psychological Measurement 24:75–84.

Corah, Norman L. et al. 1958 Social Desirability as a Variable in the Edwards Personal Preference Schedule. Journal of Consulting Psychology 22:70–72.

Dahlstrom, W. Grant 1962 Commentary: The Roles of Social Desirability and Acquiescence in Responses to the MMPI. Pages 157–168 in Samuel Messick and John Ross (editors), Measurement in Personality and Cognition. New York: Wiley.

Dicken, Charles F. 1963 Convergent and Discriminant Validity of the California Psychological Inventory. Educational and Psychological Measurement 23:449–459.

Edwards, Allen L. 1957a The Social Desirability Variable in Personality Assessment and Research. New York: Dryden.

Edwards, Allen L. 1957b Edwards Personal Preference Schedule. New York: Psychological Corporation.

Edwards, Allen L. 1962 Social Desirability and Expected Means on MMPI Scales. Educational and Psychological Measurement 22:71–76.

Ellis, Albert 1946 The Validity of Personality Questionnaires. Psychological Bulletin 43:385–440.

Elvekrog, Maurice O.; and Vestre, Norris D. 1963 The Edwards Social Desirability Scale as a Short Form of the MMPI. Journal of Consulting Psychology 27:503–507.

Endler, Norman S.; Hunt, J. McV.; and Rosenstein, Alvin J. 1962 An S-R Inventory of Anxiousness. Psychological Monographs 76, no. 17.

Eysenck, Hans J. (1953) 1960 The Structure of Human Personality. 2d ed. London: Methuen.

Foster, Robert J.; and Grigg, Austin E. 1963 Acquiescent Response Set as a Measure of Acquiescence: Further Evidence. Journal of Abnormal and Social Psychology 67:304–306.

Gough, Harrison G. 1957 California Psychological Inventory: Manual. Palo Alto, Calif.: Consulting Psychologists Press.

Gough, Harrison G.; and Heilbrun, Alfred B. Jr. 1965 The Adjective Check List. Palo Alto, Calif.: Consulting Psychologists Press.

Guilford, Joy P.; and Zimmerman, Wayne S. 1955 The Guilford-Zimmerman Temperament Survey. Beverly Hills, Calif.: Sheridan.

Guilford, Joy P.; and Zimmerman, Wayne S. 1956 Fourteen Dimensions of Temperament. Psychological Monographs 70, no. 10:1–26.

Holtzman, Wayne H. 1965 Personality Structure. Annual Review of Psychology 16:119–156.

Jackson, Douglas N. 1965 The Development and Evaluation of the Personality Research Form. Unpublished manuscript, Univ. of Western Ontario.

Jackson, Douglas N.; and Messick, Samuel 1958 Content and Style in Personality Assessment. Psychological Bulletin 55:243–252.

Jackson, Douglas N.; and Messick, Samuel 1962 Response Styles on the MMPI: Comparison of Clinical and Normal Samples. Journal of Abnormal and Social Psychology 65:285–299.

LaForge, Rolfe 1962 A Correlational Study of Two Personality Tests: The MMPI and Cattell 16 PF. Journal of Consulting Psychology 26:402–411.

McGee, Richard K. 1962a Response Style as a Personality Variable: By What Criterion? Psychological Bulletin 59:284–295.

McGee, Richard K. 1962b The Relationship Between Response Style and Personality Variables. Part 2: The Prediction of Independent Conformity Behavior. Journal of Abnormal and Social Psychology 65:347–351.

Marlowe, David; and Gottesman, Irving I. 1964 The Edwards SD Scale: A Short Form of the MMPI. Journal of Consulting Psychology 28:181–182.

Mitchell, James V. Jr. 1963 A Comparison of the First and Second Order Dimensions of the 16 PF and CPI Inventories. Journal of Social Psychology 61:151–166.

Mooney, Ross L.; and Gordon, Leonard V. 1950 Mooney Problem Check List. New York: Psychological Corporation.

Moore, Bernice M.; and Holtzman, Wayne H. 1965 Tomorrow’s Parents: A Study of Youth and Their Families. Austin: Univ. of Texas Press.

Rogers, Carl R.; and Dymond, Rosalind F. (editors) 1954 Psychotherapy and Personality Change: Coordinated Research Studies in the Client-centered Approach. Univ. of Chicago Press.

Rorer, Leonard G. 1963 The Great Response Style Myth. Oregon Research Institute, Research Monograph 3, no. 6.

Sarason, Seymour B. et al. 1960 Anxiety in Elementary School Children. New York: Wiley.

Stephenson, William 1953 The Study of Behavior: Q Technique and Its Methodology. Univ. of Chicago Press.

Taylor, Janet A. 1953 A Personality Scale of Manifest Anxiety. Journal of Abnormal and Social Psychology 48:285–290.

Vernon, Philip E. 1964 Personality Assessment: A Critical Survey. New York: Wiley.

Wiggins, Jerry S. 1962 Strategic, Method, and Stylistic Variance in the MMPI. Psychological Bulletin 59:224–242.

Wittenborn, J. R. 1961 Contributions and Current Status of Q Methodology. Psychological Bulletin 58:132–142.


The Minnesota Multiphasic Personality Inventory (MMPI) is a standardized, American test of emotional status and adjustment. It consists of 550 verbal items to which the subject answers true or false about himself. Scores are obtained on 14 scales: 10 that relate to clinical categories and 4 that help provide an estimate of the validity of the test for individual subjects. Scoring involves applying stencils to an answer sheet, either by hand or by various automatic machine methods. A psychograph is made for each subject that provides a score profile, appropriate weights to correct for defensiveness, and separate normative transformations for each sex. In addition, a number of ratios and indexes may be computed for special applications.

The initial derivational steps were undertaken in the late 1930s by Starke R. Hathaway, a psychologist, and J. Charnley McKinley, a neuropsychiatrist. All the basic work was carried out in the Department of Psychiatry and Neurology at the University of Minnesota Hospitals in Minneapolis. Inpatients and outpatients served as subjects in the carefully selected criterion groups, while hospital visitors, friends and relatives of patients on the various wards, made up the normal control group. Each component scale was derived empirically, containing only items that separated known patients from normals; each scale usually passed through several preliminary forms. Most of these scales are described in a series of articles in Welsh and Dahlstrom (1956). The clinical categories that were the original referents of the scales are hypochondriasis, depression, conversion hysteria, psychopathic deviate, masculinity-femininity, paranoia, psychasthenia, schizophrenia, hypomania, and social introversion. In Table 1, these scales are listed in the order in which they appear in the profile (Hathaway & McKinley 1943).

Table 1 – Basic Minnesota Multiphasic Personality Inventory scales
Scale | Abbreviation | Number
Validity scales:
“Cannot say” score | ? |
Clinical scales:
Conversion hysteria | Hy | 3
Psychopathic deviate | Pd | 4
Social introversion | Si | 0

Since its original publication, a large number of special scales and indexes have been developed from the MMPI items. Most of these special procedures, together with a great deal of the data from which the original scales were derived, have been brought together in An MMPI Handbook (Dahlstrom & Welsh 1960). The MMPI items and methods have also served as points of departure for several other widely used personality instruments, most notably Gough’s California Psychological Inventory (1957), developed with greater emphasis on social-psychological criteria, and Berdie and Layton’s Minnesota Counseling Inventory (1957), devised for high school guidance purposes. In addition, translations of the MMPI have been made into at least 15 different languages (see Dahlstrom & Welsh 1960, appendix N). Most of these versions can be considered to be only at the experimental stage at this time, since linguistic and cultural differences preclude simple translation of the items and routine application of the standard scoring keys. However, in this regard, Horiuchi (1963) showed that the D scale worked surprisingly well in separating clinical cases of depressive disorder from normal controls in her Japanese-language version developed at Doshisha University in Kyoto. French, Italian, and Spanish (Spanish American) versions have been restandardized with sufficient care to warrant their use.

Nature of the scales. As indicated in Table 1, the basic scales are usually grouped into categories of validity and clinical scales.

Validity scales. The main function of the validity scales is to check on the possibility that the test subject may have been unable or unwilling to comply with the test instructions and procedures. Perceptive and conscientious use of these indexes serves to offset many of the widespread objections to personality inventories, which occur because of their reliance upon a subject’s language comprehension, contact with reality, cooperativeness, and personal insight. These validity scales also measure some of the same variance that is measured by the clinical scales. Thus, the L scale reflects personal stability as well as pervasive test defensiveness, the F scale mirrors severity of emotional disturbances and contact with reality as well as poor comprehension of English, and the K scale covaries with prognosis in psychiatric treatment as well as the more subtle tendencies toward self-enhancement or self-excoriation that permeate responses to the inventory. A detailed description of the interpretation of validity scales is available in Dahlstrom and Welsh (1960, chapter 5).

Clinical scales. When they were first published, the basic clinical scales were routinely designated by the names of traditional psychiatric syndromes, together with a one-letter or two-letter abbreviation (see Table 1). As the research literature grew, increasing evidence of nonpsychiatric interpretive implications was accumulated. Usage has shifted from the restrictive semantic referents of these scale names to the letter symbols and, more recently, to the numerical designations. Thus, the hysteria scale, which was derived by comparing the responses of conversion hysteria cases with those of normal adults, came to be designated simply by the symbol Hy when it became clear that this scale was useful in evaluating anxiety hysteria and a variety of psychosomatic syndromes as well. More recently, as its fuller significance in evaluating impulse control, emotional immaturity, level of insight, susceptibility to hypnosis, and repressive defenses is recognized, it is deemed less biasing to refer to the hysteria scale as simply scale 3. [See Hysteria.]

In a similar manner the implications of the elevation of the component scores have changed over the years. A score falling two standard deviations above the average for normal subjects of the same sex was rather rarely achieved by normal subjects on any particular scale but typically fell close to the central tendency of patients showing the psychiatric characteristics under consideration; early clinical usage employed an elevation of two standard deviations as a critical level in score interpretations. These test inferences usually were part of a diagnostic decision process and the elevations were judged to be either “clinically significant” or not. This argument was a probabilistic one, whereas the decision was a categorical one involving placement within a normal or pathological group. Subsequent research has clarified the interpretive implications of scores and of score combinations at various absolute elevations in the profile. A detailed description of the interpretation of the clinical scales is available in Dahlstrom and Welsh (1960, chapter 6).
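The early two-standard-deviation rule can be expressed as a simple standard-score computation followed by a categorical cut. The Python sketch below illustrates only that logic. The normative means and standard deviations are invented for the example and are not published MMPI norms, the function names are hypothetical, and treating a score exactly at the cutoff as elevated is an assumption.

```python
# Hypothetical same-sex normative values for a single scale.
# These numbers are invented for illustration; they are NOT published MMPI norms.
NORMS = {
    "male": {"mean": 16.0, "sd": 4.0},
    "female": {"mean": 18.0, "sd": 5.0},
}


def elevation_in_sd(raw_score, sex, norms=NORMS):
    """Return how many standard deviations a raw score lies above the same-sex normal mean."""
    n = norms[sex]
    return (raw_score - n["mean"]) / n["sd"]


def is_clinically_elevated(raw_score, sex, cutoff_sd=2.0, norms=NORMS):
    """Categorical decision of early practice: at or above the critical level, or not."""
    return elevation_in_sd(raw_score, sex, norms) >= cutoff_sd


# The same raw score leads to different decisions under separate sex norms.
print(elevation_in_sd(24, "male"), is_clinically_elevated(24, "male"))      # 2.0 True
print(elevation_in_sd(24, "female"), is_clinically_elevated(24, "female"))  # 1.2 False
```

The example also shows why the argument is probabilistic while the decision is categorical: the continuous elevation is discarded once the cut is applied.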

Profiles. More abstract designations of the scales and greater reliance upon scale patterns at particular levels of score elevation have culminated in a system of profile coding and increasing reliance upon configural interpretive summaries (Hathaway & Meehl 1951; Drake & Oetting 1959; Hathaway & Monachesi 1961; Good & Brantner 1961; Marks & Seeman 1963; Gilberstadt & Duker 1965). In one such scheme, most of the information in the score profile is disregarded; only the sequence of relative elevations among the scales and a rough index of absolute elevation are preserved. In such a system, patterns are designated by reference to the highest or two highest scales and for some purposes by reference to low points as well. Within the groupings formed by these criteria, additional distinctions about absolute elevation, validity scale patterns, or related scale sequences may be drawn.
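A coding scheme of this general kind can be sketched briefly. The Python example below is a deliberately simplified illustration and does not reproduce any published coding system: the function name `code_profile`, the elevation labels, and the example scores are hypothetical, and scores are assumed to be expressed as standard deviations above the normal mean, keyed by scale numeral.

```python
def code_profile(scores, n_high=2):
    """Reduce a profile to its scale sequence, high points, low point, and rough elevation.

    scores maps a scale numeral (as a string) to an elevation in standard
    deviations above the normal mean. Ties keep the order of the input mapping.
    """
    # Order scale numerals from most to least elevated (the sort is stable).
    ranked = sorted(scores, key=lambda s: scores[s], reverse=True)
    peak = scores[ranked[0]]
    return {
        "sequence": ranked,
        "high_point": "".join(ranked[:n_high]),
        "low_point": ranked[-1],
        # Rough index of absolute elevation; the 2.0 boundary is an assumption
        # borrowed from the two-standard-deviation critical level of early usage.
        "level": "elevated" if peak >= 2.0 else "within normal limits",
    }


# Hypothetical profile; all values are invented for illustration.
profile = {
    "1": 0.5, "2": 2.4, "3": 0.8, "4": 1.1, "5": 0.0,
    "6": 0.3, "7": 2.1, "8": 1.0, "9": -0.5, "0": 0.9,
}

coded = code_profile(profile)
print(coded["high_point"], coded["low_point"], coded["level"])  # 27 9 elevated
```

Only the rank order and one coarse elevation label survive, so profiles with quite different absolute scores can receive the same code. That loss of detail is what allows cases to be grouped for configural interpretation.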

Extended validation. As a behavioral summary, the psychiatric syndrome seems to have been a felicitous choice for a test criterion. Although there are many problems inherent in using them (Meehl 1959), these syndromes occupy a middle ground in degree of abstraction from the behavioral observations upon which they are based. There is by now a large amount of loosely articulated research in psychiatry bearing upon etiology, therapeusis, and prognosis that is centered upon these personality constructs. By their efforts in deriving these scales, both in compiling item groups and in immediately and successfully cross-validating with new-patient groups, Hathaway and McKinley showed that these constructs were workable and at least minimally reliable. By now a long series of publications has substantiated the validational efforts, showing that these scales can similarly separate normals from patients for a range of ages, races, religions, regions of origin, and educational levels. This research has also made clear that there is no mutual exclusiveness among these psychiatric syndromes. Scales separately constructed to distinguish clinical subgroups from normals do not singly also distinguish among the subgroups themselves. Either the patterns among the clinical scales must be used, or special scales must be constructed for these purposes (Rosen 1962).

Useful as the syndrome may be as a summary of behavior, there is a legitimate interest in evaluations that are either more concrete and specific or more abstract and inclusive. MMPI scales derived from syndromes have proved useful in evaluating such specific behaviors as anxiety, hostility, hallucinations, phobias, or suicidal impulses. Similarly, broader psychiatric conceptions such as differentiations of psychosis-neurosis, delinquency proneness (Hathaway & Monachesi 1953; 1963), or prognosis under various forms of psychiatric treatment have been studied.

Broader applications. Perhaps the best index that the criteria initially chosen led to fruitful and productive scale variables has been the rapid extension of this instrument beyond the special preoccupations and concerns of psychiatry to general areas of personality, to psychology as a whole, and to sociology and the other social sciences. Examples of such areas are psychodynamic processes such as repression, projection, and perceptual defense and vigilance; emotional maturity and control; social conformity, popularity, and leadership; political participation, religious affiliation, and occupational selection.

Research has also appraised the impact of a variety of personal characteristics and demographic variables on MMPI performance. Thus, while it has been known from the outset that the sex membership of the test subject influenced the component scales sufficiently to justify separate norms on six scales (1, 2, 3, 5, 7, and 8) in the profile, subsequent work has also revealed that even with separate normative tables systematic variations appear in the frequencies of code configurations (see Dahlstrom & Welsh 1960, appendix M). The reported differences are quite consistent with the long-established sex differences in various psychiatric morbidity trends.

Similarly, age trends have been studied, the most detailed analyses being made in older adult age ranges. Recently, a cross-sectional comparison has been made between adolescent and young adult levels and a start made upon some longitudinal follow-up studies (Hathaway & Monachesi 1963). [See Individual differences, article on sex differences.]

Two different approaches have been made to analyze the impact of social class status on MMPI responses. Most of the work has been based upon family origin, or status bestowed on the test subject by his parents, such as father’s education, father’s occupational level, or family income level. Thus, Gough (1948) and others have studied differences in test patterns among groups of subjects representing different family statuses. They have demonstrated that there is greater defensiveness in the test responses of subjects from higher status families. Gough developed a scale for this dimension on the basis of test data from high school students; some of the early work indicated that the scale was sensitive to subsequent shifts in social class level. That is, if a student could not manage to live and produce at the social level of his own family, one would predict that he would subsequently occupy a lower status. A second approach was employed by Nelson (1952), who used the earned status of his young adult patients as the basis for determining class level, combining data on their educational and occupational levels. His classifications correlated well, but by no means identically, with Gough’s for a sample of general psychiatric patients. He also noted different profile patterns, diagnostic trends, and prognostic outcomes covarying with his socioeconomic classification.

Critique. In some contrast to the generally successful and encouraging results from this wide range of application of the MMPI to practical problems in personality appraisal, at least two major lines of criticism of the instrument and the method should be discussed. One of these patterns of criticism has grown out of the research of Edwards (1957), Jackson and Messick (1958), and Campbell and Fiske (1959) and their coworkers. According to Campbell and Fiske, the total variance of a set of test scores can logically be subdivided into methods variance (variability inherent in the test) and trait variance (variability in the personality attribute under consideration). Using a similar framework Messick and Jackson have advanced the formulation that methods variance can be conceptualized in terms of various stylistic features of test performance, such as response biases. These include preferences for a particular response, e.g., true (or false) to an excessive number of items, acquiescence (or conformity) responses, personally or socially desirable responses (responding to items in terms of possible adverse implications rather than in terms of fact; Edwards 1957) and other test-oriented sets and predispositions. These authors contrast these response sets with valid, content-related endorsements of inventory items, the trait variance in the Campbell–Fiske formulation. To the extent that test scores reflect style variance, they provide less valid information, presumably, about the actual content referents of the items in the various scales. Special scales constructed to evaluate these sources of “invalidity,” by both Edwards (1957) and Jackson and Messick (see Messick & Ross 1962), have shown high correlations with component MMPI scales. 
While the issues here are complex (see Dahlstrom 1962; Block 1965), one limitation of the approach that these workers employ can be readily discerned: there is no reason to believe that MMPI-like personality scales, empirically derived against external criteria, necessarily furnish their discriminations solely on the basis of content of the test items. Although much of the information used in the work of these scales may indeed be stylistic (i.e., relying upon identifiable features of a person’s approach to a test situation) it may nonetheless be valuable in characterizing a particular personality attribute. So, when high correlations are obtained between empirical scales and stylistic scales, the common sources of variance reflected in the product-moment correlations need not be interpreted as useless, nondiscriminating information bound to a particular inventory or type of test. [See Expressive behavior; Response sets.]

A sober re-examination of the pragmatics of MMPI usage is well represented by the work of Little and Shneidman (1959). From judges experienced in particular testing techniques, blind clinical interpretations were obtained of various test protocols from 12 different subjects. These interpretations included clinical diagnoses, ratings of degree of maladjustment, Q sorts, and true–false statements about the behavioral and adjustment patterns of the subjects. Some of the results were probably biased conservatively by limiting features of the Q decks and the statements covering background experiences; nevertheless, there was generally a disappointing degree of correspondence between these blind test interpretations and similar descriptions by judges who knew the case histories. While the results based on MMPI profiles were generally not much better or worse than the interpretations based on other tests (Rorschach, Thematic Apperception Test, Make-a-Picture-Story), the absolute level of success in these psychodiagnostic endeavors was by no means reassuring. Thus, it appears that even though various MMPI indexes are useful when applied to separate and discrete decisions about patients or clients, there remains unresolved the additional problem of integrating the multitude of possible implications of a particular profile and producing a coherent and accurate personological summary of that test subject. Some investigators have begun to work on this crucial problem; most of the relevant MMPI research has been devoted to combinatorial patterns of the MMPI scores and their interpretive implications (see Dahlstrom & Welsh 1960, appendix M). In addition to the problem of finding the best way to combine MMPI results to reach valid and discriminating psychodiagnostic summaries there also remains the broader problem of combining MMPI score data with nontest information on a particular person.
Since the nonpsychometric information is generally less reliable and contains data of unknown degrees of redundancy with the profile information, it is even more difficult to collate it with MMPI variables for the common purpose of accurate depiction of personality. One promising approach, exemplified by Kleinmuntz’ pioneering efforts (1963), is the utilization of high-speed computers for the storage, retrieval, and most efficient integration of personological variables, once the first-order relationships have been discerned and made explicit. In this way the limitations of the human clinician may be in part bypassed for this routine activity (see Meehl 1963).

W. Grant Dahlstrom

[Other relevant material may be found in Clinical psychology; Interviewing, article on personality appraisal; Projective methods; Psychometrics; Scaling.]


Berdie, Ralph F.; and Layton, Wilbur L. 1957 Minnesota Counseling Inventory. New York: Psychological Corp.

Block, Jack 1965 The Challenge of Response Sets: Unconfounding Meaning, Acquiescence, and Social Desirability in the MMPI. New York: Appleton.

Campbell, Donald T.; and Fiske, Donald W. 1959 Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix. Psychological Bulletin 56:81–105.

Dahlstrom, W. Grant 1962 Commentary: The Roles of Social Desirability and Acquiescence in Responses to the MMPI. Pages 157–168 in Samuel Messick and John Ross (editors), Measurement in Personality and Cognition. New York: Wiley.

Dahlstrom, W. Grant; and Welsh, George S. 1960 An MMPI Handbook: A Guide to Use in Clinical Practice and Research. Minneapolis: Univ. of Minnesota Press.

Drake, Lewis E.; and Oetting, Eugene R. 1959 An MMPI Codebook for Counselors. Minneapolis: Univ. of Minnesota Press.

Edwards, Allen L. 1957 The Social Desirability Variable in Personality Assessment and Research. New York: Dryden.

Gilberstadt, Harold; and Duker, Jan 1965 A Handbook for Clinical and Actuarial MMPI Interpretation. Philadelphia: Saunders.

Good, Patricia E.; and Brantner, John P. 1961 The Physician’s Guide to the MMPI. Minneapolis: Univ. of Minnesota Press.

Gough, Harrison G. (1948) 1956 A Scale for a Personality Dimension of Socioeconomic Status. Pages 187–194 in George S. Welsh and W. Grant Dahlstrom (editors), Basic Readings on the MMPI in Psychology and Medicine. Minneapolis: Univ. of Minnesota Press. → First published in Volume 13 of the American Sociological Review as “A New Dimension of Status: 1. Development of a Personality Scale.”

Gough, Harrison G. 1957 The California Psychological Inventory: Manual. Palo Alto, Calif.: Consulting Psychologists Press.

Hathaway, Starke R.; and Briggs, Peter F. 1957 Some Normative Data on New MMPI Scales. Journal of Clinical Psychology 13:364–368.

Hathaway, Starke R.; and McKinley, J. Charnley (1943) 1951 The Minnesota Multiphasic Personality Inventory: Manual. Rev. ed. New York: Psychological Corp.

Hathaway, Starke R.; and Meehl, Paul E. 1951 An Atlas for the Clinical Use of the MMPI. Minneapolis: Univ. of Minnesota Press.

Hathaway, Starke R.; and Monachesi, Elio D. (editors) 1953 Analyzing and Predicting Juvenile Delinquency With the MMPI. Minneapolis: Univ. of Minnesota Press.

Hathaway, Starke R.; and Monachesi, Elio D. 1961 An Atlas of Juvenile MMPI Profiles. Minneapolis: Univ. of Minnesota Press.

Hathaway, Starke R.; and Monachesi, Elio D. 1963 Adolescent Personality and Behavior: MMPI Patterns of Normal, Delinquent, Dropout, and Other Outcomes. Minneapolis: Univ. of Minnesota Press.

Horiuchi, Haruyo 1963 An Evaluation of Clinical Depression by Means of a Japanese Translation of the MMPI. Unpublished manuscript, Univ. of North Carolina.

Jackson, Douglas N.; and Messick, Samuel 1958 Content and Style in Personality Assessment. Psychological Bulletin 55:243–252.

Kleinmuntz, Benjamin 1963 MMPI Decision Rules for the Identification of College Maladjustment: A Digital Computer Approach. Psychological Monographs 77, no. 14.

Little, Kenneth B.; and Shneidman, Edwin S. 1959 Congruencies Among Interpretations of Psychological Test and Anamnestic Data. Psychological Monographs 73, no. 6.

Marks, Philip A.; and Seeman, William 1963 Actuarial Description of Abnormal Personality: An Atlas for Use With the MMPI. Baltimore: Williams & Wilkins.

Meehl, Paul E. 1959 Some Ruminations on the Validation of Clinical Procedures. Canadian Journal of Psychology 13:102–128.

Meehl, Paul E. 1963 Foreword. Pages vii–x in Philip A. Marks and William Seeman, Actuarial Description of Abnormal Personality: An Atlas for Use With the MMPI. Baltimore: Williams & Wilkins.

Messick, Samuel; and Ross, John (editors) 1962 Measurement in Personality and Cognition. New York: Wiley.

Nelson, Sherman E. 1952 The Development of an Indirect, Objective Measure of Social Status, and Its Relationship to Certain Psychiatric Syndromes. Ph.D. dissertation, Univ. of Minnesota.

Rosen, Albert 1962 Development of MMPI Scales Based on a Reference Group of Psychiatric Patients. Psychological Monographs 76, no. 8.

Welsh, George S.; and Dahlstrom, W. Grant (editors) 1956 Basic Readings on the MMPI in Psychology and Medicine. Minneapolis: Univ. of Minnesota Press.


A situational test is “a measure of a person’s reaction to a situation that requires an actual adaptive response, rather than a mere ‘test’ response. The situation may be contrived by the examiner but must be recognized as posing a real problem to be solved, independent of its status as a test” (English & English 1958, p. 504). In this definition the key phrases differentiating situational tests from other personality tests are “posing a real problem” and “requiring an actual adaptive response.” Thus the ideal situational test would be one that simulates some significant aspect of an individual’s environment (e.g., his superior reprimanding him for poor performance) and evokes typical coping behavior without the subject’s being aware that he is undergoing testing.

It is interesting that laymen seem to evaluate personality by observing and making inferences about everyday behavior, much as psychologists do with situational tests. Thus, if a friend seems always to organize the games and activities of a social event, we may conclude that he is authoritative and perhaps exhibitionistic; or if we notice that a fellow office worker maintains a careful arrangement of objects on his desk, we may conclude that he is orderly and very controlled.

Clinical psychologists, on the other hand, when confronted with the task of describing and predicting the behavior of a person, traditionally base their evaluations upon samples of verbal behavior elicited by questionnaires, pictures, or inkblots. However, psychologists are becoming more aware that a fairly large difference seems to exist between the content and meaning of the verbal behavior observed through use of traditional tests and the content and meaning of the behavior about which they must make judgments and predictions. Because of this difference, inferences about real-life behavior, drawn only from samples of verbal-test behavior, are now being recognized as hazardous and difficult to make. Moreover, the clinician is reminded that fantasy behavior and self-description may not necessarily have a direct relation to overt behavior, and it is suggested that the clinician use tests that will duplicate the general factors underlying the day-to-day demands of the criterion situation (Cronbach 1956). In this connection it is both interesting and significant that expert clinicians are making more use than previously of so-called extratest behavior, for example, by noting that a patient forcefully throws together blocks he has been asked to assemble (suggesting aggressiveness and impulsiveness) or that a patient exclaims upon being shown an inkblot, “Boy, you’re really trying to put something over on me with this one!” (suggesting suspiciousness).

Since a major goal of personality assessment is to explain and predict overt behavior and since a person’s verbal responses to questionnaires or inkblots may spring from levels of personality quite different from those which are activated when he is coping with a real-life situation, psychologists find themselves in a dilemma. One solution to the problem may be for psychologists systematically to observe actual coping behavior, as well as fantasy and self-report behavior. The method of situational testing is ideally suited for this purpose. Situational testing is intended to provide settings and tasks closely approximating those encountered in the normal course of daily living, thereby yielding samples of a person’s characteristic ways of coping with real-life demands and of the personality traits and structures implicated in these coping efforts.

Definition of coping behavior. Coping behavior is considered goal-directed, adaptive, and instrumental. Implicit in the concept of coping is the assumption that when an individual is confronted with an object, person, or demand of significance to him, he experiences some disruption in his psychological equilibrium along with the need to readapt himself, that is, to re-establish his equilibrium. In order to re-establish his steady state, the individual calls upon a variety of combined personality and intellectual resources by means of which he physically acts upon, manipulates, or avoids the object confronting him in ways that bring about some adaptive resolution. Although coping behavior can be influenced by many factors, it is always assumed to be determined, in part at least, by an individual’s needs, drives, goals, personality structure, and history; the nature of these determinants influences the kinds of changes he effects in person-object relationships. Therefore, the goal of coping behavior is generally assumed to be need gratification, threat reduction, and mastery in achieving a readaptation. When coping, the individual responds to actual physical stimuli of some potential significance to him, rather than to conceptually defined or artificial stimuli.

As outlined here, then, so-called expressive aspects of behavior (e.g., handwriting), psychomotor behavior, and physiological and reflex responses are not considered coping behavior. Verbal behavior elicited by inkblots or questionnaires is also excluded. Verbal responses, however, may qualify in certain instances as coping behavior. For example, if an individual actually fails a task and then blames his failure on the make-up of the test, the verbal blaming response is considered to be part of the need-determined coping behavior.

History of situational testing


One of the earliest significant statements concerning performance testing of personality was made by Francis Galton (1884). After concluding that the character that shapes our conduct is a durable entity and therefore measurable, Galton proposed that it be assessed by observing definite acts in response to particular situations. However, he noted that such observation need not wait for chance happenings in real life that would elicit significant behavior; he pointedly stated, “Emergencies need not be waited for, they can be extemporized; traps, as it were, can be laid” (p. 182). Substituting the term “test” for “trap,” we find that Galton was advocating situational testing. He discussed briefly one of his efforts at setting up “traps.” After observing that friends who “have an ‘inclination’ to one another [seem to]… incline or slope together when sitting side by side” (p. 184), Galton devised a pressure gauge to be placed on the legs of chairs in an effort to test this hypothesis.

Galton’s proposal that personality be measured through carefully recorded acts representative of usual conduct received some attention in the years that followed. The work from 1900 to 1930 is reviewed by Symonds (1931). Perhaps the most significant in this period is the work of P. E. Voelker (see pp. 302–304, 306, 308 in Symonds 1931), whose situational tests of “honesty” were later applied with little modification by many workers, including Hugh Hartshorne and Mark A. May (1928) in their now classic Studies in Deceit. For example, one test measuring cheating consisted of a sheet of cardboard on which were printed ten circles of varying diameters. The subject was instructed to close his eyes and try to write in each circle the appropriate number (one to ten). Successful subjects obviously had cheated by opening their eyes while performing, since the task is impossible to accomplish otherwise. [See Moral development.]

Expressive behavior

The early efforts discussed above gave way in the 1930s to studies of expressive features of personality. Although research with expressive movements represented a change in focus, it nevertheless made a contribution to the history and development of situational testing by bringing systematic attention to the importance of nonverbal behavior in personality assessment. The suggestion by Allport and Vernon (1933) that a person’s forms of expression (e.g., body postures assumed, type of gait, manner of speaking) are determined by, and therefore reveal, personality was followed by many studies. Werner W. Wolff (1942), who made the largest contribution to this area, used a method in one study closely approaching that of situational testing. He proposed that by observing how a child punched a balloon, one could determine whether he was aggressive or insecure and by observing a child manipulating a jar of cold cream, one could determine whether he reacted cautiously, carelessly, or fearfully to his environment. [See Expressive behavior.]

Objective tests

In the early 1940s there appeared a group of workers who contributed more directly to the technique and concept of situational testing and who distinguished themselves by specifically challenging the adequacy of the inventories and projective devices rapidly growing in popularity as instruments of personality assessment. These investigators argued that the behavior measured in personality assessment should include not only introspective, self-assessing, verbal responses but also responses to objective tests not dependent upon the judgment of scorers, in which the individual would be unaware of the diagnostic meaning of his performance. R. B. Cattell (1957) and his colleagues have clearly made the largest contribution in this area. Their aim has been to establish a complete taxonomy of personality and to construct a battery of methods that will measure personality in terms of this taxonomy. Their strategy has been to collect observations from three types of sources, observations based on (1) ratings of life in situ and life records; (2) questionnaires (self-report); and (3) objective tests. Then by means of factor analysis, they have attempted to identify the major dimensions along which individuals differ in each of these realms of behavior and to compare and relate dimensions found in one realm with those found in another. Only the objective tests developed by Cattell have relevance for the topic of situational testing. Some four hundred of these objective tests have been used with various clinical groups and age levels. Three methods are employed: pencil and paper tests; physiological and psychomotor tests; and miniature situational tests. As an example of the first, the subject notes whether he agrees or disagrees with a statement presented to him along with an indication of the percentage of persons who agree with it; the score reflects one’s tendency to agree with the majority. 
An example of a situational test is the “impairment of performance by fright test.” Here the individual performs the first of three equally difficult finger mazes under normal conditions, the second while he is shocked periodically, and the third while a cage containing a rat and several cockroaches stands adjacent to the maze. Reviews of factor-analytic studies using these objective measures are available (e.g., Cattell & Scheier 1958). In Great Britain, Eysenck and his colleagues (1947) have also made substantial contributions to the development of objective tests of personality, their research strategy and technique broadly resembling those employed by Cattell in the United States. In addition, estimation of the sizes of squares (Krathwohl & Cronbach 1956) has been suggested as a response that reveals personality. This type of research can be credited with having brought to attention the limitations of the exclusive use of verbal responses, the need for objectivity in scoring, and the usefulness of factor analysis for personality assessment. However, from another standpoint many of these studies are themselves limited, because the experiences or response processes activated in the subject by many of the objective tests used may be unrelated to responses evoked by some real-life situation of potential interest to the student of personality.

Simulation of life situations

Also during the 1940s, a fourth phase developed in the history of situational testing. The subject was presented with a situation constructed by the investigator to simulate as much as possible some real-life criterion; judges rated the degree to which the subject manifested personality traits presumed to be critical for handling the situation.

Max Simoneit is credited (see Kelly 1954) with the first attempt to use situational tests, which he developed to assess the suitability of individuals for various military assignments in the German army. Believing that military success depended on an individual’s personality, Simoneit began by listing the traits that distinguished the personalities of German military heroes. He then devised situations that were likely to evoke these traits and administered them to officer candidates. This program influenced the one set up by the British War Office Selection Board, which, in turn, influenced the program set up by the U.S. Office of Strategic Services (1948) in the 1940s. The mission of the OSS was to select and maintain a network of agents who would conduct destructive operations behind enemy lines during World War II. The battery of tests used to screen candidates included situational tests for assessing such variables as emotional stability, leadership, and skill in observing and reporting. For example, candidates were asked to direct two recalcitrant workmen (played by actors) in the construction of a giant cube and to supervise the movement of equipment across a brook.

After the war, the OSS found that its test ratings did not correspond with overseas staff appraisals of candidates but noted that some of the contributions of the method could not be assessed statistically because war conditions prevented the gathering of adequate criterion data.

The OSS studies brought attention to, and stimulated interest in, the method of situational testing during the first years after World War II. Weislogel (1954) discussed the continued use of situational testing by the military. Kelly and Fiske (1951) used situational tests to investigate the selection of students for professional training in clinical psychology. For example, groups of students were asked to place large cement blocks in specified arrangements, or a student was asked to interview a teacher (played by another student) whose sexual conduct had been called into question. After examining their data statistically, Kelly and Fiske reported situational tests contributed little, if anything, to the predictions made.

The postwar years also saw the emergence of variants of situational tests. The leaderless-group discussion technique, also originated by a German psychologist, J. B. Riefert, in the 1920s, was extensively investigated in the United States (e.g., Bass 1954) and in Britain (Higham 1952), primarily as a means of selecting personnel for industry. Leadership potential was measured by asking a group of subjects to carry on a discussion about a topic usually determined by the criterion situation, and the amount of leadership each subject displayed was rated. Role playing, another variant, was also explored and developed (Mann 1956).

After the postwar years, situational testing, in contrast to projective tests and questionnaires, failed to gain momentum and popularity as a method of personality assessment. This seemed to be due in part to the failure of the OSS and clinical-psychology programs to report statistical support for the predictive power of situational tests and in part to problems inherent in the method. These problems are discussed below after a brief review of current applications of situational testing.

Current applications

Military applications

Perhaps the greatest attention currently being paid to situational testing is by the U.S. military. Tupes and his colleagues (1958), for example, employed situational tests designed to simulate the present-day job role of an air force officer. In one test the candidate is asked to play the role of a squadron commander confronted by three pilots (played by actors), all of whom wish a leave at a time when only two can be spared. In another, the candidate interviews an actor playing the role of an irate mayor in whose town airmen on leave have recently been aggressive and abusive; and in another, small groups of candidates, placed in a prison compound, are asked to escape. Measures obtained from these various situations related significantly to criterion measures obtained six months after candidates were graduated. Kipnis (1962) predicted job performance of U.S. naval personnel with a “handskills test” intended to measure an individual’s motivation to persist beyond minimum standards on a tiring task. The examinee, who views the test as a measure of hand and finger dexterity, pencils tally marks, a task that rapidly promotes hand and arm fatigue. The examiner tells the subject what a passing score is (established empirically) and notes whether the subject stops or slows down after reaching the announced passing score or continues to strive.
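The observation the examiner makes in the hand-skills test can be sketched in code. The following is only an illustration, not the scoring procedure Kipnis reported: the function name, the use of equal-length timing intervals, and the `slow_ratio` cutoff are all assumptions introduced here. It compares the rate of tally marks before and after the announced passing score is reached.

```python
from statistics import mean


def persistence_after_pass(tallies_per_interval, passing_score, slow_ratio=0.8):
    """Classify what a subject does after reaching the announced passing score.

    tallies_per_interval: tally marks made in each successive, equal-length
        interval (an assumed recording scheme).
    passing_score: the announced passing score, as a cumulative tally count.
    slow_ratio: hypothetical cutoff; a post-pass rate below this fraction of
        the pre-pass rate is counted as slowing down.
    """
    total = 0
    for i, n in enumerate(tallies_per_interval):
        total += n
        if total >= passing_score:
            before = tallies_per_interval[: i + 1]
            after = tallies_per_interval[i + 1:]
            break
    else:
        # The loop finished without the cumulative count reaching the score.
        return "did not reach passing score"

    # No further intervals, or no further marks, is treated as stopping.
    if not after or sum(after) == 0:
        return "stopped"
    if mean(after) < slow_ratio * mean(before):
        return "slowed"
    return "continued to strive"


print(persistence_after_pass([10, 10, 10, 5, 4], passing_score=30))
```

A subject whose rate holds up after the passing score would be classed as continuing to strive, which is the behavior the test is intended to detect; the particular cutoff would have to be established empirically.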

General applications

In civilian psychology, situational testing is receiving isolated but concentrated attention from several workers. Lois B. Murphy and her colleagues (1962) have conducted a series of studies designed to identify the “coping styles” children use to handle demands, new opportunities, and the problems they experience throughout development. Murphy uses various naturalistic situations to elicit adaptive responses; for example, children are observed during pediatric examinations, at the zoo, at parties, and while being transported to and from the clinic.

Miniature situations test. Santostefano (1960) developed a Miniature Situations Test (MST) in an effort to make available a method that would overcome some of the disadvantages of the naturalistic assessment situation, while nevertheless providing samples of nearly real-life behavior for systematic study. Under the MST method, a child is presented with two objects and invited to act upon one of them as specified (e.g., “You may either water this plant or break this light bulb”; “You may either see how strong you are by squeezing this [a dynamometer] or raise this flag in honor of Abraham Lincoln”). The child is urged to make his choice quickly and is encouraged to make use of feelings he experiences when confronted with the objects. This method assumes that the child will selectively perceive and respond to the objects presented, endowing them with positive, negative, or neutral incentive values according to the dominant motives and personality attributes unique to him and that he will reveal some aspect of his personality through the objects he values or rejects and through the changes he physically imposes upon them. It is also assumed that the choices presented will activate genuine affective and motive states within the child and that although the nature of the instrumental acts required by various situations may vary, the acts are nonetheless related to or derived from the same motive state. Thus, for example, if a child chooses, in several situations, to water a plant, to repair torn paper with Scotch tape, and to help the examiner put on a pair of gloves, it is assumed that his personality functioning is very much characterized by the disposition to bestow affection and to nurture.
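The inference from repeated choices to a disposition can be illustrated with a small tally. This is a sketch under stated assumptions, not the published MST scoring key: the `CODING_KEY` mapping, the category labels other than nurturance, and the function names are hypothetical, and only acts mentioned in the text are keyed.

```python
from collections import Counter

# Hypothetical coding key: each chosen act is assigned to one disposition.
# Only "nurturance" follows the example in the text; "destructiveness" is an
# assumed label for illustration.
CODING_KEY = {
    "water plant": "nurturance",
    "repair torn paper": "nurturance",
    "help examiner with gloves": "nurturance",
    "break light bulb": "destructiveness",
}


def disposition_profile(chosen_acts):
    """Count how often each keyed disposition appears among the chosen acts."""
    return Counter(CODING_KEY.get(act, "unkeyed") for act in chosen_acts)


def dominant_disposition(chosen_acts):
    """Return the most frequent keyed disposition, or None if nothing is keyed."""
    profile = disposition_profile(chosen_acts)
    profile.pop("unkeyed", None)
    if not profile:
        return None
    return profile.most_common(1)[0][0]


choices = ["water plant", "repair torn paper", "help examiner with gloves"]
print(disposition_profile(choices))
print(dominant_disposition(choices))
```

Counting acts across situations reflects the assumption stated above, that different instrumental acts may derive from the same motive state. A real key would need to be validated, for example by the factor-analytic and group-comparison studies mentioned below.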

The MST, in comparison to the freer naturalistic method, seems to have several advantages for personality research. While allowing the free expression of personality dimensions in situations requiring the subject to cope with people and objects, the MST (1) provides identical and restricted stimuli to all subjects, giving some assurance that each subject will become equally involved with the stimulus configuration designated as crucial; (2) produces unequivocally identifiable responses; (3) yields measures whose meanings are not completely dependent on the judgments of raters; (4) elicits a wide sampling of personality dispositions represented in the choices performed; (5) makes available, in addition to specific acts, a wide range of behaviors, encapsuled, however, in brief time samples and available for scrutiny and systematic analysis; and (6) is feasible for use in ordinary research settings.

Factor-analytic studies have contributed to the construct validity of the miniature situations, as have studies comparing the performance of various clinical groups (e.g., orphans, delinquents, brain-damaged children) with controls. The job effectiveness and conscientiousness of rehabilitated mentally retarded young adults have also been predicted with promising success.

Leaderless-group discussion. The leaderless-group discussion technique is one variant of situational testing that has continued, since the 1940s, to receive increasingly widespread use in the United States (in the military and industrial spheres), and in England, Australia, and Germany. In one study (Kiessling & Kalish 1961), for example, groups of applicants to the Honolulu Police Academy were asked to assume that they were alone on foot, patrolling an area noted for “cop haters,” when they notice two delinquents stealing a tire from a car and suddenly find themselves surrounded by many other delinquents. The group is asked to discuss what one should do and why, while raters evaluate each candidate.

Clinical applications

Recently other variations of the situational-testing technique have been proposed to study questions of interest to clinicians. For example, Lerner (1963) noted that recent social-psychological studies, whose results support the traditional notion that schizophrenics are unresponsive and autistic, have used artificial stimuli, such as asking patients to respond to stick-figure drawings to determine their capacity to perceive cues in social roles. Lerner, instead, asked groups of schizophrenics to clean a ward and rated their behavior in terms of, for example, verbal interaction and dependency on the leader. He reported that schizophrenics exhibit appropriate social motivation and responsiveness when behaving in a meaningful situation. Similarly, Schulman and his colleagues (1962) have proposed situational testing for investigating parent-child interaction.

Non-English-speaking countries

The author finds little evidence that the technique of situational testing is in use in non-English-speaking countries.

Japan. A recent review (Kodama 1957) of personality tests used in Japan indicates a clear preference for projective and questionnaire devices borrowed from those most popular in the United States. One performance test of personality developed by Japanese psychologists, and apparently widely used, makes inferences about “excitation, adaptation, and temperament” from an individual’s continuous performance with addition problems; but it fails to qualify as a situational test because there seems to be no intent to approximate, in the test situation, behavior associated with some real-life criterion.

The Soviet Union. Recent surveys of Soviet psychology indicate that techniques for assessing individual differences and personality are not of central interest to Russian psychologists, who are concerned typically with studying intellectual abilities or aptitudes. However, Brozek (1962) notes that one of the distinctive features of Soviet psychology is its emphasis on studying psychological processes under natural conditions. He points out that Russian investigators tend to see abilities (e.g., scientific, technical, literary) as integral components of personality and that one preferred method of studying these abilities is to observe the performance of individuals in “natural experiments,” a method highly regarded by Soviet psychologists as an approach most true to life. According to this point of view, observation of the way in which a pupil solves tasks that are presented to him enables the investigator to assess best the psychological characteristics implicated in various abilities. The resemblance between this methodological philosophy and that of situational testing is noteworthy.

Germany. In spite of the fact that situational testing originated in Germany, the writer has found little evidence that the technique, except for the variant of group discussion (Rundgespraech, or round-table talk), is currently widely used in that country (Meili 1937). In a critique of psychological tests Simoneit (1954) argues that increased objectivity and quantification in personality assessment has resulted in less psychological meaningfulness and that the commonly used projective tests are inadequate in evaluating a person’s true feelings; he proposes situational testing and the study of expressive behavior as a remedy to this problem.

Implications for theory and practice

Considerations in devising and using tests

Three main procedural steps should be followed in devising and using situational tests. The first step is to select and describe the personality characteristics that are critically implicated in the criterion under investigation. The second is to construct tasks that will elicit these particular characteristics; and the third is to choose a method of observation and scoring. These steps are largely interrelated. The more detailed the analysis and understanding of the criterion situation and of the personality characteristics to be isolated and observed, the better one can construct situations to elicit these traits in ways which are identifiable and scorable. The past failure of situational tests to relate to criteria can be attributed, to a large extent, to the fact that workers have not specified in sufficient detail those aspects of the criterion selected as critical (e.g., Stern, Stein, & Bloom 1956). Thus, if situational tests are being used to assess effective parental performance, it should be made clear whether “effective” means setting limits, giving emotional comfort, or expressing an interest in the child’s activities. The situations devised by Tupes and his colleagues (1958) are good examples of tests derived from a detailed understanding of the criterion.

Several additional general considerations concerning the make-up and use of situational tests are worth noting. Situational tests are intended to be lifelike and therefore should evoke adaptive responses as isomorphic as possible with the behavior and feelings required by the criterion situation. The tasks should be structured in such a way that each subject experiences the same need to respond and to become involved and each is unaware of the specific psychological meaning of his behavior and the intent of the situation. Ideally, scoring should be confined to whether the personality characteristic under question appears or not; if degrees of appearance of a trait are desired, the gradations should be clearly defined in terms of identifiable forms of behavior. Situational testing can be a costly technique because of the elaborate staging, material, and the use of actors and observers sometimes required. Lastly, it is important to recognize that compromises are available between the degree of experimental control and of lifelikeness built into the situational test. The more naturalistic the setting and observations, the closer the criterion is approximated; but at the same time, the more expensive the test, the greater the problem of insuring that subjects will become involved with the same stimuli and the more difficult it is to identify critical responses. The less naturalistic the test, the greater the degree of experimental control over stimuli and responses.

The concept of levels

In the investigation and clinical assessment of personality, one major goal is to learn about the psychological organization of an individual. For some time this organization has been viewed by psychologists as containing levels (e.g., conscious–unconscious, latent–manifest, depth–surface). Recently much attention has been given to the relation between levels of personality functioning tapped by traditionally used projective and inventory devices and levels activated by situations in everyday living. This interest seems to have grown out of the fact that psychologists continue to experience failure and frustration in their efforts to assess and predict personality functioning by using projective devices and inventories. Lindzey and Tejessy (1956), in discussing this issue, note that perhaps the most perplexing of the many problems facing the user of projective techniques is the difficulty in assigning an inference made about the patient to some specific level of his behavior; that is, will the attribute inferred from the test response be expressed freely in a public setting, or will it be revealed in rarely encountered circumstances, or only in fantasy? Similarly, workers have wondered whether motive states are necessarily reflected in fantasy and whether fantasies are always mirrored in overt behavior; their research findings are equivocal. Questions and concerns such as these have led some workers to develop a theoretical system dealing with levels of personality (e.g., Leary 1957), defining and distinguishing between public, preconscious, and unconscious levels of functioning and their interrelationships, while others have been led to propose that the psychologist use as data only behavior that is as much like the criterion as possible (e.g., Cronbach 1956). 
The task facing personologists is to discover which test indices obtained at the fantasy or self-report levels are related to coping behavior and then to learn the systems and principles governing the interrelationships among these various links. Relating standardized social-situational tests to projective and questionnaire devices could contribute to a solution of the problems raised in the consideration of levels of personality.

The criterion problem

Since the 1930s much effort has been devoted to improving the predictions that can be made from various test devices. Recently, however, workers have expressed dissatisfaction over this one-sided enterprise and the primarily negative results it has produced. There is a growing awareness that solutions to the problem of predicting behavior depend as much upon a definition of the criterion situation (the relevant types of behavior and personality traits it evokes) as upon the development of test instruments. The main goal of assessment is prediction relevant to situations of daily living. But naturalistic situations seldom contain easily defined, well-controlled stimulus configurations that are critically implicated in the evocation of behavior considered adaptive in terms of the demands of the situation. Discussion of the criterion problem (Stern, Stein, & Bloom 1956), when combined with discussion of the philosophy and technique of situational testing, provides psychologists with an approach to constructing adequate criterion stimuli against which personality instruments can be tested, an approach, interestingly enough, which closely parallels that used by Simoneit when first devising situational tests. If, for example, an investigator is interested in developing the power of a projective test to predict a child’s ability to handle the stresses of the first days of school, he could develop his criterion by observing children in the first days of school and by interviewing teachers, in an effort to isolate those aspects of a child’s personality that appear critical in handling the demands of this situation. Situational tests could then be constructed to evoke systematically and reliably those particular personality dimensions, providing adequate criteria with which the investigator could explore the predictive powers of his instrument.

Current research concerned with the effects of television on the aggressive behavior of individuals nicely illustrates how failure to define and develop a criterion may lead to inconsistent findings. After showing subjects an aggressive film, one worker (Feshbach 1961) observed the associations they made to aggressive and neutral words, while others (Mussen & Rutherford 1961) observed answers to questions about the desire to aggress. Perhaps it is not surprising, then, that some studies report a decrease in “aggressive behavior” after participation in fantasied aggression and other studies report an increase. It would seem that situational tests, which would provide the subject with the opportunity to aggress overtly and forcefully in terms of well-defined objects and circumstances, would provide a better test of the question whether fantasied aggression leads to an increase in behavioral aggression.

The technique and philosophy of situational testing have been traced historically and related to issues that concern theory, research, and clinical practice in personality assessment. What is the future for situational testing? Some workers have voiced the opinion that its future is dim. However, this writer feels that it is more than a bit paradoxical that psychologists are quick to reject the worthwhileness of test behavior closely resembling aspects of daily living, while at the same time they cling comfortably to their projective devices and inventories, which primarily evoke words as far removed from real-life conduct as the inkblots and pictures themselves. It would seem that the dominating influence of behaviorism and logical positivism has cultivated, at least among American psychologists, a preference for the sterile laboratory as a research setting and a fetish for test instruments, however far removed from the conditions of real life, an influence which seems to be spreading to personality-assessment practices in other countries. However, it is encouraging to note that dissenting voices are growing louder, as are pleas that behaviorism not frighten social scientists away from the study of naturalistic behavior. The survey given here suggests that much of the groundwork has been laid for the development of promising methods that could aid social scientists in predicting the day-to-day behavior of individuals. Use of the situational-test technique might even make the assessing of personality as exciting as Galton’s first attempt.

Sebastiano Santostefano

[Other relevant material may be found in Experimental design; Problem solving; Projective methods; Psychometrics.]


Allport, Gordon W.; and Vernon, Philip E. 1933 Studies in Expressive Movement. New York: Macmillan.

Bass, Bernard M. 1954 The Leaderless Group Discussion. Psychological Bulletin 51:465–492.

Brozek, Joseph 1962 Current Status of Psychology in the U.S.S.R. Annual Review of Psychology 13:515–566.

Cattell, Raymond B. 1957 Personality and Motivation Structure and Measurement. New York: World.

Cattell, Raymond B.; and Scheier, Ivan H. 1958 The Objective Test Measurement of Neuroticism, U.I. 23: A Review of Eight Factor Analytic Studies. Indian Journal of Psychology 33:217–236.

Cronbach, Lee J. 1956 Assessment of Individual Differences. Annual Review of Psychology 7:173–196.

English, Horace B.; and English, Ava C. (1958) 1962 A Comprehensive Dictionary of Psychological and Psychoanalytical Terms: A Guide to Usage. New York: McKay.

Eysenck, Hans J. 1947 Dimensions of Personality. London: Routledge.

Feshbach, Seymour 1961 The Stimulating Versus Cathartic Effects of a Vicarious Aggressive Activity. Journal of Abnormal and Social Psychology 63:381–385.

Galton, Francis 1884 Measurement of Character. Fortnightly Review New Series 36:179–185.

Hartshorne, Hugh; and May, Mark A. 1928 Studies in Deceit. 2 vols. New York: Macmillan. → Volume 1: General Methods and Results. Volume 2: Statistical Methods and Results.

Higham, Martin H. 1952 Some Recent Work With Group Selection Techniques. Occupational Psychology 26:169–175.

Kelly, Everett L. 1954 The Place of Situation Tests in Evaluating Clinical Psychologists. Personnel Psychology 7:484–492.

Kelly, Everett L.; and Fiske, Donald W. 1951 The Prediction of Performance in Clinical Psychology. Ann Arbor: Univ. of Michigan Press.

Kiessling, Ralph J.; and Kalish, Richard A. 1961 Correlates of Success in Leaderless Group Discussion. Journal of Social Psychology 54:359–365.

Kipnis, David 1962 A Noncognitive Correlate of Performance Among Lower Aptitude Men. Journal of Applied Psychology 46:76–80.

Kodama, Habuku 1957 Personality Tests in Japan. Psychologia 1:92–103.

Krathwohl, David R.; and Cronbach, Lee J. 1956 Suggestions Regarding a Possible Measure of Personality: The Squares Test. Educational and Psychological Measurement 16:305–316.

Leary, Timothy 1957 Interpersonal Diagnosis of Personality: A Functional Theory and Methodology for Personality Evaluation. New York: Ronald Press.

Lerner, Melvin J. 1963 Responsiveness of Chronic Schizophrenics to the Social Behavior of Others in a Meaningful Test Situation. Journal of Abnormal and Social Psychology 67:295–299.

Lindzey, Gardner; and Tejessy, Charlotte 1956 Thematic Apperception Test: Indices of Aggression in Relation to Measures of Overt and Covert Behavior. American Journal of Orthopsychiatry 26:567–576.

Mann, John H. 1956 Experimental Evaluations of Role Playing. Psychological Bulletin 53:227–234.

Meili, Richard (1937) 1961 Lehrbuch der psychologischen Diagnostik. 4th ed., rev. & enl. Bern (Switzerland): Huber.

Murphy, Lois B. 1962 Widening World of Childhood: Paths Toward Mastery. New York: Basic Books.

Mussen, Paul; and Rutherford, Eldred 1961 Effects of Aggressive Cartoons on Children’s Aggressive Play. Journal of Abnormal and Social Psychology 62:461–464.

Santostefano, Sebastiano 1960 An Exploration of Performance Measures of Personality. Journal of Clinical Psychology 16:373–377.

Schulman, Robert E.; Shoemaker, D. J.; and Moelis, I. 1962 Laboratory Measurement of Parental Behavior. Journal of Consulting Psychology 26:109–114.

Simoneit, Max 1954 Zur Kritik der Test-Psychologie. Psychologische Rundschau 5:44–53.

Stern, George G.; Stein, M. I.; and Bloom, B. S. 1956 Methods in Personality Assessment: Human Behavior in Complex Social Situations. Glencoe, Ill.: Free Press.

Symonds, Percival M. 1931 Diagnosing Personality and Conduct. New York: Appleton.

Tupes, Ernest C.; Carp, A.; and Borg, W. R. 1958 Performance in Role-playing Situations as Related to Leadership and Personality Measures. Sociometry 21:165–179.

U.S. Office of Strategic Services 1948 Assessment of Men: Selection of Personnel for the Office of Strategic Services. New York: Rinehart.

Weislogel, Robert L. 1954 Development of Situational Tests for Military Personnel. Personnel Psychology 7:492–497.

Wolff, Werner W. 1942 Projective Methods for Personality Analysis of Expressive Behavior in Preschool Children. Character and Personality 10:309–330.