It is usually presumed that the score a person obtains on a psychological test is determined by the content of the items and that this score reflects such characteristics as the examinee’s knowledge on achievement tests, his abilities on aptitude tests, his interests on interest inventories, his traits on personality measures, and his opinions on attitude scales. Evidence in support of this presumption is important in establishing the meaning of the score. The responses a person makes to a test, however, are a function not only of item content but also of the form of item used and of other aspects of the test situation. The item form or the test directions, for example, may induce temporary preferences that systematically influence responses, such as the tendency to respond quickly rather than accurately. In addition, the examinee may characteristically bring to tests of certain formats various test-taking attitudes and habits, such as the tendency to agree, that produce a cumulative effect on his score. Properties of test form, then, may differentially influence an individual’s mode of item response and may also permit the operation of preferred or habitual styles of response. These stylistic response consistencies are called response sets. A response set is a habit or temporary disposition that causes a person to respond to test items differently than he would if the same content were presented in a different form (Cronbach 1946, p. 476; 1949).

A test presumed to measure one characteristic may thus also be measuring another characteristic (response set) that might not have been reflected in the score if some other form of the test had been administered. In this discussion the term “form” is used loosely to include all aspects of the test situation to which the examinee may react, including the form of the item, the tone of its statement, the number and nature of response alternatives provided, the test directions, instructions for guessing, and the presumed use to which the scores will be put.

Response set variance is of two major types: (1) transitory consistencies limited to a particular test or to a single testing session and (2) reliable, stable consistencies with some generality over tests and situations. Two forms of stable consistencies may also be distinguished in turn: (a) the type that reflects durable but relatively trivial individual differences (perhaps in language usage or expressive habits) and (b) the type that reflects significant aspects of personality (Cronbach 1949). To emphasize this latter type of stable response-set variance and its potential usefulness in the measurement of personality characteristics, Jackson and Messick (1958) suggested that it be renamed “response style.” A response style, then, is a consistency in the manner of response to some aspect of test form other than specific item content; it is relatively enduring over time and displays some generality in responses both in other tests and in nontest behavior (Jackson & Messick 1962, p. 134).

Kinds of response sets

The operation of response sets of various kinds has been detected on tests of several forms in widely different content areas, including ability and achievement tests, personality questionnaires and checklists, interest inventories, performance ratings, and attitude scales. The following are some of the major kinds of response sets investigated thus far.

Tendency to gamble or guess

Reliable individual differences have been noted in the tendency to guess when in doubt (see Cronbach 1946). These gambling consistencies vary over a wide range, from responding only when certain to attempting every item. Because he usually has partial knowledge, the frequent guesser tends to receive higher scores on ability tests than the more cautious respondent; and this difference in scores is not entirely eliminated by corrections for chance or random responding. Although one might attempt to assess partial knowledge and take it into account (Coombs, Milholland, & Womer 1956), a simpler approach to the reduction of gambling’s extraneous influence on scores would be to use (or claim to use) scoring formulas that penalize guessing. However, even though the threat of severe penalties may reduce the incidence of guessing on the average, bold students will still tend to guess more than cautious ones, and differences in gambling propensities will still influence individual scores. Because of this, Cronbach (1950) has recommended that examinees be directed to answer every item, except on those occasions when the response set is intended to measure some personality characteristic, such as cautiousness and risk taking (Swineford 1938; Messick & Hills 1960; Kogan & Wallach 1964).
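The point that a chance correction does not fully equalize bold and cautious examinees can be illustrated with the classical correction-for-guessing formula, score = R - W/(k - 1), where R is the number right, W the number wrong, and k the number of options per item. The sketch below is illustrative only: the function name and the three hypothetical examinees are assumptions, not data from the studies cited.

```python
def corrected_score(num_right: int, num_wrong: int, num_options: int) -> float:
    """Classical correction for guessing: R - W / (k - 1).

    Omitted items are neither rewarded nor penalized.
    """
    if num_options < 2:
        raise ValueError("each item needs at least two options")
    return num_right - num_wrong / (num_options - 1)


# Hypothetical 100-item, four-option test; each examinee knows 60 answers.
# Cautious examinee: answers the 60 known items and omits the other 40.
cautious = corrected_score(num_right=60, num_wrong=0, num_options=4)

# Bold examinee guessing blindly on the other 40: about 10 right, 30 wrong.
bold_blind = corrected_score(num_right=70, num_wrong=30, num_options=4)

# Bold examinee with partial knowledge: gets 20 of the other 40 right.
bold_partial = corrected_score(num_right=80, num_wrong=20, num_options=4)

print(cautious, bold_blind, round(bold_partial, 2))
```

Under these assumed numbers the correction equates the cautious examinee with the blind guesser, but the guesser who has partial knowledge still scores higher, which is the residual advantage described above.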

Speed vs. accuracy

When speed is an important aspect of test performance, as when the time limit is insufficient for the completion of the test, individuals tend to differ reliably in their preferences for responding rapidly as opposed to carefully and accurately (Cronbach 1950). This set to work rapidly may be related to the gambling tendency, whereas the opposing preference for carefulness appears to represent a more cautious test-taking strategy.

Evasiveness, indecision, and indifference

Substantial reliability has been repeatedly demonstrated for the tendency to use the noncommittal middle category on several response options, such as the neutral category on attitude scales, the “?” on the “yes-?-no” format, the “indifferent” choice on the “like-indifferent-dislike” option, or the “uncertain” response on “agree-uncertain-disagree” (Broen & Wirt 1958; Cronbach 1946; Lorge 1937). Possible bases for these preferences seem manifold. They may stem from a motivation to evade revealing a definite position. They may reflect indecision in the face of difficult choices or indifference in the face of uninteresting ones. As a cautious unwillingness to commit oneself, they might be inversely related to the gambling tendency discussed above. Or they may reflect consistent differences in the interpretation of category labels.

Interpretation of judgment categories

Reliable preferences for particular response options have been observed that appear to be partly due to stable differences in viewpoint about the meaning and scope of the judgment categories provided (Cronbach 1946; 1950). Thus, for example, two persons with the same pattern of interests would obtain quite different scores if one consistently interpreted the “like” category in a “like-indifferent-dislike” format to include anything that he did not dislike while the other limited its application to those things that he actively desired.


Another example of preferences for particular response options is the tendency to mark extreme categories as opposed to more moderate ones on rating scales and Likert-type formats (which permit degrees of agreement and disagreement). Individual differences in the use of extremes on rating scales are very reliable and, for attitude scales at least, appear to be relatively independent of item content (Kogan & Wallach 1964; Peabody 1962). In addition to its possible occurrence as a result of differences in the interpretation of judgment categories, the tendency to respond extremely may also stem from a “desire for certainty” or may reflect self-confidence in expressing opinions (Brim & Hoff 1957; Kogan & Wallach 1964).
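One simple way to quantify the tendency to mark extreme categories is the proportion of a respondent's ratings that fall in the end categories of the scale, counted without regard to item content. The sketch below assumes a five-point Likert-type format coded 1 to 5; the function name and the coding are illustrative assumptions rather than the scoring used in the studies cited.

```python
def extreme_response_index(ratings, low=1, high=5):
    """Proportion of ratings that fall in either end category."""
    if not ratings:
        raise ValueError("at least one rating is required")
    n_extreme = sum(1 for r in ratings if r in (low, high))
    return n_extreme / len(ratings)


# Two hypothetical respondents rating the same ten items.
moderate = [2, 3, 4, 3, 2, 4, 3, 3, 2, 4]
extreme = [1, 5, 5, 1, 3, 5, 1, 5, 4, 1]

print(extreme_response_index(moderate))  # no end categories used
print(extreme_response_index(extreme))   # eight of ten ratings are extreme
```

Because the index ignores the direction of each rating, it reflects how strongly a person responds and not which position he takes.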


The tendency to select the option “correct answer not given” in multiple-choice ability tests has been suggested as a response-set measure of self-confidence (Mullins 1963). The tendency to choose this alternative on a verbal test was found to be significantly correlated with the same tendency on a spatial test, even with ability held constant.


When no specific limit is placed on the number of responses required (as in instructions to “list the activities that interest you” or “mark those statements that reflect your attitudes”), some individuals consistently tend to give many responses while others tend to give few (Broen & Wirt 1958; Cronbach 1946; 1950). The tendency to be broadly inclusive in responding may be a reflection of individual differences in the perceived breadth of categories used in classification (Pettigrew 1958), or it may reflect uncriticalness in delimiting these categories and in establishing criteria for inclusion.


Consistent differences in the strictness of evaluating the equivalence of objects or their acceptability in terms of some standard have been interpreted as a response set of criticalness. Scores for this disposition (for example, the tendency to respond “different” in appraising the equivalence of two possibly alternative expressions or to respond “ambiguous” in judging the acceptability of a sentence as an unambiguous statement) were found to be reliable, significantly intercorrelated, and generally unrelated to content measures from the same tasks (Frederiksen & Messick 1959).


The tendency to answer “true” on true-false examinations and tendencies to respond “agree,” “like,” and “true” on personality, interest, and attitude questionnaires have been found to be reliable and stable over time (Cronbach 1942; 1946; 1950; Couch & Keniston 1960; Jackson & Messick 1958). This acquiescence tendency operates primarily when the respondent lacks knowledge or certainty about his answers. On achievement tests, then, acquiescence would be expected to reduce the spread between knowledgeable and poor students on true items and to increase it on false items, thereby lowering the reliability and validity of the “trues” score and raising that of the “falses” (Cronbach 1942). On personality and attitude scales, acquiescence would tend to occur when items are ambiguous, vague, or so neutral that their desirability or undesirability is unclear (Jackson & Messick 1962). Individuals acquiescing to personality questionnaires were shown in one study to possess personality characteristics of extroversion, expressiveness, and impulsivity (Couch & Keniston 1960), thus suggesting that such acquiescence has some basis in stylistic consistencies of temperament.
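A common way to separate acquiescence from content is to key half of a scale's items so that "agree" indicates the trait and half so that "disagree" does. The sketch below assumes such a balanced scale; the function name, the key coding (+1 and -1), and the two hypothetical respondents are illustrative assumptions, not a procedure taken from the studies cited.

```python
def score_balanced_scale(answers, keys):
    """Return (content_score, acquiescence_index) for one respondent.

    answers: list of bools, True meaning "agree" or "true".
    keys:    list of +1 / -1, where +1 means agreement indicates the trait
             and -1 means disagreement indicates the trait.
    """
    if not answers or len(answers) != len(keys):
        raise ValueError("answers and keys must be non-empty and equal in length")
    # Content score: number of responses given in the keyed direction.
    content = sum(1 for a, k in zip(answers, keys) if a == (k > 0))
    # Acquiescence index: proportion of "agree" responses, ignoring the key.
    acquiescence = sum(1 for a in answers if a) / len(answers)
    return content, acquiescence


keys = [+1, -1, +1, -1]
yea_sayer = [True, True, True, True]          # agrees with every item
consistent = [True, False, True, False]       # responds to item content

print(score_balanced_scale(yea_sayer, keys))
print(score_balanced_scale(consistent, keys))
```

On a balanced scale, indiscriminate agreement pushes the content score toward the midpoint instead of toward a high value, while the proportion of "agree" responses records the set itself.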

There are several indications that the tendency to agree with extreme and sweeping generalizations is a different, although correlated, response style that has been called the tendency to “overgeneralize” (Jackson & Messick 1958). Measures of this tendency have been found to have significant negative correlations with both criticalness and verbal ability (Frederiksen & Messick 1959), as well as with the abilities to overcome embeddedness and distraction (Forehand 1962). This suggests that the tendency to overgeneralize is more a function of cognitive and intellectual limitations than it is of temperamental dispositions.

Tendency to respond desirably

The tendency to respond not to specific item content but to a more general connotation of an item’s meaning, namely its desirability, has been found to be highly reliable (Edwards 1957). Although this tendency to respond in a socially desirable way is thus not completely independent of item content, it nevertheless represents a general response consistency that is different from the more limited consistencies generated by the truthful endorsement of specific item content. It qualifies as a response set under the definition given above (Cronbach 1946), since its operation causes individuals to give different responses when the same content is presented in different forms. Thus, responses to statements in a true-false form might be determined largely by the desirability of the characteristics described; but when the same items are presented in a forced-choice form, with alternatives matched for judged desirability, the responses are more likely to be determined by specific content (Edwards 1957).

On the basis of an obtained high relation between desirability responding and content scores (Edwards 1957; Jackson & Messick 1962), a major portion of response variance on several personality questionnaires has been attributed to the set to respond desirably. The meaning of this relation, however, remains controversial. Some theorists feel that desirability responding represents an attempt to put oneself in a favorable light, while others feel that it results from an accurate description of the individual’s actual desirability. It is unlikely that respondents would deliberately fabricate on such a large proportion of their responses, particularly under research conditions; but it also seems unlikely that accurate self-descriptions would produce a single pervasive dimension intimately associated with desirability that on many questionnaires overshadows the various content dimensions of personality. An alternative hypothesis interprets desirability responding not in terms of deliberate misrepresentation but chiefly in terms of an autistic bias in self-regard (Damarin & Messick 1964).

Tendency to fake

The tendency to fake and distort responses in an attempt to bias the impression given to the examiner represents another reliable type of desirability responding. This tendency, which must be at least partially deliberate, is usually assessed on items with a wide discrepancy between the judged desirability of the characteristic described and its frequency of occurrence.

Thus, a penchant to claim desirable but extremely rare characteristics, or to deny undesirable but extremely common frailties, would be interpreted as lying. Such items appear in several “lie” and malingering scales, which have been found to intercorrelate in factor-analytic studies to form a dimension of response that is distinct from the type of desirability responding described above (Edwards & Walsh 1964; Jackson & Messick 1962). The operation of this bias in self-report may be appraised through the use of these lie scales so that particularly suspect respondents may be detected.

Tendency to deviate

Consistencies have been noted in the tendency to deviate from a modal response or from the typical response of some criterion group. Since pathological groups, which are deviant from normals in critical ways, also display deviant response styles on certain tests (such as a picture preference test), the generality of deviant response patterns has been hypothesized (Berg 1955). The implications of this hypothesis and its empirical basis were critically evaluated by Sechrest and Jackson (1963), who expressed reservations about the claimed generality.

Measurement and control of response sets

Several different kinds of response sets have been isolated that are reliable and stable and that reflect varying degrees of generality. Their operation may either increase or reduce the reliability and empirical validity of scores. However, since they cause individuals with equal status on an ability or trait to receive different test scores, response sets seriously attenuate logical validity and complicate interpretations (Cronbach 1946). Thus, response sets tend to introduce errors of measurement that should be avoided and controlled.

Procedures for controlling response sets include (1) changing the test form to prevent their occurrence, (2) modifying directions to reduce their operation, and (3) using special response-set scores to correct for their influence (Cronbach 1950). If possible, response options with fixed categories, such as “true-false” or “yes-?-no,” should be avoided in favor of multiple-choice or forced-choice forms. If fixed-response categories are used, the alternatives should be defined as clearly and as objectively as possible. Whenever the occurrence of a particular response set is thought likely, the instructions should be written to provide more structure and to reduce ambiguity. With regard to differences in guessing, for example, subjects should be specifically directed to answer all items (Cronbach 1950). In addition, special response set scores (Helmstadter 1957) may be used to correct for the set effect and to detect and discard subjects extreme in set responding.
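The third procedure, using a response-set score to correct content scores and to screen out extreme responders, might be sketched as follows. This is only one plausible reading, stated as an assumption: the linear adjustment, the cutoff of two standard deviations, and both function names are illustrative choices and are not the specific methods of Cronbach (1950) or Helmstadter (1957).

```python
from statistics import mean, pstdev


def residualize(content_scores, set_scores):
    """Remove the linear component of the set score from the content score.

    Returns the least-squares residuals, one per subject.
    """
    if len(content_scores) != len(set_scores) or len(set_scores) < 2:
        raise ValueError("need two equal-length lists with at least two subjects")
    mx, my = mean(set_scores), mean(content_scores)
    sxx = sum((x - mx) ** 2 for x in set_scores)
    if sxx == 0:
        raise ValueError("set scores have no variance; nothing to correct for")
    sxy = sum((x - mx) * (y - my) for x, y in zip(set_scores, content_scores))
    slope = sxy / sxx
    return [y - my - slope * (x - mx) for x, y in zip(set_scores, content_scores)]


def flag_extreme(set_scores, z_cut=2.0):
    """Mark subjects whose set score lies more than z_cut SDs from the mean."""
    m, s = mean(set_scores), pstdev(set_scores)
    if s == 0:
        return [False] * len(set_scores)
    return [abs(x - m) / s > z_cut for x in set_scores]


# Hypothetical data: ten subjects, the last one far more set-prone than the rest.
set_scores = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]
content = [5, 6, 4, 5, 6, 5, 4, 6, 5, 9]

adjusted = residualize(content, set_scores)
flags = flag_extreme(set_scores)
print([round(a, 2) for a in adjusted])
print(flags)
```

A linear adjustment of this kind removes only the part of the content score that covaries with the set score across subjects, so it is best regarded as a rough supplement to changes in test form and directions.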

On the other hand, some response sets appear to reflect important personality characteristics, and measures of these response styles may prove to be useful in their own right. When a response set is used as a measure of personality, the above control procedures should be avoided, and the test conditions should be designed to elicit the response set and to heighten its influence.

Samuel Messick

[See also Psychometrics. Other relevant material may be found in Achievement Testing; Aptitude Testing; Decision Making; Intelligence And Intelligence Testing; Personality Measurement; Scaling; Vocational Interest Testing.]


