# Scaling

Measurement of a property involves assignment of numbers to objects as a way of representing that property. The process thus includes both a formal, logical system, e.g., the real number system, and an empirical system, the set of instances of the property.

**Scale types.** An important aspect of measurement is the correspondence between some of the characteristics of the number system involved and some of the relations between instances of the property. The particular set of characteristics of the formal number system that is used determines the *type of scale* that is obtained. Scale types can most easily be described by the type of transformation that can be applied to a given scale without changing the numerical interpretation of the basic empirical relations. Although there is no limit to the number of scale types that might be devised, only a relatively small number have been found useful. Some of the more important scale types are described here.

In *ordinal scales* the only relevant property of numbers is their *order*. Numbers are assigned to objects so that the order of the numbers corresponds to the order of the objects with respect to the property of interest. Any transformation of a given numerical assignment that preserves order, i.e., any *monotonic* transformation, would serve as well. An ordinal scale is therefore said to be determined up to a monotonic transformation.

*Fixed-origin ordinal* scales are rarely discussed, but are of some empirical importance in the fields of social psychology and personality, where bipolar attributes are not uncommon. Of relevance in such scales is the fact that numbers have a unique origin. Numbers are assigned not only so that their order corresponds to the order of the objects but also so that the origin of the numerical series corresponds to an empirically meaningful origin or zero point of the property. Any monotonic transformation that does not change the origin leaves the numerical interpretation of the basic relations unchanged.

In *interval scales*, in addition to order, differences between the numbers correspond, in some generalized sense, to the distances or differences between the instances of the property. Interval scales are thus determined up to a linear transformation of the form

*y = ax + b*,

where *a* is any positive number and *b* is any number whatsoever.

*Log interval* scales are closely related to the interval scale. In the interval scale, numerical differences correspond to empirical differences; in the log interval scale, numerical ratios correspond to empirical ratios. The transformation characterizing the log interval scale is given by

*y = ax^{b}*,

where *a* and *b* are positive.

*Ratio* scales can be described either as those in which the order, the differences, and the ratios of the assigned numbers correspond to the order, differences, and ratios of instances of the property or as those in which order, differences, and origin all receive numerical interpretation. For either description, the ratio scale is determined up to a linear transformation of the form *y = ax*, where *a* is positive.

*Difference* scales and ratio scales are related in a way similar to the way that interval and log interval scales are related; in each case one scale is equivalent to the logarithm of the other. In the difference scale, the procedures used to assign numbers serve to fix the order, the differences between numbers, and the unit of measurement. The characterizing transformation is *y = x + a*. [*For a more extensive discussion of scale types, see* Psychometrics; Statistics, descriptive; *see also* Suppes & Zinnes 1963.]
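The correspondence between scale types and their admissible transformations can be sketched numerically. The following fragment (with made-up scores) checks that each transformation preserves exactly the relations its scale type treats as meaningful:

```python
# Hypothetical raw values on some attribute.
scores = [2.0, 5.0, 9.0, 14.0]

# Ordinal: any monotonic transformation (here, squaring positive values)
# preserves order and nothing more.
mono = [x ** 2 for x in scores]
assert [a < b for a, b in zip(scores, scores[1:])] == \
       [a < b for a, b in zip(mono, mono[1:])]

# Interval: a linear transformation y = a*x + b (a > 0) preserves
# ratios of differences.
lin = [3.0 * x + 7.0 for x in scores]
r_orig = (scores[2] - scores[1]) / (scores[1] - scores[0])
r_lin = (lin[2] - lin[1]) / (lin[1] - lin[0])
assert abs(r_orig - r_lin) < 1e-12

# Ratio: y = a*x (a > 0) preserves ratios of values themselves.
rat = [3.0 * x for x in scores]
assert abs(scores[3] / scores[1] - rat[3] / rat[1]) < 1e-12

# Log interval: y = a*x**b (a, b > 0) preserves the order of ratios,
# mapping each ratio to its b-th power.
logint = [2.0 * x ** 0.5 for x in scores]
assert abs(logint[3] / logint[1] - (scores[3] / scores[1]) ** 0.5) < 1e-12
```

The assertions are the point: a statistic is meaningful on a given scale type only if it survives every admissible transformation of that type.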

The concept of scale type is important because the type of scale on which a property is measured restricts the kinds of information that can be obtained in an experiment. The scale type therefore limits the kinds of meaningful questions that can be asked regarding the property. Consider the following three questions:

*Do high-scoring subjects increase in ability more than low-scoring subjects* (*after some specified treatment*)? If the ability is measured on a scale that contains only ordinal information, and if both low and high scorers increase in ability with no overlap in distribution, then the question cannot be answered. The question deals with the relative magnitude of differences (or perhaps ratios), and in a purely ordinal scale the relative magnitude of numerical differences or ratios is meaningless.

*Is amount of aggression directly proportional to degree of frustration* (*in some specified situation*)? Let *A* denote amount of aggression and *F* the degree of frustration. The question then concerns the form of the equation that relates *A* to *F*, that is, one asks whether these two variables are functionally related by an equation of the form *A* = *kF*, where *k* is some positive constant. Clearly, an affirmative answer cannot be given unless both *A* and *F* are measured on ratio scales.

*Do college sophomores differ more widely in their attitude toward civil rights than in their attitude toward the Asian policy of the United States*? Questions such as this require a scale type ordinarily ignored in physical measurement. Consider the formally equivalent question “Does this set of containers differ more widely in their volume than in their weight?” The question, which does not seem particularly unreasonable in the attitude context, now appears absurd: a difference of x cubic inches is neither more nor less nor equal to a difference of y pounds. The two differences are simply not comparable. However, volume and weight are measured on ratio scales. If they, or the two attitude variables, were measured on difference scales, then a basis for the comparison would exist.

**Kinds of measurement.** So far, scale type has been treated as though it were an inherent part of a given numerical assignment. In one sense, this is correct; each scaling procedure includes a set of rules governing assignment of numbers to instances of the property. The rules limit to varying extents and in varying ways the freedom of the investigator to decide which numbers to assign to the instances of the property. The characterization of scale type by admissible transformations is simply a compact way of stating the particular restrictions imposed.

But scaling procedures also differ markedly in the quality of information the numbers represent. At one extreme are measurement procedures that rely almost entirely on definition or assumption. An experimenter might simply *define* the attitude of a subject to be equal to the number of favorable responses he gives to a specified collection of items. Such *measurement by fiat* has been both useful and common throughout the behavioral sciences. The type of scale resulting from such a procedure is simply a matter of *definition or assumption*. Although these procedures determine a unique number to be assigned to a given subject, the experimenter may have reasons for wanting to attribute meaning only to the ordinal property of the numerical assignment. Or he may want to attribute meaning to both ordinal and interval properties, thus obtaining an interval scale. Suitable modifications of the definition could therefore result in any of the scale types already described. Obviously, such modifications do not change the empirical basis of the numerical assignment, which, in the present instance, is simply the number of favorable responses to the particular set of items.

The usefulness of such defined scales depends heavily on the experience and intuition of the investigator. The scales tend to be exceedingly special. Different investigators, when defining scales in this way for the same attribute, would ordinarily select different, or at best only partially overlapping, sets of items. Because favorable responses depend as much on the nature of the items as on the attitude of the subject, the value assigned to a given subject would clearly be expected to differ from one scale to another. While the scales might be ratio scales by definition, it is unlikely that they would differ from each other only in unit of measurement. The procedures themselves provide no basis for choosing among the many scales of a given attribute that might be defined in this way. Hence, confidence in any one cannot be expected to be great.

A somewhat different procedure for obtaining scale properties by definition relies on assumptions made about the distribution of subjects. The investigator might stipulate, for example, that the number of favorable responses to his set of items only serves to order the subjects on the attribute. But he then might go on to assume that his population of subjects is distributed in some specified way on the attribute (ordinarily, the assumption of a normal distribution is made, but other distributions could serve as well). Knowledge of order and of the properties of the assumed distribution enables the investigator to increase the level of the scale from ordinal to interval or perhaps even to ratio. But again, the meaning of the interval or ratio properties of the scale is derived only from the definition. There is no limit to the number of base populations that can be defined. And scales simply defined through use of the distribution assumption on one population cannot be expected to match those based on other populations.
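This distribution-assumption procedure can be sketched in a few lines. The example below is purely illustrative: it assumes twenty hypothetical subjects ordered by number of favorable responses and imposes a standard normal distribution via normalized ranks (the inverse normal transform), one common way of carrying out the stipulation just described.

```python
from statistics import NormalDist

# Hypothetical data: 20 subjects, ordered by number of favorable
# responses (rank 1 = lowest scorer). Only the order is taken as given.
n = 20
ranks = list(range(1, n + 1))

# Assumed: the population is normally distributed on the attribute.
# Normalized ranks map each rank to the corresponding quantile of the
# standard normal, yielding interval-scale values by definition.
nd = NormalDist()
interval_values = [nd.inv_cdf((r - 0.5) / n) for r in ranks]

# The transformation preserves the original order...
assert interval_values == sorted(interval_values)
# ...and is symmetric about zero under the normality assumption.
assert abs(interval_values[0] + interval_values[-1]) < 1e-9
```

Note that the interval properties here come entirely from the assumed distribution; a different assumed base population would yield a different, generally nonlinearly related, scale.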

A more appealing basis for establishing interval or ratio properties is provided by fundamental measurement procedures. In these procedures, the meaning of the properties of the final scale is determined, at least to some extent, by the empirical relations between the observables. Fundamental measurement procedures involve verifiable scientific models or theories, place restrictions on the behavior of the observables, and are subject to empirical test. One possible outcome of an application of a fundamental procedure, of course, is the conclusion that the scale in question does not fit the data, that the theoretical assumptions are not appropriate for that particular situation. Under these circumstances, no scale is obtained. But when the empirical fit of data to the theory is acceptable, the result is a scale whose properties are based to some extent on empirical law rather than arbitrary definition alone. It is with such fundamental measurement procedures in scaling that this article is concerned.

**Two broad classes of scaling models.** The fundamental scaling procedures can be grouped into two broad classes: those concerned with the measurement of stimulus attributes and those concerned primarily with measurement of individual difference or responder attributes. Different models within each class represent alternative rationales for attaining essentially the same objective. The objective itself ordinarily differs between classes.

## Stimulus models

The stimulus models use responses of subjects to stimuli to establish scales for attributes of stimuli. Scale values are assigned only to the stimuli. The subject, or group of subjects, is characterized by the entire scale. Individual or group differences in this context refer to *differences between scales*, and not to differences in scale values on the attribute. The stimulus models are therefore concerned with psychological attributes *of* stimuli *for* subjects or groups of subjects.

In the stimulus model, the subject himself acts as a judge or evaluator of the stimuli. His task is a particularized one: he does not simply judge or respond to the stimuli per se but, rather, evaluates the stimuli with respect to some specified attribute. Hence, the same set of stimuli can be scaled with respect to several different attributes. A set of statements all designed to reflect an attitude toward some topic might, for example, be scaled with respect to such attributes as degree of favorability toward the topic, relevance, complexity, or intensity. The subject’s task in each case is to evaluate the stimuli solely with respect to the attribute of interest. The resulting scale is then indicative of how these particular statements are perceived with respect to the defined attribute by that particular subject or group of subjects.

Stimulus models are used to obtain both psychophysical and psychometric scales. The models appropriate for psychometrics are actually a subset of the psychophysical models, but the measurement problem itself differs considerably in the two fields. Measurement in psychophysics has been concerned primarily with determining functional relations between physical variables and the corresponding subjective attributes. The usual experimental strategy holds constant or controls all relevant physical dimensions but one, and then determines how the subjective attribute varies as a function of that single physical variable. The resulting psychophysical magnitude function is thus a relation between two variables that is expected to hold when other physical variables influencing the subjective attribute are fixed or controlled as specified. For example, a psychophysical function for loudness might relate the perceived loudness of 1,000-cycle sinusoidal tones of a given duration to sound energy measured at the ear. Other functions ordinarily are needed for white noise, for clicks, or for tones of another duration or a different frequency. Although each separate function is monotonic for the relevant physical variable, no such simple relation exists relating sound energy to subjective loudness for stimuli that vary on several relevant dimensions. Development of equations specifying the loudness of any stimulus that possesses sound energy would be a great deal more complex. [*See* Hearing; Psychophysics.]

In psychophysics the physical variable determines the set of stimuli to which the results apply. In psychometrics, it is more nearly the reverse. Psychometric scaling begins with the psychological attribute. The procedures are applicable to any set of stimuli possessing the attribute, and the problem is to scale the set of stimuli as it exists. The stimulus set is typically complex, varying with respect to other subjective dimensions and with respect to a host of correlated physical variables. There is, of course, no single, monotonically related physical correlate. Monotone relations are to be expected only when all but one of the physical correlates are held constant. Complexity of the stimulus set is therefore the chief characteristic distinguishing psychophysical from psychometric scaling problems. Several major consequences of this complexity are listed below. [*See* Psychometrics.]

*Recognizability of individual stimuli—*The subject can often identify psychometric stimuli because of their variation on extraneous attributes. Since he may tend to remember earlier responses to the same stimulus, independent judgments on successive trials cannot routinely be expected. Replication over trials with the same subject is therefore troublesome. Most psychometric scales have been based on groups of subjects, so that the recognition problem is not so severe.

*Ordinal properties—*When there is awareness of a monotonically related physical variable, knowledge of order on the subjective attribute is automatically implied. For the complex stimulus sets of psychometrics, order of the stimuli is typically an unknown that must be determined. Experimental procedures that require prior knowledge of order of the stimuli often then cannot be used.

*Production techniques—*When a physical variable is monotonically related to a psychological attribute, there is ordinarily a procedure for easily generating a virtually continuous set of stimuli that are ordered with respect to the psychological attribute. This in turn makes possible the use of the various production techniques: procedures that require the subject to adjust a variable, i.e., to generate a new stimulus, with specified properties. On the other hand, with psychometric stimuli such production techniques typically cannot be used: the subject cannot on demand paint a painting halfway between two standards in aesthetic quality, write an essay twice as good as a standard, or create a worker whose merit falls between that of two others.

*Interpolation—*A monotonic physical attribute can be used as a basis for interpolation. The stimulus that could be judged greater than a standard 75 per cent of the time can be estimated from data obtained by using other stimuli. In addition, a subset of stimuli can be used to establish the functional relation between the two variables. Given the relation, the scale values of all remaining stimuli can be determined numerically or graphically, without resort to additional experimental observations. Neither technique is available for the complex stimulus sets of psychometrics, the closest approximation of which is the establishment of a scale of standards against which new stimuli can be evaluated by direct comparison.

The stimulus scaling models can be classified into two types on the basis of the rationale used to obtain scale properties above and beyond order. In the variability models, the discriminability of the stimuli, or the variability of judgment with respect to the stimuli, is used as the basis for deriving the higher-level metric scale properties. In the quantitative judgment models, the additional metric properties are obtained directly from the quantitative aspects of the judgments themselves.

**Variability models.** Perhaps the best known variability model is that developed by Thurstone (1927). Thurstone’s general model begins with a set of elements called discriminal processes. Each element has a value on a postulated, underlying psychological continuum. Presentation of a stimulus to a subject results in arousal of a discriminal process. Different presentations of the same stimulus may arouse different discriminal processes; there is thus no one-to-one relation between the two. Rather, a distribution of discriminal processes is associated with each stimulus. Presentation of a stimulus is equivalent to sampling a discriminal process from the distribution associated with that stimulus. [*See* Concept Formation; Learning, *article on* discrimination learning.]

In Thurstone’s model it is postulated that these distributions (one for each stimulus) are normal with means s_{j} and variances σ_{j}². The mean is taken as the scale value of the stimulus. The standard deviation, σ_{j}, called the discriminal dispersion, is the index of the ambiguity or confusability of the stimulus; the smaller the σ_{j}, the less ambiguous and the more readily discriminable the stimulus. The general model, therefore, provides both location and spread parameters for the stimuli. The conceptual framework of the model is illustrated in Figure 1.

We will consider here two particularizations of the Thurstone model, one developed for comparative judgments and another developed for categorical judgments.

*The law of comparative judgment.* The law of comparative judgment postulates that in any comparison of two stimuli, the psychological difference between the stimuli is indirectly measured by the relative frequency with which the difference is perceived. For example, two stimuli, *j* and *k*, are presented to the subject, with instructions to indicate which one appears greater with respect to some designated attribute. Since each stimulus is held to arouse a discriminal process, d_{j} or d_{k}, the subject selects as greater the stimulus which on that occasion arouses the discriminal process having the greater value on the underlying continuum.

If the difference d_{k} − d_{j} > 0, stimulus *k* is selected; if d_{k} − d_{j} < 0, stimulus *j* is selected. Exact equality is not allowed. Each paired presentation is formally equivalent to drawing an element from a bivariate normal distribution. An indefinitely large number of such observations generates a distribution of discriminal differences. From a large number of such judgments one can also determine the proportion of times that stimulus *k* is judged greater than stimulus *j*.
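This sampling interpretation can be sketched in a short simulation. All parameter values below are hypothetical; the point is only that repeated paired presentations yield a proportion that reflects the separation of the two scale values.

```python
import random

# Hypothetical Thurstone-model parameters: each presentation of a
# stimulus samples a discriminal process from a normal distribution
# with that stimulus's scale value as mean and its discriminal
# dispersion as standard deviation.
random.seed(1)
s_j, sigma_j = 0.0, 1.0   # scale value and dispersion of stimulus j
s_k, sigma_k = 0.5, 1.0   # scale value and dispersion of stimulus k

trials = 20000
k_greater = 0
for _ in range(trials):
    d_j = random.gauss(s_j, sigma_j)   # process aroused by stimulus j
    d_k = random.gauss(s_k, sigma_k)   # process aroused by stimulus k
    if d_k - d_j > 0:                  # k judged greater on this trial
        k_greater += 1

p_kj = k_greater / trials   # estimated P(k judged greater than j)

# With independent unit-normal processes, d_k - d_j is normal with
# mean 0.5 and variance 2, so the true proportion is about 0.64.
assert 0.60 < p_kj < 0.68
```

Running the comparison the other way (larger separation of scale values, or smaller dispersions) drives the proportion toward 1, which is exactly the sense in which confusability carries metric information.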

One of the characteristics of the normal distribution that makes it particularly valuable for distribution models of this type is that differences between paired observations drawn from a bivariate normal distribution are normally distributed. The mean of the difference distribution is equal to the difference between the original means, or scale values, s_{k} − s_{j}; the variance is σ_{j}² + σ_{k}² − 2r_{jk}σ_{j}σ_{k}, where r_{jk} represents the correlation between the pairs of discriminal processes. If we define p_{jk} as the probability that stimulus k is judged greater than stimulus j (that is, that d_{k} − d_{j} > 0) on any given presentation, and define x_{jk} as the unit normal deviate corresponding to p_{jk}, then the general form of the law of comparative judgment for n stimuli is given by the set of equations

s_{k} − s_{j} = x_{jk} √(σ_{j}² + σ_{k}² − 2r_{jk}σ_{j}σ_{k}).     (1)

The quantity *x_{jk}* is the (theoretical) observable; the remaining terms are unknowns. Given n stimuli, there are n(n − 1)/2 possible equations and, since both the origin and unit are arbitrary, there are 2(n − 1) + n(n − 1)/2 unknowns. Hence there are always more unknowns than independent equations, and further restrictions on the data are necessary to establish scale values. Three such specializations have been proposed, of which only one, Thurstone’s case V, the most restrictive, has been found generally useful. In case V, it is assumed that the value under the radical in (1) is constant, i.e., that the distributions of discriminal differences all have the same variance. Then, arbitrarily defining the variance as unity, for convenience, there remain simply the n(n − 1)/2 equations

s_{k} − s_{j} = x_{jk}.     (2)

Efficient procedures for estimating the scale values from these equations are available. The scale obtained is an interval scale, because the unit of measurement must be defined arbitrarily, and a constant can be added to all of the scale values without changing the solution. Furthermore, the interval properties are based on empirical law. Consider any three ordered stimuli, i < j < k, and the three unit normal deviates x_{ij}, x_{ik}, and x_{jk}. From (2), since the numerical differences between scale values must add, (s_{j} − s_{i}) + (s_{k} − s_{j}) = s_{k} − s_{i}, it is clear that the model requires that

x_{ij} + x_{jk} = x_{ik}.     (3)

The additivity-of-differences rule can therefore be said to provide the empirical basis for the interval properties of the final scale. Actually, of course, one would not expect (3) to hold exactly for empirical data. Among other things, the obtained proportions on which the deviates are based are only estimates and not population values. A statistical test for the goodness of fit of the model to data has been provided by Mosteller (1951). Graphical and numerical procedures can also be used to aid the investigator in deciding whether the fit is close enough for his purposes even though it does not meet the statistical criterion. Given estimates of the scale values of the stimuli, one can compute the set of theoretical proportions that would fit the estimated scale values perfectly. The discrepancies between these derived proportions and the corresponding observed proportions can then be examined. Alternatively, for any two stimuli, i and j, one can plot the obtained values of x_{ik} versus x_{jk} for all values of k. From (2) it is clear that the model requires that the plotted points fall along a straight line with unit slope. [*See* Goodness of fit.]
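A minimal sketch of case V estimation follows, using a small, deliberately consistent matrix of hypothetical proportions. Under complete data, the classical least-squares solution is simply the column means of the unit normal deviates.

```python
from statistics import NormalDist

# Hypothetical data for three stimuli: p[j][k] is the observed
# proportion of times stimulus k is judged greater than stimulus j.
p = [[0.50, 0.69, 0.84],
     [0.31, 0.50, 0.69],
     [0.16, 0.31, 0.50]]

nd = NormalDist()
n = len(p)
# Under case V, the unit normal deviate x_jk estimates s_k - s_j.
x = [[nd.inv_cdf(p[j][k]) for k in range(n)] for j in range(n)]

# Least-squares estimates: the mean of column k gives s_k up to an
# additive constant (the origin of the interval scale is arbitrary).
s = [sum(x[j][k] for j in range(n)) / n for k in range(n)]

# The additivity requirement x_ij + x_jk = x_ik can be checked directly
# on the deviates; for noisy data the residuals index goodness of fit.
assert abs(x[0][1] + x[1][2] - x[0][2]) < 0.05
# For this consistent matrix the scale values come out equally spaced.
assert abs((s[1] - s[0]) - (s[2] - s[1])) < 0.05
```

With real proportions the additivity check would hold only approximately, and the discrepancies between observed and derived proportions would be examined as described above.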

*The law of categorical judgment.* Several additional postulates are used to adapt the general Thurstone model to categorical data and to derive a set of equations relating stimulus values and category boundaries to the relative frequency with which each stimulus is judged to be in each category. In the usual experimental procedures, the subject is required to rate or sort a set of n stimuli into a fixed number of ordered categories. Graphic rating procedures or arrangement methods can also be used, but these require that the data first be artificially divided into a limited set of ordered categories.

Thurstone’s general model illustrated in Figure 1 deals with a set of n stimuli and their associated distributions of discriminal processes, with means s_{j} and standard deviations σ_{j}. The present extension adds parallel notions dealing with the location of boundaries between adjacent categories.

Data are presented in the form of ratings of stimuli on an m + 1 step scale. It is assumed that the psychological continuum is divided by the m category boundaries into m + 1 intervals, which may or may not be equal in extent. In the original version of the model, the boundaries were treated as fixed. We shall follow the more general version that allows the boundary locations to vary from observation to observation. The location of the gth category boundary is treated as a random variable with mean t_{g} and variance β_{g}². If we assume that these distribution functions are also jointly normal, then precisely the same reasoning that led to (1) leads to the set of mn equations for the general law of categorical judgment:

t_{g} − s_{j} = x_{jg} √(σ_{j}² + β_{g}² − 2r_{jg}σ_{j}β_{g}),     (4)

where t_{g} − s_{j} denotes the difference between the mean location of category boundary g and the mean location of stimulus j. Eq. (4) is again too general to be of more than academic interest. As was true for (1), there are always more unknowns than equations. One common restricted version of the general equation is based on the assumption that the radical is a monotone function of the stimulus standard deviation σ_{j}. This would occur, for example, if both the correlation term and the variances of the category boundaries were constant, or if constant variances β_{g}² = β² and independence (r_{jg} = 0) were assumed. If we now let

a_{j} = √(σ_{j}² + β² − 2rσ_{j}β),

we can then write the mn equations

t_{g} − s_{j} = a_{j} x_{jg}.     (5)

Eqs. (5) are formally identical to the equations underlying the “method of successive intervals” (Saffir 1937; Gulliksen 1954), although in the traditional version the assumption of fixed boundaries leads to the interpretation of a_{j} as the standard deviation (discriminal dispersion) of stimulus j rather than as a more general spread parameter only monotonically related to σ_{j}.

Efficient and practical estimation procedures have been developed for both complete and incomplete data matrices and are discussed in Torgerson (1958).

Another common restricted case assumes that the difference distributions all have the same variance. Then, if the unit of measurement is defined so that the value under the radical in (4) is 1, we can write the mn equations

t_{g} − s_{j} = x_{jg}.     (6)

Eqs. (6) represent the most restricted version of the law and underlie Attneave’s method of graded dichotomies (1949), Garner and Hake’s equidiscriminability scaling procedure (1951), and Edwards and Thurstone’s method of successive intervals (1952). Note that this version allows a single parameter for both stimuli and category boundaries, whereas the other version allowed a second parameter for the stimuli. The final scale, if achieved, is therefore an equidiscriminability or equiconfusability scale, with properties similar to those of (2) for comparative judgments.
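The most restricted version can be sketched with a row-and-column-mean estimator. The data below are hypothetical, and the estimator is one simple least-squares sketch consistent with this version of the law, not the specific procedures cited above.

```python
from statistics import NormalDist

# Hypothetical data: P[j][g] is the proportion of judgments placing
# stimulus j at or below category boundary g (m = 2 boundaries, so
# m + 1 = 3 categories, n = 3 stimuli).
P = [[0.69, 0.93],
     [0.50, 0.84],
     [0.31, 0.69]]

nd = NormalDist()
n, m = len(P), len(P[0])
# Most restricted law of categorical judgment: the unit normal deviate
# of P[j][g] estimates t_g - s_j.
x = [[nd.inv_cdf(P[j][g]) for g in range(m)] for j in range(n)]

# Least squares with the origin fixed by setting the mean stimulus
# value to zero: column means estimate the boundaries t_g, and
# stimulus values follow from the row means.
row_means = [sum(row) / m for row in x]                      # t_bar - s_j
t = [sum(x[j][g] for j in range(n)) / n for g in range(m)]   # boundary locations
t_bar = sum(t) / m
s = [t_bar - row_means[j] for j in range(n)]

# Boundaries must come out ordered on the continuum.
assert t[0] < t[1]
# Stimulus 0, most often placed in the low categories, gets the
# lowest scale value.
assert s[0] < s[1] < s[2]
```

As with case V, real rating data would satisfy the model only approximately, and the residual discrepancies would serve as the check on fit.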

*The logistic model for choice data.* The logistic model is the major alternative to Thurstone’s general model for obtaining a scale based on discriminability or variability of judgment. The model was developed for the paired comparison case by Bradley and Terry (1952). Luce’s choice axiom (1959) extended the model to choice data in general and gave the procedure a more elegant, axiomatic basis.

Luce’s version of the model requires, in essence, that the subjects’ choices behave in a manner analogous to conditional probabilities. Where *S* and *T* are both subsets of a set *U* and where *p*(*T*), the probability of choosing an element of *T*, is greater than 0, then the formula for the conditional probability of *S* given *T*, *p*(*S*|*T*), is *p*(*S* ∩ *T*)/*p*(*T*). The interesting property of conditional probability for present purposes is the multiplication rule. Given three subsets, such that *R* ⊂ *S* ⊂ *T*,

p(R|S) · p(S|T) = p(R|T).

Luce’s axiom, for the case where the probability of choosing any element is neither 0 nor 1, is the empirical analogue:

p_{T}(R) = p_{S}(R) · p_{T}(S).     (7)

That is, the probability that the subject will choose an R-element when presented with the set *T* is equal to the probability that he will choose an R-element when presented with the set *S* multiplied by the probability that he will choose an S-element when presented with the set *T*.

An alternative form of the axiom is known as the constant ratio rule (Clarke 1957). Let x and y denote any two stimuli, let p(x,y) denote the probability that stimulus x is chosen when only x and y are presented, and let p_{S}(x) denote the probability that stimulus x is chosen when the subset S is presented. The constant ratio rule states that

p_{S}(x)/p_{S}(y) = C_{xy},     (8)

where C_{xy} denotes the constant ratio of the probability of choosing stimulus x to the probability of choosing stimulus y. The rule requires that for any set S containing both stimulus x and stimulus y, the ratio of the probability of choosing stimulus x to that of choosing stimulus y be invariant. For a given pair of stimuli the ratio is constant regardless of either what or how many other stimuli are included in the set from which the choice is made. The complete experiment for evaluating Luce’s axiom or the constant ratio rule would therefore require the experimenter to obtain separate estimates of the probability of choosing stimulus x (for all x ∈ T) for every subset of T containing x as a member.

When the requirements of the axiom are met, then there exists a v-scale such that for every subset of stimuli S,

p_{S}(x) = v(x) / Σ_{y∈S} v(y),     (9)

where v(x) denotes the scale value of stimulus x. From (9) and the constant ratio rule it is also clear that

C_{xy} = v(x)/v(y).     (10)

Given the axiom and the definition, the scale values are determined up to multiplication by a positive constant, i.e., the v-scale is a ratio scale. It should be noted, however, that the ratio property depends on the particular definition chosen, as well as on the requirements placed on the data by the axiom. The empirical requirements of the choice axiom may limit the scale to determination up to a multiplicative constant and an exponent. Eq. (9), in effect, specifies a particular exponent in the same way that the unit was specified in (2).

The axiom places two rather severe restrictions on the data. Consider first the paired comparison case, and let x, y, z denote any three stimuli. Then the probability ratios must follow a multiplication rule:

C_{xy} · C_{yz} = C_{xz}.     (11)

Hence, any two probabilities determine the third. It can also be shown that the pairwise probabilities determine all probabilities of choice from larger sets and, in fact,

p_{S}(x) = 1 / Σ_{y∈S} [p(y,x)/p(x,y)].     (12)

An adequate test of the model as stated would require evaluation of the extent to which both restrictions are met.
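Both restrictions can be illustrated numerically. The sketch below assumes hypothetical v-scale values and verifies that, when the axiom holds, the pairwise choice probabilities reproduce the choice probabilities from the full set, and that the constant ratio rule is satisfied.

```python
# Hypothetical v-scale values for three stimuli.
v = {"x": 1.0, "y": 2.0, "z": 4.0}

def p_pair(a, b):
    """Probability that a is chosen over b in a paired presentation."""
    return v[a] / (v[a] + v[b])

def p_set(a, S):
    """Direct v-scale prediction of choice from the set S."""
    return v[a] / sum(v[b] for b in S)

def p_set_from_pairs(a, S):
    """Choice probability from S reconstructed from pairwise data only
    (the inverse-sum-of-ratios form; the b == a term contributes 1)."""
    return 1.0 / sum(p_pair(b, a) / p_pair(a, b) for b in S)

S = ["x", "y", "z"]
# Pairwise probabilities determine choice from the larger set.
for a in S:
    assert abs(p_set_from_pairs(a, S) - p_set(a, S)) < 1e-12

# Constant ratio rule: the ratio for x and y is the same whether the
# choice is made from the pair or from the full set.
assert abs(p_set("x", S) / p_set("y", S)
           - p_pair("x", "y") / p_pair("y", "x")) < 1e-12
```

With empirical proportions, the same computations run in reverse serve as the test of the axiom: discrepancies between the reconstructed and directly observed subset probabilities measure its failure.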

The simpler version of the logistic model, as developed by Bradley and Terry, is limited to paired comparison data and, hence, only the first restriction is applicable. Efficient estimating procedures and tests of goodness of fit are available (Bradley & Terry 1952; Bradley 1954).

*Thurstone’s case V and the logistic model.* If we consider only the paired comparison situation, then the equation relating scale values to observable proportions for the logistic model can be written simply as

p(x,y) = v(x) / [v(x) + v(y)],     (13)

or, equivalently, as

v(x)/v(y) = p(x,y) / [1 − p(x,y)].     (14)

Eq. (14) shows that in the v-scale a simple function of the observable is interpreted as a ratio of scale values. From (11) it is apparent that the metric properties of the v-scale are based on a multiplication-of-ratios rule, with log interval properties. In Thurstone’s case V, the corresponding basis for the metric properties was an additivity-of-differences rule, which yielded an interval scale.

Bradley and Terry, as well as Luce, have shown that the logistic model can also be used to generate an interval scale based on additivity of differences. If one defines r(x) = ln v(x) and w(x,y) ≡ r(x) − r(y), then (14) can be written

w(x,y) = ln [p(x,y) / (1 − p(x,y))],     (15)

and from (13),

p(x,y) = 1 / [1 + e^{−w(x,y)}],     (16)

which is a simple form of the logistic function.

The logistic function and the normal ogive are markedly similar over most of the empirically interesting range. As a result, a close fit of the data to Thurstone’s case V ensures a fit to the logistic model that can be only trivially better or worse. For all practical purposes, the values from the r-scale of the logistic model will be linearly related to the scale values from the Thurstone case V model. And both, of course, will be logarithmically related to the v-scale values.

It might be noted that the Thurstone model can also be easily modified to yield a log interval scale similar to the v-scale of the logistic model. We need only to substitute log normal distributions for normal distributions and subjective ratios for subjective differences. The new scale will be exponentially related to the Thurstone case V scale, and will therefore be related to the v-scale by a power transformation. In general, any model in which the observable is interpreted as a subjective difference in scale values can be paralleled by one in which the observable is interpreted as a subjective ratio. The difference-based scale will be logarithmically related to the ratio-based scale, and the fit of one model to any given set of data will be precisely matched by the fit of the other, parallel model.

## Quantitative judgment models

The variability methods require only ordinal judgments by the subject; the higher-level, metric properties of the obtained scales are derived through models relating variability of judgment to subjective differences or ratios. In the quantitative judgment models, the higher-level properties are obtained directly from the subject’s judgments. There are two classes of quantitative judgment methods: the difference, or distance, methods and the ratio methods. The difference methods have long been used to develop scales in the social sciences (Thurstone & Chave 1929). Serious application of the ratio methods in the context of social science is more recent (see, for example, the summary of research in Stevens 1966). In the difference methods, the metric properties of the obtained scale depend upon the ability of the subject to evaluate or equate subjective differences. In the ratio methods, the metric properties depend upon his ability to evaluate or equate subjective ratios. Scales obtained from the two types of procedures are not ordinarily linearly related.

*Difference methods.* Difference methods include the methods of equisection, equal-appearing intervals, category rating, and category production. The methods differ from one another in the specific nature of the experimental procedure employed, but they all have the same objective: to obtain a scale in which the differences in numbers assigned to the stimuli directly reflect the corresponding subjective differences between the stimuli.

In equisection, the subject produces a set of stimuli that divides a given range of the attribute into subjectively equal steps. In equal-appearing intervals and category rating he sorts or rates the stimuli into a given set of subjectively equally spaced steps or categories. In category production, he produces on each trial a stimulus that corresponds to one of a set of equally spaced categories.

The usual procedures do not include built-in checks for the adequacy of the assumption that the subject can produce such judgments in an internally consistent manner. The assumption does place empirical restrictions on the behavior of the judgments, however, so appropriate checks are possible. The simplest and most direct check uses the method of bisection. Here, the subject first bisects the interval between two standard stimuli, a and e, to produce a new stimulus, c, halfway between. He next bisects the interval a-c, producing a stimulus b, and bisects the interval c-e, producing a stimulus d. Finally, he bisects the interval between b and d. If the subject’s bisections are made in an internally consistent manner, the stimulus produced by the final bisection between b and d will be equal to c, the stimulus produced by the bisection of a and e.
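The logic of the check can be sketched as follows, assuming a hypothetical subject whose subjective magnitude is the logarithm of physical intensity, so that bisecting two stimuli yields their geometric mean:

```python
import math

def bisect(s1, s2):
    # Subjective midpoint for this hypothetical subject: the geometric mean.
    return math.sqrt(s1 * s2)

a, e = 10.0, 160.0
c = bisect(a, e)        # bisect the full interval a-e
b = bisect(a, c)        # bisect the lower half a-c
d = bisect(c, e)        # bisect the upper half c-e
c_check = bisect(b, d)  # final bisection between b and d

# Internal consistency: the final bisection reproduces c.
assert abs(c_check - c) < 1e-9
```

An internally inconsistent subject, by contrast, would produce a final bisection that misses c, and the discrepancy measures the failure of the assumption.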

For psychometric stimuli, where production techniques are unavailable, a more or less equivalent procedure involves imbedding a given subset of stimuli in several different larger sets. If the subject’s rating or sorting judgments fulfill the requirements of an interval scale, the scale values assigned to the members of the subset will be linearly related across the several experiments.

*Ratio methods.* Ratio methods include the methods of fractionation, constant sum, magnitude and ratio estimation, and magnitude and ratio production. In all of these methods, the subject’s task is to adjust stimuli or assign numbers so that the ratios between the numbers correspond to the subjective ratios between the stimuli. In fractionation, the subject adjusts a variable to be equal to 1/k of a standard. In the constant sum method, he divides 100 points between two standards so that the ratio between the assigned points corresponds to the subjective ratio. In the magnitude estimation procedure, his task is to assign numbers so they are proportional to the subjective magnitudes, whereas in the ratio estimation method, he assigns numerical ratios to pairs of stimuli. In the corresponding production procedures, the roles of numbers and stimuli are reversed: the experimenter chooses the numbers and the subject produces the appropriate stimuli or stimulus ratios.

Only the constant sum method has a built-in check for adequacy of the basic assumption. But again, appropriate checks of internal consistency are available for the other methods and, in fact, parallel those for the difference methods.

*Differences versus ratios.* The logarithmic, or nearly logarithmic, empirical relationship between difference scales and ratio scales has led to the conjecture that the subject’s quantitative judgments are based on a single perceived quantitative relation (Torgerson 1961). According to this argument, when the subject is told to equate differences, he interprets the subjective relation as a difference. When he is told to equate ratios, he interprets the same relation as a ratio. The conjecture receives additional support from the behavior of quantitative judgments when the direction of the attribute is reversed, i.e., when the subject reports on smoothness rather than roughness, or darkness rather than lightness. For difference scales, the attribute and its reverse are linearly, though negatively, related. Stimuli that are separated by equal differences in one direction remain equally different when the direction of judgment is reversed. For the ratio scales, the attribute and its reverse are reciprocally related. Here the equal ratio property is invariant, but differences of course change markedly.
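The behavior under reversal can be illustrated with hypothetical scale values: reversing a ratio scale is a reciprocal transformation that preserves equal ratios, while reversing a difference scale is a negative linear transformation that preserves equal differences.

```python
light = [1.0, 2.0, 4.0, 8.0]         # hypothetical ratio-scale values
dark = [1.0 / x for x in light]      # reversed attribute: reciprocally related

# Equal ratios are invariant when the direction of judgment is reversed.
assert abs(light[1] / light[0] - dark[0] / dark[1]) < 1e-12

diff = [0.0, 1.0, 2.0, 3.0]          # hypothetical difference-scale values
rev = [max(diff) - x for x in diff]  # reversal: a negative linear transform

# Equal differences remain equal under the reversal.
assert abs((diff[1] - diff[0]) - (rev[0] - rev[1])) < 1e-12
```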

The empirical results based on quantitative judgment are thus similar to the formal limitations of the variability methods. Each set of procedures yields scales on which either differences or ratios, but not both, are empirically meaningful. Additional procedures seem necessary to obtain a scale where both relations have an empirical basis. One such procedure has been devised by Cliff (1959).

## Individual difference models

The individual difference, or responder, models use subjects’ responses to stimuli as a way of establishing scales for attributes on which the subjects (responders) themselves differ. The models are based upon individual differences; an ultimate aim is to assign scale values to the subjects to represent amount or degree of the attribute possessed by each. Stimuli may also be assigned scale values on a parallel attribute, or they may be assigned parameter values relating them in some more general manner to the attribute of interest. Individual difference models are therefore those in which a response to a stimulus is interpreted as a function of both the scale value of the responder and the parameter values of the stimulus.

The relation between the task set for the subject and the attribute continuum differs markedly between the stimulus models and the individual difference models. In the stimulus models, the task is a particularized one—the subject evaluates stimuli with respect to the attribute being measured. In the individual difference models, the task itself has no obvious relation to the underlying attribute. In the categorical response methods, the subject simply indicates whether or not (or the extent to which) he agrees with, likes, or can pass a particular item. In the comparative response methods, he indicates which item he agrees with most, or which he prefers. In neither case, however, is the attribute of interest one of agreement or preference. It is, rather, a latent or underlying attribute that serves to explain or account for the variation in the responses of the subjects to the stimulus items.

A concrete example might serve to clarify the distinction. Suppose we are given a set of samples of coffee that differ only in the amount of sugar that has been added. The samples are presented, two at a time, to a population of subjects. Each subject chooses the member of each pair that he prefers. If a stimulus model were successfully applied to the data, the result would be a scale of preferability of the samples, ranging from the least preferred to the most preferred for that population of subjects. But if an individual differences model were successfully applied to the data, the result would be a scale of sweetness, with the stimuli ranging from least sweet to most sweet and each subject characterized by a scale value denoting the degree of sweetness that he considers ideal. The subjects’ responses do not tell us, of course, that the underlying variable is degree of sweetness. If the coffee samples had varied in strength or in amount of cream added rather than in sweetness, their responses would still have been of exactly the same form. In general, the individual differences models are explicitly concerned solely with the question of whether or not the responses of a given group of subjects to a given set of stimuli can be accounted for by a single underlying attribute. They do not tell one how to select the stimuli in the first place or, given that the stimuli form a scale, how the underlying attribute should be interpreted.

Individual differences models can be classified in a number of ways. One important difference, which will be used to organize the remainder of this section, is the distinction between the simple deterministic models and the more elaborate probabilistic models. Deterministic models are developed in terms of the ideal case; there is no provision in the model itself for unsystematic variance. Since the ideal case virtually never occurs, the empirical question is one of the extent to which the ideal approximates the actual behavior of the subjects. The probabilistic models represent more realistic attempts to account for the behavior of the subjects, provide explicitly for unsystematic variance, and, at least theoretically, lend themselves to statistical evaluation of goodness of fit.

**Simple deterministic models.** Three general models will be considered here: Coombs’s distance models for preferential choice data, Guttman’s scalogram model for categorical responses to monotone items, and the scalogram models for categorical responses to point items.

*Distance models.* In Coombs’s distance models, both subjects and stimuli are represented as points along an underlying, latent continuum. According to the general model, the subject’s preferential choices among stimuli are determined by the relative distances of the stimuli from that subject. A stimulus located at the same point as the subject represents his ideal, the one he would prefer above all others. As distance from the subject increases, the desirability of the stimulus to that subject decreases. Subject A thus will prefer stimulus X to stimulus Y when the distance d_{AX} is less than the distance d_{AY} on the underlying continuum.

Any of a large number of experimental procedures might be used to obtain a preferential ranking of the stimuli—called a qualitative I-scale—for each subject. The analytical problem is to determine whether the set of I-scales could have been derived from distances between subjects and stimuli on a common underlying attribute, and, if so, to construct the joint scale—or J-scale—giving the required locations of stimuli and subjects. Coombs’s unfolding technique (1950; 1964) is a procedure for accomplishing this goal. The end result is a scale which orders the stimuli, orders the midpoints between pairs of stimuli, and locates the subjects in segments of the scale defined by the interval between adjacent stimulus midpoints.

The order of the midpoints provides information on the relative size of the intervals between some, but not all, of the stimuli. For example, given four stimuli in order ABCD, the midpoint between B and C precedes that between A and D, when the distance d_{CD} > d_{AB}. Scales such as this, which provide not only an order for the elements themselves but also an order for some of the distances between elements, are called ordered metric scales.
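The way qualitative I-scales "fold" out of a common J-scale can be sketched with hypothetical stimulus positions (the actual unfolding technique works in the reverse direction, from observed I-scales back to the J-scale):

```python
stimuli = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 8.0}  # hypothetical J-scale

def i_scale(ideal):
    # A subject's qualitative I-scale: stimuli ranked by distance from the ideal.
    return sorted(stimuli, key=lambda s: abs(stimuli[s] - ideal))

# Subjects at opposite ends "fold" the joint scale in opposite directions.
assert i_scale(0.5) == ["A", "B", "C", "D"]
assert i_scale(8.5) == ["D", "C", "B", "A"]

# A subject at 3.1 lies past the B-C midpoint (3.0) but before the A-D
# midpoint (4.5), so C is preferred to B and A to D, since d_CD > d_AB.
assert i_scale(3.1) == ["C", "B", "A", "D"]
```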

With suitable modifications (Coombs 1953), the general approach described above can also be applied to data obtained from the “order k/n” and “pick k/n” methods. In the “order k/n” methods, the subject rank-orders only the first k out of the n stimuli presented to him, and in the “pick k/n” methods, he simply chooses the k out of n stimuli he prefers most. Procedures for ordering the stimuli and for locating the subjects in ordered classes are given in Coombs (1953) and Torgerson (1958).

*Monotone and point scalogram models.* The scalogram models for categorical data are concerned with subjects and items, where an item is defined as any stimulus or procedure that partitions the set of subjects into two or more mutually exclusive categories. In these models, both subjects and category boundaries can be considered as points on an underlying continuum. The scaling problem is to determine whether the subjects and category boundaries can be positioned along the continuum so that all subjects are located within the appropriate categories of all items.

Three types of ideal items are of interest: the monotone multicategory item, the monotone dichotomous item, and the nonmonotone, or point, item.

A *monotone item* divides the underlying continuum into as many segments as there are response categories, with each category corresponding to one and only one segment. The following multicategory item provides an example of the general form:

My weight is (*a*) between 150 and 175 pounds; (*b*) 150 pounds or less; (*c*) 175 pounds or more.

*Dichotomous monotone* items provide only two response alternatives. For example,

I weigh over 150 pounds. True_____

False_____

Different items ordinarily partition the underlying continuum in different places. In general, n monotone items, with a total of m categories in all, partition the underlying continuum into m — n + 1 segments.

Guttman’s general scalogram procedure (see Stouffer et al. 1950) is a routine for determining whether the n items (*a*) are monotonic and (*b*) can be considered as partitioning the same underlying attribute for the given population of subjects. If so, the final result is, first, a rank order of the item category boundaries on the underlying continuum and, second, the locations of subjects in the segments defined by adjacent category boundaries. The general procedure can be used with any combination of multicategory and dichotomous monotone items. Somewhat simpler procedures are available if all items are dichotomous.
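For dichotomous items, the ideal scalogram pattern is easy to state: once items are ordered from most to least frequently passed, every subject’s response row must consist of 1s followed by 0s. The following sketch checks that condition; `is_guttman_scale` is our own illustrative helper, not Guttman’s actual routine.

```python
def is_guttman_scale(rows):
    # Order items from most to least frequently passed; in a perfect scale a
    # subject who passes an item also passes every "easier" one.
    n_pass = [sum(col) for col in zip(*rows)]
    order = sorted(range(len(n_pass)), key=lambda j: -n_pass[j])
    for row in rows:
        r = [row[j] for j in order]
        if any(a < b for a, b in zip(r, r[1:])):  # a 0 followed by a 1
            return False
    return True

perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
assert is_guttman_scale(perfect)
assert not is_guttman_scale([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
```

A subject’s scale position is then given simply by the number of items passed, since that number identifies the segment between adjacent category boundaries.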

*Nonmonotone*, or point, items also partition the underlying continuum into segments, but not in a one-to-one fashion. An example of one form of nonmonotone item is

My weight is between 150 and 175 pounds.

True _____

False_____

The positive response to this type of nonmonotone item corresponds to a single segment—in this example, the range from 150 to 175 pounds. But a negative response corresponds to two segments: subjects weighing over 175 pounds and those weighing under 150 pounds both respond negatively.

In order to evaluate the scale hypothesis and to obtain an ordinal scale, it is necessary for the segments corresponding to the positive alternatives to overlap. If no overlap occurs, the data contain information sufficient only to locate the subjects into a set of unordered categories. The procedures for uncovering the underlying scale for nonmonotone items are different from, but conceptually similar to, those appropriate for monotone items.

An alternative ideal model for nonmonotone items treats the positive category of the item as a point and represents each subject by a segment of the underlying continuum. An example of an item which might fit such a model better would be

I weigh about 150 pounds. Yes_____

No_____

Here, subjects might differ with respect to their ideas of the range encompassed by the term “about.” Since the two models are formally equivalent— they differ only in the interchange of the role of subject and item—the same analytical routines are applicable.

**Probabilistic models: the general theory.** Most of the models developed for measuring individual difference attributes can be considered within the framework of a single general probabilistic model whose conceptual basis is essentially that of Lazarsfeld’s latent structure analysis (1954; see also Stouffer et al. 1950). In this framework, the attribute to be measured is represented formally by an underlying or latent space of one or more dimensions, subjects are represented by points in the latent space, and the stimuli are treated as variables. Stimuli may be psychological test items, attitude statements, rating scales, or, in general, anything that classifies the subjects into two or more exclusive and exhaustive categories. The observations, or manifest data, thus consist of an N by n matrix of the scores of the N subjects on the n stimuli. The stimulus variables are characterized in the general model by the parameters of the equation describing the regression of that variable on the underlying latent attribute.

An important general notion is the principle of local independence. Since each stimulus variable is related to the underlying attribute, the stimulus variables will be related to each other in some fashion. The principle of local independence requires that the systematic relations among the stimuli be accounted for entirely by the separate relations of each stimulus variable to the underlying attribute. Thus, at any given point in the latent space, the stimuli are all mutually independent.
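The principle can be made concrete with a two-class sketch (all parameter values are hypothetical): within each class two items are independent, yet marginally they are associated, because both are related to class membership.

```python
pi = {"low": 0.5, "high": 0.5}   # hypothetical class proportions
v_j = {"low": 0.2, "high": 0.8}  # P(item j positive | class)
v_k = {"low": 0.3, "high": 0.9}  # P(item k positive | class)

p_j = sum(pi[s] * v_j[s] for s in pi)
p_k = sum(pi[s] * v_k[s] for s in pi)
# Local independence: within a class the joint probability factors.
p_jk = sum(pi[s] * v_j[s] * v_k[s] for s in pi)

# Marginally, the two items are nevertheless positively associated.
assert p_jk > p_j * p_k
```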

The over-all theory or conceptual framework clearly is completely general. In order to turn it into a testable model—one which places restrictions on the data—additional postulates or conditions are necessary. Three important considerations are the nature of the manifest data required, the nature of the postulated latent attribute, and the form of the regression equation relating the stimulus variables to the latent attribute. A particular model or testable specialization of the general theory depends on making a decision about each of the three. [*See* Latent structure.]

Three levels or types of manifest response data are of major concern here, but obviously others are possible: *dichotomous items*—the stimulus variable classifies the subject into one of two categories; *ordered classes*—the stimulus variable classifies the subject into one of several ordered categories; *interval scale*—the stimulus variable provides an interval scale value for the subjects.

Three restrictive assumptions that can be placed on the postulated latent attribute are the following: the subjects are assumed to be located only at a relatively small number of points or regions in the space; the underlying attribute is assumed to be unidimensional; and the underlying attribute is assumed to be a multidimensional Euclidean space.

Four types of regression equations that relate the stimulus variables to the latent attribute are the various step functions, the linear equation, the normal ogive equation, and the logistic equation.

The models also differ in whether or not they provide for or allow variance about the regression line or surface. The distinction here is essentially that made earlier between deterministic and probabilistic models. The deterministic models for categorical data, which make no provision for extraneous variance, can be considered as limiting cases for certain of the probabilistic versions.

**Some specializations of the general theory.** All of the models described in this section can be treated as specialized instances of general regression theory. In all cases, subjects can be considered as points in an underlying attribute space, stimuli as manifest variables related to the underlying space by a regression equation, and interrelations between manifest variables in terms of their separate relations to the underlying attribute space. The models differ in the kind of data required, the kind of space allowed, the form of the postulated regression equation, and in whether or not explicit provision is made for variance about the postulated regression surface. Table 1 provides a summary of the characteristics of and differences between a number of the more common models.

*Lazarsfeld’s latent class model.* In Lazarsfeld’s latent class model the stimuli (items) are dichotomous and are scored either as 0 or as 1. The latent space is nominal—subjects are located at discrete

Table 1 — *Characteristics of some individual difference models that fall within the general regression approach.* The first three models are deterministic; the remaining eight are probabilistic.

| | General scalogram | Dichotomous scalogram | Multidimensional extensions of scalogram | Latent class | Latent profile | Latent distance | Linear | Two factor | Multiple factor analysis | Normal ogive | Logistic |
|---|---|---|---|---|---|---|---|---|---|---|---|
| **Observable** | | | | | | | | | | | |
| Dichotomous | | × | × | × | | × | × | | | × | × |
| Ordered classes | × | | | | | | | | | | |
| Interval | | | | | × | | | × | × | | |
| **Latent space** | | | | | | | | | | | |
| Points | | | | × | × | | | | | | |
| Unidimensional | × | × | | | | × | × | × | | × | × |
| Multidimensional | | | × | | | | | | × | | |
| **Regression** | | | | | | | | | | | |
| Step function | × | × | × | × | × | × | | | | | |
| Linear | | | | | | | × | × | × | | |
| Normal ogive | | | | | | | | | | × | |
| Logistic | | | | | | | | | | | × |
| **Variance about regression** | | | | | | | | | | | |
| Yes | | | | × | × | × | × | × | × | × | × |
| No | × | × | × | | | | | | | | |

points or, equivalently, along arbitrarily ordered intervals of a unidimensional continuum. The regression equation is then a step function, and provision is made for variance about the regression line.

If the classes are ordered arbitrarily, the height of the regression line for an item j, over the interval defined by class s, is the average score of the subjects in class s on that item. Since only scores of 0 and 1 are used, the height of the regression line for class s is also the probability that a person in class s will answer item j positively. This probability is denoted by v_{js}. The unknowns to be determined are the set of probabilities v_{js} and the proportion of subjects in each class, n_{s}.

The observational information can be expressed by the following set of manifest proportions:

p_{j} = the proportion of subjects in the entire sample answering item *j* positively;

p_{jk} = the proportion of subjects answering both item *j* and item *k* positively;

p_{jkl} = the proportion answering items *j, k*, and *l* positively; and so on.

Lazarsfeld has shown that this information can be incorporated into a set of accounting equations, and several procedures have been devised for solving such equations (Green 1951; Anderson 1954). The solutions themselves do not directly indicate the class membership of individual subjects but, rather, give only the proportion of subjects in a class. Additional procedures are needed to assign subjects to classes.
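The structure of the accounting equations, and the additional step of assigning a subject to a class from his response pattern, can be sketched for a two-class model (all parameter values are hypothetical; the assignment rule shown, choosing the class with the larger posterior probability, is one common choice, not a procedure prescribed in the text):

```python
pi = [0.4, 0.6]            # class proportions n_s (hypothetical)
v = [[0.9, 0.8, 0.7],      # v_js: P(item j positive | class 0), j = 0, 1, 2
     [0.2, 0.3, 0.1]]      # v_js for class 1

# Accounting equations for the manifest proportions:
p_j = [sum(pi[s] * v[s][j] for s in range(2)) for j in range(3)]
p_01 = sum(pi[s] * v[s][0] * v[s][1] for s in range(2))

def pattern_prob(s, pattern):
    # Joint probability of a full response pattern within class s
    # (local independence lets the item probabilities multiply).
    prob = pi[s]
    for j, resp in enumerate(pattern):
        prob *= v[s][j] if resp else (1 - v[s][j])
    return prob

# The additional step: assign a subject with pattern (1, 1, 0) to the class
# with the largest posterior probability.
best = max(range(2), key=lambda s: pattern_prob(s, (1, 1, 0)))
```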

*Gibson’s latent profile model.* Gibson’s latent profile model (1959), which differs from Lazarsfeld’s only in the type of data required, is designed for manifest responses measured on an interval scale. Since the equations for latent profile analysis are formally identical to those of latent class analysis, no new analytical procedures are needed for their solution.

*Lazarsfeld’s continuous models.* In Lazarsfeld’s continuous models the stimuli are dichotomous items scored either as 0 or as 1, but the latent space is generally assumed to be a unidimensional continuum, although multidimensional extensions have also been suggested. Hence, the several continuous models proposed by Lazarsfeld differ from each other only in the form of the regression line. Where x = the latent variable, f_{j}(x) = the regression of item j on x, and g(x) = the density function of the individuals, Lazarsfeld provides the following set of accounting equations:

p_{j} = ∫ f_{j}(x)g(x) dx,

p_{jk} = ∫ f_{j}(x)f_{k}(x)g(x) dx,

p_{jkl} = ∫ f_{j}(x)f_{k}(x)f_{l}(x)g(x) dx,

and so on.
In the latent distance model, the regression, or trace line, for item i is assumed to be a step function of the form shown in Figure 2. It is assumed that the probability of a positive response is some constant value (a_{i} − b_{i}) up to a particular point x_{i}, and at this point it jumps to another value (a_{i} + b_{i}) and remains constant thereafter. The equation for the regression can be written

f_{i}(x) = a_{i} − b_{i}, for x ≤ x_{i};

f_{i}(x) = a_{i} + b_{i}, for x > x_{i}.

Each item is thus represented by three parameters: a_{i}, b_{i}, and a break point, x_{i}.

It is a general rule that step-function regression lines provide no metric information about the shape of the underlying distribution of subjects. Any monotonic transformation of the underlying continuum leaves the form of such regression lines unchanged; hence, the final scale is ordinal. Solutions for the latent distance model have been provided by Hays and Borgatta (1954).

It is interesting to note that if the parameters a_{i} and b_{i} are both assumed to have values of ½ for all items, the latent distance model reduces to Guttman’s scalogram model for dichotomous items.
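The trace line and its deterministic limiting case can be sketched directly (parameter values are hypothetical):

```python
def trace(x, a, b, x_break):
    # Latent distance trace line: a - b below the break point, a + b above it.
    return a - b if x <= x_break else a + b

# A probabilistic item: positive responses occur with probability 0.2 below
# the break point and 0.8 above it.
assert abs(trace(0.0, a=0.5, b=0.3, x_break=1.0) - 0.2) < 1e-12
assert abs(trace(2.0, a=0.5, b=0.3, x_break=1.0) - 0.8) < 1e-12

# With a = b = 1/2 the probabilities become 0 and 1: Guttman's deterministic
# scalogram item is the limiting case.
assert trace(0.0, a=0.5, b=0.5, x_break=1.0) == 0.0
assert trace(2.0, a=0.5, b=0.5, x_break=1.0) == 1.0
```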

Lazarsfeld’s linear model assumes that the regression lines can be represented by an equation of the form

*f*_{i}(*x*) = *a*_{i} + *b*_{i}*x*,

where 0 ≤ f_{i}(x) ≤ 1 for all items over the region of the continuum containing the distribution of subjects. Solutions for the item parameters a_{i} and b_{i} are available (see Torgerson 1958).

*Spearman’s two-factor model.* Spearman’s two-factor model (1927) differs from Lazarsfeld’s linear model only in the form of the manifest data. When it is assumed that the stimulus variables are measured on an interval scale and standardized with a mean of zero and unit variance, then one can write the equation

s_{ij} = c_{ig}a_{jg} + e_{ij},

where s_{ij} = score of subject *i* on stimulus *j*, c_{ig} = score of subject *i* on the general factor *g*, a_{jg} = loading of stimulus *j* on the general factor, and e_{ij} represents variance due to unique factors and to error.

The regression line can be written simply as

f_{j}(x) = a_{j}x,

which differs from Lazarsfeld’s linear model only by the lack of an additive constant.
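A well-known consequence of the one-factor structure is that, with uncorrelated unique factors, the correlation between any two standardized tests is the product of their loadings, so Spearman’s "tetrad differences" vanish. A sketch with hypothetical loadings:

```python
loadings = {"j": 0.8, "k": 0.6, "l": 0.5, "m": 0.4}  # hypothetical loadings

def corr(a, b):
    # With a single common factor and uncorrelated unique factors, the
    # correlation between two distinct tests is the product of their loadings.
    return loadings[a] * loadings[b]

# The tetrad difference vanishes under the one-factor structure.
tetrad = corr("j", "k") * corr("l", "m") - corr("j", "l") * corr("k", "m")
assert abs(tetrad) < 1e-12
```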

*Multiple factor analysis*. The standard procedures of multiple factor analysis require manifest data on interval scales and differ from Spearman’s model by the substitution of a multidimensional latent space for the unidimensional general factor. The basic equation for the score of subject i on stimulus j becomes

s_{ij} = Σ_{m} c_{im}a_{jm} + e_{ij},

where m = 1, 2, …, M is an index for dimensions. The corresponding regression function becomes

f_{j}(x_{1}, x_{2}, …, x_{M}) = Σ_{m} a_{jm}x_{m}.

[*See* Factor analysis; *the biography of* Spearman.]

*Normal ogive and logistic models.* Normal ogive and logistic models require dichotomous items scored either as 0 or as 1, and assume a unidimensional latent attribute. They differ from the latent distance and linear models in their assumption of regression lines that seem intuitively to be more reasonable, particularly for measuring aptitudes and abilities. Models using the normal ogive trace lines have been developed by Lord (1953) and Tucker (1952). Birnbaum (1957) and Rasch (1960) have worked with the logistic function.

Warren S. Torgerson

[*Directly related are the entries* Latent structure; Mathematics; Probability; Psychometrics. *Other relevant material may be found in* Factor analysis; *the articles listed under* Measurement; Psychophysics; Quantal response; *and in the biographies of* Fechner; Spearman; Thurstone; Weber, Ernst Heinrich.]

## BIBLIOGRAPHY

Anderson, T. W. 1954 On Estimation of Parameters in Latent Structure Analysis. Psychometrika 19:1-10.

Attneave, Fred 1949 A Method of Graded Dichotomies for the Scaling of Judgments. Psychological Review 56:334-340.

Birnbaum, A. (1957) 1958 Probability and Statistics in Item Analysis and Classification Problems. Randolph Air Force Base, Texas: U.S. Air Force School of Aviation Medicine.

Bradley, Ralph A. 1954 Incomplete Block Rank Analysis: On the Appropriateness of the Model for a Method of Paired Comparisons. Biometrics 10:375-390.

Bradley, Ralph A.; and Terry, Milton E. 1952 Rank Analysis of Incomplete Block Designs. I. The Method of Paired Comparisons. Biometrika 39:324-345.

Clarke, Frank R. 1957 Constant-ratio Rule for Confusion Matrices in Speech Communication. Journal of the Acoustical Society of America 29:715-720.

Cliff, Norman 1959 Adverbs as Multipliers. Psychological Review 66:27-44.

Coombs, Clyde H. 1950 Psychological Scaling Without a Unit of Measurement. Psychological Review 57:145-158.

Coombs, Clyde H. 1953 Theory and Methods of Social Measurement. Pages 471-535 in Leon Festinger and Daniel Katz (editors), Research Methods in the Behavioral Sciences. New York: Dryden.

Coombs, Clyde H. 1964 A Theory of Data. New York: Wiley.

Edwards, Allen L.; and Thurstone, L. L. 1952 An Internal Consistency Check for Scale Values Determined by the Method of Successive Intervals. Psychometrika 17:169-180.

Garner, Wendell R.; and Hake, Harold W. 1951 The Amount of Information in Absolute Judgments. Psychological Review 58:446-459.

Gibson, W. A. 1959 Three Multivariate Models: Factor Analysis, Latent Structure Analysis, and Latent Profile Analysis. Psychometrika 24:229-252.

Green, Bert F. JR. 1951 A General Solution for the Latent Class Model of Latent Structure Analysis. Psychometrika 16:151-166.

Gulliksen, Harold 1954 A Least Squares Solution for Successive Intervals Assuming Unequal Standard Deviations. Psychometrika 19:117-139.

Hays, David G.; and Borgatta, Edgar F. 1954 An Empirical Comparison of Restricted and General Latent Distance Analysis. Psychometrika 19:271-279.

Lazarsfeld, Paul F. 1954 A Conceptual Introduction to Latent Structure Analysis. Pages 349-387 in Paul F. Lazarsfeld (editor), Mathematical Thinking in the Social Sciences. Glencoe, Ill.: Free Press.

Lord, Frederic M. 1953 An Application of Confidence Intervals and of Maximum Likelihood to the Estimation of an Examinee’s Ability. Psychometrika 18:57-76.

Luce, R. Duncan 1959 Individual Choice Behavior: A Theoretical Analysis. New York: Wiley.

Mosteller, Frederick 1951 Remarks on the Method of Paired Comparisons. Part 3: A Test of Significance for Paired Comparisons When Equal Standard Deviations and Equal Correlations Are Assumed. Psychometrika 16:207-218.

Rasch, Georg W. 1960 Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Danish Institute for Educational Research.

Saffir, Milton A. 1937 A Comparative Study of Scales Constructed by Three Psychophysical Methods. Psychometrika 2:179-198.

Spearman, Charles E. 1927 The Abilities of Man: Their Nature and Measurement. London: Macmillan.

Stevens, S. S. 1966 A Metric for the Social Consensus. Science New Series 151:530-541.

Stevens, S. S.; and Galanter, E. H. 1957 Ratio Scales and Category Scales for a Dozen Perceptual Continua. Journal of Experimental Psychology 54:377-411.

Stouffer, Samuel A. et al. 1950 Measurement and Prediction. Studies in Social Psychology in World War II, Vol. 4. Princeton Univ. Press. See especially the chapters by Louis Guttman and by Paul F. Lazarsfeld.

Suppes, Patrick; and Zinnes, Joseph L. 1963 Basic Measurement Theory. Volume 1, pages 1-76 in R. Duncan Luce, Robert R. Bush and Eugene Galanter (editors), Handbook of Mathematical Psychology. New York: Wiley.

Thurstone, L. L. 1927 A Law of Comparative Judgment. Psychological Review 34:273-286.

Thurstone, L. L.; and Chave, Ernest J. (1929) 1937 The Measurement of Attitude: A Psychophysical Method and Some Experiments With a Scale for Measuring Attitude Toward the Church. Univ. of Chicago Press.

Torgerson, Warren S. 1958 Theory and Methods of Scaling. New York: Wiley.

Torgerson, Warren S. 1961 Distances and Ratios in Psychophysical Scaling. Acta Psychologica 19:201-205.

Tucker, L. R. 1952 A Level of Proficiency Scale for a Unidimensional Skill. American Psychologist 7:408 only.

## Scaling

# Scaling

Scaling can be defined as the structural and functional consequences of a change in size or scale among similarly organized animals. To examine what "consequences of a change in size" means, consider what would happen if one scaled up a cockroach simply by expanding it by a factor of 100 in each of its three dimensions. Its mass, which depends on volume, would increase by a factor of 1 million (100 × 100 × 100). The ability of its legs to support that mass, however, depends on the cross-sectional area of the legs, which has increased by a factor of only ten thousand (100 × 100). Similarly, its ability to take in oxygen through its outer surface would also grow only by a factor of ten thousand, since this too is a function of surface area. This disparity between the rapid growth in volume and the slower growth in surface area means the super-sized cockroach would be completely unable to support its weight or to acquire enough oxygen for its greater body mass.

The consequences of body size for the physiology, ecology, and even behavior of animals can be appreciated by examining in more detail differences in function between organisms of widely different sizes. For example, consider that a 4-ton elephant weighs about 1 million times more than a 4-gram shrew, and that the shrew consumes enough food daily to equal about 50 percent of its body weight. Imagine what the daily food consumption of 1 million shrews would be (2 tons of food), and then note that the elephant actually consumes only about 100 pounds of food per day. From this example it is obvious that daily food requirements do *not* scale directly with body mass. In fact, most body processes scale to some power of body mass, and that power is rarely exactly 1.0.

## Allometric Analysis

How can one determine the relationship of body processes to body mass? The best technique for uncovering the relationship is to plot one variable (for example, food requirements or metabolic rate) against body mass for groups of similar animals (for example, all mammals, or even more specifically, carnivorous mammals). Such a plot is the basis of an X-Y regression. A statistical technique called least-squares regression, usually applied after converting both axes to logarithms so that a power law becomes a straight line, gives an equation that best fits the data. The equation for scaling of any variable to body mass is *Y = aW^b*, where *Y* is the variable to be determined, *W* is the animal's body mass (or weight), and *a* and *b* are constants derived empirically from the regression. The exponent *b* is of particular interest, since it captures the scaling relationship in nonlinear relations, such as that between **metabolism** and body mass. This mathematical technique is called allometric analysis. Allometric analysis can be used to predict the capacity or requirements of an unstudied animal, one that might be too rare to collect or too difficult to maintain in captivity for study.
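The log-log fitting step can be sketched briefly in Python with NumPy. The mass and metabolic-rate values below are fabricated from a known power law purely for illustration (they are not measurements); the point is that ordinary least squares on the logarithms recovers the constants *a* and *b*.

```python
import numpy as np

# Hypothetical data generated from a known power law, Y = 0.02 * W**0.75.
mass = np.array([10.0, 100.0, 1000.0, 10000.0, 100000.0])  # body mass (g)
rate = 0.02 * mass**0.75                                   # "metabolic rate"

# A power law Y = a * W**b is linear in log-log space:
#   log Y = log a + b * log W
# so a degree-1 least-squares fit yields the slope b and intercept log a.
b, log_a = np.polyfit(np.log(mass), np.log(rate), 1)
a = np.exp(log_a)

print(round(a, 3), round(b, 3))  # -> 0.02 0.75
```

With real data the recovered exponent would carry scatter, and the slope's confidence interval indicates how firmly the scaling relationship is established.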

## Metabolism

Using this technique, several interesting relationships between animal structure and function have been uncovered. Among the best studied is the relationship between animal metabolism and body mass, introduced above, in which *M* (metabolism) scales to the 0.75 power of body mass (*M = aW^{0.75}*). This means that while the total daily energy needs of a large animal are greater than those of a small animal, the energy requirement *per gram* of animal (mass-specific metabolism) is much greater for a small animal than for a large one. Why should this be the case? For birds and mammals, which maintain a constant body temperature by producing heat, the increased mass-specific metabolism of smaller animals was once thought to be a product of greater heat loss from their proportionately larger surface area-to-volume ratio. However, the same mathematical relationship between metabolism and body mass has been found to hold for all animals studied, and even for unicellular organisms. The relationship of metabolism to body size therefore seems to represent a general biological rule, one whose basis eludes scientific explanation at this time.
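The mass-specific claim can be checked with the article's own numbers. Dividing *M = aW^{0.75}* by *W* gives mass-specific metabolism *M/W = aW^{-0.25}*, and the coefficient *a* cancels when two animals are compared, so only the exponent matters:

```python
# Mass-specific metabolism: M/W = a * W**(0.75 - 1) = a * W**(-0.25).
# Comparing the article's 4 g shrew with its 4-ton (4e6 g) elephant;
# the coefficient a cancels in the ratio.
shrew_g, elephant_g = 4.0, 4.0e6

ratio = (shrew_g / elephant_g) ** -0.25
print(round(ratio, 1))  # -> 31.6: each gram of shrew burns ~32x more energy
```

This factor of roughly 32 per gram, not 1 million in total, is why the elephant needs about 100 pounds of food rather than the 2 tons that a million shrews would eat.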

Allometric analysis has shown that different body processes, involving different organs, scale with different exponents of body mass. For example, blood volume, heart weight, and lung volume all scale almost directly with body mass (exponent = 0.99–1.02). Thus, the oxygen delivery system (heart and lungs) is directly proportional to body mass, even though the metabolism, and thus oxygen requirements, of the body scale with body mass to the 0.75 power. If hearts are proportionately the same size in large and small animals, but mass-specific oxygen requirements are higher in small animals, then hearts in small animals must pump faster to deliver the greater quantity of oxygenated blood per gram of tissue. Similarly, lung ventilation rates of smaller animals must be higher than those of larger animals. Both predictions have been borne out by measurement, confirming the conclusions drawn from the allometric analysis.
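The heart-rate prediction follows directly from the two exponents: oxygen demand scales roughly as *W^{0.75}* while heart size (and hence volume pumped per beat) scales roughly as *W^{1.0}*, so beat frequency must scale roughly as *W^{-0.25}*. A sketch using an assumed 25 g mouse (the mouse mass is illustrative, not from the article) against the article's 4-ton elephant:

```python
# Oxygen demand ~ W**0.75 and stroke volume ~ W**1.0, so heart rate
# must scale as W**0.75 / W**1.0 = W**(-0.25). The leading coefficient
# cancels when two animals are compared. Mouse mass is illustrative.
mouse_g, elephant_g = 25.0, 4.0e6  # 25 g mouse vs 4-ton elephant

rate_ratio = (mouse_g / elephant_g) ** -0.25
print(round(rate_ratio))  # -> 20: the mouse heart must beat ~20x faster
```

The same reasoning applies to lung ventilation rate, since lung volume also scales almost directly with body mass.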

## Locomotion

The energy requirement for locomotion also scales with body size, in much the same way that metabolism does. But here another factor comes into play: the type of locomotion. It is obvious that locomotion is much more energetically expensive than sitting still, but are some types of locomotion more expensive than others? Let's compare running, swimming, and flying. In plotting the cost of running versus body mass, one notes that metabolic cost increases directly as a function of mass. What about swimming and flying?
Again, cost increases with mass, but the regression lines for these allometric analyses exhibit different slopes than the one for runners. As might be expected, the cost (per kilometer per gram of animal) is lowest for swimmers, whose body mass is supported by buoyancy; next highest for flyers, whose body mass is partially supported by the air; and highest for runners, who lose energy to friction with the ground. While water is more **viscous** to move through than air, swimmers (especially fish) have streamlined bodies that reduce frictional drag and thus reduce cost.
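The comparison can be sketched in the same power-law form used throughout, *cost = aW^b*. The coefficients and exponent below are hypothetical placeholders chosen only to mirror the qualitative ordering described in the text (swimmer < flyer < runner), not measured values:

```python
# Illustrative mass-specific transport costs, COT = a * W**b
# (cost per gram per kilometer, arbitrary units). All numbers below
# are made up for illustration; only the ordering reflects the text.
def transport_cost(mass_g, a, b):
    """Cost to move one gram of animal one kilometer."""
    return a * mass_g**b

# Hypothetical (a, b) per mode; the negative exponent means larger
# animals travel more cheaply per gram in every mode.
modes = {"swimmer": (1.0, -0.3), "flyer": (3.0, -0.3), "runner": (9.0, -0.3)}

for mode, (a, b) in modes.items():
    print(mode, round(transport_cost(100.0, a, b), 3))
```

Real regressions also differ in slope between modes, so the lines are not simply parallel as in this simplified sketch.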

Allometric analysis helps explain why animals can only get so large or so small. Limits placed on structural support, amount of gut surface area required to process the required energy per day, and cost of locomotion become limiting factors for large animals. High surface area-to-volume ratios, high metabolic costs of existence, and limits on the speed of diffusion and cell surface area become limiting factors for small animals. Thus, animal structural design has functional implications that determine physiological processes and ultimately the ability to exist under specific ecological constraints.

**see also** Circulatory Systems; Flight; Gas Exchange; Physiological Ecology; Temperature Regulation

*Susan Chaplin*

## Bibliography

Peters, Robert H. *The Ecological Implications of Body Size.* Cambridge: Cambridge University Press, 1983.

Schmidt-Nielsen, Knut. *How Animals Work.* Cambridge: Cambridge University Press, 1972.

## scaling

**scaling** The adjustment of values to be used in a computation so that they and their resultant are within the range that can be handled by the process or equipment. The scaling factor is reapplied to correct the result before output or, if this is not possible, is output as a qualifier with the result.
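A minimal sketch of the idea, assuming a process limited to 16-bit signed arithmetic (the range limit and function name are illustrative): input values are pre-divided by a power-of-two scaling factor so the intermediate sum stays in range, and the factor is reapplied to correct the result before output.

```python
# Illustrative range limit: largest value for 16-bit signed arithmetic.
RANGE_MAX = 32767

def scaled_sum(values):
    """Sum values whose total may exceed RANGE_MAX, via scaling."""
    # Choose a power-of-two factor so the worst-case total fits in range.
    factor = 1
    while max(abs(v) for v in values) * len(values) // factor > RANGE_MAX:
        factor *= 2
    total = sum(v // factor for v in values)  # intermediate stays in range
    return total * factor                     # scaling factor reapplied

print(scaled_sum([30000, 30000, 30000]))  # -> 90000
```

Note that the pre-division discards low-order bits, so the corrected result is approximate in general; choosing the smallest adequate factor minimizes that loss.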