
# Variances, Statistical Study of

General approaches to the study of variability

Parameters describing dispersion

Statistical inference

BIBLIOGRAPHY

This article discusses statistical procedures related to the dispersion, or variability, of observations. Many such procedures center on the variance as a measure of dispersion, but there are other parameters measuring dispersion, and the most important of these are also considered here. This article treats motivation for studying dispersion, parameters describing dispersion, and estimation and testing methods for these parameters.

Some synonyms or near synonyms for “variability” or “dispersion” are “diversity,” “spread,” “heterogeneity,” and “variation.” “Entropy” is often classed with these.

Why study variability?

In many contexts interest is focused on variability, with questions of central tendency of secondary importance—or of no importance at all. The following are examples from several disciplines illustrating the interest in variability.

Economics. The inequality in wealth and income has long been a subject of study. Yntema (1933) uses eight different parameters to describe this particular variability; Bowman (1956) emphasizes curves as a tool of description [cf. Wold 1935 for a discussion of Gini’s concentration curve and Kolmogorov 1958-1960 for Lévy’s function of concentration; see also Income distribution].

Industry. The variability of industrial products usually must be small, if only in order that the products may fit as components into a larger system or that they may meet the consumer’s demands; the methods of quality control serve to keep this variability (and possible trends with time) in check. [An elementary survey is Dudding 1952; more modern methods are presented in Keen & Page 1953; Page 1962; 1963; see also Quality control, Statistical.]

Psychology. Two groups of children, selected at random from a given grade, were given a reasoning test under different amounts of competitive stress; the group under higher stress had the larger variation in performance. (The competitive atmosphere stimulated the brighter children and stunted the not-so-bright ones; see Hays 1963, p. 351. For other examples, see Siegel 1956, p. 148; Maxwell 1960; Hirsch 1961, p. 478.)

## General approaches to the study of variability

The simplest approach to the statistical study of variability consists in computing the sample value of some statistic relating to dispersion [see Statistics, Descriptive, article on location and dispersion]. Conclusions as to the statistical significance or scientific interpretation of the resulting value, however, usually require selection of a specified family, ℐ, of probability distributions to represent the phenomenon under study. The choice of this family will reflect the theoretical framework within which the investigator performs his experiment(s). In particular, one or several of the parameters of the distributions of ℐ will correspond to the notion of variability that is most relevant to the investigator’s special problem.

The need for the selection of a specified underlying family, ℐ, is typical of statistical methodology in general and has the customary consequences: ideally, each specified underlying family, ℐ, should have a corresponding statistic (or statistical procedure) adapted to it; even if a standard statistic (for example, the variance) can be used, its significance and interpretation may vary widely with the underlying family. Unfortunately, the choice of such a family is not always self-evident, and hence the interpretation of statistical results is sometimes subject to considerable “specification error.” [See Errors, article on effects of errors in statistical assumptions.]

Two of the special families of probability distributions that will not be discussed in this article are connected with the methods of factor analysis and of variance components in the analysis of variance.

The factor analysis method analyzes a sample of N observations on an n-dimensional vector $(X_1, X_2, \ldots, X_n)$ by assuming that the $X_i$ ($i = 1, \ldots, n$) are linear combinations of a random error term, a (hopefully small) number of “common factors,” and possibly a number of “specific factors.” (These assumptions determine a family, ℐ.) Interest focuses on the coefficients in the linear combinations (factor loadings). Unfortunately, the method lacks uniqueness in principle. [See Factor Analysis; see also the survey by Henrysson 1957.]

The variance components method, in one of its simpler instances, analyzes scalar-valued observations, $x_{ijk}$ ($k = 1, \ldots, n_{ij}$), on $n_{ij}$ individuals, observed under conditions $C_{ij}$ ($i = 1, \ldots, r$; $j = 1, \ldots, s$). It starts from the assumption that $x_{ijk} = \mu + a_i + b_j + c_{ij} + e_{ijk}$, where the $a_i$, $b_j$, $c_{ij}$, $e_{ijk}$ are independent normal random variables with mean 0 and variances $\sigma_a^2$, $\sigma_b^2$, $\sigma_c^2$, $\sigma_e^2$, respectively. The objective is inference regarding these four variances, in order to evaluate variability from different sources. [See Linear Hypotheses, article on analysis of variance.]

## Parameters describing dispersion

Scales of measurement. Observations may be of different kinds, depending on the scale of measurement used: classificatory (or nominal), partially ordered, ordered (or ordinal), metric (defined below), and so forth. [See Psychometrics and Statistics, Descriptive, for further discussion of scales of measurement.] With each scale are associated transformations that may be applied to the observations and that leave the essential results of the measurement process intact. It is generally felt that parameters and statistical methods should in some sense be invariant under these associated transformations. (For dissenting opinions, see Lubin 1962, pp. 358-359.)

As an example, consider a classificatory scale. Measurement in this case means putting an observed unit into one of several unordered qualitative categories (for instance, never married, currently married, divorced, widowed). Whether these categories are named by English words, by the numbers 1, 2, 3, 4, by the numbers 3, 2, 1, 4, by the numbers 100, 101, 250, 261, or by the letters A, B, C, D does not change the classification as such. Hence, whatever it is that statistical methods extract from classification data should not depend on the names (or the transformations of the names) of the categories. Thus, even if the categories have numbers for their names, as above, it would be meaningless to compute the sample variance of these category numbers.

Parameters in general. Given a family, ℐ, of probability distributions, an identifiable parameter is a numerical-valued function defined on ℐ. Let P be a generic member of ℐ, and let m be a positive integer. Most of the parameters for describing dispersion discussed in this article can be defined as

$$E_P\, g(X_1, X_2, \ldots, X_m),$$

where $X_1, X_2, \ldots, X_m$ are independently and identically distributed according to P and g is an appropriate real-valued function. For example, the variance may be defined as $E_P[\tfrac{1}{2}(X_1 - X_2)^2]$. Given a family, ℐ, of probability distributions, one evidently has a wide choice of parameters (choosing a different g will usually yield a different parameter).
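The definition $E_P\,g(X_1, \ldots, X_m)$ suggests a natural sample counterpart: average g over all ordered m-tuples of distinct observations, which gives a U-statistic. The Python sketch below is illustrative only, and its function and kernel names are hypothetical. With $g = \tfrac{1}{2}(x_1 - x_2)^2$ it reproduces the usual unbiased sample variance.

```python
import itertools
import statistics


def u_statistic(sample, kernel, m):
    """Average kernel(x_i1, ..., x_im) over all ordered m-tuples of distinct
    indices; an unbiased estimator of E_P g(X_1, ..., X_m)."""
    n = len(sample)
    if n < m:
        raise ValueError("need at least m observations")
    total = 0.0
    count = 0
    for idx in itertools.permutations(range(n), m):
        total += kernel(*(sample[i] for i in idx))
        count += 1
    return total / count


def half_squared_difference(x1, x2):
    # E_P of this kernel is the variance sigma^2
    return 0.5 * (x1 - x2) ** 2


def abs_difference(x1, x2):
    # E_P of this kernel is Gini's mean difference
    return abs(x1 - x2)


def second_difference_sq_over_6(x1, x2, x3):
    # E_P[(X3 - 2 X2 + X1)^2] = 6 sigma^2, so this kernel also targets sigma^2
    return (x3 - 2.0 * x2 + x1) ** 2 / 6.0


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
print(u_statistic(data, half_squared_difference, 2))  # same value as statistics.variance(data)
print(u_statistic(data, abs_difference, 2))           # sample version of Gini's mean difference
```

For quadratic, translation-invariant kernels such as the first and third ones, the U-statistic reduces algebraically to $s^2$. For $|x_1 - x_2|$ it gives the sample version of Gini’s mean difference.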

Different parameters will characterize (slightly or vastly) different aspects of P. For instance, part of the disagreement among the eight methods of assessing variability described by Yntema (1933) stems from the fact that they represent different parameters. Of course, it is sometimes very useful to have more than one measure of dispersion available.

Dispersion parameters. A listing and comparison of various dispersion parameters for some of the scales mentioned above will now be given.

Parameters for classificatory scales. In a classificatory scale let there be q categories, with probabilities $\theta_i$ ($i = 1, 2, \ldots, q$), $\sum_i \theta_i = 1$. The dispersion parameter chosen should be invariant under name change of the categories, so it should depend on the $\theta_i$ only. If all $\theta_i$ are equal, diversity (variability) is a maximum, and the parameter should have a large value. If one $\theta_i$ is 1, so that the others are 0, diversity is 0, and the parameter should conventionally have the value 0. A family of parameters having these and other gratifying properties (for example, a weak form of additivity; see Rényi 1961, eqs. (1.20) and (1.21)) is given by

$$H_\alpha(P) = \frac{1}{1-\alpha}\log_2 \sum_{i=1}^{q} \theta_i^{\alpha} \qquad (\alpha > 0,\ \alpha \neq 1),$$

the amount of information of order α (entropy of order α). Note that

$$H_1(P) = \lim_{\alpha \to 1} H_\alpha(P) = -\sum_{i=1}^{q} \theta_i \log_2 \theta_i$$

(with the convention $0 \log_2 0 = 0$) is Shannon’s amount of information [see Information theory]. This information measure has a stronger additivity property: Blyth (1959) points out that if the values of X are divided into groups, then the dispersion of X equals the between-group dispersion plus the expected within-group dispersion. Miller and Chomsky (1963) discuss linguistic applications.
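A small Python sketch of the entropy of order α, in bits, may help fix ideas. It is illustrative only, and the function name is hypothetical. Zero probabilities are dropped, which matches the convention $0 \log 0 = 0$. The Shannon case α = 1 uses the limiting formula.

```python
import math


def renyi_entropy(probs, alpha):
    """Amount of information of order alpha, in bits, for category
    probabilities `probs` (assumed nonnegative and summing to 1)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    p = [x for x in probs if x > 0]  # empty categories contribute nothing
    if math.isclose(alpha, 1.0):
        # Shannon's amount of information, the limit of H_alpha as alpha -> 1
        return -sum(x * math.log2(x) for x in p)
    return math.log2(sum(x ** alpha for x in p)) / (1.0 - alpha)


print(renyi_entropy([0.25] * 4, 2.0))       # 2.0 (= log2 q, the maximum)
print(renyi_entropy([1.0, 0.0, 0.0], 0.5))  # 0.0 (degenerate distribution)

# Grouping property of Shannon's measure: split 4 categories into two groups
p = [0.5, 0.25, 0.125, 0.125]
between = renyi_entropy([0.75, 0.25], 1.0)
within = 0.75 * renyi_entropy([2 / 3, 1 / 3], 1.0) + 0.25 * renyi_entropy([0.5, 0.5], 1.0)
print(renyi_entropy(p, 1.0), between + within)  # both 1.75 (up to rounding)
```

The last three lines illustrate Blyth’s remark: total dispersion equals between-group dispersion plus expected within-group dispersion.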

There are other measures of dispersion for classificatory scales besides the information-like ones. (For example, see Greenberg 1956.)

Parameters for metric scales. On a metric scale observations are real numbers, and all properties of real numbers may be used.

(a) For probability distributions with a density f, there is the information-like parameter

$$H_1(f) = -E\log_2 f(X) = -\int f(x)\log_2 f(x)\,dx,$$

whenever the integral exists. Unlike its discrete counterpart, this parameter can be negative. It is not invariant under arbitrary transformations of the X-line, although it is invariant under translations. (For interesting maximum properties in connection with rectangular, exponential, and normal distributions, see Rényi 1962, appendix, sec. 11, exercises 12, 17.) For a normal distribution with standard deviation σ, $H_1(f) = \log_2(\sigma\sqrt{2\pi e})$.
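As a numerical illustration, the midpoint rule below approximates $-\int f\log_2 f\,dx$ for a normal density. It is a sketch, and the helper names and integration settings are hypothetical choices. The result is compared with the closed form $\log_2(\sigma\sqrt{2\pi e})$, which holds for a normal distribution with standard deviation σ. The code also shows translation invariance and a negative value for a rectangular density on an interval of length 1/2.

```python
import math


def differential_entropy(density, lo, hi, steps=100_000):
    """Approximate -integral of f(x) * log2 f(x) over [lo, hi] (midpoint rule)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        fx = density(lo + (i + 0.5) * h)
        if fx > 0.0:
            total -= fx * math.log2(fx)
    return total * h


def normal_density(mu, sigma):
    def f(x):
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return f


sigma = 2.0
closed_form = math.log2(sigma * math.sqrt(2.0 * math.pi * math.e))
for mu in (0.0, 5.0):  # a translation should not change the value
    approx = differential_entropy(normal_density(mu, sigma), mu - 12 * sigma, mu + 12 * sigma)
    print(mu, approx, closed_form)

# Rectangular density 2 on [0, 1/2]: -(2 * log2 2) * (1/2) = -1 bit
print(differential_entropy(lambda x: 2.0, 0.0, 0.5))
```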

(b) Traditional measures of dispersion for metric scales are the standard deviation, σ ≥ 0, and the variance, $\sigma^2 = E_P[(X-\mu)^2]$, where $\mu = E_P X$. As mentioned above, an alternative definition is

$$\sigma^2 = E_P[\tfrac{1}{2}(X_1 - X_2)^2],$$

half the expectation of the square of the difference of two random variables, $X_1$ and $X_2$, independently and identically distributed. This definition of σ² suggests a whole string of so-called mean difference parameters, listed below under (c), (d), and (e), all of which, like σ and σ², are invariant under translations only.

(c) Gini’s mean difference is given by

$$E_P|X_1 - X_2| = \iint |x_1 - x_2|\, dP(x_1)\, dP(x_2).$$

The integral at the right is in general form; if $X_1$ and $X_2$ have the density function f, the integral is

$$\iint |x_1 - x_2|\, f(x_1)\, f(x_2)\, dx_1\, dx_2.$$

Wold (1935, pp. 48-49) points out the relationship between this parameter and Cramér’s ω² method for testing goodness of fit. As can be seen in Table 1, below, Gini’s mean difference is a distribution-dependent function of σ (take n = 2 in the last column, since $W_2 = |X_1 - X_2|$).

There are variate difference parameters that involve the square of “higher-order differences”; they are distribution-free functions of σ. An example is

$$E_P[(X_3 - 2X_2 + X_1)^2] = 6\sigma^2,$$

since $X_3 - 2X_2 + X_1$ has mean 0 and variance $(1 + 4 + 1)\sigma^2$. There are also variate difference parameters involving the absolute value of higher-order differences; they are distribution-dependent functions of σ. An example is

$$E_P|X_3 - 2X_2 + X_1|.$$

(d) By analogy with the first definition of the variance, there are dispersion parameters reflecting absolute variation around some measure of central tendency. Examples are the mean deviation from the mean, μ,

$$\delta_\mu = E_P|X - \mu|,$$

and from the median, Med X,

$$\delta_{\mathrm{Med}} = E_P|X - \mathrm{Med}\,X|.$$

These are distribution-dependent functions of σ.

(e) There are dispersion parameters based on other differences. Two examples are the expected value of the range of samples of size n,

$$E_P W_n = E_P[X_{(n)} - X_{(1)}],$$

where $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$ and $X_{(n)} = \max(X_1, X_2, \ldots, X_n)$; and the difference of symmetric quantile points,

$$\xi_{1-\alpha} - \xi_\alpha \qquad (0 < \alpha < \tfrac{1}{2}),$$

where

$$\int_{-\infty}^{\xi_\alpha} f(x)\,dx = \alpha$$

and f is the density of the probability distribution P. Both these parameters are distribution-dependent functions of σ. Note that this last parameter, the difference of symmetric quantile points, is not based on expected values of random variables.

(f) Another dispersion parameter is the coefficient of variation, σ/μ (given either as a ratio or in per cent), invented to eliminate the influence of absolute size on variability (for example, to compare the variation in size of elephants and of mice). Sometimes it does exactly that (Banerjee 1962); sometimes it does nothing of the sort (Cramér 1945, table 31.3.5). (For further discussion, see Pearson 1897.)

Because they are distribution-dependent functions of σ, the parameters cited under (a), (c), (d), and (e) are undesirable for a study of the variance, σ², unless one is fairly sure about the underlying family of probability distributions. This will be illustrated below. Despite this drawback, these parameters are, of course, quite satisfactory as measures of dispersion in their own right.

Comparison of dispersion measures. Table 1 lists the quotient of several of the above-mentioned parameters divided by σ, together with other relevant quantities. It gives these comparisons for the distributions of the types listed in the first column, with parameter specifications as indicated in the columns headed σ, μ, and Med X. (The parameterization is the same as that in Distributions, Statistical, article on special continuous distributions.) The sign “~” before an entry denotes an asymptotic result (for large n or large μ). Table 1 illustrates how bad the distribution dependence of these parameters can really be. [See Errors, article on non-sampling errors, for further discussion.]

Table 1. Comparison of dispersion parameters (parameters of the distribution, coefficient of variation, and ratios of dispersion parameters to σ)

| Distributional form | $\sigma$ | $\mu$ | Med $X$ | $\sigma/\mu$ | $\delta_\mu/\sigma$ | $\delta_{\mathrm{Med}}/\sigma$ | $E_P W_n/\sigma$ |
|---|---|---|---|---|---|---|---|
| Normal | $\sigma$ | $\mu$ | $\mu$ | $\sigma/\mu$ | $\sqrt{2/\pi}$ | $\sqrt{2/\pi}$ | … |
| Exponential | $\theta$ | $\theta$ | $\theta\log_e 2$ | 1 | $2/e$ | $\log_e 2$ | $\sim\gamma + \log_e n$ (a) |
| Double exponential | … | $\lambda$ | $\lambda$ | … | $1/\sqrt{2}$ | $1/\sqrt{2}$ | (b) |
| Rectangular | … | … | … | … | $\sqrt{3}/2$ | $\sqrt{3}/2$ | $2\sqrt{3}\,(n-1)/(n+1)$ |

a. Here γ is Euler’s constant: γ = 0.5772157 …
b. Not known from the literature.
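The distribution dependence of these ratios can also be checked by simulation. The Python sketch below is an illustration, not part of the original article, and its sample sizes, seed, and helper names are arbitrary choices. It estimates $\delta_\mu/\sigma$, $\delta_{\mathrm{Med}}/\sigma$, and $E_P W_5/\sigma$ for normal, exponential, double exponential (Laplace), and rectangular distributions, and prints the exact values of $\delta_\mu/\sigma$ alongside.

```python
import math
import random
import statistics

# Exact delta_mu / sigma for comparison
EXACT_DELTA_MU = {
    "normal": math.sqrt(2 / math.pi),
    "exponential": 2 / math.e,
    "double exponential": 1 / math.sqrt(2),
    "rectangular": math.sqrt(3) / 2,
}

DISTRIBUTIONS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "exponential": lambda rng: rng.expovariate(1.0),
    # Laplace: exponential magnitude with a random sign
    "double exponential": lambda rng: rng.choice((-1.0, 1.0)) * rng.expovariate(1.0),
    "rectangular": lambda rng: rng.random(),
}


def dispersion_ratios(draw, n_draws=100_000, range_n=5, n_ranges=20_000, seed=1):
    """Monte Carlo estimates of delta_mu/sigma, delta_Med/sigma and
    E W_n/sigma (n = range_n) for the distribution sampled by `draw`."""
    rng = random.Random(seed)
    xs = [draw(rng) for _ in range(n_draws)]
    mean = statistics.fmean(xs)
    med = statistics.median(xs)
    sd = statistics.pstdev(xs)
    delta_mu = statistics.fmean([abs(x - mean) for x in xs])
    delta_med = statistics.fmean([abs(x - med) for x in xs])
    ranges = []
    for _ in range(n_ranges):
        sample = [draw(rng) for _ in range(range_n)]
        ranges.append(max(sample) - min(sample))
    return delta_mu / sd, delta_med / sd, statistics.fmean(ranges) / sd


RESULTS = {name: dispersion_ratios(draw) for name, draw in DISTRIBUTIONS.items()}
for name, (r_mu, r_med, r_range) in RESULTS.items():
    print(f"{name:20s} {r_mu:.3f} (exact {EXACT_DELTA_MU[name]:.3f}) {r_med:.3f} {r_range:.3f}")
```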

Multivariate distributions. Most of the parameters discussed for univariate distributions can be generalized to multivariate distributions, usually in more than one fashion. The variance, for instance, is the expected value of one-half the square of the distance between two random points on the real line. Generalization may be attained by taking the distance between two random points in k-space or by taking the content of a polyhedron spanned by k + 1 points in k-space. Thus, a rather great variety of multivariate dispersion parameters is possible. [See Multivariate analysis and, for example, van der Vaart 1965.]
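The first generalization mentioned above can be made concrete with a small sketch (Python, for illustration; the helper names are hypothetical). One-half the average squared Euclidean distance between pairs of distinct sample points in k-space equals the trace of the sample covariance matrix (divisor n − 1). This is the multivariate analogue of the identity linking $\tfrac{1}{2}E(X_1 - X_2)^2$ to σ².

```python
import itertools


def half_mean_squared_distance(points):
    """One-half the average squared Euclidean distance over ordered pairs of
    distinct sample points."""
    pairs = list(itertools.permutations(points, 2))
    total = sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in pairs)
    return 0.5 * total / len(pairs)


def trace_of_sample_covariance(points):
    """Sum of the coordinatewise sample variances (divisor n - 1)."""
    n, k = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(k)]
    return sum((p[j] - means[j]) ** 2 for p in points for j in range(k)) / (n - 1)


pts = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0), (4.0, 5.0)]
print(half_mean_squared_distance(pts), trace_of_sample_covariance(pts))  # 8.0 8.0
```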

## Statistical inference

Shannon’s amount of information. Consider, first, point estimation of Shannon’s amount of information for discrete distributions. Suppose a sample of size n is drawn from the probability distribution, with q categories and probabilities $\theta_i$, described earlier. Suppose $n_i$ observations fall in the ith category. Then

$$\hat H = -\sum_{i=1}^{q} \frac{n_i}{n}\log_2\frac{n_i}{n}$$

(terms with $n_i = 0$ being taken as 0) suggests itself as the natural estimator for $H_1(P) = -\sum_i \theta_i \log_2 \theta_i$. The properties of this estimator have been studied by Miller and Madow (1954) and (by a simpler method) by Basharin (1959). If all $\theta_i > 0$, the sampling distribution of $\hat H$ has mean

$$E\hat H = H_1(P) - \frac{q-1}{2n}\log_2 e + O\!\left(\frac{1}{n^2}\right).$$

(The term $O(1/n^2)$ denotes a function of n and the $\theta_i$ such that, for some positive constant c, the absolute value of the function is less than $c/n^2$.) So for “small” n the bias, the difference between $E\hat H$ and $H_1(P)$, is substantial. [For low-bias estimators, see Blyth 1959; for a general discussion of point estimation, see Estimation, article on point estimation.]
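The size of this bias shows up clearly in a short simulation. The Python sketch below is illustrative, and its seed, sample size, and function names are arbitrary. It compares the plug-in estimator $\hat H$ with a version that adds back the leading bias term $(q-1)\log_2 e/(2n)$, the correction usually associated with Miller and Madow. It assumes all q categories have positive probability.

```python
import math
import random
from collections import Counter


def plugin_entropy(counts):
    """H-hat = -sum (n_i/n) log2(n_i/n), empty categories contributing 0."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def bias_corrected_entropy(counts, q):
    """Plug-in estimate plus the leading bias term (q - 1) log2(e) / (2n)."""
    n = sum(counts)
    return plugin_entropy(counts) + (q - 1) * math.log2(math.e) / (2.0 * n)


def average_estimates(probs, n, reps=5000, seed=0):
    """Mean of both estimators over `reps` multinomial samples of size n."""
    rng = random.Random(seed)
    q = len(probs)
    plug_total = corr_total = 0.0
    for _ in range(reps):
        tally = Counter(rng.choices(range(q), weights=probs, k=n))
        counts = [tally[i] for i in range(q)]
        plug_total += plugin_entropy(counts)
        corr_total += bias_corrected_entropy(counts, q)
    return plug_total / reps, corr_total / reps


true_h = 2.0  # four equally likely categories
mean_plugin, mean_corrected = average_estimates([0.25] * 4, n=10)
print(true_h, mean_plugin, mean_corrected)
```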

The variance of one population. Procedures for estimating the variance of a single population and for testing hypotheses about such a variance will now be described.

Point estimation for general ℐ. Let the underlying family, $\mathcal{I}_\mu$, consist of all probability distributions with density functions and known mean, μ, or of all discrete distributions with known mean, μ. In both cases the theory of U-statistics (see, for example, Fraser 1957, pp. 135-147) shows that the minimum variance unbiased estimator of σ², given a sample of size n, is

$$s_\mu^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2.$$

The sampling variance of this estimator, which measures its precision relative to the underlying distribution P (a member of $\mathcal{I}_\mu$), is $\operatorname{var}_P(s_\mu^2) = (\mu_4 - \sigma^4)/n$, where $\mu_4 = E_P(X - \mu)^4$; it is therefore definitely distribution dependent. If ℐ is, again, the family of all absolutely continuous (or discrete) distributions, now with unknown mean, then the uniformly minimum variance unbiased estimator of σ² is

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar x)^2,$$

where $\bar x$ is the sample mean. Again, $\operatorname{var}_P(s^2)$ is very much distribution dependent.

For more restricted families of distributions it is sometimes possible to find other estimators, with smaller sampling variances. Also, if the unbiasedness requirement is dropped, one may find estimators that, although biased, are, on the average, closer to the true parameter value than a minimum variance unbiased estimator. For the family of normal distributions,

$$\frac{1}{n+1}\sum_{i=1}^{n}(x_i - \bar x)^2$$

is such an estimator of σ²: among all multiples of $\sum_i (x_i - \bar x)^2$ it has the smallest mean squared error.

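Distribution dependence. To illustrate the dependence of the quality of point estimators upon the underlying family of probability distributions, Table 2 lists the sampling variance of $s^2$ for random samples from five different distribution families. For any P with finite fourth moment, $\operatorname{var}_P(s^2) = \sigma^4[2/(n-1) + \kappa/n]$, where $\kappa = E_P(X-\mu)^4/\sigma^4 - 3$ is the excess kurtosis of P.

Table 2. The sampling variance of $s^2$

| Distribution | Excess kurtosis κ | $\operatorname{var}_P(s^2)$ |
|---|---|---|
| Normal | 0 | $2\sigma^4/(n-1)$ |
| Exponential | 6 | $\sigma^4[2/(n-1) + 6/n]$ |
| Double exponential | 3 | $\sigma^4[2/(n-1) + 3/n]$ |
| Rectangular | −6/5 | $\sigma^4[2/(n-1) - 6/(5n)]$ |
| Pearson type VII (b) ($f(x) = k(1 + x^2a^{-2})^{-p}$; $p > 5/2$) | $6/(2p-5)$ | $\sigma^4[2/(n-1) + 6/((2p-5)n)]$ |

b. Example is due to Hotelling 1961, p. 350.

It is seen that the quotient $\operatorname{var}_P(s^2)/\operatorname{var}_N(s^2)$ (where P indicates some nonnormal underlying distribution and N a normal one) may vary from 2/5 to ∞. For large n the quotient tends to $1 + \kappa/2$, which is 2/5 for the rectangular distribution and grows without bound for Pearson type VII as p decreases to 5/2. Hence, unless ℐ can be chosen in a responsible way, little can be said about the precision of $s^2$ as an estimator of σ² (although for large samples the higher sample moments will be of some assistance in evaluating the precision of this estimator).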
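The comparison can be reproduced from the formula $\operatorname{var}_P(s^2) = \sigma^4[2/(n-1) + \kappa/n]$, where κ is the excess kurtosis. The Python sketch below is illustrative, and its names, seed, and replication count are arbitrary. It evaluates the quotient $\operatorname{var}_P(s^2)/\operatorname{var}_N(s^2) = 1 + \kappa(n-1)/(2n)$ for several distributions and checks the rectangular case by simulation.

```python
import random


def var_of_s2(sigma2, excess_kurtosis, n):
    """Exact sampling variance of s^2 (divisor n - 1) for i.i.d. observations
    with variance sigma2 and the given excess kurtosis (finite 4th moment)."""
    return sigma2 ** 2 * (2.0 / (n - 1) + excess_kurtosis / n)


def ratio_to_normal(excess_kurtosis, n):
    """var_P(s^2) / var_N(s^2) for distributions with the same variance."""
    return var_of_s2(1.0, excess_kurtosis, n) / var_of_s2(1.0, 0.0, n)


EXCESS_KURTOSIS = {
    "normal": 0.0,
    "exponential": 6.0,
    "double exponential": 3.0,
    "rectangular": -1.2,
    "Pearson VII, p = 2.6": 6.0 / (2 * 2.6 - 5),  # 6 / (2p - 5), unbounded as p -> 5/2
}


def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulated_var_of_s2(draw, n, reps=20_000, seed=2):
    rng = random.Random(seed)
    s2_values = [sample_variance([draw(rng) for _ in range(n)]) for _ in range(reps)]
    return sample_variance(s2_values)


for name, kappa in EXCESS_KURTOSIS.items():
    print(f"{name:22s} n=10: {ratio_to_normal(kappa, 10):7.3f}   large n: {1 + kappa / 2:7.3f}")

# Rectangular on (0, 1): sigma^2 = 1/12; theory vs simulation for n = 10
theory = var_of_s2(1.0 / 12.0, -1.2, 10)
simulated = simulated_var_of_s2(lambda rng: rng.random(), 10)
print(theory, simulated)
```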

Normal distributions. Tests and confidence intervals on dispersion parameters for the case of normal distributions will now be discussed. To decide whether a sample of n observations may have come from a population with known variance, $\sigma_0^2$, or from a more heterogeneous one, test the hypothesis $H_0\colon \sigma^2 = \sigma_0^2$ against the one-sided alternative $H_A\colon \sigma^2 > \sigma_0^2$, where σ² is the (unknown) variance characterizing the sample (Rao 1952, sec. 6a.1, gives a concrete example of the use of this one-sided alternative). To investigate only whether the sample fits into the given population in terms of homogeneity, test $H_0$ against $H_A'\colon \sigma^2 \neq \sigma_0^2$, which is a two-sided alternative. If the underlying family is normal, the most powerful level-α test for the one-sided alternative rejects $H_0$ whenever

$$\frac{(n-1)s^2}{\sigma_0^2} > \chi^2_{1-\alpha;\,n-1},$$

where $s^2 = \sum_i (x_i - \bar x)^2/(n-1)$ is the sample variance and $\chi^2_{\delta;\,n-1}$ is the 100δ per cent point of the chi-square distribution for n − 1 degrees of freedom (so that $\chi^2_{1-\alpha;\,n-1}$ is the upper 100α per cent point of the same distribution). [For further discussion of these techniques and the terminology, see Hypothesis testing.]

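The most powerful unbiased level-α test for the two-sided alternative rejects $H_0$ whenever the statistic $(n-1)s^2/\sigma_0^2$, with $s^2$ the sample variance, satisfies

$$\frac{(n-1)s^2}{\sigma_0^2} < C_1 \quad\text{or}\quad \frac{(n-1)s^2}{\sigma_0^2} > C_2.$$

Here $C_1 = \chi^2_{\beta;\,n-1} = \chi^2_{\lambda;\,n+1}$ and $C_2 = \chi^2_{1-\gamma;\,n-1} = \chi^2_{1-\nu;\,n+1}$, with β + γ = α = λ + ν (see Lehmann 1959, chapter 5, sec. 5, example 5, and pp. 165 and 129; for tables, see Lindley et al. 1960: α = 0.05, 0.01, 0.001). In practice the nonoptimal equal-tail test is also used, where $C_1 = \chi^2_{\alpha/2;\,n-1}$ and $C_2 = \chi^2_{1-\alpha/2;\,n-1}$, that is, β = γ = ½α. For the latter test the standard chi-square tables suffice, and the two tests differ only slightly unless the sample size is very small.

The one-sided and two-sided confidence intervals follow immediately from the above inequalities; for example, a two-sided confidence interval for σ² with confidence coefficient 1 − α is

$$\frac{(n-1)s^2}{C_2} < \sigma^2 < \frac{(n-1)s^2}{C_1}.$$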
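These tests and intervals need chi-square percentage points. The self-contained Python sketch below uses only the standard library. Its regularized incomplete gamma routine follows the usual series and continued-fraction scheme, and all function names are hypothetical. It computes the percentage points by bisection and forms the equal-tail interval. It also solves numerically for the constants $C_1, C_2$ of the unbiased test, which must enclose probability 1 − α under chi-square with both n − 1 and n + 1 degrees of freedom. In practice a statistics library would be used instead.

```python
import math
import statistics


def _gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x)."""
    if x <= 0.0:
        return 0.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:  # series expansion
        term = total = 1.0 / a
        ap = a
        for _ in range(10_000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(log_prefactor)
    # continued fraction for Q(a, x) (modified Lentz), then P = 1 - Q
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def chi2_cdf(x, df):
    return _gamma_p(df / 2.0, x / 2.0)


def chi2_ppf(p, df):
    """100p per cent point of chi-square with df degrees of freedom (bisection)."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def equal_tail_interval(xs, alpha=0.05):
    """Equal-tail confidence interval for sigma^2, assuming normal data."""
    n = len(xs)
    ss = (n - 1) * statistics.variance(xs)
    return ss / chi2_ppf(1 - alpha / 2, n - 1), ss / chi2_ppf(alpha / 2, n - 1)


def unbiased_test_constants(n, alpha=0.05):
    """C1, C2 enclosing 1 - alpha under chi-square(n - 1) and chi-square(n + 1)."""
    df = n - 1

    def excess(beta):  # increasing in beta; zero at the required lower tail beta
        c1, c2 = chi2_ppf(beta, df), chi2_ppf(1 - alpha + beta, df)
        return chi2_cdf(c2, df + 2) - chi2_cdf(c1, df + 2) - (1 - alpha)

    lo, hi = 1e-12, alpha - 1e-12
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    return chi2_ppf(beta, df), chi2_ppf(1 - alpha + beta, df)


data = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.7, 10.5, 10.0, 10.3]
print(equal_tail_interval(data))
print(unbiased_test_constants(len(data)))
```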

Nonnormal distributions. The above discussion of the distribution dependence of point estimators of dispersion parameters should have prepared the reader to learn that the tests and confidence interval procedures discussed above are not robust against nonnormality. Little has been done in developing tests or confidence intervals for σ² when ℐ is unknown or broad. Hotelling (1961, p. 356) recommends using all available knowledge to narrow ℐ down to a workable family of distributions, then adapting statistical methods to the resulting family.

Mean square differences. For a large family of absolutely continuous distributions with unknown mean, the minimum variance unbiased estimator, $s^2 = \sum_i (x_i - \bar x)^2/(n-1)$, was introduced above. An alternative formula is

$$s^2 = \frac{1}{2n(n-1)}\sum_{i=1}^{n}\sum_{j=1}^{n}(x_i - x_j)^2.$$

This formula suggests another estimator of 2σ², unbiased, but not with minimum variance:

$$D_1^2 = \frac{1}{n-1}\sum_{i=1}^{n-1}(x_{i+1} - x_i)^2.$$

If the indices 1, 2, …, n in the sample $x_1, x_2, \ldots, x_n$ indicate an ordering of some kind (for example, the order of arrival in a time series), then $D_1^2$ is called the first mean square successive difference. Similarly,

$$D_2^2 = \frac{1}{n-2}\sum_{i=1}^{n-2}(x_{i+2} - 2x_{i+1} + x_i)^2,$$

the second mean square successive difference, is an unbiased estimator of 6σ².

If the underlying family, ℐ, is normal, then asymptotically (for large n) the first mean square successive difference divided by 2 has sampling variance approximately $3\sigma^4/n$, and the second mean square successive difference divided by 6 has sampling variance approximately $35\sigma^4/(9n)$, compared with $2\sigma^4/n$ for $s^2$ (see Kamat 1958). These estimators, although clearly less precise than $s^2$, are of interest because they possess a special kind of robustness: robustness against trend. Suppose the observations $x_1, \ldots, x_n$ have been taken at times $t_1, \ldots, t_n$ from a time process, $X(t) = \phi(t) + Y$, where φ is a smoothly varying function (trend) of t, and the distribution of the random variable Y is independent of t (for example, φ might describe an expanding economy and Y the fluctuations in it). Let an estimator be sought for var(Y). Most of the trend is then eliminated by considering only the successive differences $x_i - x_{i+1} = \phi(t_i) - \phi(t_{i+1}) + y_i - y_{i+1}$, thus making for an estimator of var(Y) with much less bias. These methods have been applied to control and record charts by Keen and Page (1953), for example.
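The trend robustness can be seen in a small simulation. The Python sketch below is illustrative only: the linear trend, noise level, seed, and function names are hypothetical choices. It compares $s^2$ with the first and second mean square successive differences divided by 2 and 6, respectively; here var(Y) = 1.

```python
import random


def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def first_msd(xs):
    """First mean square successive difference (unbiased for 2 sigma^2, i.i.d. data)."""
    n = len(xs)
    return sum((xs[i + 1] - xs[i]) ** 2 for i in range(n - 1)) / (n - 1)


def second_msd(xs):
    """Second mean square successive difference (unbiased for 6 sigma^2, i.i.d. data)."""
    n = len(xs)
    return sum((xs[i + 2] - 2 * xs[i + 1] + xs[i]) ** 2 for i in range(n - 2)) / (n - 2)


rng = random.Random(3)
n = 500
trend = [0.02 * t for t in range(n)]              # hypothetical linear trend phi(t)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]   # Y with var(Y) = 1
series = [phi + y for phi, y in zip(trend, noise)]

estimates = {
    "s^2": sample_variance(series),               # inflated by the trend
    "first MSSD / 2": first_msd(series) / 2,
    "second MSSD / 6": second_msd(series) / 6,    # a linear trend cancels exactly
}
print(estimates)
```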

Little work has been done on studying the sampling distributions of successive difference estimators in cases where the underlying distribution is nonnormal. Moore (1955) gave moments and approximations for the sampling distribution of the mean square successive difference for four types of distributions.

The standard deviation of one population. Since $\sigma = \sqrt{\sigma^2}$, one might feel that the standard deviation, σ, should be estimated by the square root of a reasonable estimator of σ². This is, indeed, often done, and for large sample sizes the results are quite acceptable. For smaller sample sizes, however, the suboptimality of such estimators is more marked (specifically, $E s \neq \sigma$, where $s = \sqrt{s^2}$; if the underlying family is normal, an unbiased estimator is

$$\sqrt{\frac{n-1}{2}}\;\frac{\Gamma\!\left(\frac{n-1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)}\; s,$$

where Γ is the gamma function). Therefore, there has been some interest in alternative estimators, like those now to be described.

Estimation via alternative parameters. Table 1 showed that, depending on the underlying family, ℐ, of distributions, certain relations exist between σ and other dispersion parameters, θ, of the form θ = νσ. So if one knows ℐ, one may estimate θ by an unbiased estimator, say T(x), apply the conversion factor 1/ν, and so obtain an unbiased estimator, T(x)/ν, of σ.

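Thus, the mean successive differences,

$$d_1 = \frac{1}{n-1}\sum_{i=1}^{n-1}|x_{i+1} - x_i|, \qquad d_2 = \frac{1}{n-2}\sum_{i=1}^{n-2}|x_{i+2} - 2x_{i+1} + x_i|,$$

are, if ℐ is normal, unbiased estimators of $2\sigma/\sqrt{\pi}$ and $2\sigma\sqrt{3/\pi}$, respectively. Their sampling variances, derived by Kamat (1958), are of the form $c\,\sigma^2/n + o(1/n)$ for constants c. (Here the term o(1/n) denotes a function that, after multiplication by n, goes to zero as n becomes large.) See Lomnicki (1952) for the sampling variance of $[n(n-1)]^{-1}\sum_i\sum_j |x_i - x_j|$, the sample version of Gini’s mean difference, for normal, exponential, and rectangular ℐ. Again, if

$$d_m = \frac{1}{n}\sum_{i=1}^{n}|x_i - \bar x|$$

and ℐ is normal, then $\sqrt{\pi n/[2(n-1)]}\;d_m$ is an unbiased estimator of σ. For large n its sampling variance is approximately $(\pi - 2)\sigma^2/(2n) \approx 0.571\,\sigma^2/n$, which is close to the absolute lower bound, σ²/(2n). The properties of

$$d_{\mathrm{Me}} = \frac{1}{n}\sum_{i=1}^{n}|x_i - \mathrm{Me}(x)|,$$

where Me(x) is the sample median, differ slightly, yet favorably, from those of $d_m$. The literature on these and similar statistics is quite extensive.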
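A simulation sketch can check these claims for normal data. It is written in Python, and its sample size, seed, σ, and function names are arbitrary, hypothetical choices. It shows that the gamma-function correction of s, the converted mean successive difference $d_1\sqrt{\pi}/2$, and the converted mean deviation $\sqrt{\pi n/[2(n-1)]}\,d_m$ (with $d_m = n^{-1}\sum_i|x_i - \bar x|$) are all unbiased for σ. The uncorrected s, by contrast, falls short.

```python
import math
import random


def plain_s(xs):
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))


def corrected_s(xs):
    """s * sqrt((n-1)/2) * Gamma((n-1)/2) / Gamma(n/2): unbiased for sigma (normal data)."""
    n = len(xs)
    factor = math.sqrt((n - 1) / 2.0) * math.exp(math.lgamma((n - 1) / 2.0) - math.lgamma(n / 2.0))
    return factor * plain_s(xs)


def from_mean_successive_difference(xs):
    """d1 * sqrt(pi) / 2, since E|X_{i+1} - X_i| = 2 sigma / sqrt(pi) for normal data."""
    n = len(xs)
    d1 = sum(abs(xs[i + 1] - xs[i]) for i in range(n - 1)) / (n - 1)
    return d1 * math.sqrt(math.pi) / 2.0


def from_mean_deviation(xs):
    """sqrt(pi n / (2 (n - 1))) * d_m, d_m = mean absolute deviation from the mean."""
    n = len(xs)
    m = sum(xs) / n
    d_m = sum(abs(x - m) for x in xs) / n
    return math.sqrt(math.pi * n / (2.0 * (n - 1))) * d_m


rng = random.Random(4)
sigma, n, reps = 2.0, 10, 20_000
samples = [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(reps)]
AVERAGES = {
    f.__name__: sum(f(xs) for xs in samples) / reps
    for f in (plain_s, corrected_s, from_mean_successive_difference, from_mean_deviation)
}
print(AVERAGES)  # plain_s averages about 0.973 * sigma for n = 10; the others about sigma
```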

The last column of Table 1 suggests the use of the sample range, $W_n = X_{(n)} - X_{(1)}$, to estimate σ; the conversion factor now depends on both the underlying distribution and the sample size, n (for normal distributions, see David 1962, p. 113, table 7A.1). With increasing n, the efficiency of converted sample ranges as estimators of σ decreases rapidly. One may then shift to quasi ranges ($X_{(n-r+1)} - X_{(r)}$) or, better still, to linear combinations of quasi ranges (see David 1962, p. 107). Chu (1957) also discussed the use of quasi ranges to obtain confidence intervals for interquantile distances ($\xi_{1-\alpha} - \xi_\alpha$). This type of estimator employs order statistics. A more efficient use of order statistics is made by the so-called best unbiased linear systematic statistics and by approximations to these [for more information, see Nonparametric statistics, article on order statistics]. These linear systematic statistics are especially useful when the data are censored [see Statistical analysis, special problems of, article on truncation and censorship]. It should also be mentioned that grouping of data poses special problems for the use of estimators based on order statistics [see Statistical analysis, special problems of, article on grouped observations].

Comparing variances of several populations. As in the example of increased variation on a reasoning test with competitive stress, discussed above, situations occur in which interest is focused on differences in variability as the response to differences in conditions. Two groups were compared in the example, but the situation can easily be generalized to more than two groups. Thus, one may want to apply more than two levels of competitive stress, and one may even bring in a second factor of the environment, such as different economic backgrounds (in which case one would have a two-way classification).

Bartlett’s test and the F-test. Consider k populations and k samples, one from each population (in the reasoning-test example, each group of children under a given level of stress would constitute one sample). Let the observations $x_{r1}, x_{r2}, \ldots, x_{rn_r}$ be a random sample from the rth population ($r = 1, \ldots, k$). Let

$$\bar x_r = \frac{1}{n_r}\sum_{j=1}^{n_r} x_{rj}.$$

Define $\nu_r = n_r - 1$,

$$s_r^2 = \frac{1}{\nu_r}\sum_{j=1}^{n_r}(x_{rj} - \bar x_r)^2, \qquad s^2 = \frac{1}{\nu}\sum_{r=1}^{k}\nu_r s_r^2,$$

where $\nu = \sum_{r=1}^{k}\nu_r$.

Bartlett’s 1937 test of the hypothesis $H_0$: the k variances are equal, against $H_A$: not all variances are equal, assumes all samples to be drawn from normal distributions and rejects $H_0$ if and only if the statistic L is too large, where

$$L = \nu\log_e s^2 - \sum_{r=1}^{k}\nu_r\log_e s_r^2.$$

The true values of the means of the k populations do not influence the outcome of this test. For large samples the distribution of L under $H_0$ is approximately chi-square, with k − 1 degrees of freedom; for samples of intermediate size, it is desirable to use, as a closer approximation, the fact that L/(1 + c), where

$$c = \frac{1}{3(k-1)}\left(\sum_{r=1}^{k}\frac{1}{\nu_r} - \frac{1}{\nu}\right),$$

has approximately the same chi-square distribution. Bartlett’s test is unbiased. Against these virtues there is one outstanding weakness: the test has a total lack of robustness against nonnormality [see Errors, article on effects of errors in statistical assumptions].
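A direct Python transcription of Bartlett’s statistic may be useful. It is an illustrative sketch: the function name is hypothetical, no p-value is computed, and every sample variance is assumed positive. The formulas for the pooled variance, L, and the correction c are written out in the docstring. $L/(1 + c)$ is then compared with the upper 100α per cent point of chi-square with k − 1 degrees of freedom.

```python
import math


def bartlett_statistic(samples):
    """Return (L, c, L / (1 + c)) for a list of k samples (each of size >= 2).

    s_r^2 = sum_j (x_rj - xbar_r)^2 / nu_r, with nu_r = n_r - 1
    s^2   = sum_r nu_r s_r^2 / nu, with nu = sum_r nu_r
    L     = nu ln s^2 - sum_r nu_r ln s_r^2
    c     = (sum_r 1/nu_r - 1/nu) / (3 (k - 1))
    """
    k = len(samples)
    if k < 2:
        raise ValueError("need at least two samples")
    nus, variances = [], []
    for xs in samples:
        n_r = len(xs)
        mean = sum(xs) / n_r
        nus.append(n_r - 1)
        variances.append(sum((x - mean) ** 2 for x in xs) / (n_r - 1))
    nu = sum(nus)
    pooled = sum(v * s2 for v, s2 in zip(nus, variances)) / nu
    L = nu * math.log(pooled) - sum(v * math.log(s2) for v, s2 in zip(nus, variances))
    c = (sum(1.0 / v for v in nus) - 1.0 / nu) / (3.0 * (k - 1))
    return L, c, L / (1.0 + c)


print(bartlett_statistic([[1.0, 2.0, 3.0], [0.0, 2.0, 4.0]]))
```

With the two samples shown, L ≈ 0.893 and L/(1 + c) ≈ 0.714. Both are far below 3.84, the upper 5 per cent point of chi-square with one degree of freedom.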

For k = 2, Bartlett’s test reduces to a variant of the F-test: reject $H_0\colon \sigma_1^2 = \sigma_2^2$ in favor of $H_A\colon \sigma_1^2 \neq \sigma_2^2$ if $s_1^2/s_2^2$ is either too large or too small. The one-sided F-test rejects $H_0$ in favor of $H_A'\colon \sigma_1^2 > \sigma_2^2$ if $s_1^2/s_2^2 > F_{1-\alpha;\,\nu_1,\nu_2}$, where $F_{1-\alpha;\,\nu_1,\nu_2}$ is the upper 100α per cent point of the F-distribution for $\nu_1$ and $\nu_2$ degrees of freedom. The F-test in this context naturally has the same lack of robustness against nonnormality as Bartlett’s test.

Alternative test for variance heterogeneity. Bartlett and Kendall (1946) proposed an alternative approach: apply analysis of variance techniques to the logarithms of the k sample variances. The virtue of this suggestion is that the procedure can be generalized immediately to a test of variances in a two-way classification. Box (1953, p. 330) showed that this test, too, is nonrobust against nonnormality. More robust procedures are described below.

Variances of two correlated samples. McHugh (1953) quotes a study of the effect of age on dispersion of mental abilities; the same group of persons was measured at two different ages. Naturally the two samples are correlated, and the F-test does not apply. Assume that the pairs $(x_{11}, x_{21}), \ldots, (x_{1i}, x_{2i}), \ldots, (x_{1n}, x_{2n})$ constitute a sample from a bivariate normal distribution with variances $\sigma_1^2$ and $\sigma_2^2$ and correlation coefficient ρ. Under this assumption, the hypothesis $H_0\colon \sigma_1^2 = \sigma_2^2$ is tested by the statistic

$$t = \frac{(s_1^2 - s_2^2)\sqrt{n-2}}{2 s_1 s_2\sqrt{1 - r^2}},$$

where $s_1^2$ and $s_2^2$ are the sample variances, as defined in the discussion of Bartlett’s test, above, and r is the sample correlation coefficient. Under the null hypothesis, this statistic has Student’s t-distribution with n − 2 degrees of freedom. One-sided tests, two-sided tests, and confidence intervals follow in the customary manner. Specifically, the hypothesis $H_0\colon \sigma_1^2/\sigma_2^2 = \tau_0$ is tested by means of the statistic

$$t = \frac{(F/\tau_0 - 1)\sqrt{n-2}}{2\sqrt{(F/\tau_0)(1 - r^2)}}, \qquad F = s_1^2/s_2^2.$$

This method, which was proposed by Morgan (1939) and Pitman (1939), is based on the easily derived fact that the covariance between X + Y and X − Y is the difference between the variances of X and Y, so that the correlation between the sum and difference of the random variables is zero if and only if the variances are equal.
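The argument just given can be checked numerically. The Python sketch below is illustrative, and its data and names are hypothetical. It computes the statistic $(s_1^2 - s_2^2)\sqrt{n-2}/(2 s_1 s_2\sqrt{1 - r^2})$ directly. It then computes the same quantity as the usual t statistic for the correlation between the sums $x_1 + x_2$ and the differences $x_1 - x_2$. The two agree, as the covariance identity implies.

```python
import math


def _moments(a, b):
    """Sample variances of a and b and their sample covariance (divisor n - 1)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((x - ma) ** 2 for x in a) / (n - 1)
    sbb = sum((y - mb) ** 2 for y in b) / (n - 1)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)
    return saa, sbb, sab


def pitman_morgan_t(x1, x2):
    """t with n - 2 d.f. for H0: sigma1^2 = sigma2^2 with paired (correlated) data."""
    n = len(x1)
    s11, s22, s12 = _moments(x1, x2)
    r = s12 / math.sqrt(s11 * s22)
    return (s11 - s22) * math.sqrt(n - 2) / (2.0 * math.sqrt(s11 * s22) * math.sqrt(1.0 - r * r))


def t_via_sum_and_difference(x1, x2):
    """Same statistic, as the t for the correlation of u = x1 + x2 with v = x1 - x2."""
    n = len(x1)
    u = [a + b for a, b in zip(x1, x2)]
    v = [a - b for a, b in zip(x1, x2)]
    suu, svv, suv = _moments(u, v)
    r_uv = suv / math.sqrt(suu * svv)
    return r_uv * math.sqrt(n - 2) / math.sqrt(1.0 - r_uv * r_uv)


first = [10.0, 12.0, 9.0, 14.0, 11.0, 13.0, 8.0, 15.0]
second = [11.0, 12.0, 10.0, 13.0, 11.0, 12.0, 10.0, 13.0]
print(pitman_morgan_t(first, second), t_via_sum_and_difference(first, second))
```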

Testing for variance heterogeneity preliminary to ANOVA. The analysis of variance assumes equality of variance from cell to cell. Hence, it is sometimes proposed that the data be run through a preliminary test to check this assumption, also called that of homoscedasticity; variance heterogeneity is also called heteroscedasticity.

There are two objections to this procedure. First, the same data are subjected to two different statistical procedures, so the two results are not independent. Hence, a special theoretical investigation is needed to find out what properties such a double procedure has (see Kitagawa 1963). Second (see Box 1953, p. 333), one should not use Bartlett’s test for such a preliminary analysis, because of its extreme lack of robustness against nonnormality: one might discard as heteroscedastic data that are merely nonnormal, whereas the analysis of variance is rather robust against nonnormality. (An additional important point is that analysis of variance is fairly robust against variance heterogeneity, at least with equal numbers in the various cells.) [See Significance, tests of, for further discussion of preliminary tests.]

In view of the relative robustness of range methods, Hartley’s suggestion (1950, pp. 277-279) of testing for variance heterogeneity by means of range statistics is quite attractive.

Robust tests against variance heterogeneity. Box (1953, sec. 8) offers a more robust k-sample test against variance heterogeneity: each of the k samples is broken up into small, equal, exclusive, and exhaustive random subsets, a dispersion statistic is computed for each subset, and the within-sample variation of these statistics is compared with the between-sample variation. (Box 1953 applies an analysis of variance to the logarithms of these statistics; Moses 1963, p. 980, applies a rank test to the statistics themselves.)
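A rough Python sketch of the subset idea follows, using a one-way analysis of variance on the logarithms of the subset variances. It is illustrative only. The function name, the use of the subset sample variance as the dispersion statistic, and the random splitting are assumptions. The resulting ratio would be referred to the F-distribution with the returned degrees of freedom.

```python
import math
import random


def _variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def box_log_variance_anova(samples, subset_size, seed=0):
    """Split each sample at random into subsets of `subset_size`, take
    log(subset variance), and return (F, df_between, df_within) of a
    one-way analysis of variance on those logarithms."""
    rng = random.Random(seed)
    groups = []
    for xs in samples:
        if subset_size < 2 or len(xs) % subset_size:
            raise ValueError("each sample size must be a multiple of subset_size >= 2")
        xs = list(xs)
        rng.shuffle(xs)  # random, equal, exclusive, exhaustive subsets
        groups.append([
            math.log(_variance(xs[i:i + subset_size]))
            for i in range(0, len(xs), subset_size)
        ])
    k = len(groups)
    total_count = sum(len(g) for g in groups)
    if total_count - k < 1:
        raise ValueError("need more subsets than samples")
    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / total_count
    ss_between = sum(len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means))
    ss_within = sum((y - gm) ** 2 for g, gm in zip(groups, group_means) for y in g)
    df_between, df_within = k - 1, total_count - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


data_rng = random.Random(5)
narrow = [data_rng.gauss(0.0, 1.0) for _ in range(60)]
wide = [data_rng.gauss(0.0, 3.0) for _ in range(60)]
print(box_log_variance_anova([narrow, wide], subset_size=5))
```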

Another approach applies a permutation test, which amounts to a kurtosis-dependent correction of Bartlett’s test (Box & Andersen 1955, p. 23). The results are good, although they are better in the case of known means than in the case of unknown means.

Still another procedure uses rank tests [see Nonparametric statistics, articles on the field and on ranking methods; see also a survey by van Eeden 1964]. Moses (1963, secs. 3, 4) makes some enlightening remarks about what a rank test for dispersion can and cannot be expected to do.

H. Robert van der Vaart

## BIBLIOGRAPHY

Banerjee, V. 1962 Experimentelle Untersuchungen zur Gültigkeit des Variationskoeffizienten V in der Natur, untersucht an zwei erbreinen Populationen einer Wasserläuferart. Biometrische Zeitschrift 4:121-125.

Bartlett, M. S. 1937 Properties of Sufficiency and Statistical Tests. Royal Society of London, Proceedings Series A 160:268-282.

Bartlett, M. S.; and Kendall, D. G. 1946 The Statistical Analysis of Variance-heterogeneity and the Logarithmic Transformation. Journal of the Royal Statistical Society, Series B 8:128-138.

Basharin, G. P. 1959 On a Statistical Estimate for the Entropy of a Sequence of Independent Random Variables. Theory of Probability and Its Applications 4:333-337. → First published in Russian in the same year, in Teoriia veroiatnostei i ee primeneniia, of which the English edition is a translation, published by the Society for Industrial and Applied Mathematics.

Blyth, Colin R. 1959 Note on Estimating Information. Annals of Mathematical Statistics 30:71-79.

Bowman, Mary J. 1956 The Analysis of Inequality Patterns: A Methodological Contribution. Metron 18, no. 1/2:189-206.

Box, G. E. P. 1953 Non-normality and Tests on Variances. Biometrika 40:318-335.

Box, G. E. P.; and Andersen, S. L. 1955 Permutation Theory in the Derivation of Robust Criteria and the Study of Departures From Assumptions. Journal of the Royal Statistical Society, Series B 17:1-34.

Chu, J. T. 1957 Some Uses of Quasi-ranges. Annals of Mathematical Statistics 28:173-180.

Cramér, Harald (1945) 1951 Mathematical Methods of Statistics. Princeton Mathematical Series, No. 9. Princeton Univ. Press.

David, H. A. 1962 Order Statistics in Short-cut Tests. Pages 94-128 in Ahmed E. Sarhan and Bernard G. Greenberg (editors), Contributions to Order Statistics. New York: Wiley.

Dudding, Bernard P. 1952 The Introduction of Statistical Methods to Industry. Applied Statistics 1:3-20.

Fraser, Donald A. S. 1957 Nonparametric Methods in Statistics. New York: Wiley.

Greenberg, Joseph H. 1956 The Measurement of Linguistic Diversity. Language 32:109-115.

Hartley, H. O. 1950 The Use of Range in Analysis of Variance. Biometrika 37:271-280.

Hays, William L. 1963 Statistics for Psychologists. New York: Holt.

Henrysson, Sten 1957 Applicability of Factor Analysis in the Behavioral Sciences: A Methodological Study. Stockholm Studies in Educational Psychology, No. 1. Stockholm: Almqvist & Wiksell.

Hirsch, Jerry 1961 The Role of Assumptions in the Analysis and Interpretation of Data. American Journal of Orthopsychiatry 31:474-480. → Discussion paper in a symposium on the genetics of mental disease.

Hotelling, Harold 1961 The Behavior of Some Standard Statistical Tests Under Non-standard Conditions. Volume 1, pages 319-359 in Berkeley Symposium on Mathematical Statistics and Probability, Fourth, 1960, Proceedings. Edited by Jerzy Neyman. Berkeley and Los Angeles: Univ. of California Press.

Kamat, A. R. 1958 Contributions to the Theory of Statistics Based on the First and Second Successive Differences. Metron 19, no. 1/2:97-118.

Keen, Joan; and Page, Denys J. 1953 Estimating Variability From the Differences Between Successive Readings. Applied Statistics 2:13-23.

Kitagawa, Tosio 1963 Estimation After Preliminary Tests of Significance. University of California Publications in Statistics 3:147-186.

Kolmogorov, A. N. 1958 Sur les propriétés des fonctions de concentrations de M. P. Lévy. Paris, Université, Institut Henri Poincaré, Annales 16:27-34.

Lehmann, E. L. 1959 Testing Statistical Hypotheses. New York: Wiley.

Lindley, D. V.; East, D. A.; and Hamilton, P. A. 1960 Tables for Making Inferences About the Variance of a Normal Distribution. Biometrika 47:433-437.

Lomnicki, Z. A. 1952 The Standard Error of Gini’s Mean Difference. Annals of Mathematical Statistics 23:635-637.

Lubin, Ardie 1962 Statistics. Annual Review of Psychology 13:345-370.

McHugh, Richard B. 1953 The Comparison of Two Correlated Sample Variances. American Journal of Psychology 66:314-315.

Maxwell, A. E. 1960 Discrepancies in the Variances of Test Results for Normal and Neurotic Children. British Journal of Statistical Psychology 13:165-172.

Miller, George A.; and Chomsky, Noam 1963 Finitary Models of Language Users. Volume 2, pages 419-491 in R. Duncan Luce, Robert R. Bush, and Eugene Galanter (editors), Handbook of Mathematical Psychology. New York: Wiley.

Miller, George A.; and Madow, William G. (1954) 1963 On the Maximum Likelihood Estimate of the Shannon-Wiener Measure of Information. Volume 1, pages 448-469 in R. Duncan Luce, Robert R. Bush, and Eugene Galanter (editors), Readings in Mathematical Psychology. New York: Wiley.

Moore, P. G. 1955 The Properties of the Mean Square Successive Difference in Samples From Various Populations. Journal of the American Statistical Association 50:434-456.

Morgan, W. A. 1939 A Test for the Significance of the Difference Between the Two Variances in a Sample From a Normal Bivariate Population. Biometrika 31:13-19.

Moses, Lincoln E. 1963 Rank Tests of Dispersion. Annals of Mathematical Statistics 34:973-983.

Page, E. S. 1962 Modified Control Chart With Warning Lines. Biometrika 49:171-176.

Page, E. S. 1963 Controlling the Standard Deviation by Cusums and Warning Lines. Technometrics 5:307-315.

Pearson, Karl 1897 On the Scientific Measure of Variability. Natural Science 11:115-118.

Pitman, E. J. G. 1939 A Note on Normal Correlation. Biometrika 31:9-12.

Rao, C. Radhakrishna 1952 Advanced Statistical Methods in Biometric Research. New York: Wiley.

Rényi, Alfréd 1961 On Measures of Entropy and Information. Volume 1, pages 547-561 in Berkeley Symposium on Mathematical Statistics and Probability, Fourth, 1960, Proceedings. Edited by Jerzy Neyman. Berkeley and Los Angeles: Univ. of California Press.

Rényi, Alfréd 1962 Wahrscheinlichkeitsrechnung, mit einem Anhang über Informationstheorie. Berlin: Deutscher Verlag der Wissenschaften.

Siegel, Sidney 1956 Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill.

van der Vaart, H. Robert 1965 A Note on Wilks’ Internal Scatter. Annals of Mathematical Statistics 36:1308-1312.

van Eeden, Constance 1964 Note on the Consistency of Some Distribution-free Tests for Dispersion. Journal of the American Statistical Association 59:105-119.

Wold, Herman 1935 A Study on the Mean Difference, Concentration Curves and Concentration Ratio. Metron 12, no. 2:39-58.

Yntema, Dwight B. 1933 Measures of the Inequality in the Personal Distribution of Wealth or Income. Journal of the American Statistical Association 28:423-433.