Counted Data
Precautions in the analysis of counted data
Counted data subject to sampling variability arise in demographic sampling, in survey research, in learning experiments, and in almost every other branch of social science. The counted data may relate to a relatively simple investigation, for example, estimating the sex ratio at birth in some specified human population, or to a complex problem, such as investigating the interaction among qualitative responses of animals to stimuli in a physiological experiment. Further, a counted data approach is sometimes useful even when the actual data are inherently not counted; for example, a classical approach to so-called goodness of fit uses the counts of numbers of continuous observations in cells or intervals. Again, some nonparametric tests are based on a related device. [See Goodness of fit; Nonparametric statistics.]
Investigations leading to counted data are often described by giving percentages of individuals falling in the various categories. It is essential that the total numbers of individuals also be reported; otherwise reliability and sampling error cannot be estimated.
The structure of this article is as follows. First, simple procedures relating to one or two sample percentages are considered. These procedures exemplify the basic chi-square approach; they may be regarded as methods for treating particular contingency tables in a way falling in the domain of the basic chi-square theorem. Second, special aspects of contingency tables are considered in some detail: power, single degrees of freedom, ordered alternatives, dependent samples, measures of association, multidimensional contingency tables. Under the last heading the important topic of three-factor interactions is considered. Third, some alternatives to chi-square are briefly mentioned.
Binomial model
Consider an experiment in which animals of a group are independently subjected to a stimulus. Assume two, and only two, responses are possible (A and Ā). Of 20 animals exposed independently to the stimulus, responses of type A are exhibited by 16. Such a count, or the corresponding percentage, 80 per cent, may be the basis of an estimate of the probability of an A response in all animals of this kind; or it may be the basis of a test of the hypothesis that responses A and Ā are equally likely. The evaluation of either this estimate or the test is dependent upon the assumptions underlying the data collection.
One of the basic models associated with such experiments is the binomial. The binomial model is associated with a series of independent trials in each of which an event A may or may not occur and for which it is assumed that the probability of occurrence of A, denoted p, is constant from trial to trial. If the number of occurrences of A among n such trials is v, then v/n is the maximum likelihood and also the minimum variance unbiased estimator of p. [For further discussion of the binomial distribution, see Distributions, statistical, article on special discrete distributions.]
Additional insight as to the reliability of the estimator is obtained from a confidence interval for p. Tables and graphs have been prepared to provide such confidence intervals for appropriate levels of confidence. The best known of these is the graph by Clopper and Pearson (1934). This graph or the tables that have been computed for the same purpose (for example, Owen 1962) determine so-called central confidence limits; that is, the intervals that are “false,” in the sense that they do not include the true parameter value, are equally divided between those that are too low and those that are too high. [See Estimation, article on confidence intervals and regions.]
Confidence intervals may also be used to test a null hypothesis that p has the value p_{0}. If p_{0} is not included in the 1 − α level confidence interval, then the null hypothesis p = p_{0} is rejected at level α.
Equivalently, a direct test may be made of this hypothesis by utilizing the extensive tables of the binomial distribution. Two of the best known are those of Harvard University (1955) and the U.S. National Bureau of Standards (1950).
More usually both confidence intervals and test procedures are based upon the normal approximation to the binomial distribution, that is, on the fact that (v − np) · [np(1 − p)]^{−1/2} has a limiting standard normal distribution. [See Distributions, statistical, article on approximations to distributions.]
Denote by Z_{1−α} the 100(1 − α) percentile of the standard normal distribution. The null hypothesis p = p_{0} tested against the alternative p ≠ p_{0} is rejected at level α on the basis of an observation of v successes in n trials if
$$\frac{|v - np_{0}| - \tfrac{1}{2}}{\sqrt{np_{0}(1 - p_{0})}} \ge Z_{1-\alpha/2}.$$
In case the alternatives of interest are limited to one side of p_{0}, say p > p_{0}, the test procedure at level α is to reject H_{0} if
$$\frac{v - np_{0} - \tfrac{1}{2}}{\sqrt{np_{0}(1 - p_{0})}} \ge Z_{1-\alpha}.$$
The subtracted 1/2 is the so-called continuity correction, useful when a discrete distribution is being approximated by a continuous one.
Thus, in the experiment described above, the experimenter might be testing whether the choice is made at random between A and Ā against the possibility that A is the preferred response. This is a test of the hypothesis p = 1/2 against the alternative p > 1/2. Corresponding to the conventional 5 per cent significance level, Z_{0.95} = 1.64; then if v = 16 (16 A responses are observed), the hypothesis is rejected at the 5 per cent level since
$$\frac{16 - 10 - \tfrac{1}{2}}{\sqrt{20 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2}}} = \frac{5.5}{2.236} = 2.46 > 1.64.$$
The normal approximation to the binomial is thought to be quite satisfactory if np_{0}(1 − p_{0}) is 5 or more. However, for many practical situations the normal approximation provides an adequate test (in the sense that the Type I error is sufficiently close to the specified level) for values of np_{0}(1 − p_{0}) well below the bound of 5 mentioned above.
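As a numerical check on the example of 16 A responses among 20 animals, the following Python sketch computes the continuity-corrected normal deviate for p_{0} = 1/2 and, for comparison, the exact binomial upper-tail probability. The function names are illustrative only and do not come from the article.

```python
import math

def normal_approx_z(v, n, p0):
    """One-sided deviate with continuity correction:
    (v - n*p0 - 1/2) / sqrt(n*p0*(1 - p0))."""
    return (v - n * p0 - 0.5) / math.sqrt(n * p0 * (1 - p0))

def exact_upper_tail(v, n, p0):
    """Exact binomial probability of observing v or more successes in n trials."""
    return sum(math.comb(n, k) * p0**k * (1 - p0)**(n - k)
               for k in range(v, n + 1))

z = normal_approx_z(16, 20, 0.5)          # about 2.46, to be compared with 1.64
p_exact = exact_upper_tail(16, 20, 0.5)   # exact tail area, 6196 / 2**20
print(round(z, 2), round(p_exact, 4))
```

Here np_{0}(1 − p_{0}) = 5, so the example sits at the conventional bound; both the approximate and the exact calculation lead to rejection at the 5 per cent level.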
The simplest confidence limits for p based on the normal approximation are
$$\frac{v}{n} \pm Z_{1-\alpha/2}\sqrt{\frac{(v/n)(1 - v/n)}{n}}.$$
The binomial model requires independence of the successive trials. Much sampling, especially of human populations, is, however, done without replacement so that successive observations are in fact dependent and the correct model is not the binomial but the hypergeometric. In sampling theory this is taken into account by the finite population correction, which modifies the variance. Thus, where the binomial variance is np(1 − p), the hypergeometric variance for a sample of size n from a population of size N is np(1 − p)(N − n)/(N − 1), which is approximately np(1 − p)(1 − n/N). If n is a small fraction of N, the finite population correction is negligible; thus, the binomial model is often used as an acceptable approximation.
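The effect of the finite population correction on the simplest normal-approximation limits, v/n ± Z_{1−α/2}[(v/n)(1 − v/n)/n]^{1/2}, can be sketched as follows. This is a minimal illustration: the function name and the population size N = 100 are hypothetical, and the correction is applied through the factor (1 − n/N).

```python
import math
from statistics import NormalDist

def wald_limits(v, n, conf=0.95, population_size=None):
    """Normal-approximation confidence limits for p from v successes in n trials."""
    p_hat = v / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # Z_{1 - alpha/2}
    variance = p_hat * (1 - p_hat) / n
    if population_size is not None:
        # Finite population correction, using the factor (1 - n/N).
        variance *= 1 - n / population_size
    half_width = z * math.sqrt(variance)
    return p_hat - half_width, p_hat + half_width

binomial_limits = wald_limits(16, 20)                        # sampling with replacement
corrected_limits = wald_limits(16, 20, population_size=100)  # N = 100 is hypothetical
print(binomial_limits, corrected_limits)
```

With n/N = 0.2 the corrected interval is shorter by the factor (0.8)^{1/2}; as n/N shrinks the two intervals become indistinguishable.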
Chi-square tests
For one or two proportions
The statistic
$$\frac{|v - np_{0}| - \tfrac{1}{2}}{\sqrt{np_{0}(1 - p_{0})}},$$
which for sufficiently large n may be used to test the hypothesis p = p_{0} against p ≠ p_{0}, yields, when squared, an equivalent test procedure based on the chi-square distribution with one degree of freedom; this follows from the fact that the square of a standard normal variable has a chi-square distribution with one degree of freedom. [See Distributions, statistical, article on special continuous distributions.]
Following recent practice, “X^{2}” is written for the test statistic, and the symbol “χ^{2}” is reserved for the distributional form. Thus
$$X^{2} = \frac{(|v - np_{0}| - \tfrac{1}{2})^{2}}{np_{0}(1 - p_{0})} = \frac{(|v - np_{0}| - \tfrac{1}{2})^{2}}{np_{0}} + \frac{(|(n - v) - n(1 - p_{0})| - \tfrac{1}{2})^{2}}{n(1 - p_{0})}.$$
This algebraic identity shows that the statistic X^{2} may be written (neglecting the continuity correction term, 1/2) as (observed − expected)^{2}/expected, summed over the two categories A and Ā. Such a measure of deviation of observations from their expected values under a null hypothesis is of wide application. For example, consider the counts of individuals with characteristic A that occur in two independent random samples and suppose that the null hypothesis under test is that the probability of occurrence of A is the same in both populations; call the common (but unspecified) probability p. The observations may be tabulated as in Table 1.
Table 1. Observations in two samples

            NUMBERS OBSERVED
            A’s       Ā’s       Totals
Sample 1    v_{11}    v_{12}    n_{1}
Sample 2    v_{21}    v_{22}    n_{2}
Totals      v_{.1}    v_{.2}    n
If p were known, then under the null hypothesis the expectation of the number of A’s in sample 1 would be n_{1}p and in sample 2 the expectation would be n_{2}p, where p is the probability of occurrence of A. Since p is unknown, however, it must be estimated from the data [see Estimation, article on point estimation].
If the hypothesis were true, the two samples could be pooled and the usual (minimum variance unbiased) estimator of p would be v_{.1}/n. With this estimator the estimated expected number of A’s in sample 1 is n_{1}(v_{.1}/n) and in sample 2 is n_{2}(v_{.1}/n). Similarly the estimated expected numbers of Ā’s are n_{1}(v_{.2}/n) and n_{2}(v_{.2}/n) in the two samples. These estimated expectations are tabulated in Table 2.
Table 2. Estimated expectations in two samples

            A’s               Ā’s               Totals
Sample 1    n_{1}v_{.1}/n     n_{1}v_{.2}/n     n_{1}
Sample 2    n_{2}v_{.1}/n     n_{2}v_{.2}/n     n_{2}
Totals      v_{.1}            v_{.2}            n
An expression similar to X^{2} can be calculated for each sample where now, however, p_{0} is replaced by the estimator v_{.1}/n. These expressions are
$$\frac{(v_{11} - n_{1}v_{.1}/n)^{2}}{n_{1}(v_{.1}/n)(v_{.2}/n)}$$
and
$$\frac{(v_{21} - n_{2}v_{.1}/n)^{2}}{n_{2}(v_{.1}/n)(v_{.2}/n)}.$$
Since the estimator of p will tend to be close to the true value for large sample sizes, it is intuitive to conjecture that each of these is the square of a normal variable (at least approximately for large samples). The sum does have a limiting chi-square distribution but with one degree of freedom, not two. The “loss” of the degree of freedom comes from estimating the unknown parameter, p. The test statistic, which more formally written (with the continuity correction) is
$$X^{2} = \sum_{i=1}^{2}\sum_{j=1}^{2}\frac{(|v_{ij} - n_{i}v_{.j}/n| - \tfrac{1}{2})^{2}}{n_{i}v_{.j}/n},$$
may be simplified to
$$X^{2} = \frac{n(|v_{11}v_{22} - v_{12}v_{21}| - n/2)^{2}}{n_{1}n_{2}v_{.1}v_{.2}}.$$
If |v_{11}v_{22} − v_{12}v_{21}| is less than or equal to n/2, the correction term is inappropriate and possibly misleading. In practice this problem rarely arises.
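The shortcut form for the 2 × 2 table, n(|v_{11}v_{22} − v_{12}v_{21}| − n/2)^{2}/(n_{1}n_{2}v_{.1}v_{.2}), can be checked against the cell-by-cell sum of (observed − expected)^{2}/expected. The sketch below uses hypothetical counts; the function names are illustrative, and setting the corrected deviation to zero when |v_{11}v_{22} − v_{12}v_{21}| ≤ n/2 is an assumption of this sketch, not a rule stated in the text.

```python
def chi2_2x2(v11, v12, v21, v22, correct=True):
    """Shortcut X^2 for a 2 x 2 table, optionally with the continuity correction."""
    n1, n2 = v11 + v12, v21 + v22          # row (sample) totals
    c1, c2 = v11 + v21, v12 + v22          # column totals
    n = n1 + n2
    d = abs(v11 * v22 - v12 * v21)
    if correct:
        d = max(d - n / 2, 0.0)            # assumption: clamp at zero
    return n * d * d / (n1 * n2 * c1 * c2)

def chi2_2x2_long(v11, v12, v21, v22):
    """Uncorrected X^2 as the sum over four cells of (obs - exp)^2 / exp."""
    rows = [[v11, v12], [v21, v22]]
    n = v11 + v12 + v21 + v22
    col = [v11 + v21, v12 + v22]
    total = 0.0
    for r in rows:
        for j, obs in enumerate(r):
            exp = sum(r) * col[j] / n      # n_i * v_.j / n
            total += (obs - exp) ** 2 / exp
    return total

# Hypothetical data: 16 of 20 A's in sample 1, 10 of 20 A's in sample 2.
uncorrected = chi2_2x2(16, 4, 10, 10, correct=False)
corrected = chi2_2x2(16, 4, 10, 10)
long_form = chi2_2x2_long(16, 4, 10, 10)
print(round(uncorrected, 3), round(corrected, 3), round(long_form, 3))
```

For these counts the uncorrected shortcut and the long form agree, while the continuity correction reduces the statistic noticeably, as is typical for samples of this size.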
Basic chi-square theorem
The above chi-square test statistics for one or two proportions may, as was seen, be written as sums of terms whose numerators are squared deviations of the observed counts from those “expected” under the null hypothesis. (Expected is placed in quotation marks to emphasize that the “expectations” are often estimated expectations obtained via estimation of unknown parameters.) The denominators may be regarded as weights to standardize the ratios. This pattern may be widely extended.
For example, consider a questionnaire with respondents placing themselves in five categories: strongly favor, mildly favor, neutral, mildly oppose, strongly oppose. The n independent responses might furnish data for a test of the hypothesis that each of the responses is equally likely. If the probabilities of the five responses are denoted p_{1} through p_{5}, this null hypothesis specifies p_{1} = p_{2} = p_{3} = p_{4} = p_{5} = 1/5, and under the null hypothesis the expected number of responses in each category is n/5. The appropriate weights in the denominator of the chi-square test statistic are suggested by the expanded form of X^{2} given above; each term (observed − expected)^{2} is divided by its expected value. Thus in this example, with v_{j} denoting the number of responses in category j,
$$X^{2} = \sum_{j=1}^{5}\frac{(v_{j} - n/5)^{2}}{n/5}.$$
That these weights lead to the usual kind of null distribution can be shown by considering the multinomial distribution, the extension of the binomial distribution to a series of independent trials with several outcomes rather than just two. If the null hypothesis is true, this X^{2} has approximately a chi-square distribution with four degrees of freedom.
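For the five-category questionnaire, the statistic is the sum over categories of (v_{j} − n/5)^{2}/(n/5). A minimal Python sketch, with hypothetical response counts and an illustrative function name, is:

```python
def chi2_equiprobable(counts):
    """X^2 for the null hypothesis that all categories are equally likely.

    Returns the statistic and its degrees of freedom (categories - 1)."""
    n = sum(counts)
    expected = n / len(counts)
    x2 = sum((v - expected) ** 2 / expected for v in counts)
    return x2, len(counts) - 1

# Hypothetical counts for: strongly favor, mildly favor, neutral,
# mildly oppose, strongly oppose (n = 100, so each expectation is 20).
x2, df = chi2_equiprobable([30, 25, 20, 15, 10])
print(x2, df)
```

The resulting value would be referred to the upper tail of the chi-square distribution with four degrees of freedom, whose tabulated 5 per cent point is 9.488.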
More generally, suppose that on each of n independent trials of an experiment exactly one of the events E_{1}, ···, E_{J} occurs. Let p_{j}, depending in a given way (under the null hypothesis under test) on unknown parameters θ_{1}, ···, θ_{m}, be the probability that E_{j} occurs and suppose there are asymptotically efficient estimators of the θ’s, from which are obtained asymptotically efficient estimators of the p_{j}, denoted $\hat{p}_{j}$; thus $n\hat{p}_{j}$ estimates the expected frequency of occurrence of E_{j} under the null hypothesis. Let the random variable v_{j} be the number of times E_{j} actually occurs in the n trials. Then
$$X^{2} = \sum_{j=1}^{J}\frac{(v_{j} - n\hat{p}_{j})^{2}}{n\hat{p}_{j}}$$
has, under the null hypothesis for large n and under mathematical regularity conditions, approximately the chi-square distribution with J − m − 1 degrees of freedom. When the null hypothesis is false, X^{2} tends to be larger on the average than when it is true, so that a right-hand tail critical region is appropriate, that is, the null hypothesis is rejected for large values of X^{2}.
Note that the above “chi-square” statistic is of form
$$\sum \frac{(\text{observed} - \text{“expected”})^{2}}{\text{“expected”}}.$$
(The quotation marks around expected indicate that this is actually an asymptotically efficient estimator of the expectation under the null hypothesis.)
The above development can readily be extended to I independent sequences of trials, with n_{i} trials in the ith sequence, p_{ij} denoting the probability under the null hypothesis of event j for sequence i, $\hat{p}_{ij}$ denoting its asymptotically efficient estimator, and v_{ij} denoting the number of times E_{j} occurs in sequence i. As before,
$$X^{2} = \sum_{i=1}^{I}\sum_{j=1}^{J}\frac{(v_{ij} - n_{i}\hat{p}_{ij})^{2}}{n_{i}\hat{p}_{ij}}$$
is, for large n_{i}, approximately chi-square with I(J − 1) − m degrees of freedom, under the null hypothesis, and with appropriate regularity conditions. Note that when I = 1, J − m − 1 degrees of freedom are obtained, as before.
The primary problem in such tests is the derivation of asymptotically efficient estimators. For example, such estimators may be maximum likelihood estimators or minimum chi-square estimators. The latter are the θ’s that minimize X^{2}, the test statistic, subject to whatever functional restraints are imposed upon the p_{ij}’s. Neyman (1949) has given a method of determining modified minimum chi-square estimators, a method that reduces to solving only linear equations, as many as there are unknown parameters to estimate. A review of the methods of generating such minimum chi-square estimators for this model, and for a more general one, is given by Ferguson (1958).
It is easily seen that the comparison of two percentages is a special case of the general theorem. Here I = J = 2 and the null hypothesis can be put in the form p_{11} = p_{21} = θ; p_{12} = p_{22} = 1 − θ. Here p_{11} is the probability of A occurring on a trial in the first series, p_{21} is the probability of A occurring on a trial in the second series; p_{12}, p_{22} are defined similarly with respect to Ā. The maximum likelihood estimator of θ is v_{.1}/n and the degrees of freedom are seen to be one from insertion in the general formula, I(J − 1) − m = 2(1) − 1 = 1.
Proofs of the basic chi-square theorem and statements of the mathematical regularity conditions may be found in Cramér (1946) or Neyman (1949).
Power of the chi-square test
The chi-square test is extensively used as an omnibus test without particular alternatives in view. Frequently such applications are almost useless in the sense that their sensitivity (that is, power) is very low. It is therefore important not only to make such tests but also to specify the alternatives of interest and to determine the power, that is, the probability that the null hypothesis is rejected when in fact such alternatives are true. A fairly complete theory of the power of chi-square tests has been given recently by Mitra (1958) and Diamond (1963).
Because chi-square tests are based upon a limiting distribution theorem it is necessary to express the alternative in a special form, depending on the sample size, n, in order to obtain meaningful results. Consider first the case where I = 1 and the null hypothesis completely specifies the p_{j} as numerical constants, say p_{j} = p_{j}^{0}. (In the questionnaire experiment above, since there are five responses the null hypothesis that the responses are equally likely specifies p_{j}^{0} = 1/5, j = 1, ···, 5.) Write an alternative in the form
$$p_{j} = p_{j}^{0} + \frac{c_{j}}{\sqrt{n}}, \qquad \sum_{j=1}^{J} c_{j} = 0.$$
If in fact the p_{j} have this form, then the test statistic X^{2} has a limiting noncentral chi-square distribution with noncentrality parameter
$$\lambda = \sum_{j=1}^{J}\frac{c_{j}^{2}}{p_{j}^{0}}$$
and with J − 1 degrees of freedom [see Distributions, statistical, article on special continuous distributions].
The λ required to obtain a specified probability of rejection of an alternative for tests at significance levels 0.01 and 0.05 has been tabulated; such a table is given, for example, by Owen (1962, pp. 61–62). These tables are useful not only in calculating the power function but also in specifying sample size in advance. For the example where there are five responses and the null hypothesis is p_{j} = 1/5 for each j, consider the alternative p_{j} = 3/20, j = 1, ···, 4, p_{5} = 8/20. Then c_{j} = −√n/20, j = 1, ···, 4; c_{5} = 4√n/20, so that λ = n(.25). To achieve a probability of 0.80 of rejecting the null hypothesis for this alternative, it is found from the tables that λ must be 11.94 (four degrees of freedom and 0.05 significance level). This requires a sample size of 11.94/.25 or, to the nearest whole number, 48.
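The sample-size arithmetic can be sketched in Python. The sketch assumes the alternative that assigns probability 0.15 to each of four responses and 0.40 to the fifth, which is one alternative consistent with λ = n(.25); the required λ = 11.94 is the tabulated value quoted in the text and is not computed here. Function and variable names are illustrative.

```python
import math

def noncentrality_per_observation(p_null, p_alt):
    """lambda / n for a fully specified null: sum of (p_j - p0_j)^2 / p0_j."""
    return sum((a - b) ** 2 / b for a, b in zip(p_alt, p_null))

p_null = [0.2] * 5                 # five equally likely responses
p_alt = [0.15] * 4 + [0.40]        # assumed alternative (see lead-in)

unit = noncentrality_per_observation(p_null, p_alt)   # lambda = n * unit
lam_required = 11.94               # tabulated: power 0.80, 4 df, 0.05 level
n_required = math.ceil(lam_required / unit)
print(round(unit, 4), n_required)
```

Rounding up rather than to the nearest integer is a conservative design choice; for these figures the two conventions give the same sample size.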
For the comparison of two samples, a similar power theory is available. Consider two sequences of n_{i} trials (i = 1, 2), each trial resulting in one of the outcomes E_{1}, E_{2}, ···, E_{J}. Here p_{ij} (i = 1, 2; j = 1, ···, J) is the probability of outcome j on sequence i and the null hypothesis of homogeneity is p_{11} = p_{21}, p_{12} = p_{22}, ···, p_{1J} = p_{2J}. Now consider a sequence of alternatives which for some p_{j} (j = 1, 2, ···, J) satisfy the equations
$$p_{ij} = p_{j} + \frac{c_{ij}}{\sqrt{n}}, \qquad i = 1, 2;\ j = 1, \cdots, J,$$
where n = n_{1} + n_{2}, and ∑_{j}c_{1j} = ∑_{j}c_{2j} = 0. Then, for the sequence of alternatives, X^{2} has, in the limit as n → ∞, a noncentral chi-square distribution with J − 1 degrees of freedom and noncentrality parameter λ, where
$$\lambda = \frac{n_{1}n_{2}}{n^{2}}\sum_{j=1}^{J}\frac{(c_{1j} - c_{2j})^{2}}{p_{j}}.$$
In actual practice, when the statistician considers a specified alternative for finite n, p_{j} is not uniquely defined; it is convenient to define p_{j} as an average of p_{1j} and p_{2j}, for instance the pooled value (n_{1}p_{1j} + n_{2}p_{2j})/n, but whether other choices of p_{j} might improve the goodness of the asymptotic approximation to the actual power appears not to have been investigated. For the case J = 2, and with n_{1} = n_{2}, a nomogram is available showing the sample size required to obtain a specified level of power for one-sided hypotheses, that is, for comparison of an experimental and a standard group (Columbia University 1947, chapter 7). In the general case the formulation of λ is more difficult.
Contingency tables
In the example of comparing two percentages, the observations were conveniently set out in a 2 × 2 array. Similarly, in the more general comparative experiment (the power of which was just discussed), it would be convenient to set out the observations in a 2 × J array. These are special cases of contingency tables, which, in general, have r rows and c columns; counted data that may be so represented arise, for example, in many experiments and surveys.
Such arrays or contingency tables may arise in at least three different situations, which may be illustrated by specific examples:
(1) Double polytomy: A sample of n voters is taken from an electoral list and each voter is classified into one of r party affiliations P_{1}, ···, P_{r} and into one of c educational levels E_{1}, ···, E_{c}. Denote by p_{ij} the probability that a voter belongs to party i and educational level j, so that ∑_{i}∑_{j}p_{ij} = 1. The usual null hypothesis of interest is that the classifications are independent, that is, p_{ij} = p_{i.} · p_{.j},
where p_{i.} = probability a voter is in party i (regardless of educational level) and p_{.j} = probability a voter is in educational level j, again regardless of the other classification variable (that is, p_{i.} = ∑_{j}p_{ij} and p_{.j} = ∑_{i}p_{ij}). In this case both the vertical and horizontal marginal totals of the r × c sample array are random.
(2) Comparative trials: Consider instead of a single sample from the general electoral roll, r samples of sizes n_{i} from the r different party rolls. The voters in each sample are classified as to educational level (levels E_{1}, ···, E_{c}). Denote by p_{ij} the probability that a voter drawn from party i belongs to educational level j (so that ∑_{j}p_{ij} = 1 for each i). The hypothesis of homogeneity specifies that p_{1j} = p_{2j} = ··· = p_{rj} for each j. In this case the row totals are fixed (n_{1}, ···, n_{r}), while the column totals are random. Into this category falls the two-sample experiment discussed earlier; in that case r = 2.
(3) Independence trials (fixed marginal totals): Consider a group of n manufactured articles, of which fixed proportions are in each of the quality categories C_{1}, ···, C_{c}. The articles have been divided into r groups of fixed size n_{1}, ···, n_{r} for further processing or for shipment to customers. The question arises whether the partitioning into the r groups can reasonably be considered to have been done randomly, that is, independently of how the articles fall into the quality categories. Since the number of articles in each of the categories C_{1}, ···, C_{c} as well as the n_{1}, ···, n_{r} are fixed, both marginal totals are fixed in this situation.
For these three cases let v_{ij} denote the number of individuals falling into row i, column j, and denote by v_{i.} and v_{.j} the row and column totals, whether fixed or random. While different probability models are associated with the three cases, the approximate or large sample chi-square test is identical. The test statistic is
$$X^{2} = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(v_{ij} - v_{i.}v_{.j}/n)^{2}}{v_{i.}v_{.j}/n},$$
which has, if the null hypothesis is true, an approximate chi-square distribution with (r − 1)(c − 1) degrees of freedom.
For the comparative trials case, this is an extension of the comparison of two percentages. The maximum likelihood estimator of the common value of p_{1j}, p_{2j}, ···, p_{rj} is v_{.j}/n under the null hypothesis. The comparative trials model consists of r sequences of trials, each of which may result in one of c events; c − 1 parameters are estimated, because as soon as c − 1 of the probabilities are estimated the final one is determined. Hence the degrees of freedom are r(c − 1) − (c − 1) = (r − 1)(c − 1).
In the double polytomy case there are (r − 1) + (c − 1) independent parameters to be estimated under the null hypothesis: p_{1.}, p_{2.}, ···, p_{r−1.}, p_{.1}, p_{.2}, ···, p_{.c−1}, since again the restrictions ∑_{i}p_{i.} = ∑_{j}p_{.j} = 1 provide the last two needed values. The maximum likelihood estimators of the p_{i.} are the v_{i.}/n and of the p_{.j} are the v_{.j}/n, so that the estimated expected values are n(v_{i.}/n)(v_{.j}/n) or v_{i.}v_{.j}/n. The degrees of freedom in this case are (rc − 1) − (r − 1) − (c − 1) or (r − 1)(c − 1), since there is only one sequence of trials with rc outcomes.
Like all chi-square tests, these are based upon asymptotic distribution theory and are satisfactory in practice for “large” sample sizes. A number of rules of thumb have been established in regard to the acceptable lower limit of sample size so that the actual Type I error, or probability of rejecting the null hypothesis when true, does not depart too far from the prescribed significance level. For a careful discussion of this problem, and of procedures to adopt when the samples are too small, see Cochran (1952; 1954).
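The common large-sample statistic for an r × c table, with estimated expectations v_{i.}v_{.j}/n and (r − 1)(c − 1) degrees of freedom, can be computed as in the following sketch. The counts are hypothetical and the function name is illustrative.

```python
def chi2_contingency_table(table):
    """X^2 and degrees of freedom for an r x c table of counts.

    Expected counts are (row total) * (column total) / n, which is the same
    whether the margins are fixed or random."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df

# Hypothetical 2 x 3 table: two party samples classified by three educational levels.
x2, df = chi2_contingency_table([[10, 20, 30],
                                 [20, 20, 20]])
print(round(x2, 4), df)
```

The same function serves the double polytomy, comparative trials, and fixed-margins cases, since only the interpretation of the margins differs.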
2 × 2 tables. The special case of contingency tables with r = c = 2 has been extensively studied, and the so-called Fisher exact test is available. Given v_{1.}, v_{2.}, v_{.1}, v_{.2}, under any of the null hypotheses, v_{11} has a specific hypergeometric distribution; hence probabilities of deviations as numerically large as, or larger than, the observed deviation can be calculated and a test can be made. The application of the test is now greatly facilitated by use of tables by Finney et al. (1963). For the comparative trials model and the double dichotomy model this exact test is a conditional test, given the marginal counts.
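The hypergeometric calculation behind the exact test can be sketched directly. The example below gives only the one-sided (upper-tail) probability for v_{11}, conditional on the margins; the table is hypothetical and the function names are illustrative.

```python
from math import comb

def hypergeom_pmf(x, r1, r2, c1):
    """P(v11 = x) given row totals r1, r2 and first column total c1."""
    return comb(r1, x) * comb(r2, c1 - x) / comb(r1 + r2, c1)

def fisher_exact_upper_tail(v11, v12, v21, v22):
    """Conditional probability of a value of v11 as large as or larger than observed."""
    r1, r2, c1 = v11 + v12, v21 + v22, v11 + v21
    largest = min(r1, c1)              # largest value v11 can take given the margins
    return sum(hypergeom_pmf(x, r1, r2, c1) for x in range(v11, largest + 1))

# Hypothetical table with all margins equal to 4.
p_upper = fisher_exact_upper_tail(3, 1, 1, 3)
print(round(p_upper, 4))
```

A two-sided version would also accumulate the probabilities of equally extreme deviations in the opposite direction.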
While the hypotheses associated with the three different models in r × c tables, in general, and 2 × 2 tables, in particular, can be tested by the same chi-square procedure, the power of the test varies according to the model. For the 2 × 2 case, approximations and tables have been given for each of the three models. The most recent of these are by Bennett and Hsu (1960) for comparative and independence trials and Harkness and Katz (1964) for the double dichotomy model. Earlier approximations are discussed and compared by these authors.
Single degrees of freedom
The statistic X^{2} used to test the several null hypotheses possible for r × c contingency tables can be partitioned into (r − 1)(c − 1) uncorrelated X^{2} terms, each of which has a limiting chi-square distribution with one degree of freedom when the null hypothesis is true.
Planned comparisons. Planned subcomparisons, however, can be treated most easily by forming new contingency tables and calculating the approximate X^{2} statistic. For example, in the comparison of three experimental learning methods with a standard method the observations might be recorded for each pupil as successful or unsuccessful and tabulated in a 4 × 2 table. These are four comparative trials; and X^{2}, the statistic to test homogeneity, has, under the null hypothesis, an approximate chi-square distribution with three degrees of freedom.
In this situation, two subcomparisons might be indicated: the standard method versus the combined experimental groups, and the experimental groups among themselves. Tables 3a and 3b show the two new contingency tables. The X^{2} statistics calculated from these two subtables may be used to make the indicated secondary tests. The two X^{2} values (with one and two degrees of freedom respectively) will not, in general, sum to the X^{2} calculated for the whole 4 × 2 array. Shortcut formulas for a partition that is additive and references to other papers on this subject are given by Kimball (1954).
Table 3a. Comparison between standard method and combined experimental methods

                                  Successful                    Unsuccessful
Standard (method 1)               v_{11}                        v_{12}
Experimental methods combined     v_{21} + v_{31} + v_{41}      v_{22} + v_{32} + v_{42}

Table 3b. Comparison among experimental methods

                                  Successful    Unsuccessful
Experimental (method 2)           v_{21}        v_{22}
Experimental (method 3)           v_{31}        v_{32}
Experimental (method 4)           v_{41}        v_{42}
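The two planned subcomparisons can be formed from a 4 × 2 table as in the sketch below, which also shows that the two component X^{2} values need not add up to the X^{2} for the full table. The counts are hypothetical and the function name is illustrative.

```python
def chi2_homogeneity(table):
    """X^2 and degrees of freedom for a table of counts (rows are samples)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = sum((obs - row_totals[i] * col_totals[j] / n) ** 2
             / (row_totals[i] * col_totals[j] / n)
             for i, row in enumerate(table) for j, obs in enumerate(row))
    return x2, (len(row_totals) - 1) * (len(col_totals) - 1)

# Hypothetical (successful, unsuccessful) counts for methods 1 to 4.
full = [[10, 10], [15, 5], [12, 8], [18, 2]]

# Table 3a: standard method versus the pooled experimental methods.
pooled = [sum(row[0] for row in full[1:]), sum(row[1] for row in full[1:])]
x2_a, df_a = chi2_homogeneity([full[0], pooled])

# Table 3b: the three experimental methods among themselves.
x2_b, df_b = chi2_homogeneity(full[1:])

x2_full, df_full = chi2_homogeneity(full)
print(round(x2_full, 3), round(x2_a, 3), round(x2_b, 3))
```

The degrees of freedom add (1 + 2 = 3), but for these counts the component statistics sum to more than the overall X^{2}, because each subtable uses its own pooled estimate of the success probability.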
Unplanned comparisons. As in the analysis of variance of linear models, distinction should be made between such planned comparisons and unplanned comparisons. Goodman (1964a) has given a procedure to find confidence intervals for a family of “contrasts” among multinomial probabilities for the r × c contingency table in the comparative-trials model. A “contrast” is any linear function of the probabilities p_{ij}, with coefficients summing to zero, that is,
$$\theta = \sum_{i=1}^{r}\sum_{j=1}^{c} a_{ij}p_{ij}, \qquad \sum_{i=1}^{r} a_{ij} = 0 \text{ for each } j.$$
[See Linear hypotheses, article on multiple comparisons.]
Thus in the comparison of teaching methods experiment referred to above, where p_{i1} is the probability of a pupil being successful when taught by method i, the unplanned comparisons or contrasts might be p_{21} − p_{31}, p_{21} − p_{41}, p_{31} − p_{41}. These represent pairwise comparisons of the three experimental methods.
Denote a contrast by θ; an estimator of p_{ij} is $\hat{p}_{ij}$ = v_{ij}/v_{i.} and an estimator of θ is
$$\hat{\theta} = \sum_{i=1}^{r}\sum_{j=1}^{c} a_{ij}\hat{p}_{ij}.$$
An estimator of the variance of $\hat{\theta}$ is
$$S^{2}(\hat{\theta}) = \sum_{i=1}^{r}\frac{1}{v_{i.}}\left[\sum_{j=1}^{c} a_{ij}^{2}\hat{p}_{ij} - \Bigl(\sum_{j=1}^{c} a_{ij}\hat{p}_{ij}\Bigr)^{2}\right].$$
The large-sample joint confidence intervals for θ with confidence coefficient 1 − α have the form $\hat{\theta} - S(\hat{\theta})L$, $\hat{\theta} + S(\hat{\theta})L$, where L is the square root of the 100(1 − α)th percentile of the chi-square distribution with (r − 1)(c − 1) degrees of freedom. An experiment in which one or more of the totality of all such possible intervals fail to include the true θ may be called a violation. The probability of such a violation is α.
If instead of all contrasts, only a few, say G, are of interest, then L in the last formula may be replaced by Z_{1−α/(2G)} (the 100[1 − α/(2G)] percentile of the standard normal distribution), which often will be smaller than L and hence yield shorter confidence intervals while the probability of a violation is still less than or at most equal to α.
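A sketch of the interval for a single contrast follows, using the estimator of θ as the sum of a_{ij}v_{ij}/v_{i.} and a variance estimator built from the multinomial variance within each row (an assumption of this sketch). The 4 × 2 counts are hypothetical, the function names are illustrative, and the chi-square percentile 7.815 (three degrees of freedom, 5 per cent level) is taken from standard tables rather than computed.

```python
import math
from statistics import NormalDist

def contrast_interval(counts, coeffs, multiplier):
    """Estimate a contrast and return (lower, upper) = theta_hat -/+ multiplier * S."""
    theta_hat, variance = 0.0, 0.0
    for row, a in zip(counts, coeffs):
        n_i = sum(row)
        p_hat = [v / n_i for v in row]
        linear = sum(aj * pj for aj, pj in zip(a, p_hat))
        square = sum(aj * aj * pj for aj, pj in zip(a, p_hat))
        theta_hat += linear
        variance += (square - linear * linear) / n_i   # multinomial variance, row i
    s = math.sqrt(variance)
    return theta_hat - multiplier * s, theta_hat + multiplier * s

def bonferroni_multiplier(alpha, n_contrasts):
    """Z_{1 - alpha/(2G)} for G contrasts of interest."""
    return NormalDist().inv_cdf(1 - alpha / (2 * n_contrasts))

counts = [[10, 10], [15, 5], [12, 8], [18, 2]]         # hypothetical data
p21_minus_p31 = [[0, 0], [1, 0], [-1, 0], [0, 0]]      # coefficients a_ij

z_mult = bonferroni_multiplier(0.05, 3)                # three pairwise contrasts
l_mult = math.sqrt(7.815)                              # all contrasts, 3 df (tabulated)
lower, upper = contrast_interval(counts, p21_minus_p31, z_mult)
print(round(z_mult, 3), round(l_mult, 3), round(lower, 3), round(upper, 3))
```

For three contrasts the normal multiplier is smaller than L, so the intervals restricted to the pairwise comparisons are shorter than those valid for all contrasts.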
Comparative trials; ordered alternatives
In the comparative trials model, with r × 2 contingency tables, frequently the only alternative of interest is an ordered set of p_{i1}’s. For example in a 2 × 2 comparative trial involving a control and a test group, the question may be to decide whether the groups are the same or whether the test group yields “better” results than the control group. In the 2 × 2 case, this situation is handled simply by working with the signed square root of X^{2}, which has approximately a standard normal distribution if the null hypothesis is true. A one-sided alternative is then treated in the same manner as a test for a percentage referred to earlier.
For the more general r × 2 table, the most complete treatment is that of Bartholomew (1959); his test, however, requires special tables. If the experimenter believes that the p_{i1} have a functional relationship to a known associated variable, x_{i}, then a specific test can be derived from the basic theorem. Such a test would be a particular example of a planned comparison. Many authors have given shortcut formulas and worked out examples of this type of problem (cf. Cochran 1954; Armitage 1955).
Comparative trials; dependent samples
A sample is taken of n voters who have voted in the last two national elections for one of the major parties. Denote the parties by L and C, and suppose that in the sample 45 per cent voted L in the first election and 55 per cent voted L in the second. Does this indicate a significant change in voter behavior in the subpopulation of which this is a sample? To make such a comparison in matched or dependent samples, it is necessary to obtain information on the actual changes in party preference [see Panel studies]. These can be read from a 2 × 2 table such as Table 4.
Table 4. Voter preference in two elections

                      ELECTION 1
                      L          C
ELECTION 2    L       v_{11}     v_{12}
              C       v_{21}     v_{22}
Such a 2 × 2 table with random marginals appears to fall into the double-dichotomy model, but the hypothesis of independence is not of interest here. The changes are indicated by the off-diagonal elements v_{12}, v_{21}, and the hypothesis of no net change is equivalent to the hypothesis that, given v_{12} + v_{21}, v_{12} is binomially distributed with probability 1/2. Thus, the test of comparison of two percentages in identical or matched samples reduces to the test for a percentage. If the normal approximation is adequate, the square of the normal deviate, with the continuity correction, is
$$X^{2} = \frac{(|v_{12} - v_{21}| - 1)^{2}}{v_{12} + v_{21}},$$
which has a limiting chi-square distribution with one degree of freedom under the null hypothesis that the probabilities are the same in the two matched groups. Cochran (1950) has extended this test to the r × c case. The test described above does not at all depend on v_{11} and v_{22}, but of course these quantities would enter into procedures pointed toward issues other than testing the null hypothesis of no net change.
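The reduction of the matched-samples test to a test for a percentage can be verified numerically: the statistic (|v_{12} − v_{21}| − 1)^{2}/(v_{12} + v_{21}) equals the square of the continuity-corrected binomial deviate for v_{12} out of v_{12} + v_{21} trials with p_{0} = 1/2. The counts below are hypothetical and the function names are illustrative.

```python
import math

def matched_pairs_chi2(v12, v21):
    """X^2 for no net change, based only on the off-diagonal counts."""
    return (abs(v12 - v21) - 1) ** 2 / (v12 + v21)

def binomial_deviate_squared(v, n, p0=0.5):
    """Square of (|v - n*p0| - 1/2) / sqrt(n*p0*(1 - p0))."""
    z = (abs(v - n * p0) - 0.5) / math.sqrt(n * p0 * (1 - p0))
    return z * z

# Hypothetical changers: 25 voters moved one way and 15 the other.
x2 = matched_pairs_chi2(25, 15)
z2 = binomial_deviate_squared(25, 25 + 15)
print(round(x2, 3), round(z2, 3))
```

The value would be referred to the chi-square distribution with one degree of freedom, whose tabulated 5 per cent point is 3.841.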
Chi-square tests of goodness of fit
Chi-square tests have been used extensively to test whether sample observations might have arisen from a population with a specified form, such as binomial, Poisson, or normal. Such chi-square tests are again special cases of the general theory outlined above, although there are many other types of tests for goodness of fit [see Goodness of fit].
There are some special problems in connection with some nonstandard chi-square tests of goodness of fit for the binomial and Poisson distributions. The standard chi-square test of goodness of fit for these two discrete distributions requires (in most cases) estimation of the mean. The sample mean is an efficient estimator of the population mean; it appears to make little difference whether the sample mean is computed from the raw or grouped data.
There is evidence that two simpler tests are more powerful, at least for some alternatives, for testing whether a set of counts does come from one of the distributions. These test statistics are the so-called indices of dispersion, studied by Lexis and Bortkiewicz, which in fact compare two estimators of the variance: the usual sample estimator and the estimator derivable from the fact that for these distributions the variance is a function of the mean. Alternatively they may be viewed as chi-square tests, conditional on the total count and placed in the framework of the basic theorem. Thus if the observations are v_{1}, ···, v_{n}, which according to the null hypothesis come from a Poisson distribution, the appropriate index of dispersion test statistic is
$$X^{2} = \sum_{i=1}^{n}\frac{(v_{i} - \bar{v})^{2}}{\bar{v}},$$
where $\bar{v}$ is the sample mean of the v_{i}. For large n and if the null hypothesis is true, X^{2} is approximately distributed as chi-square with n − 1 degrees of freedom [see Bortkiewicz; Lexis].
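The Poisson index of dispersion, the sum of squared deviations of the counts from their mean divided by that mean, is simple to compute. The sketch below uses hypothetical counts and an illustrative function name.

```python
def poisson_dispersion(counts):
    """Index of dispersion X^2 = sum (v_i - mean)^2 / mean, with n - 1 degrees of freedom."""
    n = len(counts)
    mean = sum(counts) / n
    x2 = sum((v - mean) ** 2 for v in counts) / mean
    return x2, n - 1

# Hypothetical counts from six equal observation periods.
x2, df = poisson_dispersion([2, 4, 3, 1, 5, 3])
print(round(x2, 4), df)
```

Because the Poisson variance equals the mean, X^{2}/(n − 1) is the ratio of the sample variance to the sample mean; values far from 1 in either direction point away from the Poisson hypothesis.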
The corresponding test for the binomial can be expressed similarly, but it is also useful to set out the n observations in a 2 × c contingency table such as Table 5.
Table 5 — Arrangement of data to test for binomial  

            Sample 1        Sample 2        ···   Sample c
Successes   x_{1}           x_{2}           ···   x_{c}
Failures    n_{1} − x_{1}   n_{2} − x_{2}   ···   n_{c} − x_{c}
Total       n_{1}           n_{2}           ···   n_{c}
The variance test is equivalent to the chisquare test of homogeneity in this 2 × c array, which has, of course, c — 1 degrees of freedom.
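The equivalence between the binomial variance test and the chisquare test of homogeneity can be checked numerically. The sketch below computes the Pearson statistic cell by cell for a 2 × c array laid out as in Table 5, and separately the dispersion form Σ n_{i}(x_{i}/n_{i} − p)^{2} / [p(1 − p)], where p is the pooled proportion of successes. The function names and the data are hypothetical; the dispersion form written here is the usual one and is an assumption about what the article means by "the variance test."

```python
def homogeneity_chisquare(successes, totals):
    """Pearson X^2 for the 2 x c table of successes and failures."""
    p = sum(successes) / sum(totals)  # pooled proportion of successes
    if not 0 < p < 1:
        raise ValueError("pooled proportion must lie strictly between 0 and 1")
    x2 = 0.0
    for x, m in zip(successes, totals):
        # Each sample contributes a success cell and a failure cell.
        for observed, expected in ((x, m * p), (m - x, m * (1 - p))):
            x2 += (observed - expected) ** 2 / expected
    return x2, len(totals) - 1


def binomial_dispersion_index(successes, totals):
    """Variance-test form: sum n_i (x_i/n_i - p)^2 / (p (1 - p))."""
    p = sum(successes) / sum(totals)
    return sum(m * (x / m - p) ** 2 for x, m in zip(successes, totals)) / (p * (1 - p))


# Hypothetical data: c = 4 samples of unequal size.
successes = [12, 9, 15, 8]
totals = [20, 20, 25, 15]
x2_table, df = homogeneity_chisquare(successes, totals)
x2_variance = binomial_dispersion_index(successes, totals)
print(f"table form: {x2_table:.4f}, variance form: {x2_variance:.4f}, df = {df}")
```

The two forms agree because, within each sample, the success and failure terms combine into a single term with denominator n_{i} p(1 − p).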
Whereas in general the chisquare test is a one-tailed test, that is, the null hypothesis is rejected for large values of the statistic, the dispersion tests are often two-tailed tests, not necessarily with equal probability in the tails. The reason for this is that a too small value of X^{2} reflects a pattern that is more regular than that expected by chance, and such patterns may correspond to important alternatives to the null hypothesis of homogeneous randomness.
Contingency table association measures
If in the double-dichotomy model the hypothesis of independence is rejected, it is logical to seek a measure of association between the classifications. Distinction must be made between purely descriptive measures and sampling estimators of such measures. [See Statistics, descriptive, article on association.]
A large number of such measures have been presented, usually related to the X^{2} statistic used to test the null hypothesis of independence. Goodman and Kruskal (1954–1963) have emphasized the need to choose measures of association that have contextual meaning in the light of some probability model with predictive or explanatory value. They distinguish between two cases—no ordering among the categories and directed ordering among them.
Multidimensional contingency tables
The analysis of data that have been categorized into three or more classifications not only involves a considerable increase in the variety of possibilities but also introduces some new conceptual problems. The basic test of mutual independence is, however, a straightforward extension of the two-dimensional one and a simple application of the main theorem. The test will be discussed for three classifications.
This is a test of the hypothesis that p_{ijk}, the probability of an observation falling in row i, column j, and layer k, can be factored into p_{i..} p_{.j.} p_{..k}. Under this null hypothesis the estimated expected value in cell ijk is n^{−2}(v_{i..})(v_{.j.})(v_{..k}), where the dots indicate summation over the corresponding subscripts of the observed counts, v_{ijk}. The X^{2} statistic has the usual form, the sum over all cells of (observed − expected)^{2}/expected, and has rcl − r − c − l + 2 degrees of freedom if there are r rows, c columns, and l layers.
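A minimal Python sketch of this test follows. It forms the three sets of one-way marginal totals, builds the estimated expected count for each cell as the product of its three margins divided by n^{2}, and accumulates the usual X^{2}. The function name is hypothetical. The counts are those of Table 6 later in this article, used here only as a convenient 2 × 2 × 2 example; the article itself does not report this test for those data.

```python
def mutual_independence_chisquare(v):
    """X^2 and degrees of freedom for mutual independence in an r x c x l table.

    v[i][j][k] is the observed count in row i, column j, layer k.
    """
    r, c, l = len(v), len(v[0]), len(v[0][0])
    cells = [(i, j, k) for i in range(r) for j in range(c) for k in range(l)]
    n = sum(v[i][j][k] for i, j, k in cells)

    # One-way margins: sum over the two subscripts replaced by dots.
    row = [sum(v[i][j][k] for j in range(c) for k in range(l)) for i in range(r)]
    col = [sum(v[i][j][k] for i in range(r) for k in range(l)) for j in range(c)]
    lay = [sum(v[i][j][k] for i in range(r) for j in range(c)) for k in range(l)]

    x2 = 0.0
    for i, j, k in cells:
        expected = row[i] * col[j] * lay[k] / n ** 2
        x2 += (v[i][j][k] - expected) ** 2 / expected

    df = r * c * l - r - c - l + 2
    return x2, df


# Counts from Table 6: rows = sex (male, female), columns = age (young, old),
# layers = (urban, rural).
table = [
    [[20, 34], [14, 11]],   # male:   young (urban, rural), old (urban, rural)
    [[42, 36], [24, 19]],   # female: young (urban, rural), old (urban, rural)
]
x2, df = mutual_independence_chisquare(table)
print(f"X^2 = {x2:.3f} on {df} degrees of freedom")
```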
Tests for partial independence, for example that p_{ijk} = (p_{i..})(p_{.jk}), or for homogeneity (between layers, for example) may be derived similarly. New concepts and new tests are introduced by the idea of interaction between the different classifications.
In linear models, interactions are measures of nonadditivity of the effects due to different classifications. With contingency models several definitions of interaction have been given; the present treatment follows Goodman (1964b). Consider, for example, samples drawn from rural and urban populations and classified by sex and age, with age treated dichotomously (see Table 6).
Table 6 — Classification of rural and urban samples by sex and age  

          URBAN                     RURAL
          Young   Old   Totals      Young   Old   Totals
Male      20      14    34          34      11    45
Female    42      24    66          36      19    55
Totals    62      38    100         70      30    100
For the urban group there is a sex ratio 20/42 among the “young” and 14/24 among the “old.” The ratio of these may be regarded as a measure of the interaction of age and sex in the urban population. Similarly the same ratio of sex ratios is a measure of the interaction in the rural population. These are, of course, sample values; and the population interactions must be defined in terms of the probabilities p_{ijk}. It is useful to define
Δ_{k} = p_{11k} p_{22k} / (p_{12k} p_{21k})
and to write the three-factor no interaction hypothesis as Δ_{1} = Δ_{2}, for this 2 × 2 × 2 contingency table. The maximum likelihood estimator of Δ_{k} is d_{k} = v_{11k}v_{22k}/(v_{12k}v_{21k}), and its variance can be estimated consistently by
s_{k}^{2} = d_{k}^{2} u_{k},
where
u_{k} = 1/v_{11k} + 1/v_{12k} + 1/v_{21k} + 1/v_{22k}.
A simple statistic to test the hypothesis Δ_{1} = Δ_{2} is
X^{2} = (d_{1} − d_{2})^{2} / (s_{1}^{2} + s_{2}^{2}),
which, if the null hypothesis is true, has a large sample chisquare distribution with one degree of freedom.
For the data given d_{1} = 0.816, d_{2} = 1.631, s_{1}^{2} = 0.125, s_{2}^{2} = 0.534, and X^{2} = 1.01, so that the three-factor no interaction hypothesis is not rejected at the usual significance levels. Goodman has extended this test in an obvious way to the 2 × 2 × l contingency table. Here the test statistic is
X^{2} = Σ_{k=1}^{l} (d_{k} − d̄)^{2} / s_{k}^{2},
where d̄ = [Σ_{k} d_{k}/s_{k}^{2}] / [Σ_{k} 1/s_{k}^{2}] is the weighted mean of the d_{k}, and which has l − 1 degrees of freedom. The extension to r × c × l tables is based on logarithms of frequencies rather than the actual frequencies. Goodman also provides confidence intervals for the interactions Δ_{k} and indicates a number of equivalent tests. A bibliography of the very extensive literature on this topic is given in Goodman's paper (1964b).
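The figures quoted for Table 6 can be reproduced with a short Python sketch. It assumes that the interaction in layer k is estimated by the cross-product ratio d_{k} = v_{11k}v_{22k}/(v_{12k}v_{21k}), that its variance is estimated by d_{k}^{2} times the sum of the reciprocals of the four cell counts, and that the test statistic is the weighted sum of squared deviations of the d_{k} from their weighted mean, with weights equal to the reciprocal variances. These forms are a reconstruction rather than a quotation from Goodman (1964b); they are adopted here because they return the values of d_{1}, d_{2}, and X^{2} given in the text. Function and variable names are hypothetical.

```python
def interaction_estimates(layers):
    """Return a list of (d_k, s2_k) for a list of 2 x 2 layers.

    Each layer is [[v11, v12], [v21, v22]].
    """
    estimates = []
    for (v11, v12), (v21, v22) in layers:
        d = (v11 * v22) / (v12 * v21)              # cross-product ratio
        u = 1 / v11 + 1 / v12 + 1 / v21 + 1 / v22  # sum of reciprocal counts
        estimates.append((d, d * d * u))           # assumed variance estimate
    return estimates


def no_interaction_chisquare(layers):
    """Weighted X^2 for equality of the layer interactions, df = l - 1."""
    estimates = interaction_estimates(layers)
    weights = [1 / s2 for _, s2 in estimates]
    d_bar = sum(w * d for w, (d, _) in zip(weights, estimates)) / sum(weights)
    x2 = sum(w * (d - d_bar) ** 2 for w, (d, _) in zip(weights, estimates))
    return x2, len(layers) - 1


# Table 6: rows = (male, female), columns = (young, old).
urban = [[20, 14], [42, 24]]
rural = [[34, 11], [36, 19]]

(d1, s2_1), (d2, s2_2) = interaction_estimates([urban, rural])
x2, df = no_interaction_chisquare([urban, rural])
print(f"d1 = {d1:.3f}, d2 = {d2:.3f}")
print(f"s1^2 = {s2_1:.3f}, s2^2 = {s2_2:.3f}")
print(f"X^2 = {x2:.2f} on {df} degree(s) of freedom")
```

With only two layers the weighted statistic reduces algebraically to (d_{1} − d_{2})^{2}/(s_{1}^{2} + s_{2}^{2}), so the same function serves for the 2 × 2 × 2 case and for more layers.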
Alternatives to chisquare
While the chisquare tests are classical for the analysis of counted data, with the original simple tests going back to Karl Pearson, they are not the likelihood ratio tests. The latter are based upon statistics of the form
λ = Π_{i} (n p_{i} / v_{i})^{v_{i}},
where v_{i} is the observed count in category i and n p_{i} is its expected value under the null hypothesis, in the general case with one sequence of trials. It is easy to show, by expanding minus twice the logarithm of the likelihood ratio statistic in a power series, that its leading term is X^{2}, and that further terms are of a smaller order than the leading term so that the tests are equivalent in the limit. However, they are not equivalent for small samples.
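The small-sample difference between the two statistics can be seen in the example from the binomial section of this article: 16 responses of type A among 20 animals, tested against the hypothesis that the two responses are equally likely. The sketch below computes Pearson's X^{2} and minus twice the logarithm of the likelihood ratio, written in its usual form 2 Σ v_{i} log(v_{i}/e_{i}) with e_{i} the expected count. The function names are hypothetical, and the form of the likelihood ratio statistic is the standard multinomial one, assumed to match what the article intends.

```python
import math


def pearson_x2(observed, expected):
    """Pearson's X^2: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def likelihood_ratio_statistic(observed, expected):
    """Minus twice the log likelihood ratio: 2 * sum(o * log(o / e)).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


observed = [16, 4]        # type A responses, other responses
expected = [10.0, 10.0]   # equally likely responses among 20 animals

x2 = pearson_x2(observed, expected)
g2 = likelihood_ratio_statistic(observed, expected)
print(f"Pearson X^2 = {x2:.3f}, likelihood ratio statistic = {g2:.3f}")
```

Here X^{2} = 7.2 while the likelihood ratio statistic is about 7.71; both are referred to chisquare with one degree of freedom, and the gap between them shrinks as the sample grows with the proportions held near the null values.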
Another test for contingency tables (comparative trials model) is that of C. A. B. Smith (1951). Further work appears to be necessary before any of these alternatives is accepted as preferable to the chisquare tests. The most widely used alternative analysis is that indicated in the next section.
ANOVA of transformed counted data
Contingency tables have an obvious analogy to similar arrays of measured data that are often treated by analysis of variance techniques (ANOVA). The analysis of variance models are more satisfactory if the data are such that (1) effects are additive, (2) error variability is constant, (3) the error distribution is symmetrical and nearly normal, and (4) the errors are statistically independent. [See Linear hypotheses, article on analysis of variance.]
Counted data that arise from a binomial or multinomial model fail most obviously on the second property since the variances of such data vary with the mean. However, some function or transformation of the observations may have approximately constant variance. Transformations that have been derived for counted data to make variances nearly constant have been found empirically often to improve the degree of approximation to which properties (1) and (3) hold also. These transformations include:
(a) Arc sine transformation for proportions, y = arcsin √(v/n), which is applicable to dichotomous data with an equal number of trials (n) in each sequence. If the number of trials (n_{i}) varies from sequence to sequence the problem is more complicated (see Cochran 1943).
(b) Square root transformation, y = √v, for Poisson data.
(c) Logarithmic transformation, y = log (v + 1), for data such that the standard deviation is proportional to the mean.
Use of the arc sine transformation, and subsequent analysis, is facilitated by the use of binomial probability paper; graphic techniques are simple and usually adequate. The basic reference for such procedures is Mosteller and Tukey (1949).
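The effect of a variance-stabilizing transformation can be illustrated without simulation by computing exact variances under the binomial distribution. The sketch below defines the three transformations in their simplest forms (arc sine of the square root of the proportion, square root of the count, and logarithm of the count plus one) and compares the variance of the raw proportion with that of its arc sine transform for several values of p. The function names are hypothetical, the natural logarithm is an assumption (the base only rescales the result), and the value 1/(4n) used for comparison is the usual large-sample variance of the arc sine transform measured in radians.

```python
import math


def arcsine_transform(v, n):
    """(a) y = arcsin(sqrt(v / n)), in radians, for v successes in n trials."""
    return math.asin(math.sqrt(v / n))


def square_root_transform(v):
    """(b) y = sqrt(v), for a Poisson count v."""
    return math.sqrt(v)


def log_transform(v):
    """(c) y = log(v + 1); natural logarithm assumed."""
    return math.log(v + 1)


def binomial_variance(n, p, transform):
    """Exact variance of transform(v, n) when v is binomial(n, p)."""
    probs = [math.comb(n, v) * p ** v * (1 - p) ** (n - v) for v in range(n + 1)]
    values = [transform(v, n) for v in range(n + 1)]
    mean = sum(pr * y for pr, y in zip(probs, values))
    return sum(pr * (y - mean) ** 2 for pr, y in zip(probs, values))


n = 50
print(f"large-sample value 1/(4n) = {1 / (4 * n):.5f}")
for p in (0.3, 0.5, 0.7):
    raw = binomial_variance(n, p, lambda v, m: v / m)
    stabilized = binomial_variance(n, p, arcsine_transform)
    print(f"p = {p}: raw variance = {raw:.5f}, arc sine variance = {stabilized:.5f}")
```

The raw variance p(1 − p)/n changes with p, whereas the variance on the arc sine scale stays close to 1/(4n) across the three values.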
Refinements of these transformations and the choice among them are given a thorough treatment by Tukey (1957). If a suitable transformation has been made, the whole battery of tests that have been developed in analysis of variance (including covariance techniques) is applicable. Estimation problems may be more subtle; in some situations estimates may be given in the transformed variable but in others it may be desirable to transform back to the original variable. [See Statistical analysis, special problems of, article on transformations of data.]
Precautions in the analysis of counted data
The transformations discussed above were derived to apply to data that conform to such models as the binomial or Poisson and that could be analyzed by chisquare methods. However, counted data often arise from models that do not conform to the basic assumption; in particular independence may be lacking, so that the chisquare tests are not valid. Such data are often transformed and treated by analysis of variance procedures; the justification for this is largely empirical. Examples of situations where this is necessary are experimental responses of animals in a group where dependence may be present, eye estimates of the numbers in a group of people, and comparisons of proportions in heterogeneous and unequal-sized groups. In such situations care is necessary both in selecting a transformation that achieves the properties listed above and in interpreting the results of the analysis.
The lack of independence and the presence of extraneous sources of variation are frequent sources of error in the analysis of counted data because the chisquare tests are invalidated by such factors. A discussion of these errors and others is found in Lewis and Burke (1949). The two careful expository papers by Cochran (1952; 1954) represent an excellent source of further reading on this topic. See also the monograph by Maxwell (1961).
Douglas G. Chapman
[See also Quantal response.]
BIBLIOGRAPHY
Armitage, P. 1955 Tests for Linear Trends in Proportions and Frequencies. Biometrics 11:375–386.
Bartholomew, D. J. 1959 A Test of Homogeneity for Ordered Alternatives. Parts 1–2. Biometrika 46:36–48, 328–335.
Bennett, B. M.; and Hsu, P. 1960 On the Power Function of the Exact Test for the 2 × 2 Contingency Table. Biometrika 47:393–398.
Clopper, C. J.; and Pearson, E. S. 1934 The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial. Biometrika 26:404–413.
Cochran, William G. 1943 Analysis of Variance for Percentages Based on Unequal Numbers. Journal of the American Statistical Association 38:287–301.
Cochran, William G. 1950 The Comparison of Percentages in Matched Samples. Biometrika 37:256–266.
Cochran, William G. 1952 The χ^{2} Test of Goodness of Fit. Annals of Mathematical Statistics 23:315–345.
Cochran, William G. 1954 Some Methods for Strengthening the Common χ^{2} Tests. Biometrics 10:417–451.
Columbia University, Statistical Research Group 1947 Techniques of Statistical Analysis for Scientific and Industrial Research and Production and Management Engineering. Edited by Churchill Eisenhart, Millard W. Hastay, and W. Allen Wallis. New York: McGraw-Hill.
Cramér, H. 1946 Mathematical Methods of Statistics. Princeton Univ. Press. → See especially Chapter 30.
Diamond, Earl L. 1963 The Limiting Power of Categorical Data Chi-square Tests Analogous to Normal Analysis of Variance. Annals of Mathematical Statistics 34:1432–1441.
Ferguson, Thomas S. 1958 A Method of Generating Best Asymptotically Normal Estimates With Application to the Estimation of Bacterial Densities. Annals of Mathematical Statistics 29:1046–1062.
Finney, David J. et al. 1963 Tables for Testing Significance in a 2 × 2 Contingency Table. Cambridge Univ. Press.
Goodman, Leo A. 1964a Simultaneous Confidence Intervals for Contrasts Among Multinomial Populations. Annals of Mathematical Statistics 35:716–725.
Goodman, Leo A. 1964b Simple Methods for Analyzing Three-factor Interaction in Contingency Tables. Journal of the American Statistical Association 59:319–352.
Goodman, Leo A.; and Kruskal, William H. 1954–1963 Measures of Association for Cross-classifications. Parts 1–3. Journal of the American Statistical Association 49:732–764; 54:123–163; 58:310–364.
Harkness, W. L.; and Katz, Leo 1964 Comparison of the Power Functions for the Test of Independence in 2 × 2 Contingency Tables. Annals of Mathematical Statistics 35:1115–1127.
Harvard University, Computation Laboratory 1955 Tables of the Cumulative Binomial Probability Distribution. Cambridge, Mass.: Harvard Univ. Press.
Kimball, A. W. 1954 Short Cut Formulas for the Exact Partition of χ^{2} in Contingency Tables. Biometrics 10:452–458.
Lewis, D.; and Burke, C. J. 1949 The Use and Misuse of the Chi-square Test. Psychological Bulletin 46:433–489. → Discussion of the article may be found in subsequent issues of this bulletin: 47:331–337, 338–340, 341–346, 347–355; 48:81–82.
Maxwell, Albert E. 1961 Analyzing Qualitative Data. New York: Wiley.
Mitra, Sujit Kumar 1958 On the Limiting Power Function of the Frequency Chi-square Test. Annals of Mathematical Statistics 29:1221–1233.
Mosteller, Frederick; and Tukey, John W. 1949 The Uses and Usefulness of Binomial Probability Paper. Journal of the American Statistical Association 44:174–212.
Neyman, Jerzy 1949 Contribution to the Theory of the χ^{2} Test. Pages 239–273 in Berkeley Symposium on Mathematical Statistics and Probability, Proceedings. Edited by Jerzy Neyman. Berkeley: Univ. of California Press.
Owen, Donald B. 1962 Handbook of Statistical Tables. Reading, Mass.: Addison-Wesley. A list of addenda and errata is available from the author.
Smith, C. A. B. 1951 A Test for Heterogeneity of Proportions. Annals of Eugenics 16:15–25.
Tukey, John W. 1957 On the Comparative Anatomy of Transformations. Annals of Mathematical Statistics 28:602–632.
U.S. National Bureau of Standards 1950 Tables of the Binomial Probability Distribution. Applied Mathematics Series, No. 6. Washington: Government Printing Office.
"Counted Data." International Encyclopedia of the Social Sciences. 1968. Encyclopedia.com. (May 28, 2016). http://www.encyclopedia.com/doc/1G23045000258.html