Analysis of Variance

views updated May 29 2018

ANALYSIS OF VARIANCE

Analysis of variance (ANOVA) is a statistical technique for evaluating whether the average value, or mean, of a response differs across several population groups. In this model the response variable is continuous, whereas the predictor variables are categorical. For example, in a clinical trial of hypertensive patients, ANOVA methods could be used to compare the effectiveness of three different drugs in lowering blood pressure. Alternatively, ANOVA could be used to determine whether infant birth weight differs significantly between mothers who smoked during pregnancy and those who did not. In the simplest case, where two population means are being compared, ANOVA is equivalent to the independent two-sample t-test with a pooled variance estimate: the F statistic equals the square of the t statistic.
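The two-group equivalence mentioned above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the birth-weight figures are hypothetical and invented for illustration, and the t statistic is the pooled (equal-variance) form.

```python
from statistics import mean


def pooled_t(a, b):
    """Independent two-sample t statistic using a pooled variance estimate."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical birth weights in kg (illustrative values, not real data).
smokers = [2.9, 3.1, 2.7, 3.0, 2.8]
non_smokers = [3.4, 3.2, 3.6, 3.3, 3.5, 3.1]

t = pooled_t(smokers, non_smokers)
f = one_way_f([smokers, non_smokers])
print(f"t squared = {t * t:.6f}, F = {f:.6f}")  # the two values agree
```

The agreement holds only for the pooled-variance t-test; an unequal-variance (Welch) t-test would not in general reproduce the ANOVA F statistic.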

One-way ANOVA evaluates the effect of a single factor on a single response variable. For example, a clinician may be interested in determining whether the mean age of patients differs between two study groups. Using ANOVA to make this comparison requires that several assumptions be satisfied. Specifically, the patients must be selected randomly from each of the population groups, a value for the response variable must be recorded for each sampled patient, the response variable must be normally distributed in each population, and the variance of the response variable must be the same in each population. In the above example, age represents the response variable, while the study group represents the independent variable, or factor, of interest.

As its name indicates, ANOVA compares means by using estimates of variance. Specifically, the sampled observations can be described in terms of the variation of the individual values around their group means, and of the variation of the group means around the overall mean. These measures are frequently referred to as sources of "within-groups" and "between-groups" variability, respectively. If the variability within the k different populations is small relative to the variability between the group means, this suggests that the population means are different. This is formally tested using a test of significance based on the F distribution, which tests the null hypothesis (H₀) that the means of the k groups are equal:

H₀: μ₁ = μ₂ = μ₃ = … = μ_k
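The "within-groups" and "between-groups" sources of variability can be written as a partition of the total sum of squares. The notation here is an assumption introduced for illustration: y_ij is the j-th observation in group i, n_i is the number of observations in group i, ȳ_i is the mean of group i, and ȳ is the overall mean.

```latex
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2}_{\text{total}}
  \;=\;
\underbrace{\sum_{i=1}^{k} n_i\,(\bar{y}_i - \bar{y})^2}_{\text{between-groups}}
  \;+\;
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}_{\text{within-groups}}
```

The cross-product term vanishes because the deviations of the observations from their own group mean sum to zero within every group.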

An F-test is constructed by taking the ratio of the "between-groups" mean square to the "within-groups" mean square, where each mean square is a sum of squares divided by its degrees of freedom. If n represents the total number of sampled observations, then under the null hypothesis this ratio has an F distribution with k-1 and n-k degrees of freedom in the numerator and denominator, respectively. Under the null hypothesis, the "within-groups" and "between-groups" mean squares both estimate the same underlying population variance, and the F ratio is expected to be close to one. If the between-groups variance is much larger than the within-groups variance, the F ratio becomes large and the associated p-value becomes small. This leads to rejection of the null hypothesis and the conclusion that the means of the groups are not all equal. When interpreting the results of an ANOVA it is helpful to report the strength of the observed association, since with a very large sample even small differences between means can be statistically significant.
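The construction of the F ratio can be sketched in a few lines of standard-library Python. The function name, the returned keys, and the blood-pressure figures are hypothetical, chosen only so that the arithmetic is easy to follow. The sketch also reports eta squared (the between-groups share of the total sum of squares), which is one common way, assumed here, of expressing the strength of association; computing a p-value would require the F distribution and is left out.

```python
from statistics import mean


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, F ratio and eta squared."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the overall mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df": (df_between, df_within),
        "F": f_ratio,
        "eta_squared": eta_squared,
    }


# Hypothetical reductions in systolic blood pressure (mmHg) for three drugs.
drug_a = [10, 12, 14]
drug_b = [15, 17, 19]
drug_c = [20, 22, 24]

result = one_way_anova([drug_a, drug_b, drug_c])
print(result)  # F is 18.75 on (2, 6) degrees of freedom for these numbers
```

With these values the between-groups sum of squares is 150 and the within-groups sum of squares is 24, so the mean squares are 75 and 4 and their ratio is 18.75.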

Multi-way analysis of variance (for example, two-way or factorial ANOVA) is an extension of the one-way model that allows for the inclusion of additional independent nominal variables. It should not be confused with multivariate analysis of variance (MANOVA), which handles several response variables at once. In some analyses, researchers may wish to adjust for group differences in a variable that is continuous in nature. For instance, when evaluating the effectiveness of the antihypertensive agents administered to the three groups in the example above, we may wish to control for group differences in the age of the patients. The addition of a continuous variable to an existing ANOVA model is referred to as analysis of covariance (ANCOVA).
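As a rough illustration of what adjusting for a continuous variable means, the sketch below computes covariate-adjusted group means using a single pooled within-group regression slope. This is a simplified sketch under stated assumptions, not a full ANCOVA: it assumes the slope of the response on age is the same in every group, it performs no significance test, and the function name and the (age, reduction) pairs are hypothetical.

```python
from statistics import mean


def adjusted_means(groups):
    """groups maps a group name to a list of (covariate, response) pairs.

    Returns the pooled within-group slope and the group means of the
    response adjusted to the overall mean of the covariate.
    """
    grand_x = mean([x for pairs in groups.values() for x, _ in pairs])
    sxy = sxx = 0.0
    summaries = {}
    for name, pairs in groups.items():
        mx = mean([x for x, _ in pairs])
        my = mean([y for _, y in pairs])
        # Accumulate within-group cross-products and squared deviations.
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
        summaries[name] = (mx, my)
    slope = sxy / sxx  # assumed common to all groups
    adjusted = {
        name: my - slope * (mx - grand_x)
        for name, (mx, my) in summaries.items()
    }
    return slope, adjusted


# Hypothetical (age in years, blood pressure reduction in mmHg) pairs.
trial = {
    "drug_a": [(40, 14), (50, 12), (60, 10)],
    "drug_b": [(50, 16), (60, 14), (70, 12)],
    "drug_c": [(60, 18), (70, 16), (80, 14)],
}

slope, adjusted = adjusted_means(trial)
print(slope, adjusted)
```

In this invented data set the unadjusted mean reductions are 12, 14, and 16, but the groups also differ in mean age (50, 60, and 70). After adjusting to the overall mean age of 60 with the pooled slope of -0.2, the means become 10, 14, and 18, so the age imbalance was masking part of the difference between drugs.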

In public health, agriculture, engineering, and other disciplines, there are numerous study designs for which ANOVA procedures can be used to describe the collected data. Subtle differences in these study designs require different analytic strategies. For example, selecting an appropriate ANOVA model depends on whether repeated measurements were taken on the same patient, whether the same number of samples was taken in each population, and whether the independent variables are treated as fixed or random effects. A description of these caveats is beyond the scope of this encyclopedia, and the reader is referred to the bibliography for more comprehensive coverage of this material. Among the more commonly used ANOVA models are the randomized block, split-plot, and factorial designs.

Paul J. Villeneuve

(see also: Epidemiology; Statistics for Public Health)

Bibliography

Cochran, W. G., and Cox, G. M. (1957). Experimental Design, 2nd edition. New York: Wiley.

Cox, D. R. (1966). Planning of Experiments. New York: Wiley.

Kleinbaum, D. G.; Kupper, L. L.; and Muller, K. E. (1987). Applied Regression Analysis and Other Multivariate Methods, 2nd edition. Boston: PWS-Kent Publishing Company.

Snedecor, G. W., and Cochran, W. G. (1989). Statistical Methods, 8th edition. Ames, IA: Iowa State University Press.

Encyclopedia of Public Health Villeneuve, Paul J.

analysis of variance

views updated May 14 2018

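analysis of variance (ANOVA) A technique, originally developed by R. A. Fisher, whereby the total variation in a vector of numbers y₁ … y_n, expressed as the sum of squares about the mean

$$\sum_{i=1}^{n} (y_i - \bar{y})^2$$

is split up into component sums of squares ascribable to the effects of various classifying factors that may index the data. Thus if the data consist of a two-way m×n array of values y_ij, classified by factors A and B and indexed by i = 1,…,m and j = 1,…,n, then the analysis of variance gives the identity

$$\sum_{i=1}^{m}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = n\sum_{i=1}^{m} (\bar{y}_{i.} - \bar{y}_{..})^2 + m\sum_{j=1}^{n} (\bar{y}_{.j} - \bar{y}_{..})^2 + \sum_{i=1}^{m}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2$$

where dots denote averaging over the suffixes involved.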

Geometrically, the analysis of variance becomes the successive projections of the vector y, considered as a point in n-dimensional space, onto orthogonal hyperplanes within that space. The dimensions of the hyperplanes give the degrees of freedom for each term; in the above example these are mn–1 ≡ (m–1) + (n–1) + (m–1)(n–1).
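The two-way split of the total sum of squares, and the matching split of the degrees of freedom, can be verified numerically. The sketch below assumes one observation per cell of the m×n array and uses only the standard library; the function name and the table values are hypothetical. It checks that the total sum of squares equals the factor A term plus the factor B term plus the residual term.

```python
def two_way_decomposition(table):
    """Split the total sum of squares of an m x n table (one value per cell)."""
    m, n = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (m * n)
    row_means = [sum(row) / n for row in table]                # average over j
    col_means = [sum(table[i][j] for i in range(m)) / m for j in range(n)]

    ss_total = sum(
        (table[i][j] - grand) ** 2 for i in range(m) for j in range(n)
    )
    ss_a = n * sum((r - grand) ** 2 for r in row_means)        # factor A (rows)
    ss_b = m * sum((c - grand) ** 2 for c in col_means)        # factor B (columns)
    ss_resid = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(m)
        for j in range(n)
    )
    df = {
        "total": m * n - 1,
        "A": m - 1,
        "B": n - 1,
        "residual": (m - 1) * (n - 1),
    }
    return ss_total, ss_a, ss_b, ss_resid, df


# Hypothetical 3 x 4 table: rows are levels of A, columns are levels of B.
data = [
    [4, 6, 5, 9],
    [7, 8, 6, 11],
    [3, 5, 4, 7],
]

ss_total, ss_a, ss_b, ss_resid, df = two_way_decomposition(data)
print(ss_total, ss_a + ss_b + ss_resid)   # equal up to rounding error
print(df)                                 # 11 = 2 + 3 + 6
```

Dividing each component by its degrees of freedom gives the corresponding mean square.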

A statistical model applied to the data allows mean squares, equal to (sum of squares)/(degrees of freedom), to be compared with an error mean square that measures the background “noise”. Large mean squares indicate large effects of the factors concerned. The above processes can be much elaborated (see experimental design, regression analysis).