Variability

Variation is one of the most important concepts in quantitative social science research. Associated terms include statistical variation, statistical variability, and dispersion. A dataset containing identical measurements has no variation, whereas a set containing widely dispersed measurements has high variation. There are several different summary statistics that describe, using a single value, the magnitude of the variation (variability) in a set of measurements. In other words, the summary statistics provide a synopsis of the degree to which a set of measurements lacks uniformity.

Among the most popular measures of variation are the range, the mean absolute deviation, the variance, and the standard deviation. Scholars use one or more of these measures to summarize parsimoniously how far a dataset departs from uniformity. Presenting the set in its entirety typically would require too much space and would be inconvenient to examine, particularly when the dataset is large. The disadvantage of using summary statistics rather than presenting the entire set is the loss of information. The loss occurs because a summary statistic can take the same value for datasets that contain different numbers. Each summary method attempts to convey important information about the degree to which the set lacks uniformity while minimizing the amount of information lost by not presenting the entire set of measurements. In general, the measures of variability that convey more information are more difficult to interpret for those without statistical training.
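The loss of information can be illustrated with a short Python sketch. The second dataset below is invented for illustration and does not come from the discussion above; it was chosen so that its range and mean match those of the first set even though the numbers differ.

```python
import statistics

# Two different datasets; set_b is a hypothetical example chosen to
# share its range and mean with set_a.
set_a = [1, 2, 8, 12, 15]
set_b = [1, 5, 8, 9, 15]

range_a = max(set_a) - min(set_a)
range_b = max(set_b) - min(set_b)
mean_a = sum(set_a) / len(set_a)
mean_b = sum(set_b) / len(set_b)

# The range and the mean cannot tell the two sets apart.
print(range_a, range_b)   # 14 14
print(mean_a, mean_b)     # 7.6 7.6

# A measure that uses every value, such as the standard deviation, can.
sd_a = statistics.pstdev(set_a)
sd_b = statistics.pstdev(set_b)
print(sd_a > sd_b)        # True
```

A reader given only the range and the mean could not tell which of the two sets had been measured.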

The measure of variability that most people consider the easiest to interpret is the range, which textbooks often define as the highest number in the set minus the lowest number. More commonly, scholars describe the range as the spread of numbers between the lowest and highest values. The set of numbers {1, 2, 8, 12, 15}, for example, has a range of 14, although many scholars would report the range as 1 through 15. Although the range is easy to understand, it is determined only by the two extreme values in each dataset, one or both of which may simply be errors in data entry. For this reason, the range contains less information than alternative variability measures.
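As a minimal sketch, both ways of reporting the range can be computed in Python for the example set. The variable names are illustrative only.

```python
values = [1, 2, 8, 12, 15]

lowest = min(values)
highest = max(values)

# Textbook definition: highest value minus lowest value.
value_range = highest - lowest
print(value_range)                    # 14

# Common reporting style: the spread from lowest to highest.
print(f"{lowest} through {highest}")  # 1 through 15
```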

Other methods that summarize the degree of variation in a set of measurements provide more information than the range but are more difficult to interpret. These alternatives describe, in slightly different ways, how far each number in the set is from the mean of the measurements. The mean absolute deviation (or average absolute deviation) is one such summary statistic. After the range, it is the easiest to calculate and understand. The first step in calculating the mean absolute deviation is to find each number's deviation from the mean, that is, the difference between the number and the set's mean. As the name implies, the absolute values of these deviations are then summed, and the sum is divided by the number of measurements to produce an average. In the example set {1, 2, 8, 12, 15}, the mean is 7.6 and the mean absolute deviation is 4.88.
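The calculation can be sketched step by step in Python. This is only an illustration of the procedure described above, with variable names chosen for readability.

```python
values = [1, 2, 8, 12, 15]

# Step 1: the mean of the set.
mean = sum(values) / len(values)

# Step 2: the absolute deviation of each value from the mean.
abs_deviations = [abs(x - mean) for x in values]

# Step 3: average the absolute deviations.
mean_abs_deviation = sum(abs_deviations) / len(values)

print(round(mean, 2))                # 7.6
print(round(mean_abs_deviation, 2))  # 4.88
```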

The variance is a slightly more complex summary of variability. The first step in calculating the variance is the same as for the mean absolute deviation: determine the distance of each measurement in the set from the set's mean. Each deviation is then squared, and the squared deviations are averaged. The variance of the example number set {1, 2, 8, 12, 15} is 29.8. Compared to the mean absolute deviation, the variance places more weight on values farther from the mean, yet it is not determined solely by the two extreme values, as the range is. Scholars typically modify the variance formula when attempting to estimate the variance of a population of values based upon the variability of a sample of values drawn from the population. Instead of dividing the sum of squared deviations by the number of values in the sample, the sum is divided by the number of values minus one. This alteration provides a better estimate of the population's variance, because the variation observed in a sample of measurements tends to be less than the variation in the population from which the sample is drawn. For a large set of measurements, however, this added complexity makes little difference. Using this modified formula, the variance of the example number set is 37.3.
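Both versions of the variance can be reproduced with a short Python sketch. The manual calculation follows the steps described above, and the functions `pvariance` and `variance` from Python's standard `statistics` module serve as a cross-check.

```python
import statistics

values = [1, 2, 8, 12, 15]
n = len(values)
mean = sum(values) / n

# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in values)

# Dividing by n gives the variance of the set itself.
variance_n = sum_sq_dev / n
# Dividing by n - 1 gives the modified (sample) estimate.
variance_n_minus_1 = sum_sq_dev / (n - 1)

print(round(variance_n, 1))          # 29.8
print(round(variance_n_minus_1, 1))  # 37.3

# Cross-check against the standard library.
print(round(statistics.pvariance(values), 1))  # 29.8
print(round(statistics.variance(values), 1))   # 37.3
```

With only five values the two formulas differ noticeably; the gap shrinks as the number of measurements grows, because dividing by n and by n - 1 becomes nearly the same operation.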

The most commonly used measure of variability in scholarly writing is the standard deviation, which is simply the square root of the variance. The standard deviation is typically denoted by the Greek letter sigma (σ) or the abbreviation SD. The standard deviation of the example set {1, 2, 8, 12, 15} is approximately 6.1 (the square root of 37.3) when the modified formula noted above is used.
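As a brief sketch, the standard deviation of the example set can be obtained in Python either by taking the square root of the modified variance directly or by calling `stdev` from the standard `statistics` module, which uses the same n - 1 divisor.

```python
import math
import statistics

values = [1, 2, 8, 12, 15]

# Square root of the modified (n - 1) variance.
sd_from_variance = math.sqrt(statistics.variance(values))

# The standard library computes the same quantity directly.
sd_direct = statistics.stdev(values)

print(round(sd_from_variance, 1))  # 6.1
print(round(sd_direct, 1))         # 6.1
```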

Scholars often attempt to explain the variability of one set of measurements (the dependent variable) by considering additional measurements (the independent variables) through the use of more complex statistical methods. For example, we can explain a proportion of the variation in human height by understanding that humans differ in biological sex. To visualize this, suppose we calculated the standard deviation of height for men and for women separately: the variability within either group would be less than the overall variability in heights when sex is not taken into account. In other words, because we know that men tend to be taller than women, we can partially explain why the heights of individuals differ. If we wanted to understand more about the variability in human height, we would want to consider factors in addition to biological sex that are associated with height, such as age, parental height, ethnicity, and nutrition. When such explanatory (or independent) variables are found, we say that consideration of the independent variable(s) reduces the unexplained variation in the dependent variable.
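The idea can be sketched in Python with a small set of heights. The numbers below are invented purely for illustration and are not real measurements; with real data the share of variation explained would generally be different.

```python
import statistics

# Hypothetical heights in centimeters (made-up values for illustration).
men = [170, 175, 180, 185]
women = [155, 160, 165, 170]
everyone = men + women

# Variability within each group versus overall variability.
sd_men = statistics.pstdev(men)
sd_women = statistics.pstdev(women)
sd_overall = statistics.pstdev(everyone)
print(sd_men < sd_overall and sd_women < sd_overall)  # True

# Variance left unexplained once group membership is known:
# the size-weighted average of the within-group variances.
within = (len(men) * statistics.pvariance(men)
          + len(women) * statistics.pvariance(women)) / len(everyone)
total = statistics.pvariance(everyone)

# Share of the total variance accounted for by the grouping.
explained_share = 1 - within / total
print(round(explained_share, 2))  # 0.64
```

The grouping accounts for part of the total variance, while the variation remaining within each group is the portion it leaves unexplained.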

Scholars rarely are able to explain all of the variation in any set of measurements, particularly in the social sciences. In fact, in the social sciences it is uncommon to explain more than half of the variation in any variable of interest.

SEE ALSO Mean, The; Measurement; Mode, The; Regression Analysis; Social Science; Standard Deviation; Variance; Variation