Central Tendencies, Measures of

The mean, the median, and the mode are measures of central tendency, used singly or jointly to summarize information about a variable. Percentage frequency distributions and graphs may also be used, but measures of central tendency are more concise and provide a single “typical” or “average” score for the variable under consideration. They are easy to calculate and readily understood by the public. Each of the three measures has strengths and limitations. Used together, they also indicate how symmetric the distribution of the variable is: in a normal (i.e., Gaussian) distribution the mean, median, and mode are all the same, whereas in a skewed distribution they separate.
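
To illustrate how the three measures can signal symmetry, here is a minimal Python sketch using the standard-library `statistics` module on two small, made-up sets of scores; the helper name `summarize` is hypothetical. In the symmetric set the three measures coincide. In the right-skewed set the mean is pulled toward the long upper tail, so the mean exceeds the median, which in turn exceeds the mode. That ordering is typical of right skew but is not guaranteed for every skewed distribution.

```python
from statistics import mean, median, mode


def summarize(scores):
    """Return (mean, median, mode) for a list of numeric scores."""
    return mean(scores), median(scores), mode(scores)


symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]        # hypothetical, symmetric about 3
right_skewed = [1, 1, 1, 2, 2, 3, 4, 8, 14]    # hypothetical, long upper tail

print(summarize(symmetric))     # (3, 3, 3)
print(summarize(right_skewed))  # (4, 2, 1)
```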

The term mean usually refers to the arithmetic mean, or average: it is calculated by summing the individual scores of the variable and dividing the sum by the total number of scores. The formula for the population mean is $\mu = \sum_{i=1}^{N} X_i / N$, where $\mu$ is the population mean, $X_i$ is the score on the variable for the $i$th subject, and $N$ is the total number of subjects in the population. The formula for the sample mean is $\bar{X} = \sum_{i=1}^{n} X_i / n$, where $\bar{X}$, sometimes referred to as X-bar, is the sample mean and $n$ is the number of subjects in the sample.
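
The formula translates directly into code. The following is a minimal Python sketch; the function name `arithmetic_mean` and the grade points are hypothetical. Because the computation is the same for a population and a sample, one function serves both; only the notation differs.

```python
def arithmetic_mean(scores):
    """Sum the scores and divide by the number of scores (sum of X_i over N)."""
    if len(scores) == 0:
        raise ValueError("the mean of an empty set of scores is undefined")
    return sum(scores) / len(scores)


# Hypothetical grade points for one student's four courses.
grade_points = [4.0, 3.0, 3.0, 2.0]
print(arithmetic_mean(grade_points))  # 3.0
```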

Although the problem of examining a set of observations and estimating an overall value was entertained as early as three centuries BCE by Babylonian astronomers (Plackett 1958), the arithmetic mean as a statistical concept did not appear until many centuries later. It has been used to summarize observations of a variable since the time of Galileo (1564-1642), and the term itself is first found in the mid-1690s in the writings of Edmund Halley (1656-1742). Carl Friedrich Gauss (1777-1855) may have been the first to show that, lacking any other information about a variable’s value for any one subject, the arithmetic mean represents the most probable value (Gauss [1809] 2004, p. 244).

The mean is an efficient description of the distribution of a variable’s scores. For example, educators and students use a mean, the grade point average (GPA), to describe academic achievement. The mean has two main limitations. First, it works best when the variable it describes is normally distributed. If the distribution contains outliers (for example, one or several scores lying far outside the typical range), the mean is pulled toward them and may be misleading; because it is heavily influenced by outlying values, the mean is less robust than the median (see below). Second, the mean requires interval/ratio data, whereas the median can be used with both interval/ratio and ordinal data, and the mode can also be used with nominal data.
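
The point about levels of measurement can be sketched in Python, assuming made-up data for each level. For the ordinal case, the sketch takes the median of the categories’ ranks with `statistics.median_low`, which always returns an observed value, so the result maps back to an actual category. The helper `ordinal_median` and the five-point scale are illustrative assumptions, not a standard procedure from the source.

```python
from statistics import mean, median_low, mode

# Interval/ratio data (hypothetical temperatures in degrees Celsius):
# differences between values are meaningful, so the mean can be used.
temperatures = [18.5, 21.0, 19.5, 22.0, 21.0]

# Ordinal data (hypothetical survey answers): categories are ranked, but the
# distance between ranks is undefined, so averaging them is not meaningful.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
answers = ["agree", "neutral", "agree", "strongly agree", "disagree"]

# Nominal data (hypothetical): unordered labels, so only the mode applies.
eye_colors = ["brown", "blue", "brown", "green", "hazel"]


def ordinal_median(responses, scale):
    """Median category of ordinal responses, found via each category's rank."""
    ranks = [scale.index(r) for r in responses]
    # median_low returns an observed rank, which maps back to a real category.
    return scale[median_low(ranks)]


print(mean(temperatures))              # 20.4
print(ordinal_median(answers, SCALE))  # agree
print(mode(eye_colors))                # brown
```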

The median, sometimes abbreviated Md or Mdn, is the midpoint of a distribution of scores: half of the scores are greater than the median, and half are less. To find it, the scores are first put in order. If there is an odd number of scores, the median is the middle score; if there is an even number, the median is the average of the two middle scores. Because it is less influenced by extreme outliers, the median is usually the preferred measure of central tendency for skewed data such as income. The median has no special notation and is obtained in the same way for both a population and a sample.
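
The odd/even rule can be written out by hand as a short Python sketch. The function name `median_of` is hypothetical; the standard library’s `statistics.median` applies the same rule.

```python
def median_of(scores):
    """Median: middle score if the count is odd, mean of the two middle scores if even."""
    ordered = sorted(scores)  # the rule only works on ordered scores
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set of scores is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([7, 1, 5]))     # 5
print(median_of([7, 1, 5, 3]))  # 4.0
```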

The median describes a skewed variable better than the mean does. For example, in describing incomes in an area where one person is a billionaire and most of the population lives in poverty, the median will reflect the central income of the population more accurately. The median can also be used when some scores are undetermined or infinite, which makes it impossible to calculate a mean. The median also has limitations. It works best with small samples, or with large samples that are normally distributed. It is less efficient and more subject to sampling fluctuation than the mean. An early use in English of the statistical concept of median was Francis Galton’s (1822-1911) observation that “the median … is the value which is exceeded by one-half of an infinitely large group, and which the other half falls short of” (1881, p. 245).
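
The billionaire example can be made concrete with a small Python sketch. The ten incomes are invented for illustration: nine modest annual incomes and one of a billion dollars.

```python
from statistics import mean, median

# Hypothetical annual incomes (in dollars) for ten residents of one area.
incomes = [8_000, 9_500, 10_000, 11_000, 12_000,
           12_500, 13_000, 14_000, 15_000, 1_000_000_000]

print(mean(incomes))    # 100010500
print(median(incomes))  # 12250.0
```

The single extreme value drags the mean above $100 million, a figure no resident actually earns, while the median stays close to the incomes of the other nine residents.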

The mode is the score that occurs most frequently in a distribution. It has no special notation and is obtained in the same way for both a population and a sample. The mode has several advantages over the other measures of central tendency. First, it may be used with nominal data. Second, with discrete variables the mode is always one of the values actually observed. Third, the mode is simple to calculate and present visually. Fourth, it gives some indication of the shape of the distribution as well as a measure of central tendency. Unlike the mean and the median, the mode need not be unique: a distribution can have more than one. A single score that occurs most frequently is the modal score. If two scores occur most frequently, the distribution is referred to as bimodal. If several scores occur at the same highest frequency, the distribution is labeled multimodal. One limitation of the mode is that it uses the data inefficiently, since most of the scores play no part in determining it. Although modal descriptions are easily understood, the mode is rarely used in research except as additional information or in narrative description. An early use of the statistical concept of the mode was by English mathematician Karl Pearson (1857-1936) in 1895 when he stated that “I have found it convenient to use the term mode for the abscissa corresponding to the ordinate of maximum frequency” (1895, p. 345).
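
A minimal Python sketch of the unimodal, bimodal, and multimodal distinction, using `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the highest frequency. The helper `describe_modality` and its labels are illustrative assumptions that follow the definitions above. If every value occurs equally often, this sketch still labels the data multimodal, although some texts would say such data have no mode.

```python
from statistics import multimode


def describe_modality(values):
    """Return the mode(s) of `values` and a label for how many there are."""
    modes = multimode(values)  # all values tied for the highest frequency
    if not modes:
        raise ValueError("the mode of an empty set of scores is undefined")
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label


# Works with nominal data (labels) as well as numbers.
print(describe_modality(["red", "blue", "red", "green"]))  # (['red'], 'unimodal')
print(describe_modality([1, 1, 2, 3, 3]))                  # ([1, 3], 'bimodal')
print(describe_modality([1, 1, 2, 2, 3, 3, 4]))            # ([1, 2, 3], 'multimodal')
```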

Although all three measures of central tendency are commonly used as descriptive statistics, they share two limitations: they do not describe the variation in a population, and they do not capture the differences between cases or people. They therefore seldom serve as the sole statistical description of a variable. Fortunately, the mean also serves as the basis for other statistics that fill this gap, which matters because social research is concerned with the variation within a population as well as with its average. The mean appears in many statistical formulae. It is the basis for the statistical analysis of variation, including the standard deviation (which is built from deviations from the mean), the coefficient of determination or R², covariance, analysis of variance, and regression. The mean is also used frequently in meta-analyses.
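
The standard deviation shows how the mean underlies measures of variation. In the minimal Python sketch below (the function name and data are hypothetical), two sets of scores share a mean of 50 but differ greatly in spread, which the mean alone cannot reveal. The function computes the population standard deviation by dividing by N; the standard library’s `statistics.pstdev` gives the same result, and `statistics.stdev` gives the sample version, which divides by n - 1.

```python
from math import sqrt


def population_sd(scores):
    """Square root of the average squared deviation from the mean (divides by N)."""
    n = len(scores)
    mu = sum(scores) / n
    squared_deviations = [(x - mu) ** 2 for x in scores]
    return sqrt(sum(squared_deviations) / n)


tight = [48, 49, 50, 51, 52]   # hypothetical scores clustered near 50
spread = [10, 30, 50, 70, 90]  # hypothetical scores spread widely around 50

for scores in (tight, spread):
    mean = sum(scores) / len(scores)
    print(mean, round(population_sd(scores), 2))
# 50.0 1.41
# 50.0 28.28
```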

SEE ALSO Mean, The; Mode, The; Regression Analysis

BIBLIOGRAPHY

Galton, Francis. 1881. Range in Height, Weight, and Strength. In Report of the British Association for the Advancement of Science, 245-261. London: British Association for the Advancement of Science.

Gauss, Carl Friedrich. [1809] 2004. Theory of Motion of the Heavenly Bodies Moving about the Sun in Conic Sections: A Translation of Theoria Motus. Mineola, NY: Dover.

Gravetter, Frederick J., and Larry B. Wallnau. 2003. Statistics for the Behavioral Sciences, 6th ed. Belmont, CA: Wadsworth/Thomson.

Pearson, Karl. 1895. Contributions to the Theory of Evolution: II. Skew Variation in Homogeneous Material. Philosophical Transactions of the Royal Society of London 186: 343-414.

Plackett, R. L. 1958. Studies in the History of Probability and Statistics: VII. The Principle of the Arithmetic Mean. Biometrika 45: 130-135.

Mary Ann Davis

Dudley L. Poston Jr.

International Encyclopedia of the Social Sciences