Errors, Standard

Reports of the values of sample statistics (e.g., means or regression coefficients) are often accompanied by reports of the estimated standard error. A statistic is simply an index or description of some characteristic of the distribution of scores on a variable in a sample. The standard error associated with a statistic is an index of how much the value of that statistic might be expected to vary from one sample to another. The estimated standard error of a sample statistic thus provides information about the reliability, or likely accuracy, of the sample statistic as an estimate of the population parameter.

For example, the 1,514 adults responding to the General Social Survey (National Opinion Research Center 2006) for 1991 reported that their average age was 45.63 years. Along with this statistic, a standard error of .458 was reported. This suggests that if we were to draw another sample (of the same size, using the same methods) and to calculate the mean age again, we might expect the result to differ by about .458 years from that in our first sample.

DEFINITION

Formally, the standard error of a statistic is the standard deviation of the sampling distribution of that statistic. Most statistics texts on estimation theory contain detailed elaborations (e.g., Kmenta 1986, chapter 6). Some elaboration in less formal terms, however, may be helpful.

When we select a probability sample of cases from a population, collect data, and calculate the value of some statistic (e.g., the mean score on the variable age), we are using the sample statistic to estimate the mean of the whole population (the population parameter). If we were to draw additional samples and calculate the mean each time, we would expect the values of these sample means to vary because different cases will be included in each sample. The random variation of the value of a statistic from one sample to another is termed the sampling variability of the statistic.

Imagine that we collected the values of a statistic from a very large number of independent samples from the same population and arrayed these values in a distribution. This distribution of the values of a statistic across repeated samples is termed the sampling distribution of that statistic. The value of the statistic (e.g., average age) in any one sample will sometimes be somewhat lower than the average across all samples, and sometimes higher. We can summarize how much any one sample is likely to differ from the average of all samples by calculating the standard deviation of the sampling distribution. This value is the standard error; roughly speaking, it is the typical amount by which the value of the statistic in any one sample differs from the average value of the statistic across all possible samples.
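To make this concrete, here is a minimal simulation sketch in Python: it draws many independent samples from a synthetic population of ages and takes the standard deviation of the resulting sample means. The population values, the random seed, and the number of repeated samples are all illustrative assumptions, not data from the General Social Survey.

```python
import random
import statistics

# Approximate a sampling distribution by repeated sampling from a
# synthetic "population" of ages (hypothetical values).
random.seed(42)
population = [random.gauss(45, 17) for _ in range(100_000)]

sample_size = 1_514   # matches the GSS example above
num_samples = 2_000   # number of repeated samples (illustrative)

# The mean of each sample; together these approximate the
# sampling distribution of the mean.
sample_means = [
    statistics.mean(random.sample(population, sample_size))
    for _ in range(num_samples)
]

# The standard deviation of the sampling distribution is the standard error.
print("simulated standard error:", statistics.stdev(sample_means))
```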

USES OF THE STANDARD ERROR

The estimated standard error of a statistic has two primary uses. First, the standard error is used to construct confidence intervals; second, the standard error is used to conduct hypothesis tests for statistical significance.

In the example above, the mean age of persons in the sample was 45.63 years, and the estimated standard error of this statistic was .458 years. Most observers would conclude that our estimate of the population mean age is fairly accurate; that is, if we drew another sample, its mean would be expected to differ very little from the current one. On average, the difference between sample means will be less than half a year (.458 years).

For sufficiently large samples, the central limit theorem tells us that the sampling distribution of sample means is approximately normal (Gaussian) in shape. Because this is so, we can describe the reliability of the sample estimate with a confidence interval. In our example, the 95 percent confidence interval is the value of the sample statistic (45.63) plus or minus 1.96 times the standard error (.458), that is, from 44.73 years to 46.53 years. If we drew repeated samples and constructed an interval in this way for each one, about 95 percent of those intervals would contain the population mean age. Confidence intervals are a common way of summarizing the reliability of inferences about population parameters on the basis of sample statistics. Large standard errors indicate low reliability (wide confidence intervals); small standard errors indicate high reliability (narrow confidence intervals).
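As a quick sketch, the interval can be computed directly from the figures reported in the running example; only the two reported values and the standard normal critical value are used.

```python
# 95 percent confidence interval: sample mean +/- 1.96 * standard error.
mean_age = 45.63   # sample mean (from the text)
std_error = 0.458  # estimated standard error (from the text)
z = 1.96           # normal critical value for 95 percent confidence

lower = mean_age - z * std_error
upper = mean_age + z * std_error
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # (44.73, 46.53)
```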

Standard errors are also used in hypothesis testing to determine statistical significance. In hypothesis testing, we propose a null hypothesis about the true value of a population parameter (e.g., a mean). We then calculate a test statistic that tells us how likely the observed sample result would be if the null hypothesis were true. Test statistics usually take the general form: (value observed in sample − value proposed by the null hypothesis) / standard error. Our decision about whether a sample result is likely, assuming that the null hypothesis is true, is thus based on the size of the observed difference between the sample and the hypothesis relative to sampling variability, as summarized by the standard error.
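A minimal sketch of this calculation, using the sample mean and standard error from the running example; the null-hypothesis value of 45 years is a hypothetical assumption chosen purely for illustration.

```python
# Test statistic of the general form described above:
# (observed value - null-hypothesis value) / standard error.
observed_mean = 45.63
null_value = 45.0   # hypothesized population mean (assumed example)
std_error = 0.458

z = (observed_mean - null_value) / std_error
print(f"test statistic: {z:.2f}")  # about 1.38, below 1.96, so the
# difference is not significant at the .05 level (two-tailed)
```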

ESTIMATING STANDARD ERRORS

Most reports present the standard error as an estimate (even when they do not use this terminology, the values reported for standard errors are almost certainly estimates). When we conduct a study, we usually collect information from only one sample. To calculate the sampling distribution and standard error directly, we would need to collect all possible samples. Hence, we rarely know the actual value of the standard error; it is itself an estimate.

Statistical theory and research have provided standard formulae for estimating the standard errors of many commonly used statistics (e.g., means, proportions, and many measures of association). These formulae use the information from a single sample to estimate the standard error. For example, the common estimator of the standard error of a sample mean is the sample standard deviation divided by the square root of the sample size. When computer programs calculate and print standard errors, construct confidence intervals, and perform hypothesis tests, they usually rely on these standard formulae.
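For instance, a minimal sketch of the formula for the mean in Python; the data values are made up for illustration.

```python
import math
import statistics

# Standard formula: estimated SE of the mean = sample standard
# deviation / sqrt(sample size). Hypothetical ages for illustration.
ages = [23, 31, 45, 52, 38, 61, 47, 29, 55, 40]

n = len(ages)
sd = statistics.stdev(ages)   # sample standard deviation (n - 1 divisor)
se = sd / math.sqrt(n)        # estimated standard error of the mean
print(f"mean = {statistics.mean(ages):.2f}, estimated SE = {se:.2f}")
```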

The standard formulae, however, assume that the observed sample is a simple random sample. If this is not the case, the estimates may be wrong. If the sampling methodology actually used involves clustering, standard errors estimated by the standard formulae may be too small. Consequently, we may incorrectly conclude that our estimates are more reliable than they actually are, and we may reject null hypotheses that should not be rejected. If the sampling methodology involves stratification, standard errors estimated by the standard formulae may be too large. Consequently, we may think our point estimates of population parameters are less reliable than they actually are, and we may fail to reject null hypotheses that are, in fact, false.

Where probability sampling designs are not simple random, there are several alternative approaches to estimating standard errors. For some complex survey designs, more complex formulae are available, as, for example, in the statistical packages Stata (Stata Corporation 2006) and Sudaan (Research Triangle Institute 2006). In other cases, bootstrap and jackknife methods may be used (Efron and Tibshirani 1993).

Bootstrap methods draw a large number of samples of size N from the current sample, with replacement. Each bootstrap sample is the same size as the original sample, so it will typically contain duplicate observations. For each of a very large number of such samples (typically 1,000 to 10,000), the statistic of interest is calculated. These estimates are then used to construct the sampling distribution, from which the standard error is estimated. Bootstrap methods require fairly large original samples to work well.
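A minimal sketch of this procedure using Python's standard library; the data, random seed, and number of replicates are illustrative assumptions.

```python
import random
import statistics

# Bootstrap SE: resample with replacement, same size as the original
# sample, and take the standard deviation of the resampled means.
random.seed(7)
data = [random.gauss(45, 17) for _ in range(200)]  # hypothetical sample

num_replicates = 2_000  # within the 1,000-10,000 range noted above
boot_means = [
    statistics.mean(random.choices(data, k=len(data)))  # with replacement
    for _ in range(num_replicates)
]

# The spread of the bootstrap distribution estimates the standard error.
print("bootstrap SE:", statistics.stdev(boot_means))
```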

Jackknife estimators instead recompute the statistic of interest on subsamples of the data, each formed by deleting one observation (or one group of observations) at a time, without replacement. The resulting collection of estimates is then used to approximate the sampling distribution and to calculate the estimated standard error.
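A minimal sketch of the classic leave-one-out version, with made-up data; the final line applies the standard jackknife variance formula.

```python
import math
import statistics

# Leave-one-out jackknife estimate of the SE of the mean.
data = [23.0, 31.0, 45.0, 52.0, 38.0, 61.0, 47.0, 29.0, 55.0, 40.0]
n = len(data)

# Recompute the statistic with each observation deleted in turn.
leave_one_out = [statistics.mean(data[:i] + data[i + 1:]) for i in range(n)]
jack_mean = statistics.mean(leave_one_out)

# Standard jackknife formula:
# SE = sqrt((n - 1) / n * sum((theta_i - theta_bar) ** 2))
se = math.sqrt((n - 1) / n * sum((m - jack_mean) ** 2 for m in leave_one_out))
print(f"jackknife SE: {se:.2f}")
```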

SEE ALSO Logistic Regression; Methods, Quantitative

BIBLIOGRAPHY

Efron, Bradley, and Robert J. Tibshirani. 1993. An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability, no. 57. London: Chapman and Hall.

Kmenta, Jan. 1986. Elements of Econometrics. 2nd ed. New York: Macmillan.

National Opinion Research Center. 2006. General Social Survey Home Page. http://www.norc.org/projects/gensoc.asp.

Research Triangle Institute. 2006. Sudaan Home Page. http://www.rti.org/sudaan.

Stata Corporation. 2006. Stata Home Page. http://www.stata.com.

Robert Hanneman
