# Central Limit Theorem

The central limit theorem (CLT) is a fundamental result from statistics. It states that the sum of a large number of independent identically distributed (iid) random variables will tend to be distributed according to the normal distribution. A first version of the CLT was proved by the English mathematician Abraham de Moivre (1667– 1754). He showed how the normal distribution can be used to approximate the distribution of the number of heads that will result when a coin is tossed a large number of times.

The CLT is the cornerstone of most estimation and inference of statistical models, which in turn are widely used in empirical work in the social sciences. Statistical models involve unknown population parameters that are estimated from a sample. The estimators often take the form of sample averages. According to the CLT, the estimators will therefore be approximately normally distributed for a sufficiently large sample size. This result can be used to draw inference about the population parameters. One example of a statistical model used in social sciences is the linear regression model. Here, the CLT can be used to quantify whether a chosen set of variables explains the variation in a certain response variable.

## THE THEOREM

Let {x_{1}, …, x_{n}} be a sample of *n* iid random variables with mean μ and variance σ^{2}. Consider the sum *S_{n}* = x_{1} + x_{2} + … + x_{n}. One may easily check that the mean and standard deviation of *S_{n}* are *n*μ and σ√*n*. Normalize *S_{n}* as follows,

$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}},$$

such that *Z_{n}* has mean zero and standard deviation 1. The CLT then states that *Z_{n}* ≈ *N*(0, 1) for *n* large enough. Formally, the above should be read as follows: for any −∞ < *z* < +∞, *P*(*Z_{n}* ≤ *z*) → Φ(*z*) as *n* → ∞, where Φ(·) is the cumulative distribution function of the standard normal distribution.

A major drawback of the CLT is that it is silent about how large *n* should be before the quality of the approximation is good. This will depend on the distribution of the x_{i}'s making up the sum.
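The convergence of *P*(*Z_{n}* ≤ *z*) to Φ(*z*) can be illustrated with a short simulation. The sketch below (not part of the original article; the choices of distribution, *n*, and number of replications are purely illustrative) sums uniform random variables, standardizes the sums, and compares the empirical distribution of *Z_{n}* with the standard normal CDF:

```python
import math
import random

def standardized_sum(n, mu, sigma, rng):
    """Z_n = (S_n - n*mu) / (sigma * sqrt(n)) for a sum of n Uniform(0,1) draws."""
    s = sum(rng.random() for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))

def phi(z):
    """Cumulative distribution function of the standard normal, via erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rng = random.Random(0)
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)   # mean and std. dev. of Uniform(0,1)
n, reps = 50, 20000                      # illustrative choices of n and replications
zs = [standardized_sum(n, mu, sigma, rng) for _ in range(reps)]

# The empirical P(Z_n <= z) should be close to Phi(z) for each z.
for z in (-1.0, 0.0, 1.0):
    emp = sum(1 for v in zs if v <= z) / reps
    print(f"z = {z:+.1f}: empirical P(Z_n <= z) = {emp:.3f}, Phi(z) = {phi(z):.3f}")
```

Already at *n* = 50 the empirical probabilities sit very close to Φ(*z*); rerunning with a smaller *n* or a more skewed distribution for the x_{i}'s shows how the quality of the approximation depends on the underlying distribution.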

## APPLICATIONS

The CLT has a broad range of applications. Consider, for example, a binomial random variable *S_{n}* with parameters (*n*, *p*). This variable describes the number of heads in *n* tosses of a coin with probability 0 < *p* < 1 of heads. Its distribution is given by

$$P(S_n = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n.$$

For *n* large, this distribution can be difficult to compute. Another way of representing *S_{n}* is as a sum of *n* iid Bernoulli random variables {x_{1}, …, x_{n}}. That is, *S_{n}* = x_{1} + x_{2} + … + x_{n}, where the distribution of x_{i} is *P*(x_{i} = 1) = 1 − *P*(x_{i} = 0) = *p*, *i* = 1, …, *n*. So we can apply the CLT to *S_{n}*, which tells us that *S_{n}* ≈ *N*(*np*, *np*(1 − *p*)) for *n* large enough, since μ = E[x_{i}] = *p* and σ^{2} = Var(x_{i}) = *p*(1 − *p*). This result was first proved by de Moivre in 1733.
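De Moivre's approximation can be checked numerically. The sketch below (an illustration added here, with arbitrary parameter choices) compares the exact binomial CDF with the normal approximation *N*(*np*, *np*(1 − *p*)), using the standard continuity correction of evaluating the normal CDF at *k* + 1/2:

```python
import math

def binom_cdf(n, p, k):
    """Exact P(S_n <= k) for S_n ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def normal_cdf(x, mean, sd):
    """CDF of N(mean, sd^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

n, p = 100, 0.5                            # illustrative coin-tossing example
mean, sd = n * p, math.sqrt(n * p * (1 - p))  # np and sqrt(np(1-p))

for k in (45, 50, 55):
    exact = binom_cdf(n, p, k)
    approx = normal_cdf(k + 0.5, mean, sd)  # k + 0.5: continuity correction
    print(f"P(S_n <= {k}): exact = {exact:.4f}, normal approx = {approx:.4f}")
```

For *n* = 100 the two sets of probabilities agree to two or three decimal places, which is precisely the computational shortcut de Moivre's result provides.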

The most important use of the CLT is probably in drawing inference about population parameters in statistical models. Most estimators of parameters can be written as sums of the sample, and so the CLT can be used to obtain a measure of the precision of the estimator. In particular, it can be used to test hypotheses regarding the parameters. As a simple example, consider an iid sample {x_{1}, …, x_{n}} with unknown population mean μ and variance σ^{2}. A simple estimator of the parameter μ is the sample average,

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

We can now use the CLT to conclude that

$$\bar{x} \approx N\!\left(\mu, \frac{\sigma^2}{n}\right)$$

for *n* large enough.

Since the variance is unknown, it needs to be estimated. This can be done using the sample variance,

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

One can now use the normal approximation for inferential purposes. For example, we can estimate the standard error of *x̄* as *s*/√*n*. Also, we know that μ lies in the interval [*x̄* − 1.96 *s*/√*n*, *x̄* + 1.96 *s*/√*n*] with approximately 95 percent probability, where 1.96 is the 97.5th percentile of the standard normal distribution; one normally refers to this as the *confidence interval*. The CLT can furthermore be used to test specific hypotheses regarding μ.
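These inferential steps can be sketched in a few lines. The example below (added here for illustration; the exponential sample with true mean 2.0 is a hypothetical data set, not from the original article) computes the sample average, the sample variance, the estimated standard error, and the approximate 95 percent confidence interval:

```python
import math
import random
import statistics

rng = random.Random(42)
# Hypothetical sample: 200 iid draws from an exponential distribution with true mean 2.0.
sample = [rng.expovariate(0.5) for _ in range(200)]
n = len(sample)

xbar = statistics.fmean(sample)            # sample average, the estimator of mu
s2 = statistics.variance(sample)           # sample variance (divides by n - 1)
se = math.sqrt(s2 / n)                     # estimated standard error of xbar: s / sqrt(n)
ci = (xbar - 1.96 * se, xbar + 1.96 * se)  # approximate 95% confidence interval for mu

print(f"xbar = {xbar:.3f}, se = {se:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Note that the interval's validity rests on the CLT: the x_{i}'s themselves are far from normal (they are exponential), yet the distribution of *x̄* is approximately normal at this sample size.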

**SEE ALSO** *Descriptive Statistics; Distribution, Normal; Law of Large Numbers; Variables, Random*


*Dennis Kristensen*

## central limit theorem

**central limit theorem** In statistics, the theorem stating that, for a series of data sets drawn from any probability distribution, the distribution of the means of those data sets will follow a normal distribution.

## central limit theorem

**central limit theorem** The theorem stating that the arithmetic mean values for a series of similar-sized, fairly large samples (*n* > 30) taken from a large population will be approximately normally distributed about the true population mean (μ), irrespective of the actual distribution pattern of the individual counts.
