## Serial Correlation

*Serial correlation* is a statistical term that refers to the linear dynamics of a random variable. Economic variables tend to evolve parsimoniously over time and that creates temporal dependence. For instance, as the economy grows, the level of gross national product (GNP) today depends on the level of GNP yesterday; or the present inflation rate is a function of the level of inflation in previous periods since it may take some time for the economy to adjust to a new monetary policy.

Consider a time series data set $\{Y_t, X_{1t}, \ldots, X_{kt}\}$ for $t = 1, 2, \ldots, T$. Our interest is to estimate a regression model like $Y_t = \beta_0 + \beta_1 X_{1t} + \cdots + \beta_k X_{kt} + \varepsilon_t$. For instance, $Y_t$ is the inflation rate, and $X_{1t}, \ldots, X_{kt}$ is a set of regressors such as unemployment and other macroeconomic variables. Under the classical set of assumptions, the *Gauss-Markov theorem* holds, and the *ordinary least squares* (OLS) estimator of the $\beta_i$'s is the *best, linear, and unbiased estimator* (BLUE). Serial correlation is a violation of one of the classical assumptions. Technically, we say that there is serial correlation when the error term is linearly dependent across time, that is, $\operatorname{cov}(\varepsilon_t, \varepsilon_s) \neq 0$ for $t \neq s$. We also say that the error term is *autocorrelated*. The covariance is positive when on average positive (negative) errors tend to be followed by positive (negative) errors; and the covariance is negative when positive (negative) errors are followed by negative (positive) errors. In either case, a covariance that is different from zero will happen when the dependent variable $Y_t$ is correlated over time and the regression model does not include enough lagged dependent variables to account for the serial correlation in $Y_t$. The presence of serial correlation invalidates the Gauss-Markov theorem. The OLS estimator can still be *unbiased* and *consistent* (large sample property), but it is no longer the best estimator, the *minimum variance estimator*. More importantly, the OLS standard errors are not correct, and consequently the t-tests and F-tests are invalid.
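The sign of the covariance can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the errors follow a first-order autoregression with standard normal innovations, and the helper names `simulate_ar1_errors` and `lag1_autocorrelation` are hypothetical, not taken from any library.

```python
import random

def simulate_ar1_errors(T, rho, seed=0):
    """Draw e_t = rho * e_{t-1} + v_t with v_t ~ N(0, 1), starting from e_0 = 0."""
    rng = random.Random(seed)
    errors, prev = [], 0.0
    for _ in range(T):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        errors.append(prev)
    return errors

def lag1_autocorrelation(x):
    """Sample correlation between x_t and x_{t-1}."""
    mean = sum(x) / len(x)
    dev = [xi - mean for xi in x]
    num = sum(dev[t] * dev[t - 1] for t in range(1, len(x)))
    den = sum(d * d for d in dev)
    return num / den

# Positive dependence: errors tend to keep their sign from one period to the next.
positive = lag1_autocorrelation(simulate_ar1_errors(5000, 0.7))
# Negative dependence: errors tend to switch sign from one period to the next.
negative = lag1_autocorrelation(simulate_ar1_errors(5000, -0.7))
# No dependence: the classical assumption of uncorrelated errors.
independent = lag1_autocorrelation(simulate_ar1_errors(5000, 0.0))

print(positive, negative, independent)
```

In a sample this large, the first two values should sit well away from zero with opposite signs, while the third should be close to zero.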

There are several models that can take into account the serial correlation of $\varepsilon_t$: the autoregressive model AR(p), that is, $\varepsilon_t = \rho_1 \varepsilon_{t-1} + \rho_2 \varepsilon_{t-2} + \cdots + \rho_p \varepsilon_{t-p} + v_t$, where $v_t$ is now uncorrelated with zero mean and constant variance; the *moving average* MA(q), $\varepsilon_t = \theta_1 v_{t-1} + \cdots + \theta_q v_{t-q} + v_t$; or a mixture model ARMA(p,q). Testing for serial correlation in the error term of a regression model amounts to assessing whether the parameters $\rho_i$'s and $\theta_i$'s are statistically different from zero. A popular model within economics is the AR(1) model: $\varepsilon_t = \rho_1 \varepsilon_{t-1} + v_t$. Within this model, the null hypothesis to test is $H_0: \rho_1 = 0$. If we reject the null hypothesis, we conclude that there is serial correlation in the error term. To implement the test, we proceed by running OLS in the regression model. We retrieve the OLS residuals $\hat{\varepsilon}_t$ and, assuming that the regressors $X_{1t}, X_{2t}, \ldots, X_{kt}$ are strictly exogenous, we regress $\hat{\varepsilon}_t$ on $\hat{\varepsilon}_{t-1}$. A t-statistic for $H_0: \rho_1 = 0$ will be asymptotically valid. If the autoregressive model is of a higher order, an F-test for a joint hypothesis such as $H_0: \rho_1 = \rho_2 = \cdots = \rho_p = 0$ will also be valid. If the regressors are not strictly exogenous, the auxiliary regression of $\hat{\varepsilon}_t$ on $\hat{\varepsilon}_{t-1}$ should be augmented with the set of regressors $X_{1t}, X_{2t}, \ldots, X_{kt}$ for the t-test and F-test to be valid. There is also a very popular statistic, the *Durbin-Watson* statistic, which tests for AR(1) serial correlation and also requires strict exogeneity. The main shortcoming of this test is the difficulty in obtaining its null distribution. Though there are tabulated critical values, the test leads to inconclusive results in many instances.
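The residual-based test for AR(1) serial correlation can be sketched as follows. This is a minimal illustration rather than a reference implementation. It assumes a single strictly exogenous regressor, simulated data, and an auxiliary regression without an intercept (a simplification, since OLS residuals from a model with an intercept average zero). The helper names `ols_simple`, `ar1_t_test`, and `durbin_watson` are hypothetical.

```python
import math
import random

def ols_simple(x, y):
    """OLS of y on an intercept and x; returns (b0, b1, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return b0, b1, resid

def ar1_t_test(resid):
    """Regress e_t on e_{t-1} without intercept; returns (rho_hat, t_stat)."""
    cur, lag = resid[1:], resid[:-1]
    sll = sum(l * l for l in lag)
    rho_hat = sum(c * l for c, l in zip(cur, lag)) / sll
    u = [c - rho_hat * l for c, l in zip(cur, lag)]
    s2 = sum(ui * ui for ui in u) / (len(cur) - 1)  # error variance estimate
    return rho_hat, rho_hat / math.sqrt(s2 / sll)

def durbin_watson(resid):
    """Sum of squared first differences of the residuals over their sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)

# Simulated data (an assumption for illustration): AR(1) errors with rho = 0.6.
rng = random.Random(42)
T, true_rho = 500, 0.6
x = [rng.gauss(0.0, 1.0) for _ in range(T)]
errors, prev = [], 0.0
for _ in range(T):
    prev = true_rho * prev + rng.gauss(0.0, 1.0)
    errors.append(prev)
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, errors)]

_, _, resid = ols_simple(x, y)
rho_hat, t_stat = ar1_t_test(resid)
dw = durbin_watson(resid)
print(rho_hat, t_stat, dw)
```

A t-statistic far above conventional critical values leads to rejecting $H_0: \rho_1 = 0$. The Durbin-Watson statistic is approximately $2(1 - \hat{\rho}_1)$, so values well below 2 point to positive serial correlation.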

Once we conclude that there is serial correlation in the error term, we have two ways to proceed depending upon the exogeneity of the regressors. If the regressors are strictly exogenous, we proceed to model the serial correlation and to transform the data accordingly. A regression model based on the transformed data is estimated with generalized least squares (GLS). If the regressors are not strictly exogenous, we proceed to make the OLS standard errors robust against serial correlation. In the first case, let us assume that there is serial correlation of the AR(1) type, that is, $\varepsilon_t = \rho_1 \varepsilon_{t-1} + v_t$. In order to eliminate the serial correlation, we proceed to transform the data by quasi-differencing. For simplicity, suppose that the regression model is $Y_t = \beta_0 + \beta_1 X_{1t} + \varepsilon_t$. The following transformation will produce a regression model with an uncorrelated error term:

$$Y_t - \rho_1 Y_{t-1} = \beta_0(1 - \rho_1) + \beta_1(X_{1t} - \rho_1 X_{1,t-1}) + \varepsilon_t - \rho_1 \varepsilon_{t-1} = \beta_0(1 - \rho_1) + \beta_1(X_{1t} - \rho_1 X_{1,t-1}) + v_t$$

If $\rho_1$ is known, it is easy to obtain the quasi-differenced data, that is, $Y_t - \rho_1 Y_{t-1}$ and $X_{1t} - \rho_1 X_{1,t-1}$, and proceed to run OLS in the model with the transformed data. This will produce a GLS estimator of the $\beta_i$'s that now will be BLUE as the new error term $v_t$ is free of serial correlation. In practice, $\rho_1$ is not known and needs to be consistently estimated. The estimate $\hat{\rho}_1$ is obtained from the auxiliary regression of $\hat{\varepsilon}_t$ on $\hat{\varepsilon}_{t-1}$. We proceed by quasi-differencing the data, $Y_t - \hat{\rho}_1 Y_{t-1}$ and $X_{1t} - \hat{\rho}_1 X_{1,t-1}$, and as before, running OLS with the transformed data. The estimator of the $\beta_i$'s is now called the *feasible GLS estimator* (FGLS), which is biased in finite samples but still consistent. In practice, the FGLS estimator is obtained by iterative procedures known as the *Cochrane-Orcutt procedure*, which does not consider the first observation, or the *Prais-Winsten procedure*, which includes the first observation. When the sample size is large, the difference between the two procedures is negligible.
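The quasi-differencing steps can be sketched as a Cochrane-Orcutt style iteration. This is an illustrative sketch under stated assumptions, not a definitive implementation: it uses one regressor, simulated data, and a fixed number of iterations instead of a convergence check. The helper names `ols_simple` and `cochrane_orcutt` are hypothetical. The first observation is dropped when the data are transformed, as in Cochrane-Orcutt.

```python
import random

def ols_simple(x, y):
    """OLS of y on an intercept and x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def cochrane_orcutt(x, y, n_iter=10):
    """Alternate between estimating rho from residuals and OLS on quasi-differenced data."""
    n = len(x)
    b0, b1 = ols_simple(x, y)  # start from plain OLS
    rho = 0.0
    for _ in range(n_iter):
        # Residuals of the original model at the current coefficient values.
        resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        # Auxiliary regression of e_t on e_{t-1} (no intercept).
        rho = (sum(resid[t] * resid[t - 1] for t in range(1, n))
               / sum(resid[t - 1] ** 2 for t in range(1, n)))
        # Quasi-difference the data; the first observation is lost.
        y_star = [y[t] - rho * y[t - 1] for t in range(1, n)]
        x_star = [x[t] - rho * x[t - 1] for t in range(1, n)]
        a0, b1 = ols_simple(x_star, y_star)
        b0 = a0 / (1.0 - rho)  # transformed intercept equals b0 * (1 - rho)
    return b0, b1, rho

# Simulated data (an assumption for illustration): beta0 = 1, beta1 = 2, rho = 0.6.
rng = random.Random(7)
T = 2000
x = [rng.gauss(0.0, 1.0) for _ in range(T)]
errors, prev = [], 0.0
for _ in range(T):
    prev = 0.6 * prev + rng.gauss(0.0, 1.0)
    errors.append(prev)
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, errors)]

b0_fgls, b1_fgls, rho_fgls = cochrane_orcutt(x, y)
print(b0_fgls, b1_fgls, rho_fgls)
```

A Prais-Winsten variant would keep the first observation by rescaling it by $\sqrt{1 - \hat{\rho}_1^2}$ rather than dropping it.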

In the second case, when the regressors are not strictly exogenous, we should not apply FGLS estimation because the estimator will not even be consistent. In this instance, we modify the OLS standard errors to make them robust against any form of serial correlation. There is no need to transform the data as we just run OLS with the original data. The formulas for the robust standard errors, which are known as the HAC (heteroscedasticity and autocorrelation consistent) standard errors, are provided by Whitney Newey and Kenneth West (1987). Nowadays, most econometric software computes the HAC standard errors, though the researcher must input the value of a parameter that controls how much serial correlation should be accounted for. Theoretically, the value of this parameter should grow with the sample size. Newey and West advised researchers to choose the integer part of $4(T/100)^{2/9}$, where $T$ is the sample size. Although by computing the HAC standard errors we avoid the explicit modeling of serial correlation in the error term, it should be said that they could be inefficient, in particular when the serial correlation is strong and the sample size is small.
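The lag rule and the robust standard error of a slope coefficient can be sketched as follows. This is a hedged illustration, not the formula as implemented in any particular software package. It assumes one regressor plus an intercept, simulated data, Bartlett weights $1 - j/(L+1)$, and no finite-sample adjustment. The helper names `newey_west_lag` and `slope_standard_errors` are hypothetical. With a single regressor, the slope's sandwich variance reduces to a scalar built from the products of the demeaned regressor and the OLS residual.

```python
import math
import random

def newey_west_lag(T):
    """Integer part of 4 * (T / 100) ** (2 / 9), the rule quoted in the text."""
    return int(4 * (T / 100) ** (2 / 9))

def slope_standard_errors(x, y, lag):
    """Return (b1, conventional_se, hac_se) for y = b0 + b1 * x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (yi - my) for d, yi in zip(dx, y)) / sxx
    b0 = my - b1 * mx
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Conventional OLS standard error, valid only without serial correlation.
    conventional = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # HAC: weighted sum of autocovariances of z_t = (x_t - mean(x)) * e_t.
    z = [d * e for d, e in zip(dx, resid)]
    long_run = sum(zi * zi for zi in z)
    for j in range(1, lag + 1):
        weight = 1.0 - j / (lag + 1)  # Bartlett weight, declines with the lag
        gamma_j = sum(z[t] * z[t - j] for t in range(j, n))
        long_run += 2.0 * weight * gamma_j
    return b1, conventional, math.sqrt(long_run) / sxx

def ar1_series(rng, T, rho):
    """Simulate an AR(1) series with standard normal innovations."""
    out, prev = [], 0.0
    for _ in range(T):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        out.append(prev)
    return out

# Simulated data (an assumption): both regressor and error are persistent.
rng = random.Random(123)
T = 1000
x = ar1_series(rng, T, 0.8)
errors = ar1_series(rng, T, 0.8)
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, errors)]

lag = newey_west_lag(T)
b1, se_ols, se_hac = slope_standard_errors(x, y, lag)
print(lag, b1, se_ols, se_hac)
```

When both the regressor and the error are positively autocorrelated, as in this simulation, the conventional standard error tends to understate the sampling variability and the HAC standard error is larger.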

**SEE ALSO** *Least Squares, Ordinary; Pooled Time Series and Cross-sectional Data; Properties of Estimators (Asymptotic and Exact); Time Series Regression; Unit Root and Cointegration Regression*

## BIBLIOGRAPHY

Newey, Whitney K., and Kenneth D. West. 1987. A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. *Econometrica* 55: 703-708.

*Gloria González-Rivera*