Serial Correlation


Serial correlation is a statistical term that refers to the linear dynamics of a random variable. Economic variables tend to evolve gradually over time, and that creates temporal dependence. For instance, as the economy grows, the level of gross national product (GNP) today depends on the level of GNP yesterday; likewise, the present inflation rate is a function of the level of inflation in previous periods, since it may take some time for the economy to adjust to a new monetary policy.

Consider a time series data set $\{Y_t, X_{1t}, \ldots, X_{kt}\}$ for $t = 1, 2, \ldots, T$. Our interest is to estimate a regression model such as $Y_t = \beta_0 + \beta_1 X_{1t} + \cdots + \beta_k X_{kt} + \varepsilon_t$. For instance, $Y_t$ is the inflation rate, and $X_{1t}, \ldots, X_{kt}$ is a set of regressors such as unemployment and other macroeconomic variables. Under the classical set of assumptions, the Gauss-Markov theorem holds, and the ordinary least squares (OLS) estimator of the $\beta_i$'s is the best linear unbiased estimator (BLUE). Serial correlation is a violation of one of the classical assumptions. Technically, we say that there is serial correlation when the error term is linearly dependent across time, that is, $\mathrm{cov}(\varepsilon_t, \varepsilon_s) \neq 0$ for $t \neq s$. We also say that the error term is autocorrelated. The covariance is positive when, on average, positive (negative) errors tend to be followed by positive (negative) errors; the covariance is negative when positive (negative) errors tend to be followed by negative (positive) errors. In either case, a covariance that is different from zero will arise when the dependent variable $Y_t$ is correlated over time and the regression model does not include enough lagged dependent variables to account for the serial correlation in $Y_t$. The presence of serial correlation invalidates the Gauss-Markov theorem. The OLS estimator can still be unbiased and consistent (a large-sample property), but it is no longer the best estimator, that is, the minimum variance estimator. More importantly, the OLS standard errors are not correct, and consequently the t-tests and F-tests are invalid.
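To make the sign of the covariance concrete, the following is a minimal simulation sketch, not part of the original entry. It assumes AR(1) errors with standard normal innovations started at zero; the function names `simulate_ar1` and `lag1_autocorr` are hypothetical helpers introduced only for illustration.

```python
import random


def simulate_ar1(T, rho, seed=0):
    """Return e_1..e_T with e_t = rho * e_{t-1} + v_t, v_t ~ N(0, 1)."""
    rng = random.Random(seed)
    errors, prev = [], 0.0  # assumption: start at zero; the start-up effect fades
    for _ in range(T):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        errors.append(prev)
    return errors


def lag1_autocorr(x):
    """Sample first-order autocorrelation of the sequence x."""
    mean = sum(x) / len(x)
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, len(x)))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


# Positive, negative, and no serial correlation in the error term.
r_pos = lag1_autocorr(simulate_ar1(5000, 0.7))
r_neg = lag1_autocorr(simulate_ar1(5000, -0.7))
r_zero = lag1_autocorr(simulate_ar1(5000, 0.0))
print(f"rho=0.7: {r_pos:.3f}  rho=-0.7: {r_neg:.3f}  rho=0: {r_zero:.3f}")
```

With a positive coefficient, errors of the same sign tend to follow one another and the sample autocorrelation is positive; with a negative coefficient, the signs tend to alternate and it is negative.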

There are several models that can take into account the serial correlation of $\varepsilon_t$: the autoregressive model AR(p), that is, $\varepsilon_t = \rho_1 \varepsilon_{t-1} + \rho_2 \varepsilon_{t-2} + \cdots + \rho_p \varepsilon_{t-p} + v_t$, where $v_t$ is now uncorrelated with zero mean and constant variance; the moving average model MA(q), $\varepsilon_t = \theta_q v_{t-q} + \cdots + \theta_1 v_{t-1} + v_t$; or a mixture model ARMA(p,q). Testing for serial correlation in the error term of a regression model amounts to assessing whether the parameters $\rho_i$ and $\theta_i$ are statistically different from zero. A popular model within economics is the AR(1) model: $\varepsilon_t = \rho_1 \varepsilon_{t-1} + v_t$. Within this model, the null hypothesis to test is $H_0: \rho_1 = 0$. If we reject the null hypothesis, we conclude that there is serial correlation in the error term. To implement the test, we first run OLS on the regression model. We retrieve the OLS residuals $\hat{\varepsilon}_t$ and, assuming that the regressors $X_{1t}, X_{2t}, \ldots, X_{kt}$ are strictly exogenous, we regress $\hat{\varepsilon}_t$ on $\hat{\varepsilon}_{t-1}$. A t-statistic for $H_0: \rho_1 = 0$ will be asymptotically valid. If the autoregressive model is of a higher order, an F-test for a joint hypothesis such as $H_0: \rho_1 = \rho_2 = \cdots = \rho_p = 0$ will also be valid. If the regressors are not strictly exogenous, the auxiliary regression of $\hat{\varepsilon}_t$ on $\hat{\varepsilon}_{t-1}$ should be augmented with the set of regressors $X_{1t}, X_{2t}, \ldots, X_{kt}$ for the t-test and F-test to be valid. There is also a very popular statistic, the Durbin-Watson statistic, which tests for AR(1) serial correlation and also requires strict exogeneity. The main shortcoming of this test is the difficulty in obtaining its null distribution. Though there are tabulated critical values, the test leads to inconclusive results in many instances.
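The AR(1) test described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the author's code: it uses a single strictly exogenous regressor, simulated data with hypothetical parameter values, an auxiliary regression without an intercept (the OLS residuals already have mean zero), and the textbook Durbin-Watson formula. All function names are hypothetical.

```python
import math
import random


def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid


def ar1_t_test(resid):
    """Regress resid_t on resid_{t-1} (no intercept); return (rho_hat, t_stat)."""
    cur, lag = resid[1:], resid[:-1]
    sll = sum(v * v for v in lag)
    rho_hat = sum(c * v for c, v in zip(cur, lag)) / sll
    u = [c - rho_hat * v for c, v in zip(cur, lag)]
    s2 = sum(ui * ui for ui in u) / (len(u) - 1)  # one estimated parameter
    return rho_hat, rho_hat / math.sqrt(s2 / sll)


def durbin_watson(resid):
    """Sum of squared first differences of the residuals over their sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(r * r for r in resid)


# Hypothetical data: Y_t = 1 + 2 X_t + e_t with AR(1) errors, rho = 0.6.
rng = random.Random(1)
T, rho = 2000, 0.6
x = [rng.gauss(0.0, 1.0) for _ in range(T)]
e, prev = [], 0.0
for _ in range(T):
    prev = rho * prev + rng.gauss(0.0, 1.0)
    e.append(prev)
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]

_, _, resid = ols_simple(x, y)
rho_hat, t_stat = ar1_t_test(resid)
dw = durbin_watson(resid)
print(f"rho_hat={rho_hat:.3f}  t={t_stat:.1f}  DW={dw:.3f}")
```

A large t-statistic leads to rejecting $H_0: \rho_1 = 0$. The Durbin-Watson statistic is approximately $2(1 - \hat{\rho}_1)$, so values well below 2 point to positive AR(1) correlation.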

Once we conclude that there is serial correlation in the error term, we have two ways to proceed, depending upon the exogeneity of the regressors. If the regressors are strictly exogenous, we proceed to model the serial correlation and to transform the data accordingly. A regression model based on the transformed data is estimated with generalized least squares (GLS). If the regressors are not strictly exogenous, we proceed to make the OLS standard errors robust against serial correlation. In the first case, let us assume that there is serial correlation of the AR(1) type, that is, $\varepsilon_t = \rho_1 \varepsilon_{t-1} + v_t$. In order to eliminate the serial correlation, we transform the data by quasi-differencing. For simplicity, suppose that the regression model is $Y_t = \beta_0 + \beta_1 X_{1t} + \varepsilon_t$. The following transformation will produce a regression model with an uncorrelated error term:

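$$Y_t - \rho_1 Y_{t-1} = \beta_0(1 - \rho_1) + \beta_1(X_{1t} - \rho_1 X_{1,t-1}) + \varepsilon_t - \rho_1 \varepsilon_{t-1} = \beta_0(1 - \rho_1) + \beta_1(X_{1t} - \rho_1 X_{1,t-1}) + v_t$$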

If $\rho_1$ is known, it is easy to obtain the quasi-differenced data, that is, $Y_t - \rho_1 Y_{t-1}$ and $X_{1t} - \rho_1 X_{1,t-1}$, and proceed to run OLS on the model with the transformed data. This will produce a GLS estimator of the $\beta_i$'s that now will be BLUE, as the new error term $v_t$ is free of serial correlation. In practice, $\rho_1$ is not known and needs to be consistently estimated. The estimate $\hat{\rho}_1$ is obtained from the auxiliary regression of $\hat{\varepsilon}_t$ on $\hat{\varepsilon}_{t-1}$. We proceed by quasi-differencing the data, $Y_t - \hat{\rho}_1 Y_{t-1}$ and $X_{1t} - \hat{\rho}_1 X_{1,t-1}$, and, as before, running OLS with the transformed data. The estimator of the $\beta_i$'s is now called the feasible GLS (FGLS) estimator, which is biased in finite samples but still consistent. In practice, the FGLS estimator is obtained by iterative procedures known as the Cochrane-Orcutt procedure, which does not consider the first observation, or the Prais-Winsten procedure, which includes the first observation. When the sample size is large, the difference between the two procedures is negligible.
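The iterative FGLS idea can be sketched in a few lines. The code below is a minimal illustration of a Cochrane-Orcutt style iteration for the one-regressor model, written under assumptions that are not in the original entry: simulated data with hypothetical parameter values, a no-intercept auxiliary regression for $\hat{\rho}_1$, and a simple convergence rule on successive estimates. The function names are hypothetical.

```python
import random


def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def cochrane_orcutt(x, y, tol=1e-8, max_iter=50):
    """Iterated FGLS for AR(1) errors; returns (b0, b1, rho_hat)."""
    n = len(x)
    b0, b1 = ols_simple(x, y)  # initial step: plain OLS on the original data
    rho = 0.0
    for _ in range(max_iter):
        resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        # rho_hat from regressing resid_t on resid_{t-1} (no intercept)
        new_rho = (sum(resid[t] * resid[t - 1] for t in range(1, n))
                   / sum(resid[t - 1] ** 2 for t in range(1, n)))
        # Quasi-difference the data; the first observation is dropped.
        y_star = [y[t] - new_rho * y[t - 1] for t in range(1, n)]
        x_star = [x[t] - new_rho * x[t - 1] for t in range(1, n)]
        a0, b1 = ols_simple(x_star, y_star)
        b0 = a0 / (1.0 - new_rho)  # a0 estimates b0 * (1 - rho); needs rho_hat != 1
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return b0, b1, rho


# Hypothetical data: Y_t = 1 + 2 X_t + e_t with AR(1) errors, rho = 0.6.
rng = random.Random(2)
T = 2000
x = [rng.gauss(0.0, 1.0) for _ in range(T)]
e, prev = [], 0.0
for _ in range(T):
    prev = 0.6 * prev + rng.gauss(0.0, 1.0)
    e.append(prev)
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]

b0_hat, b1_hat, rho_hat = cochrane_orcutt(x, y)
print(f"b0={b0_hat:.3f}  b1={b1_hat:.3f}  rho={rho_hat:.3f}")
```

Because the intercept of the transformed regression is $\beta_0(1 - \rho_1)$, the sketch divides by $1 - \hat{\rho}_1$ to recover $\beta_0$. A Prais-Winsten variant would instead keep a rescaled first observation.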

In the second case, when the regressors are not strictly exogenous, we should not apply FGLS estimation because the estimator will not even be consistent. In this instance, we modify the OLS standard errors to make them robust against any form of serial correlation. There is no need to transform the data, as we just run OLS with the original data. The formulas for the robust standard errors, which are known as the HAC (heteroscedasticity and autocorrelation consistent) standard errors, are provided by Whitney Newey and Kenneth West (1987). Nowadays, most econometric software calculates HAC standard errors, though the researcher must input the value of a parameter that controls how much serial correlation should be accounted for. Theoretically, the value of this parameter should grow with the sample size. Newey and West advised researchers to choose the integer part of $4(T/100)^{2/9}$, where $T$ is the sample size. Although by computing the HAC standard errors we avoid the explicit modeling of serial correlation in the error term, it should be said that they could be inefficient, in particular when the serial correlation is strong and the sample size is small.
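As an illustration, the sketch below computes a HAC standard error for the slope of a one-regressor model, using Bartlett weights and the lag rule quoted above. It is a simplified sketch under assumptions not in the original entry: it is specialized to a single regressor plus a constant, applies no small-sample correction, and uses simulated data with hypothetical parameter values. The function names are hypothetical and do not correspond to any particular software package.

```python
import math
import random


def newey_west_lag(T):
    """Integer part of 4 * (T / 100) ** (2 / 9)."""
    return int(4 * (T / 100) ** (2 / 9))


def slope_standard_errors(x, y, lags):
    """OLS of y on a constant and x; returns (slope, conventional_se, hac_se)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xd = [xi - mx for xi in x]  # demeaned regressor
    sxx = sum(v * v for v in xd)
    slope = sum(v * (yi - my) for v, yi in zip(xd, y)) / sxx
    intercept = my - slope * mx
    u = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    conventional = math.sqrt(sum(ui * ui for ui in u) / (n - 2) / sxx)
    # HAC "meat": Bartlett-weighted autocovariances of g_t = xd_t * u_t
    g = [v * ui for v, ui in zip(xd, u)]
    s = sum(gi * gi for gi in g)
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)
        s += 2.0 * weight * sum(g[t] * g[t - lag] for t in range(lag, n))
    return slope, conventional, math.sqrt(s) / sxx


def ar1_series(rng, T, rho):
    """AR(1) series with N(0, 1) innovations, started at zero (assumption)."""
    out, prev = [], 0.0
    for _ in range(T):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        out.append(prev)
    return out


# Hypothetical data: both the regressor and the error follow an AR(1) with 0.7.
rng = random.Random(3)
T = 2000
x = ar1_series(rng, T, 0.7)
e = ar1_series(rng, T, 0.7)
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]

L = newey_west_lag(T)
slope, conventional_se, hac_se = slope_standard_errors(x, y, L)
print(f"lags={L}  slope={slope:.3f}  OLS se={conventional_se:.4f}  HAC se={hac_se:.4f}")
```

In this simulated setting, where regressor and error are both positively autocorrelated, the conventional OLS standard error understates the sampling variability of the slope, and the HAC standard error is larger. Setting `lags=0` reduces the formula to a heteroscedasticity-robust standard error.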

SEE ALSO Least Squares, Ordinary; Pooled Time Series and Cross-sectional Data; Properties of Estimators (Asymptotic and Exact); Time Series Regression; Unit Root and Cointegration Regression


Newey, Whitney K., and Kenneth D. West. 1987. A Simple, Positive, Semi-Definite Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica 55: 703-708.

Gloria González-Rivera