Serial Correlation

Serial correlation is a statistical term that refers to the linear dynamics of a random variable. Economic variables tend to evolve gradually over time, and this creates temporal dependence. For instance, as the economy grows, the level of gross national product (GNP) today depends on the level of GNP yesterday. Similarly, the present inflation rate is a function of the level of inflation in previous periods, because it may take some time for the economy to adjust to a new monetary policy.

Consider a time series data set {Y_t, X_1t, …, X_kt} for t = 1, 2, …, T. We wish to estimate a regression model such as Y_t = β₀ + β₁X_1t + … + β_kX_kt + ε_t. For instance, Y_t might be the inflation rate, and X_1t, …, X_kt a set of regressors such as unemployment and other macroeconomic variables. Under the classical set of assumptions, the Gauss-Markov theorem holds, and the ordinary least squares (OLS) estimator of the β_i's is the best linear unbiased estimator (BLUE). Serial correlation is a violation of one of the classical assumptions. Technically, we say that there is serial correlation when the error term is linearly dependent across time, that is, cov(ε_t, ε_s) ≠ 0 for t ≠ s. We also say that the error term is autocorrelated. The covariance is positive when, on average, positive (negative) errors tend to be followed by positive (negative) errors, and negative when positive (negative) errors tend to be followed by negative (positive) errors. A common source of nonzero covariance is a dependent variable Y_t that is correlated over time in a regression model that does not include enough lagged dependent variables to account for the serial correlation in Y_t. The presence of serial correlation invalidates the Gauss-Markov theorem. The OLS estimator can still be unbiased and consistent (a large-sample property), but it is no longer the best, that is, the minimum-variance, linear unbiased estimator. More importantly, the usual OLS standard errors are incorrect, and consequently the t-tests and F-tests based on them are invalid.
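To make the definition concrete, the following sketch simulates the single-regressor model Y_t = β₀ + β₁X_1t + ε_t with AR(1) errors, fits it by OLS, and computes the first-order autocorrelation of the OLS residuals. The helper names (simulate_ar1_regression, ols_simple, first_order_autocorr), the parameter values, and the assumption that X_t is i.i.d. standard normal are all illustrative choices, not part of the original text. Only the Python standard library is used.

```python
import random

def simulate_ar1_regression(T, beta0, beta1, rho, seed=0):
    """Simulate Y_t = beta0 + beta1*X_t + eps_t with AR(1) errors.

    eps_t = rho*eps_{t-1} + v_t with v_t ~ N(0, 1); X_t is i.i.d. N(0, 1).
    The error recursion starts at eps_0 = 0 (an illustrative simplification).
    """
    rng = random.Random(seed)
    x, y, eps = [], [], 0.0
    for _ in range(T):
        eps = rho * eps + rng.gauss(0.0, 1.0)
        xt = rng.gauss(0.0, 1.0)
        x.append(xt)
        y.append(beta0 + beta1 * xt + eps)
    return x, y

def ols_simple(x, y):
    """OLS intercept and slope for a regression with a single regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def first_order_autocorr(e):
    """Sample first-order autocorrelation of a mean-zero series such as OLS residuals."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    return num / sum(et * et for et in e)

x, y = simulate_ar1_regression(T=2000, beta0=1.0, beta1=2.0, rho=0.8)
b0_hat, b1_hat = ols_simple(x, y)
resid = [yi - b0_hat - b1_hat * xi for xi, yi in zip(x, y)]
r1 = first_order_autocorr(resid)
print(f"slope = {b1_hat:.3f}, residual autocorrelation = {r1:.3f}")
```

The slope estimate stays close to its true value of 2, in line with OLS remaining unbiased, while the residuals inherit the strong positive autocorrelation of the errors.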

There are several models that can take into account the serial correlation of ε_t: the autoregressive model AR(p), ε_t = ρ₁ε_{t – 1} + ρ₂ε_{t – 2} + … + ρ_pε_{t – p} + v_t, where v_t is serially uncorrelated with zero mean and constant variance; the moving average model MA(q), ε_t = v_t + θ₁v_{t – 1} + … + θ_qv_{t – q}; or a mixture of the two, the ARMA(p,q) model. Testing for serial correlation in the error term of a regression model amounts to assessing whether the parameters ρ_i and θ_i are statistically different from zero. A popular model in economics is the AR(1) model, ε_t = ρ₁ε_{t – 1} + v_t, for which the null hypothesis to test is H₀: ρ₁ = 0. If we reject the null hypothesis, we conclude that there is serial correlation in the error term. To implement the test, we first estimate the regression model by OLS and retrieve the OLS residuals ε̂_t. Assuming that the regressors X_1t, X_2t, …, X_kt are strictly exogenous, we then regress ε̂_t on ε̂_{t – 1}; the t-statistic for H₀: ρ₁ = 0 is asymptotically valid. If the autoregressive model is of higher order p, we regress ε̂_t on ε̂_{t – 1}, …, ε̂_{t – p}, and an F-test of the joint hypothesis H₀: ρ₁ = ρ₂ = … = ρ_p = 0 is also valid. If the regressors are not strictly exogenous, the auxiliary regression of ε̂_t on its lags should be augmented with the regressors X_1t, X_2t, …, X_kt for the t-test and F-test to be valid. There is also a very popular statistic, the Durbin-Watson statistic, which tests for AR(1) serial correlation and also requires strict exogeneity. Its main shortcoming is that its exact null distribution depends on the regressors and is therefore difficult to obtain. Although tabulated lower and upper bounds for the critical values exist, the test is inconclusive whenever the statistic falls between them, which happens in many instances.
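The residual-based AR(1) test and the Durbin-Watson statistic described above can be sketched as follows. This is a minimal illustration under stated assumptions: the auxiliary regression of ε̂_t on ε̂_{t – 1} is run without an intercept (OLS residuals already have mean zero), the regressors are taken to be strictly exogenous so no augmentation is needed, and a simulated AR(1) series with ρ₁ = 0.5 stands in for the OLS residuals. The function names ar1_residual_test and durbin_watson are hypothetical.

```python
import math
import random

def ar1_residual_test(resid):
    """Regress e_t on e_{t-1} (no intercept); return (rho_hat, t_stat) for H0: rho_1 = 0."""
    lagged, current = resid[:-1], resid[1:]
    sxx = sum(e * e for e in lagged)
    rho_hat = sum(a * b for a, b in zip(lagged, current)) / sxx
    ssr = sum((b - rho_hat * a) ** 2 for a, b in zip(lagged, current))
    s2 = ssr / (len(current) - 1)  # one estimated coefficient
    return rho_hat, rho_hat / math.sqrt(s2 / sxx)

def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared first differences over the sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)

# Stand-in for OLS residuals: a simulated AR(1) series with rho = 0.5.
rng = random.Random(1)
resid, e = [], 0.0
for _ in range(500):
    e = 0.5 * e + rng.gauss(0.0, 1.0)
    resid.append(e)

rho_hat, t_stat = ar1_residual_test(resid)
dw = durbin_watson(resid)
print(f"rho_hat = {rho_hat:.3f}, t = {t_stat:.2f}, DW = {dw:.3f}")
```

The Durbin-Watson statistic is approximately 2(1 – ρ̂₁), so values well below 2 point to positive serial correlation. For the AR(p) case, one would instead regress ε̂_t on p lags and use an F-test, which this sketch does not cover.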

Once we conclude that there is serial correlation in the error term, we have two ways to proceed, depending on whether the regressors are strictly exogenous. If they are, we model the serial correlation and transform the data accordingly; estimating the regression on the transformed data gives the generalized least squares (GLS) estimator. If the regressors are not strictly exogenous, we instead make the OLS standard errors robust to serial correlation. In the first case, let us assume that the serial correlation is of the AR(1) type, ε_t = ρ₁ε_{t – 1} + v_t. To eliminate the serial correlation, we transform the data by quasi-differencing. For simplicity, suppose that the regression model is Y_t = β₀ + β₁X_1t + ε_t. Subtracting ρ₁ times the model at t – 1 from the model at t produces a regression model with an uncorrelated error term:

Y_t – ρ₁ Y_{t – 1} = β₀(1 – ρ₁) + β₁(X_1t – ρ₁X_{1t – 1}) + ε_t – ρ₁ε_{t – 1} = β₀(1 – ρ₁) + β₁(X_1t – ρ₁X_{1t – 1}) + v_t
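The quasi-differencing identity above can be checked numerically. The sketch below, which assumes a known ρ₁ = 0.7, standard normal v_t and X_1t, and a hypothetical helper named quasi_difference, builds data from the AR(1) model and verifies that the transformed error equals v_t and that the transformed model has intercept β₀(1 – ρ₁) and slope β₁.

```python
import random

def quasi_difference(series, rho):
    """Return z_t - rho*z_{t-1} for t = 2, ..., T (the first observation is dropped)."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]

rng = random.Random(7)
T, beta0, beta1, rho1 = 200, 1.0, 2.0, 0.7
v = [rng.gauss(0.0, 1.0) for _ in range(T)]
x = [rng.gauss(0.0, 1.0) for _ in range(T)]
eps, prev = [], 0.0
for vt in v:
    prev = rho1 * prev + vt  # AR(1) error: eps_t = rho1*eps_{t-1} + v_t
    eps.append(prev)
y = [beta0 + beta1 * xt + et for xt, et in zip(x, eps)]

y_star = quasi_difference(y, rho1)
x_star = quasi_difference(x, rho1)
e_star = quasi_difference(eps, rho1)

# The transformed error equals v_t, and the transformed model has intercept beta0*(1 - rho1).
max_gap_error = max(abs(a - b) for a, b in zip(e_star, v[1:]))
max_gap_model = max(
    abs(ys - (beta0 * (1 - rho1) + beta1 * xs + es))
    for ys, xs, es in zip(y_star, x_star, e_star)
)
print(max_gap_error, max_gap_model)
```

Both gaps are at the level of floating-point rounding, confirming the algebra term by term. Note that the first observation has no predecessor and is lost in the transformation.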

If ρ₁ is known, it is easy to obtain the quasi-differenced data, Y_t – ρ₁Y_{t – 1} and X_1t – ρ₁X_{1t – 1}, and to run OLS on the model with the transformed data. This produces a GLS estimator of the β_i's that is BLUE, because the new error term v_t is free of serial correlation. In practice, ρ₁ is not known and must be consistently estimated. The estimate ρ̂₁ is obtained from the auxiliary regression of ε̂_t on ε̂_{t – 1}. We then quasi-difference the data, Y_t – ρ̂₁Y_{t – 1} and X_1t – ρ̂₁X_{1t – 1}, and, as before, run OLS on the transformed data. The resulting estimator of the β_i's is called the feasible GLS (FGLS) estimator; it is biased in finite samples, although it remains consistent. In practice, the FGLS estimator is obtained by iterative procedures: the Cochrane-Orcutt procedure, which drops the first observation, or the Prais-Winsten procedure, which keeps the first observation after scaling it by √(1 – ρ̂₁²). When the sample size is large, the difference between the two procedures is negligible.
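The iterative FGLS procedure can be sketched as below. It is a minimal illustration for the single-regressor model, not a reference implementation: starting from OLS, it alternates between estimating ρ̂₁ from the residuals and re-estimating β₀ and β₁ on quasi-differenced data until ρ̂₁ stops changing. With prais_winsten=False the first observation is dropped (Cochrane-Orcutt); with prais_winsten=True it is kept and scaled by √(1 – ρ̂₁²) (Prais-Winsten). The function names, tolerance, iteration cap, and simulated data are assumptions made for illustration.

```python
import math
import random

def ols_two(c, x, y):
    """OLS of y on two regressors c and x with no separate intercept; returns (b_c, b_x)."""
    scc = sum(ci * ci for ci in c)
    sxx = sum(xi * xi for xi in x)
    scx = sum(ci * xi for ci, xi in zip(c, x))
    scy = sum(ci * yi for ci, yi in zip(c, y))
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = scc * sxx - scx * scx
    return (sxx * scy - scx * sxy) / det, (scc * sxy - scx * scy) / det

def estimate_rho(e):
    """Auxiliary regression of e_t on e_{t-1} without an intercept."""
    return sum(e[t] * e[t - 1] for t in range(1, len(e))) / sum(et * et for et in e[:-1])

def fgls_ar1(x, y, prais_winsten=False, tol=1e-8, max_iter=100):
    """Iterative FGLS for Y_t = b0 + b1*X_t + eps_t with AR(1) errors (assumes |rho_hat| < 1)."""
    T = len(y)
    b0, b1 = ols_two([1.0] * T, x, y)  # step 0: ordinary OLS with an intercept column
    rho = 0.0
    for _ in range(max_iter):
        e = [y[t] - b0 - b1 * x[t] for t in range(T)]  # residuals on the original data
        new_rho = estimate_rho(e)
        # Quasi-differenced data; the constant column becomes (1 - rho), so b_c estimates b0 directly.
        c = [1.0 - new_rho] * (T - 1)
        xs = [x[t] - new_rho * x[t - 1] for t in range(1, T)]
        ys = [y[t] - new_rho * y[t - 1] for t in range(1, T)]
        if prais_winsten:
            w = math.sqrt(1.0 - new_rho ** 2)  # keep the first observation, scaled
            c, xs, ys = [w] + c, [w * x[0]] + xs, [w * y[0]] + ys
        b0, b1 = ols_two(c, xs, ys)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return b0, b1, rho

rng = random.Random(42)
T, beta0, beta1, rho_true = 1000, 1.0, 2.0, 0.7
x, y, eps = [], [], 0.0
for _ in range(T):
    eps = rho_true * eps + rng.gauss(0.0, 1.0)
    xt = rng.gauss(0.0, 1.0)
    x.append(xt)
    y.append(beta0 + beta1 * xt + eps)

co = fgls_ar1(x, y)                      # Cochrane-Orcutt
pw = fgls_ar1(x, y, prais_winsten=True)  # Prais-Winsten
print("Cochrane-Orcutt:", co)
print("Prais-Winsten:  ", pw)
```

With T = 1000, the two procedures differ only through a single observation, so their estimates are nearly identical, consistent with the remark that the difference is negligible in large samples.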

In the second case, when the regressors are not strictly exogenous, we should not apply FGLS estimation, because the FGLS estimator is generally not even consistent. Instead, we modify the OLS standard errors to make them robust to any form of serial correlation. There is no need to transform the data, as we simply run OLS on the original data. The formulas for these robust standard errors, known as HAC (heteroskedasticity and autocorrelation consistent) standard errors, are provided by Whitney Newey and Kenneth West (1987). Nowadays, most econometric software computes HAC standard errors, though the researcher must supply the value of a truncation parameter that controls how much serial correlation is accounted for. In theory, the value of this parameter should grow with the sample size. Newey and West advised researchers to choose the integer part of 4(T/100)^(2/9), where T is the sample size. Although HAC standard errors avoid the explicit modeling of serial correlation in the error term, they can be inefficient, particularly when the serial correlation is strong and the sample size is small.
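A minimal sketch of Newey-West HAC standard errors for the slope of a single-regressor model follows. It assumes Bartlett weights 1 – j/(L + 1) on the autocovariances of u_t = (X_t – X̄)ε̂_t, applies no small-sample degrees-of-freedom correction (software packages may differ on this), and uses the rule-of-thumb truncation lag from the text. The function names and the simulated data, in which both the regressor and the error follow AR(1) processes with coefficient 0.8, are illustrative assumptions; for simplicity the simulated regressor is exogenous, although HAC standard errors do not require strict exogeneity.

```python
import math
import random

def newey_west_lag(T):
    """Rule-of-thumb truncation lag: integer part of 4*(T/100)**(2/9)."""
    return int(4 * (T / 100) ** (2 / 9))

def slope_se_ols_and_hac(x, y, L):
    """Return (b1, conventional OLS s.e., Newey-West HAC s.e.) for the slope of y on x."""
    T = len(y)
    mx, my = sum(x) / T, sum(y) / T
    xd = [xi - mx for xi in x]
    sxx = sum(d * d for d in xd)
    b1 = sum(d * (yi - my) for d, yi in zip(xd, y)) / sxx
    b0 = my - b1 * mx
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Conventional OLS standard error: assumes no serial correlation and homoskedasticity.
    s2 = sum(et * et for et in e) / (T - 2)
    se_ols = math.sqrt(s2 / sxx)
    # Newey-West: Bartlett-weighted sum of autocovariances of u_t = (x_t - xbar) * e_t.
    u = [d * et for d, et in zip(xd, e)]
    S = sum(ut * ut for ut in u)
    for j in range(1, L + 1):
        w = 1.0 - j / (L + 1)
        S += 2.0 * w * sum(u[t] * u[t - j] for t in range(j, T))
    se_hac = math.sqrt(S) / sxx
    return b1, se_ols, se_hac

rng = random.Random(3)
T = 1000
x, y, xt, et = [], [], 0.0, 0.0
for _ in range(T):
    xt = 0.8 * xt + rng.gauss(0.0, 1.0)  # persistent regressor
    et = 0.8 * et + rng.gauss(0.0, 1.0)  # AR(1) error
    x.append(xt)
    y.append(1.0 + 2.0 * xt + et)

L = newey_west_lag(T)
b1, se_ols, se_hac = slope_se_ols_and_hac(x, y, L)
print(f"L = {L}, b1 = {b1:.3f}, OLS s.e. = {se_ols:.4f}, HAC s.e. = {se_hac:.4f}")
```

For T = 1000 the rule gives L = 6. Because both the regressor and the error are positively autocorrelated here, the conventional OLS standard error understates the sampling variability, and the HAC standard error is noticeably larger.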

SEE ALSO Least Squares, Ordinary; Pooled Time Series and Cross-sectional Data; Properties of Estimators (Asymptotic and Exact); Time Series Regression; Unit Root and Cointegration Regression

BIBLIOGRAPHY

Newey, Whitney K., and Kenneth D. West. 1987. A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica 55: 703–708.

Gloria González-Rivera

International Encyclopedia of the Social Sciences