Fixed Effects Regression


A fixed effects regression is an estimation technique employed in a panel data setting that allows one to control for time-invariant unobserved individual characteristics that can be correlated with the observed independent variables.

Let us assume we are interested in the causal relationship between a vector of observable random variables x = (1, x_1, x_2, …, x_K)' and a dependent random variable y, where the true linear model is of the following form:

y_i = β'x_i + μ_i + ε_i, with i = 1, …, N

with μ_i being an unobserved random variable characterizing each unit of observation i and ε_i the stochastic error, uncorrelated with x_i.

When μ is correlated with x, we cannot consistently estimate the vector of parameters of interest β using Ordinary Least Squares, because the standard assumption of no correlation between the error term and the regressors is violated. In a cross-sectional setting, typical strategies to solve this omitted variable problem are instrumental variables or the inclusion of proxies for μ. However, when the available data are longitudinal, that is, when they contain a cross-sectional as well as a time series dimension, it is possible to adopt alternative estimation methods known in the literature as “panel data” techniques.

Assuming we repeatedly observe N units for T periods of time, and that the unobservable variable μ is time invariant, we can write our model as:

y_it = β'x_it + μ_i + ε_it, with i = 1, …, N and t = 1, …, T

Depending on the correlation between the omitted variable μ_i and the regressors x_it, alternative estimation techniques are available to the researcher. A fixed effects regression allows for arbitrary correlation between μ_i and x_it, that is, E(x_jit μ_i) ≠ 0, whereas random effects regression techniques do not allow for such correlation, that is, the condition E(x_jit μ_i) = 0 must hold. This terminology is somewhat misleading because in both cases the unobservable variable is treated as random. However, it is so widespread in the literature that it has been accepted as standard.

A fixed effects regression consists of subtracting the unit-specific time mean from each variable in the model and then estimating the resulting transformed model by Ordinary Least Squares. This procedure, known as the “within” transformation, allows one to drop the unobserved component and consistently estimate β. Analytically, the above model becomes

ỹ_it = β'x̃_it + ε̃_it

where ỹ_it = y_it − ȳ_i with ȳ_i = T^−1 Σ_{t=1}^{T} y_it (and the same for x, μ, and ε). Because μ_i is fixed over time, we have μ_i − μ̄_i = 0.
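The within transformation can be illustrated with a small simulation. The following is a minimal sketch in Python with NumPy, assuming a balanced panel stored as an N × T array and a single regressor; the data-generating values (β = 2, the noise scales, the seed) are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 500, 4, 2.0            # hypothetical panel size and true slope

mu = rng.normal(size=(N, 1))        # time-invariant unobserved effect, one per unit
x = mu + rng.normal(size=(N, T))    # regressor correlated with mu by construction
eps = rng.normal(scale=0.5, size=(N, T))
y = beta * x + mu + eps             # balanced panel, rows = units, columns = periods

def within(a):
    """Subtract each unit's time mean (row mean) from its observations."""
    return a - a.mean(axis=1, keepdims=True)

# The unobserved effect is wiped out by the transformation.
mu_tilde = within(np.broadcast_to(mu, (N, T)))

# Fixed effects (within) estimate: OLS on the demeaned data, no intercept needed.
x_t, y_t = within(x), within(y)
beta_fe = (x_t * y_t).sum() / (x_t ** 2).sum()

# Pooled OLS with an intercept, which ignores mu and is therefore inconsistent here.
xc, yc = x - x.mean(), y - y.mean()
beta_ols = (xc * yc).sum() / (xc ** 2).sum()

print(f"within estimate: {beta_fe:.3f}, pooled OLS estimate: {beta_ols:.3f}")
```

In this setup the pooled estimate is pulled away from the true slope because x and μ are positively correlated, while the within estimate should be close to it.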

This procedure is numerically identical to including N − 1 dummies in the regression, suggesting intuitively that a fixed effects regression accounts for unobserved individual heterogeneity by means of individual-specific intercepts. In other words, the slopes of the regression (the coefficients of x_1, x_2, …, x_K) are common across units, whereas the intercept is allowed to vary.
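The numerical equivalence between the within estimator and the dummy-variable regression can be checked directly. The sketch below assumes a small simulated balanced panel in long format (one row per unit-period); the variable names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 30, 5, 2                        # hypothetical dimensions
beta = np.array([1.5, -0.7])              # hypothetical true slopes

mu = rng.normal(size=N)
unit = np.repeat(np.arange(N), T)         # unit index of each row
X = rng.normal(size=(N * T, K)) + mu[unit, None]   # regressors correlated with mu
y = X @ beta + mu[unit] + rng.normal(scale=0.3, size=N * T)

def demean_by_unit(a):
    """Subtract the unit-specific time mean from every row of a."""
    means = np.array([a[unit == i].mean(axis=0) for i in range(N)])
    return a - means[unit]

# (a) Within estimator: OLS on demeaned data.
beta_within, *_ = np.linalg.lstsq(demean_by_unit(X), demean_by_unit(y), rcond=None)

# (b) Dummy-variable regression: constant plus N - 1 unit dummies (unit 0 omitted).
D = (unit[:, None] == np.arange(1, N)[None, :]).astype(float)
Z = np.column_stack([X, np.ones(N * T), D])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
beta_lsdv = coef[:K]                      # slopes are the first K coefficients

print(beta_within, beta_lsdv)
```

The two slope vectors agree up to floating-point error; the remaining coefficients in the dummy-variable regression are the unit-specific intercept shifts.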

One drawback of the fixed effects procedure is that the within transformation does not allow one to include time-invariant independent variables in the regression, because they get eliminated similarly to the fixed unobserved component. In addition, parameter estimates are likely to be imprecise if the time series dimension is limited.
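To see why time-invariant regressors cannot be included, consider the following minimal sketch, which assumes a balanced N × T panel and a hypothetical time-invariant dummy z. After the within transformation the column for z is identically zero, so its coefficient is not identified.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 6, 4                                     # hypothetical small panel

x_varying = rng.normal(size=(N, T))             # regressor that changes over time
# Hypothetical time-invariant dummy: one draw per unit, repeated in every period.
z_fixed = np.repeat(rng.integers(0, 2, size=(N, 1)), T, axis=1).astype(float)

def within(a):
    """Subtract each unit's time mean (row mean) from its observations."""
    return a - a.mean(axis=1, keepdims=True)

z_tilde = within(z_fixed)                       # every entry becomes zero

# Stack the transformed regressors as columns of a design matrix.
X_tilde = np.column_stack([within(x_varying).ravel(), z_tilde.ravel()])
rank = np.linalg.matrix_rank(X_tilde)           # only one column carries information

print(z_tilde)
print("rank of transformed design matrix:", rank)
```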

Under classical assumptions, the fixed effects estimator is consistent (with N → ∞ and T fixed) in the cases of both E(x_jit μ_i) = 0 and E(x_jit μ_i) ≠ 0, where j = 1, …, K. It is efficient when all the explanatory variables are correlated with μ_i. However, it is less efficient than the random effects estimator when E(x_jit μ_i) = 0.

The consistency property requires the strict exogeneity of x. However, this property is not satisfied when the estimated model includes a lagged dependent variable, as in y_it = α y_{i,t−1} + β'x_it + μ_i + ε_it.

This suggests the adoption of instrumental variables or Generalized Method of Moments techniques in order to obtain consistent estimates. However, a large time dimension T assures consistency even in the case of the dynamic specification above.
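The role of the time dimension in the dynamic case can be illustrated by simulation. The sketch below assumes a pure autoregressive model y_it = α y_{i,t−1} + μ_i + ε_it without additional regressors; the function name, α = 0.5, the panel sizes, and the burn-in length are hypothetical choices.

```python
import numpy as np

def fe_ar1_estimate(N, T, alpha, rng, burn=50):
    """Within estimate of alpha in y_it = alpha * y_i,t-1 + mu_i + eps_it.

    Simulates N units, discards `burn` initial periods, and uses T
    observations per unit on the pair (y_it, y_i,t-1).
    """
    mu = rng.normal(size=N)
    y = np.zeros((N, T + burn + 1))
    for t in range(1, T + burn + 1):
        y[:, t] = alpha * y[:, t - 1] + mu + rng.normal(size=N)
    y = y[:, burn:]                              # keep the last T + 1 periods
    y_lag, y_cur = y[:, :-1], y[:, 1:]
    # Within transformation of both the dependent variable and its lag.
    yl = y_lag - y_lag.mean(axis=1, keepdims=True)
    yc = y_cur - y_cur.mean(axis=1, keepdims=True)
    return (yl * yc).sum() / (yl ** 2).sum()

rng = np.random.default_rng(3)
alpha = 0.5
est_short = fe_ar1_estimate(N=2000, T=5, alpha=alpha, rng=rng)
est_long = fe_ar1_estimate(N=2000, T=100, alpha=alpha, rng=rng)

print(f"T = 5:   {est_short:.3f}")
print(f"T = 100: {est_long:.3f}")
```

With a short panel the within estimate falls well below the true α even though N is large, because the demeaned lag is correlated with the demeaned error; with a long panel the discrepancy becomes small.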

Sometimes the true model includes unobserved shocks that are common to all units i but vary over time. In this case, the model includes an additional error component δ_t that can be controlled for by simply including time dummies in the equation.
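The effect of including time dummies can be sketched as follows, assuming a balanced panel in which the regressor is correlated with both the unit effect and the common time shock. The shock values, the true slope, and the noise scales are hypothetical and fixed only to keep the illustration reproducible.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical common shocks, one per period (fixed values for reproducibility).
delta = np.array([[0.8, -1.2, 0.3, 1.5, -0.9, -0.5]])
N, T, beta = 400, delta.shape[1], 1.0

mu = rng.normal(size=(N, 1))                       # unit effect
x = mu + delta + rng.normal(size=(N, T))           # correlated with mu and delta
y = beta * x + mu + delta + rng.normal(scale=0.5, size=(N, T))

def within(a):
    """Subtract each unit's time mean along the time axis (axis 1)."""
    return a - a.mean(axis=1, keepdims=True)

# Fixed effects without time dummies: the common shocks remain in the error.
x_t, y_t = within(x), within(y)
beta_no_dummies = (x_t * y_t).sum() / (x_t ** 2).sum()

# Fixed effects with T - 1 time dummies (period 0 omitted), also demeaned by unit.
period = np.tile(np.arange(T), N)                  # period of each row after ravel()
D = (period[:, None] == np.arange(1, T)[None, :]).astype(float)
D_t = within(D.reshape(N, T, T - 1)).reshape(N * T, T - 1)
Z = np.column_stack([x_t.ravel(), D_t])
coef, *_ = np.linalg.lstsq(Z, y_t.ravel(), rcond=None)
beta_with_dummies = coef[0]

print(f"without time dummies: {beta_no_dummies:.3f}")
print(f"with time dummies:    {beta_with_dummies:.3f}")
```

In this design the estimate without time dummies is biased upward, while the estimate with time dummies should be close to the true slope.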

A typical application of a fixed effects regression is in the context of wage equations. Let us assume that we are interested in assessing the impact of years of education (in logs) e on wages (in logs) w when the ability of individuals a is not observed. The true model is then

w_i = β_0 + β_1 e_i + v_i

where v_i = a_i + ε_i. Given that unobserved ability is likely to be correlated with education, the composite stochastic error v is also correlated with the regressor, and the Ordinary Least Squares estimate of β_1 will be biased. However, since innate ability does not change over time, if our data set is longitudinal we can use a fixed effects estimator to obtain a consistent estimate of β_1. Applying the within transformation to the preceding equation, we end up with

w̃_it = β_1 ẽ_it + ε̃_it

where we have eliminated the time-invariant unobserved component a_i. Since E(ẽ_it ε̃_it) = 0, the model now satisfies the classical assumptions and we can estimate it by Ordinary Least Squares.