## Variables, Predetermined


This entry explains when a variable is a predetermined variable and how identification and inference require a variable to be predetermined. In social science, researchers often try to explain a phenomenon or an event using one or more explanatory variables. For example, how much an individual earns can be explained (to some degree) by his or her education level, and how much an individual consumes can be explained by his or her income and wealth. In many cases, a social scientist will formulate a model in which one variable is a function of another variable. For example, the following is a model that relates consumption to income and wealth:

$$\text{Consumption} = c_0 + c_1 \cdot \text{income} + c_2 \cdot \text{wealth}$$

where $c_0$, $c_1$, and $c_2$ are numbers. For example, $c_0 = 10{,}000$ dollars, $c_1 = 0.7$, and $c_2 = 0.05$. This model implies that a one-dollar increase in income causes consumption to increase by $c_1$ dollars (that is, consumption increases by seventy cents if income increases by one dollar). In order to estimate this model, we need to extend the model with an error term. This error term captures variables other than income or wealth. Let

$$\text{Consumption} = c_0 + c_1 \cdot \text{income} + c_2 \cdot \text{wealth} + \varepsilon$$

where the error term $\varepsilon$ is assumed to be uncorrelated with income and wealth. Under this assumption, income and wealth are exogenous variables. No correlation means that we cannot use the regressors to predict the error term, that is, $E(\varepsilon \mid \text{income}, \text{wealth}) = 0$. If all the explanatory variables are exogenous variables, then the coefficients can be given a causal interpretation. Suppose that a social science researcher does not have access to data on wealth and, therefore, estimates the model

$$\text{Consumption} = d_0 + d_1 \cdot \text{income} + u.$$

Note that the new error term $u$ consists of the old error term $\varepsilon$ plus $c_2 \cdot \text{wealth}$. Wealth and income are correlated, so income is correlated with $u$. Therefore, we cannot give a causal interpretation to $d_1$. In particular, an estimate of $d_1$ is likely to overstate the effect of income on consumption. Suppose that we have data on consumption and income for $N$ individuals, $\{\text{Consumption}_i, \text{income}_i\}$ where $i = 1, \ldots, N$. Consider the least squares estimator for $d_1$. This estimator minimizes $\sum_i (\text{Consumption}_i - d_0 - d_1 \cdot \text{income}_i)^2$ with respect to $d_0$ and $d_1$. The least squares estimator for $d_1$ has the following form,

$$\hat{d}_1 = \frac{\sum_i (\text{income}_i - \overline{\text{income}})\,\text{Consumption}_i}{\sum_i (\text{income}_i - \overline{\text{income}})^2} = d_1 + \frac{\sum_i (\text{income}_i - \overline{\text{income}})\,u_i}{\sum_i (\text{income}_i - \overline{\text{income}})^2},$$

where $\overline{\text{income}}$ denotes the sample mean of $\text{income}_i$. If income and the error term $u$ are uncorrelated, then (1) the expectation of the last term is zero so that $E(\hat{d}_1) = d_1$, and (2) this last term is very small for large $N$ (the technical term is that it converges in probability to zero so that $\hat{d}_1$ is a consistent estimator for $d_1$). However, in this example, the error term $u_i$ depends on wealth. Wealth and income are correlated so that the exogeneity assumption (i.e., that all regressors are uncorrelated with the error term) is violated. As a result, the estimate for $d_1$ cannot be given a causal interpretation. In particular, the expectation of the estimator, $E(\hat{d}_1)$, will be larger than 0.7 because of the positive correlation between income and wealth.
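The overstatement described above can be illustrated with a small simulation. The following is a minimal sketch, not part of the original entry: the income distribution, the way wealth depends on income, and the size of the error term are all hypothetical choices made only for illustration. Only the coefficients 10,000, 0.7, and 0.05 come from the example in the text.

```python
import random


def ols_slope(x, y):
    """Least squares slope from regressing y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den


def simulate_short_regression(n=20_000, seed=0):
    """Simulate the consumption model, then regress consumption on income only."""
    rng = random.Random(seed)
    c0, c1, c2 = 10_000.0, 0.7, 0.05  # coefficients from the example in the text
    income, consumption = [], []
    for _ in range(n):
        # Hypothetical data-generating choices (assumptions, not from the entry):
        inc = 50_000.0 + 15_000.0 * rng.gauss(0, 1)        # income
        wealth = 4.0 * inc + 100_000.0 * rng.gauss(0, 1)   # wealth rises with income
        eps = 5_000.0 * rng.gauss(0, 1)                    # independent of both regressors
        income.append(inc)
        consumption.append(c0 + c1 * inc + c2 * wealth + eps)
    return ols_slope(income, consumption)


d1_hat = simulate_short_regression()

# In this design the short-regression slope converges to
# c1 + c2 * cov(income, wealth) / var(income) = 0.7 + 0.05 * 4 = 0.9,
# so omitting wealth overstates the effect of income.
print(f"estimate of d1 without wealth: {d1_hat:.3f} (true c1 = 0.7)")
```

Because wealth is left in the error term and moves with income, the estimated slope settles near 0.9 rather than 0.7 in this particular design. A different assumed relation between wealth and income would give a different amount of overstatement.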

The exogeneity assumption is very strong and can be relaxed somewhat. Consider the following model that describes the squared daily return of a stock market (e.g., the daily return of the Standard & Poor’s 500 index),

$$\text{Squared Return}_t = f_0 + f_1 \cdot \text{Squared Return}_{t-1}.$$

As before, this model can be extended to include an error term,

$$\text{Squared Return}_t = f_0 + f_1 \cdot \text{Squared Return}_{t-1} + v_t.$$

Suppose there are $T$ data points so that $t = 1, 2, \ldots, T$. Rather than assuming that the correlation between the squared return and the error term is zero, that is, that $E(v_t \mid \text{Squared Return}_1, \ldots, \text{Squared Return}_T) = 0$ for all $t$, we now make the weaker assumption that, given the past values of the squared return, the expectation of the error term is zero, that is, $E(v_t \mid \text{Squared Return}_1, \ldots, \text{Squared Return}_{t-1}) = 0$ for all $t$. Note that the past values of the squared return for error term $v_t$ consist of the squared return of the first period, $\text{Squared Return}_1$, through period $t - 1$, $\text{Squared Return}_{t-1}$. Regressors that have the property that the error term has zero expectation given past values of the regressor are called *predetermined regressors* or *predetermined variables*. Consider the least squares estimator again to see how predeterminedness helps the estimator,

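$$\hat{f}_1 = f_1 + \frac{\frac{1}{T-1}\sum_{t=2}^{T} \text{Squared Return}_{t-1}\, v_t \;-\; \overline{\text{Squared Return}} \cdot \frac{1}{T-1}\sum_{t=2}^{T} v_t}{\frac{1}{T-1}\sum_{t=2}^{T} \left(\text{Squared Return}_{t-1} - \overline{\text{Squared Return}}\right)^2},$$

where $\overline{\text{Squared Return}}$ denotes the mean of the lagged squared returns, $\text{Squared Return}_1, \ldots, \text{Squared Return}_{T-1}$. The term $\frac{1}{T-1}\sum_{t=2}^{T} \text{Squared Return}_{t-1}\, v_t$ is zero in expectation since $E(v_t \mid \text{Squared Return}_1, \ldots, \text{Squared Return}_{t-1}) = 0$. Moreover, for large $T$, this term, as well as $\overline{\text{Squared Return}} \cdot \frac{1}{T-1}\sum_{t=2}^{T} v_t$, will be small so that the estimate $\hat{f}_1$ is close to the true value $f_1$. This model of squared returns is an ARCH (autoregressive conditional heteroscedasticity) model and can be used to study volatility. In particular, a large decline of the stock market in period $t - 1$ implies that the stock market is expected to be more volatile in period $t$. Tim Bollerslev, Robert Engle, and Daniel Nelson (1994) discuss other ARCH models.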
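The role of predeterminedness can be illustrated by simulation. The sketch below is not from the original entry. It assumes normally distributed shocks and hypothetical true values $f_0 = 1$ and $f_1 = 0.2$, and it generates returns whose conditional variance follows the squared-return model above. In that setup the error term $v_t$ has zero expectation given past squared returns but is related to future ones, so the lagged squared return is predetermined without being exogenous.

```python
import math
import random


def ols_slope(x, y):
    """Least squares slope from regressing y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den


def simulate_squared_returns(T, f0, f1, rng, burn_in=200):
    """Simulate T squared returns from an ARCH(1) process with normal shocks."""
    squared = []
    prev_sq = f0 / (1.0 - f1)  # start at the unconditional variance
    for t in range(T + burn_in):
        variance = f0 + f1 * prev_sq            # conditional variance given the past
        ret = math.sqrt(variance) * rng.gauss(0, 1)
        prev_sq = ret * ret
        if t >= burn_in:                        # discard the start-up periods
            squared.append(prev_sq)
    return squared


def estimate_f1(squared):
    """Regress Squared Return_t on Squared Return_{t-1}."""
    return ols_slope(squared[:-1], squared[1:])


rng = random.Random(1)
f0, f1 = 1.0, 0.2  # hypothetical true values, chosen for illustration

# One long sample: the estimate should be close to f1 (consistency).
f1_long = estimate_f1(simulate_squared_returns(50_000, f0, f1, rng))

# Many short samples: the average estimate need not equal f1, because a
# predetermined regressor gives consistency but not unbiasedness.
short = [estimate_f1(simulate_squared_returns(30, f0, f1, rng)) for _ in range(2_000)]
f1_short_mean = sum(short) / len(short)

print(f"long sample estimate:          {f1_long:.3f}")
print(f"average over short samples:    {f1_short_mean:.3f}")
print(f"true value used in simulation: {f1:.3f}")
```

With a long sample the estimate lands near the value used to generate the data, while the average over many short samples typically falls below it in this design. This reflects the distinction drawn above: with a predetermined regressor the relevant terms are small for large $T$, but the estimator is not guaranteed to be correct on average in small samples.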

An endogenous regressor has the property that $E(v_t \mid \text{Squared Return}_1, \ldots, \text{Squared Return}_{t-1}) \neq 0$. Thus, an endogenous regressor cannot be a predetermined regressor. Endogeneity (i.e., having an endogenous regressor) occurs, for example, if there is a third, unobserved variable that affects both the regressor and the error term. For example, how much an individual earns can be partly explained by his or her education. Data on earnings and education levels are not hard to collect, but reliable data on intelligence are difficult to obtain. For this reason, earnings are usually regressed on education so that intelligence is part of the error term. However, intelligence will also affect education levels so that the regressor education and the error term are correlated. In other words, there is an unobserved variable that affects both the regressor and the error term so that the expectation of the error term given the regressor is not zero. Therefore, least squares cannot be used to estimate the effect of education on earnings. Econometricians have developed another technique for this situation, namely, two-stage least squares.

In nonlinear models, a slightly different definition of exogeneity and predeterminedness is sometimes used. In particular, the regressors are exogenous if the regressors and the error term are statistically independent, that is, if the density of the error term conditional on the regressors, $p(\text{error term} \mid \text{regressors})$, is the same as the unconditional density of the error term, $p(\text{error term})$. Similarly, the regressors are predetermined if the density of the error term of period $t$ conditional on the past regressors, $p(\text{error term}_t \mid \text{regressors}_1, \ldots, \text{regressors}_{t-1})$, is the same as the unconditional density of the error term, $p(\text{error term}_t)$, for all $t$. Robert De Jong and Tiemen Woutersen (2006) use these definitions when they estimate a model to predict monetary policy.
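Returning to the education and earnings example, the following sketch shows how two-stage least squares can recover an effect that least squares overstates. It is an illustration under strong assumptions rather than a description of any particular study. The variable `instrument` is hypothetical and is assumed to shift education without being related to intelligence or to the earnings error term. All coefficients and distributions are invented for the simulation.

```python
import random


def ols_fit(x, y):
    """Return (intercept, slope) from a least squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    slope = num / den
    return y_bar - slope * x_bar, slope


def simulate(n=20_000, seed=2, beta=0.1):
    """Generate (instrument, education, earnings) with unobserved intelligence."""
    rng = random.Random(seed)
    instrument, education, earnings = [], [], []
    for _ in range(n):
        intelligence = rng.gauss(0, 1)   # unobserved by the researcher
        z = rng.gauss(0, 1)              # hypothetical instrument
        educ = 12.0 + 2.0 * z + intelligence + rng.gauss(0, 1)
        earn = 1.0 + beta * educ + 0.5 * intelligence + 0.5 * rng.gauss(0, 1)
        instrument.append(z)
        education.append(educ)
        earnings.append(earn)
    return instrument, education, earnings


beta_true = 0.1  # hypothetical effect of education on earnings (arbitrary units)
z, educ, earn = simulate(beta=beta_true)

# Least squares: education is correlated with the error term through intelligence.
_, beta_ols = ols_fit(educ, earn)

# Stage 1: regress education on the instrument and form fitted values.
a, b = ols_fit(z, educ)
educ_hat = [a + b * zi for zi in z]

# Stage 2: regress earnings on the fitted values of education.
_, beta_2sls = ols_fit(educ_hat, earn)

print(f"least squares estimate:           {beta_ols:.3f}")
print(f"two-stage least squares estimate: {beta_2sls:.3f}")
print(f"true value used in simulation:    {beta_true:.3f}")
```

In this design the least squares estimate is pulled upward by the omitted intelligence term, while the two-stage estimate is close to the value used to generate the data. The result depends entirely on the assumed validity of the instrument, which cannot be checked from the regression alone.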

**SEE ALSO** *Autoregressive Models; Causality; Econometric Decomposition; Identification Problem; Probability; Regression; Regression Analysis; Statistics*

## BIBLIOGRAPHY

Bollerslev, Tim, Robert F. Engle, and Daniel B. Nelson. 1994. ARCH Models. In *Handbook of Econometrics*, Vol. 4, eds. Robert F. Engle and Daniel McFadden, 2961–3031. Amsterdam: North Holland.

De Jong, Robert, and Tiemen Woutersen. 2006. Dynamic Time Series Binary Choice. Working Paper. Baltimore, MD: Johns Hopkins University.

Greene, William H. 2002. *Econometric Analysis*. 5th ed. Upper Saddle River, NJ: Prentice Hall.

Stock, James H., and Mark W. Watson. 2006. *Introduction to Econometrics*. 2nd ed. Boston: Addison-Wesley.

*Tiemen Woutersen*