Simultaneous Equation Bias
Simultaneous equation bias is a fundamental problem in many applications of regression analysis in the social sciences. It arises when a right-hand side (X) variable is not truly exogenous, that is, when it is itself determined by other variables in the system, including the dependent variable. In general, ordinary least squares (OLS) regression applied to a single equation from a system of simultaneous equations will produce biased, that is, systematically wrong, parameter estimates. Furthermore, the bias from OLS does not decrease as the sample size increases. Estimating parameters from a simultaneous equation model requires advanced methods, of which the most popular today is two-stage least squares (2SLS).
Consider the following single-equation regression model:
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
This data generation process (DGP) says that each value (denoted by the i subscript) of the dependent variable, y, is produced by taking β0 and adding β1 times the value of the independent variable, x, and adding a draw from the random error distribution, ε.
To estimate the value of the slope parameter, β1, from a sample of x, y observations, we fit a line using ordinary least squares, so named because coefficients are chosen to minimize the sum of squared residuals. A residual is the vertical distance between the actual and predicted value. The equation of the fitted line is
Predicted y = b0 + b1x.
The slope coefficient from the OLS fitted line, b1, is our estimate of the unknown parameter β1. Because we are dealing with a finite sample, we know that our estimate, b1, is probably not exactly equal to the parameter value, β1. If we generated another sample, we would get another value of b1 for that sample. This shows that the slope coefficient from the OLS fitted line is actually a random variable.
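The idea that b1 is a random variable can be sketched with a small simulation. This is a minimal illustration, not a reproduction of any particular data set: it uses a true slope of 5 and normally distributed errors with a standard deviation of 50, while the intercept of 100, the ten x values, the random seed, and the helper names `ols_fit` and `draw_sample` are all assumptions made for the example.

```python
import numpy as np

def ols_fit(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    x_dev = x - x.mean()
    slope = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

def draw_sample(x, rng, beta0=100.0, beta1=5.0, sigma=50.0):
    """Draw y from the assumed DGP: y = beta0 + beta1 * x + normal error."""
    return beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.shape)

rng = np.random.default_rng(0)
x = np.arange(10.0, 110.0, 10.0)  # ten fixed x values (assumed for illustration)

# Five samples share the same x values but have fresh error draws,
# so each sample yields a different estimated slope b1.
slopes = np.array([ols_fit(x, draw_sample(x, rng))[1] for _ in range(5)])
print(slopes)
```

Each printed value is one realization of b1. None is likely to equal 5 exactly, and rerunning with a different seed gives a different set of estimates.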
Figure 1 provides a concrete example of the abstract ideas underlying OLS. The points in the graph correspond to those in the table. The estimated slope, 4.2, does not equal the true slope, 5, because of the random error term, which in this case is normally distributed with mean zero and a standard deviation of 50. A new sample of ten observations would have the same X values, but the Ys would be different and, thus, the estimated slope from the fitted line would change.
There are other estimators (recipes for fitting the line) besides OLS. The circle in Figure 2 represents all of the possible estimators. The vertical oval contains all of the linear estimators. This does not refer to the fitted line itself, which can have a curved or other nonlinear shape, but to the algorithm for computing the estimator. All of the unbiased estimators are included in the horizontal oval. Unbiasedness is a desirable property referring to the accuracy of an estimator. Unbiased estimators produce estimates that are, on average, equal to the parameter value. Bias means that the estimator is systematically wrong, that is, its expected value does not equal the parameter value. The area where the ovals overlap in Figure 2 is that subset of estimators, including OLS, which are both linear and unbiased.
According to the Gauss-Markov Theorem, when the DGP obeys certain conditions, OLS is the best, linear, unbiased estimator (BLUE). Of all of the linear and unbiased estimators, OLS is the best because it has the smallest variance. In other words, there are other estimators that are linear and unbiased (centered on β1), but they have greater variability than OLS. The goal is unbiased estimators with the highest precision, and the Gauss-Markov Theorem guarantees that OLS fits the bill.
Figure 3 shows histograms for three rival linear estimators for a DGP that conforms to the Gauss-Markov conditions. The histograms reflect the estimates produced by each estimator. Rival 1 is biased. It produces estimates that are systematically too low. Rival 2 and OLS are unbiased because each one is centered on the true parameter value. Although both are accurate, OLS is more precise. In other words, using OLS rather than Rival 2 is more likely to give estimates near the true parameter value. The Gauss-Markov Theorem says that OLS is the most precise estimator in the class of linear, unbiased estimators.
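The comparison between OLS and an unbiased but less precise rival can be sketched with a Monte Carlo experiment. The rival used here, the slope of the line through the first and last observations, is a hypothetical stand-in chosen because it is linear in y and unbiased; it is not necessarily the Rival 2 of Figure 3. The intercept, the x values, the seed, and the number of repetitions are likewise assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 100.0, 5.0, 50.0  # assumed DGP parameters
x = np.arange(10.0, 110.0, 10.0)        # ten fixed x values (assumed)
reps = 10_000

# Each row of y is one sample of ten observations from the DGP.
y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=(reps, x.size))

# OLS slope for every sample at once.
x_dev = x - x.mean()
ols_slopes = (y - y.mean(axis=1, keepdims=True)) @ x_dev / np.sum(x_dev ** 2)

# Hypothetical rival: slope of the line joining the first and last points.
endpoint_slopes = (y[:, -1] - y[:, 0]) / (x[-1] - x[0])

for name, est in (("OLS", ols_slopes), ("Endpoint rival", endpoint_slopes)):
    print(f"{name:>14}: mean = {est.mean():.3f}, SD = {est.std():.3f}")
```

Under these assumptions both estimators should average close to the true slope of 5, but the OLS estimates should be more tightly clustered. The theoretical standard deviations here are about 0.55 for OLS and about 0.79 for the endpoint rival.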
Suppose one faces a simultaneous equation DGP like this:
$y_{1i} = \beta_0 + \beta_1 x_i + \beta_2 y_{2i} + \varepsilon_{1i}$
$y_{2i} = \alpha_0 + \alpha_1 y_{1i} + \varepsilon_{2i}$
There are two dependent (or endogenous) variables, y1 and y2. Each equation has a regressor (a right-hand side variable) that is a dependent variable.
If one is interested in the effect of y1 on y2, can one toss out the first equation and treat the second equation as a single-equation model? In other words, what happens if one ignores the simultaneity and simply runs an OLS regression on an individual equation? One gets simultaneous equation bias. The OLS estimator of α1, the slope parameter in the second equation, will be biased, that is, it will not be centered on α1. With every sample to which one applies the OLS recipe, the resulting estimates will be systematically wrong. OLS is now behaving like the Rival 1 estimator in Figure 3 (although one does not know if the bias centers OLS above or below the true parameter value).
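A short derivation may help show where the bias comes from. It is a sketch under added assumptions that are not part of the model as stated: x is uncorrelated with both error terms, the two error terms are uncorrelated with each other, and α1β2 ≠ 1. The symbols σ_x², σ_1², and σ_2² (the variances of x and of the errors in the first and second equations) and a_1 (the OLS slope from regressing y2 on y1) are notation introduced here. Substituting the second equation into the first and solving for y1 gives the first line; the rest follows from it.

```latex
\begin{align*}
y_{1i} &= \frac{\beta_0 + \beta_2\alpha_0 + \beta_1 x_i + \varepsilon_{1i} + \beta_2\varepsilon_{2i}}{1 - \alpha_1\beta_2}, \\
\operatorname{Cov}(y_{1i}, \varepsilon_{2i}) &= \frac{\beta_2\sigma_2^2}{1 - \alpha_1\beta_2}, \\
\operatorname{Var}(y_{1i}) &= \frac{\beta_1^2\sigma_x^2 + \sigma_1^2 + \beta_2^2\sigma_2^2}{(1 - \alpha_1\beta_2)^2}, \\
\operatorname{plim} a_1 &= \alpha_1 + \frac{\operatorname{Cov}(y_{1i}, \varepsilon_{2i})}{\operatorname{Var}(y_{1i})}
  = \alpha_1 + \frac{\beta_2\sigma_2^2\,(1 - \alpha_1\beta_2)}{\beta_1^2\sigma_x^2 + \sigma_1^2 + \beta_2^2\sigma_2^2}.
\end{align*}
```

The second term is the large-sample bias. It does not involve the sample size, it vanishes only when β2 = 0 (no feedback from y2 to y1) or σ_2² = 0, and its sign depends on the signs of β2 and 1 − α1β2, which is why the direction of the bias is not known in advance.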
Consider the following concrete example. A researcher is interested in estimating the effect of the crime rate (number of crimes per 100,000 people per year) on enforcement spending (dollars per person per year). As the crime rate rises, more police officers and prison guards are needed, so enforcement spending will rise. The researcher is interested in estimating the slope coefficient, β1, in the following model:
$\text{Enforcement Spending}_i = \beta_0 + \beta_1\,\text{Crime Rate}_i + \varepsilon_i$.
Unfortunately, in this situation, as in most social science applications, the real world does not follow a single-equation DGP. Although it is true that government policy makers allocate resources to enforcement spending depending on the crime rate, criminals make decisions based on enforcement spending (and other variables). Increased crime causes more enforcement spending, but more enforcement spending causes less crime. This kind of feedback loop is common in the social sciences. The appropriate model is not a single-equation DGP because the crime rate is not a truly exogenous variable. Instead, the researcher must cope with a simultaneous system of equations where both enforcement spending and crime rate are dependent variables.
If the researcher naively applies OLS to the single equation, her estimate of the effect of crime on enforcement spending, β1, will be biased. Because ignoring the fact that the crime rate is actually a dependent variable with its own DGP equation causes this bias, it is called simultaneous equation (or simultaneity) bias.
The source of the poor performance of the OLS estimator is a violation of the conditions required for the Gauss-Markov Theorem: the crime rate is a right-hand side variable that is not independent of the error term. Suppose that in a given year the error term pushes enforcement spending above the level that the crime rate alone would predict. The extra spending deters crime, so that year's crime rate is lower than it otherwise would have been. Conversely, a year in which the error term pushes enforcement spending down will see more crime. The crime rate therefore moves systematically with the error term. When the error term is correlated with a regressor, OLS breaks down and is no longer an unbiased estimator.
Estimating an equation with dependent variables on the right-hand side requires advanced methods. It is important to recognize that increasing the sample size or adding explanatory variables to the single-equation regression will not solve the problem.
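The claim that a larger sample does not help can be checked with a simulated version of the enforcement spending and crime rate system. Everything specific in this sketch is an assumption made for illustration: the crime equation with coefficients `GAMMA0`, `GAMMA1`, and `GAMMA2`, the exogenous crime driver `z`, the parameter values, and the standard normal errors. The variables are in arbitrary standardized units rather than dollars or crimes per 100,000 people.

```python
import numpy as np

# Assumed (hypothetical) parameter values, chosen only for illustration.
BETA0, BETA1 = 2.0, 0.5                  # spending = BETA0 + BETA1*crime + eps
GAMMA0, GAMMA1, GAMMA2 = 1.0, -0.8, 1.0  # crime = GAMMA0 + GAMMA1*spending + GAMMA2*z + u

def simulate(n, rng):
    """Draw n observations from the assumed two-equation system."""
    z = rng.normal(size=n)    # exogenous driver of crime (hypothetical)
    eps = rng.normal(size=n)  # error in the spending equation
    u = rng.normal(size=n)    # error in the crime equation
    # Solve the two equations jointly for crime (the reduced form).
    crime = (GAMMA0 + GAMMA1 * BETA0 + GAMMA2 * z + u + GAMMA1 * eps) / (
        1.0 - GAMMA1 * BETA1
    )
    spending = BETA0 + BETA1 * crime + eps
    return crime, spending, z

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    x_dev = x - x.mean()
    return np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)

rng = np.random.default_rng(2)
slopes_by_n = {}
for n in (100, 10_000, 1_000_000):
    crime, spending, _ = simulate(n, rng)
    slopes_by_n[n] = ols_slope(crime, spending)
    print(f"n = {n:>9,}: OLS slope = {slopes_by_n[n]:.3f} (true beta1 = {BETA1})")
```

With these assumed values the OLS slope should settle near 0.08 rather than the true 0.5 as n grows. More data makes the estimate more stable, but stable around the wrong number.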
The approach typically taken is called two-stage least squares (2SLS). In the first stage, an OLS regression utilizes truly exogenous variables (called instrumental variables) to create artificial variables. In the second stage, these artificial variables are then used in place of the endogenous, right-hand side variables in each equation in the system.
In the enforcement spending and crime rate example, the researcher would first regress the crime rate on a set of truly exogenous variables to create a Predicted Crime Rate variable. Determining the instruments to be used in the first stage regression is a crucial step in the 2SLS procedure. In the second stage, she would substitute the Predicted Crime Rate for the Crime Rate variable and run OLS. It can be shown that as the sample size increases, the 2SLS estimates concentrate ever more tightly around the true parameter value. Thus, unlike OLS, 2SLS is a consistent estimator of a parameter in a simultaneous equation model.
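The two stages can be sketched by hand on simulated data. As before, the crime equation, the instrument `z`, the parameter values, and the helper names are hypothetical choices made for illustration, with variables in arbitrary standardized units. The instrument is constructed to be valid: it shifts the crime rate but is unrelated to the error in the spending equation.

```python
import numpy as np

# Assumed (hypothetical) parameter values, chosen only for illustration.
BETA0, BETA1 = 2.0, 0.5                  # spending = BETA0 + BETA1*crime + eps
GAMMA0, GAMMA1, GAMMA2 = 1.0, -0.8, 1.0  # crime = GAMMA0 + GAMMA1*spending + GAMMA2*z + u

def simulate(n, rng):
    """Draw n observations from the assumed two-equation system."""
    z, eps, u = rng.normal(size=(3, n))
    crime = (GAMMA0 + GAMMA1 * BETA0 + GAMMA2 * z + u + GAMMA1 * eps) / (
        1.0 - GAMMA1 * BETA1
    )
    spending = BETA0 + BETA1 * crime + eps
    return crime, spending, z

def ols(y, *regressors):
    """OLS coefficients (intercept first) from regressing y on the regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(3)
crime, spending, z = simulate(100_000, rng)

# Naive OLS on the single spending equation.
ols_b1 = ols(spending, crime)[1]

# Stage 1: regress the endogenous crime rate on the exogenous instrument.
a0, a1 = ols(crime, z)
predicted_crime = a0 + a1 * z

# Stage 2: replace the crime rate with its first-stage prediction and run OLS.
tsls_b1 = ols(spending, predicted_crime)[1]

print(f"true beta1 = {BETA1}, OLS = {ols_b1:.3f}, 2SLS = {tsls_b1:.3f}")
```

In this setup the 2SLS slope should land close to 0.5 while the OLS slope stays far below it. The hand-rolled version recovers only the coefficients: the standard errors from an ordinary second-stage regression are not the correct 2SLS standard errors.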
In practice, two separate regressions are not actually run. Modern statistical software packages have an option for 2SLS that performs the calculations, computing appropriate standard errors and other regression statistics, in one step. As a practical matter, even if there are strong theoretical reasons to suspect the presence of simultaneous equation bias, it need not be a particularly large bias.
Attempts to estimate demand curves in the first quarter of the twentieth century led economists to model supply and demand equations as a simultaneous system. This work culminated in the probabilistic revolution in the 1940s. In “The Probability Approach in Econometrics,” Trygve Haavelmo called for explicit description of the data generation process, including the source of variation in the error term and the use of a simultaneous system of equations to model complicated interrelationships among variables.
Haavelmo’s program was supported by Tjalling Koopmans and others at the Cowles Commission, a research think tank housed at the University of Chicago from 1939 to 1955. These econometricians made progress in several key areas, including the identification problem, understanding the nature of simultaneous equation bias, and methods for properly estimating an equation embedded in a simultaneous system. They concentrated their simultaneous equation estimation efforts on full- and limited-information maximum likelihood. Two-stage least squares, a much more efficient computational approach, was not discovered—independently by Henri Theil and Robert Basmann—until the 1950s.
Simultaneous equation bias occurs when an ordinary least squares regression is used to estimate an individual equation that is actually part of a simultaneous system of equations. It is extremely common in social science applications because almost all variables are determined by complex interactions with each other. The bias lies in the estimated coefficients, which are not centered on their true parameter values. Advanced methods, designed to eliminate simultaneous equation bias, use instrumental variables in the first stage of a two-stage least squares procedure.
SEE ALSO General Linear Model; Instrumental Variables Regression; Least Squares, Three-Stage; Least Squares, Two-Stage; Ordinary Least Squares Regression; Regression; Regression Analysis
Christ, Carl F. 1994. The Cowles Commission's Contributions to Econometrics at Chicago, 1939-1955. Journal of Economic Literature 32 (1): 30-59.
Haavelmo, Trygve. 1944. The Probability Approach in Econometrics. In The Foundations of Econometric Analysis, eds. David F. Hendry and Mary S. Morgan, 1995. Cambridge, U.K. and New York: Cambridge University Press.
Morgan, Mary. 1990. The History of Econometric Ideas. Cambridge, U.K. and New York: Cambridge University Press.
Wooldridge, Jeffrey M. 2006. Introductory Econometrics: A Modern Approach. 3rd ed. Mason, OH: Thomson/South-Western.