Simultaneous Equation Estimation
The distinction between partial and general equilibrium analysis in economic theory is well grounded [See ECONOMIC EQUILIBRIUM]. Early work in econometrics paid inadequate attention to this distinction and overlooked for many years the possibilities of improving statistical estimates of individual economic relationships by embedding them in models of the economy as a whole [see ECONOMETRIC MODELS, AGGREGATE]. The earliest studies in econometrics were concerned with estimating parameters of demand functions, supply functions, production functions, cost functions, and similar tools of economic analysis. The principal statistical procedure used was to estimate the α’s in the relation
y_{t} = α_{1}x_{1t} + · · · + α_{n}x_{nt} + u_{t}, t = 1, · · ·, T,
using the criterion that Σ_{t} u_{t}^{2} be minimized. This is the principle of “least squares” applied to a single equation in which y_{t} is chosen as the dependent variable and x_{1t}, · · ·, x_{nt} are chosen as the independent variables. The criterion is the minimization of the sum of squared “disturbances” (u_{t}), which are assumed to be unobserved random errors. The estimation of the unknown parameters α_{i} is based on the sample of T observations of y_{t} and x_{1t}, · · ·, x_{nt}. This is the usual statistical model and estimation procedure that is used in controlled experimental situations where the set of independent variables consists of selected, fixed variates for the experimental readings on y_{t}, the dependent variable. [See LINEAR HYPOTHESES, article on REGRESSION].
However, economics, like most other social sciences, is largely a nonexperimental science, and it is generally not possible to control the values of x_{1t}, · · ·, x_{nt}. The values of the independent variables, like those of the dependent variable, are produced from the general outcome of economic life, and the econometrician is faced with the problem of making statistical inferences from nonexperimental data. This is the basic reason for the use of simultaneous equation methods of estimation in econometrics. In some situations x_{1t}, · · ·, x_{nt} may not be controlled variates, but they may have a one-way causal influence on y_{t}. The main point is that least squares yields desirable results only if u_{t} is independent of x_{1t}, · · ·, x_{nt}, that is, if E(u_{t}x_{it}) = 0 for all i and t.
Properties of estimators. If the x_{it} are fixed variates, estimators of α_{i} obtained by minimizing Σ_{t} u_{t}^{2} are best linear unbiased estimators. They are linear estimators because, as shown below, they are linear functions of y_{t}. An estimator, ά_{i}, of α_{i} is called unbiased if
E(ά_{i}) = α_{i},
i.e., if the expected value of the estimator equals the true value. An estimator is best if among all linear unbiased estimators it has the least variance, i.e., if
E(ά_{i} − α_{i})^{2} ≤ E(α̃_{i} − α_{i})^{2},
where α̃_{i} is any other linear unbiased estimator. Clearly, the properties of being unbiased and best are desirable ones. These properties are defined without reference to sample size. Two related but weaker properties, which are defined for large samples, are consistency and efficiency.
An estimator, ά_{i}, is consistent if plim ά_{i} = α_{i}, that is, if
lim_{T→∞} P(|ά_{i} − α_{i}| < ε) = 1 for every ε > 0.
This states that the probability that ά_{i} deviates from α_{i} by an amount less than any arbitrarily small ε tends to unity as the sample size T tends to infinity.
Consider now the class of all consistent estimators that are normally distributed as T → ∞. An efficient estimator of α_{i} is a consistent estimator whose asymptotic normal distribution has a smaller variance than any other member of this class. [See ESTIMATION].
Inconsistency of least squares. The choice of estimators, ά_{i}, such that Σ_{t} u_{t}^{2} is minimized is formally equivalent to the empirical implementation of the condition that E(u_{t}x_{it}) = 0, since the first-order condition for a minimum is
Σ_{t} (y_{t} − ά_{1}x_{1t} − · · · − ά_{n}x_{nt}) x_{it} = 0, i = 1, · · ·, n.
On the one hand, the α_{i} are estimated so as to minimize the residual sum of squares. On the other hand, they are estimated so that the residuals are uncorrelated with x_{1t}, · · ·, x_{nt}. The possible inconsistency of this method is clearly revealed by the latter criterion, for if it is assumed that the u_{t} are independent of x_{1t}, · · ·, x_{nt} when they actually are not, the estimators will be inconsistent. This is shown by the formula
plim (ά_{i} − α_{i}) = plim Σ_{j} m_{uj}M_{ji} / |M|,
where M is the moment matrix whose typical element is Σ_{t}x_{it}x_{jt}; |M| is the determinant of M; m_{uj} is Σ_{t}u_{t}x_{jt}; and M_{ji} is the j, i cofactor of M. The inconsistency in the estimator is due to the nonvanishing probability limit of m_{uj}. In a nonexperimental sample of data, such as that observed as the joint outcome of the uncontrolled simultaneous economic process, we would expect many or all the x_{it} in a problem to be dependent on u_{t}.
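The inconsistency can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical one-regressor model in which x_{t} is built to depend on u_{t}; the parameter values, seed, and variable names are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000
alpha = 2.0                      # true coefficient (hypothetical value)

z = rng.normal(size=T)           # part of x that is unrelated to u
u = rng.normal(size=T)           # disturbance
x = z + 0.5 * u                  # x depends on u, so E(u x) = 0.5, not 0
y = alpha * x + u

m_xx = x @ x                     # the 1 x 1 moment matrix M
m_ux = u @ x                     # moment between disturbance and regressor
alpha_ols = (x @ y) / m_xx       # single-equation least squares estimate

# Exact sample identity: alpha_ols - alpha = m_ux / m_xx.
bias_identity = m_ux / m_xx
# In large samples the bias approaches E(u x) / E(x^2) = 0.5 / 1.25 = 0.4.
print(alpha_ols, alpha + bias_identity)
```

With one regressor the cofactor formula reduces to m_{ux}/m_{xx}, so the estimate settles near 2.4 rather than 2.0 however large T becomes.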
Identifying restrictions. Since economic models consist of a set of simultaneous equations generating nonexperimental data, the equations of the model must be identified prior to statistical estimation. Unless some restrictions are imposed on specific relationships in a linear system of simultaneous equations, every equation may look alike to the statistician faced with the job of estimating the unknown coefficients. The economist must place a priori restrictions, in advance of statistical estimation, on each of the equations in order to identify them. These restrictions may specify that certain coefficients are known in advance, especially that they are zero, for this is equivalent to excluding the associated variable from an economic relation. Other restrictions may specify linear relationships between the different coefficients. Consider the generalization of a single equation,
βy_{t} + Σ_{i=1}^{m} γ_{i}z_{it} = u_{t},
where E(z_{it}u_{t}) = 0 for all i and t, to a whole system,
Σ_{j=1}^{n} β_{ij}y_{jt} + Σ_{k=1}^{m} γ_{ik}z_{kt} = u_{it}, i = 1, · · ·, n,
where E(z_{kt}u_{it}) = 0 for all k, i, and t. Every variable enters every equation linearly without restriction, and the statistician has no way of distinguishing one relation from another. Zero restrictions, if imposed, would have the form β_{rs} = 0 or γ_{pq} = 0, for some r, s, p, or q. In many equations, we may be interested in specifying that sums or differences of variables are economically relevant combinations, i.e., that β_{rs} = β_{ru} or that β_{rv} = −γ_{rw} or, more generally, that
Σ_{s} w_{s}β_{rs} + Σ_{q} v_{q}γ_{rq} = 0.
The last restriction implies that a homogeneous linear combination of parameters in the rth equation is specified to hold on a priori grounds. The weights w_{s} and v_{q} are known in advance.
If general linear restrictions are imposed on the equations of a linear system, we may state the following rule: an equation in a linear system is identified if it is not possible to reproduce by linear combination of some or all of the equations in the system an equation having the same statistical form as the equation being estimated.
If the restrictions are of the zero type, a necessary condition for identification of an equation in a linear system of n equations is that the number of variables excluded from that equation be greater than or equal to n – 1. A necessary and sufficient condition is that it is possible to form at least one nonvanishing determinant of order n – 1 out of those coefficients, properly arranged, with which the excluded variables appear in the n – 1 other equations (Koopmans et al. 1950).
Criteria for identifiability are stated here for linear equation systems. A more general treatment in nonlinear systems is given by Fisher (1966). [See STATISTICAL IDENTIFIABILITY]
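The two conditions for zero-type restrictions can be checked mechanically. This is a rough sketch, assuming the structure is supplied as a hypothetical array A = (B Γ) in which an exact zero marks an excluded variable; the function name and the example coefficients are illustrative only.

```python
import numpy as np

def identification(A, i):
    """Order and rank conditions for equation i of the structure A = (B, Gamma).

    Assumes a coefficient equal to zero means the variable is excluded.
    """
    n = A.shape[0]
    excluded = np.flatnonzero(A[i] == 0)
    # Order (necessary) condition: at least n - 1 excluded variables.
    order_ok = excluded.size >= n - 1
    # Rank (necessary and sufficient) condition: the coefficients of the
    # excluded variables in the other n - 1 equations have rank n - 1.
    others = np.delete(A, i, axis=0)[:, excluded]
    rank_ok = bool(excluded.size > 0 and np.linalg.matrix_rank(others) == n - 1)
    return order_ok, rank_ok

# Columns: two endogenous variables, then two exogenous variables.
A_identified = np.array([[1.0, 0.5, -0.3, 0.0],
                         [1.0, -0.7, 0.0, -0.4]])
# Here the fourth variable is absent from BOTH equations.
A_unidentified = np.array([[1.0, 0.5, -0.3, 0.0],
                           [1.0, -0.7, -0.2, 0.0]])

print(identification(A_identified, 0))    # order and rank both satisfied
print(identification(A_unidentified, 0))  # order satisfied, rank fails
```

The second structure shows why the counting rule is only necessary: one variable is excluded from the first equation, but it does not appear in the other equation either, so no nonvanishing determinant of order n – 1 can be formed.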
Alternative estimation methods
Assuming that we are dealing with an identified system, let us turn to the problems of estimation. In the system of equations above, the y_{it} are endogenous or dependent variables and are equal in number to the number of equations in the system, n. The z_{kt} are exogenous variables and are assumed to be independent of the disturbances, u_{it}.
In one of the basic early papers in simultaneous equation estimation (Mann & Wald 1943), it was shown that large-sample theory would, under fairly general conditions, permit lagged values of endogenous variables to be treated like purely exogenous variables as far as consistency in estimation is concerned. Exogenous and lagged endogenous variables are called predetermined variables.
Early econometric studies, for example, that of Tinbergen (1939), were concerned with the estimation of a number of individual relationships in which the possible dependence between variables and disturbances was ignored. These studies stimulated Haavelmo (1943) to analyze the consistency problem, for he noted that the Tinbergen model contained many single-equation least squares estimates of equations that were interrelated in the system Tinbergen was constructing, which was intended to be a theoretical framework describing the economy that generated the observations used.
The lack of independence between disturbances and variables can readily be demonstrated. Consider the two-equation system
β_{11}y_{1t} + β_{12}y_{2t} + Σ_{k} γ_{1k}z_{kt} = u_{1t},
β_{21}y_{1t} + β_{22}y_{2t} + Σ_{k} γ_{2k}z_{kt} = u_{2t}.
The z_{kt} are by assumption independent of u_{1t} and u_{2t}. Some of the γ’s are specified to be zero or are otherwise restricted so that the two equations are identified. Suppose we wish to estimate the first equation. To apply least squares to this equation, we would have to select either y_{1} or y_{2} as the dependent variable. Suppose we select y_{1} and set β_{11} equal to unity. We would then compute the least squares regression of y_{1} on y_{2} and the z_{k} according to the relation
y_{1t} = −β_{12}y_{2t} − Σ_{k} γ_{1k}z_{kt} + u_{1t},
which incorporates all the identifying restrictions on the γ’s.
For this procedure to yield consistent estimators, y_{2t} must be independent of u_{1t}. The question is whether the existence of the second equation has any bearing on the independence of y_{2t} and u_{1t}. Multiplying the second equation by u_{1t} and forming expectations, we have
β_{21}E(y_{1t}u_{1t}) + β_{22}E(y_{2t}u_{1t}) = E(u_{1t}u_{2t}).
From the first equation (with β_{11} = 1), we have
E(y_{1t}u_{1t}) + β_{12}E(y_{2t}u_{1t}) = E(u_{1t}^{2}).
Combining these two expressions, we obtain
E(y_{2t}u_{1t}) = [E(u_{1t}u_{2t}) − β_{21}E(u_{1t}^{2})] / (β_{22} − β_{21}β_{12}).
In general, this expression does not vanish, and we find that y_{2t} and u_{1t} are not independent.
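The nonvanishing moment can be checked by simulation. The sketch below assumes hypothetical values for B, Γ, and the disturbance covariance matrix (named S in the code), with β_{11} = β_{22} = 1; none of these numbers come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 400_000
B = np.array([[1.0, 0.5], [-0.4, 1.0]])    # beta_ij, hypothetical
G = np.array([[0.8, 0.0], [0.0, -1.2]])    # gamma_ik, with zero restrictions
S = np.array([[1.0, 0.3], [0.3, 2.0]])     # covariance matrix of (u1, u2)

z = rng.normal(size=(T, 2))
u = rng.multivariate_normal(np.zeros(2), S, size=T)

# Structural form: B y_t' + G z_t' = u_t', hence y_t' = B^{-1}(u_t' - G z_t').
y = np.linalg.solve(B, (u - z @ G.T).T).T

sample_moment = np.mean(y[:, 1] * u[:, 0])     # estimate of E(y2 u1)
theory = (S[0, 1] - B[1, 0] * S[0, 0]) / (B[1, 1] - B[1, 0] * B[0, 1])
print(sample_moment, theory)
```

For these values the expression equals 0.7/1.2, so y_{2t} is correlated with u_{1t} even though every z_{kt} is independent of both disturbances.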
The maximum likelihood method . The maximum likelihood method plays a normative role in the estimation of economic relationships, much like that played by perfect competition in economic theory. This method provides consistent and efficient estimators under fairly general conditions. It rests on specific assumptions, and it may be hard to realize all these assumptions in practice or, indeed, to make all the difficult calculations required for solution of the estimation equations.
For the single-equation model, the maximum likelihood method is immediately seen to be equivalent to ordinary least squares estimation for normally distributed disturbances. Let us suppose that u_{1}, · · ·, u_{T} are T independent, normally distributed variables with zero mean and common variance σ^{2}. The T-element sample has the probability density function
p(u_{1}, · · ·, u_{T}) = (2πσ^{2})^{−T/2} exp[−(1/2σ^{2}) Σ_{t} u_{t}^{2}].
By substitution we can transform this joint density of u_{1}, · · ·, u_{T} into a joint density of y_{1}, · · ·, y_{T}, given x_{11}, · · ·, x_{n1}, x_{12}, · · ·, x_{n2}, · · ·, x_{nT}, namely,
(2πσ^{2})^{−T/2} exp[−(1/2σ^{2}) Σ_{t} (y_{t} − Σ_{i} α_{i}x_{it})^{2}].
This function will be denoted as L, the likelihood function of the sample, and is seen to depend on the unknown parameters α_{1}, · · ·, α_{n} and σ. We maximize this function by imposing the following conditions:
∂ log L/∂α_{i} = (1/σ^{2}) Σ_{t} (y_{t} − Σ_{j} α_{j}x_{jt}) x_{it} = 0, i = 1, · · ·, n,
∂ log L/∂σ = −T/σ + (1/σ^{3}) Σ_{t} (y_{t} − Σ_{j} α_{j}x_{jt})^{2} = 0.
These are recognized as the “normal” equations of single-equation least squares theory and the estimation equation for the residual variance, apart from adjustment for degrees of freedom used in estimating σ^{2}.
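The equivalence can be seen numerically. The following sketch assumes simulated data with hypothetical coefficients; `log_lik` is an illustrative helper that evaluates the normal log-likelihood, and the two variance estimates differ only in the divisor (T versus T − n).

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 500, 3
X = rng.normal(size=(T, n))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.5, size=T)

def log_lik(a, sigma2):
    """Normal log-likelihood of the sample for coefficients a and variance sigma2."""
    r = y - X @ a
    return -0.5 * T * np.log(2 * np.pi * sigma2) - (r @ r) / (2 * sigma2)

# Least squares solves the normal equations X'(y - X a) = 0.
a_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ a_hat

sigma2_ml = resid @ resid / T          # maximum likelihood estimate of the variance
sigma2_df = resid @ resid / (T - n)    # the same sum adjusted for degrees of freedom

# Moving away from (a_hat, sigma2_ml) in any direction lowers the likelihood.
print(log_lik(a_hat, sigma2_ml), log_lik(a_hat + 0.05, sigma2_ml))
```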
In a system of simultaneous equations, we wish to estimate the parameters in
Σ_{j=1}^{n} β_{ij}y_{jt} + Σ_{k=1}^{m} γ_{ik}z_{kt} = u_{it}, i = 1, · · ·, n.
Here we have n linear simultaneous equations in n endogenous and m exogenous variables. The parameters to be estimated are the elements of the n × n coefficient matrix
B = (β_{ij}),
the n × m coefficient matrix
Γ = (γ_{ik}),
and the n × n variance-covariance matrix
Σ = (σ_{ij}).
The variances and covariances are defined by
σ_{ij} = E(u_{it}u_{jt}).
A rule of normalization is applied to each equation. In practice, one element of β_{i} = (β_{i1}, · · ·, β_{in}) in each equation is singled out and assigned a value of unity.
The likelihood function for the whole system is
L = (mod |B|)^{T} (2π)^{−nT/2} |Σ|^{−T/2} exp[−(1/2) Σ_{t} (By_{t}′ + Γz_{t}′)′ Σ^{−1} (By_{t}′ + Γz_{t}′)],
where y_{t} = (y_{1t}, · · ·, y_{nt}), z_{t} = (z_{1t}, · · ·, z_{mt}), |Σ| is the determinant of Σ, and mod |B| is the absolute value of the determinant of B (Koopmans et al. 1950). The matrix B enters this expression as the Jacobian of the transformation from the variables u_{1t}, · · ·, u_{nt} to y_{1t}, · · ·, y_{nt}. The problem of maximum likelihood estimation is to maximize L or log L with respect to the elements of B, Γ, and Σ. This is especially difficult compared with the similar problem for single equations shown above, because log L is highly nonlinear in the unknown parameters, a difficult source of nonlinearity coming from the Jacobian expression mod |B|.
Maximizing log L with respect to Σ^{−1}, we obtain the maximum likelihood estimator of Σ, which is
Σ̂ = (B Γ) M (B Γ)′,
where M is the moment matrix of the observations, i.e.,
M = (1/T) Σ_{t} (y_{t} z_{t})′(y_{t} z_{t}).
Substitution of Σ̂ into the likelihood function yields the concentrated form of the likelihood function
log L = Const. + T log mod |B| − (T/2) log |Σ̂|,
where Const. is a constant. Hence, we seek estimators of B and Γ that maximize
log mod |B| − (1/2) log |(B Γ) M (B Γ)′|.
In the single-equation case we minimize the one-element variance expression, written as a function of the α_{i}. In the simultaneous equation case, we maximize log mod |B| − (1/2) log |Σ̂|, but this can be shown to be equivalent (Chow 1964) to minimization of |Σ̂|, subject to the normalization rule
|B| = C,
where C is a constant. This normalization is direction normalization, and as long as it is taken into account, scale normalization (such as β_{ii} = 1, cited previously) is arbitrary. Viewed in this way, the method of maximum likelihood applied to a system of equations appears to be a natural generalization of the method of maximum likelihood applied to a single equation, in which case we minimize σ^{2} subject to a direction-normalization rule.
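The concentrated likelihood is straightforward to evaluate for trial values of B and Γ. The sketch below is only an evaluation routine, not a maximizer; the function name, the arbitrary test data, and the trial coefficients are assumptions made for illustration. It also shows that rescaling any equation leaves the value unchanged, which is why the scale normalization is arbitrary.

```python
import numpy as np

def concentrated_loglik(B, G, Y, Z):
    """T*log mod|B| - (T/2)*log|Sigma_hat|, omitting the additive constant.

    Y is T x n (endogenous), Z is T x m (exogenous); B is n x n, G is n x m.
    """
    T = Y.shape[0]
    X = np.hstack([Y, Z])
    M = X.T @ X / T                       # moment matrix of the observations
    A = np.hstack([B, G])                 # the n x (n + m) matrix (B, Gamma)
    Sigma_hat = A @ M @ A.T
    _, logabsdet_B = np.linalg.slogdet(B)
    _, logdet_S = np.linalg.slogdet(Sigma_hat)
    return T * logabsdet_B - 0.5 * T * logdet_S

rng = np.random.default_rng(3)
Y = rng.normal(size=(200, 2))
Z = rng.normal(size=(200, 3))
B = np.array([[1.0, 0.5], [-0.4, 1.0]])
G = np.array([[0.8, 0.0, 0.3], [0.0, -1.2, 0.0]])

ll = concentrated_loglik(B, G, Y, Z)
D = np.diag([3.0, -0.5])                  # rescale each equation separately
ll_rescaled = concentrated_loglik(D @ B, D @ G, Y, Z)
print(ll, ll_rescaled)
```

In practice a numerical optimizer would be applied to the free elements of B and Γ; that step is left out here.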
Recursive systems. The concentrated form of the likelihood function shows clearly that a new element is introduced into the estimation process, through the presence of the Jacobian determinant, which makes calculations of the maximizing values of B and Γ highly nonlinear. It is therefore worthwhile to search for special situations in which estimation methods simplify at least to the point of being based on linear calculations.
It is evident that the concentrated form of the likelihood function would lend itself to simpler methods of estimating B and Γ if |B| were a known constant. This would be the case if B were triangular, for then, by a scale normalization, we would have β_{ii} = 1 and |B| = 1. If B is triangular, the system of equations is called a recursive system. We then simply minimize |Σ| with respect to the unknown coefficients; this can be looked upon as a generalized variance minimization, an obvious analogue of least squares applied to a single equation.
If, in addition, it can be assumed that Σ is diagonal, maximum likelihood estimators become a series of successive single-equation least squares estimators. Since the matrix B is assumed to be triangular, there must be an equation with only one unlagged endogenous variable. This variable (with unit coefficient) is to be regressed on all the predetermined variables in that equation. Next, there will be an equation with one new endogenous variable. This variable is regressed on the preceding endogenous variable and all the predetermined variables in that equation. In the third equation, another new endogenous variable is introduced. It is regressed on the two preceding endogenous variables and all the predetermined variables in that equation, and so on.
If Σ is not diagonal, a statistically consistent procedure would be to use values of the endogenous variables computed from preceding equations in the triangular array instead of using their actual values. Suppose one equation in the system specifies y_{1} as a function of certain z’s. We would regress y_{1} on these z’s and then compute values of y_{1} from the relation
ŷ_{1t} = −Σ_{k} γ̂_{1k}z_{kt},
where the γ̂_{1k} are the least squares regression estimators of the γ_{1k}. (Some of the γ̂_{1k} are zero, as a result of the identifying restrictions imposed prior to computing the regression.) Suppose a second equation in the system specifies y_{2} as a function of y_{1} and certain z’s. Our next step would be to regress y_{2} on ŷ_{1} and the included z’s and then compute values of y_{2} from the relation
ŷ_{2t} = −β̂_{21}ŷ_{1t} − Σ_{k} γ̂_{2k}z_{kt}.
The procedure would be continued until all n equations are estimated.
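A small simulation can illustrate the recursive procedure when Σ is not diagonal. This sketch assumes a hypothetical two-equation triangular system written in solved (regression) form, so the coefficients shown are the negatives of the structural β and γ; all values and names are illustrative.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(4)
T = 100_000
z1, z2 = rng.normal(size=T), rng.normal(size=T)
# Disturbances correlated across equations: Sigma is not diagonal.
u = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=T)

# Triangular (recursive) structure, hypothetical coefficients:
y1 = 1.0 * z1 + u[:, 0]
y2 = 0.5 * y1 + 2.0 * z2 + u[:, 1]

# First equation: regress y1 on its z and form computed values.
y1_hat = z1 * ols(z1[:, None], y1)[0]

# Second equation: actual y1 versus computed y1 as the regressor.
b_direct = ols(np.column_stack([y1, z2]), y2)
b_recursive = ols(np.column_stack([y1_hat, z2]), y2)
print(b_direct, b_recursive)
```

With these values the regression on actual y_{1} converges to about 0.9 for the coefficient whose true value is 0.5, while the regression on computed y_{1} recovers 0.5.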
Methods of dealing with recursive systems have been studied extensively by Wold, and a summary appears in Strotz and Wold (1960). A recursive system without a diagonal Σ matrix is found in Barger and Klein (1954). One of the most familiar types of recursive systems studied in econometrics is the cobweb model of demand and supply for agricultural products [See BUSINESS CYCLES, article on MATHEMATICAL MODELS].
Limited-information maximum likelihood. Another maximum likelihood approach that is widely used is the limited-information maximum likelihood method. It does not hinge on a specific formulation of the model, as do methods for recursive systems; it is a simplified method because it neglects information. As we have seen, identifying restrictions for an equation take the form of specifying zero values for some parameters or of imposing certain linear relations on some parameters. The term “limited information” refers to the fact that only the restrictions relating to the particular equation (or subset of equations) being estimated are used. Restrictions on other equations in the system are ignored when a particular equation is being estimated.
Let us again consider the linear system
Σ_{j=1}^{n} β_{ij}y_{jt} + Σ_{k=1}^{m} γ_{ik}z_{kt} = u_{it}, i = 1, · · ·, n.
These equations make up the structural form of the system and are referred to as structural equations. We denote the reduced form of this system by
y_{it} = Σ_{k=1}^{m} π_{ik}z_{kt} + v_{it}, i = 1, · · ·, n.
From the reduced form equations select a subset corresponding to the n_{i} endogenous variables in a particular structural equation, say equation i, which is
Σ_{j=1}^{n_{i}} β_{ij}y_{jt} + Σ_{k=1}^{m_{i}} γ_{ik}z_{kt} = u_{it}.
The summation limit m_{i} indicates the number of predetermined variables included in this equation; we have excluded all zero elements in γ_{i} and indexed the z’s accordingly. Form the joint distribution of v_{1t}, · · ·, v_{n_{i}t} over the sample observations and maximize it with respect to the unknown parameters in the ith structural equation, subject to the restrictions on this equation alone. The restrictions usually take the form
Σ_{j=1}^{n_{i}} β_{ij}π_{jk} = 0, k = m_{i} + 1, · · ·, m,
where there are m predetermined variables in the whole system; that is, the γ_{ik}, k = m_{i} + 1, · · ·, m, are specified to be zero. The estimated coefficients, β̂_{ij}, γ̂_{ik}, and σ̂_{i}^{2}, obtained from this restricted likelihood maximization are the limited-information estimators. Methods of obtaining these estimators and a study of their properties are given in Anderson and Rubin (1949).
Linear regression calculations are all that are needed in this type of estimation, save for the extraction of a characteristic root of a matrix with dimensionality n_{i} × n_{i}. A quickly convergent series of iterations involving matrix multiplication leads to the computation of this root and associated vector. The vector obtained, properly normalized by making one coefficient unity, provides estimates of the β_{ij}. The estimates of the γ_{ik} are obtained from
γ̂_{ik} = −Σ_{j=1}^{n_{i}} β̂_{ij}π̂_{jk}, k = 1, · · ·, m_{i},
where the π̂_{jk} are least squares regression coefficients from the reduced form equations.
It is significant that both full-information and limited-information maximum likelihood estimators are essentially unchanged no matter which variable is selected to have a unit coefficient in each equation. That is to say, if we divide through an estimated equation by the coefficient of any endogenous variable, we get a set of coefficients that would have been obtained by applying the estimation methods under the specification that the same variable have the unit coefficient. Full-information and limited-information maximum likelihood estimators are invariant under this type of scale normalization. Other estimators are not.
Two-stage least squares. The classical method of least squares multiple regression applied to a single equation that is part of a larger simultaneous system is inconsistent by virtue of the fact that some of the “explanatory” variables in the regression (the variables with unknown coefficients) may not be independent of the error variable. If we can “purify” such variables to make them independent of the error terms, we can apply ordinary least squares methods to the transformed variables. The method of two-stage least squares does this for us.
Let us return to the equation estimated above by limited information. Choose y_{1t}, say, as the dependent variable, that is, set β_{i1} equal to unity. In place of y_{2t}, · · ·, y_{n_{i}t}, we shall use
ŷ_{jt} = Σ_{k=1}^{m} π̂_{jk}z_{kt}, j = 2, · · ·, n_{i},
as explanatory variables. The ŷ_{jt} are computed values from the least squares regressions of y_{j} on all the z_{k} in the system (k = 1, · · ·, m). The coefficients π̂_{jk} are the computed regression coefficients. The regression of y_{1t} on ŷ_{2t}, · · ·, ŷ_{n_{i}t}, z_{1t}, · · ·, z_{m_{i}t} provides a two-stage least squares estimator of the single equation. All the equations of a system may be estimated in this way. This can be seen to be a generalization, to systems with nontriangular Jacobians, of the method suggested previously for recursive models in which the variance-covariance matrix of disturbances is not diagonal.
We may write the “normal” equations for these least squares estimators as
(Σ_{t} ŷ_{t}′ŷ_{t}) b + (Σ_{t} ŷ_{t}′z_{t}^{*}) c = −Σ_{t} ŷ_{t}′y_{1t},
(Σ_{t} z_{t}^{*}′ŷ_{t}) b + (Σ_{t} z_{t}^{*}′z_{t}^{*}) c = −Σ_{t} z_{t}^{*}′y_{1t}.
In this notation ŷ_{t} is the vector of computed values (ŷ_{2t}, · · ·, ŷ_{n_{i}t}); z_{t}^{*} is the vector (z_{1t}, · · ·, z_{m_{i}t});
b is the estimator of the vector (β_{i2}, · · ·, β_{in_{i}}); and c is the estimator of the vector (γ_{i1}, · · ·, γ_{im_{i}}). It should be noted that, with y_{t} = (y_{2t}, · · ·, y_{n_{i}t}),
Σ_{t} ŷ_{t}′z_{t}^{*} = Σ_{t} y_{t}′z_{t}^{*}.
It should be further observed that
Σ_{t} ŷ_{t}′ŷ_{t} = (Σ_{t} y_{t}′z_{t})(Σ_{t} z_{t}′z_{t})^{−1}(Σ_{t} z_{t}′y_{t}).
In this expression the whole vector z_{t} = (z_{1t}, · · ·, z_{mt}), which includes all the predetermined variables in the system, is used for the evaluation of the relevant moment matrices.
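The two stages can be written out directly. The sketch below assumes a hypothetical overidentified two-equation model written in solved (regression) form, so the estimated coefficients are the negatives of the structural β and γ; the function names and data are illustrative. It also checks the two moment identities for the computed values.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_ls(y1, Y, Z_incl, Z_all):
    """Stage 1: regress Y on all z's. Stage 2: regress y1 on (Y_hat, included z's)."""
    Y_hat = Z_all @ ols(Z_all, Y)
    coef = ols(np.column_stack([Y_hat, Z_incl]), y1)
    return coef, Y_hat

rng = np.random.default_rng(5)
T = 50_000
Z = rng.normal(size=(T, 3))
U = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=T)

# Hypothetical structure in solved form:
#   y1 =  0.5*y2 + 1.0*z1 + u1
#   y2 = -0.4*y1 + 1.0*z2 + 1.0*z3 + u2
A = np.array([[1.0, -0.5], [0.4, 1.0]])
rhs = np.column_stack([Z[:, 0] + U[:, 0], Z[:, 1] + Z[:, 2] + U[:, 1]])
Y_all = np.linalg.solve(A, rhs.T).T
y1, Y2, Z1 = Y_all[:, 0], Y_all[:, [1]], Z[:, [0]]

coef_2sls, Y2_hat = two_stage_ls(y1, Y2, Z1, Z)
coef_ols = ols(np.column_stack([Y2, Z1]), y1)   # inconsistent, for comparison
print(coef_2sls, coef_ols)
```

The first equation excludes two z’s while containing only one other endogenous variable, so it is overidentified; two-stage least squares uses both excluded z’s through the computed values of y_{2}.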
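k-class estimators. Theil (1958) and Basmann (1957), independently, were the first to advocate the method of two-stage least squares. Theil suggested a whole system of estimators, called the k-class. He defined these as the solutions to
[Σ_{t} (y_{t}′y_{t} − k v̂_{t}′v̂_{t})] b + (Σ_{t} y_{t}′z_{t}^{*}) c = −Σ_{t} (y_{t} − k v̂_{t})′y_{1t},
(Σ_{t} z_{t}^{*}′y_{t}) b + (Σ_{t} z_{t}^{*}′z_{t}^{*}) c = −Σ_{t} z_{t}^{*}′y_{1t},
with y_{t} = (y_{2t}, · · ·, y_{n_{i}t}) and z_{t}^{*} = (z_{1t}, · · ·, z_{m_{i}t}) the variables included in the equation. In this expression v̂_{t} is the vector of residuals computed from the reduced form regressions of y_{2t}, · · ·, y_{n_{i}t} on all the z_{kt}. If k = 0, we have ordinary least squares estimators. If k = 1, we have two-stage least squares estimators. If k = 1 + λ and λ is the smallest root of the determinantal equation
|W^{*} − (1 + λ)W| = 0,
where W is the moment matrix of residuals from the regressions of y_{1t}, · · ·, y_{n_{i}t} on all the z_{kt} and W^{*} is the corresponding matrix from the regressions on z_{1t}, · · ·, z_{m_{i}t} alone,
we have limited-information maximum likelihood estimators. This is a succinct way of showing the relationships between various single-equation methods of estimation. Of these three members of the k-class, ordinary least squares is not consistent; the other two are.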
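The k-class family can be coded as one routine. This is a sketch under the same assumptions as a solved (regression) form, so signs are reversed relative to the structural coefficients; `k_class`, `liml_k`, and the simulated model are hypothetical names and values. The limited-information value of k is taken as the smallest characteristic root of W^{−1}W^{*}.

```python
import numpy as np

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def resid(X, Y):
    """Residuals from least squares regressions of the columns of Y on X."""
    return Y - X @ ols(X, Y)

def k_class(y1, Y, Z_incl, Z_all, k):
    V = resid(Z_all, Y)                     # reduced form residuals
    lhs = np.block([[Y.T @ Y - k * (V.T @ V), Y.T @ Z_incl],
                    [Z_incl.T @ Y, Z_incl.T @ Z_incl]])
    rhs = np.concatenate([(Y - k * V).T @ y1, Z_incl.T @ y1])
    return np.linalg.solve(lhs, rhs)

def liml_k(y1, Y, Z_incl, Z_all):
    Y0 = np.column_stack([y1, Y])
    E_incl, E_all = resid(Z_incl, Y0), resid(Z_all, Y0)
    W_star, W = E_incl.T @ E_incl, E_all.T @ E_all
    # Roots of |W* - k W| = 0 are the eigenvalues of W^{-1} W*.
    return np.min(np.linalg.eigvals(np.linalg.solve(W, W_star)).real)

rng = np.random.default_rng(6)
T = 50_000
Z = rng.normal(size=(T, 3))
U = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=T)
# Hypothetical structure: y1 = 0.5*y2 + z1 + u1;  y2 = -0.4*y1 + z2 + z3 + u2
A = np.array([[1.0, -0.5], [0.4, 1.0]])
rhs = np.column_stack([Z[:, 0] + U[:, 0], Z[:, 1] + Z[:, 2] + U[:, 1]])
Y_all = np.linalg.solve(A, rhs.T).T
y1, Y2, Z1 = Y_all[:, 0], Y_all[:, [1]], Z[:, [0]]

b_ols = k_class(y1, Y2, Z1, Z, 0.0)
b_2sls = k_class(y1, Y2, Z1, Z, 1.0)
k_liml = liml_k(y1, Y2, Z1, Z)
b_liml = k_class(y1, Y2, Z1, Z, k_liml)
print(b_ols, b_2sls, k_liml, b_liml)
```

Setting k = 0 reproduces the direct regression, and k = 1 reproduces the regression on computed values, because the moment matrix of the computed values equals that of the actual values minus that of the reduced form residuals.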
Three-stage least squares. Other members of the k-class could be cited for special discussion, but a more fruitful generalization of Theil’s approach to estimation theory lies in the direction of what he and Zellner (see Zellner & Theil 1962) call three-stage least squares. Chow (1964) has shown that three-stage least squares estimators are better viewed as simultaneous two-stage least squares estimators. Let us denote û_{t} as the vector of residuals associated with each equation in a system that has had elements of y_{t} replaced by ŷ_{it}, that is,
û_{it} = y_{it} + Σ′_{j} β_{ij}ŷ_{jt} + Σ_{k} γ_{ik}z_{kt}, i = 1, · · ·, n,
where û_{t} = (û_{1t}, · · ·, û_{nt}).
In this formulation Σ′ means summation over all values of j except j = i, and it is assumed that the ith endogenous variable in the ith equation has a unit coefficient. Some elements of β_{ij} and γ_{ik} are zero or otherwise restricted for identification. Define
Σ̂ = (1/T) Σ_{t} û_{t}′û_{t}.
Minimization of |Σ̂| with respect to β_{ij} and γ_{ik} yields the estimators sought. Thus, we have an extension of the principle of least squares in which the generalized variance is minimized. This is like the principle of maximum likelihood, which also minimizes |Σ̂|, expressed in terms of the β_{ij} and γ_{ik}, but the direction normalization is different.
Theil and Zellner termed their method three-stage least squares because they first derived two-stage least squares estimators for each single equation in the system. They computed the residual variance for each equation and used these as estimators of the variances of the true (unobserved) random disturbances. They then used Aitken’s generalized method of least squares (1935) to estimate all the equations in the system simultaneously. Aitken’s method applies to systems of equations in which the variance-covariance matrix for disturbances is a general known positive definite matrix. Theil and Zellner used the two-stage estimator of this variance-covariance matrix as though it were known. The advantage of this method is that it is of the full-information variety, making use of restrictions on all the equations of the system.
Other methods. If the conditions for identification of a single equation are such that there are just enough restrictions to transform linearly and uniquely the reduced form coefficients into the structural coefficients, an indirect least squares method of estimation can be used. Exact identification under zero-type restrictions would enable one to solve
Σ_{j=1}^{n_{i}} β̂_{ij}π̂_{jk} = 0, k = m_{i} + 1, · · ·, m,
for a unique set of estimated β_{ij}, apart from scale normalization, given a set of estimated π_{jk}. The latter would be determined from least squares estimators of the reduced forms. Since there are n_{i} – 1 of the β_{ij} to be determined, the necessary condition for exact identification here is that n_{i} – 1 = m – m_{i}.
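Indirect least squares can be sketched as follows, assuming a hypothetical exactly identified two-equation system in the structural convention By + Γz = u with β_{i1} = 1; the function name, index lists, and parameter values are illustrative. The included γ’s are recovered from the reduced form as γ_{ik} = −Σ_{j} β_{ij}π_{jk}.

```python
import numpy as np

def indirect_ls(Pi, included, excluded):
    """Solve sum_j beta_j * pi_jk = 0 over excluded k, with beta_1 = 1.

    Pi holds the reduced form rows for the endogenous variables in the
    equation; requires exact identification (square system).
    """
    beta_rest = np.linalg.solve(Pi[1:, excluded].T, -Pi[0, excluded])
    beta = np.concatenate([[1.0], beta_rest])
    gamma = -beta @ Pi[:, included]       # coefficients of the included z's
    return beta, gamma

rng = np.random.default_rng(7)
T = 200_000
B = np.array([[1.0, 0.5], [-0.4, 1.0]])   # hypothetical beta_ij
G = np.array([[0.8, 0.0], [0.0, -1.2]])   # equation 1 excludes z2
Z = rng.normal(size=(T, 2))
U = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 2.0]], size=T)
Y = np.linalg.solve(B, (U - Z @ G.T).T).T

# Unrestricted reduced form by least squares: y_t' = Pi z_t' + v_t'.
Pi_hat = np.linalg.lstsq(Z, Y, rcond=None)[0].T

beta, gamma = indirect_ls(Pi_hat, included=[0], excluded=[1])
print(beta, gamma)    # estimates of (1, beta_12) and gamma_11
```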
If there is underidentification, i.e., too few a priori restrictions, structural estimation cannot be completed but unrestricted reduced forms can be estimated by the method of least squares. This is the most information that the econometrician can extract when there is lack of identification. Least squares estimators of the reduced form equations are consistent in the underidentified case, but estimates of the structural parameters cannot be made.
Instrumental variables. The early discussion of estimation problems in simultaneous equation models contained, on many occasions, applications of a method known as the method of instrumental variables. In estimating the ith equation of a linear system, i.e.,
Σ_{j=1}^{n_{i}} β_{ij}y_{jt} + Σ_{k=1}^{m_{i}} γ_{ik}z_{kt} = u_{it},
we may choose (n_{i} – 1) + m_{i} variables that are independent of u_{it}. These are known as the instrumental set. Naturally, the exogenous variables in the equation (z_{1t}, · · ·, z_{m_{i}t}) are possible members of this set. In addition, we need n_{i} – 1 more instruments from the list of exogenous variables in the system but not in the ith equation. For this problem let these be denoted as x_{2t}, · · ·, x_{n_{i}t}. Since E(z_{st}u_{it}) = 0, s = 1, · · ·, m_{i}, and E(x_{rt}u_{it}) = 0, r = 2, · · ·, n_{i}, we can estimate the unknown parameters from
Σ_{t} z_{st}(Σ_{j} β̂_{ij}y_{jt} + Σ_{k} γ̂_{ik}z_{kt}) = 0, s = 1, · · ·, m_{i},
Σ_{t} x_{rt}(Σ_{j} β̂_{ij}y_{jt} + Σ_{k} γ̂_{ik}z_{kt}) = 0, r = 2, · · ·, n_{i}.
With a scale-normalization rule, such as β_{i1} = 1, we have (n_{i} – 1) + m_{i} linear equations in the same number of unknown coefficients. In exactly identified models there is no problem in picking the x_{rt}, for there will always be exactly n_{i} – 1 z’s excluded from the ith equation. The method is then identical with indirect least squares. If m – m_{i} > n_{i} – 1, i.e., if there are more exogenous variables outside the ith equation than there are endogenous variables minus one, we have overidentification, and the number of possible instrumental variables exceeds the minimum needed. In order to avoid the problem of subjective or arbitrary choice among instruments, we turn to the methods of limited information or two-stage least squares. In fact, it is instructive to consider how the method of two-stage least squares resolves this matter. In place of single variables as instruments, it uses linear combinations of them. The computed values
ŷ_{jt} = Σ_{k=1}^{m} π̂_{jk}z_{kt}, j = 2, · · ·, n_{i},
are the new instruments. We can view the method either as the regression of y_{1t} on ŷ_{2t}, · · ·, ŷ_{n_{i}t}, z_{1t}, · · ·, z_{m_{i}t} or as instrumental-variable estimators with ŷ_{2t}, · · ·, ŷ_{n_{i}t}, z_{1t}, · · ·, z_{m_{i}t} as the instruments. Both come to the same thing. The method of instrumental variables yields consistent estimators.
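The instrumental-variable calculation amounts to solving a square linear system. The sketch below assumes a hypothetical exactly identified equation and estimates it in solved form, so the coefficients are the negatives of β_{12} and γ_{11}; the helper name `iv` and all values are illustrative. It compares the excluded z used directly as the instrument with the computed value of y_{2} used as the instrument.

```python
import numpy as np

def iv(y, X, W):
    """Instrumental-variable estimator: solve W'X d = W'y for d."""
    return np.linalg.solve(W.T @ X, W.T @ y)

rng = np.random.default_rng(8)
T = 200_000
B = np.array([[1.0, 0.5], [-0.4, 1.0]])   # hypothetical beta_ij
G = np.array([[0.8, 0.0], [0.0, -1.2]])   # equation 1 excludes z2
Z = rng.normal(size=(T, 2))
U = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 2.0]], size=T)
Y = np.linalg.solve(B, (U - Z @ G.T).T).T

# Equation 1 in solved form: y1 = -beta_12*y2 - gamma_11*z1 + u1.
X = np.column_stack([Y[:, 1], Z[:, 0]])
W = np.column_stack([Z[:, 1], Z[:, 0]])   # excluded z2 stands in for y2
d_iv = iv(Y[:, 0], X, W)

# Same calculation with the computed value of y2 as the instrument.
y2_hat = Z @ np.linalg.lstsq(Z, Y[:, 1], rcond=None)[0]
d_iv_hat = iv(Y[:, 0], X, np.column_stack([y2_hat, Z[:, 0]]))
print(d_iv, d_iv_hat)
```

In the exactly identified case the two instrument sets span the same space, so the estimates coincide; with overidentification the computed values supply one particular linear combination of the available instruments.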
Subgroup averages. The instrumental-variables method can be applied in different forms. One form was used by Wald (1940) to obtain consistent estimators of a linear relationship between two variables each of which is subject to error. This gives rise to a method that can be used in estimating econometric systems. Wald proposed that the estimator of β in
y_{t} = α + βx_{t},
where y_{t} and x_{t} are both measured with error, be computed from
β̂ = (ȳ_{2} − ȳ_{1}) / (x̄_{2} − x̄_{1}).
He proposed ordering the sample in ascending magnitudes of the variable x. From two halves of the sample, we determine two sets of mean values of y and x, namely (x̄_{1}, ȳ_{1}) and (x̄_{2}, ȳ_{2}). The line joining these means will have a slope given by β̂. Wald showed the conditions under which these estimates are consistent.
This may be called the method of subgroup averages. It is a very simple method, which may readily be applied to equations with more than two parameters. The sample is split into as many groups as there are unknown parameters to be determined in the equation under consideration. If there are three parameters, for example, as in a relation y_{t} = α + βx_{t} + γw_{t}, the sample may be split into thirds and the parameters estimated from
ȳ_{g} = α̂ + β̂x̄_{g} + γ̂w̄_{g}, g = 1, 2, 3,
where ȳ_{g}, x̄_{g}, and w̄_{g} are the means within the gth third.
The extension to more parameters is obvious. The method of subgroup averages can be shown to be a form of the instrumental-variables method by an appropriate assignment of values to “dummy” instrumental variables.
Subgroup averages is a very simple method, and it is consistent, but it is not very efficient.
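The two-group calculation is short enough to write out. The sketch below assumes a hypothetical design in which the true x takes two well-separated values, so that ordering by the observed x almost never misassigns an observation to the wrong half; the function name is illustrative, and computing the intercept from the overall means is a choice made here, not something stated in the text.

```python
import numpy as np

def wald_line(x, y):
    """Slope from the means of the lower and upper halves of the x-ordered sample."""
    order = np.argsort(x)
    half = len(x) // 2
    lo, hi = order[:half], order[-half:]
    slope = (y[hi].mean() - y[lo].mean()) / (x[hi].mean() - x[lo].mean())
    intercept = y.mean() - slope * x.mean()   # line through the overall means
    return intercept, slope

rng = np.random.default_rng(9)
T = 20_000
x_true = np.repeat([0.0, 10.0], T // 2)       # two well-separated clusters
x = x_true + rng.normal(size=T)               # x observed with error
y = 1.0 + 2.0 * x_true + rng.normal(size=T)   # y observed with error

intercept, slope = wald_line(x, y)
ols_slope = np.polyfit(x, y, 1)[0]            # attenuated by the error in x
print(slope, ols_slope)
```

Here the grouping is effectively independent of the measurement errors, and the subgroup-average slope is close to 2, while the least squares slope is pulled toward zero.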
Simultaneous least squares. The simultaneous least squares method, suggested by Brown (1960), minimizes the sum of squares of all reduced form disturbances, subject to the parameter restrictions imposed on the system, i.e., it minimizes
Σ_{i=1}^{n} Σ_{t=1}^{T} v_{it}^{2}
subject to restrictions. Suppose that the v_{it} are expressed as functions of the observables and parameters, with all restrictions included; then Brown’s method minimizes the sum of the elements on the main diagonal of Ω, where Ω is the variance-covariance matrix of reduced form disturbances, whereas full-information maximum likelihood minimizes |Ω|.
Brown’s method has the desirable property of being a full-information method; it is distribution free; it is consistent; but it has the drawback that its results are not invariant under linear transformations of the variables. This drawback can be removed by expressing the reduced form disturbance in standard units
v_{it}/σ_{v_{i}},
where σ_{v_{i}} is the standard deviation of v_{it},
and minimizing
Σ_{i=1}^{n} Σ_{t=1}^{T} v_{it}^{2}/σ_{v_{i}}^{2}.
Evaluation of alternative methods
The various approaches to estimation of whole systems of simultaneous equations or individual relationships within such systems are consistent except for the single-equation least squares method. If the system is recursive and disturbances are independent between equations, least squares estimators are also consistent. In fact, they are maximum likelihood estimators for normally distributed disturbances. But generally, ordinary least squares estimators are not consistent. They are included in the group of alternatives considered here because they have a time-honored status and because they have minimum variance. In large-sample theory, maximum likelihood estimators of parameters are generally efficient compared with all other estimators. That is why we choose full-information maximum likelihood estimators as norms. They are consistent and efficient. Least squares estimators are minimum-variance estimators if their variances are estimated about estimated (inconsistent) sample means. If their variances are measured about the true, or population, values, it is not certain that they are efficient.
Limited-information estimators are less efficient than full-information maximum likelihood estimators. This should be intuitively obvious, since full-information estimators make use of more a priori information; it is proved in Klein (1960). Two-stage least squares estimators have asymptotically the same variance-covariance matrix as limited-information estimators, and three-stage (or simultaneous two-stage) least squares estimators have the same variance-covariance matrix as full-information maximum likelihood estimators. Thus, asymptotically the two kinds of limited-information estimators have the same efficiency, and the two kinds of full-information estimators have the same efficiency. The instrumental-variables or subgroup-averages methods are generally inefficient. Of course, the instrumental-variables method can be pushed to the point where it is the same as two-stage least squares estimation and can thereby gain efficiency.
A desirable aspect of the method of maximum likelihood is that its properties are preserved under a single-valued transformation. Thus, efficient estimators of structural parameters by this method transform into efficient estimators of reduced form parameters. The apparently efficient method of least squares may lose its efficiency under this kind of transformation. In applications of models, we use the reduced form in most cases, not the individual structural equations; therefore the properties under conditions of transformation from structural to reduced form equations are of extreme importance. Limited-information methods are a form of maximum likelihood methods. Therefore the properties of limited information are preserved under transformation.
To obtain limited-information estimators of the single equation
we maximize the joint likelihood of v_{1t}, ···, v_{n_{i}t} in
subject to the restrictions on the ith equation. In this case only the n_{i} reduced forms corresponding to y_{1t}, ···, y_{n_{i}t} are used. It is also possible to simplify calculations, and yet preserve consistency (although at the expense of efficiency), by using fewer than all m predetermined variables in the reduced forms. In this sense the reduced forms of limited-information estimation are not necessarily unique, and the same endogenous variable appearing in different structural equations of a system may not have the same reduced form expression for each equation estimator. There is yet another sense in which we may derive reduced forms for the method of limited information. After each equation of a complete system has been estimated by the method of limited information, we can derive algebraically a set of reduced forms for the whole system. These would, in fact, be the reduced forms used in forecasting, multiplier analysis, and similar applications of systems. The efficiency property noted above for limited and full information has not been proved for systems of this type of reduced forms, but this has been studied in numerical analysis (see below).
Ease of computation. Finally we come to an important practical matter in the comparison of the different methods of estimation: relative ease of computation. Naturally, calculations are simpler and smaller in magnitude for single-equation least squares than for any of the other methods except that of subgroup averages. The method of instrumental variables is of similar computational complexity, but for equations with four or more variables it pays to have the advantage of symmetry in the moment matrices, as is the case with single-equation least squares. This is hardly a consideration with modern electronic computing machines, but it is worth consideration if electric desk machines are being used.
The next-simplest calculations are those for two-stage least squares. These consist of a repeated application of least squares regression techniques of calculation, but the first regressions computed are of substantial size. There are as many independent variables in the regression as there are predetermined variables in the system, provided there are enough degrees of freedom. Essentially, the method amounts to the calculation of parameters and computed dependent variables in
Only the “forward” part of this calculation by the standard Gauss-Doolittle method need be made in order to obtain the moment matrix of the ŷ_{it}. In the next stage we compute the regression
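The two stages described above can be sketched in a few lines of code. This is a minimal illustration, assuming NumPy is available; the function name `two_stage_least_squares`, its argument names, and the simulated data are hypothetical and are not taken from the article.

```python
import numpy as np

def two_stage_least_squares(y, Y_endog, Z_incl, Z_all):
    """Two-stage least squares for a single structural equation.

    y        : (T,)   normalized dependent variable of the equation
    Y_endog  : (T, g) other endogenous variables appearing in the equation
    Z_incl   : (T, k) predetermined variables appearing in the equation
    Z_all    : (T, m) all predetermined variables in the system
    """
    # Stage 1: regress each included endogenous variable on all
    # predetermined variables and keep the computed values.
    pi_hat, *_ = np.linalg.lstsq(Z_all, Y_endog, rcond=None)
    Y_fit = Z_all @ pi_hat
    # Stage 2: regress y on the computed values and on the
    # predetermined variables that appear in the equation.
    W = np.column_stack([Y_fit, Z_incl])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    return coef  # first g entries: endogenous; last k: predetermined

# Illustrative data (all values are assumptions): y2 is correlated
# with the disturbance u of the equation for y1.
rng = np.random.default_rng(0)
T = 2000
Z_all = rng.normal(size=(T, 3))
u = rng.normal(size=T)
v = 0.8 * u + rng.normal(size=T)
y2 = Z_all @ np.array([1.0, 1.0, 1.0]) + v
y1 = 0.5 * y2 + 1.0 * Z_all[:, 0] + u

coef = two_stage_least_squares(y1, y2[:, None], Z_all[:, :1], Z_all)
print(coef)  # estimates of the coefficients 0.5 and 1.0
```

The first-stage regression uses every predetermined variable, which is why its size grows with the system rather than with the equation being estimated.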
Two important computing problems arise in the first stage. In many systems m > T; i.e., there are insufficient degrees of freedom in the sample for evaluation of the reduced forms. We may choose a subset of the z_{kt}, or we may use principal components of the z_{kt} (Kloek & Mennes 1960). Systematic and efficient ways of choosing subsets of the z_{kt} have been developed by taking account of the recursive structure of the model (Fisher 1965). In many economic models m has been as large as 30 or more, and it is often difficult to make sufficiently accurate evaluation of the reduced form regression equations of this size, given the amount of multicollinearity found in economic data with common trends and cycles. The same procedures used in handling the degrees-of-freedom problem are recommended for getting round the difficulties of multicollinearity. Klein and Nakamura (1962) have shown that multicollinearity problems are less serious in ordinary than in two-stage least squares. They have also shown that these problems increase as we move on to the methods of limited-information and then full-information maximum likelihood.
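The idea of replacing a large, collinear set of predetermined variables by a few principal components can be sketched as follows. This is a general illustration of the device, assuming NumPy, and is not a reproduction of the specific procedure of Kloek and Mennes; the function name and the simulated data with a common trend are hypothetical.

```python
import numpy as np

def principal_component_instruments(Z, r):
    """First r principal components of the columns of Z (shape T x m)."""
    Zc = Z - Z.mean(axis=0)          # centre each predetermined variable
    Zs = Zc / Zc.std(axis=0)         # put the variables on a common scale
    U, s, _ = np.linalg.svd(Zs, full_matrices=False)
    return U[:, :r] * s[:r]          # component scores, shape (T, r)

# Illustrative case with more predetermined variables than observations
# (m > T) and a common trend that makes the columns highly collinear.
rng = np.random.default_rng(0)
T, m = 25, 30
trend = np.linspace(0.0, 1.0, T)[:, None]
Z = trend + 0.1 * rng.normal(size=(T, m))

P = principal_component_instruments(Z, r=4)
print(P.shape)  # (25, 4)
```

The component scores are mutually orthogonal, so a first-stage regression on them avoids both the degrees-of-freedom shortage and the near-singular moment matrix, at the cost of discarding some of the information in the original variables.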
Limited-information methods require all the computations of two-stage least squares and, in addition, the extraction of a root of an n_{i} × n_{i} determinantal equation. The latter calculation can be done in a straightforward manner by iterative matrix multiplication, usually involving fewer than ten iterations.
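A sketch of the root extraction by repeated matrix multiplication is given below. It assumes the usual variance-ratio formulation, in which the required root is the smallest λ satisfying det(W1 − λW) = 0, where W1 and W are the residual moment matrices of the equation's endogenous variables after regression on the included and on all predetermined variables, respectively. The function names and the simulated data are hypothetical, and NumPy is assumed.

```python
import numpy as np

def residual_moments(Y, Z):
    """Moment matrix of the residuals from regressing columns of Y on Z."""
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    R = Y - Z @ coef
    return R.T @ R

def smallest_root(W1, W, max_iter=200, tol=1e-12):
    """Smallest root of det(W1 - lam * W) = 0 by power iteration.

    The dominant characteristic root of M = W1^{-1} W equals 1 / lam_min,
    so repeated multiplication by M converges to it.
    """
    M = np.linalg.solve(W1, W)
    v = np.ones(M.shape[0])
    mu = 0.0
    for _ in range(max_iter):
        w = M @ v
        mu_new = np.linalg.norm(w) / np.linalg.norm(v)
        v = w / np.linalg.norm(w)
        converged = abs(mu_new - mu) <= tol * mu_new
        mu = mu_new
        if converged:
            break
    return 1.0 / mu

# Illustrative data (assumed values): two endogenous variables in the
# equation, one included and two excluded predetermined variables.
rng = np.random.default_rng(0)
T = 200
Z_all = rng.normal(size=(T, 3))
u = rng.normal(size=T)
v_dist = 0.8 * u + rng.normal(size=T)
y2 = Z_all @ np.array([1.0, 1.0, 1.0]) + v_dist
y1 = 0.5 * y2 + 1.0 * Z_all[:, 0] + u
Y = np.column_stack([y1, y2])

W1 = residual_moments(Y, Z_all[:, :1])
W = residual_moments(Y, Z_all)
lam = smallest_root(W1, W)
print(lam)  # at least 1, and close to 1 when the restrictions hold
```

Convergence is fast when the smallest root is well separated from the others, which is the typical case when the excluded predetermined variables have real explanatory power.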
Both limited information and two-stage least squares are extremely well adapted to modern computers and can be managed without much trouble on electric desk machines.
Three-stage least squares estimators involve the computation of two-stage estimators for each equation of a system, estimation of a variance-covariance matrix of structural disturbances, and simultaneous solution of a linear equation system of the order of all coefficients in the system. This last step may involve a large number of estimating equations for a model of 30 or more structural equations.
All the previous methods consist of standard linear matrix operations. The extraction of a characteristic root is the only operation that involves nonlinearities, and the desired root can quickly be found by an iterative process of matrix multiplication. Full-information maximum likelihood methods, however, are quite different. The estimation equations are highly nonlinear. For small systems of two, three, or four equations, estimates have been made without much trouble on large computers (Eisenpress 1962) and on desk machines (Chernoff & Divinsky 1953). The problem of finding the maximum of a function as complicated as the joint likelihood function of a system of 15 to 20 or more equations is, however, formidable. Electronic machine programs have been developed for this purpose. The most standardized sets of full-information maximum likelihood calculations are for systems that are fully linear in both parameters and variables. Single-equation methods require linearity only in unknown parameters, and this is a much weaker restriction. Much progress in computation has been made since the first discussion of these econometric methods of estimation, in 1943, but the problem is far from solved, and there is no simple, push-button computation. This is especially true of full-information maximum likelihood.
Efficient programs have recently been developed for calculating full-information maximum likelihood estimates in either linear or nonlinear systems, and these have been applied to models of as many as 15 structural equations, involving more than 60 unknown parameters.
Generalization of assumptions. The basis for comparing different estimation methods or for preferring one method over another rests on asymptotic theory. The property of consistency is a large-sample property, and the sampling errors used to evaluate efficiency measures are asymptotic formulas. Unfortunately, samples of economic data are frequently not large, especially time series data. The amount of small-sample bias or the small-sample confidence intervals for parameter estimators are not generally known in specific formulas. Constructed numerical experiments, designed according to Monte Carlo methods, have thrown some light on the small-sample properties. These are reported below.
Another assumption sometimes made for the basic model is that the error terms are mutually independent. We noted above that successive least squares treatment of equations in recursive systems is identical with maximum likelihood estimation when the variance-covariance matrix of structural disturbances is diagonal. This implies mutual independence among contemporaneous disturbances. In a time series model we usually make another assumption, namely, that
E(u_{it}u_{jt'}) = 0, t ≠ t', for all i, j
The simplest way in which this assumption can be modified is to allow the errors to be related in some linear autoregressive process, such as
where E(e_{it}e_{jt'}) = 0 (t ≠ t', for all i, j). In a formal sense joint maximum likelihood estimation of structural parameters and autoregressive coefficients can be laid out in estimation equations, but there are no known instances where these have been solved on a large scale, for the estimation equations are very complicated. For single-equation models or for recursive systems which split into a series of single-equation regressions, the autoregressive parameters of first order have been jointly estimated with structural parameters (Barger & Klein 1954). The principal extensions to larger systems have been in cases where the autoregressive parameters are known a priori. Then it is easy to make known autoregressive transformations of the variables and proceed as in the case of independent disturbances. [See TIME SERIES.]
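The case of a known autoregressive parameter can be illustrated with a short sketch. It assumes a single equation whose disturbance follows a first-order process u_t = ρu_{t-1} + e_t with ρ known a priori; the symbol ρ, the function name `quasi_difference`, and all numerical values are assumptions made for illustration, and NumPy is assumed.

```python
import numpy as np

def quasi_difference(series, rho):
    """Return series_t - rho * series_{t-1} for t = 2, ..., T."""
    series = np.asarray(series, dtype=float)
    return series[1:] - rho * series[:-1]

# Simulate y_t = 3.0 + 0.5 x_t + u_t with first-order autoregressive u_t.
rng = np.random.default_rng(1)
T, rho = 500, 0.6
x = 10.0 + 2.0 * rng.normal(size=T)
e = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + e[t]
y = 3.0 + 0.5 * x + u

# Transform every variable, including the constant term, by the known rho.
# The transformed disturbance is e_t, which is serially independent.
y_star = quasi_difference(y, rho)
X_star = np.column_stack([np.full(T - 1, 1.0 - rho), quasi_difference(x, rho)])

coef, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
print(coef)  # estimates of the intercept 3.0 and the slope 0.5
```

Because the constant is transformed to 1 − ρ, the first coefficient estimates the original intercept directly. One observation is lost in the transformation.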
Related to the above two points is the treatment of lagged values of endogenous variables as predetermined variables. The presence of lagged endogenous variables reflects serial correlation among endogenous variables rather than among disturbances. In large samples it can be shown that for purposes of estimation we are justified in treating lagged variables as predetermined, but in small samples we incur bias on this account.
Another assumption regarding the disturbances in simultaneous equation systems is that they are mainly due to neglected or unmeasurable variables that affect or disturb each equation of the model. They are regarded as errors in behavior or technology. From a formal mathematical point of view, they could equally well be regarded as a direct error in observation of the normalized dependent variable in each equation, assuming that the system is written so that there is a different normalized dependent variable in each equation. There is an implicit assumption that the exogenous variables are measured without error. If we change the model to one in which random errors enter through disturbances to each relation and also through inaccurate observation of each individual variable, we have a more complicated probability scheme, whose estimation properties have not been developed in full generality. This again has been a case for numerical treatment by Monte Carlo methods.
The procedures of estimating simultaneous equation models as though errors are mutually independent when they really are not and as though variables are accurately measured when they really are not are specification errors. Other misspecifications of models can occur. For simplicity we assume linearity or, at least, linearity in unknown parameters, but the true model may have a different functional form. Errors may not follow the normal distribution, as we usually assume. [See ERRORS, article on EFFECTS OF ERRORS IN STATISTICAL ASSUMPTIONS.]
Full-information methods are sensitive to specification error because they depend on restrictions imposed throughout an entire system. Single-equation methods depend on a smaller set of restrictions. If an investigator has particular interest in just one equation or in a small sector of the economy, he may incur large specification error by making too superficial a study of the parts of the economy that do not particularly interest him. There is much to be said for using single-equation methods (limited information or two-stage least squares) in situations where one does not have the resources to specify the whole economy adequately.
There are numerous possibilities for specifying models incorrectly. These probably introduce substantial errors in applied work, but they cannot be studied in full generality for there is no particular way of showing all the misspecifications that can occur. We can, however, construct artificial numerical examples of what we believe to be the major specification errors. These are discussed below.
Sampling experiments. The effect on estimation methods of using simplified assumptions that are not fully met in real life often cannot be determined by general mathematical analysis. Econometricians have therefore turned to constructing sampling experiments with large-scale computers to test proposed methods of estimation where (1) the sample is small; and (2) there is specification error in the statement of the model, such as (a) nonzero parameters assumed to be zero, (b) dependent exogenous variables and errors assumed to be independent, (c) imperfectly measured exogenous variables assumed to be perfectly measured, or (d) serially correlated errors assumed to be not serially correlated.
So-called Monte Carlo methods are used to perform the sampling experiments that conceptually underlie sampling error calculations. These sampling experiments are never, in fact, carried out with nonexperimental sources of data, for we cannot relive economic life over and over again; but we can instruct a machine to simulate such an experiment.
Consider a single equation to be estimated by different methods, for example,
y_{t} = α + βx_{t} + u_{t}, t = 1, 2, ···, T
Fix α and β at, say, 3.0 and 0.5, respectively, and set T = 30. This would correspond to the process
y_{t} = 3.0 + 0.5x_{t} + u_{t}, t = 1, 2, ···, 30
We also fix the values of the predetermined variables x_{1}, x_{2}, ···, x_{30} once and for all. We set T = 30 to indicate that we are dealing with a 30-element small sample. A sample of 30 annual observations would be the prototype.
Employing a source of random numbers scaled to have a realistic standard deviation and a zero mean, we draw a set of random numbers u_{1}, ···, u_{30}. We then instruct a machine to use u_{1}, ···, u_{30} and x_{1}, ···, x_{30} to compute y_{1}, ···, y_{30} from the above formulas. From the samples of data, y_{1}, ···, y_{30} and x_{1}, ···, x_{30}, we estimate α and β by the methods being studied. Let α̂ and β̂ be the estimated values. We then draw a new set of random numbers, u_{1}, ···, u_{30}, and repeat the process, using the same values of x_{1}, ···, x_{30}. From many such repetitions, say 100, we have sampling distributions of α̂ and β̂. Means of these distributions, when compared with α (= 3.0) and β (= 0.5), indicate bias, if any, and standard deviations or root-mean-square values about 3.0 or 0.5 indicate efficiency. From these sampling distributions we may compare different estimators of α and β.
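The experiment just described can be written out as a short program. The sketch below uses only the Python standard library and takes ordinary least squares as the estimator under study; the particular values chosen for the fixed x's, the disturbance standard deviation, and the random seed are assumptions made for illustration.

```python
import random
import statistics

def ols_simple(x, y):
    """Least squares intercept and slope for y = a + b*x + u."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def summarize(estimates, true_value):
    """Bias, standard deviation about the sample mean, and RMSE about the truth."""
    bias = statistics.fmean(estimates) - true_value
    sd = statistics.stdev(estimates)
    rmse = statistics.fmean([(e - true_value) ** 2 for e in estimates]) ** 0.5
    return bias, sd, rmse

def monte_carlo(alpha=3.0, beta=0.5, T=30, reps=100, sigma=1.0, seed=0):
    rng = random.Random(seed)
    # Predetermined variables, fixed once and for all (hypothetical values).
    x = [10.0 + 0.5 * t for t in range(1, T + 1)]
    a_hats, b_hats = [], []
    for _ in range(reps):
        # New disturbances in every replication, same x's throughout.
        u = [rng.gauss(0.0, sigma) for _ in range(T)]
        y = [alpha + beta * xi + ui for xi, ui in zip(x, u)]
        a_hat, b_hat = ols_simple(x, y)
        a_hats.append(a_hat)
        b_hats.append(b_hat)
    return {"alpha": summarize(a_hats, alpha), "beta": summarize(b_hats, beta)}

results = monte_carlo()
for name, (bias, sd, rmse) in results.items():
    print(f"{name}: bias={bias:.4f} sd={sd:.4f} rmse={rmse:.4f}")
```

Other estimators can be compared by adding them inside the replication loop, so that every method is applied to exactly the same simulated samples.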
What we have said about this simple type of experiment for a single equation can readily be extended to an entire system:
In this case we must start with assumed values of B and Γ. We choose a T-element vector of values for each element of x_{t}, the predetermined variables, and repeated T-element vectors of values for each element of u_{t}. The random variables are chosen so that their variance-covariance matrix equals some specified set of values. As in the single-equation case, T = 30 or some likely small-sample value. The x_{t} are often chosen in accordance with the values of predetermined variables used in actual models. In practice, Monte Carlo studies of simultaneous equation models have dealt with small systems having only two, three, or four equations.
Two sets of results are of interest from these studies. Estimates of individual elements in B and Γ can be studied and compared for different estimators; estimates of B^{-1}Γ, the reduced-form coefficients, can be similarly investigated. In addition, we could form some overall summary statistic, such as standard error of forecast, for different estimators.
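A system experiment of this kind can be sketched as follows. The sketch assumes the system is written as B y_t + G x_t = u_t (with G standing for the matrix of coefficients on the predetermined variables) and uses a hypothetical two-equation model with three predetermined variables. All coefficient values, the identity covariance matrix, and the function names are assumptions for illustration; NumPy is assumed. It compares single-equation least squares with two-stage least squares for one structural coefficient.

```python
import numpy as np

# Hypothetical system (all values assumed):
#   y1 - 0.5*y2 - 1.0*x1           = u1
#   y1 + 1.0*y2 - 1.0*x2 - 1.0*x3  = u2
B = np.array([[1.0, -0.5],
              [1.0,  1.0]])
G = np.array([[-1.0,  0.0,  0.0],
              [ 0.0, -1.0, -1.0]])
COV_U = np.eye(2)     # specified variance-covariance matrix of u_t
TRUE_B12 = 0.5        # coefficient of y2 in the first equation

def simulate(X, rng):
    """Draw disturbances and solve the system for the endogenous variables."""
    U = rng.multivariate_normal(np.zeros(2), COV_U, size=X.shape[0])
    return np.linalg.solve(B, (U - X @ G.T).T).T   # row t holds y_t

def ols(y, W):
    return np.linalg.lstsq(W, y, rcond=None)[0]

def tsls(y, W_endog, Z_incl, Z_all):
    fitted = Z_all @ np.linalg.lstsq(Z_all, W_endog, rcond=None)[0]
    return ols(y, np.column_stack([fitted, Z_incl]))

def experiment(T=30, reps=200, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(T, 3))    # predetermined variables, fixed throughout
    estimates = {"ols": [], "2sls": []}
    for _ in range(reps):
        Y = simulate(X, rng)
        y1, y2 = Y[:, 0], Y[:, [1]]
        estimates["ols"].append(ols(y1, np.column_stack([y2, X[:, :1]]))[0])
        estimates["2sls"].append(tsls(y1, y2, X[:, :1], X)[0])
    out = {}
    for name, values in estimates.items():
        values = np.array(values)
        out[name] = {"bias": values.mean() - TRUE_B12,
                     "rmse": np.sqrt(np.mean((values - TRUE_B12) ** 2))}
    return out

results = experiment()
print(results)
```

The first equation excludes two predetermined variables, so it is overidentified; this choice keeps the two-stage estimator's sampling mean well defined in small samples.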
The simplest Monte Carlo experiments have been made to test for small-sample properties alone; they have not introduced measurement errors, serial correlation of disturbances, or other specification errors. Generally speaking, these studies clearly show the bias in single-equation least squares estimates where some of the “independent” variables in the regression calculation are not independent of the random disturbances. Maximum likelihood estimators (full or limited information) show comparatively small bias. The standard deviations of individual parameter estimators are usually smallest for the single-equation least squares method, but this standard deviation is computed about the biased sample mean. If estimated about the true mean, least squares sometimes does not show up well, indicating that bias outweighs efficiency. Full-information maximum likelihood shows up as an efficient method, whether judged in terms of variation about the sample or the true mean. Two-stage least squares estimators appear to have somewhat smaller variance about the true values than do limited-information estimators, and both methods measure up to the efficiency of single-equation least squares methods when variability is measured about the true mean.
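The difference between variation about the sample mean and variation about the true value can be made explicit. Writing θ̂_r for the estimate obtained in replication r of R, θ̄ for the mean of these estimates, and θ for the true value (notation introduced here for illustration only), the mean squared deviation about the true value splits exactly into the variance about the sample mean plus the squared bias:

```latex
\frac{1}{R}\sum_{r=1}^{R}\bigl(\hat\theta_r-\theta\bigr)^2
  \;=\; \frac{1}{R}\sum_{r=1}^{R}\bigl(\hat\theta_r-\bar\theta\bigr)^2
  \;+\; \bigl(\bar\theta-\theta\bigr)^2 .
```

An estimator with a small first term on the right but a large second term can therefore rank poorly on the left-hand side, which is the sense in which bias may outweigh efficiency.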
Asymptotically, limited-information and two-stage estimators have the same variance-covariance matrices, and they are both inefficient compared with full-information estimators. The Monte Carlo results for small samples are not surprising, although the particular experiments studied give a slight edge to two-stage estimators.
When specification error is introduced, in the form of making an element of Γ zero in the estimation process when it is actually nonzero in the population, we find that full-information methods are very sensitive. Both limited-information and two-stage estimators perform better than full-information maximum likelihood. Two-stage estimators are the best among all methods examined in this situation. Limited-information estimators are very sensitive to intercorrelation among predetermined variables.
The principal result for Monte Carlo estimators of reduced form parameters is that transformed single-equation least squares values lose their efficiency properties. Being seriously biased as well, these estimates show a poor overall rating when used for estimating reduced forms for a system as a whole. Full-information estimators, which are shown in these experiments to be sensitive to specification error, do better in estimating reduced form coefficients than in estimating structural coefficients. Their gain in making use of all the a priori information outweighs the losses due to the misspecification introduced and, in the end, gives them a favorable comparison with ordinary least squares estimators of the reduced form equations that make no use of the a priori information and have no specification error.
If a form of specification error is introduced in a Monte Carlo experiment by having common time trends in elements of x_{t} and u_{t}, so that they are not independent as hypothesized, we find that limited-information estimators are as strongly biased as are ordinary least squares values. If time trend is introduced as an additional variable, however, the limited-information method has small bias.
When observation errors are imposed on the x_{t}, both least squares and limited-information estimators show little change in bias but increases in sampling errors. In this model, it turns out as before that the superior efficiency of least squares estimators of individual structural parameters does not carry over to the estimators of reduced form parameters.
A comprehensive sampling-experiment study of alternative estimators under correctly specified and under misspecified conditions is given in Summers (1965), and Johnston (1963) compares results from several completed Monte Carlo studies. This approach is in its infancy, and further investigations will surely throw new light on the relative merits of different estimation methods.
For some years economists were digesting the modern approach to simultaneous equation estimation introduced by Haavelmo, Mann and Wald, Anderson and Rubin, and Koopmans, Rubin, and Leipnik, and there was a period of little change in this field. Since the development of the two-stage least squares method by Theil, there have been a number of developments. The methods are undergoing interpretation and revision. New estimators are being suggested, and it is likely that many new results will be forthcoming in the next few decades. Wold (1965) has proposed a method based on iterative least squares that recommends itself by its adaptability to modern computers, its consistency, and its capacity to make use of a priori information on all equations simultaneously and to treat some types of nonlinearity with ease. Also, excellent recent books, by Christ (1966), Goldberger (1964), and Malinvaud (1964), greatly aid instruction in this subject.
Lawrence R. Klein
[See also LINEAR HYPOTHESES, article on REGRESSION.]
BIBLIOGRAPHY
Aitken, A. C. 1935 On Least Squares and Linear Combination of Observations. Royal Society of Edinburgh, Proceedings 55:42-48.
Anderson, T. W.; and Rubin, Herman 1949 Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations. Annals of Mathematical Statistics 20:46-63.
Barger, Harold; and Klein, Lawrence R. 1954 A Quarterly Model for the United States Economy. Journal of the American Statistical Association 49:413-437.
Basmann, R. L. 1957 A Generalized Classical Method of Linear Estimation of Coefficients in a Structural Equation. Econometrica 25:77-83.
Brown, T. M. 1960 Simultaneous Least Squares: A Distribution Free Method of Equation System Structure Estimation. International Economic Review 1:173-191.
Chernoff, Herman; and Divinsky, Nathan 1953 The Computation of Maximum-likelihood Estimates of Linear Structural Equations. Pages 236-269 in Cowles Commission for Research in Economics, Studies in Econometric Method. Edited by William C. Hood and Tjalling C. Koopmans. New York: Wiley.
Chow, Gregory C. 1964 A Comparison of Alternative Estimators for Simultaneous Equations. Econometrica 32:532-553.
Christ, C. F. 1966 Econometric Models and Methods. New York: Wiley.
Eisenpress, Harry 1962 Note on the Computation of Full-information Maximum-likelihood Estimates of Coefficients of a Simultaneous System. Econometrica 30:343-348.
Fisher, Franklin M. 1965 Dynamic Structure and Estimation in Economy-wide Econometric Models. Pages 589-635 in James S. Duesenberry et al., The Brookings Quarterly Econometric Model of the United States. Chicago: Rand McNally.
Fisher, Franklin M. 1966 The Identification Problem in Econometrics. New York: McGraw-Hill.
Goldberger, Arthur S. 1964 Econometric Theory. New York: Wiley.
Haavelmo, Trygve 1943 The Statistical Implications of a System of Simultaneous Equations. Econometrica 11:1-12.
Johnston, John 1963 Econometric Methods. New York: McGraw-Hill.
Klein, Lawrence R. 1960 The Efficiency of Estimation in Econometric Models. Pages 216-232 in Ralph W. Pfouts, Essays in Economics and Econometrics: A Volume in Honor of Harold Hotelling. Chapel Hill: Univ. of North Carolina Press.
Klein, Lawrence R.; and Nakamura, Mitsugu 1962 Singularity in the Equation Systems of Econometrics: Some Aspects of the Problem of Multicollinearity. International Economic Review 3:274-299.
Kloek, T.; and Mennes, L. B. M. 1960 Simultaneous Equations Estimation Based on Principal Components of Predetermined Variables. Econometrica 28:45-61.
Koopmans, Tjalling C.; Rubin, Herman; and Leipnik, R. B. (1950) 1958 Measuring the Equation Systems of Dynamic Economics. Pages 53-237 in Tjalling C. Koopmans (editor), Statistical Inference in Dynamic Economic Models. Cowles Commission for Research in Economics, Monograph No. 10. New York: Wiley.
Malinvaud, Edmond (1964) 1966 Statistical Methods of Econometrics. Chicago: Rand McNally. First published in French.
Mann, H. B.; and Wald, Abraham 1943 On the Statistical Treatment of Linear Stochastic Difference Equations. Econometrica 11:173-220.
Strotz, Robert H.; and Wold, Herman 1960 A Triptych on Causal Chain Systems. Econometrica 28:417-463.
Summers, Robert 1965 A Capital Intensive Approach to the Small Sample Properties of Various Simultaneous Equation Estimators. Econometrica 33:1-41.
Theil, Henri (1958) 1961 Economic Forecasts and Policy. 2d ed., rev. Amsterdam: North-Holland Publishing.
Tinbergen, Jan 1939 Statistical Testing of Business-cycle Theories. Volume 2: Business Cycles in the United States of America: 1919-1932. Geneva: League of Nations, Economic Intelligence Service.
Wald, Abraham 1940 The Fitting of Straight Lines if Both Variables Are Subject to Error. Annals of Mathematical Statistics 11:284-300.
Wold, Herman 1965 A Fixpoint Theorem With Econometric Background. Arkiv för Matematik 6:209-240.
Zellner, Arnold; and Theil, Henri 1962 Three-stage Least Squares: Simultaneous Estimation of Simultaneous Equations. Econometrica 30:54-78.
"Simultaneous Equation Estimation." International Encyclopedia of the Social Sciences. Encyclopedia.com. 17 Jul. 2019 <https://www.encyclopedia.com>.