Statistical Identifiability

Identifiability is a statistical concept referring to the difficulty of distinguishing among two or more explanations of the same empirical phenomena. Unlike traditional statistical problems (for example, estimation and hypothesis testing), identifiability does not refer to sampling fluctuations stemming from limited data; rather, nonidentifiability, or the inability to distinguish among explanations, would exist even if the statistical distribution of the observables were fully known.

A model represents an attempt to describe, explain, or predict the values of certain variables as the outputs of a formally described mechanism. Yet it is evident that given any specified set of facts or observations to be explained, an infinite number of models are capable of doing so. One way of describing all scientific work is as the task of distinguishing among such eligible models by the introduction of further information.

The problem of identification, as usually encountered, is essentially the same phenomenon in a more restricted context. Suppose that the form of the explanatory model is regarded as specified, but that it involves unknown parameters. Suppose further that the observational material to be explained is so abundant that the basic statistical distributions may be regarded as known. (In practice this will rarely be the case, but identifiability considerations require thinking in these terms.) An important task then is to select from all possible structures (sets of values for the unknown parameters) contained in the model the particular one that, according to some criterion, best fits the observations. It may happen, however, that there are two, several, or even an infinite number of structures generating precisely the same distribution for the observations. In this case no amount of observation consistent with the model can distinguish among such structures. The structures in question are thus observationally equivalent.

It may, however, be the case that some specific parameter or set of parameters is the same in all observationally equivalent structures. In such a case, that set is said to be identifiable. Parameters whose values are not the same for all observationally equivalent structures are not identifiable; their values can never be recovered solely by use of observations generated by the model.

In its simplest form lack of identifiability is easy to recognize. Suppose, for example, that a random variable, X, is specified by the model to be distributed normally, with expectation or mean the difference between two unknown parameters, EX = θ₁ − θ₂. It is evident that observations on X can be used to estimate θ₁ − θ₂, which is identifiable, but that the individual parameters, θ₁ and θ₂, are not identifiable. The θᵢ can be recovered only by combining outside information with observations on X or by changing the whole observational scheme. In cases such as this, the θᵢ are sometimes said not to be estimable. Observations on X do restrict the θᵢ, since their difference can be consistently estimated, but can never distinguish the true θᵢ generating the observations from among all θᵢ with the same difference. Although one way of describing the situation is to note that the likelihood function for a random sample has no unique maximum but has a ridge along the line θ₁ − θ₂ = x̄, the sample average, it is instructive to note further that the problem persists even if the model is nonstochastic and X is a constant.
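The ridge can be checked numerically. The sketch below is a minimal illustration, not part of the original article: it assumes σ = 1 is known, draws a hypothetical seeded sample with θ₁ − θ₂ = 3, and uses illustrative function names (`normal_loglik`, `loglik_theta`). Every point on the line θ₁ − θ₂ = x̄ attains the same log-likelihood, however far apart the individual θᵢ are.

```python
import math
import random


def normal_loglik(data, mean, sigma=1.0):
    """Log-likelihood of i.i.d. N(mean, sigma^2) observations."""
    n = len(data)
    ss = sum((x - mean) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - ss / (2 * sigma ** 2)


def loglik_theta(data, theta1, theta2):
    """Hypothetical model E X = theta1 - theta2: only the difference enters."""
    return normal_loglik(data, theta1 - theta2)


random.seed(0)
data = [random.gauss(3.0, 1.0) for _ in range(200)]  # true difference: 3.0
xbar = sum(data) / len(data)

# Points on the line theta1 - theta2 = xbar tie at the maximum.
ridge = [(xbar + c, c) for c in (-10.0, 0.0, 2.5, 100.0)]
values = [loglik_theta(data, t1, t2) for t1, t2 in ridge]
print(values)
# A point off the ridge has strictly lower likelihood.
print(loglik_theta(data, xbar + 1.0, 0.0) < values[0])
```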

(In the context of the general linear hypothesis model, the concept of estimability has been developed by R. C. Bose, C. R. Rao, and others, apparently independently of the more general identifiability concept. [See Linear Hypotheses, article on Regression; for history, references, and discussion, see Reiersøl 1964.] Estimability of a linear parameter, in the linear hypothesis context, means that an unbiased estimator of the parameter exists; within its domain of discussion, estimability is equivalent to identifiability.)

In more complicated cases, lack of identifiability may be less easy to recognize. Because of rounding errors or sample properties, numerical “estimates” of unidentifiable parameters may be obtained, although such estimates are meaningless. As a fanciful, although pertinent, example, suppose that in the situation of the previous paragraph there were two independent observations on X, say, X₁ and X₂, and that, by rounding or other error, these observations were regarded as having expectations not quite equal to θ₁ — θ₂, say,

$$EX_1 = 0.99\,\theta_1 - 1.01\,\theta_2,$$

$$EX_2 = 1.01\,\theta_1 - 0.99\,\theta_2.$$

It is then easy to see that the least squares (and maximum likelihood) estimators would appear to exist and would be

$$\hat\theta_1 = 25.25\,X_2 - 24.75\,X_1, \qquad \hat\theta_2 = 24.75\,X_2 - 25.25\,X_1,$$

so that θ̂₁ − θ̂₂ = ½(X₁ + X₂), which last does make good sense. The effect of underlying nonidentifiability, with the coefficients slightly altered from unity, is that the variance of θ̂₁ (or θ̂₂) is very large, about 2,500 times the variance of ½(X₁ + X₂).
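The arithmetic of this example can be verified exactly. The sketch below uses Python's `fractions.Fraction` for exact rational arithmetic (the variable names are illustrative) and assumes, as the comparison in the text implies, that X₁ and X₂ are independent with a common variance. It inverts the nearly singular coefficient matrix and compares the variance of θ̂₁ with that of ½(X₁ + X₂).

```python
from fractions import Fraction as F

# Coefficient matrix of the perturbed model: E X = C (theta1, theta2)'.
C = [[F("0.99"), F("-1.01")],
     [F("1.01"), F("-0.99")]]

det = C[0][0] * C[1][1] - C[0][1] * C[1][0]  # 0.04: nearly singular
# Rows of the inverse give the weights on (X1, X2) in each estimator.
inv = [[C[1][1] / det, -C[0][1] / det],
       [-C[1][0] / det, C[0][0] / det]]

w_theta1 = inv[0]                                     # weights for theta1-hat
w_theta2 = inv[1]                                     # weights for theta2-hat
w_diff = [a - b for a, b in zip(w_theta1, w_theta2)]  # weights for the difference

# With X1, X2 independent, common variance s^2: Var(sum w_i X_i) = s^2 * sum w_i^2.
var_theta1 = sum(w * w for w in w_theta1)             # in units of s^2
var_half_sum = F(1, 2) ** 2 * 2                       # Var((X1 + X2)/2) / s^2
ratio = var_theta1 / var_half_sum
print(w_theta1, w_theta2, w_diff)
print(float(ratio))
```

The weights come out as (−24.75, 25.25) for θ̂₁ and (0.5, 0.5) for the difference, and the variance ratio is exactly 2500.25.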

In other cases numerical estimates may be obtained in finite samples when in fact no consistent estimator exists and, for example, the matrix inverted in obtaining the numbers is guaranteed to be singular in the probability limit. In such cases it is of considerable interest to know what restrictions on the form or parameters of the model are necessary or sufficient for the identification of subsets of parameters. Analyses of identifiability are typically devoted to this question.

The identification problem can arise in many contexts. Wherever a reasonably complicated underlying mechanism generates the observations, and the parameters of that mechanism are to be estimated, the identification problem may be encountered. Examples are factor analysis and the analysis of latent structures. [See Factor Analysis and Latent Structure; for the analysis of identifiability in factor analysis, see Reiersøl 1950a.] A further example occurs in the analysis of accident statistics, where the occurrence of approximately negative binomial counts led to the concept of accident proneness on the false assumption that a negative binomial can only be generated as a mixture of Poisson distributions. [See Distributions, Statistical, articles on Special Discrete Distributions and Mixtures of Distributions; Fallacies, Statistical.]

An important case, and one in which the analysis is rich, is that of a system of simultaneous equations such as those frequently encountered in econometrics. The remainder of this article is accordingly devoted to a discussion of identifiability in that context.

Identifiability of a structural equation. Suppose the model to be investigated is given by

$$A x_t = u_t, \tag{1}$$

where u_t is an M-component column vector of random disturbances (with properties to be specified below) and x_t is an N-component column vector of variables, partitioned into y_t, the M-component vector of endogenous variables to be explained by the model, and z_t, the Λ = (N − M)-component vector of predetermined variables determined outside the current working of the model (one element of z_t can be taken to be identically unity).

The elements of z_t can thus either be determined entirely outside the model (for example, they can be treated as fixed) or represent lagged values of the elements of y_t. The assumption that z_t is determined outside the current working of the model requires, in any case, that movements in the elements of u_t not produce movements in those of z_t. In its weakest form this becomes the assumption that the elements of z_t are asymptotically uncorrelated with those of u_t, in the sense that

$$\operatorname{plim}_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} z_t u_t' = 0,$$

where T is the number of observations, the prime denotes transposition, and plim denotes stochastic convergence (convergence in probability). As with all prior assumptions required for identification, this assumption is quite untestable within the framework of the model.

The t subscript denotes the number of the observation and will be omitted henceforth. In (1), A is an M × N matrix of parameters to be estimated and is partitioned into B and Γ, corresponding to the partitioning of x. As the endogenous variables are to be explained by the model, B (an M × M square matrix) is assumed nonsingular. Finally, the u vectors for different values of t are usually assumed (serially) uncorrelated, with common mean 0 and with common covariance matrix Σ. Thus, Σ is M × M and is in general unknown and not diagonal, as its typical element is the covariance between contemporaneous disturbances from different equations of the model. (Normality of u is also generally assumed for estimation purposes but has so far been of little relevance for identifiability discussions in the present context; for its importance in another context, see Reiersøl 1950b.)

Such models occur frequently in econometrics, in contexts ranging from the analysis of particular markets to studies of entire economies. The study of identification and, especially, estimation in such models has occupied much of the econometric literature since Haavelmo's pathbreaking article (1943).

For definiteness, this article will concentrate on the identifiability of the first equation of (1), that is, on the identifiability of the elements of the first row of A, denoted by A₁. In general, one is content if A₁ is identifiable after the imposition of a normalization rule, since the units in which the variables are measured are arbitrary. This will be understood when A₁ is spoken of as identifiable.

It is not hard to show that the joint distribution of the elements of y, given z, depends only on the parameters Π and Ω of the reduced form of the model,

$$y = \Pi z + v, \qquad \Pi = -B^{-1}\Gamma, \qquad v = B^{-1}u, \qquad \Omega = B^{-1}\Sigma\,(B^{-1})',$$

so that observations generated by the model can at most be used to estimate Π and Ω (the variance-covariance matrix of the elements of v). Since the elements of z are assumed asymptotically uncorrelated with those of u (and hence with those of v), such estimation can be consistently done by ordinary least squares regression, provided that the asymptotic variance-covariance matrix of the elements of z exists. On the other hand, if (1) is premultiplied by any nonsingular M × M matrix, the resulting structure has the same reduced form as the original one, so that all such structures are observationally equivalent. Unless outside information is available restricting the class of such transformations to those preserving A₁ (up to a scalar multiple), A₁ is not identifiable.
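The observational equivalence of premultiplied structures can be checked directly. This is a minimal sketch under stated assumptions: M = Λ = 2, exact rational arithmetic, hypothetical coefficient values for a supply and demand pair with variables ordered (q, p, 1, income), and an arbitrary nonsingular transformation called `Q` (a name introduced here). Both structures yield identical Π = −B⁻¹Γ and Ω = B⁻¹Σ(B⁻¹)′, even though their first equations differ.

```python
from fractions import Fraction as F


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def inv2(X):
    """Inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = X
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def reduced_form(B, G, S):
    """Pi = -B^{-1} Gamma and Omega = B^{-1} Sigma (B^{-1})' for B y + Gamma z = u."""
    Binv = inv2(B)
    Pi = [[-v for v in row] for row in matmul(Binv, G)]
    Omega = matmul(matmul(Binv, S), transpose(Binv))
    return Pi, Omega


B = [[F(1), F(-2)], [F(1), F(3)]]        # coefficients on (q, p)
G = [[F(-1), F(0)], [F(-5), F(-4)]]      # coefficients on (1, income)
S = [[F(1), F(1, 2)], [F(1, 2), F(2)]]   # Sigma, covariance matrix of u
Q = [[F(2), F(1)], [F(1), F(1)]]         # any nonsingular M x M matrix

original = reduced_form(B, G, S)
# Premultiplying (1) by Q maps B -> QB, Gamma -> Q Gamma, Sigma -> Q Sigma Q'.
transformed = reduced_form(matmul(Q, B), matmul(Q, G),
                           matmul(matmul(Q, S), transpose(Q)))
print(original == transformed)           # True: same reduced form
print(matmul(Q, B)[0], B[0])             # but a different first equation
```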

Examination of the nonstochastic case, in which u = 0, provides another way to describe the phenomenon. Here the investigator can, at most, obtain values of the Λ predetermined variables and observe the consequences for the endogenous variables. This can be done in, at most, Λ independent ways. Let the N-rowed matrix whose tth column consists of the tth observation on x so generated be denoted by X. Then X has rank Λ in the most favorable circumstances (which are assumed to hold here), and AX = 0. It is easy to see that Π can be recovered from this.

On the other hand, A₁X = 0 expresses all that observational evidence can tell about A₁. Since X has N rows and rank Λ, its row null space has dimension N − Λ = M; since A has rank M and AX = 0, the rows of A are a basis for that null space, whence the true A₁ can be distinguished by observational information from all vectors which are not linear combinations of the rows of A but not from vectors which are. The second part of this corresponds to the obvious fact that, without further information, there is nothing to distinguish the first equation of (1) from any linear combination of those equations. If, returning to the stochastic case, one replaces X by $\begin{pmatrix}\Pi \\ I\end{pmatrix}$, the same analysis remains valid. The condition $A_1 \begin{pmatrix}\Pi \\ I\end{pmatrix} = 0$ embodies all the information about A₁ which can be gleaned from the reduced form, and the reduced form, as has been noted, is all that can be recovered from observational evidence, even in indefinitely large samples.

This is the general form of the classic example (Working 1927) of a supply and a demand curve, both of which are straight lines [see Demand and Supply]. Only the intersection of the two curves is observable, and the demand curve cannot be distinguished from the supply curve or any other straight line through that intersection. Even if the example is made stochastic, the problem clearly remains unless the stochastic or other specification provides further (nonobservational) information enabling the demand curve to be identified.

Thus, the identification problem in this context is as follows: what necessary or sufficient conditions on prior information can be stated so that A₁ can be distinguished (up to scalar multiplication) from all other vectors in the row space of A? Equivalently, what are necessary or sufficient conditions on prior information that permit the recovery of A₁ given Π and Ω?

If A₁ cannot be so recovered, the first equation of (1) is called underidentified; if, given any Π and Ω, there is a unique way of recovering A₁, that equation is called just identified; if the prior information is so rich as to enable the recovery of A₁ in two or more different and nonequivalent ways, that equation is called overidentified. In the last case, while the true reduced form yields the same A₁ whichever way of recovery is followed, this is in general not true of sample estimates obtained without imposing restrictions on the reduced form. The problem of using overidentifying information to avoid this difficulty and to secure greater efficiency is the central problem of simultaneous equation estimation but is different in kind from the identification problem discussed here (although the two overlap, as seen below).

Homogeneous linear restrictions on a single equation. The most common type of prior identifying information is the specification that certain variables in the system do not in fact appear in the first equation of (1), that is, that certain elements of A₁ are zero. Such exclusion restrictions form the leading special case of homogeneous linear restrictions on the elements of A₁.

Thus, suppose φ to be an N × K matrix of known elements, such that A₁φ = 0. Since A₁ can be distinguished by observational information from any vector not in the row space of A, it is obviously sufficient, for the identification of A₁, that the equation (η′A)φ = 0 be satisfied only for η′ a scalar multiple of (1 0 ... 0). If there is no further information on A₁, this is also necessary. Because the vectors η′ satisfying this equation form a space of dimension M minus the rank of Aφ, the condition is equivalent to the requirement that the rank of Aφ be M − 1, a condition known as the rank condition, which is due to Koopmans, Rubin, and Leipnik (1950, pp. 81-82), as is much of the basic work in this area.

Since the rank of Aφ cannot exceed that of φ, a necessary condition for the identifiability of A₁ under the stated restrictions is that the number of those restrictions which are independent be at least M − 1, a requirement known as the order condition. In the case of exclusion restrictions, this becomes the condition that the number of predetermined variables excluded from the first equation of (1) must be at least as great as the number of included endogenous variables less one.

While the order condition does not depend on unknown parameters, the rank condition does. However, if the order condition holds and the rank condition does not fail identically (because of restrictions on the other rows of A), then the rank condition holds almost everywhere in the space of the elements of A. This has led to a neglect of the rank condition in particular problems, a neglect that can be dangerous, since the rank condition may fail identically even if the order condition holds. Asymptotic tests of identifiability (and of overidentifying restrictions) are known for the linear restriction case and should be used in doubtful cases.
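The rank and order conditions for exclusion restrictions are easy to mechanize. The sketch below is illustrative only: it assumes a hypothetical two-equation market with variables ordered (quantity, price, constant, income), takes φ to be the selection matrix of the excluded columns (so that Aφ is just those columns of A), and computes ranks exactly. The helper names `rank` and `check_exclusions` are introduced here. The third call shows the danger just described: the order condition holds, but the rank condition fails identically because income is excluded from the other equation too.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix given as a list of rows (exact Gaussian elimination)."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def check_exclusions(A, excluded):
    """(order condition holds, rank condition holds) for the first row of A,
    under the exclusion restrictions A[0][j] == 0 for j in `excluded`."""
    M, N = len(A), len(A[0])
    if any(A[0][j] != 0 for j in excluded):
        raise ValueError("restrictions must hold for the first row")
    phi = [[1 if i == j else 0 for j in excluded] for i in range(N)]  # N x K
    A_phi = [[row[j] for j in excluded] for row in A]                # A phi
    return rank(phi) >= M - 1, rank(A_phi) == M - 1


supply = [1, -2, -1, 0]             # q - 2p - 1 = u_s  (income excluded)
demand = [1, 3, -5, -4]             # q + 3p - 5 - 4 income = u_d
demand_no_income = [1, 3, -5, 0]    # income excluded here as well

print(check_exclusions([supply, demand], [3]))            # (True, True)
print(check_exclusions([demand, supply], []))             # (False, False)
print(check_exclusions([supply, demand_no_income], [3]))  # (True, False)
```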

The difference between the rank of Aφ (appearing in the rank condition) and the rank of φ (appearing in the order condition) is the number of restrictions on the reduced form involving φ (see Fisher 1966, pp. 45-51).

Other restrictions. The case of restrictions other than those just discussed was chiefly investigated by Fisher, in a series of articles leading to a book (Fisher 1966), which in part examined questions opened by Koopmans, Rubin, and Leipnik (1950, pp. 93-110; Wald 1950, on identification of individual parameters, should also be mentioned). While generalizations of the rank and order conditions tend to have a prominent place in the discussion, other results are also available. The restrictions considered fall into two categories: first, restrictions on the elements of Σ; second, more general restrictions on the elements of A₁.

Working (1927) had observed, for example, that if the supply curve is known to shift greatly relative to the demand curve, the latter is traced out by the intersection points. When such shifting is due to a variable present in one equation but not in the other, the identifiability of the latter is due to an exclusion restriction. On the other hand, such shifting may be due to a greater disturbance variance, which suggests that restrictions on the relative magnitude of the diagonal elements of Σ can be used for identification. This is indeed the case, provided those restrictions are carefully stated. The results are related to the conditions for the proximity theorem of Wold (1953, p. 189) as to the negligible bias (or inconsistency) of least squares when the disturbance variance or the correlations between disturbance and regressors are small.
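Working's observation can be made quantitative under simple assumptions: linear curves, demand q = a − bp + u_d and supply q = c + dp + u_s, uncorrelated disturbances, and hypothetical parameter values. The least-squares slope of quantity on price then has probability limit d − var(u_s)(b + d)/[var(u_d) + var(u_s)], which approaches the demand slope −b as the supply disturbance variance dominates, and the supply slope d in the opposite case. The sketch computes this limit (derived here, not in the article) and checks it against a seeded simulation; the function names are illustrative.

```python
import random


def population_ols_slope(b, d, var_d, var_s):
    """plim of the least-squares slope of q on p when demand is
    q = a - b p + u_d, supply is q = c + d p + u_s, and u_d, u_s are
    uncorrelated with variances var_d and var_s."""
    return d - var_s * (b + d) / (var_d + var_s)


def simulate_slope(a, b, c, d, sd_d, sd_s, n, seed=1):
    """Least-squares slope of q on p from n simulated market equilibria."""
    rng = random.Random(seed)
    ps, qs = [], []
    for _ in range(n):
        ud, us = rng.gauss(0, sd_d), rng.gauss(0, sd_s)
        p = (a - c + ud - us) / (b + d)   # market-clearing price
        ps.append(p)
        qs.append(c + d * p + us)         # quantity lies on the supply curve
    mp, mq = sum(ps) / n, sum(qs) / n
    cov = sum((x - mp) * (y - mq) for x, y in zip(ps, qs))
    var = sum((x - mp) ** 2 for x in ps)
    return cov / var


b, d = 1.0, 2.0
print(population_ols_slope(b, d, var_d=0.01, var_s=100.0))  # near -b: demand traced
print(population_ols_slope(b, d, var_d=100.0, var_s=0.01))  # near d: supply traced
print(simulate_slope(10.0, b, 1.0, d, sd_d=0.1, sd_s=10.0, n=20000))
```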

Wold's work (1953, pp. 14, 49-53, and elsewhere) on recursive systems, which showed least squares to be an appropriate estimator if B is triangular and Σ diagonal, and Fisher's matrix generalization (1961) to block-recursive systems, suggested the study of identifiability in such cases and the extension to other cases in which particular off-diagonal elements of Σ are known to be zero (disturbances from particular pairs of equations uncorrelated). Special cases had been considered by Koopmans, Rubin, and Leipnik (1950, pp. 103-105) and by Koopmans in his classic expository article (1953, p. 34). Aside from the generalization of the rank and order conditions, the results show clearly the way in which such restrictions interact with those on A to make the identifiability of one equation depend on that of others.
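Wold's result for recursive systems can be illustrated in the smallest case, a hypothetical two-equation system y₁ = g z + u₁, y₂ = β y₁ + u₂, so that B is triangular. Least squares on the second equation has probability limit β + cov(u₁, u₂)/var(y₁), so it is consistent exactly when Σ is diagonal. The sketch below assumes z, u₁, and an extra noise term are independent standard normals, builds u₂ = c·u₁ + noise so that cov(u₁, u₂) = c, and compares the formula with a seeded simulation; the function names are illustrative.

```python
import random


def plim_ols_y2_on_y1(beta, g, var_z, s11, s12):
    """plim of least squares for y2 = beta*y1 + u2 in the recursive system
    y1 = g*z + u1, y2 = beta*y1 + u2, with z independent of (u1, u2)."""
    var_y1 = g ** 2 * var_z + s11        # variance of y1
    return beta + s12 / var_y1           # cov(y1, u2) = cov(u1, u2) = s12


def simulated_ols_y2_on_y1(beta, g, c, n, seed=2):
    """u1 ~ N(0, 1) and u2 = c*u1 + e with e ~ N(0, 1), so s11 = 1, s12 = c."""
    rng = random.Random(seed)
    sxy = sxx = 0.0
    for _ in range(n):
        z, u1 = rng.gauss(0, 1), rng.gauss(0, 1)
        u2 = c * u1 + rng.gauss(0, 1)
        y1 = g * z + u1
        y2 = beta * y1 + u2
        sxy += y1 * y2
        sxx += y1 * y1
    return sxy / sxx                     # least squares through the origin


print(plim_ols_y2_on_y1(0.7, 1.0, 1.0, 1.0, 0.0), simulated_ols_y2_on_y1(0.7, 1.0, 0.0, 50000))
print(plim_ols_y2_on_y1(0.7, 1.0, 1.0, 1.0, 0.5), simulated_ols_y2_on_y1(0.7, 1.0, 0.5, 50000))
```

With Σ diagonal (c = 0) the estimate settles at β = 0.7; with correlated disturbances (c = 0.5) it settles near 0.95 instead.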

Finally, certain special cases, and the fact that equations nonlinear in the parameters can frequently (by Taylor series expansion) be made linear in the parameters but nonlinear in the variables, with nonlinear constraints on the parameters, led to the study of identification with nonlinear (or nonhomogeneous) constraints on A₁. This is a much more difficult problem than those already discussed, as it may easily happen that the true A₁ can be distinguished from any other vector in some neighborhood of A₁ without overall identifiability holding. As might be expected, local results (based on the rank and order conditions) are fairly easy to obtain, but useful global results are rather meager.
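The gap between local and global identification already appears in a scalar toy model, a hypothetical illustration rather than anything in the article. If E X = θ², then θ and −θ are observationally equivalent, so θ is not globally identified. Yet near any θ₀ ≠ 0 the derivative 2θ₀ is nonzero, and no other θ in the neighborhood |θ − θ₀| < |θ₀| produces the same distribution. The function names are illustrative.

```python
def mean_of_x(theta):
    """Hypothetical model: E X = theta**2, nonlinear in the parameter."""
    return theta ** 2


def equivalent_parameters(mu):
    """All theta giving E X = mu: the observationally equivalent set."""
    if mu < 0:
        return []
    root = mu ** 0.5
    return [root] if root == 0 else [-root, root]


theta0 = 1.5
# Globally: two structures generate the same distribution.
print(equivalent_parameters(mean_of_x(theta0)))  # [-1.5, 1.5]

# Locally: every other theta within distance 1.0 of theta0 gives a different mean.
neighborhood = [theta0 + k * 0.01 for k in range(-100, 101) if k != 0]
locally_distinct = all(mean_of_x(t) != mean_of_x(theta0) for t in neighborhood)
print(locally_distinct)  # True
```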

Other specifications of the model. The Taylor series argument just mentioned, as well as the frequent occurrence of models differing from (1) in that they are linear in the parameters and disturbances but not in the variables, also led Fisher to consider identifiability for such models. In these models, it may turn out that nonlinear transformations of the structure lead to equations in the same form as the first one, so that the result that A₁ can be observationally distinguished from vectors not in the row space of A can fail to hold (although such cases seem fairly special). Provided a systematic procedure is followed for expanding the model to include all linearly independent equations resulting from such transformations, the rank and order conditions can be applied directly to the expanded model. In such application, linearly independent functions of the same variable are counted as separate variables. It is clear also that such nonlinear transformations are restricted if there is information on the distribution of the disturbances, but the implications of this remain to be worked out.

A somewhat similar situation arises when the assumption that the elements of u are not serially correlated is dropped and the elements of z include lagged values of the endogenous variables. In this case it is possible that the lagging of an equation can be used together with linear transformation of the model to destroy identification. In such cases there may be underidentified equations of the reduced form as well as of the structure. This possibility was pointed out in an example by Koopmans, Rubin, and Leipnik (1950, pp. 109-110) but was shown to be of somewhat limited significance by Fisher (1966, pp. 168-175). He showed that the problem cannot arise if there is sufficient independent movement among the present and lagged values of the truly exogenous variables, a result connected to one of those for models nonlinear in the variables. In such cases the rank condition remains necessary and sufficient. The problem in nearly or completely self-contained models awaits further analysis.

Is identifiability discrete or continuous? Identifiability is apparently a discrete phenomenon: a set of parameters apparently either is or is not identifiable. This was emphasized by Liu (1960, for example), who pointed out that the prior restrictions used to achieve identification, like the very specification of the model itself, are invariably only approximations. Liu argued strongly that if the true, exact specification and prior restrictions were written down, the interrelatedness of economic phenomena would generally make structural equations underidentified.

Liu's argument that badly misspecified structures and restrictions, used to achieve identification, in fact lead only to trouble is clearly true; true, also, is his contention that econometric models and restrictions are only approximations and that those approximations may not be good ones. More troublesome than this, however, are the apparent implications of his argument as to the possibility of having a “good” approximation. If identifiability disappears as soon as any approximation enters in certain ways, no matter how close that approximation might be to the truth, then structural estimation ceases altogether to be possible.

This issue of principle was settled by Fisher (1961), who showed that identifiability can be considered continuous, in the sense that the probability limits of estimators known to be consistent under correct specification approach the true parameters as the specification errors approach zero, a generalization of Wold's proximity theorem. If the equation to be estimated is identifiable under correct specification, the commission of minor specification errors leads to only minor inconsistency.
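The continuity result can be seen in a hypothetical example. Suppose supply is q = c + dp + εy + u_s while demand is q = a − bp + gy + u_d, with income y uncorrelated with the disturbances, and suppose the supply slope d is estimated by instrumental variables with y as the instrument, that is, under the only approximately correct restriction ε = 0. The estimator's probability limit, derived for this sketch, is d + ε(b + d)/(g − ε), which tends to d as the specification error ε tends to zero. The code evaluates this limit and checks one value by seeded simulation; all names and numbers are assumptions for illustration.

```python
import random


def plim_iv_supply_slope(b, d, g, eps):
    """plim of the IV estimate of d (instrument: y) when supply is
    q = c + d p + eps*y + u_s but y is treated as excluded from it,
    and demand is q = a - b p + g*y + u_d."""
    return d + eps * (b + d) / (g - eps)


def simulate_iv(a, b, c, d, g, eps, n, seed=3):
    """IV estimate cov(y, q) / cov(y, p) from n simulated equilibria."""
    rng = random.Random(seed)
    ys, ps, qs = [], [], []
    for _ in range(n):
        y = rng.gauss(0, 1)
        ud, us = rng.gauss(0, 1), rng.gauss(0, 1)
        p = (a - c + (g - eps) * y + ud - us) / (b + d)  # market-clearing price
        ys.append(y)
        ps.append(p)
        qs.append(c + d * p + eps * y + us)
    my, mp, mq = (sum(v) / n for v in (ys, ps, qs))
    cov_yq = sum((u - my) * (w - mq) for u, w in zip(ys, qs))
    cov_yp = sum((u - my) * (w - mp) for u, w in zip(ys, ps))
    return cov_yq / cov_yp


for eps in (0.5, 0.1, 0.01, 0.0):
    print(eps, plim_iv_supply_slope(b=1.0, d=2.0, g=3.0, eps=eps))
print(simulate_iv(10.0, 1.0, 1.0, 2.0, 3.0, 0.1, n=50000))
```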

A number of other questions are then raised, however. Among them are the following: How good does an approximation have to be to lead to only minor inconsistency? To what extent should only approximate restrictions be imposed to achieve identification? What about overidentification, where the trade-off may be between consistency and minor gains in variance reduction? What can be said about the relative robustness of the different simultaneous equation estimators to the sorts of minor specification error discussed by Liu?

Clearly, once identifiability is considered continuous, the identification problem tends to merge with the estimation problem, rather than be logically prior to it. It seems likely that both can best be approached through an explicit recognition of the approximate nature of specification, for example, by a Bayesian analysis with exact prior restrictions replaced by prior distributions on the functions of the parameters to be restricted [see Bayesian Inference]. Work on this formidable problem is just beginning (see, for example, Drèze 1962; Reiersøl 1964).
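A minimal conjugate-normal sketch of this idea, built on the θ₁ − θ₂ example from the beginning of the article (the priors, the function name, and all numbers are assumptions): the exact restriction θ₂ = m is replaced by a prior θ₂ ~ N(m, s²), θ₁ gets a nearly flat prior N(0, τ²), and the data supply x̄ ~ N(θ₁ − θ₂, v) with v = σ²/n. The posterior covariance does not depend on m. As v shrinks, the posterior variance of θ₁ − θ₂ goes to zero, but that of θ₁ levels off near s²: however much data accumulate, what is learned about θ₁ is limited by the prior information.

```python
def posterior_cov(v, tau2, s2):
    """Posterior covariance of (theta1, theta2) when xbar ~ N(theta1 - theta2, v),
    theta1 ~ N(0, tau2), theta2 ~ N(m, s2), independently.
    Posterior precision = prior precision + a a' / v, with a = (1, -1)."""
    p11 = 1 / tau2 + 1 / v
    p22 = 1 / s2 + 1 / v
    p12 = -1 / v
    det = p11 * p22 - p12 * p12
    return [[p22 / det, -p12 / det], [-p12 / det, p11 / det]]


for v in (1.0, 1e-2, 1e-4):  # v = sigma^2 / n shrinks as data accumulate
    S = posterior_cov(v, tau2=1e4, s2=0.25)
    var_theta1 = S[0][0]
    var_diff = S[0][0] + S[1][1] - 2 * S[0][1]   # variance of theta1 - theta2
    print(v, round(var_theta1, 4), var_diff)
```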

Franklin M. Fisher

BIBLIOGRAPHY

Drèze, Jacques 1962 The Bayesian Approach to Simultaneous Equations Estimation. O.N.R. Research Memorandum No. 67. Unpublished manuscript, Northwestern University.

Fisher, Franklin M. 1961 On the Cost of Approximate Specification in Simultaneous Equation Estimation. Econometrica 29:139-170.

Fisher, Franklin M. 1966 The Identification Problem in Econometrics. New York: McGraw-Hill.

Haavelmo, Trygve 1943 The Statistical Implications of a System of Simultaneous Equations. Econometrica 11:1-12.

Koopmans, Tjalling C. 1953 Identification Problems in Economic Model Construction. Pages 27-48 in William C. Hood and Tjalling C. Koopmans (editors), Studies in Econometric Method. Cowles Commission for Research in Economics, Monograph No. 14. New York: Wiley.

Koopmans, Tjalling C.; Rubin, H.; and Leipnik, R. B. (1950) 1958 Measuring the Equation Systems of Dynamic Economics. Pages 53-237 in Tjalling C. Koopmans (editor), Statistical Inference in Dynamic Economic Models. Cowles Commission for Research in Economics, Monograph No. 10. New York: Wiley.

Liu, Ta-Chung 1960 Underidentification, Structural Estimation, and Forecasting. Econometrica 28:855-865.

Reiersøl, Olav 1950a On the Identifiability of Parameters in Thurstone's Multiple Factor Analysis. Psychometrika 15:121-149.

Reiersøl, Olav 1950b Identifiability of a Linear Relation Between Variables Which Are Subject to Error. Econometrica 18:375-389.

Reiersøl, Olav 1964 Identifiability, Estimability, Pheno-restricting Specifications, and Zero Lagrange Multipliers in the Analysis of Variance. Skandinavisk aktuarietidskrift 46:131-142.

Wald, A. (1950) 1958 Note on the Identification of Economic Relations. Pages 238-244 in Tjalling C. Koopmans (editor), Statistical Inference in Dynamic Economic Models. Cowles Commission for Research in Economics, Monograph No. 10. New York: Wiley.

Wold, Herman 1953 Demand Analysis: A Study in Econometrics. New York: Wiley.

Working, E. J. (1927) 1952 What Do Statistical “Demand Curves” Show? Pages 97-115 in American Economic Association, Readings in Price Theory. Edited by G. J. Stigler and K. E. Boulding. Homewood, Ill.: Irwin. → First published in Volume 41 of the Quarterly Journal of Economics.

International Encyclopedia of the Social Sciences