Causal Inference Models
Note: Although the following article has not been revised for this edition of the Encyclopedia, the substantive coverage is currently appropriate. The editors have provided a list of recent works at the end of the article to facilitate research and exploration of the topic.
The notion of causality has been controversial for a very long time, and yet neither scientists, social scientists, nor laypeople have been able to think constructively without using a set of explanatory concepts that, explicitly or implicitly, imply causes and effects. Sometimes other words are substituted, for example, "consequences," "results," or "influences." Even worse, there are vague terms such as "leads to," "reflects," "stems from," "derives from," "articulates with," or "follows from," which often appear in sentences that seem almost deliberately ambiguous in avoiding causal terminology. Whenever such vague phrases are used throughout a theoretical work, or whenever one merely states that two variables are correlated with one another, it may not be recognized that what purports to be an "explanation" is not a genuine theoretical explanation at all.
It is, of course, possible to provide a very narrow definition of causation and then to argue that such a notion is totally inadequate as a basis for scientific explanation. If, for example, one defines causation in such a way that there can be only a single cause of a given phenomenon, or that a necessary condition, a sufficient condition, or both must be satisfied, or that absolute certainty is required to establish causation, then indeed very few persons would ever be willing to use the term. In fact, in sociology, causal terminology was almost deliberately avoided before the 1960s, except in reports of experimental research. Since that time, however, the notion of multivariate causation, combined with the explicit allowance for impacts of neglected factors, has gradually replaced these more restrictive usages.
There is general agreement that causation can never be proven, and of course in a strict sense no statements about the real world can ever be "proven" correct, if only because of indeterminacies produced by measurement errors and the necessity of relying on evidence that has been filtered through imperfect sense organs or fallible measuring instruments. One may accept the fact that, strictly speaking, one is always dealing with causal models of real-world processes and that one's inferences concerning the adequacy of such models must inevitably be based on a combination of empirical evidence and untested assumptions, some of which are about underlying causal processes that can never be subject to empirical verification. This is basically true for all scientific evidence, though the assumptions one may require in making interpretations or explanations of the underlying reality may be more or less plausible in view of supplementary information that may be available. Unfortunately, in the social sciences such supplementary information is likely to be of questionable quality, thereby reducing the degree of faith one has in whatever causal assertions have been made.
In the causal modeling literature, which is basically compatible with the so-called structural equation approach in econometrics, equation systems are constructed so as to represent as well as possible a presumed real-world situation, given whatever limitations have been imposed in terms of omitted variables that produce unknown biases, possibly incorrect functional forms for one's equations, measurement errors, or, in general, what are termed specification errors in the equations. Since such limitations are always present, any particular equation will contain a disturbance term that is assumed to behave in a certain fashion. One's assumptions about such disturbances are both critical for one's inferences and also (for the most part) inherently untestable with the data at hand. This in turn means that such inferences must always be tentative. One never "finds" effects, for example, but only infers them on the basis of findings about covariances and temporal sequences and a set of untested theoretical assumptions. To the degree that such assumptions are hidden from view, and to the degree that they are also incorrect, both social scientists and their readers may be seriously misled.
In the recursive models commonly in use in sociology, it is assumed that causal influences can be ordered, such that one may designate an X1 that does not depend on any of the remaining variables in the system but, presumably, varies as a result of exogenous causes that have been ignored in the theory. A second variable, X2, may then be found that may depend upon X1 as well as a different set of exogenous factors, but the assumption is that X2 does not affect X1, either directly or through any other mechanism. One then builds up the system, equation by equation, by locating an X3 that may depend on either or both of X1 and X2, plus still another set of exogenous factors, but with the assumption that neither of the first two X's is affected by X3. Adding still more variables in this recursive fashion, and for the time being assuming linear and additive relationships, one arrives at the system of equations shown in equation system 1, in which the disturbance terms are represented by the εi and where, for the sake of simplicity, the constant terms have been omitted.
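Under the assumptions just listed (linear, additive relationships with constants omitted), a system of this kind can be written as below. This display is a reconstruction from the verbal description rather than a copy of the original typeset equations; the symbol βij, read as the direct effect of Xj on Xi, is notation assumed here to match the "betas" referred to later in the article.

```latex
\begin{aligned}
X_1 &= \varepsilon_1 \\
X_2 &= \beta_{21}X_1 + \varepsilon_2 \\
X_3 &= \beta_{31}X_1 + \beta_{32}X_2 + \varepsilon_3 \\
X_4 &= \beta_{41}X_1 + \beta_{42}X_2 + \beta_{43}X_3 + \varepsilon_4 \\
&\;\;\vdots \\
X_k &= \beta_{k1}X_1 + \beta_{k2}X_2 + \cdots + \beta_{k,k-1}X_{k-1} + \varepsilon_k
\end{aligned}
```

Every coefficient linking Xi to a later variable is absent, which is what makes the matrix of betas triangular.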
The essential property of recursive equations that provides a simple causal interpretation is that changes made in any given equation may affect subsequent ones but will not affect any of the prior equations. Thus, if a mysterious demon were to change one of the parameters in the equation for X3, this would ordinarily affect not only X3 but also X4, X5, and so on through Xk, but it could have no effect on either of the first two equations, which do not depend on X3 or any of the later variables in the system. As will be discussed below, this special property of recursive systems does not hold in the more general setup involving variables that may be reciprocally interrelated. Indeed, it is this recursive property that justifies dealing with the equations separately and sequentially as single equations. The assumptions required for such a system are therefore implicit in all data analyses (e.g., log-linear modeling, analysis of variance, or comparisons among means) that are typically discussed in first and second courses in applied statistics.
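A small simulation can make the "demon" argument concrete. The sketch below assumes hypothetical coefficient values for the five arrows of the Figure 1 model discussed later (X1→X2, X1→X3, X3→X4, X2→X5, X4→X5) and standard normal disturbances; the helper name `simulate_recursive` is illustrative only. Holding the disturbances fixed and changing one parameter in the X3 equation leaves X1 and X2 untouched, while X3, X4, and X5 all change.

```python
import numpy as np

def simulate_recursive(beta, n=1_000, seed=0):
    """Generate X1..X5 from a linear recursive system (hypothetical helper).

    beta maps (i, j) to the direct effect of Xj on Xi, with j < i.
    A fixed seed reuses the same disturbances, so two runs differ only through beta.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((5, n))  # one disturbance series per equation
    x = np.zeros((5, n))
    for i in range(5):  # solve the equations in causal order
        x[i] = eps[i] + sum(beta.get((i + 1, j + 1), 0.0) * x[j] for j in range(i))
    return x

# Hypothetical coefficients for the arrows X1->X2, X1->X3, X3->X4, X2->X5, X4->X5
base = {(2, 1): 0.5, (3, 1): 0.4, (4, 3): 0.6, (5, 2): 0.3, (5, 4): 0.5}
changed = {**base, (3, 1): -0.4}  # the "demon" alters a parameter in the X3 equation

before, after = simulate_recursive(base), simulate_recursive(changed)
print(np.allclose(before[:2], after[:2]))  # → True: X1 and X2 are unaffected
print(np.allclose(before[2:], after[2:]))  # → False: X3, X4, X5 change
```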
Assumptions are always critical in causal analyses or—what is often not recognized—in any kind of theoretical interpretation of empirical data. Some such assumptions are implied by the forms of one's equations, in this case linearity and additivity. Fortunately, these types of assumptions can be rather simply modified by, for example, introducing second- or higher-degree terms, log functions, or interaction terms. It is a mistake to claim—as some critics have done—that causal modeling requires one to assume such restrictive functional forms.
Far more important are two other kinds of assumptions: those about measurement errors and those concerning the disturbance terms representing the effects of all omitted variables. Simple causal modeling of the type represented by equation system 1 requires the naive assumption that all variables have been perfectly measured, an assumption that, unfortunately, is often ignored in empirical investigations using path analyses based on exactly this type of causal system. Measurement errors require one to make an auxiliary set of assumptions regarding both the sources of measurement-error bias and the causal connections between so-called true scores and measured indicators. In principle, however, such measurement-error assumptions can be explicitly built into the equation system and empirical estimates obtained, provided there are enough multiple indicators to solve for the unknowns produced by these measurement errors, a possibility that will be discussed in the final section.
In many instances, assumptions about one's disturbance terms are even more problematic but equally critical. In verbal statements of theoretical arguments one often comes across the phrase "other things being equal," or the notion that in the ideal experimental design all causes except one must be literally held constant if causal inferences are to be made. Yet both the phrase "other things being equal" and the restrictive assumption of the perfect experiment beg the question of how one can possibly know that "other things" are in fact equal, that all "relevant" variables have been held constant, or that there are no possible sources of measurement bias. Obviously, an alert critic may always suggest another variable that does in fact vary across the settings studied or that has not been held constant in an experiment.
In recursive causal models this highly restrictive notion concerning the constancy of all possible alternative causes is relaxed by allowing for a disturbance term that varies precisely because these alternative causes are not all constant. But if so, can one get by without requiring any other assumptions about their effects? Indeed not. One must assume, essentially, that the omitted variables affecting any one of the X's are uncorrelated with those that affect the others. If so, it can then be shown that the disturbance term in each equation will be uncorrelated with each of the independent variables appearing on the right-hand side, thus justifying the use of ordinary least-squares estimating procedures. In practical terms, this means that if one has had to omit any important causes of a given variable, one must also be willing to assume that they do not systematically affect any of its presumed causes that have been explicitly included in the model. A skeptic may, of course, be able to identify one or more such disturbing influences, in which case a modified model may need to be constructed and tested. For example, if ε3 and ε4 contain a common cause that can be identified and measured, such a variable needs to be introduced explicitly into the model as a cause of both X3 and X4.
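The bias that arises when this assumption fails can be sketched with simulated data. Here a hypothetical common cause W enters both ε3 and ε4; all coefficient values are assumptions chosen for illustration, and `ols` is a small illustrative helper. Least squares applied to the X4 equation with W left in the disturbances overstates the effect of X3, whereas introducing W explicitly as a cause of both X3 and X4 recovers it.

```python
import numpy as np

def ols(y, *regressors):
    """Least-squares slopes of y on the given regressors (intercept included, then dropped)."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(1)
n = 100_000
w = rng.standard_normal(n)                     # hypothetical common cause of X3 and X4
x1 = rng.standard_normal(n)
x3 = 0.5 * x1 + w + rng.standard_normal(n)
x4 = 0.4 * x3 + w + rng.standard_normal(n)     # true direct effect of X3 on X4 is 0.4

naive = ols(x4, x3)[0]         # W left inside the disturbances of both equations
adjusted = ols(x4, x3, w)[0]   # W introduced explicitly into the X4 equation
print(round(naive, 2), round(adjusted, 2))
```

With these values the naive estimate should land near 0.84, the true 0.4 plus a bias of var(W)/var(X3) = 1/2.25, while the adjusted estimate should be close to 0.40.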
Perhaps the five-variable model of Figure 1 will help the reader visualize what is involved. To be specific, suppose X5, the ultimate dependent variable, represents some behavior, say, the actual number of delinquent acts a youth has perpetrated. Let X3 and X4, respectively, represent two internal states, say, guilt and self-esteem. Finally, suppose X1 and X2 are two setting variables, parental education and delinquency rates within the youth's neighborhood, with the latter variable being influenced by the former through the parents' ability to select among residential areas.
The fact that the disturbance term arrows are unconnected in Figure 1 represents the assumption that they are mutually uncorrelated, or that the omitted variables affecting any given Xi are uncorrelated with any of its explicitly included causes among the remaining X's. If ordinary least squares is used to estimate the parameters in this model, then the empirically obtained residuals ei will indeed be uncorrelated with the independent X's in their respective equations, but since this automatically occurs as a property of least-squares estimation, it cannot be used as the basis for a test of our a priori assumptions about the true disturbances.
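The point that residual orthogonality cannot serve as a test can be checked numerically. In the hypothetical setup below the true disturbance shares an omitted cause with the regressor, so the assumption is false, yet the least-squares residuals are still uncorrelated with the regressor (up to rounding error), exactly as the algebra of least squares guarantees.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
w = rng.standard_normal(n)                      # hypothetical omitted cause
x3 = w + rng.standard_normal(n)
true_disturbance = w + rng.standard_normal(n)   # shares w with x3, so the assumption fails
x4 = 0.4 * x3 + true_disturbance

X = np.column_stack([np.ones(n), x3])           # intercept plus regressor
coef, *_ = np.linalg.lstsq(X, x4, rcond=None)
residuals = x4 - X @ coef

corr_true = np.corrcoef(x3, true_disturbance)[0, 1]  # about 0.5
corr_resid = np.corrcoef(x3, residuals)[0, 1]        # zero up to rounding error
print(corr_true, corr_resid)
```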
If one is unwilling to accept these assumptions about the behavior of omitted variables, the only way out of this situation is to reformulate the model and to introduce further complexities in the form of additional measured variables. At some point, however, one must stop and make the (untestable) assumption that the revised causal model is "closed" in the sense that omitted variables do not disturb the patterning of relationships among the included variables.
Assuming such theoretical closure, then, one is in a position to estimate the parameters, attach their numerical values to the diagram, and also evaluate the model in terms of its consistency with the data. In the model of Figure 1, for instance, there are no direct arrows between X2 and X3, between X4 and both X1 and X2, and between X5 and both X1 and X3. This means that with controls for all prior or intervening variables, the respective partial correlations can be predicted to be zero, apart from sampling errors. One arrives at the predictions in equation system 2.
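For the five arrows omitted from Figure 1, and controlling for all prior or intervening variables in the ordering X1 through X5, the zero predictions take the following form. The display is reconstructed from the list of omitted arrows above rather than copied from the original.

```latex
\begin{aligned}
r_{23\cdot 1} &= 0 \\
r_{14\cdot 23} &= 0 \\
r_{24\cdot 13} &= 0 \\
r_{15\cdot 234} &= 0 \\
r_{35\cdot 124} &= 0
\end{aligned}
```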
Thus, for each omitted arrow one may write out a specific "zero" prediction. Where arrows have been drawn in, it may have been possible to predict the signs of direct links, and these directional predictions may also be used to evaluate the model. Notice a very important property of recursive models. In relating any pair of variables, say, X2 and X3, one expects to control for antecedent or intervening variables, but it is not appropriate to introduce as controls any variables that appear as subsequent variables in the model (e.g., X4 or X5). The simple phrase "controlling for all relevant variables" should therefore not be construed to include variables that are presumed to depend on both of the variables being studied. In an experimental setup, one would presumably be unable to carry out such an absurd operation, but in statistical calculations, which involve pencil-and-paper controlling only, there is nothing to prevent one from doing so.
It is unfortunately the case that controls for dependent variables can sometimes be made inadvertently through one's research design (Blalock 1985). For example, one may select respondents from a list that is based on a dependent variable such as committing a particular crime, entering a given hospital, living in a certain residential area, or being employed in a particular factory. Whenever such improper controls are introduced, whether recognized explicitly or not, our inferences regarding relationships among causally prior variables are likely to be incorrect. If, for example, X1 and X2 are totally uncorrelated, but one controls for their common effect, X3, then even though r12 = 0, it will turn out that r12.3 ≠ 0.
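The effect of controlling for a common effect, and of selecting cases on it, can be illustrated with simulated data. X1 and X2 below are generated independently and X3 is their common effect; the coefficients and the selection rule (keeping only cases with X3 > 1, standing in for sampling from a list based on a dependent variable) are hypothetical. The helper `partial_corr` applies the standard first-order partial correlation formula.

```python
import numpy as np

def partial_corr(a, b, c):
    """First-order partial correlation r_ab.c computed from the pairwise correlations."""
    r = np.corrcoef([a, b, c])
    r_ab, r_ac, r_bc = r[0, 1], r[0, 2], r[1, 2]
    return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))

rng = np.random.default_rng(3)
n = 100_000
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)                # generated independently of x1
x3 = x1 + x2 + rng.standard_normal(n)      # common effect of X1 and X2

r12 = np.corrcoef(x1, x2)[0, 1]            # close to 0
r12_3 = partial_corr(x1, x2, x3)           # an association induced by the control

# Selecting cases on the dependent variable (hypothetical rule: keep X3 > 1)
selected = x3 > 1
r12_selected = np.corrcoef(x1[selected], x2[selected])[0, 1]
print(round(r12, 3), round(r12_3, 3), round(r12_selected, 3))
```

With these values r12 should be near zero, r12.3 near −0.5 (that is, (0 − 1/3)/(1 − 1/3)), and the correlation within the selected cases clearly negative as well.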
Recursive models also provide justifications for common-sense rules of thumb regarding the conditions under which it is not necessary to control for prior or intervening variables. In the model of Figure 1, for example, it can be shown that although r24.13 = 0, it would be sufficient to control for either X1 or X3 alone in order for the partial to disappear; controlling for both is not required. Similarly, in relating X3 to X5, the partial will be reduced to zero if one controls for either X2 and X4 or X1 and X4. It is not necessary to control for all three simultaneously. More generally, a number of simplifications become possible, depending on the patterning of omitted arrows, and these simplifications can be used to justify the omission of certain variables if these cannot be measured. If, for example, one could not measure X3, one could draw in a direct arrow from X1 to X4 without altering the remainder of the model. Without such an explicit causal model in front of us, however, the omission of variables must be justified on completely ad hoc grounds. The important point is that pragmatic reasons for such omissions should not be accepted without theoretical justifications.
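These claims about which controls suffice can be verified algebraically. The sketch below assumes hypothetical path coefficients for the five arrows of Figure 1, builds the correlation matrix those paths imply among standardized variables, and computes partial correlations from the inverse of the relevant submatrix (a standard identity). Indices are 0-based, so X1 is index 0.

```python
import numpy as np

# Hypothetical path coefficients for the arrows of Figure 1
p21, p31, p43, p52, p54 = 0.5, 0.4, 0.6, 0.3, 0.5

# Correlations implied among the standardized variables (0-based: X1 is index 0)
r = np.eye(5)
r[0, 1] = p21
r[0, 2] = p31
r[1, 2] = p31 * r[0, 1]
r[0, 3] = p43 * r[0, 2]
r[1, 3] = p43 * r[1, 2]
r[2, 3] = p43
r[0, 4] = p52 * r[0, 1] + p54 * r[0, 3]
r[1, 4] = p52 + p54 * r[1, 3]
r[2, 4] = p52 * r[1, 2] + p54 * r[2, 3]
r[3, 4] = p52 * r[1, 3] + p54
r = np.triu(r) + np.triu(r, 1).T   # mirror the upper triangle into the lower one

def partial(r, i, j, controls):
    """Partial correlation of variables i and j given the controls, via the inverse matrix."""
    idx = [i, j, *controls]
    prec = np.linalg.inv(r[np.ix_(idx, idx)])
    return -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])

print(partial(r, 1, 3, [0]), partial(r, 1, 3, [2]))        # r24.1 and r24.3: both ~0
print(partial(r, 2, 4, [1, 3]), partial(r, 2, 4, [0, 3]))  # r35.24 and r35.14: both ~0
print(partial(r, 2, 4, [3]))                               # r35.4 alone: not zero
```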
PATH ANALYSIS AND AN EXAMPLE
Sewall Wright (1934, 1960) introduced a form of causal modeling long before it became fashionable among sociologists. Wright, a population geneticist, worked in terms of standardized variables with unit variances and zero means. Expressing any given equation in terms of what he referred to as path coefficients, which in recursive modeling are equivalent to beta weights, Wright was able to derive a simple formula for decomposing the correlation between any pair of variables $x_i$ and $x_j$. The equation for any given variable can be written as $x_i = p_{i1}x_1 + p_{i2}x_2 + \cdots + p_{ik}x_k + u_i$, where the $p_{ij}$ represent standardized regression coefficients and where the lower-case x's refer to the standardized variables. One may then multiply both sides of the equation by $x_j$, the variable that is to be correlated with $x_i$, which gives $x_i x_j = p_{i1}x_1x_j + p_{i2}x_2x_j + \cdots + p_{ik}x_kx_j + u_ix_j$. Summing over all cases and dividing by the number of cases N, one has the results in equation system 3.
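Carrying out that summation term by term, and assuming (as in a correctly specified recursive model in which xj is causally prior to xi) that the disturbance ui is uncorrelated with xj so that its term vanishes, gives Wright's decomposition. The display is a reconstruction consistent with the equation for xi above.

```latex
\begin{aligned}
\frac{\sum x_i x_j}{N} &= p_{i1}\frac{\sum x_1 x_j}{N} + p_{i2}\frac{\sum x_2 x_j}{N} + \cdots + p_{ik}\frac{\sum x_k x_j}{N} + \frac{\sum u_i x_j}{N} \\
r_{ij} &= p_{i1} r_{1j} + p_{i2} r_{2j} + \cdots + p_{ik} r_{kj}
\end{aligned}
```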
The expression in equation system 3 enables one to decompose or partition any total correlation into a sum of terms, each of which consists of a path coefficient multiplied by a correlation coefficient, which itself may be decomposed in a similar way. In Wright's notation the path coefficients are written without any dots that indicate control variables but are indeed merely the (partial) regression coefficients for the standardized variables. Any given path coefficient, say p54, can be interpreted as the change that would be imparted in the dependent variable x5, in its standard deviation units, if the other variable x4 were to change by one of its standard deviation units, with the remaining explicitly included independent variables (here x1, x2, and x3) all held constant. In working with standardized variables one is able to simplify these expressions owing to the fact that rij = Σxixj/N, but one must pay the price of then having to work with standard deviation units that may vary across samples or populations. This, in turn, means that two sets of path coefficients for different samples, say men and women, cannot easily be compared since the standard deviations (say, in income earned) may be different.
In the case of the model of Figure 2, which is the same causal diagram as Figure 1, but with the relevant pij inserted, one may write out expressions for each of the rij as shown in equation system 4.
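Applying that rule to the five arrows of the model (x1→x2, x1→x3, x3→x4, x2→x5, x4→x5) gives the following set of expressions. They are reconstructed from the model as described in the text and agree with the special cases worked out in the next two paragraphs.

```latex
\begin{aligned}
r_{12} &= p_{21} \\
r_{13} &= p_{31} \\
r_{23} &= p_{31} r_{12} \\
r_{14} &= p_{43} r_{13} \\
r_{24} &= p_{43} r_{23} \\
r_{34} &= p_{43} \\
r_{15} &= p_{52} r_{12} + p_{54} r_{14} \\
r_{25} &= p_{52} + p_{54} r_{24} \\
r_{35} &= p_{52} r_{23} + p_{54} r_{34} \\
r_{45} &= p_{52} r_{24} + p_{54}
\end{aligned}
```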
In decomposing each of the total correlations, one takes the path coefficients for each of the arrows coming into the appropriate dependent variable and multiplies each of these by the total correlation between the variable at the source of the arrow and the "independent" variable in which one is interested. In the case of r12, this involves multiplying p21 by the correlation of x1 with itself, namely r11 = 1.0. Therefore one obtains the simple result that r12 = p21. Similar results obtain for r13 and r34. The decomposition of r23, however, results in the expression r23 = p31 r12 = p31 p21 = r12 r13, which of course also implies that r23.1 = 0, since the numerator of that partial, r23 − r12 r13, vanishes.
When one comes to the decomposition of correlations with x5, which has two direct paths into it, the expressions become more complex but also demonstrate the heuristic value of path analysis. For example, in the case of r35, this total correlation can be decomposed into two terms, one representing the indirect effects of x3 via the intervening variable x4, namely the product p54 p43, and the other the spurious association produced by the common cause x1, namely the more complex product p52 p31 p21. In the case of the correlation between x4 and x5 one obtains a similar result except that there is a direct effect term represented by the single coefficient p54.
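A simulation offers a rough check on this decomposition. The sketch below generates data from hypothetical unstandardized coefficients for the Figure 1 arrows, estimates path coefficients as least-squares slopes among standardized variables, and compares the sample r35 with the sum of the indirect and spurious components; the two should agree up to sampling error. The helper names are illustrative.

```python
import numpy as np

def standardize(v):
    """Rescale to zero mean and unit variance."""
    return (v - v.mean()) / v.std()

def path_coefs(y, *xs):
    """Path coefficients: least-squares slopes of standardized y on standardized xs."""
    X = np.column_stack([standardize(x) for x in xs])
    return np.linalg.lstsq(X, standardize(y), rcond=None)[0]

rng = np.random.default_rng(4)
n = 200_000
# Hypothetical raw coefficients for the Figure 1 arrows
x1 = rng.standard_normal(n)
x2 = 0.5 * x1 + rng.standard_normal(n)
x3 = 0.4 * x1 + rng.standard_normal(n)
x4 = 0.6 * x3 + rng.standard_normal(n)
x5 = 0.3 * x2 + 0.5 * x4 + rng.standard_normal(n)

(p21,) = path_coefs(x2, x1)
(p31,) = path_coefs(x3, x1)
(p43,) = path_coefs(x4, x3)
p52, p54 = path_coefs(x5, x2, x4)

r35 = np.corrcoef(x3, x5)[0, 1]
indirect = p54 * p43          # x3 -> x4 -> x5
spurious = p52 * p31 * p21    # x3 <- x1 -> x2 -> x5
print(round(r35, 3), round(indirect + spurious, 3))
```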
As a numerical substantive example consider the path model of Figure 3, which represents the basic model in Blau and Duncan's classic study, The American Occupational Structure (1967, p. 17). Two additional features of the Blau-Duncan model may be noted. A curved, double-headed arrow has been drawn between father's education and father's occupation, indicating that the causal paths between these two exogenous or independent variables have not been specified. This means that there is no p21 in the model, so that r12 cannot be decomposed. Its value of 0.516 has been inserted into the diagram, however. The implication of a failure to commit oneself on the direction of causation between these two variables is that decompositions of subsequent rij will involve expressions that are sometimes combinations of the relevant p's and the unexplained association between father's education and occupation. This, in turn, means that the indirect effects of one of these variables "through" the other cannot be assessed. One can determine the direct effects of, say, father's occupation on respondent's education, or its indirect effects on occupation in 1962 through first job, but not "through" father's education. If one had, instead, committed oneself to the directional flow from father's education to father's occupation, a not unreasonable assumption, then all indirect effects and spurious connections could be evaluated. Sometimes it is indeed necessary to make use of double-headed arrows when the direction of causation among the most causally prior variables cannot be specified, but one then gives up the ability to trace out those indirect effects or spurious associations that involve these unexplained correlations.
The second feature of the Blau-Duncan diagram worth noting involves the small, unattached arrows coming into each of the "dependent" variables in the model. These of course represent the disturbance terms, which in a correctly specified model are taken to be uncorrelated. But the magnitudes of these effects of outside variables are also provided in the diagram to indicate just how much variance remains unexplained by the model. Each of the path coefficients coming in from these outside variables, when squared, equals 1 − R², the proportion of variance that remains unexplained by all of the included explanatory variables. Thus there is considerable unexplained variance in respondent's education (0.738), first job (0.669), and occupation in 1962 (0.567), indicating, of course, plenty of room for other factors to operate. The challenge then becomes that of locating additional variables to improve the explanatory value of the model. This has, indeed, been an important stimulus to the development of the status attainment literature that the Blau-Duncan study subsequently spawned.
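The relationship between the residual paths and 1 − R² is simple arithmetic. The sketch below takes the unexplained variances quoted above and converts them into the corresponding residual path coefficients (their square roots) and into R².

```python
import math

# Unexplained variances (1 - R^2) quoted in the text for the Blau-Duncan model
unexplained = {"respondent's education": 0.738, "first job": 0.669, "occupation in 1962": 0.567}

# residual path = sqrt(1 - R^2); R^2 = 1 - unexplained variance
results = {outcome: (math.sqrt(u), 1 - u) for outcome, u in unexplained.items()}

for outcome, (residual_path, r_squared) in results.items():
    print(f"{outcome}: residual path {residual_path:.3f}, R^2 {r_squared:.3f}")
```

This gives residual paths of about 0.859, 0.818, and 0.753, and R² values of 0.262, 0.331, and 0.433.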
The placement of numerical values in such path diagrams enables the reader to assess, rather easily, the relative magnitudes of the several direct effects. Thus, father's education is inferred to have a moderately strong direct effect on respondent's education, but none on the respondent's occupational status. Father's occupation is estimated to have somewhat weaker direct effects on both respondent's education and first job but a much weaker direct effect on his later occupation. The direct effects of respondent's education on first job are estimated to be only somewhat stronger than those on the subsequent occupation, with first job controlled. In evaluating these numerical values, however, one must keep in mind that all variables have been expressed in standard deviation units rather than some "natural" unit such as years of schooling. This in turn means that if variances for, say, men and women or blacks and whites are not the same, then comparisons across samples should be made in terms of unstandardized, rather than standardized, coefficients.
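The sensitivity of standardized coefficients to group variances can be seen in a small simulation. Both hypothetical groups below share the same unstandardized effect of years of schooling (2.0 units of a made-up outcome per year), but schooling varies more in the first group, so its standardized coefficient comes out noticeably larger. The helper `slopes` is illustrative.

```python
import numpy as np

def slopes(x, y):
    """Return (unstandardized slope b, standardized slope beta) of y on x."""
    b = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    beta = b * np.std(x, ddof=1) / np.std(y, ddof=1)
    return b, beta

rng = np.random.default_rng(5)
n = 100_000
# Hypothetical groups: same causal effect (2.0), different spread of schooling
school_a = rng.normal(12, 3.0, n)
school_b = rng.normal(12, 1.5, n)
outcome_a = 2.0 * school_a + rng.normal(0, 6.0, n)
outcome_b = 2.0 * school_b + rng.normal(0, 6.0, n)

b_a, beta_a = slopes(school_a, outcome_a)
b_b, beta_b = slopes(school_b, outcome_b)
print(round(b_a, 2), round(b_b, 2), round(beta_a, 2), round(beta_b, 2))
```

Analytically, beta should be about 0.71 in the first group and about 0.45 in the second, while b is about 2.0 in both, which is why cross-group comparisons are better made with the unstandardized coefficients.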
SIMULTANEOUS EQUATION MODELS
Recursive modeling requires one to make rather strong assumptions about temporal sequences. This does not, in itself, rule out the possibility of reciprocal causation provided that lag periods can be specified. For example, the actions of party A may affect the later behaviors of party B, which in turn affect still later reactions of the first party. Ideally, if one could watch a dynamic interaction process such as that among family members, and accurately record the temporal sequences, one could specify a recursive model in which the behaviors of the same individual could be represented by distinct variables that have been temporally ordered. Indeed, Strotz and Wold (1960) have cogently argued that many simultaneous equation models appearing in the econometric literature have been misspecified precisely because they do not capture such dynamic features, which in causal models should ideally involve specified lag periods. For example, prices and quantities of goods do not simply "seek equilibrium." Instead, there are at least three kinds of autonomous actors (producers, customers, and retailers or wholesalers) who react to one another's behaviors with varying lag periods.
In many instances, however, one cannot collect the kinds of data necessary to ascertain these lag periods. Furthermore, especially in the case of aggregated data, the lag periods for different actors may not coincide, so that macro-level changes are for all practical purposes continuous rather than discrete. Population size, literacy levels, urbanization, industrialization, political alienation, and so forth are all changing at once. How can such situations be modeled and what additional complications do they introduce?
In the general case there will be k mutually interdependent variables Xi that may each directly affect the others. These are referred to as endogenous variables, and the entire set has the property that every one of them feeds back to affect at least one of the others. Given this situation, it turns out that it is not legitimate to break the equations apart in order to estimate the parameters one equation at a time, as one does in the case of a recursive setup. Since any given variable may affect the others, its omitted causes, represented by the disturbance terms εi, will also directly or indirectly affect the remaining endogenous variables, so that it becomes totally unreasonable to assume these disturbances to be uncorrelated with the "independent" variables in their respective equations. Thus, one of the critical assumptions required to justify the use of ordinary least squares cannot legitimately be made, meaning that a wide variety of single-equation techniques discussed in the statistical literature must be modified.
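Why the disturbance cannot be assumed uncorrelated with an endogenous regressor can be shown with a hypothetical two-equation system in which X1 and X2 affect each other. The coefficient values and the predetermined variables Z1 and Z2 are assumptions for illustration. Solving the two equations jointly shows that ε1 enters X2 through X1, and ordinary least squares applied to the X1 equation then overstates β12.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
b12, b21 = 0.5, 0.3                  # hypothetical reciprocal effects
z1, z2 = rng.standard_normal((2, n))
e1, e2 = rng.standard_normal((2, n))

# Solve the structural equations jointly (reduced form):
#   X1 = b12*X2 + Z1 + e1,   X2 = b21*X1 + Z2 + e2
det = 1 - b12 * b21
x1 = (z1 + e1 + b12 * (z2 + e2)) / det
x2 = (z2 + e2 + b21 * (z1 + e1)) / det

corr_x2_e1 = np.corrcoef(x2, e1)[0, 1]   # nonzero: e1 feeds back into X2

# Ordinary least squares for the X1 equation, treating X2 as if it were exogenous
X = np.column_stack([x2, z1])
ols_coef, *_ = np.linalg.lstsq(X, x1, rcond=None)
print(round(corr_x2_e1, 2), ols_coef.round(2))
```

With these values the correlation between X2 and ε1 is about 0.2, and the OLS coefficient for X2 tends toward roughly 0.62 rather than the true 0.5.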
There is an even more serious problem, however, which can be seen more readily if one writes out the set of equations, one for each of the k endogenous variables. To this set are added another set of what are called predetermined variables, Zj, that will play an essential role to be discussed below. Our equation set now becomes as shown in equation system 5.
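Written out with m predetermined variables (the symbol m is notation assumed here), a system of this kind has the general form below, with each endogenous variable potentially depending on all the others and on every Z before any a priori restrictions are imposed. It is reconstructed from the definitions given in the next paragraph rather than copied from the original display.

```latex
\begin{aligned}
X_1 &= \beta_{12}X_2 + \beta_{13}X_3 + \cdots + \beta_{1k}X_k + \gamma_{11}Z_1 + \gamma_{12}Z_2 + \cdots + \gamma_{1m}Z_m + \varepsilon_1 \\
X_2 &= \beta_{21}X_1 + \beta_{23}X_3 + \cdots + \beta_{2k}X_k + \gamma_{21}Z_1 + \gamma_{22}Z_2 + \cdots + \gamma_{2m}Z_m + \varepsilon_2 \\
&\;\;\vdots \\
X_k &= \beta_{k1}X_1 + \beta_{k2}X_2 + \cdots + \beta_{k,k-1}X_{k-1} + \gamma_{k1}Z_1 + \gamma_{k2}Z_2 + \cdots + \gamma_{km}Z_m + \varepsilon_k
\end{aligned}
```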
The regression coefficients (called "structural parameters") that connect the several endogenous variables in equation system 5 are designated as βij and are distinguished from the γij representing the direct effects of the predetermined Zj on the relevant Xi. This notational distinction is made because the two kinds of variables play different roles in the model. Although it cannot be assumed that the disturbances εi are uncorrelated with the endogenous X's that appear on the right-hand sides of their respective equations, one may make the somewhat less restrictive assumption that these disturbances are uncorrelated with the predetermined Z's.
Some Z's may be truly exogenous, or distinct independent variables, that are assumed not to be affected by any of the endogenous variables in the model. Others, however, may be lagged endogenous variables, or prior levels of some of the X's. In a sense, the defining characteristic of these predetermined variables is that they be uncorrelated with any of the omitted causes of the endogenous variables. Such an assumption may be difficult to accept in the case of lagged endogenous variables, given the likelihood of autocorrelated disturbances, but we shall not consider this complication further. The basic assumption regarding the truly exogenous variables, however, is that these are uncorrelated with all omitted causes of the X's, though they may of course be correlated with the X's and also possibly each other.
Clearly, there are more unknown parameters than was the case for the original recursive equation system (1). Turning attention back to the simple recursive system represented in equation system 1, one sees that the matrix of betas in that equation system is triangular, with all such coefficients above the main diagonal being set equal to zero on a priori grounds. That is, in equation system 1, half of the possible betas have been set equal to zero, the remainder being estimated using ordinary least squares. It turns out that in the more general equation system 5, there will be too many unknowns unless additional restrictive assumptions are made. In particular, in each of the k equations one will have to make a priori assumptions that at least k − 1 coefficients are equal to zero or to some other known value (a value fixed in advance rather than estimated from the data). This is why one needs the predetermined Zj and the relevant gammas. If one is willing to assume that, for any given endogenous Xi, certain direct arrows are missing, meaning that there are no direct effects coming from the relevant Xj or Z variable, then one may indeed estimate the remaining parameters. One does not have to make the very restrictive assumption required under the recursive setup, namely that if Xj affects Xi, then the reverse cannot hold. As long as one assumes that some of the coefficients are zero, there is a chance of being able to identify or estimate the others.
It turns out that the necessary condition for identification can be easily specified, as implied in the above discussion. For any given equation, one must leave out at least k − 1 of the remaining variables in the system, whether endogenous or predetermined. The necessary and sufficient condition is far more complicated to state. In many instances, when the necessary condition has been met, the sufficient condition will be met as well, unless some of the equations contain exactly the same sets of variables (i.e., exactly the same combination of omitted variables). But since this will not always be the case, the reader should consult textbooks in econometrics for more complete treatments.
Returning to the substantive example of delinquency, as represented in Figure 1, one may revise the model somewhat by allowing for a feedback from delinquent behavior to guilt, as well as a reciprocal relationship between the two internal states, guilt and self-esteem. One may also relabel parental education as Z1 and neighborhood delinquency as Z2 because there is no feedback from any of the three endogenous variables to either of these predetermined ones. Renumbering the endogenous variables guilt, self-esteem, and delinquency as X1, X2, and X3, respectively, one may represent the revised model as in Figure 4.
In this kind of application one may question whether a behavior can ever influence an internal state. Keeping in mind, however, that the concern is with repeated acts of delinquency, it is entirely reasonable to assume that earlier acts feed back to affect subsequent guilt levels, which in turn affect future acts of delinquency. It is precisely this very frequent type of causal process that is ignored whenever behaviors are taken, rather simply, as "dependent" variables.
Here, k = 3, so that at least two variables must be left out of each equation, meaning that their respective coefficients have been set equal to zero. One can rather simply check on the necessary condition by counting arrowheads coming to each variable. Because each endogenous variable has four other variables in the system (two endogenous and two predetermined) as potential direct causes, there can be no more than two arrows into each variable, whereas in the case of guilt X1 there are three. The equation for X1 is referred to as being "underidentified," meaning that the situation is empirically hopeless: the coefficients simply cannot be estimated by any empirical means. There are exactly two arrows coming into delinquency X3, and one refers to this as a situation in which the equation is "exactly identified." With only a single arrow coming into self-esteem X2, one has an "overidentified" equation for which one actually has an excess of empirical information compared to the number of unknowns to be estimated. It turns out that overidentified equations provide criteria for evaluating goodness of fit, or a test of the model, in much the same way that, for recursive models, one obtains an empirical test of a null hypothesis for each causal arrow that has been deleted.
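The counting rule lends itself to a few lines of code. The sketch below records, for each endogenous variable, the direct causes read off the arrows described for Figure 4 (guilt X1 receives arrows from X2, X3, and Z1; self-esteem X2 from X1; delinquency X3 from X2 and Z2); the dictionary layout and the function name `order_condition` are illustrative assumptions. It checks only the necessary (order) condition, so an equation it labels exactly identified or overidentified may still fail the more complicated sufficient condition.

```python
# Direct causes of each endogenous variable in the Figure 4 model
# (X1 = guilt, X2 = self-esteem, X3 = delinquency)
FIGURE_4 = {
    "X1": {"X2", "X3", "Z1"},
    "X2": {"X1"},
    "X3": {"X2", "Z2"},
}

def order_condition(model, predetermined):
    """Classify each equation by the necessary (order) condition for identification."""
    k = len(model)                              # number of endogenous variables
    variables = set(model) | set(predetermined)
    status = {}
    for dependent, causes in model.items():
        excluded = len(variables - {dependent} - causes)  # variables left out of this equation
        if excluded < k - 1:
            status[dependent] = "underidentified"
        elif excluded == k - 1:
            status[dependent] = "exactly identified"
        else:
            status[dependent] = "overidentified"
    return status

print(order_condition(FIGURE_4, ["Z1", "Z2"]))
```

The call should report X1 as underidentified, X2 as overidentified, and X3 as exactly identified, matching the verdicts in the text.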
Since the equation for X1 is underidentified, one must either remove one of its arrows on a priori grounds or search for at least one more predetermined variable that does not belong in this equation, that is, a predetermined variable that is assumed not to be a direct cause of level of guilt. Perhaps school performance can be introduced as Z3 by making the assumption that Z3 directly affects both self-esteem and delinquency but not guilt level. A check of this revised model indicates that all equations are now properly identified, and one may proceed to estimation. Ordinary least squares is not appropriate here, because reciprocal causation makes the disturbance in each equation correlated with some of that equation's explanatory variables, violating an assumption that ordinary least squares requires. Although space does not permit a discussion of the alternative estimation methods that get around this problem, various computer programs are available to accomplish this task. The simplest such alternative, two-stage least squares (2SLS), will ordinarily be adequate for nearly all sociological applications and turns out to be less sensitive to other kinds of specification errors than many of the more sophisticated alternatives that have been proposed.
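The idea behind two-stage least squares can be illustrated with a minimal sketch. It deliberately uses a simple, hypothetical two-equation feedback loop rather than the delinquency model itself: all coefficient values are made up, the data are simulated with NumPy, and the variables have mean zero by construction, so no intercepts are fitted. In the first stage each explanatory variable is regressed on all predetermined variables in the system; in the second stage the dependent variable is regressed on those fitted values. Only point estimates are computed; standard errors taken directly from the second-stage regression would need correction.

```python
import numpy as np

# Hypothetical structural coefficients for a toy feedback loop (not the
# delinquency model):
#   y1 = b12 * y2 + g11 * z1 + u1
#   y2 = b21 * y1 + g22 * z2 + u2
b12, b21, g11, g22 = 0.5, 0.4, 1.0, 1.0

rng = np.random.default_rng(12345)
n = 100_000
z1, z2, u1, u2 = rng.standard_normal((4, n))  # predetermined variables and disturbances

# Solve the two structural equations jointly for y1 and y2 (the reduced form).
B = np.array([[1.0, -b12],
              [-b21, 1.0]])
y1, y2 = np.linalg.solve(B, np.vstack([g11 * z1 + u1, g22 * z2 + u2]))


def ols(X, y):
    """Least-squares coefficients of y (vector or matrix) on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_least_squares(y, X, Z):
    """Stage 1: fit each column of X from the instruments Z.
    Stage 2: regress y on those fitted values."""
    X_hat = Z @ ols(Z, X)
    return ols(X_hat, y)


# Equation for y1: explanatory variables y2 and z1; instruments are all
# predetermined variables in the system (z1, z2). Leaving z2 out of this
# equation is what makes it identified.
X = np.column_stack([y2, z1])
Z = np.column_stack([z1, z2])

ols_est = ols(X, y1)
tsls_est = two_stage_least_squares(y1, X, Z)
print("OLS  (b12, g11):", np.round(ols_est, 3))
print("2SLS (b12, g11):", np.round(tsls_est, 3))
```

Because y2 depends on y1, and therefore on the disturbance u1, ordinary least squares overstates b12 in this setup (by roughly 0.15 with these values), whereas the 2SLS estimate should fall close to the true value of 0.5.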
CAUSAL APPROACH TO MEASUREMENT ERRORS
Finally, brief mention should be made of a growing body of literature, closely linked to factor analysis, that has been developed in order to attach measurement-error models to structural-equation approaches that presume perfect measurement. The fundamental philosophical starting point of such models is the assumption that in many if not most instances, measurement errors can be conceived in causal terms. Most often, the indicator or measured variables are taken as effects of underlying or "true" variables, plus additional factors that may produce combinations of random measurement errors, which are unrelated to all other variables in the theoretical system, and systematic biases that are explainable in causal terms. Thus, measures of "true guilt" or "true self-esteem" will typically consist of responses, usually to paper-and-pencil tests, that may be subject to distortions produced by other variables, including some of the variables in the causal system. Distortions in the guilt measure may, for example, be a function of amount of delinquent behavior or parental education. Similarly, measures of behaviors are likely to overestimate or underestimate true frequencies, with biases dependent on qualities of the observer, inaccuracies in official records, or perhaps the ability of the actor to evade detection.
In all such instances, one may be able to construct an "auxiliary measurement theory" (Blalock 1968; Costner 1969) that is itself a causal model containing a mixture of measured and unmeasured variables, the latter of which constitute the "true" or underlying variables of theoretical interest. The existence of such unmeasured variables, however, may create identification problems by introducing more unknowns than there are pieces of empirical information with which to estimate them. If so, the situation will once more be empirically hopeless. But if one has available several indicators of each of the imperfectly measured constructs, and if one is willing to make a sufficient number of simplifying assumptions strategically placed within the overall model, estimates may be obtainable.
Consider the model of Figure 5 (borrowed from Costner 1969), which contains only two theoretical variables of interest, namely, the unmeasured variables X and Y. Suppose one has two indicators for each of X and Y and that one is willing to make the simplifying assumption that X does not affect either of Y's indicators, Y1 and Y2, and that Y does not affect either of X's indicators, X1 and X2. For the time being, ignore the variable W as well as the two dashed arrows drawn from it to the indicators X2 and Y1. Without W, the absence of any other arrows implies that the remaining causes of the four indicators are assumed to be uncorrelated with all other variables in the system, so that measurement errors may be assumed to be strictly random.
If one labels the path coefficients by the simple letters a, b, c, d, and e, where a and b connect X to its indicators X1 and X2, c connects X to Y, and d and e connect Y to its indicators Y1 and Y2, then with 4(3)/2 = 6 correlations among the four indicators, there will be six pieces of empirical information (equation system 5) with which to estimate the five unknown path coefficients.
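Assuming all variables are standardized and that the labels follow the products used later in this section (a and b for the paths from X to X1 and X2, c for X to Y, d and e for the paths from Y to Y1 and Y2), the six correlations can be sketched as follows, each being the product of the path coefficients along the single chain linking the two indicators. The printed equation system 5 may use different notation.

```latex
\begin{aligned}
r_{x_1x_2} &= ab,  & r_{y_1y_2} &= de,\\
r_{x_1y_1} &= acd, & r_{x_1y_2} &= ace,\\
r_{x_2y_1} &= bcd, & r_{x_2y_2} &= bce.
\end{aligned}
```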
One may now estimate the correlation or path coefficient c between X and Y by an equation derived from equation system 6.
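One way to see how c can be recovered, assuming standardized variables and the labeling in which a and b link X to X1 and X2, c links X to Y, and d and e link Y to Y1 and Y2 (so that, for example, $r_{x_1x_2} = ab$ and $r_{x_1y_1} = acd$), is that the measurement paths cancel in either of two ratios, each of which yields $c^2$. This is a sketch; the printed equation system 6 may arrange it differently, and the sign of c must be taken from the signs of the correlations.

```latex
\frac{r_{x_1y_1}\, r_{x_2y_2}}{r_{x_1x_2}\, r_{y_1y_2}}
  = \frac{(acd)(bce)}{(ab)(de)} = c^2,
\qquad
\frac{r_{x_1y_2}\, r_{x_2y_1}}{r_{x_1x_2}\, r_{y_1y_2}}
  = \frac{(ace)(bcd)}{(ab)(de)} = c^2 .
```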
Also notice that there is an excess equation that may be used to check on the consistency of the model with the data, namely the prediction that $r_{x_1y_1} r_{x_2y_2} = r_{x_1y_2} r_{x_2y_1} = abc^2de$.
Suppose next that there is a source of measurement-error bias W that is a common cause of one of X's indicators (namely X2) and one of Y's (namely Y1). Perhaps these two indicators are similarly worded items in a social survey, whereas the remaining two indicators involve very different kinds of measures. If f and g denote the paths from W to X2 and Y1, there is now a different expression for the correlation between X2 and Y1, namely $r_{x_2y_1} = bcd + fg$. If one were to use this particular correlation in the estimate of $c^2$ without being aware of the impact of W, one would obtain a biased estimate. In this instance one would be able to detect this particular kind of departure from randomness because the consistency criterion would no longer be met; that is, $(acd)(bce) \neq (ace)(bcd + fg)$ whenever fg is nonzero. Had W been a common cause of the two indicators of either X or Y alone, however, one would have been unable to detect the bias even though it was present, since the four correlations between X's indicators and Y's indicators, and hence the consistency criterion, would be unaffected.
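The contrast drawn in the paragraph above can be checked numerically with a small sketch. It computes the correlations implied by the two-construct model under hypothetical path values (chosen only for illustration, and assuming standardized variables with W uncorrelated with X, Y, and the random errors), then reports the consistency ("tetrad") difference $r_{x_1y_1} r_{x_2y_2} - r_{x_1y_2} r_{x_2y_1}$ and the two available estimates of $c^2$. The function names are hypothetical.

```python
import math
from itertools import combinations

INDICATORS = ["x1", "x2", "y1", "y2"]
CONSTRUCT = {"x1": "X", "x2": "X", "y1": "Y", "y2": "Y"}


def implied_correlations(a, b, c, d, e, w_link=None):
    """Correlations among X1, X2, Y1, Y2 implied by the two-construct model.

    a, b: paths X -> X1, X -> X2; c: path X -> Y; d, e: paths Y -> Y1, Y -> Y2.
    w_link (optional): ((indicator_i, indicator_j), (f, g)) for a bias source W
    with paths f and g into two indicators.
    """
    loading = {"x1": a, "x2": b, "y1": d, "y2": e}
    r = {}
    for i, j in combinations(INDICATORS, 2):
        value = loading[i] * loading[j]
        if CONSTRUCT[i] != CONSTRUCT[j]:
            value *= c  # cross-construct chains run through X -> Y
        r[(i, j)] = value
    if w_link is not None:
        pair, (f, g) = w_link
        r[tuple(sorted(pair))] += f * g  # extra chain i <- W -> j
    return r


def tetrad_difference(r):
    """r_x1y1 * r_x2y2 - r_x1y2 * r_x2y1, which is zero when the model holds."""
    return r[("x1", "y1")] * r[("x2", "y2")] - r[("x1", "y2")] * r[("x2", "y1")]


def c_squared_estimates(r):
    """The two estimates of c^2 available from the six correlations."""
    denom = r[("x1", "x2")] * r[("y1", "y2")]
    return (r[("x1", "y1")] * r[("x2", "y2")] / denom,
            r[("x1", "y2")] * r[("x2", "y1")] / denom)


# Hypothetical path values, for illustration only (true c^2 = 0.25).
paths = dict(a=0.8, b=0.7, c=0.5, d=0.9, e=0.6)
clean = implied_correlations(**paths)
cross = implied_correlations(**paths, w_link=(("x2", "y1"), (0.3, 0.3)))
within = implied_correlations(**paths, w_link=(("x1", "x2"), (0.3, 0.3)))

for label, r in [("no W", clean), ("W -> X2, Y1", cross), ("W -> X1, X2", within)]:
    est1, est2 = c_squared_estimates(r)
    print(f"{label:12s} tetrad={tetrad_difference(r):+.4f} "
          f"c^2 estimates={est1:.3f}, {est2:.3f}")
```

With these values, W acting on X2 and Y1 makes the tetrad difference nonzero (-0.0216), and only the estimate that uses $r_{x_2y_1}$ is distorted. When W instead acts on X1 and X2, the tetrad difference stays at zero while both estimates of $c^2$ fall to about 0.215, so the bias goes undetected.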
Obviously, most of one's measurement-error models will be far more complex than this, with several (usually unmeasured) sources of bias, possible nonlinearities, and linkages between some of the important variables and indicators of other variables in the substantive theory. Also, some indicators may be taken as causes of the conceptual variables, as for example often occurs when one is attempting to get at experience variables (e.g., exposure to discrimination) by using simple objective indicators such as race, sex, or age. Furthermore, one's substantive models may involve feedback relationships so that simultaneous equation systems must be joined to one's measurement-error models.
In all such instances, there will undoubtedly be numerous specification errors in one's models, so that it becomes necessary to evaluate alternative models in terms of their goodness of fit to the data. Simple path-analytic methods, although heuristically helpful, will no longer be adequate. Fortunately, several sophisticated computer programs, such as LISREL, enable social scientists to carry out data analyses designed to evaluate these more complex models and to estimate their parameters once it has been decided that the fit to reality is reasonably close. (See Jöreskog and Sörbom 1981; Long 1983; and Herting 1985.)
In closing, what needs to be stressed is that causal modeling tools are highly flexible. They may be modified to handle additional complications such as interactions and nonlinearities. Causal modeling in terms of attribute data has been given a firm theoretical underpinning by Suppes (1970), and even ordinal data may be used in an exploratory fashion, provided that one is willing to assume that dichotomization or categorization has not introduced substantial measurement errors that cannot be modeled.
Like all other approaches, however, causal modeling is heavily dependent on the assumptions one is willing to make. Such assumptions need to be made as explicit as possible, a step that unfortunately is often not taken seriously enough in the empirical literature. In short, this set of tools, properly used, has been designed to provide precise meaning to the assertion that neither theory nor data can stand alone and that any interpretations of research findings one wishes to provide must inevitably also be based on a set of assumptions, many of which cannot be tested with the data in hand.
Finally, it should be stressed that causal modeling can be very useful in the process of theory construction, even in instances where many of the variables contained in the model will remain unmeasured in any given study. It is certainly a mistake to throw out portions of one's theory merely because data to test it are not currently available. Indeed, without a theory as to how missing variables are assumed to operate, it will be impossible to justify one's assumptions regarding the behavior of disturbance terms that will contain such variables, whether explicitly recognized or not. Causal modeling may thus be an important tool for guiding future research and for providing guidelines as to what kinds of neglected variables need to be measured.
Allison, Paul D. 1995 "Exact Variance of Indirect Effects in Recursive Linear Models." Sociological Methodology 25:253–266.
Arminger, Gerhard 1995 "Specification and Estimation of Mean Structure: Regression Models." In G. Arminger, C. C. Clogg, and M. E. Sobel, eds., Handbook of Statistical Modeling for the Social and Behavioral Sciences. New York: Plenum Press.
Blalock, Hubert M. 1968 "The Measurement Problem: A Gap Between the Languages of Theory and Research." In H. M. Blalock and A. B. Blalock, eds., Methodology in Social Research. New York: McGraw-Hill.
——1985 "Inadvertent Manipulations of Dependent Variables in Research Designs." In H. M. Blalock, ed., Causal Models in Panel and Experimental Designs. New York: Aldine.
Blau, Peter M., and Otis Dudley Duncan 1967 The American Occupational Structure. New York: Wiley.
Blossfeld, Hans-Peter, and Götz Rohwer 1997 "Causal Inference, Time and Observation Plans in the Social Sciences." Quality and Quantity 31:361–384.
Bollen, Kenneth A. 1989 Structural Equations with Latent Variables. New York: Wiley.
Costner, Herbert L. 1969 "Theory, Deduction, and Rules of Correspondence." American Journal of Sociology 75:245–263.
Herting, Jerald R. 1985 "Multiple Indicator Models Using LISREL." In H. M. Blalock, ed., Causal Models in the Social Sciences, 2nd ed. New York: Aldine.
Jöreskog, Karl G., and Dag Sörbom 1981 LISREL V: Analysis of Linear Structural Relationships by the Method of Maximum Likelihood; User's Guide. Uppsala, Sweden: University of Uppsala Press.
Long, J. Scott 1983 Confirmatory Factor Analysis: A Preface to LISREL. Beverly Hills, Calif.: Sage.
Pearl, Judea 1998 "Graphs, Causality, and Structural Equation Models." Sociological Methods and Research 27:226–284.
Sobel, Michael E. 1995 "Causal Inference in the Social and Behavioral Sciences." In G. Arminger, C. C. Clogg, and M. E. Sobel, eds., Handbook of Statistical Modeling for the Social and Behavioral Sciences. New York: Plenum Press.
Sobel, Michael E. 1996 "An Introduction to Causal Inference." Sociological Methods and Research 24:353–379.
Spirtes, Peter, Thomas Richardson, Christopher Meek, Richard Scheines, and Clark Glymour 1998 "Using Path Diagrams as a Structural Equation Modeling Tool." Sociological Methods and Research 27:182–225.
Strotz, Robert H., and Herman O. A. Wold 1960 "Recursive Versus Nonrecursive Systems." Econometrica 28:417–427.
Suppes, Patrick 1970 A Probabilistic Theory of Causality. Amsterdam: North-Holland.
Von Eye, Alexander, and Clifford C. Clogg (eds.) 1994 Latent Variables Analysis: Applications for Developmental Research. Thousand Oaks, Calif.: Sage Publications.
Wright, Sewall 1934 "The Method of Path Coefficients." Annals of Mathematical Statistics 5:161–215.
——1960 "Path Coefficients and Path Regressions: Alternative or Complementary Concepts?" Biometrics 16:189–202.
Hubert M. Blalock, Jr.