Causation

views updated Jun 08 2018

Philosophers have theorized about causation since well before Aristotle, who distinguished several types of causation: efficient, material, final, and formal. For example, a wood carving is made by an artist (the efficient cause) by chiseling a piece of wood (the material cause) for the purpose of creating a beautiful object (the final cause), arriving at something that has the properties of a wood carving (the formal cause).

Although Aristotle's typology framed discussions of causation until the scientific revolution and in some circles even until David Hume, the focus settled onto analyzing efficient causation and in particular on understanding the kinds of substances that might interact causally. René Descartes, for example, separated material and mental substances and wrote extensively on how cross-substance causation might happen. Discussions of causation thus became entangled with the metaphysics of substance, and positions ranged all the way from Baruch Spinoza, who claimed there is only one type of substance, to Gottfried Wilhelm Leibniz, who claimed there was an infinity of unique substances, one per monad. Everyone wanted to understand causation as involving some "power" to produce change, and different substances possess different sorts of powers over their own and other substances. For example, the empiricist Bishop George Berkeley argued that our ideas (sensations) cannot be caused by other ideas or matter, because ideas and matter are "inert" and do not have the sort of causal "power" necessary for efficient causation. Only an agent like God or a willful person possesses such power. John Locke, in An Essay concerning Human Understanding, wrote voluminously trying to explicate the idea of causal power in empiricist terms. David Hume, the brilliant eighteenth-century Scottish philosopher, finally rejected the notion of causal power as being beyond direct observation, and he recast the problem of understanding the connection between a cause and its effect as another version of the problem of induction. Although causes always seem to be followed by their effects, the bond between them might well be nothing more than a psychological habit we develop as a result of regularly perceiving the idea of one type of object or event (e.g., thunder) just after the idea of another (e.g., lightning). 
Hume's challenge was to find compelling reasons for believing that when an object similar to one we have seen previously occurs, then the effect must necessarily occur. No one has succeeded in answering Hume's challenge, but his effect on the debate was as powerful as Aristotle's. All modern theories of causation begin with something like Hume's story: there are objects or events that we can group as similar, like the events of walking in the rain with no coat and developing a cold. They all ask what it means to assert that the relation between these events is causal.

Modern Theories of Causation

Practically, causation matters. Juries must decide, for example, whether a pregnant mother's refusal to give birth by cesarean section was the cause of the death of one of her twins. Policy makers must decide whether violence on TV causes violence in life. Neither question can be coherently debated without some theory of causation. Fortunately (or not, depending on where one sits), a virtual plethora of theories of causation have been championed in the third of a century between 1970 and 2004.

Before the sketch of a few of the major theories, however, consider what one might want out of a theory of causation. First, although one can agree that causation is a relation, what are the relata? Are causes and effects objects, like moving billiard balls? Are they particular events, like the Titanic hitting an iceberg in 1912? Or are they kinds of events, like smoking cigarettes and getting lung cancer? As it turns out, trying to understand causation as a relation between particular objects or events is quite a different task than trying to understand it as a relation between kinds of occurrences or events.

Second, one wants a theory to clarify, explain, or illuminate those properties of causation that one can agree are central. For example, whatever causation is, it has a direction. Warm weather causes people to wear lighter clothing, but wearing lighter clothing does not cause warm weather. A theory that fails to capture the asymmetry of causation will be unsatisfying.

Third, one knows that in many cases one thing can occur regularly before another, and thus appear to be related as cause and effect, but is in fact the effect of a common cause, a phenomenon called spurious causation. For example, flashes of lightning appear just before and seem to cause the thunder-claps that follow them, but in reality both are effects of a common cause: the superheating of air molecules from the massive static electric discharge between the earth and the atmosphere. A good theory of causation ought to successfully separate cases of real from spurious causation.

The history of thinking on causation from 1970 to 2004 can be organized in many ways, but the one that separates matters best, both temporally and conceptually, is captured eloquently by Clark Glymour:

Philosophical theories come chiefly in two flavors, Socratic and Euclidean. Socratic philosophical theories, whose paradigm is The Meno, advance an analysis (sometimes called an "explication"), a set of purportedly necessary and sufficient conditions for some concept, giving its meaning; in justification they consider examples, putative counterexamples, alternative analyses and relations to other concepts. Euclidean philosophical theories, whose paradigm is The Elements, advance assumptions, considerations taken to warrant them, and investigate the consequences of the assumptions. Socratic theories have the form of definitions. Analyses of "virtue," "cause," "knowledge," "confirmation," "explanation," are ancient and recent examples. Euclidian theories have the form of formal or informal axiomatic systems and are often essentially mathematical: Euclid's geometry, Frege's logic, Kolmogorov's probabilities, … That of course does not mean that Euclidean theories do not also contain definitions, but their definitions are not philosophical analyses of concepts. Nor does it mean that the work of Euclidean theories is confined to theorem proving: axioms may be reformulated to separate and illuminate their contents or to provide justifications (n.p.).

For causation, Socratic-style analyses dominated from approximately 1970 to the mid-1980s. By then, it had become apparent that all such theories either invoked noncausal primitives that were more metaphysically mysterious than causation itself, or were circular, or were simply unable to account for the asymmetry of causation or to separate spurious from real causation. Slowly, Euclidean-style theories replaced Socratic ones, and by the early 1990s a rich axiomatic theory of causation had emerged that combined insights from statisticians, computer scientists, philosophers, economists, psychologists, social scientists, biologists, and even epidemiologists.

The 1970s and Early 1980s: The Age of Causal Analyses

Several different analyses of causation were given serious attention in the 1970s. One school gave an account based on counterfactuals, another used Hume's idea of regularity or constant conjunction, still another attempted to reduce causation to probabilistic relations and another to physical processes interacting spatiotemporally, and yet another was founded on the idea of manipulability.

Event Causation versus Causal Generalizations

Legal cases and accident investigations usually deal with a particular event and ask what caused it. For example, when in February 2003 the space shuttle Columbia burned up during reentry, investigators looked for the cause of the disaster. In the end, they concluded that a chunk of foam insulation that had broken off and hit the wing during launch was the cause of a rupture in the insulating tiles, which was the cause of the shuttle's demise during reentry. Philosophers call this event causation, or actual causation, or token-causation.

Policy makers, statisticians, and social scientists usually deal with kinds of events, like graduating from college, or becoming a smoker, or playing lots of violent video games. For example, epidemiologists in the 1950s and 1960s looked for the kind of event that was causing a large number of people to get lung cancer, and they identified smoking as a primary cause. Philosophers call this type-causation, or causal generalization, or causation among variables.

The properties of causal relationships are different for actual causation and for causal generalizations. Actual causation is typically considered transitive, antisymmetrical, and irreflexive. If we are willing to say that one event A, say the Titanic hitting an iceberg on 14 April 1912, caused another event B, its hull ripping open below the water line and taking on water moments later, which in turn caused a third event C, its sinking a few hours later, then surely we should be willing to say that event A (hitting the iceberg) caused event C (sinking). So actual causation is transitive. (Plenty of philosophers disagree; see, for example, the work of Christopher Hitchcock.) It is antisymmetrical because of how we view time. If a particular event A caused a later event B, then B did not cause A. Finally, single events do not cause themselves, so causation between particular events is irreflexive.

Causal generalizations, however, are usually but not always transitive, definitely not antisymmetrical, and definitely not irreflexive. In some cases causal generalizations are symmetrical, for example, confidence causes success, and success causes confidence, but in others they are not, for example, warm weather causes people to wear less clothing, but wearing less clothing does not cause the weather to warm. So causal generalizations can be asymmetrical, but they are not antisymmetrical in the way that actual causation is. When they are symmetrical, causal generalizations are reflexive. Success breeds more success and so forth.

The counterfactual theory.

In the late 1960s Robert Stalnaker began the rigorous study of sentences that assert what are called contrary-to-fact conditionals. For example, "If the September 11, 2001, terrorist attacks on the United States had not happened, then the United States would not have invaded Afghanistan shortly thereafter." In his classic 1973 book Counterfactuals, David Lewis produced what has become the most popular account of such statements. Lewis's theory rests on two ideas: the existence of alternative "possible worlds" and a similarity metric over these worlds. For example, it is intuitive that the possible world identical to our own in all details except for the spelling of my wife's middle name ("Anne" instead of "Ann") is closer to the actual world than one in which the asteroid that killed the dinosaurs missed the earth and primates never evolved from mammals.

For Lewis, the meaning and truth of counterfactuals depend on our similarity metric over possible worlds. When we say "if A had not happened, then B would not have happened either," we mean that for each possible world W₁ in which A did not happen and B did happen, there is at least one world W₂ in which A did not happen and B did not happen that is closer to the actual world than W₁. Lewis represents counterfactual dependence with the symbol □→, so P □→ Q means that, among all the worlds in which P happens, there is a world in which Q also happens that is closer to the actual world than all the worlds in which Q does not.

That there is some connection between counterfactuals and causation seems obvious. We see one event A followed by another B. What do we mean when we say A caused B? We might well mean that if A had not happened, then B would not have happened either. If the Titanic had not hit an iceberg, it would not have sunk. Formalizing this intuition in 1973, Lewis analyzed causation as a relation between two events A and B that both occurred such that two counterfactuals hold:

A □→ B, and
∼A □→ ∼B

Because both A and B already occurred, the first is trivially true, so we need only assess the second in order to assess whether A caused B.
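The truth condition just described can be sketched on a toy model. The following is only an illustration under stated assumptions: the four worlds, their numeric distances from the actual world, and the function names `would` and `caused` are all invented here and are not part of Lewis's own presentation.

```python
from typing import Callable, Dict, List, Tuple

World = Dict[str, bool]

# Hypothetical worlds paired with an invented distance from the actual world (0 = actual).
WORLDS: List[Tuple[World, int]] = [
    ({"hit": True, "sank": True}, 0),    # the actual world
    ({"hit": False, "sank": False}, 1),  # no iceberg, no sinking
    ({"hit": True, "sank": False}, 4),   # hit the iceberg, yet stayed afloat
    ({"hit": False, "sank": True}, 5),   # sank for some unrelated reason
]

def would(p: Callable[[World], bool], q: Callable[[World], bool]) -> bool:
    """P []-> Q: some P-and-Q world is closer than every P-and-not-Q world."""
    pq = [d for w, d in WORLDS if p(w) and q(w)]
    p_not_q = [d for w, d in WORLDS if p(w) and not q(w)]
    if not pq:
        return not p_not_q  # vacuously true only if there are no P-worlds at all
    return not p_not_q or min(pq) < min(p_not_q)

def caused(a: str, b: str) -> bool:
    """The 1973 test: A []-> B and not-A []-> not-B."""
    return (would(lambda w: w[a], lambda w: w[b])
            and would(lambda w: not w[a], lambda w: not w[b]))

print(caused("hit", "sank"))
```

With these invented distances the test also succeeds in the reverse direction, `caused("sank", "hit")`, which is one way to see why the asymmetry of causation is a difficulty for the account.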

Is this analysis satisfactory? Even if possible worlds and a similarity metric among them are clearer and less metaphysically mysterious than causal claims, which many dispute, there are two major problems with this account of causation. First, in its original version it just misses cases of overdetermination or preemption, that is, cases in which more than one cause was present and could in fact have produced the effect.

Overdetermination and Preemption

A spy, setting out to cross the desert with some key intelligence, fills his canteen with just enough water for the crossing and settles down for a quick nap. While he is asleep, enemy A sneaks into his tent and pokes a very small hole in the canteen, and a short while later enemy B sneaks in and adds a tasteless poison. The spy awakes, forges ahead into the desert, and when he goes to drink from his canteen discovers it is empty and dies of thirst before he can get water. What was the cause of the spy's death? According to the counterfactual theory, neither enemy's action caused the death. If enemy A had not poked a hole in the canteen, then the spy still would have died by poison. If enemy B had not put poison into the canteen, then he still would have died from thirst. Their actions overdetermined the spy's death, and the pinprick from enemy A preempted the poison from enemy B.
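The failure of the simple counterfactual test in the spy story can be checked mechanically. This is a minimal sketch that assumes a deliberately crude model of the story: the function names and the rule that the spy survives only with unpoisoned water are illustrative assumptions, not part of the original example.

```python
def spy_dies(hole: bool, poison: bool) -> bool:
    """Crude model: a hole drains the canteen; poison only matters if water is left."""
    water_left = not hole
    dies_of_thirst = not water_left
    dies_of_poison = water_left and poison
    return dies_of_thirst or dies_of_poison

def death_depends_on(action: str) -> bool:
    """Simple counterfactual test: would the spy have lived without this one action?"""
    actual = {"hole": True, "poison": True}
    altered = dict(actual, **{action: False})
    return spy_dies(**actual) and not spy_dies(**altered)

for action in ("hole", "poison"):
    print(action, death_depends_on(action))
```

Neither action passes the test on its own, even though the spy would have lived had both been absent, which is the pattern the text calls overdetermination.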

In the beginning of the movie Magnolia, a classic causal conundrum is dramatized. A fifteen-year-old boy goes up to the roof of his ten-story apartment building, ponders the abyss, and jumps to his death. Did he commit suicide? It turns out that construction workers had installed netting the day before that would have saved him from the fall, but as he is falling past the fifth story, a gun is shot from inside the building by his mother, and the bullet kills the boy instantly. Did his mother murder her son? As it turns out, his mother fired the family rifle at his drunk stepfather but missed and shot her son by mistake. She fired the gun every week at approximately that time after their horrific regular argument, which the boy cited as his reason for attempting suicide, but the gun was usually not loaded. This week the boy secretly loaded the gun without telling his parents, presumably with the intent of causing the death of his stepfather. Did he, then, in fact commit suicide, albeit unintentionally?

Even more importantly, Lewis's counterfactual theory has a very hard time with the asymmetry of causality and only a slightly better time with the problem of spurious causation. Consider a man George who jumps off the Brooklyn Bridge and plunges into the East River. (This example is originally from Horacio Arlo-Costa and is discussed in Hausman, 1998, pp. 116–117.) On Lewis's theory, it is clear that it was jumping that caused George to plunge into the river, because had George not jumped, the world in which he did not plunge is closer to the actual one than any in which he just happened to plunge for some other reason at approximately the same time. Fair enough. But consider the opposite direction: if George had not plunged, then he would not have jumped. Should we assent to this counterfactual? Is a world in which George did not plunge into the river and did not jump closer to the real one than any in which he did not plunge but did jump? Most everyone except Lewis and his followers would say yes. Thus on Lewis's account jumping off the bridge caused George to plunge into the river, but plunging into the river (as distinct from the idea or goal of plunging into the river) also caused George to jump. (Lewis and many others have amended the counterfactual account of causation to handle problems of overdetermination and preemption, but no account has yet satisfactorily handled the asymmetry of causality.)

For the problem of spurious causation, consider Johnny, who gets infected with the measles virus, runs a fever, and shortly thereafter gets a rash. Is it reasonable to assert that if Johnny had not gotten a fever, he would not have gotten a rash? Yes, but it was not the fever that caused the rash, it was the measles virus. Lewis later responded to this problem by prohibiting "backtracking" and to the problem of overdetermination and preemption with an analysis of "influence," but the details are beyond our scope.

Mackie's regularity account.

Where David Lewis tried to base causation on counterfactuals, John Mackie tried to extend Hume's idea that causes and effects are "constantly conjoined" and to use the logical idea of necessary and sufficient conditions to make things clear. In 1974 Mackie published an analysis of causation in some part aimed at solving the problems that plagued Lewis's counterfactual analysis, namely overdetermination and preemption. Mackie realized that many factors combine to produce an effect, and it is only our idiosyncratic sense of what is "normal" that draws our attention to one particular feature of the situation, such as hitting the iceberg. It is a set of factors, for example, A: air with sufficient oxygen, B: a dry pile of combustible newspaper and kindling, and C: a lit match that combine to cause D: a fire. Together the set of factors A, B, and C are sufficient for D, but there might be other sets that would work just as well, for example A, B, and F: a bolt of lightning. If there were a fire caused by a lit match, but a bolt of lightning occurred that also would have started the fire, then Lewis's account has trouble saying that the lit match caused the fire, because the fire would have started without the lit match; or put another way, the match was not necessary for starting the fire. Mackie embraces this idea and says that X is a cause of Y just in case X is an Insufficient but Necessary part of an Unnecessary but Sufficient set of conditions for Y, that is, an INUS condition. The set of conditions that produced Y need not be the only sufficient set, thus the set is not necessary, but X should be an essential part of a set that is sufficient for Y.
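Mackie's definition can be made concrete with a small check. The sketch below assumes that the sufficient sets for an effect are simply listed by hand; the helper names `is_sufficient` and `is_inus` and the condition labels are hypothetical choices made for illustration.

```python
from typing import FrozenSet, List, Set

# Each entry is one set of conditions that is jointly sufficient for a fire.
FIRE: List[FrozenSet[str]] = [
    frozenset({"oxygen", "kindling", "match"}),
    frozenset({"oxygen", "kindling", "lightning"}),
]

def is_sufficient(conditions: Set[str], sufficient_sets: List[FrozenSet[str]]) -> bool:
    """True if the given conditions include some listed sufficient set."""
    return any(s <= conditions for s in sufficient_sets)

def is_inus(x: str, sufficient_sets: List[FrozenSet[str]]) -> bool:
    """x is an Insufficient but Necessary part of an Unnecessary but Sufficient set."""
    for s in sufficient_sets:
        if x not in s:
            continue
        insufficient = not is_sufficient({x}, sufficient_sets)
        necessary_part = not is_sufficient(set(s - {x}), sufficient_sets)
        unnecessary_set = any(other != s for other in sufficient_sets)
        if insufficient and necessary_part and unnecessary_set:
            return True
    return False

print(is_inus("match", FIRE), is_inus("rain", FIRE))
```

On this listing the lit match counts as a cause of the fire even when lightning was also available, which is the verdict the simple counterfactual test could not deliver.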

Again, however, the asymmetry of causality and the problem of spurious causation wreak havoc with Mackie's INUS account of causation. Before penicillin, approximately 10 percent of those people who contracted syphilis eventually developed a debilitating disease called paresis, and nothing doctors could measure seemed to tell them anything about which syphilitics developed paresis and which did not. As far as is known, paresis can result only from syphilis, so having paresis is by itself sufficient for having syphilis. Consider applying Mackie's account to this case. Paresis is an INUS condition of syphilis, because it is sufficient by itself for having syphilis, but it is surely not a cause of it.

Consider the measles. If we suppose that when people are infected they either show both symptoms (the fever and rash) or their immune system controls it and they show neither, then the INUS theory gets things wrong. The fever is a necessary part of a set that is sufficient for the rash: {fever, infected with measles virus}, and for that matter the rash is a necessary part of a set that is sufficient for fever: {rash, infected with measles virus}. So, unfortunately, on this analysis fever is an INUS cause of rash and rash is also an INUS cause of fever.

Probabilistic causality.

Twentieth-century physics has had a profound effect on a wide range of ideas, including theories of causation. In the years between about 1930 and 1970 the astounding and unabated success of quantum mechanics forced most physicists to accept the idea that, at bedrock, the material universe unfolds probabilistically. Past states of subatomic particles, no matter how finely described, do not determine their future states, they merely determine the probability of such future states. Embracing this brave new world in 1970, Patrick Suppes published a theory of causality that attempted to reduce causation to probability. Whereas electrons have only a propensity, that is, an objective physical probability to be measured at a particular location at a particular time, perhaps macroscopic events like developing lung cancer have only a probability as well. We observe that some events seem to quite dramatically change the probability of other events, however, so perhaps causes change the probability of their effects. If Pr(E), the probability of an event E, changes given that another event C has occurred, notated Pr(E | C), then we say E and C are associated. If not, then we say they are independent. Suppes was quite familiar with the problem of asymmetry, and he was well aware that association and independence are perfectly symmetrical, that is, Pr(E) ≠ Pr(E | C) ⇔ Pr(C) ≠ Pr(C | E). He was also familiar with the problem of spurious causation and knew that two effects of a common cause could appear associated. To handle asymmetry and spurious causation, he used time and the idea of conditional independence. His theory of probabilistic causation is simple and elegant:

C is a prima facie cause of E if C occurs before E in time, and C and E are associated, that is, Pr(E) ≠ Pr(E | C).
C is a genuine cause of E if C is a prima facie cause of E, and there is no event Z prior to C such that C and E are independent conditional on Z, that is, there is no Z such that Pr(E | Z) = Pr(E | Z, C).
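The two conditions can be tried out on the measles case, with infection as Z, fever as C, and rash as E. This is a sketch only: the probabilities are invented, the time order Z before C before E is assumed rather than modeled, and only the single candidate Z is checked, whereas the definition quantifies over every earlier event.

```python
from itertools import product

P_Z = 0.3                        # Pr(infected), invented
P_C = {True: 0.9, False: 0.1}    # Pr(fever | Z), invented
P_E = {True: 0.8, False: 0.05}   # Pr(rash | Z), invented

def bern(p, value):
    return p if value else 1 - p

# Joint distribution over worlds (z, c, e); C and E are independent given Z by construction.
JOINT = {
    (z, c, e): bern(P_Z, z) * bern(P_C[z], c) * bern(P_E[z], e)
    for z, c, e in product([True, False], repeat=3)
}

def pr(event, given=lambda w: True):
    """Conditional probability Pr(event | given) computed from JOINT."""
    num = sum(p for w, p in JOINT.items() if given(w) and event(w))
    den = sum(p for w, p in JOINT.items() if given(w))
    return num / den

TOL = 1e-9
fever = lambda w: w[1]
rash = lambda w: w[2]

# Prima facie: Pr(E) differs from Pr(E | C).
prima_facie = abs(pr(rash) - pr(rash, fever)) > TOL

# Screening off: Pr(E | Z) equals Pr(E | Z, C) for each value of Z.
screened_off = all(
    abs(pr(rash, lambda w, z=z: w[0] == z)
        - pr(rash, lambda w, z=z: w[0] == z and w[1])) < TOL
    for z in (True, False)
)
genuine = prima_facie and not screened_off
print(prima_facie, screened_off, genuine)
```

Fever comes out as a prima facie but not a genuine cause of the rash, because the earlier infection screens the two symptoms off from each other.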

Without doubt, the idea of handling the problem of spurious causation by looking for other events Z that screen off C and E, although anticipated by Hans Reichenbach, Irving John Good, and others, was a real breakthrough and remains a key feature of any metaphysical or epistemological account that connects causation to probability. Many other writers have elaborated a probabilistic theory of causation with metaphysical aspirations, for example, Ellery Eells, David Papineau, Brian Skyrms, and Wolfgang Spohn.

Probabilistic accounts have drawn criticism on several fronts. First, defining causation in terms of probability just replaces one mystery with another. Although we have managed to produce a mathematically rigorous theory of probability, the core of which is now widely accepted, we have not managed to produce a reductive metaphysics of probability. It is still as much a mystery as causation. Second, there is something unsatisfying about using time explicitly to handle the asymmetry of causation and at least part of the problem of spurious causation (we can only screen off spurious causes with a Z that is prior in time to C).

Third, as Nancy Cartwright persuasively argued in 1979, we cannot define causation with probabilities alone; we need causal concepts in the definiens (definition) as well as the definiendum (expression being defined). Consider her famous (even if implausible) hypothetical example, shown in Figure 1: smoking might cause more heart disease, but it might also cause exercise, which in turn might cause less heart disease. If the negative effect of exercise on heart disease is stronger than the positive effect of smoking and the association between smoking and exercise is high enough, then the probability of heart disease given smoking could be lower than the probability of heart disease given not smoking, making it appear as though smoking prevents heart disease instead of causing it.

The two effects could also exactly cancel, making smoking and heart disease look independent. Cartwright's solution is to look at the relationship between smoking and heart disease within groups that are doing the same amount of exercise, that is, to look at the relationship between smoking and heart disease conditional on exercise, even though exercise does not in this example come before smoking, as Suppes insists it should. Why does Suppes not allow Zs that are prior to E but after C in time? Because that would allow situations in which although C really does cause E, its influence was entirely mediated by Z, and by conditioning on Z it appears as if C is not a genuine cause of E, even though it is (Fig. 2).

In Cartwright's language: Smoking should increase the probability of heart disease in all causally homogeneous situations for heart disease. The problem is circularity. By referring to the causally homogeneous situations, we invoke causation in our definition. The moral Cartwright drew, and one that is now widely accepted, is that causation is connected to probability but cannot be defined in terms of it.
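Cartwright's hypothetical can be worked through with numbers. The figures below are invented purely to produce the reversal she describes, and the table and function names are illustrative assumptions.

```python
# Pr(exercise | smoking status): in this invented world smokers mostly exercise.
P_EXERCISE_GIVEN_SMOKING = {True: 0.9, False: 0.1}

# Pr(heart disease | smokes, exercises): smoking adds 0.1 within each exercise group.
P_DISEASE = {
    (True, True): 0.2, (False, True): 0.1,
    (True, False): 0.6, (False, False): 0.5,
}

def p_disease_given_smoking(smokes: bool) -> float:
    """Marginal risk, averaging over exercise levels as they occur for this group."""
    p_ex = P_EXERCISE_GIVEN_SMOKING[smokes]
    return p_ex * P_DISEASE[(smokes, True)] + (1 - p_ex) * P_DISEASE[(smokes, False)]

print(p_disease_given_smoking(True), p_disease_given_smoking(False))
for exercises in (True, False):
    print(exercises, P_DISEASE[(True, exercises)], P_DISEASE[(False, exercises)])
```

Overall, smokers show less heart disease than nonsmokers, yet within each exercise group smoking raises the risk, so only the comparison that holds exercise fixed reflects the assumed causal effect.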

Salmon's physical process theory.

A wholly different account of causation comes from Wes Salmon, one of the preeminent philosophers of science in the latter half of the twentieth century. In the 1970s Salmon developed a theory of scientific explanation that foundered partly on an asymmetry very similar to the asymmetry of causation. Realizing that causes explain their effects but not vice versa, Salmon made the connection between explanation and causation explicit. He then went on to characterize causation as an interaction between two physical processes, not a probabilistic or logical or counterfactual relationship between events. A causal interaction, according to Salmon, is the intersection of two causal processes and the exchange of some invariant quantity, like momentum. For example, two pool balls that collide each change direction (and perhaps speed), but their total momentum after the collision is (ideally) no different than before. An interaction has taken place, but momentum is conserved. Explaining the features of a causal process is beyond the scope of such a short review article, but Phil Dowe has made them quite accessible and extremely clear in a 2000 review article in the British Journal for the Philosophy of Science.

It turns out to be very difficult to distinguish real causal processes from pseudo-processes, but even accepting Salmon's and Dowe's criteria, the theory uses time to handle the asymmetry of causation and has big trouble with the problem of spurious causation. Again, see Dowe's excellent review article for details.

Manipulability theories.

Perhaps the most tempting strategy for understanding causation is to conceive of it as how the world responds to an intervention or manipulation. Consider a well-insulated, closed room containing two people. The room temperature is 58 degrees Fahrenheit, and each person has a sweater on. Later the room temperature is 78 degrees Fahrenheit and each person has taken his or her sweater off. If we ask whether it was the rise in room temperature that caused the people to peel off their sweaters or the peeling off of sweaters that caused the room temperature to rise, then unless there was some strange signal between the taking off of sweaters and turning up a thermostat somewhere, the answer is obvious. Manipulating the room temperature from 58 to 78 degrees will cause people to take off their sweaters, but manipulating them to take off their sweaters will not make the room heat up.

In general, causes can be used to control their effects, but effects cannot be used to control their causes. Further, there is an invariance between a cause and its effects that does not hold between an effect and its causes. It does not seem to matter how we change the temperature in the room from 58 to 78 degrees or from 78 to 58, the co-occurrence between room temperature and sweaters remains. When the temperature is 58, people have sweaters on. When the temperature is 78, they do not. The opposite is not true for the relationship between the effect and its causes. It does matter how they come to have their sweaters on. If we let them decide for themselves naturally, then the co-occurrence between sweaters and temperature will remain, but if we intervene to make them take their sweaters off or put them on, then we will annihilate any co-occurrence between wearing sweaters and the room temperature precisely because the room temperature will not respond to whether or not people are wearing sweaters. Thus manipulability accounts handle the asymmetry problem.

They do the same for the problem of spurious causation. Tar-stained fingers and lung cancer are both effects of a common cause—smoking. Intervening to remove the stains from one's fingers will not in any way change the probability of getting lung cancer, however.
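The contrast between intervening on a cause and intervening on an effect can be sketched with the sweater example. This is a toy model under stated assumptions: the 68-degree comfort threshold, the function `run_room`, and its `force_sweater` parameter are all hypothetical.

```python
from typing import Optional, Tuple

COMFORT_THRESHOLD_F = 68  # assumed cutoff below which people keep sweaters on

def run_room(thermostat_f: int, force_sweater: Optional[bool] = None) -> Tuple[int, bool]:
    """Return (room temperature, sweater on?) for one setting of the room."""
    temperature = thermostat_f  # temperature responds to the thermostat only
    if force_sweater is None:
        sweater = temperature < COMFORT_THRESHOLD_F  # people decide for themselves
    else:
        sweater = force_sweater  # intervention overrides the natural choice
    return temperature, sweater

# Intervening on the cause: sweaters follow the temperature.
print(run_room(58), run_room(78))

# Intervening on the effect: the temperature does not follow the sweaters.
print(run_room(58, force_sweater=False), run_room(58, force_sweater=True))
```

Setting the thermostat changes whether sweaters are worn, while forcing sweaters on or off leaves the temperature where it was, which is the asymmetry the manipulability account relies on.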

The philosophical problem with manipulability accounts is circularity, for what is it to "intervene" and "manipulate" other than to "cause"? Intervening to set the thermostat to 78 is just to cause it to be set at 78. Manipulation is causation, so defining causation in terms of manipulation is, at least on the surface of it, circular.

Perhaps we can escape from this circularity by separating human actions from natural ones. Perhaps forming an intention and then acting to execute it is special, and could be used as a noncausal primitive in a reductive theory of causation. Writers like Georg Henrik von Wright have pursued this line. Others, like Paul Holland, have gone so far as to say that we have no causation without human manipulation. But is this reasonable or desirable? Virtually all physicists would agree that it is the moon's gravity that causes the tides. Yet we cannot manipulate the moon's position or its gravity. Are we to abandon all instances of causation where human manipulation was not involved? If a painting falls off the wall and hits the thermostat, bumping it up from 58 to 78 degrees, and a half hour later sweaters come off, are we satisfied saying that the sequence: thermostat goes up, room temperature goes up, sweaters come off was not causal?

Because they failed as reductive theories of causation, manipulability theories drew much less attention than perhaps they should have. As James Woodward elegantly puts it:

Philosophical discussion has been unsympathetic to manipulability theories: it is claimed both that they are unilluminatingly circular and that they lead to an implausibly anthropocentric and subjectivist conception of causation. This negative assessment among philosophers contrasts sharply with the widespread view among statisticians, theorists of experimental design, and many social and natural scientists that an appreciation of the connection between causation and manipulation can play an important role in clarifying the meaning of causal claims and understanding their distinctive features. (p. 25)

The Axiomatic and Epistemological Turn: 1985–2004

Although there will always be those unwilling to give up on a reductive analysis of causation, by the mid-1980s it was reasonably clear that such an account was not forthcoming. What has emerged as an alternative, however, is a rich axiomatic theory that clarifies the role of manipulation in much the way Woodward wants and connects rather than reduces causation to probabilistic independence, as Nancy Cartwright insisted. The modern theory of causation is truly interdisciplinary and fundamentally epistemological in focus. That is, it allows a rigorous and systematic investigation of what can and cannot be learned about causation from statistical evidence. Its intellectual beginnings go back to at least the early twentieth century.

Path analysis.

Sometime around 1920 the brilliant geneticist Sewall Wright realized that standard statistical tools were too thin to represent the causal mechanisms he wanted to model. He invented "path analysis" to fill the gap. Path analytic models are causal graphs (like those shown in Figs. 1 and 2) that quantify the strength of each arrow, or direct cause, which allowed Wright to quantify and estimate from data the relative strength of two or more mechanisms by which one quantity might affect another. By midcentury prominent economists (e.g., Herbert Simon and Herman Wold) and sociologists (e.g., Hubert Blalock and Otis Dudley Duncan) had adopted this representation. In several instances they made important contributions, either by expanding the representational power of path models or by articulating how one might distinguish one causal model from another with statistical evidence.

Path models, however, did nothing much to help model the asymmetry of causation.

In the simplest possible path model representing that X is a cause of Y (Fig. 3), we write Y as a linear function of X and an "error" term ε, that is, Y = βX + ε, where ε represents all other unobserved causes of Y besides X. The real-valued coefficient β quantifies X's effect on Y. Nothing save convention, however, prevents us from inverting the equation and rewriting the statistical model as:
X = αY + δ, where α = 1/β and δ = −(1/β)ε

This algebraically equivalent model makes it appear as if Y is the cause of X instead of vice versa. Equations are symmetrical, but causation is not.

Philosophy.

In the early 1980s two philosophers of causation, David Papineau and Daniel Hausman, paying no real attention to path analysis, nevertheless provided major insights into how to incorporate causal asymmetry into path models and probabilistic accounts of causation. Papineau, in a 1985 article titled "Causal Asymmetry," considered the difference between (1) two effects of a common cause and (2) two causes of a common effect (Fig. 4). He argued that two effects of a common cause (tar-stained fingers and lung cancer) are associated in virtue of having a common cause (smoking) but that two causes of a common effect (smoking and asbestos) are not associated in virtue of having a common effect (lung cancer). In fact he could have argued that the two effects of a common cause C are associated in virtue of C, but are independent conditional on C, whereas the two causes of a common effect E are not associated in virtue of E but are associated conditional on E.
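The contrast between a common cause and a common effect can be checked by exact enumeration on a toy model. The following is a minimal sketch, not anything taken from Papineau: all probabilities are invented for illustration, the variables are binary, and the helper names (joint_common_cause, joint_common_effect, dependence) are hypothetical. For binary variables a covariance of zero is equivalent to independence.

```python
from itertools import product

def joint_common_cause():
    """Exact joint over (C, A, B) for C -> A and C -> B.
    Illustrative reading: C = smoking, A = tar-stained fingers, B = lung cancer.
    All numbers are made up."""
    p_c = 0.3
    p_a_given_c = {1: 0.9, 0: 0.1}    # P(A=1 | C=c)
    p_b_given_c = {1: 0.8, 0: 0.05}   # P(B=1 | C=c)
    joint = {}
    for c, a, b in product((0, 1), repeat=3):
        pc = p_c if c else 1 - p_c
        pa = p_a_given_c[c] if a else 1 - p_a_given_c[c]
        pb = p_b_given_c[c] if b else 1 - p_b_given_c[c]
        joint[(c, a, b)] = pc * pa * pb
    return joint

def joint_common_effect():
    """Exact joint over (E, A, B) for A -> E <- B with A and B independent.
    Illustrative reading: A = smoking, B = asbestos, E = lung cancer."""
    p_a, p_b = 0.3, 0.2
    p_e_given_ab = {(0, 0): 0.01, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.8}
    joint = {}
    for e, a, b in product((0, 1), repeat=3):
        pa = p_a if a else 1 - p_a
        pb = p_b if b else 1 - p_b
        pe = p_e_given_ab[(a, b)] if e else 1 - p_e_given_ab[(a, b)]
        joint[(e, a, b)] = pa * pb * pe
    return joint

def dependence(joint, given=None):
    """Covariance of A and B, optionally conditional on the first variable."""
    rows = {k: p for k, p in joint.items() if given is None or k[0] == given}
    total = sum(rows.values())
    p_a = sum(p for k, p in rows.items() if k[1] == 1) / total
    p_b = sum(p for k, p in rows.items() if k[2] == 1) / total
    p_ab = sum(p for k, p in rows.items() if k[1] == 1 and k[2] == 1) / total
    return p_ab - p_a * p_b

cc, ce = joint_common_cause(), joint_common_effect()
print("common cause: marginal", dependence(cc), "given C=1", dependence(cc, 1))
print("common effect: marginal", dependence(ce), "given E=1", dependence(ce, 1))
```

Under these assumed numbers the two effects are associated marginally and independent given their common cause, while the two causes are independent marginally and associated once we condition on their common effect.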

Daniel Hausman, in a 1984 article (and more fully in a 1998 book, Causal Asymmetries), generalized this insight still further by developing a theory of causal asymmetry based on "causal connection." X and Y are causally connected if and only if X is a cause of Y, Y a cause of X, or there is some common cause of both X and Y. Hausman connects causation to probability by assuming that two quantities are associated if they are causally connected and independent if they are not. How does he get the asymmetry of causation? By showing that when X is a cause of Y, anything else causally connected to X is also connected to Y but not vice versa.

Papineau and Hausman handle the asymmetry of causation by considering not just the relationship between the cause and effect but also the way a cause and effect relate to other quantities in an expanded system. How does this help locate the asymmetry in the path analytic representation of causation? First, consider the apparent symmetry in the statistical model in Figure 3. X and ε are not causally connected and have Y as a common effect. Thus following both Papineau and Hausman, we will assume that X and ε are independent and that in any path model properly representing a direct causal relation C → E, C and the error term for E will be independent. But now consider the equation X = αY + δ, which we used to make it appear that Y → X. Because δ is defined as a multiple of ε, and ε is itself a cause of Y, Y and δ will be associated, except for extremely rare cases.
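A small simulation makes the point concrete. This is a sketch under stated assumptions rather than part of the original argument: X and the error term ε are drawn as independent standard normal variables, β is set to 2, and the function names (covariance, simulate) are hypothetical. Analytically, Cov(Y, δ) = Cov(βX + ε, −ε/β) = −Var(ε)/β, which is nonzero whenever ε varies.

```python
import random

def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def simulate(n=20000, beta=2.0, seed=0):
    """Generate data from Y = beta*X + eps, then invert to X = alpha*Y + delta.

    Returns (Cov(X, eps), Cov(Y, delta)) estimated from the sample.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]      # the cause
    eps = [rng.gauss(0, 1) for _ in range(n)]    # error term, independent of X
    y = [beta * xi + ei for xi, ei in zip(x, eps)]

    alpha = 1 / beta
    delta = [-ei / beta for ei in eps]           # error term of the inverted model

    # The inverted equation reproduces X exactly (up to rounding).
    assert all(abs(xi - (alpha * yi + di)) < 1e-9
               for xi, yi, di in zip(x, y, delta))
    return covariance(x, eps), covariance(y, delta)

cov_x_eps, cov_y_delta = simulate()
print("Cov(X, eps)   =", cov_x_eps)    # close to 0
print("Cov(Y, delta) =", cov_y_delta)  # close to -Var(eps)/beta = -0.5
```

Both equations fit the data equally well, but only in the causally correct one is the right-hand-side variable independent of the error term.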

Statistics and computer science.

Path analytic models have two parts, a path diagram and a statistical model. A path diagram is just a directed graph, a mathematical object very familiar to computer scientists and somewhat familiar to statisticians. As we have seen, association and independence are intimately connected to causation, and they are among the fundamental topics in probability and statistics.

Paying little attention to causation, in the 1970s and early 1980s the statisticians Phil Dawid, David Spiegelhalter, Nanny Wermuth, David Cox, Steffen Lauritzen, and others developed a branch of statistics called graphical models that represented the independence relationships among a set of random variables with undirected and directed graphs. Computer scientists interested in studying how robots might learn began to use graphical models to represent and selectively update their uncertainty about the world, especially Judea Pearl and his colleagues at the University of California, Los Angeles (UCLA). By the late 1980s Pearl had developed a very powerful theory of reasoning with uncertainty using Bayesian Networks and the Directed Acyclic Graphs (DAGs) attached to them. Although in 1988 he eschewed interpreting Bayesian Networks causally, Pearl made a major epistemological breakthrough by beginning the study of indistinguishability. He and Thomas Verma characterized when two Bayesian Networks with different DAGs entail the same independencies and are thus empirically indistinguishable on evidence consisting of independence relations.
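The characterization is usually stated as a graphical criterion: two DAGs over the same variables entail the same independencies exactly when they have the same adjacencies and the same unshielded colliders (pairs of nonadjacent variables with a common direct effect). The following is a minimal sketch of that criterion as it is commonly stated, not a reproduction of Verma and Pearl's own presentation; the edge-set representation and the function names are assumptions made for illustration, and isolated variables are ignored.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected adjacencies of a DAG given as a set of (parent, child) pairs."""
    return {frozenset(edge) for edge in edges}

def unshielded_colliders(edges):
    """Triples (a, child, b) with a -> child <- b and a, b not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def markov_equivalent(edges1, edges2):
    """True if the two DAGs share adjacencies and unshielded colliders."""
    return (skeleton(edges1) == skeleton(edges2)
            and unshielded_colliders(edges1) == unshielded_colliders(edges2))

chain = {("X", "Y"), ("Y", "Z")}       # X -> Y -> Z
reverse = {("Z", "Y"), ("Y", "X")}     # X <- Y <- Z
fork = {("Y", "X"), ("Y", "Z")}        # X <- Y -> Z
collider = {("X", "Y"), ("Z", "Y")}    # X -> Y <- Z

print(markov_equivalent(chain, reverse))   # True
print(markov_equivalent(chain, fork))      # True
print(markov_equivalent(chain, collider))  # False
```

The chain, its reversal, and the fork cannot be told apart by independence facts alone, whereas the collider can, which is one way of seeing why common effects carry information about causal direction.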

Philosophy again.

In the mid-1980s Peter Spirtes, Clark Glymour, and Richard Scheines (SGS hereafter), philosophers working at Carnegie Mellon, recognized that path analysis was a special case of Pearl's theory of DAGs. Following Hausman, Papineau, Cartwright, and others trying to connect rather than reduce causation to probabilistic independence, they explicitly axiomatized the connection between causation and probabilistic independence in accord with Pearl's theory and work by the statisticians Harry Kiiveri and Terrence Speed. Their theory of causation is explicitly nonreductionist. Instead of trying to define causation in terms of probability, counterfactuals, or some other relation, they are intentionally agnostic about the metaphysics of the subject. Instead, their focus is on the epistemology of causation, in particular on exploring what can and cannot be learned about causal structure from statistics concerning independence and association. SGS formulate several axioms connecting causal structure to probability, but one is central:

Causal Markov Axiom: Every variable is probabilistically independent of all of its noneffects (direct or indirect), conditional on its immediate causes.
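To see what the axiom asserts for a particular graph, one can list, for each variable, its immediate causes and the noneffects it must be independent of given those causes. The sketch below assumes a causal graph stored as a dictionary from each variable to the set of its direct effects; that representation, the example graph, and the function names are illustrative assumptions, not part of the SGS formulation.

```python
def descendants(dag, node):
    """All direct or indirect effects of node; dag maps node -> set of children."""
    seen, stack = set(), list(dag.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dag.get(current, ()))
    return seen

def markov_independencies(dag):
    """For each variable V, return (noneffects other than parents, parents).

    The Causal Markov Axiom says V is independent of the first set
    conditional on the second.
    """
    nodes = set(dag) | {child for children in dag.values() for child in children}
    result = {}
    for v in nodes:
        parents = {p for p in nodes if v in dag.get(p, ())}
        noneffects = nodes - descendants(dag, v) - {v} - parents
        result[v] = (noneffects, parents)
    return result

# Hypothetical graph: smoking causes both tar-stained fingers and lung cancer.
dag = {"smoking": {"tar_stains", "lung_cancer"}}
for variable, (noneffects, parents) in sorted(markov_independencies(dag).items()):
    print(variable, "is independent of", sorted(noneffects), "given", sorted(parents))
```

For this graph the axiom yields the familiar screening-off claim: tar-stained fingers and lung cancer are independent conditional on smoking.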

The axiom has been the source of a vigorous debate (see the British Journal for the Philosophy of Science between 1999 and 2002), but it is only half of the SGS theory. The second half involves explicitly modeling the idea of a manipulation or intervention. All manipulability theories conceive of interventions as coming from outside the system. SGS model an intervention by adding a new variable external to the system that

is a direct cause of exactly the variable it targets and
is the effect of no variable in the system

and by assuming that the resulting system still satisfies the Causal Markov Axiom.

If the intervention completely determines the variable it targets, then the intervention is ideal. Since an ideal intervention determines its target and thus overrides any influence the variable might have gotten from its other causes, SGS model the intervened system by "x-ing out" the arrows into the variable ideally intervened upon. In Figure 5a, for example, we show the causal graph relating room temperature and wearing sweaters. In Figure 5b we show the system in which we have intervened upon room temperature with I₁ and in Figure 5c the system after an ideal intervention I₂ on the variable "sweaters on."
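The "x-ing out" operation is simple to state as a graph transformation. The following sketch assumes the graph of Figure 5a is the single arrow from room temperature to sweaters on (the figure itself is not reproduced here), represents a graph as a set of (cause, effect) pairs, and uses a hypothetical function name.

```python
def ideal_intervention(edges, target, name):
    """Return the graph after an ideal intervention `name` on `target`.

    edges is a set of (cause, effect) pairs. Every arrow into the target is
    removed, and the new external variable becomes its only direct cause.
    """
    nodes = {n for edge in edges for n in edge}
    if name in nodes:
        # The intervention must come from outside the system.
        raise ValueError("intervention variable must be new to the system")
    kept = {(cause, effect) for cause, effect in edges if effect != target}
    return kept | {(name, target)}

graph = {("room_temperature", "sweaters_on")}

# Intervening on the cause leaves its arrow to the effect in place.
print(ideal_intervention(graph, "room_temperature", "I1"))

# Intervening on the effect removes the arrow from room temperature.
print(ideal_intervention(graph, "sweaters_on", "I2"))
```

After the second intervention room temperature and sweaters on are no longer connected, which is why forcing people to put on sweaters tells us nothing about the temperature of the room.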

This basic perspective on causation, elaborated powerfully and presented elegantly by Judea Pearl (2000), has also been adopted by other prominent computer scientists (David Heckerman and Greg Cooper), psychologists (Alison Gopnik and Patricia Cheng), economists (David Bessler, Clive Granger, and Kevin Hoover), epidemiologists (Sander Greenland and Jamie Robins), biologists (William Shipley), statisticians (Steffen Lauritzen, Thomas Richardson, and Larry Wasserman), and philosophers (James Woodward and Daniel Hausman).

How is the theory epistemological? Researchers have been able to characterize precisely, for many different sets of assumptions above and beyond the Causal Markov Axiom, the class of causal systems that is empirically indistinguishable, and they have also been able to automate discovery procedures that can efficiently search for such indistinguishable classes of models, including models with hidden common causes. Even in such cases, we can still sometimes tell just from the independencies and associations among the variables measured that one variable is not a cause of another, that two variables are effects of an unmeasured common cause, or that one variable is a definite cause of another. We even have an algorithm for deciding, from data and the class of models that are indistinguishable on these data, when the effect of an intervention can be predicted and when it cannot.

Like anything new, the theory has its detractors. The philosopher Nancy Cartwright, although having herself contributed heavily to the axiomatic theory, has been a vocal critic of its core axiom, the Causal Markov Axiom. Cartwright maintains that common causes do not always screen off their effects. Her chief counterexample involves a chemical factory, but the example is formally identical to another that is easier to understand. Consider a TV with a balky on/off switch. When turned to "on," the switch does not always make the picture and sound come on, but whenever it makes the sound come on, it also makes the picture come on (Fig. 6). The problem is this: knowing the state of the switch does not make the sound and the picture independent. Even having been told that the switch is on, for example, also being told that the sound is on adds information about whether the picture is also on.

The response of SGS and many others (e.g., Hausman and Woodward) is that it only appears as if we do not have screening off because we are not conditioning on all the common causes, especially those more proximate to the effects in question. They argue that we must condition on the Circuit Closed, and not just on the Switch, in order to screen off Sound and Picture.
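The reply can be illustrated with exact probabilities. In this sketch the numbers are invented: the switch is on half the time, turning it on closes the circuit with probability 0.8, and sound and picture each come on with certainty when the circuit is closed (the parameters p_sound and p_picture can be lowered to add independent noise). The function names are hypothetical.

```python
from itertools import product

def tv_joint(p_on=0.5, p_closed_given_on=0.8, p_sound=1.0, p_picture=1.0):
    """Exact joint over (switch, circuit, sound, picture) for the graph
    Switch -> Circuit Closed -> Sound, Circuit Closed -> Picture."""
    joint = {}
    for switch, circuit, sound, picture in product((0, 1), repeat=4):
        p = p_on if switch else 1 - p_on
        p_closed = p_closed_given_on if switch else 0.0
        p *= p_closed if circuit else 1 - p_closed
        ps = p_sound if circuit else 0.0      # no sound without a closed circuit
        pp = p_picture if circuit else 0.0    # no picture without a closed circuit
        p *= ps if sound else 1 - ps
        p *= pp if picture else 1 - pp
        joint[(switch, circuit, sound, picture)] = p
    return joint

def sound_picture_cov(joint, switch=None, circuit=None):
    """Covariance of sound and picture, conditional on the given values."""
    rows = {k: p for k, p in joint.items()
            if (switch is None or k[0] == switch)
            and (circuit is None or k[1] == circuit)}
    total = sum(rows.values())
    p_s = sum(p for k, p in rows.items() if k[2] == 1) / total
    p_p = sum(p for k, p in rows.items() if k[3] == 1) / total
    p_sp = sum(p for k, p in rows.items() if k[2] == 1 and k[3] == 1) / total
    return p_sp - p_s * p_p

joint = tv_joint()
print("given switch on:", sound_picture_cov(joint, switch=1))
print("given switch on and circuit closed:",
      sound_picture_cov(joint, switch=1, circuit=1))
```

Conditioning on the switch alone leaves sound and picture associated, while conditioning on the more proximate common cause removes the association. Whether such a proximate cause always exists is exactly what Cartwright disputes.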

A deeper puzzle along these same lines arises from quantum mechanics. A famous thought experiment, called the Einstein-Podolsky-Rosen experiment, considered a coupled system of quantum particles that are separated gently and allowed to diverge. Each particle is in superposition, that is, it has no definite spin until it is measured. J. S. Bell's famous inequality shows that no matter how far apart we allow them to drift, the measurements on one particle will be highly correlated with the other, even after we condition on the state of the original coupled system. There are no extra hidden variables (common causes) we could introduce to screen off the measurements of the distant particles. Although the details are quite important and nothing if not controversial, it looks as if the Causal Markov Axiom might not hold in quantum mechanical systems. Why it should hold in macroscopic systems when it might not hold for their constituents is a mystery.

The SGS model of an intervention incorporates many controversial assumptions. In a 2003 tour de force, however, James Woodward works through all the philosophical reasons why the basic model of intervention adopted by the interdisciplinary view is reasonable. For example, Woodward considers why a manipulation must be modeled as a direct cause of only the variable it targets. Not just any manipulation of our roomful of sweater-wearing people will settle the question of whether sweater wearing causes the room temperature. If we make people take off their sweaters by blowing super-hot air on them—sufficient to also heat the room—then we have not independently manipulated just the sweaters. Similarly, if we are testing to see if confidence improves athletic performance, we cannot intervene to improve confidence with a muscle relaxant that also reduces motor coordination. These manipulations are "fat hand"—they directly cause more than they should.

Woodward covers many issues like this one and develops a rich philosophical theory of intervention that is not reductive but is illuminating and rigorously connects the wide range of ideas that have been associated with causation. For example, the idea of an independent manipulation illuminates and solves the problems we pointed out earlier when discussing the counterfactual theory of causation. Instead of assessing counterfactuals like (1) George would not have plunged into the East River had he not jumped off the Brooklyn Bridge and (2) George would not have jumped off the bridge had he not plunged into the East River, we should assess counterfactuals about manipulations: (1) George would not have plunged into the East River had he been independently manipulated to not jump off the Brooklyn Bridge and (2) George would not have jumped off the bridge had he been independently manipulated not to have plunged into the East River. The difference is in how we interpret "independently manipulated." In case (2) we mean that we assign George to not plunging but leave everything else as it was, for example, by catching George just before he dunks. On this way of conceiving of the counterfactual, George would still have jumped off the bridge, and so we can recover the asymmetry of causation once we augment the counterfactual theory with the idea of an independent manipulation, as Woodward argues.

Conclusion

Although the whirlwind tour in this short article is woefully inadequate, the references below (and especially their bibliographies) should be sufficient to point interested readers to the voluminous literature on causation produced in the late twentieth century and early twenty-first century. Although the literature is vast and somewhat inchoate, it is safe to say that no reductive analysis of causation has emerged still afloat and basically seaworthy. What has been described here as the interdisciplinary theory of causation takes direct causation as a primitive, defines intervention from direct causation, and then connects causal systems to probabilities and statistical evidence through axioms, including the Causal Markov Axiom. Although it provides little comfort for those hoping to analyze causation Socratically, the theory does open the topic of causal epistemology in a way that has affected statistical and scientific practice, hopefully for the better. Surely that is some progress.

The Asymmetry of Causation through Causal Connection

Two variables A and B are "causally connected" if either A is a cause of B, B a cause of A, or a third variable causes them both. If causation is transitive, then it turns out that everything causally connected to X is connected to its effects, but not everything connected to Y is connected to its causes. When X → Y, everything causally connected to X is causally connected to Y (Fig. 7a), but something causally connected to Y is not necessarily causally connected to X (Fig. 7b).
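The asymmetry can be verified mechanically on a small graph. The sketch below treats causation as transitive (a cause is any ancestor in the graph) and uses a hypothetical four-variable graph, since Figures 7a and 7b are not reproduced here; the function names are likewise assumptions.

```python
def ancestors(edges, node):
    """All direct or indirect causes of node; edges are (cause, effect) pairs."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
    seen, stack = set(), list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def causally_connected(edges, a, b):
    """True if a causes b, b causes a, or some third variable causes both."""
    anc_a, anc_b = ancestors(edges, a), ancestors(edges, b)
    return a in anc_b or b in anc_a or bool(anc_a & anc_b)

def connected_set(edges, node):
    """Every other variable in the graph that is causally connected to node."""
    nodes = {n for edge in edges for n in edge}
    return {n for n in nodes
            if n != node and causally_connected(edges, node, n)}

# Hypothetical graph: Z -> X -> Y <- W
edges = {("Z", "X"), ("X", "Y"), ("W", "Y")}
print(sorted(connected_set(edges, "X")))  # ['Y', 'Z']
print(sorted(connected_set(edges, "Y")))  # ['W', 'X', 'Z']
```

Everything connected to X other than Y itself (here Z) is also connected to Y, but W is connected to Y without being connected to X.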

See also Empiricism; Epistemology; Logic; Quantum; Rationalism.

Bibliography

Bell, J. "On the Einstein-Podolsky-Rosen Paradox." Physics 1 (1964): 195–200.

Cartwright, Nancy. "Against Modularity, the Causal Markov Condition, and Any Link between the Two." British Journal for the Philosophy of Science 53 (2002): 411–453.

——. "Causal Laws and Effective Strategies." Noûs 13 (1979).

——. How the Laws of Physics Lie. New York: Oxford University Press, 1983.

——. Nature's Capacities and Their Measurement. New York: Oxford University Press, 1989.

Dowe, Phil. "Causality and Explanation." British Journal for the Philosophy of Science 51 (2000): 165–174.

Glymour, Clark. "Review of James Woodward, Making Things Happen: A Theory of Causal Explanation." British Journal for the Philosophy of Science. Forthcoming.

Hausman, Daniel. Causal Asymmetries. New York: Cambridge University Press, 1998.

——. "Causal Priority." Noûs 18 (1984): 261–279.

Halpern, J., and Judea Pearl. "Actual Causality." IJCAI Proceedings. 2002.

Hitchcock, C. "The Intransitivity of Causation Revealed in Equations and Graphs." Journal of Philosophy 98 (2001): 273–299.

——. "Of Humean Bondage." British Journal for the Philosophy of Science 54 (2003): 1–25.

Holland, Paul. "Statistics and Causal Inference." Journal of the American Statistical Association 81 (1986): 945–960.

Kiiveri, H., and T. Speed. "Structural Analysis of Multivariate Data: A Review." In Sociological Methodology, edited by S. Leinhardt. San Francisco: Jossey-Bass, 1982.

Lauritzen, Steffen. Graphical Models. New York: Oxford University Press, 1996.

Lewis, David. "Causation as Influence." Journal of Philosophy 97 (2000): 182–197.

——. Counterfactuals. Cambridge, Mass.: Harvard University Press, 1973.

Mackie, John. The Cement of the Universe. New York: Oxford University Press, 1974.

McKim, S., and S. Turner. Causality in Crisis? Statistical Methods and the Search for Causal Knowledge in the Social Sciences. Notre Dame, Ind.: University of Notre Dame Press, 1997.

Meek, C., and C. Glymour. "Conditioning and Intervening." British Journal for the Philosophy of Science 45 (1994): 1001–1021.

Papineau, David. "Causal Asymmetry." British Journal for the Philosophy of Science 36 (1985): 273–289.

Pearl, Judea. Causality: Models, Reasoning, and Inference. New York: Cambridge University Press, 2000.

——. Probabilistic Reasoning in Intelligent Systems. San Mateo, Calif.: Morgan Kaufmann, 1988.

Reichenbach, Hans. The Direction of Time. Berkeley: University of California Press, 1956.

Salmon, Wes. Scientific Explanation and the Causal Structure of the World. Princeton, N.J.: Princeton University Press, 1984.

Simon, Herbert. "Spurious Correlation: A Causal Interpretation." Journal of the American Statistical Association 49 (1954): 467–479.

Spirtes, Peter, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. 2nd ed. Cambridge, Mass.: MIT Press, 2000.

Spohn, Wolfgang. "Deterministic and Probabilistic Reasons and Causes." Erkenntnis 19 (1983): 371–396.

Suppes, Patrick. A Probabilistic Theory of Causality. Amsterdam: North-Holland, 1970.

Woodward, James. Making Things Happen: A Theory of Causal Explanation. Oxford: Oxford University Press, 2003.

Wright, Sewall. "The Method of Path Coefficients." Annals of Mathematical Statistics 5 (1934): 161–215.

Richard Scheines

New Dictionary of the History of Ideas Scheines, Richard

Causation

views updated Jun 11 2018

CAUSATION

Role of causation in the criminal law

The place of causation in criminal law doctrines. The part of the substantive criminal law commonly called the "special part" consists of several thousand prohibitions and requirements. Criminal codes typically prohibit citizens from doing certain types of action and sometimes (but much less frequently) require citizens to do certain types of actions. Causation enters into both the prohibitions and the requirements of a typical criminal code, for such statutes either prohibit citizens from causing certain results or require them to cause certain results. In either case causation is central to criminal liability.

It is sometimes urged that omission liability (that is, liability for not doing an act required by law) is noncausal, and there is a sense in which this is true. A defendant who omits to do an act the law requires him to do is not liable for having caused the harm that the act omitted would have prevented; rather, he is liable for not preventing the harm (Moore, 1993, pp. 267–278). Yet notice that to assess whether a defendant is liable for an omission to prevent some harm, a causal judgment is still necessary: we have to know that no act of the defendant prevented (i.e., caused the absence of) any such harm. For if some act of the defendant did cause the absence of a certain harm, then the defendant cannot be said to have omitted to have prevented the harm. One can, for example, only be liable for omitting to save another from drowning if none of one's acts have the causal property, saving-the-other-from-drowning (Moore, 1993, pp. 29–31).

It is also sometimes said that many prohibitions of the criminal law do not involve causation. Criminal law typically prohibits theft, rape, burglary, conspiracy, and attempt, and (so the argument goes) these are types of actions that have no causal elements in them. Although this view has been elevated to a dogma accepted by both American and English criminal law theorists (Fletcher, 1978, pp. 388–390; Fletcher, 1998, pp. 60–62; Buxton, p. 18; Williams, p. 368), it is manifestly false. A theft occurs, for example, only when an actor's voluntary act causes movement ("asportation") of the goods stolen. Similarly a burglary occurs only when there is a breaking and an entering of a building, and these occur only when a defendant's voluntary act causes a lock on a window to be broken and causes the alleged burglar to be in the building in question (Moore, 1993, pp. 213–225). The temptation to accept the dogma (of noncausal criminal actions) stems from the fact that many of the results the criminal law prohibits are usually brought about rather directly. Penetration in rape, for example, usually is not the result of a lengthy chain of events beginning with the rapist's voluntary act. But this is not always the case, as where the defendant inserts the penis of another into the victim (Dusenberry v. Commonwealth, 220 Va. 770, 263 S.E.2d 392 (1980)); and in any case, that the causal conclusion is often easy to reach should not obscure the fact that a causal judgment is involved in all actions prohibited or required by the criminal law.

The place of causation in criminal law policy. It is a much debated question whether the criminal law should be so result-oriented. Why is the defendant who intends to kill another and does all he can to succeed in his plan less punishable when he fails to cause the harm intended than when he succeeds? Utilitarians about punishment typically justify this causation-oriented grading scheme by alluding either to popular sentiment or to the need to give criminals incentives not to try again. Retributivists about punishment typically invoke a notion of "moral luck" according to which a defendant's moral blameworthiness increases with success in his criminal plans (Moore, 1997, pp. 191–247). In any case, for one set of reasons or another, causation is an element of criminal liability for all completed crimes, in addition to mens rea and voluntariness of action.

Causation in criminal law and causation in tort law. Many of the leading cases on causation, most of the causal doctrines finding some acceptance in the law, and most of the theorizing about causation, originate in the law of tort and not in the criminal law. The reasons for this are not hard to discern. Unlike the thousands of specific actions prohibited or required by the criminal law, tort law largely consists of but one injunction: do not unreasonably act so as to cause harm to another. Such an injunction places greater weight on causation. It leaves open a full range of causal questions, much more than do injunctions of criminal law such as, "do not intentionally hit another."

Criminal law thus has been a borrower from torts on the issue of causation. Such borrowing has not been uniform or without reservations. Aside from the greater demands of directness of causation implicit in specific criminal prohibitions (noted above), the criminal sanction of punishment is sometimes said to demand greater stringency of causation than is demanded by the less severe tort sanction of compensation. Still, the usual form such reservations take is for criminal law to modify causation doctrines in tort by a matter of degree only (Moore, 1997, p. 363 n.1). Foreseeability, for example, is a test of causation in both fields, but what must be foreseeable, and the degree with which it must be foreseeable, is sometimes thought to be greater in criminal law than in torts. Such variation by degree only has allowed causation in criminal law and in torts to be discussed via the same tests, which we shall now do.

Conventional analysis of causation in the law

The two-step analysis. The conventional wisdom about the causation requirement in both criminal law and torts is that it in reality consists of two very different requirements. The first requirement is that of "cause-in-fact." This is said to be the true causal requirement because this doctrine adopts the scientific notion of causation. Whether cigarette smoking causes cancer, whether the presence of hydrogen or helium caused an explosion, are factual questions to be resolved by the best science the courts can muster. By contrast, the second requirement, that of "proximate" or "legal" cause, is said to be an evaluative issue, to be resolved by arguments of policy and not arguments of scientific fact. Suppose a defendant knifes his victim, who then dies because her religious convictions are such that she refuses medical treatment. Has such a defendant (legally) caused her death? The answer to such questions, it is said, depends on the policies behind liability, not on any factual issues.

The counterfactual analysis of cause-in-fact. By far the dominant test for cause-in-fact is the common law and Model Penal Code "sine qua non," or "but-for" test (MPC §2.03(1)). Such a test asks a counterfactual question: "but for the defendant's action, would the victim have been harmed in the way the criminal law prohibits?" This test is also sometimes called the necessary condition test, because it requires that the defendant's action be necessary to the victim's harm. The appeal of this test stems from this fact. The test seems to isolate something we seem to care a lot about, both in explaining events and in assessing responsibility for them, namely, did the defendant's act make a difference? Insofar as we increase moral blameworthiness and legal punishment for actors who do cause bad results (not just try to), we seemingly should care whether a particular bad result would have happened anyway, even without the defendant.

The policy analysis of legal cause. There is no equivalently dominant test of legal or proximate cause. There are nonetheless four distinguishable sorts of tests having some authority within the legal literature. The first of these are what we may call "ad hoc policy tests" (Edgarton). The idea is that courts balance a range of policies in each case that they adjudicate where a defendant has been found to have caused-in-fact a legally prohibited harm. They may balance certain "social interests" like the need for deterrence with certain "individual interests" like the unfairness of surprising a defendant with liability. Courts then decide wherever such balance leads. Whatever decision is reached on such case-by-case policy balancing is then cast in terms of "proximate" or "legal" cause. Such labels are simply the conclusions of policy balances; the labels have nothing to do with causation in any ordinary or scientific sense.

The second sort of test here is one that adopts general rules of legal causation. Such rules are adopted for various policy reasons also having nothing to do with causation, but this "rules-based" test differs from the last by its eschewal of case-by-case balancing; rather, per se rules of legal causation are adopted for policy reasons. Thus, the common law rule for homicide was that death must occur within a year and a day of the defendant's harmful action, else the defendant could not be said to have legally caused the death. Analogously, the "last wrongdoer rule" held that when a single victim is mortally wounded by two or more assailants, acting not in concert and acting seriatim over time, only the last wrongdoer could be said to be the legal cause of the death (Smith, p. 111). Such sorts of tests also found a temporary home in tort law with its "first house rule," according to which a railroad whose negligently emitted sparks burned an entire town was only liable for the house or houses directly ignited by its sparks, not for other houses ignited by the burning of those first burnt houses (Ryan v. New York Central R.R., 35 N.Y. 210, 91 Am. Dec. 49 (1866)). There is no pretense in such rules of making truly causal discriminations; rather, such rules were adopted for explicit reasons of legal policy.

The third sort of test here is the well-known foreseeability test (Moore, 1997, pp. 363–399). Unlike the "rules-based" test, here there is no multiplicity of rules for specific situations (like homicide, intervening wrongdoers, railroad fires, etc.). Rather, there is one rule universally applicable to all criminal cases: was the harm that the defendant's act in fact caused foreseeable to him at the time he acted? This purportedly universal test for legal causation is usually justified by one of two policies: either the unfairness of punishing someone for harms that they could not foresee, or the inability to gain any deterrence by punishing such actors (since the criminal law's threat value is nonexistent for unforeseeable violations).

Some jurisdictions restrict the foreseeability test to one kind of situation. When some human action or natural event intervenes between the defendant's action and the harm, the restricted test asks whether that intervening action or event was foreseeable to the defendant when he acted (Moore, 1997, p. 363 n.1). This restricted foreseeability test is like the restricted rules we saw before and is unlike the universal test of legal causation the foreseeability test usually purports to be.

The fourth and last sort of test here is the "harm-within-the-risk" test (Green). Like the foreseeability test, this test purports to be a test of legal cause universally applicable to all criminal cases. This test too is justified on policy grounds and does not pretend to have anything to do with factual or scientific causation. Doctrinally, however, the test differs from a simple foreseeability test.

Consider first the arena from which the test takes its name, crimes of risk creation. If the defendant is charged with negligent homicide, for example, this test requires that the death of the victim be within the risk that made the actor's action negligent. Similarly, if the charge is manslaughter (for which consciousness of the risk is required in some jurisdictions), this test requires that the death of the victim be within the risk the awareness of which made the defendant's action reckless.

Extension of this test to nonrisk-creation crimes requires some modification. For crimes of strict liability, where no mens rea is required, the test requires that the harm that happened be one of the types of harms the risk of which motivated the legislature to criminalize the behavior. For crimes requiring knowledge or general intention for their mens rea, the test asks whether the harm that happened was an instance of the type of harm foreseen by the defendant as he acted. For crimes requiring purpose or specific intent for their mens rea, the test asks whether the harm that happened was an instance of the type of harm the defendant intended to achieve by his action.

What motivates all of these variations of the harm-within-the-risk test is the following insight: when assessing culpable mens rea, there is always a "fit problem" (Moore, 1997, pp. 469–476). Suppose a defendant intends to hit his victim in the face with a stick; suppose further he intends the hit to put out the victim's left eye. As it happens, the victim turns suddenly as he is being hit, and loses his right ear. Whether the harm that happened is an instance of the type of harm intended is what the present author calls the "fit problem." Fact finders have to fit the mental state the defendant had to the actual result he achieved and ask whether it is close enough for him to be punished for a crime of intent like mayhem. (If it is not close enough, then he may yet be convicted of some lesser crime of battery or reckless endangerment.)

The essential claim behind the harm-within-the-risk test is that "legal cause" is the inapt label we have put on a problem of culpability, the fit problem. Proponents of this test urge that legal cause, properly understood, is really a mens rea doctrine, not a doctrine of causation at all.

Problems with the conventional analysis

Problems with the counterfactual test. Very generally there are four sorts of problems with the counterfactual test for causation in fact. One set of these problems has to do with proof and evidence. As an element of the prima facie case, causation-in-fact must be proven by the prosecution beyond a reasonable doubt. Yet counterfactuals by their nature are difficult to prove with that degree of certainty, for they require the fact finder to speculate what would have happened if the defendant had not done what he did. Suppose a defendant culpably destroys a life preserver on a seagoing tug. When a crewman falls overboard and drowns, was a necessary condition of his death the act of the defendant in destroying the life preserver? If the life preserver had been there, would anyone have thought to use it? Thrown it in time? Thrown it far enough? Have gotten near enough to the victim that he would have reached it? We often lack the kind of precise information that could verify whether the culpable act of the defendant made any difference in this way.

A second set of problems stems from an indeterminacy of meaning in the test, not from difficulties of factual verification. There is a great vagueness in counterfactual judgments. The vagueness lies in specifying the possible world in which we are to test the counterfactual (Moore, 1997, pp. 345–347). When we say, "but for the defendant's act of destroying the life preserver," what world are we imagining? We know we are to eliminate the defendant's act, but what are we to replace it with? A life preserver that was destroyed by the heavy seas (that themselves explain why the defendant couldn't destroy the life preserver)? A defendant who did not destroy the life preserver because he had already pushed the victim overboard when no one else was around to throw the life preserver to the victim? And so on. To make the counterfactual test determinate enough to yield one answer rather than another, we have to assume that we share an ability to specify a possible world that is "most similar" to our actual world, and that it is in this possible world that we ask our counterfactual question (Lewis, 1973).

The third and fourth sets of problems stem from the inability of the counterfactual test to match what for most of us are firm causal intuitions. The third set of problems arises because the counterfactual test seems too lenient in what it counts as a cause. The criticism is that the test is thus overinclusive. The fourth set of problems arises because the counterfactual test seems too stringent in what it counts as a cause. The criticism here is that the test is underinclusive.

The overinclusiveness of the test can be seen in at least four distinct areas. To begin with, the test fails to distinguish acts from omissions, in that both can be equally necessary to the happening of some event (Moore, 1993, pp. 267–278; Moore, 1999). Thus, on the counterfactual test both my stabbing the victim through the heart and your failure to prevent me (though you were half a world away at the time) are equally the cause of the victim's death. This is, to put it bluntly, preposterous.

It is important to see that there is a counterfactual question to ask about omissions before we blame someone for them. We do need to know, counterfactually, if the defendant had not omitted to do some action, whether that action would have prevented the harm in question. Yet the counterfactual test of causation would turn this question about an ability to prevent some harm, into a question of causing that which was not prevented. It is a significant objection to the counterfactual theory that it blurs this crucial distinction.

A second way in which the counterfactual test is overinclusive is with regard to coincidences. Suppose a defendant culpably delays his train at t₁; much, much later and much further down the track at t₂, the train is hit by a flood, resulting in damage and loss of life (Denny v. N.Y. Central R.R., 13 Gray (Mass.) 481 (1859)). Since but for the delay at t₁, there would have been no damage or loss of life at t₂, the counterfactual test yields the unwelcome result that the defendant's delaying caused the harm.

While such cases of overt coincidences are rare, they are the tip of the iceberg here. Innumerable remote conditions are necessary to the production of any event. Oxygen in the air over England, timber in Scotland, Henry VIII's obesity, and Drake's perspicacity were all probably necessary for the defeat of the Spanish Armada (Moore, 1993, pp. 268–269), but we should be loath to say that each of these was equally the cause of that defeat.

A third area of overinclusiveness stems from the bedrock intuition that causation is asymmetrical with respect to time (Moore, 1999). My dynamite exploding at t₁ may cause your mother minks to kill their young at t₂, yet your mother minks killing their young at t₂ did not cause my dynamite to explode at t₁. The counterfactual test has a difficult time in accommodating this simple but stubborn intuition.

To see this, recall the logic of necessary and sufficient conditions. If event c is not only necessary for event e but also sufficient, then (of necessity) e is also necessary for c. In such a case c and e are symmetrically necessary conditions for each other and, on the counterfactual analysis, each is therefore the cause of the other. Intuitively we know that this is absurd, yet to avoid this result we must deny that some cause c is ever sufficient (as well as necessary) for some effect e. And the problem is that almost all proponents of the necessary condition test readily admit that every cause c is, if not sufficient by itself, then sufficient when conjoined with certain other conditions c′, c″, etc. (Mill, 1965, book 3, chap. 5, sec. 3). Sufficiency seems to well capture the commonsense view that causes make their effects inevitable. Yet, with such inevitability of effects from their causes comes a necessity of those effects for those causes. Therefore, every effect is also a cause of its cause?
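The symmetry argument can be set out step by step. The notation below is a presentational assumption, not the author's: "necessary" and "sufficient" are read as simple conditionals, as the appeal to "the logic of necessary and sufficient conditions" suggests, and the letters c and e stand for the occurrence of the two events.

```latex
\begin{align*}
&\text{(1) $c$ is necessary for $e$:}   && \neg c \rightarrow \neg e \\
&\text{(2) $c$ is sufficient for $e$:}  && c \rightarrow e \\
&\text{(3) contraposition of (2):}      && \neg e \rightarrow \neg c \\
&\text{(4) so $e$ is necessary for $c$, and by the but-for test $e$ counts as a cause of $c$.}
\end{align*}
```

Line (3) has exactly the form of line (1) with the roles of c and e exchanged, which is why a test that defines "cause" as "necessary condition" cannot by itself tell cause from effect.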

The fourth sort of overinclusiveness of the counterfactual analysis can be seen in cases of epiphenomena. One event is epiphenomenal to another event when both events are effects of a common cause (Moore, 1999). I jog in the morning with my dog. This has two effects: at t₂, my feet get tired; at t₃, my dog gets tired. Intuitively we know that my feet getting tired did not cause my dog to get tired. Yet the counterfactual analysis suggests just the opposite. My jogging in the morning was not only necessary for my feet getting tired, it (sometimes at least) was also sufficient. This means (see above) that my feet getting tired was necessary to my jogging in the morning. Yet we know (on the counterfactual analysis) that my jogging in the morning was necessary to my dog getting tired. Therefore, by the transitivity of "necessary," my feet getting tired was necessary to my dog getting tired. Therefore, the tiring of my feet did cause the tiring of my dog, contrary to our firm intuitions about epiphenomena.

The fourth set of problems for the counterfactual test has to do with the test's underinclusiveness. Such underinclusiveness can be seen in the well-known overdetermination cases (Moore, 1999; Wright, 1985, pp. 1775–1798), where each of two events c₁ and c₂ is independently sufficient for some third event e; logically, this entails that neither c₁ nor c₂ is necessary for e, and thus, on the counterfactual analysis of causation, neither can be the cause of e. Just about everybody rejects this conclusion, and so such cases pose a real problem for the counterfactual analysis.

There are two distinct kinds of overdetermination cases. The first are the concurrent-cause cases: two fires, two shotgun blasts, two noisy motorcycles, each is sufficient to burn, kill, or scare some victim. The defendant is responsible for only one fire, shot, or motorcycle. Yet his fire, shot, or noise joins the other one, and both simultaneously cause some single, individual harm. On the counterfactual analysis the defendant's fire, shot, or noise was not the cause of any harm because it was not necessary to the production of the harm—after all, the other fire, shot, or noise was by itself sufficient. Yet the same can be said about the second fire, shot, or noise. So, on the but-for test, neither was the cause! And this is absurd.

The preemptive overdetermination cases are different. Here the two putative causes are not simultaneous but are temporally ordered. The defendant's fire arrives first and burns down the victim's building; the second fire arrives shortly thereafter, and would have been sufficient to have burned down the building, only there was no building to burn down. Here our intuitions are just as clear as in the concurrent overdetermination cases but they are different: the defendant's fire did cause the harm, and the second fire did not. Yet the counterfactual analysis again yields the counterintuitive implication that neither fire caused the harm because neither fire was necessary (each being sufficient) for the harm.
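The way the but-for test misfires in the overdetermination cases can be checked mechanically. The following Python sketch is illustrative only: the function `but_for`, the condition names, and the rule `building_burns` are hypothetical, and the sketch assumes the simplest reading of the counterfactual, on which only the candidate cause is deleted and everything else is held fixed.

```python
from typing import Callable, FrozenSet

Conditions = FrozenSet[str]


def but_for(candidate: str,
            actual: Conditions,
            outcome: Callable[[Conditions], bool]) -> bool:
    """Return True if the harm occurs in the actual world but would not
    occur once the candidate condition alone is removed."""
    return outcome(actual) and not outcome(actual - {candidate})


# Hypothetical rule: the building burns if at least one fire reaches it.
def building_burns(conds: Conditions) -> bool:
    return "defendant_fire" in conds or "other_fire" in conds


two_fires = frozenset({"defendant_fire", "other_fire"})
one_fire = frozenset({"defendant_fire"})

for label, world in (("two fires", two_fires), ("one fire", one_fire)):
    for fire in sorted(world):
        verdict = but_for(fire, world, building_burns)
        print(f"{label}: {fire} is a but-for cause? {verdict}")
```

With a single fire the test behaves as expected, but with two independently sufficient fires it reports that neither is a cause. Because the toy rule ignores timing, it returns the same verdict whether the fires are simultaneous or one arrives first, so it cannot register the difference that intuition sees between the concurrent and preemptive cases.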

Situated rather nicely between these two sorts of overdetermination cases is what this author has called the asymmetrical overdetermination cases (Moore, 1999). Suppose the defendant nonmortally stabs the victim at the same time as another defendant mortally stabs the same victim; the victim dies of loss of blood, most of the blood gushing out of the mortal wound. Has the nonmortally wounding defendant caused the death of the victim? Not according to the counterfactual analysis: given the sufficiency of the mortal wound, the nonmortal wound was not necessary for, and thus not a cause of, death. This conclusion is contrary to common intuition as well as legal authority (People v. Lewis, 124 Cal. 551, 57 P. 470 (1899)).

Defenders of the counterfactual analysis are not bereft of replies to these objections. As to problems of proof they assert that counterfactuals are no harder to verify than other judgments applying causal laws to unobservable domains (such as those parts of the past for which there is no direct evidence, or those aspects of the universe too far removed for us to observe, or those future events beyond our likely existence). As to the problem of indeterminacy, they assert that we test counterfactuals in that possible world that is relatively close to our actual world; usually this means removing the defendant's action only, and then suspending enough causal laws so that events that normally cause such action just did not on this occasion (Wright, 1988). As to the problems of omissions and asymmetry through time, they assert that we should simply stipulate that a cause is not only a necessary condition for its effect, but it is also an event (not the absence of an event) that precedes (not succeeds) the event which is its effect. Such stipulations are embarrassingly ad hoc, but they do eliminate otherwise troublesome counterexamples. With regard to coincidences and epiphenomenal pairs of events, they assert that there are no causal laws connecting classes of such events with one another; one type of event is not necessary for another type of event, however necessary one particular event may be for its putative (coincidental or epiphenomenal) "effect." With regard to the embarrassment of riches in terms of how many conditions are necessary for any given event or state, they typically bite the bullet and admit that causation is a very nondiscriminating relation; however our usage of "cause" is more discriminating by building in pragmatic restrictions on when certain information is appropriately imparted to a given audience.

As to the problem posed by the concurrent overdetermination cases, they usually urge that if one individuates the effect finely enough in such cases, one will see that each concurrent cause is necessary to that specific effect (American Law Institute, 1985). A two-bullet death is different from a one-bullet death, so that each simultaneous, mortally wounding bullet is necessary to the particular death (i.e., a two-bullet death) suffered by the victim shot by two defendants. Similarly, in the preemptive overdetermination cases, they assert that the first fire to arrive was necessary to the burning of the house, but the second was not, because had the first fire not happened the second fire still would have been prevented from burning the house (Lewis, 1973).

There are deep and well-known problems with all of these responses by the counterfactual theorists (Moore, 1999). Rather than pursue these, we should briefly consider modifications of the counterfactual test designed to end run some of these problems. With regard to the problem posed by the overdetermination cases, the best known alternative is to propose the NESS test: an event c causes an event e if and only if c is a necessary element in a set of conditions sufficient for e (Mackie; Wright, 1985). It is the stress on sufficiency that is supposed to end run the overdetermination problems. In the concurrent cause cases, where the two fires join to burn the victim's house, each fire is said to be a necessary element of its own sufficient set, so each fire is a cause. In the preemptive case, where the fires do not join and one arrives first, the first fire is a necessary element of a sufficient set, and so is the cause; but the second fire is not because absent from its set is the existence of a house to be burned.
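The NESS definition lends itself to a small mechanical check. The Python sketch below is one possible reading, not a formulation found in Mackie or Wright: the function `is_ness`, the condition names, and the two sufficiency rules are hypothetical. It treats a condition as a NESS cause when some set of actually present conditions is sufficient for the harm with that condition and not sufficient without it. For the preemptive case it assumes, as the paragraph above does, that "house standing when the second fire arrives" is simply not among the actual conditions.

```python
from itertools import combinations
from typing import Callable, FrozenSet

Conditions = FrozenSet[str]


def is_ness(candidate: str,
            actual: Conditions,
            sufficient: Callable[[Conditions], bool]) -> bool:
    """True if some subset of the actual conditions is sufficient for the
    harm when the candidate is included and insufficient when it is not."""
    if candidate not in actual:
        return False
    others = sorted(actual - {candidate})
    for size in range(len(others) + 1):
        for combo in combinations(others, size):
            rest = frozenset(combo)
            if sufficient(rest | {candidate}) and not sufficient(rest):
                return True
    return False


# Concurrent case (hypothetical rule): either fire plus a standing house suffices.
def concurrent_rule(s: Conditions) -> bool:
    return "house_standing" in s and ("fire_a" in s or "fire_b" in s)


concurrent_world = frozenset({"fire_a", "fire_b", "house_standing"})


# Preemptive case (hypothetical rule): each fire needs a house standing
# at the moment that fire arrives.
def preemptive_rule(s: Conditions) -> bool:
    first = "first_fire" in s and "house_standing_at_t1" in s
    second = "second_fire" in s and "house_standing_at_t2" in s
    return first or second


# The house no longer stands at t2, so that condition is absent.
preemptive_world = frozenset({"first_fire", "second_fire", "house_standing_at_t1"})

print(is_ness("fire_a", concurrent_world, concurrent_rule))
print(is_ness("fire_b", concurrent_world, concurrent_rule))
print(is_ness("first_fire", preemptive_world, preemptive_rule))
print(is_ness("second_fire", preemptive_world, preemptive_rule))
```

On these assumptions both concurrent fires come out as causes, and in the preemptive case only the first fire does. The result for the second fire depends entirely on the modeling decision to leave `house_standing_at_t2` out of the actual conditions.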

There are problems with this NESS alternative too (Moore, 1999). For example, it is not stated how one individuates sets of conditions. Why aren't the two fires part of the same set, in which event neither is necessary? Also, in the preemptive case, isn't the addition of the condition, "existence of the victim's house at the time the second fire would be sufficient to destroy it," already sliding in the causal conclusion that the first fire already caused the house not to exist? Again these problems are not conclusive, and debate about them will no doubt continue for the foreseeable future. Such problems cause grave doubt to exist about any version of the counterfactual test among many legal theoreticians. Such academic doubts seem to have shaken the doctrinal dominance of the test very little, however.

Problems with the policy tests for legal cause. The main problem with both the ad hoc and the rule-based policy tests is that they seek to maximize the wrong policies. The general "functionalist" approach of such tests to legal concepts is correct: we should always ask after the purpose of the rule or institution in which the concept figures in order to ascertain its legal meaning. Yet the dominant purpose of the law's concept of causation is to grade punishment proportionately to moral blameworthiness. One who intentionally or recklessly causes a harm that another only tries to cause or risks causing, is more blameworthy (Moore, 1997, pp. 191–247). We must thus not seek the meaning of causation in extrinsic policies; rather, the legal concept of causation will serve its grading function only if the concept names some factual state of affairs that determines moral blameworthiness. By ignoring this dominant function of causation in criminal law, the explicit policy tests constructed an artificial concept of legal cause unusable in any just punishment scheme.

This problem does not infect the foreseeability and harm-within-the-risk tests. For those tests do seek to describe a factual state of affairs that plausibly determines moral blameworthiness. They are thus serving the dominant policy that must be served by the concept of causation in the criminal law. Their novelty lies in their reallocation of the locus of blame. On these theories, "legal cause" is not a refinement of an admitted desert-determiner, true causation; it is rather a refinement of another admitted desert-determiner, namely, mens rea (or "culpability").

Precisely because it is a culpability test, however, the foreseeability test becomes subject to another policy-based objection, that of redundancy. Why should we ask two culpability questions in determining blameworthiness? After we have satisfied ourselves that a defendant is culpable—either because she intended or foresaw some harm, or because she was unreasonable in not foreseeing some harm, given the degree of that harm's seriousness, the magnitude of its risk, and the lack of justification for taking such a risk—the foreseeability test bids us to ask, "was the harm foreseeable?" This is redundant, because any harm intended or foreseen is foreseeable, and any harm foreseeable enough to render an actor unreasonable for not foreseeing it, is also foreseeable.

The only way the foreseeability test avoids redundancy is by moving toward the harm-within-the-risk test. That is, one might say that the defendant was culpable in intending, foreseeing, or risking some harm type H, but that what his act in fact caused was an instance of harm type J; the foreseeability test of legal cause becomes nonredundant the moment one restricts it to asking whether J was foreseeable, a different question than the one asked and answered as a matter of mens rea about H. Yet this is to do the work of the harm-within-the-risk test, namely, the work of solving the "fit problem" of mens rea. Moreover, it is to do such work badly. Foreseeability is not the right question to ask in order to fit the harm in fact caused by a defendant to the type of harm he either intended to achieve or foresaw that he would cause. If the foreseeability test is to be restricted to this nonredundant work it is better abandoned for the harm-within-the-risk test.

The main problem for the harm-within-the-risk test itself does not lie in any of the directions we have just explored. The test is in the service of the right policy in its seeking of a true desert-determiner, and the test does not ask a redundant question. To grade culpability by the mental states of intention, foresight, and risk we have to solve the fit problem above described. The real question for the harm-within-the-risk test is whether this grading by culpable mental states is all that is or should be going on under the rubric of "legal cause."

Consider in this regard two well-known sorts of legal cause cases. It is a time honored maxim of criminal law (as well as tort law) that "you take your victim as you find him." Standard translation: no matter how abnormal may be the victim's susceptibilities to injury, and no matter how unforeseeable such injuries may therefore be, a defendant is held to legally cause such injuries. Hit the proverbial thin-skulled man or cut the proverbial hemophiliac, and you have legally caused their deaths. This is hard to square with the harm-within-the-risk test. A defendant who intends to hit or to cut does not necessarily (or even usually) intend to kill. A defendant who foresees that his acts will cause the victim to be struck or cut, does not necessarily (or even usually) foresee that the victim will die. A defendant who negligently risks that his acts will cause a victim to be struck or cut is not necessarily (or even usually) negligent because he also risked death.

The second sort of case involves what are often called "intervening" or "superseding" causes. Suppose the defendant sets explosives next to a prison wall intending to blow up the wall and to get certain inmates out. He foresees to a practical certainty that the explosion will kill the guard on the other side of the wall. He lights the fuse to the bomb and leaves. As it happens, the fuse goes out. However: a stranger passes by the wall, sees the bomb, and relights the fuse for the pleasure of seeing an explosion; or, a thief comes by, sees the bomb and tries to steal it, dropping it in the process and thereby exploding it; or, lightning hits the fuse, reigniting it, and setting off the bomb; and so on. In all variations, the guard on the other side of the wall is killed by the blast. Standard doctrines of intervening causation hold that the defendant did not legally cause the death of the guard (Hart and Honore, 1985, pp. 133–185, 325–362). Yet this is hard to square with the harm-within-the-risk test. After all, did not the defendant foresee just the type of harm an instance of which did occur? Because the harm-within-the-risk question asks a simple type-to-token question—was the particular harm that happened an instance of the type of harm whose foresight by the defendant made him culpable—the test is blind to freakishness of causal route.

The American Law Institute's Model Penal Code modifies its adoption of the harm-within-the-risk test in section 2.03 by denying liability for a harm within the risk that is "too remote or accidental in its occurrence to have a [just] bearing on the actor's liability or on the gravity of his offense." Such a caveat is an explicit recognition of the inability of the harm-within-the-risk test to accommodate the issues commonly adjudicated as intervening cause issues.

Such a recognition is not nearly broad enough to cover the inadequacy of the harm-within-the-risk approach. The basic problem with the test is that it ignores all of the issues traditionally adjudicated under the concept of legal cause. Not only is the test blind to freakishness of causal route in the intervening cause situations, and to the distinction between antecedent versus after-arising abnormalities so crucial to resolution of the thin-skulled-man kind of issue, but the test also ignores all those issues of remoteness meant to be captured by Sir Francis Bacon's coinage, "proximate causation." Even where there is no sudden "break" in the chain of causation as in the intervening cause cases, there is a strong sense that causation peters out over space and time (Moore, 1999). Caesar's crossing the Rubicon may well be a necessary condition for my writing this article, but so many other events have also contributed that Caesar's causal responsibility has long since petered out. The logical relationship at the heart of the harm-within-the-risk test—"was the particular harm that happened an instance of the type of harm whose risk, foresight, or intention made the defendant culpable?"—is incapable of capturing this sensitivity to remoteness. As such, the harm-within-the-risk test is blind to the basic issue adjudicated under "legal cause." The harm-within-the-risk test asks a good question, but it asks it in the wrong place.

Less conventional approaches to causation in the criminal law

The problems with the conventional analysis of causation have tempted many to abandon the conventional analysis, root and branch. This generates a search for a unitary notion of causation that is much more discriminating (in what it allows as a cause) than the hopelessly promiscuous counterfactual cause-in-fact test of the conventional analysis. Indeed, the search is for a unitary concept of causation that is so discriminating that it can do the work that on the conventional analysis is done by both cause-in-fact and legal cause doctrines. It is far from obvious that causation is in fact a sufficiently discriminating relation that it can do this much work in assigning responsibility. Nonetheless, there are four such proposals in the academic literature, each having some doctrinal support in the criminal law.

Space time proximateness and the substantial factor test. The oldest of the proposals conceives of causation as a metaphysical primitive. Causation is not reducible to any other sort of thing or things, and thus there is little by way of an analysis that one can say about it. However, the one thing we can say is that the causal relation is a scalar relation, which is to say, a matter of degree. One thing can be more of a cause of a certain event than another thing. Moreover, the causal relation diminishes over the number of events through which it is transmitted. The causal relation is thus not a fully transitive relation, in that if event c causes e, and e causes f, and f causes g, it may still be the case that c does not cause g.

On this view of causation, all the law need do is draw the line for liability somewhere on the scale of causal contribution. On matters that vary on a smooth continuum, it is notoriously arbitrary to pick a precise break-point; where is the line between middle age and old age, red and pink, bald and not-bald, or caused and not caused? This approach thus picks an appropriately vague line below which one's causal contribution to a given harm will be ignored for purposes of assessing responsibility. Let the defendant be responsible and liable for some harm only when the degree of his causal contribution to that harm has reached some non-de minimis, or "substantial," magnitude. This is the "substantial factor" test, first explicitly articulated by Jeremiah Smith (1911) and then adopted (but only as a test of cause in fact, not of causation generally) by the American Law Institute in its Restatement of Torts. To the common objection that the test tells us little, its defenders reply that that is a virtue, not a vice, for there is little to be said about causation. It, like obscenity, is something we can "know when we see it," without need of general definitions and tests.

Force, energy, and the mechanistic conception of cause. Other theorists have thought that we can say more about the nature of the causal relation than that it is scalar and diminishes over intervening events. On this view the nature of causation is to be found in the mechanistic concepts of physics: matter in motion, energy, force (Beale; Epstein; Moore, 1999). This test is similar to the substantial factor view in its conceiving the causal relation to be scalar and of limited transitivity.

This view handles easily the overdetermination cases that are such a problem for the conventional analysis. When two fires join, two bullets strike simultaneously, two motorcycles scare the same horse, each is a cause of the harm because each is doing its physical work. When one non-mortal wound is inflicted together with a larger, mortal wound, the victim dying of loss of blood, each is a cause of death because each did some of the physical work (loss of blood) leading to death.

Such a mechanistic conception of causation is mostly a suggestion in the academic literature because of the elusive and seemingly mysterious use of "energy" and "force" by legal theorists. One suspects some such view is often applied by jurors, but unless theorists can spell out the general nature of the relation being intuitively applied by jurors (as is attempted in Fair), this test tends to collapse to the metaphysically sparer substantial factor test.

Aspect causation and the revised counterfactual test. There is an ambiguity about causation that we have hitherto ignored but which does find intuitive expression in the decided cases. The ambiguity lies in the sorts of things that can be causes and effects, what are called the "relata" of the causal relation. The usual assumption is that causal relata are whole events; in the phrase "the firing of his gun caused the death of the victim," the descriptions "the firing of his gun" and "the death of the victim" each name events. Sometimes, however, we might say, "it was the fact that the gun fired was of such large caliber that caused the victim to die." That it was a large-caliber-gun firing is an aspect of the event. The whole event was the firing of the gun; one of that event's properties was that it was a large-caliber-gun firing.

Lawyers adopt this shift in causal relata when they distinguish the defendant's action as a cause, from some wrongful aspect of the defendant's action which is not causally relevant. Thus, when an unlicensed driver injures a pedestrian, they say: "while the driving did cause the injuries, the fact that it was unlicensed driving did not."

A restrictive notion of causation can be found by restricting things eligible to be causal relata to aspects of a defendant's action that make him culpable (either by foresight, intent, or risk). Typically, this restriction is married to some counterfactual conception of causation (Wright, 1985). The resulting conception of causation promises fully as discriminating a notion as was achieved by the harm-within-the-risk approach of the conventional analysis (for notice that this conception really is just harm-within-the-risk conceptualized as a true causal doctrine rather than a construction of legal policy). Such a conception of causation must thus face the challenges faced by the harm-within-the-risk conception, namely, the inadequacy of either analysis to deal with intervening causation, remoteness, freakishness of causal route, and so on. In addition, this proposed conception faces metaphysical hurdles not faced by the harm-within-the-risk analysis, for it must make sense of the idea of aspects of events being causes, rather than events themselves.

Hart and Honore's direct cause test. Beginning in a series of articles in the 1950s and culminating in their massive book, Causation in the Law (1959), Herbert Hart and Tony Honore sought to describe a unitary conception of causation they saw as implicit both in the law and in everyday usages of the concept. One can see their concept most easily in three steps. First, begin with some version of the counterfactual analysis: a cause is a necessary condition for its effect (or perhaps a NESS condition). Second, a cause is not any necessary condition; rather, out of the plethora of conditions necessary for the happening of any event, only two sorts are eligible to be causes. Free, informed, voluntary human actions, and those abnormal conjunctions of natural events we colloquially refer to as "coincidences," are the two kinds of necessary conditions we find salient and honor as "causes" (versus mere "background conditions"). Third, such voluntary human action and abnormal natural events cause a given effect only if some other voluntary human action or abnormal natural event does not intervene between the first such event and its putative effect. Such salient events, in other words, are breakers of causal chains as much as they are initiators of causal chains, so that if they do intervene they relegate all earlier such events to the status of mere background conditions.

Hart and Honore built on considerable case law support for their two candidates for intervening causes (Carpenter, pp. 471–530). Indeed, it is arguable that the basic distinction between principal and accomplice liability depends in part on this conceptualization of causation (Kadish). One concern for this view of causation, nonetheless, is the worry that it is incomplete with respect to the remoteness range of issues usually dealt with under the rubric of "legal cause" in the law. Causation fades out gradually as much as it breaks off suddenly in the law, and the Hart and Honore analysis ignores this.

Michael S. Moore

See also Attempt; Civil and Criminal Divide; Homicide: Legal Aspects; Punishment.

BIBLIOGRAPHY

American Law Institute. Model Penal Code: Proposed Official Draft. Philadelphia: American Law Institute, 1962.

——. Model Penal Code and Commentaries. Philadelphia: American Law Institute, 1985.

Beale, Joseph. "The Proximate Consequences of an Act." Harvard Law Review 33 (1920): 633–658.

Buxton, R. "Circumstances, Consequences, and Attempted Rape." Criminal Law Review (1984): 25–34.

Carpenter, Charles. "Workable Rules for Determining Proximate Cause." California Law Review 20 (1932): 229–259, 396–419, 471–539.

Edgerton, Henry. "Legal Cause." University of Pennsylvania Law Review 72 (1924): 211–244, 343–376.

Epstein, Richard. "A Theory of Strict Liability." Journal of Legal Studies 2 (1973): 151–204.

Fair, David. "Causation and the Flow of Energy." Erkenntnis (1979): 219–250.

Fletcher, George. Rethinking Criminal Law. Boston: Little Brown, 1978.

——. Basic Concepts of Criminal Law. Oxford, U.K.: Oxford University Press, 1998.

Green, Leon. Rationale of Proximate Cause. Kansas City, Mo.: Vernon Law Book Co., 1927.

Hart, H. L. A., and Honore, Tony. Causation in the Law. Oxford, U.K.: Oxford University Press, 1959.

Kadish, Sanford. "Causation and Complicity: A Study in the Interpretation of Doctrine." California Law Review 73 (1985): 323–410.

Lewis, David. "Causation." Journal of Philosophy 70 (1973): 556–567.

Mackie, John. The Cement of the Universe. Oxford, U.K.: Oxford University Press, 1974.

Mill, J. S. A System of Logic, 8th ed. London: Longman, 1961.

Moore, Michael. Act and Crime: The Implications of the Philosophy of Action for the Criminal Law. Oxford, U.K.: Clarendon Press, 1993.

——. Placing Blame: A General Theory of the Criminal Law. Oxford, U.K.: Oxford University Press, 1997.

——. "Causation and Responsibility." Social Philosophy and Policy 16 (1999): 1–51.

Smith, Jeremiah. "Legal Cause in Actions of Tort." Harvard Law Review 25 (1911): 103–128, 223–252, 253–269, 303–321.

Williams, Glanville. "The Problem of Reckless Attempts." Criminal Law Review (1983): 365–375.

Wright, Richard. "Causation in Tort Law." California Law Review 73 (1985): 1737–1828.

——. "Causation, Responsibility, Risk, Probability, Naked Statistics, and Proof: Pruning the Bramble Bush by Clarifying the Concepts." Iowa Law Review 73 (1988): 1001–1077.

Encyclopedia of Crime and Justice MOORE, MICHAEL S.

Causation

views updated Jun 08 2018

The problem of Hume

Causal ordering

Psychology of causal inference

Spurious correlation

Purpose and motivation

BIBLIOGRAPHY

A cause is something that occasions or effects a result (the usual lexical definition) or a uniform antecedent of a phenomenon (J. S. Mill’s definition). When a question is asked in the form “Why …?” it can usually be answered appropriately by a statement in the form “Because ….” Thus, to state the causes of a phenomenon is at least one way to explain the phenomenon; and a careful explication of the concept of causation in science must rest on a prior analysis of the notions of scientific explanation and scientific law.

Explanations in terms of causation are sought both for particular events and for classes of events or phenomena. Thus, a statement of the causes of World War II might include references to German economic difficulties during the 1930s or to the failure of the League of Nations to halt the Ethiopian conquest. On the other hand, a statement of the causes of war might include references to the outward displacement of aggression arising from internal frustrations, the absence of legitimized institutions for legal settlements of disputes between nations, and so on.

Causal explanations generally involve a combination of particular and general statements. In classical price theory, for example, a drop in the price of a commodity can be caused by an increase in its supply and/or a decrease in demand. If an explanation is desired for a drop in the price of wheat in a particular economy in a given year, it may be sought in the unusually large wheat crop of that year. (The large wheat crop may be explained, in turn, by a combination of general laws asserting that the size of the wheat crop is a function of acreage, rainfall, fertilization, and specific facts relevant to these laws—the actual acreage planted, rainfall, and amount of fertilizer applied that year.)

A general paradigm can be given for this kind of causal explanation. Let a be a particular situation (for example, the wheat market in 1965); let A(x) be a statement about situation x (for example, the supply in market x increases); and let B(x) be another such statement about x (for example, the price in market x declines). Suppose there is an accepted scientific law of the form

(x)(A(x)→B(x))

(for example, in any market, if the supply of the commodity increases, the price will decline). Upon substitution of a for x, this becomes (A(a) → B(a)); if the supply of the commodity in market a increases, its price will decline (for example, if the supply of wheat increased in 1965, its price declined). Then A(a) and (x)(A(x)→B(x)) provide, conjointly, a causal explanation for B(a). That is to say, A(a) occasions or effects B(a), while A(x) is the uniform antecedent of B(x). Thus, the paradigm incorporates both the lexical and Mill’s definitions of cause.

Explication of causation along these lines gives rise to three sets of problems that have been discussed extensively by philosophers of science. The first of these may be called the “problem of Hume” because all treatments of it in modern times (both those that agree with Hume and those that oppose him) take Hume’s analysis as their starting point (see Hume 1777). It is the logical and epistemological problem of the nature of the connection between the “if” and the “then” in a scientific law. Is it a “necessary” connection, or a connection in fact, and how is the existence of the connection verified?

The second problem may be called the problem of causal ordering or causal asymmetry. If A(a) is the cause of B(a), we do not ordinarily think that B(a), or its absence, can cause A(a) or its absence. But in the standard predicate calculus of formal logic, (x)(A(x)→B(x)) implies (x)(~B(x)→~A(x)), where “~” stands for “not.” (“If much rainfall, other things equal, then a large wheat crop” implies “If a small wheat crop, other things equal, then not much rainfall.”) While we accept the contrapositive inference, we do not regard it as causal. (The size of the wheat crop does not retrospectively affect the amount of rain; knowledge of the size of the wheat crop may, however, affect our inference of how much rain there had been.) Thus two statements corresponding to the same truth-function (that is, either both are true or both are false) need not express the same causal ordering. The asymmetry between A(x) and B(x) cannot, therefore, rest solely on observation of the situations in which either or both of these predicates hold. How can the causal relation be defined to preserve the asymmetry between cause and effect?
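The truth-functional point can be checked mechanically. The following Python sketch is an illustration only (the helper name `implies` is an assumption, not part of the original discussion): it enumerates every truth assignment and confirms that "A implies B" and "not-B implies not-A" never differ, which is why truth values alone cannot carry the causal asymmetry.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

# Compare "A -> B" with its contrapositive "~B -> ~A" on every assignment.
rows = [(a, b, implies(a, b), implies(not b, not a))
        for a, b in product([True, False], repeat=2)]

# The two statements agree on all four rows, so they are the same truth-function.
equivalent = all(forward == contra for _, _, forward, contra in rows)

for a, b, forward, contra in rows:
    print(f"A={a!s:5} B={b!s:5}  A->B={forward!s:5}  ~B->~A={contra!s:5}")
print("truth-functionally equivalent:", equivalent)
```

Because the two columns are identical, any asymmetry between rainfall and crop size has to come from something other than the truth table, which is the problem the later sections take up.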

The third set of problems surrounding causation are the psychological problems. Michotte has explored the circumstances under which one event will in fact be perceived as causing another. Piaget and his associates have investigated the meanings that “cause,” “why,” “because,” and other terms of explanation have for children at various ages. In succeeding sections we shall examine these three sets of problems: the problem of Hume, the problem of causal ordering, and the psychological problems of causal perception and inference.

The problem of Hume

David Hume pointed out that even though empirical observation could establish that events of type B had in each case followed events of type A, observation could never establish that the connection between A and B was necessary or that it would continue to hold for new observations. Thus, general statements like “If A, then B” can serve as convenient summaries of numbers of particular facts but cannot guarantee their own validity beyond the limits of the particular facts they summarize.

Hume did not deny, of course, that people commonly make general inductions of the form “If A, then B” from particular facts; nor that they use these generalizations to predict new events; nor that such predictions are often confirmed. He did deny that the inductive step from particulars to generalization could be provided with a deductive justification, or that it could establish a “necessary” connection between antecedent and consequent, that is, a connection that could not fail in subsequent observations. Modern philosophers of science in the empiricist tradition hold positions very close to Hume’s. Extensive discussions and references to the literature can be found in Braithwaite (1953, chapter 10), Popper ([1934] 1959, chapter 3), and Nagel (1961, pp. 52–56).

In everyday usage, however, a distinction is often made between generalizations that denote “lawful” regularities and those that denote “accidental” regularities. “If there is a large wheat crop, the price of wheat will fall” states a lawful regularity. “If X is a man, then X’s skin color is not bright greenish blue” states an accidental regularity. But since our basis for accepting both generalizations is the same—we have observed confirming instances and no exceptions—the distinction between laws, on the one hand, and generalizations that are only factually or accidentally true, on the other, must be sought not in the empirical facts these generalizations denote but in their relations to other generalizations, that is, in the structure of scientific theories.

Within some given body of scientific theory, a general statement may be called a law if it can be deduced from other statements in that theory. The connection between the quantity of a commodity offered on a market and the price is lawful, relative to general economic theory, because it can be deduced from other general statements in economic theory: that price reaches equilibrium at the level where quantity offered equals quantity demanded; that the quantity demanded is smaller (usually) at the higher price. These latter statements may be derived, in turn, from still others: statements about the characteristics of buyers’ utility functions, postulates that buyers act so as to maximize utility, and so on.

Although lawfulness does not exempt a generalization from the need for empirical verification, a scientific law’s logical connections with others may subject it to indirect disconfirmation; and its direct disconfirmation may affect the validity of other generalizations in the system. A scientific theory may be viewed as a system of simultaneous relations from which the values of particular observations can be deduced. When a reliable observation conflicts with the prediction, some change must be made in the theory; but there is no simple or general way to determine where in the system the change must be made.

Nothing has been said here about the issue of determinism versus indeterminism, which is often discussed in relation to the problem of Hume (see Braithwaite 1953, chapter 10; Popper [1934] 1959, chapter 3; Nagel 1961, chapter 10). The concept of causal ordering is entirely compatible with either deterministic or probabilistic scientific theories. In a probabilistic theory, events are only incompletely determined by their causes, and formalization of such theories shows that the causal relations implicit in them hold between probability distributions of events rather than between the individual events. Thus, in a system described by a so-called Markov process, the probability distribution of the system among its possible states at time t is causally (and not probabilistically) determined by the probability distribution at time t – 1. [See Markov chains.] In this situation, disconfirming a theory requires not just a simple disconfirming observation but rather a sufficiently large set of observations to show that a predicted distribution does not hold.
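The claim about Markov processes can be illustrated with a small sketch. The two-state transition matrix `P` and the function name `step` below are hypothetical choices made for illustration; the point is only that, although individual transitions are random, the distribution at time t is a fixed function of the distribution at time t – 1.

```python
def step(dist, transition):
    """Next state distribution: next[j] = sum over i of dist[i] * transition[i][j]."""
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]

# Hypothetical two-state chain; each row gives the transition
# probabilities out of one state and sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

dist = [1.0, 0.0]          # the system starts in state 0 with certainty
history = [dist]
for _ in range(3):
    dist = step(dist, P)   # no randomness enters at the level of distributions
    history.append(dist)

for t, d in enumerate(history):
    print(t, [round(p, 3) for p in d])
```

Running the loop twice from the same starting distribution gives the same sequence of distributions, even though two simulated runs of the underlying chain would generally visit different states.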

Causal ordering

It was pointed out above that we cannot replace “A causes B” with the simple truth-functional (x)(A(x)→B(x)) without creating difficulties of interpretation. For (x)(A(x)→B(x)) implies (x)(~B(x)→~A(x)), while we do not ordinarily infer “not-B causes not-A” from “A causes B.” On the contrary, if A causes B, it will usually also be the case that not-A causes not-B. Thus, if a large wheat crop causes a low price, we will expect a small wheat crop to cause a high price. The appropriate asymmetrical statement is that “the size of the wheat crop is a cause of the price of wheat,” and the example shows that the asymmetry here is different from that of “if-then” statements. Attempts (Burks 1951) to base a definition of the causal relation on the logical relation of implication have been unsuccessful for this reason.

The concept of scientific law, introduced in the last section, provides an alternative approach to explicating causal ordering. This approach makes the causal connection between two variables depend on the context provided by a scientific theory, that is, a whole set of laws containing these variables (Simon [1947–1956] 1957, chapters 1, 3). This approach views the causal ordering as holding between variables rather than between particular values of those variables. As noted in the last paragraph, the variable “size of the wheat crop” (x) is to be taken as the cause for “price of wheat” (p), a large crop causing a low price and a small crop causing a high price (p = f(x), with dp/dx ≤ 0).

Linear structures. For concreteness, consider the important special situation where a scientific theory takes the form of a set of n simultaneous linear algebraic equations in n variables. Then, apart from certain exceptional cases, these equations can be solved for the unique values of the variables. (In the general case the system is called a linear structure.) In solving the equations, algebraic manipulations are performed that do not change the solutions. Equations are combined until one equation is derived that contains only a single variable. This equation is then solved for that variable, the value is inserted in the remaining equations, and the process is repeated until the values of all variables have been found. In general, there is no single, set order in which the variables must be evaluated. Hence, from an algebraic viewpoint, there is no distinction between “independent” and “dependent” variables in the system. The variables are all interdependent, and the linear structure expresses that interdependence.

However, it may be found in particular systems of this kind that certain subsets of equations containing corresponding subsets of variables can be solved independently of the remaining equations. Such subsets are called self-contained subsets. In the extreme case, a particular equation may contain only one variable, which can then be evaluated as the dependent variable of that equation. Substituting its value as an independent variable in another equation, we may find that only one dependent variable remains, which can now be evaluated.

A causal ordering among variables of a linear structure that has one or more self-contained subsets can now be defined as follows: Consider the minimal self-contained subsets (those that do not themselves contain smaller self-contained subsets) of the system. With each such subset, associate the variables that can be evaluated from that subset alone. These are the endogenous or dependent variables of that subset and are exogenous to the rest of the system. Call them variables of order zero. Next, substitute the values of these variables in the remaining equations of the system, and repeat the whole process for these remaining equations, obtaining the variables of order one, two, and so on, and the corresponding subsets of equations in which they are the dependent variables. Now if a variable of some order occurs with nonzero coefficient in an equation of the linear structure belonging to a subset of higher order, the former variable has a direct causal connection to the endogenous variables of the latter subset.

An example. The wheat price example will help make the above notions more concrete. Suppose a theory in the following form: (1) The amount of rain in a given year is taken as exogenous to the remainder of the system; that is, it is set equal to a constant. (2) The wheat crop is assumed to increase linearly with the amount of rain (within some range of values). (3) The price of wheat is assumed to move inversely, but linearly, with the size of the crop. The system thus contains three equations in three variables. The first equation, determining the amount of rain, is the only self-contained subset. Hence, amount of rain is a variable of order zero. Given its value, the second equation can be solved for the size of the wheat crop, a variable of order one. Finally, the third equation can be solved for the price of wheat, a variable of order two. Thus, there is a direct causal connection from the amount of rain to the size of the crop and from the size of the crop to the price.
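The procedure just described can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not a definitive implementation: the function name `causal_ordering`, the equation labels, and the representation of each equation as the set of variables appearing in it with nonzero coefficient are all choices made here, and a subset of k equations is treated as self-contained when it involves exactly k undetermined variables.

```python
from itertools import combinations

def causal_ordering(equations):
    """Return (order, links) for a structure.

    `equations` maps an equation label to the set of variables that appear
    in it with nonzero coefficient. `order` maps each variable to 0, 1, 2, ...;
    `links` is the set of (cause, effect) pairs with a direct causal connection.
    """
    remaining = {name: set(vs) for name, vs in equations.items()}
    order, links, level = {}, set(), 0
    while remaining:
        minimal = []
        for size in range(1, len(remaining) + 1):
            for subset in combinations(sorted(remaining), size):
                variables = set().union(*(remaining[e] for e in subset))
                if len(variables) != size:
                    continue  # not self-contained
                # Skip subsets that contain a smaller self-contained subset.
                if any(set(found) < set(subset) for found, _ in minimal):
                    continue
                minimal.append((subset, variables))
        if not minimal:
            raise ValueError("no self-contained subset found")
        solved = set()
        for subset, variables in minimal:
            for e in subset:
                # Variables already fixed at a lower order are direct causes.
                for cause in equations[e] - variables:
                    links.update((cause, effect) for effect in variables)
                remaining.pop(e, None)
            for v in variables:
                order[v] = level
            solved |= variables
        # Substitute the solved values into the remaining equations.
        for e in remaining:
            remaining[e] -= solved
        level += 1
    return order, links

# The wheat example: which variables occur in each of the three equations.
wheat = {
    "eq1": {"rain"},
    "eq2": {"rain", "crop"},
    "eq3": {"crop", "price"},
}
order, links = causal_ordering(wheat)
print(order)
print(sorted(links))
```

For the wheat structure the sketch assigns rain order zero, crop order one, and price order two, with direct connections from rain to crop and from crop to price, matching the text. The exhaustive search over subsets is only practical for small systems.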

Operational meaning—mechanisms. The causal ordering would be altered, of course, if before solving the equations the system is modified by taking linear combinations of them. This process is algebraically admissible, for it does not change the solutions, and, indeed, is employed as an essential means for solving simultaneous equations. If the causal ordering is not invariant under such transformations, can it be said to have operational meaning?

Operational meaning is assigned to the equations of the initial, untransformed system by associating with each equation a mechanism (meaning an identifiable locus of intervention or alteration in the system). “Intervention” may be human (for example, experimentation) or natural (for example, change in initial conditions). Just as the operational identity of individual variables in a system depends on means for measuring each independently of the others, so the operational identity of mechanisms depends on means for intervention in each independently of the others. Thus, in the wheat price model, the nature of the mechanism determining rainfall is unspecified, but the mechanism can be “modified” by taking a sample of years with different amounts of rain. Coefficients in the mechanism relating rainfall to the size of the wheat crop can be modified by irrigation or by growing drought-resistant strains of wheat. The mechanism relating the size of the crop to price can be modified by changing buyers’ incomes.

When particular mechanisms can be identified and causal ordering inferred, this knowledge permits predictions to be made of the effects on the variables of a system of specific modifications of the mechanisms—whether these be produced by policy intervention, experimental manipulation, or the impact of exogenous variables. Thus, although there is algebraic equivalence between a system in which each equation corresponds to a separate mechanism and systems obtained by taking linear combinations of the original equations, the derived systems are not operationally equivalent to the original one from the standpoint of control, experiment, or prediction. In the statistical literature, the equations that represent mechanisms and causal ordering are called structural equations; certain equivalent equations derived for purposes of statistical estimation are called reduced-form equations.
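The contrast between algebraic and operational equivalence can be made concrete with the wheat model. In the sketch below all coefficients are hypothetical and the helper names `solve` and `pattern` are illustrative. It replaces the first equation by the sum of the first two, then checks that the solution is unchanged while the pattern of nonzero coefficients, on which the causal ordering depends, is not.

```python
from fractions import Fraction

NAMES = ["rain", "crop", "price"]

def solve(rows):
    """Solve a square linear system given as augmented rows [a1, ..., an, b]."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[-1] for row in m]

def pattern(rows):
    """Variables appearing with nonzero coefficient in each equation."""
    return [{NAMES[j] for j, c in enumerate(row[:-1]) if c != 0} for row in rows]

# Hypothetical coefficients; columns are rain, crop, price, constant term.
structural = [
    [1, 0, 0, 20],    # rain = 20
    [-2, 1, 0, 10],   # crop = 10 + 2 * rain
    [0, 1, 2, 250],   # price = (250 - crop) / 2
]

# Replace the first equation by the sum of the first two.
combined = [
    [a + b for a, b in zip(structural[0], structural[1])],
    structural[1],
    structural[2],
]

print(solve(structural) == solve(combined))   # same solution
print(pattern(structural)[0], pattern(combined)[0])
```

In the original system the first equation contains rain alone and so forms a self-contained subset. In the combined system no single equation is self-contained, and rain and crop can only be determined jointly, so the ordering from rain to crop is lost even though the solution is the same.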

Temporal sequence. The method described here for defining causal ordering accounts for the asymmetry of cause and effect but does not base the asymmetry on temporal precedence. It imposes no requirement that the cause precede the effect. We are free to limit our scientific theories to those in which all causes do precede their effects, but the definition of causal ordering does not require us to do so. Thus, in experimental situations, average values of the independent variables over the period of the experiment may be interpreted as the causal determinants of the values of the dependent variables over the same period.

Many scientific theories do, however, involve temporal sequence. In dynamic theories, the state of the system at one point in time is (causally) determined by the state of the system at an earlier point in time. Generally, a set of initial conditions is given, specifying the state of the system at a point in time taken as origin. The initial conditions, together with the differential or difference equations of the system (the general laws), induce a causal ordering like that defined above.

Interdependence. If almost all the variables in a dynamic system are directly interdependent, so that the value of each variable at a given time depends significantly on the values of almost all other variables at a slightly earlier time, the causal ordering provides little information and has little usefulness. When the interrelations are sparse, however, so that relatively few variables are directly dependent on each other, a description of the causal ordering provides important information about the structure of the system and about the qualitative characteristics of its dynamic behavior (for example, the presence or absence of closed feedback loops). For this reason, the language of causation is used more commonly in relation to highly organized and sparsely connected structures (man-made mechanisms and organisms with their systems of organs) than in relation to some of the common systems described by the partial differential equations of chemistry and physics, where the interactions are multitudinous and relatively uniform.

Psychology of causal inference

The considerations of the preceding sections are purely logical and say nothing of the circumstances under which persons will infer causal connections between phenomena. Extensive studies of the psychology of causal inference have been made by Michotte and Piaget.

Michotte (1946) has shown that a perception of causal connection can be induced, for example, by two spots of light, the first of which moves toward the second and stops on reaching it, while the second then continues the motion in the same direction. Using numerous variants of this scheme, he has explored the circumstances under which subjects will or will not interpret the events causally. His evidence tends to show that the process by which the subject arrives at a causal interpretation is “direct,” subconscious, and perceptual; that is, causal attribution is not a conscious act of inference or induction from a sequence of events perceived independently. While the general distinction intended by Michotte is clear, its detailed interpretation must depend on a more precise understanding of the neural mechanisms of visual perception. In particular, little is known as yet about the respective roles of peripheral and central mechanisms or of innate and learned processes in the perception of causality.

Piaget (1923; 1927) has brought together a sizable body of data on children’s uses of causal language. A principal generalization from these data is that the earliest uses of “Why?” are directed toward motivation of actions and justification of rules; demands for naturalistic causal explanations appear only as the child’s egocentrism begins to wane.

Spurious correlation

Previous sections have dealt with the principal problems surrounding the concept of causation. The remaining sections treat some significant applications of causal language to the social sciences.

It has often been pointed out that a statistical correlation between two variables is not sufficient grounds for asserting a causal relation between them. A correlation is called spurious if it holds between two variables that are not causally related. The definition of causal ordering provides a means of distinguishing genuine from spurious correlation [see Fallacies, statistical; Multivariate analysis, articles on correlation].

Since the causal orderings among variables can be determined only within the context of a scientific theory—a complete structure—it is only within such a context that spurious correlation can be distinguished from genuine correlation. Thus, to interpret the correlation between variables x and y in causal terms, either there must be added to the system the other variables that are most closely connected with x and y, or sufficient assumptions of independence of x and/or y from other variables must be introduced to produce a self-contained system. When this has been done, the simple correlation between x and y can be replaced with their partial correlation (other variables being held constant) in the larger self-contained system. The partial correlation provides a basis for estimating the coefficients of the self-contained system and hence can be given a causal interpretation in the manner outlined earlier. It can be shown (Simon [1947–1956] 1957, chapter 2) that all causal inference from correlation coefficients involves, explicitly or implicitly, this procedure.

Suppose, for example, that per capita candy consumption is found to be (negatively) correlated with marital status. Can it be concluded that marriage causes people to stop eating candy or that candy eating inhibits marriage? The question can be answered only in the context of a more complete theory of behavior (Zeisel [1947] 1957, p. 198). If age is introduced as a third variable, it is found to have a high negative correlation with candy consumption but a high positive correlation with marital status. When age is held constant, the partial correlation of candy consumption with marital status is almost zero. If age is taken as the exogenous variable, these facts permit the inference that age causally influences both candy consumption and marital status but that there is no causal connection between the latter two variables: their correlation was spurious.
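The candy example can be imitated with simulated data. The sketch below is an illustration under loud assumptions: the data are synthetic, not Zeisel's; marital status is replaced by a continuous score for simplicity; and the names `pearson` and `partial_corr` are chosen here. Age drives both other variables, which have no direct connection, and the standard first-order partial correlation formula is used to hold age constant.

```python
import math
import random

def pearson(x, y):
    """Simple (zero-order) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y with z held constant."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Synthetic data: age is exogenous and influences both other variables.
rng = random.Random(0)
n = 5000
age = [rng.gauss(0, 1) for _ in range(n)]
candy = [-a + rng.gauss(0, 1) for a in age]     # older people eat less candy
married = [a + rng.gauss(0, 1) for a in age]    # older people score higher

simple = pearson(candy, married)
partial = partial_corr(candy, married, age)
print(f"simple correlation:  {simple:.3f}")
print(f"partial correlation: {partial:.3f}")
```

In this construction the simple correlation between candy and the marital score is clearly negative, while the partial correlation with age held constant is close to zero, which is the pattern the text describes as spurious.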

Practical techniques for interpreting correlations and distinguishing spurious from genuine relations have been discussed by Blalock (1964), Hyman (1955, chapters 6, 7), Kendall and Lazarsfeld (1950, pp. 135–167), Simon ([1947–1956] 1957, chapter 2), and Zeisel ([1947] 1957, chapter 9).

Purpose and motivation

Since the social sciences are much concerned with purposeful and goal-oriented behavior, it is important to ascertain how causal concepts are to be applied to systems exhibiting such behavior. Purposeful behavior is oriented toward achieving some desired future state of affairs. It is not the future state of affairs, of course, that produced the behavior but the intention or motive to realize this state of affairs. An intention, if it is to be causally efficacious for behavior, must reside in the central nervous system of the actor prior to or at the time of action. Hence, present intention, and not the future goal, provides the causal explanation for the behavior. The influence of expectations and predictions on behavior can be handled in the same way: the expectations are about the future but exist at present in the mind (and brain) of the actor (Rosenblueth et al. 1943).

The simplest teleological system that illustrates these points is a house thermostat. The desired state of affairs is a specified air temperature. The thermostat setting is the thermostat’s (present) representation of that goal—its intention. Measurements of the difference between actual temperature and setting are the causal agents that produce corrective action. Thus, the causal chain runs from the setting and the temperature-measuring device, to the action of a heat source, to the temperature of the room.
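The causal chain in the thermostat can be traced in a toy simulation. The function name `simulate` and every numerical parameter below (heating rate, heat leak, temperatures) are hypothetical; the sketch only shows that each step is driven by the present setting and the present measurement, never by the future temperature.

```python
def simulate(setting, outside, start, steps, heat_rate=2.0, leak=0.1):
    """On/off thermostat: returns a list of (temperature, heater_on) per step."""
    temp = start
    trace = []
    for _ in range(steps):
        error = setting - temp      # present measurement against present setting
        heater_on = error > 0       # corrective action caused by that difference
        # The room gains heat from the source and loses heat to the outside.
        temp += (heat_rate if heater_on else 0.0) - leak * (temp - outside)
        trace.append((round(temp, 2), heater_on))
    return trace

trace = simulate(setting=20, outside=5, start=10, steps=50)
for temp, heater_on in trace[:15]:
    print(temp, "heating" if heater_on else "idle")
```

With these values the room warms toward the setting and then oscillates in a narrow band around it. The goal state appears in the code only as the stored `setting`, which corresponds to the text's point that the present representation of the goal, not the goal itself, does the causal work.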

The term function (in its sociological, not mathematical, sense) can be analyzed similarly. To say that the family has the function of nurturing children is to say (a) that it is causally efficacious to that end; (b) that it operates in a goal-oriented fashion toward that end; and, possibly, (c) that it contributes causally to the survival of the society.

As Piaget’s studies show, teleological explanation (explanation in terms of the motives for and justifications of action) is probably the earliest kind of causal analysis observable in children. Similarly, children tend to interpret causation anthropomorphically, that is, to treat the cause as an active, living agent rather than simply a set of antecedent circumstances. Thus teleological explanation, far from being distinct from causal explanation in science, is probably the prototype for all causal analysis.

Influence and power relations. An influence or power mechanism, in the terms of the present discussion, is simply a particular kind of causal mechanism—the cause and effect both being forms of human behavior. The asymmetry of the causal relation is reflected in the asymmetry of these mechanisms, considered singly. This does not mean that there cannot be reciprocal relations and feedback loops but simply that the influence of A on B can be analyzed (conceptually and sometimes empirically) independently of the influence of B on A (Simon [1947–1956] 1957, chapter 4).

In sum, causal language is useful language for talking about a scientific theory, especially when the variables the theory handles are interconnected, but sparsely so, and especially when there is interest in intervention (for reasons of policy or experiment) in particular mechanisms of the system. In a formalized theory, a formal analysis can be made of the causal relations asserted by the theory. When causal language is used in this way—and most everyday use fits this description—it carries no particular philosophical implications for the problem of Hume or the issue of determinism. Causal concepts are as readily applied to living teleological systems as to inanimate systems, and influence and power relations are special cases of causal relations.

Herbert A. Simon

[See also Power; Prediction; Scientific explanation.]

BIBLIOGRAPHY

Blalock, Hubert M. Jr. 1964 Causal Inferences in Nonexperimental Research. Chapel Hill: Univ. of North Carolina Press.

Braithwaite, Richard B. 1953 Scientific Explanation. Cambridge Univ. Press.

Bredemeier, Harry C. 1966 [Review of] Cause and Effect, edited by Daniel Lerner. American Sociological Review 31:280–281.

Brown, Robert R. 1963 Explanation in Social Science. London: Routledge; Chicago: Aldine.

Burks, Arthur W. 1951 The Logic of Causal Propositions. Mind 60:363–382.

Cause and Effect. Edited by Daniel Lerner. 1965 New York: Free Press.

Hume, David (1777) 1900 An Enquiry Concerning Human Understanding. Chicago: Open Court. → A paperback edition was published in 1955 by Bobbs-Merrill.

Hyman, Herbert H. 1955 Survey Design and Analysis: Principles, Cases, and Procedures. Glencoe, Ill.: Free Press.

Kendall, Patricia L.; and Lazarsfeld, Paul F. 1950 Problems of Survey Analysis. Pages 133–196 in Robert K. Merton and Paul F. Lazarsfeld (editors), Continuities in Social Research: Studies in the Scope and Method of The American Soldier. Glencoe, Ill.: Free Press.

Michotte, Albert (1946) 1963 The Perception of Causality. Paterson, N.J.: Littlefield. → First published in French.

Nagel, Ernest 1961 The Structure of Science: Problems in the Logic of Scientific Explanation. New York: Harcourt.

Piaget, Jean (1923) 1959 The Language and Thought of the Child. 3d ed., rev. New York: Humanities Press. → First published as Le langage et la pensée chez l’enfant.

Piaget, Jean (1927) 1930 The Child’s Conception of Physical Causality. New York: Harcourt; London: Routledge. → First published as La causalité physique chez l’enfant. A paperback edition was published in 1960 by Littlefield.

Popper, Karl R. (1934) 1959 The Logic of Scientific Discovery. New York: Basic Books. → First published as Logik der Forschung.

Rosenblueth, A.; Wiener, Norbert; and Bigelow, J. 1943 Behavior, Purpose and Teleology. Philosophy of Science 10, no. 1:18–24.

Simon, Herbert A. (1947–1956) 1957 Models of Man, Social and Rational: Mathematical Essays on Rational Human Behavior in a Social Setting. New York: Wiley.

Wold, Herman (editor) 1964 Econometric Model Building: Essays on the Causal Chain Approach. Amsterdam: North-Holland Publishing.

Zeisel, Hans (1947) 1957 Say It With Figures. 4th ed., rev. New York: Harper.

International Encyclopedia of the Social Sciences

Causation

views updated May 18 2018

The notion of cause is one of the most common yet thorniest concepts in the history of philosophy. This should come as no surprise. Questions of causation tie up with such divisive issues as determinism and moral responsibility, as well as with the principle of the causal closure of the physical universe and the possibility of divine action. Furthermore, causation is intimately intertwined with the notion of change. Together these two notions stood at the cradle of such momentous intellectual traditions as Western philosophy in Asia Minor, the Vedic hymns and the Upanishads in Central and South Asia, and early Buddhism along the borders of the ancient Ganges. They constituted the first and fundamental challenge to systematic thought, inspiring a variety of solutions still resonating in intellectual debates.

People use causal idiom in everyday life with great ease, yet upon closer scrutiny this family of notions seems to defy analysis and justification. The famous comment of Augustine of Hippo (354–430) regarding the question of time applies with equal force to the analogous question concerning causation: When nobody asks us, we know what it means; when queried, we don't.

Quite generally, a cause produces something called the effect; and the effect can be explained in terms of the cause. Usually the effect is taken to be a change in something already existing. Yet in traditional theology it has also been assumed that causes may give rise to new substances out of nothing. Thus in the Judeo-Christian tradition God is seen as the creator of the universe, which God created out of nothing. Similarly, theories of self-causation and creation by God were two of the major causal theories in the Vedic tradition. By contrast, early Buddhism rejected these two views, arguing that the idea of self-causation would imply the prior existence of the effect, while the idea of external causation would imply the production of a nonexistent effect out of nothing. Similarly, Thomas Aquinas (c. 1225–1274) rejected the theological notion of self-causation as philosophically untenable. God cannot possibly be regarded as causa sui, he argued, since either God existed to cause God, in which case God did not need to cause God; or else God did not yet exist, in which case God could not be anything to be able to cause God.

Aristotle's theory of causation

Aristotle (384–322 b.c.e.), too, regarded causes as producing changes in preexisting substances only. To be sure, when a moth emerges from a caterpillar the change is so striking that a new word is naturally used for the causal product. And yet the moth emerged from the preexisting caterpillar. By contrast, when a leaf turns red, it is still called a leaf, because the change is less striking. Aristotle called the former type of change generation, as opposed to the merely qualitative change, or in his terminology motion (kinesis), taking place in the latter kind of case. Yet the distinction is plainly a relative one, opposing rather than licensing the idea of new substances being producible out of nothing by a cause. Indeed, the conception of creation ex nihilo is foreign to the whole tradition of ancient Greek thought.

Commenting on Plato (428–347 b.c.e.) as well as on his pre-Socratic predecessors, Aristotle famously distinguished four types of causes or explanatory principles (the Greek word aitia is ambiguous between these two rather different meanings). A statue of Zeus, for example, is wrought by a sculptor (its efficient cause or causa efficiens, also known as the causa quod) out of marble (the material cause or causa materialis), which thereby takes on the shape the artist has in mind (the formal cause or causa formalis) in order that it may serve as an object of worship (the final cause or causa finalis, also known as the causa ut). Plato's forms, or formal causes (causa exemplaris), had been transcendent ideas in the mind of the Demiurge. By contrast, in Aristotle's theory of natural change the four causes together have an immanent teleological character. The form being developed is an integral part of the thing itself. Thus, the formal cause for an acorn developing into an oak tree is the seed's intrinsic character disposing it to become an oak tree rather than, say, a maple tree.

Naturally Aristotle's largely teleological theory of causation authorized the abundant use of final causes in explanations of natural phenomena. Thus his theory of motion espoused the principle that objects strive toward their locus naturalis, while medieval hydraulics, to give another example, promulgated the principle that nature abhors a vacuum (Natura abhorret vacuum).

Mechanicism and the demise of the teleological theory of causation

While medieval scholastic thought was still dominated by Aristotle's theory of causation, seventeenth-century science opposed its teleological underpinnings. Natural order and change, it claimed, could be produced by "blind" efficient causation alone, without the need of final or formal causes intervening in the process. Having created the matter of the universe together with the laws of mechanics, God could have left the world to itself in any disordered fashion, claimed René Descartes (1596–1650) in Le Monde, and yet in due course the universe would have taken on its current natural order of celestial motions and "terrestrial" physics mechanically, driven blindly by efficient causes alone and without "striving" to achieve any final perfections or divine purposes.

This conception of the causal "machinery" of the universe being limited to efficient causation presented a stimulating and exceedingly fruitful research program to modern science. In due course its validity was proclaimed to extend not just to mechanics proper, but also to physiology and chemistry, to biology (in the Darwinian program), to ethology, and even to the realm of human action in twentieth-century sociobiology and of human thought in late-twentieth-century cognitive science. And yet, from the very start, the program spawned riddles and grave philosophical difficulties. Chief among these was the difficulty involved in the widely held view that linked (efficient) causation to necessity. David Hume (1711–1776) notoriously pointed out that causal pairs are related neither by logical nor by empirical necessity. It is both logically and empirically possible for an effect to fail to follow a given cause. In fact Hume's influential argument had a theological background. French theological tradition, including notably Descartes and Nicolas Malebranche (1638–1715), had always been keen to stress the point that God's freedom is unfettered by any restrictions whatsoever. Hence given any cause, God is always free not to permit the effect to follow. Thus, causes alone, unaccompanied by the will of God, are never sufficient conditions for their effects. Nor, given God's omnipotence, can they be allowed to be necessary conditions for their effects. For God is free to bring about the effect by any other mediating cause or even by simply willing it.

Having failed to find an empirical basis for the idea of necessary connection in the case of singular causation, Hume turned his attention (without making a clear distinction) to the case of causation as it exists between classes of (similar) events. Analyzing this latter notion Hume advanced a regularity theory of causation. Eschewing powers and necessary connections, Hume thought causation could be adequately dealt with in terms of the "constant conjunction" of similar causes with similar effects. In addition, conditions of temporal priority and spatiotemporal contiguity were also required. This analysis, in Hume's view, had the distinct merit of being entirely empirical. Yet subsequent generations of (logical) empiricists have found, to their exasperation, that the empiricist ideals are not that easily fulfilled.

One difficulty is that of distinguishing between accidental and genuinely causal regularities. As the Scottish philosopher Thomas Reid (1710–1796) famously remarked in his Essays on the Active Powers of Man (1788), day is invariably followed by night and night by day and yet neither is the cause of the other. One tempting way to find a distinctive mark is to say that statements of causal regularities, unlike those that express merely accidental generalizations, are supported by corresponding counterfactuals. Thus, it is presumably true that a given piece of metal would have expanded had it been heated. By contrast, even if all the marbles in a given bag happen to be red, that fact alone doesn't add credence to the counterfactual that had the green marble in my hand been a marble in that bag, it would have been red. Yet, reliance on counterfactuals would involve a high price for empiricists to pay. For the truth conditions of counterfactuals are notoriously beyond the reach of empirical verification.

Another difficulty was raised by Bertrand Russell (1872–1970), who noted that in order for events to be causally connected they must be similar not just in arbitrary respects, but in relevant respects. For example, two matches may differ only in color or, alternatively, only in one being wet while the other is dry. Yet for the question of whether striking them will cause them to ignite only the latter dissimilarity counts while the former is entirely irrelevant. But how is one going to specify this notion of relevance? One is tempted to rely on an undefined notion of causal relevance. But that would critically trivialize Hume's analysis because it would utilize the notion of causation in the very attempt to analyze it.

Oriental theories of causation

For Hume, then, the idea of causation, insofar as it is mistakenly bound up with such unfounded notions as power or necessary connection, does not represent anything objective. The implied idea of necessity does not arise from anything in the external world. Rather it results from a mental response to the constant conjunction of causes and their effects. By comparison, in Indian philosophy the objectivity of causation has been subject to considerable shifts of opinion. The first to deny the principle of causation was the idealist school of the Upanishads. Insisting that reality and soul (atman) were permanent and eternal, they denied change and therefore causation. Like Hume, but for different "Parmenidean" reasons, these thinkers considered change and causation mental constructs, or purely subjective phenomena. Conversely, the consequent denial of atman or self among early Buddhist materialists led to fruitful speculation regarding causality and change. However, in their extreme aversion to the idealist metaphysics of the Upanishads, these materialists went on to deny all mental phenomena. This annihilationism is opposed to the earlier belief of Upanishad philosophers in eternalism. However, according to the "middle path" preached by the Buddha, both positions are errors stemming from two opposite extremes with regard to causation, which early Buddhism set out to steer clear of: on the one hand the belief in self-causation, resulting in a belief in eternalism; on the other the belief in external causation, fostering a belief in annihilationism. While early Buddhism, like Hume, rejected the belief in a mental substance or "self," it did not share his conclusion that "were all my perceptions remov'd by death […] I shou'd be entirely annihilated …" (Hume, p. 252). The reason for this is precisely that, unlike Hume, early Buddhism insisted on the objective validity of causal processes, which it referred to as constituting the "middle" between the two extremes of eternalism and annihilationism. Consequently it regarded such causal processes as sufficient for sustaining the continuity of a thing without positing a "self" or a "substance."

The importance of causality as an objective category in early Buddhism is brought out clearly by the fact that of the four noble truths discovered by the Buddha, the second and the third refer to the theory of causation. In the early Pali Nikayas and Chinese Agamas causation is not a category of relations among ideas but represents an objective ontological feature of the external world. Yet there has been much debate concerning the notion of avitathata, the second characteristic of the causal nexus in Buddhist philosophy. The Buddhist philosopher Buddhaghosa (late fourth and early fifth centuries c.e.) rendered this concept as "necessity," while others have championed a rather deflationary Humean interpretation of mere regularity and constant conjunction. From a more balanced perspective, what seems to be at stake in such discussions is to free causation from strict determinism. Thus a fourth characteristic of causation, idappaccayata or conditionality, is supposed to place causality midway between fatalism (niyativada), or unconditional necessity, and accidentalism (yadrcchavada), or unconditional arbitrariness. Clearly, the underlying concern here is the problem of moral responsibility, which Buddhist thinkers are anxious to uphold.

Volitional causation

Taking their cue from Hume that causality is not a physical connection inasmuch as one never observes any hidden power in any given cause, philosophers of an empiricist bent have insisted ever since on analyses of causality in terms of necessary and sufficient conditions for the applicability of the term. Thus, they focused on the logical and linguistic aspects of the notion of causality to the neglect of trying to find a physical connection between cause and effect. An example is John L. Mackie's (1917–1981) sophisticated regularity account in The Cement of the Universe (1974). Yet the contrary opinion has not been without its adherents. Even before Hume, John Locke (1632–1704), while discussing causality, appealed to the model of human volition. When one raises one's arm, he argued, one is directly aware of the power of one's volition to bring about the action.

This purposive perspective on causation has independent merits. For one thing it can make perfectly good sense of singular causation. For another, it avoids the vexing problem of so-called causal asymmetry. The fact that one ordinarily refuses to allow effects to precede their causes (Hume's condition of the temporal priority of causes) may on this view simply be seen as a natural consequence of the familiar experience that whatever actions one initiates cannot bring about the past. In fact, this volitional model of causation has been more influential than is generally acknowledged even among protagonists of the scientific revolution. Thus Isaac Newton (1642–1727) toyed with, and George Berkeley (1685–1753) championed, a theological construal of gravitation. Instead of invoking gravitational action-at-a-distance, a notion that Newton himself had deemed embarrassing enough to keep his theory locked up in a drawer for almost twenty years, it was, according to this view, God's own intervention that caused the sun ever so slightly to drift toward such large but immensely distant planets as Jupiter, in accordance with mathematical patterns and laws that Newton had the genius to unravel. Needless to say, such animistic astronomy fails to carry conviction at the present time. But it is good to realize, if only for expository purposes, that Berkeley's animistic world is not a world without causation. Rather it is a world where all causation is volitional. When this world is stripped of volitional causation, what remains is a "Hume world," a world truly without causation. If philosophers have found such a world equally unconvincing, they could then ask the critical question: What crucial ingredient is the Hume world lacking that our world supposedly possesses?

Recent debates: realist vs. pragmatist views on causation

Apparently there are at least two ways to go from there: One can follow either the realist or the Kantian-pragmatist way out. The opposition in question is neatly exemplified by two contemporary schools of thought on causation, one represented by Wesley Salmon (1925–2001), the other highlighted by the philosophy of Philip Kitcher (1947–). Salmon has argued that there does exist, after all, an empirically verifiable physical connection between cause and effect. It is to be found in the notion of a causal process, rather than in that of a causal interaction, which Hume mistakenly took as his paradigm. Furthermore, thanks to the theory of relativity that sets an upper limit to the transmission of causal signals, we can now empirically distinguish between genuinely causal processes (e.g., light rays traveling in straight lines from a rotating beacon to the surrounding wall of, say, the Colosseum) and mere pseudoprocesses (e.g., a spot of light "traveling" along the inner wall of the Colosseum as a result of a central beacon rotating at very high speed). While pseudoprocesses may travel at arbitrarily high velocities, they cannot transmit information as only causal processes can. Similarly, the actions of a cowboy on a cinema screen are pseudoprocesses. When, in excessive excitement, you shoot him, it has no lasting effect on the cowboy, but only on the screen. Thus, in Salmon's view, the capacity to transmit information (or rather, conserved energy) constitutes empirical proof that the relevant process is genuinely causal in nature rather than a mere pseudoprocess.

According to this realist view, therefore, causation is a robust physical ingredient within our world itself, entailing necessary and sufficient conditions (or causal laws, probabilistic or otherwise), rather than being entailed by these. Causation is essentially a "local" affair, depending on the intrinsic features of two causally related events. By contrast, causal laws and necessary and sufficient conditions are "global" features, depending on the world as a whole. Consequently, on this realist view, causality may be entirely compatible with indeterminism, while theories couched in terms of necessary and sufficient conditions run into grave difficulties when confronted with the pervasiveness of indeterminacy in the subatomic realm.

Yet Salmon's theory has not been without its detractors. Thus, having confronted the theory with ingenious counterexamples, Kitcher has argued that Salmon's theory, just like the empiricist theories before it, ultimately comes to rest on the truth of empirically unverifiable counterfactuals. By contrast, Kitcher's own theory places causality squarely within a Kantian-Peircian perspective. Immanuel Kant (1724–1804), while conceding to Hume that causality may be unobservable in the physical world, contradicted Hume's conclusion that therefore causality is not a real feature of the world as we know it. Indeed, causality may not be a feature positively discoverable in what Kant called the noumenal world, that is, the world as it exists in itself, without regard to the structural limitations of human knowledge. But then again, nothing is so discoverable or attributable. And yet causality is a property objectively ascribable to the phenomenal world, that is, the world as structured by the conceptual and perceptual features inherent in human cognitive capacities. As a result of the necessarily synthetic activities of human reason, one cannot conceive of the empirical world except in terms of causes and effects. The causal relation is therefore as firmly and objectively established as are space and time, which constitute the a priori forms of perception of the empirical world. These are all verifiable attributes of the physical world, which is part of the phenomenal world, the only kind of world humans are capable of knowing in principle.

Thus, the fundamental notion of causation receives a distinctly epistemological underpinning in Kantian philosophy. This is what ties Kitcher's philosophy of causation in part to the Kantian tradition. Thus, Kitcher has stated that the "because" of causation derives from the "because" of explanation. Rather than being an independent metaphysical notion, what may and may not be recognized as truly causal relations depends in the final analysis on epistemological constraints. In Kitcher's view the ultimate aim of science is to generate theories of the universe as unified and simple (or all-encompassing) as possible. Which theories are finally recognized as optimally unified and robust (in the ideal end of inquiry, to borrow the famous words of the pragmatist Charles Sanders Peirce, 1839–1914) thus determines what causes are recognized as genuinely operative and effective in the only world humans can possibly come to understand. Thus, in Kitcher's view, the metaphysical significance of causation ultimately derives from its key role in the best possible theory of the universe we will be able to generate. In a sense, therefore, causation, rather than being a metaphysically realist notion, is better seen as an unadulterated epistemological notion, dependent not on what we stumble upon in observation of singular cases of causation, as realists like Salmon would have it, but rather on the excellence of the theories that best account for the physical features of the world as a whole.

See also Downward Causation; Upward Causation