Philosophy of Statistical Mechanics



Probabilistic modes of description and explanation first entered into physics in the theory of statistical mechanics. Some aspects of the theory that are of interest to the general philosopher of science are the nature of probability and probabilistic explanations within the theory, the kind of intertheoretical relation displayed between this theory and the nonprobabilistic theory it supplants, and the role to be played in scientific explanations by the invocation of cosmological special initial conditions. In addition, this theory provides the framework for attempts to account for the intuitive sense that time is asymmetric by reference to asymmetric physical processes in time.

History of the Theory

It was in the seventeenth century that thinkers first realized that many material systems were describable by a small number of physical quantities related to one another by simple laws (for example, the ideal gas law, relating the volume, temperature, and pressure of a gas).

It was soon understood that a fundamental notion was that of equilibrium. Left alone, systems might spontaneously change the value of their parameters, as when a gas expands to fill a box. But they would soon reach an unchanging final state, that of equilibrium. And it was realized that this process was asymmetrical in time, in that systems went from earlier non-equilibrium states to later equilibrium states, but not from earlier equilibrium states to later states of non-equilibrium.

Studies of steam engines initiated by S. Carnot showed that stored heat could be converted to mechanical work, but only by a process that converted stored heat at a higher temperature to residual heat at a lower temperature. This result was made mathematically elegant by R. Clausius, who introduced into physics the notion of entropy as a measure of heat's ability to be converted into external work. That heat was a form of stored energy and that the total amount of energy in heat and work was conserved became a fundamental principle of physics, as did the idea that energy could spontaneously go only from a more ordered to a less ordered state. These results were formalized in the First and Second Laws of Thermodynamics. But why were these laws true?

The latter half of the nineteenth century and the beginning of the twentieth saw the development of an intensive debate about the place of thermodynamics within the more general sciences that dealt with dynamics and with the constitution of matter. P. Duhem, E. Mach, and others argued that the laws should be understood as autonomous principles. But others sought an account of heat as the hidden energy of motion of the microscopic constituents of matter. This was later understood for gases in terms of a simple model of molecules in free motion except for collisions among them. The early work on kinetic theory of J. Herapath and J. Waterston, followed by work of A. Krönig, made this a rich area for theoretical exploration. J. C. Maxwell and L. Boltzmann discovered laws governing the distribution of the velocities of the molecules in the equilibrium state, and they developed a law governing how such distributions change as a system in nonequilibrium approaches equilibrium, at least for the simple system of a nondense gas.

The theory of approach to equilibrium soon met with profound objections. J. Loschmidt pointed out that the apparently demonstrated time-asymmetrical approach to equilibrium was hard to understand in light of the fact that the laws governing the underlying dynamics of the molecules allowed for the time reverse of each possible process to be possible as well. Later H. Poincaré showed that the kind of systems being dealt with would, except possibly for exceptional initial conditions in a class of probability zero, return over infinite time infinitely often to states arbitrarily close to their initial states. Once again this seemed incompatible with the monotonic increase of entropy described by thermodynamics and apparently deduced from the dynamics in kinetic theory.

Both Maxwell and Boltzmann introduced probabilistic elements into their theory. The equilibrium distribution might be thought of as the most probable distribution of the molecules in space and in velocity. Alternatively, in an approach later systematically developed by J. W. Gibbs, equilibrium values might be calculated by computing the average of macroscopic features over all possible distributions of the molecules. Both Maxwell and Boltzmann argued that the approach to equilibrium should also be thought of probabilistically. Maxwell discussed the possibility of a "demon" who could, by inspecting molecules one by one, change an equilibrium state of a system to a nonequilibrium state without doing external work on the system. Critics such as S. Burbury and E. Culverwell noted that the introduction of probabilistic notions was not sufficient by itself to overcome the puzzles of reversibility and recurrence.

In his last view of the theory, Boltzmann, following his assistant Dr. Schuetz, offered a time-symmetrical version of the theory. On this view, isolated systems spend most of their life near equilibrium over very long periods of time. There would be occasional fluctuations away from equilibrium. A system found in a nonequilibrium state would probably be closer to equilibrium both in the past and in the future. Our local region of the universe, a universe that as a whole was itself in equilibrium, was one such fluctuation. Scientists could exist only in such a nonequilibrium region because only such a region could support sentient creatures. Why do we find our local world approaching equilibrium in the future and not in the past? Because the time direction of increase in entropy determined the future just as the local direction of gravitational force determined the down spatial direction.

In an important study of the foundations of the theory in 1910, P. and T. Ehrenfest (1959) surveyed the basis of the theory as understood in different ways by Maxwell, Boltzmann and Gibbs. They also offered an important interpretation of Boltzmann's equation describing approach to equilibrium in which the solution of the equation described not the inevitable or even probable behavior of an individual system but rather the sequence of states that would be found dominant at each time in a collection of systems all of whose members started in the same macroscopically nonequilibrium condition.

Probability and Statistical Explanation

Probability is characterized formally by simple mathematical postulates, the additivity of probabilities over disjoint sets of events being the most important of these. Philosophers have long debated the interpretation of probability. Some interpretations are subjectivist, taking probabilities to be measures of partial belief. Others are logical, holding probabilities to represent partial entailments. Other interpretations are objectivist. Some varieties of this last are frequency, limits of frequency, or dispositional interpretations.
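The postulates alluded to here can be written out in a few lines. What follows is a sketch in the usual Kolmogorov style; the symbols \Omega (the set of all outcomes) and A, B (events) are notation assumed for this illustration and do not appear in the text above.

```latex
% Notation assumed for illustration: \Omega is the outcome space, A and B are events.
\begin{align*}
  P(A) &\ge 0 \quad \text{for every event } A \subseteq \Omega, \\
  P(\Omega) &= 1, \\
  P(A \cup B) &= P(A) + P(B) \quad \text{whenever } A \cap B = \emptyset .
\end{align*}
```

The third line is the additivity over disjoint events singled out above. None of the postulates says what a probability is, which is why the interpretive debate remains open.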

At least one proposal (by E. Jaynes) has held that the probabilities in statistical mechanics are subjective, or rather of a logical sort, resting upon a principle of indifference. Most interpreters of statistical mechanics hold to objectivist interpretations of probability, but even among them there is much debate. Are the probabilities somehow dependent on the underlying dynamical laws, as ergodic approaches suggest? Or are they reflective of a deeper lawlike structure of tychistic chance, as Albert suggests, referring to Ghirardi-Rimini-Weber (GRW) stochastic theories introduced in the interpretation of quantum mechanics? Or is it the case, rather, that the probabilities have an autonomous place within the theories requiring their independent postulation?

Philosophers analyzing statistical explanations have usually focused on uses of probabilistic explanation in everyday circumstances or in the application of statistics to such fields as biology. Here some suggestions have been that high probability is explanatory, that increased probability is what matters, or that explanations are only genuinely probabilistic when pure tychistic chance is relevant.

In statistical mechanics explanation in the nonequilibrium theory has many aspects that fit familiar patterns of statistical explanation as analyzed by philosophers. Within the theory the main areas of controversy are over the nature and rationale for the particular kind of probabilistic explanation that does justice to the empirical facts. In the equilibrium theory a kind of transcendental use of probability in the statistical explanations offered by ergodic theory is quite unlike the usual kind of causal-probabilistic explanations familiar in other contexts.

The Theory of Equilibrium

Boltzmann and Maxwell developed a standard method for calculating the equilibrium values of the macroscopic parameters of a system. This became formalized by Gibbs as the method of the microcanonical ensemble. Here a probability distribution is placed over the microstates possible for the system, given its constraints. For each microstate the values of the macroscopic parameter are calculable. One takes as the observed equilibrium values the average value of these parameters calculated over all the possible microstates, using the stipulated probability distribution. But why does the method work? What rationalizes the choice of probability distribution and the identification of average values with equilibrium quantities?
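The averaging recipe just described can be made concrete with a deliberately tiny toy system. The sketch below is only an illustration under stated assumptions: the "system" is a row of sites of which a fixed number are excited (standing in for a fixed-energy constraint), every compatible microstate gets equal weight, and the macroscopic parameter is the number of excited sites in the left half. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def microcanonical_average(n_sites, n_excited, observable):
    """Average `observable` over all microstates, weighting each one equally.

    A microstate is the tuple of indices of the excited sites; the constraint
    (a fixed number of excited sites) plays the role of fixed total energy.
    """
    microstates = list(combinations(range(n_sites), n_excited))
    values = [observable(state) for state in microstates]
    average = sum(values) / len(values)
    histogram = Counter(values)  # how many microstates give each macroscopic value
    return average, histogram


def excited_in_left_half(n_sites):
    """Build the macroscopic parameter: number of excited sites in the left half."""
    half = n_sites // 2
    return lambda state: sum(1 for site in state if site < half)


average, histogram = microcanonical_average(12, 6, excited_in_left_half(12))
most_probable = max(histogram, key=histogram.get)
print(average, most_probable)  # prints "3.0 3"
```

In this toy case the ensemble average and the most probable value of the parameter coincide. The sketch shows only how the method is applied; it does nothing to answer the question of why the equal weighting is the right one.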

Boltzmann argued that the method could be partly justified if one thought of equilibrium values as average values over an infinite time as the system changes its microstates under dynamic evolution. Another component of this way of thinking is a claim that, given the large numbers of molecules in a system, average values would coincide with overwhelmingly most probable values for a macroscopic parameter. Boltzmann and Maxwell argued that one could identify such time averages with so-called phase averages, calculated using the posited probability distribution over the microscopic conditions possible for the system, if one thought of any one system as going through all possible microstates compatible with the macroscopic constraints on the system as time went on. This became formalized by the Ehrenfests in the form of the Ergodic Hypothesis.

Early versions of the Ergodic Hypothesis were provably false. Weaker versions, such as the claim that the microstate of the system would come arbitrarily close to every possible microstate over infinite time, were impossible to demonstrate and could not support the equality of time and phase averages even if true.

These early ideas gave rise to the mathematical discipline of ergodic theory. The results of J. von Neumann, and, in stronger form, those of G. Birkhoff, showed that for certain idealized dynamical systems, except for a set of initial conditions of zero probability in the standard probability distribution, the time average of quantities calculated from the microstate of the system over infinite time would, indeed, equal the phase average of that quantity calculated using the standard probability distribution over all possible microstates of the system.
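The equality of time and phase averages can be seen numerically in a very simple stand-in system. The sketch below assumes a rotation of the circle, x -> x + alpha (mod 1), which is ergodic for irrational alpha and not ergodic for rational alpha. It is a toy example chosen for transparency, not a model of a gas, and the function names and the observable are hypothetical.

```python
import math


def time_average(f, x0, alpha, n_steps):
    """Average f along the orbit x_t = (x0 + t * alpha) mod 1, t = 0..n_steps-1."""
    total = 0.0
    for t in range(n_steps):
        total += f((x0 + t * alpha) % 1.0)
    return total / n_steps


def phase_average(f, n_cells=100_000):
    """Midpoint-rule estimate of the integral of f over [0, 1), uniform measure."""
    return sum(f((i + 0.5) / n_cells) for i in range(n_cells)) / n_cells


def observable(x):
    """An arbitrary smooth phase function; its exact phase average is 1/2."""
    return math.cos(2.0 * math.pi * x) ** 2


GOLDEN = (math.sqrt(5.0) - 1.0) / 2.0  # irrational rotation angle: ergodic case

ergodic_avg = time_average(observable, 0.0, GOLDEN, 100_000)
periodic_avg = time_average(observable, 0.0, 0.5, 100_000)  # rational angle
space_avg = phase_average(observable)
print(ergodic_avg, periodic_avg, space_avg)
```

For the irrational angle the finite-time average settles close to the phase average of 1/2. For the rational angle 0.5 the orbit only ever visits two points, and the time average stays at 1, which is the kind of exceptional, non-ergodic behavior the theorems must exclude.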

But did any realistic models of a system meet the conditions needed for these theorems to hold? Many decades of work, culminating in that of Sinai, showed that a familiar model of a dilute gas, hard spheres in a box, was a model of an ergodic system. On the other hand, important work in theoretical dynamics showed that more realistic models of the gas would necessarily fail to be strictly ergodic (the KAM theorem). So any hope of applying ergodicity to rationalize the standard theory would require subtle reasoning involving the fact that the system was composed of vast numbers of molecules and might be, therefore, "ergodiclike."

From ergodicity many consequences follow. Except for a set of initial points of probability zero, infinite time averages of a phase quantity will equal the phase average of that quantity. For any measurable region of the phase space, the proportion of time spent by the system in that region over infinite time will equal the probabilistic size of that region. Most important is the following: Boltzmann realized that the standard probability distribution was invariant over time under the dynamics of the system. But could there be other such time-invariant distributions? If the system is ergodic, one can show that the standard distribution is the unique time-invariant distribution that assigns zero probability to the regions assigned zero probability by the standard distribution.

These results provide us with a kind of transcendental rationale for the standard equilibrium theory. Equilibrium is an unchanging state. So if we are to identify macroscopic features of it with quantities calculated by using a probability distribution over the microstates of the system, this probability distribution should be unchanging under the dynamics of the system. Ergodicity shows us, with a qualification, that only one such probability distribution, the standard one, will do the trick.

But as a full rationale for the theory, ergodicity must be looked at cautiously. Real systems are not genuinely ergodic. We need to simply swallow the claim that we may ignore sets of conditions of probability zero in the standard measure. And the kind of rationale we get seems to ignore totally the place of equilibrium as the end point of a dynamic evolution from nonequilibrium conditions.

The Theory of Nonequilibrium

Maxwell and Boltzmann found equations describing the approach to equilibrium of a dilute gas. Later a number of other such kinetic equations were found, although attempts at generalizations to such situations as dense gases have proved intractable.

But how can such equations, whose solutions are time asymmetric, possibly be correct if the underlying dynamics of the molecules are symmetrical in time? Careful analysis showed that the Boltzmann equation depended upon a time-asymmetrical assumption, the Stosszahlansatz. This posited that molecules had their motions uncorrelated with one another before, but not after, collisions. Other forms of the kinetic equations made similar assumptions in their derivation. Two general approaches to deriving such equations are that of the master equation and the approach that works by imposing a coarse graining of cells over the phase space available to the system and postulating fixed transition probabilities from cell to cell. But the time-asymmetrical assumption must be imposed at all times and might even be inconsistent with the underlying deterministic dynamics of the molecules.
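The tension between a time-asymmetric kinetic equation and reversible dynamics can be made vivid with the Kac ring, a standard toy model that is not discussed in the text above and is brought in here purely as an illustration. Balls colored white (+1) or black (-1) sit on the sites of a ring; at each step every ball moves one site clockwise and flips color if it crosses a marked edge. Assuming at every step that the colors are uncorrelated with the markers ahead (an analogue of the Stosszahlansatz) predicts monotonic decay of the color excess, while the exact dynamics is reversible and recurrent. All names in the sketch are hypothetical.

```python
import random


def make_ring(n_sites, n_markers, seed=0):
    """Return (colors, markers): all balls start white (+1); markers on random edges."""
    rng = random.Random(seed)
    marked = set(rng.sample(range(n_sites), n_markers))
    markers = [i in marked for i in range(n_sites)]
    return [1] * n_sites, markers


def step(colors, markers):
    """Move every ball one site clockwise; crossing a marked edge flips its color."""
    n = len(colors)
    new = [0] * n
    for i in range(n):
        new[(i + 1) % n] = -colors[i] if markers[i] else colors[i]
    return new


def reverse_step(colors, markers):
    """Exact inverse of `step`: move every ball one site counter-clockwise."""
    n = len(colors)
    return [-colors[(i + 1) % n] if markers[i] else colors[(i + 1) % n]
            for i in range(n)]


def excess(colors):
    """Macroscopic parameter: (white - black) / total."""
    return sum(colors) / len(colors)


def ansatz_prediction(initial_excess, marker_density, t):
    """Decay predicted if colors and markers are taken as uncorrelated at every step."""
    return initial_excess * (1.0 - 2.0 * marker_density) ** t


colors, markers = make_ring(n_sites=500, n_markers=25)
state = colors
history = [excess(state)]
for _ in range(2 * len(colors)):
    state = step(state, markers)
    history.append(excess(state))

print(history[1], ansatz_prediction(1.0, 25 / 500, 1))  # agree at the first step
print(history[-1])  # back to 1.0 after 2 * n_sites steps
```

The uncorrelatedness assumption gives the right first step and a plausible short-time decay, but it predicts that the excess tends to zero forever, whereas the exact dynamics returns the ring to its initial state after twice as many steps as there are sites. The assumption therefore cannot hold at all times, which is the point made above about its possible inconsistency with the underlying dynamics.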

Many attempts have been made to understand the kinetic equations and to resolve the paradoxes. Some of these explore how an initial probability distribution over a collection of systems can, in a "coarse-grained" sense, distribute itself over the increased phase volume available to a system. This way of looking at things was first described by Gibbs. The coarse-grained spreading of the probability distribution is taken to represent the approach to equilibrium of the system. This interpretation fits with the understanding of the solution curve of the Boltzmann equation outlined by the Ehrenfests.

To show that such spreading of the initial probability distribution occurs, one relies upon the underlying dynamics and generalizations of the results of ergodic theory. Systems can be characterized as randomizing in a variety of senses of increasing strength such as being a mixing system, a K-system, or a Bernoulli system. Then one can rely upon the model of the system (hard spheres in a box, for example) and the dynamics to show the system randomizing in the specified sense. This approach often relies upon many idealizations, such as calculating what happens in the infinite time limit. And the results often depend upon the use of unrealistic models of systems. For these reasons the applicability of the results to real systems and their real finite time behavior requires care.
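Coarse-grained spreading under a mixing dynamics can be illustrated with a small numerical experiment. The sketch below assumes the Arnold cat map on the unit square as a stand-in for a mixing system; it is invertible and area-preserving, but it is not a model of hard spheres in a box. An ensemble of points starts inside a single coarse-graining cell, and the entropy of the cell-occupation frequencies is tracked over time. The names and parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def cat_map(point):
    """Arnold cat map on the unit square: invertible and area-preserving."""
    x, y = point
    return ((2.0 * x + y) % 1.0, (x + y) % 1.0)


def cell_of(point, cells_per_side):
    """Index of the coarse-graining cell containing the point."""
    x, y = point
    last = cells_per_side - 1
    return (min(int(x * cells_per_side), last), min(int(y * cells_per_side), last))


def coarse_grained_entropy(points, cells_per_side):
    """Entropy -sum p ln p of the fractions of ensemble members in each cell."""
    counts = Counter(cell_of(p, cells_per_side) for p in points)
    n = len(points)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def spread(n_points=5000, cells_per_side=8, n_steps=10, seed=0):
    """Start an ensemble inside one cell and record the entropy after each step."""
    rng = random.Random(seed)
    width = 1.0 / cells_per_side
    points = [(rng.random() * width, rng.random() * width) for _ in range(n_points)]
    entropies = [coarse_grained_entropy(points, cells_per_side)]
    for _ in range(n_steps):
        points = [cat_map(p) for p in points]
        entropies.append(coarse_grained_entropy(points, cells_per_side))
    return entropies


entropies = spread()
print(entropies[0], entropies[-1], math.log(64))
```

The coarse-grained entropy starts at zero and ends close to its maximum, ln 64, even though the map preserves fine-grained phase volume exactly. Because the map is invertible, running it backward from the same initial cell spreads the ensemble just as well, so the spreading by itself carries no time asymmetry.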

Crucially these results, following as they do from the time-symmetrical dynamics, cannot by themselves introduce time asymmetry into the account. To do that one must make a time-asymmetrical assumption about how the initial probability distribution over the microstates of the system is constrained. This problem was studied by N. Krylov and others. Krylov's solution was a kind of nonquantum uncertainty principle applicable to the preparation of systems. Others look for the solution in cosmological facts, as we shall later note. Still others seek to modify the underlying dynamics by postulating some time-asymmetrical fundamental physical principle in play, such as the time-asymmetrical GRW stochastic field proposed in some interpretations of quantum mechanics.

There are ways of trying to understand an approach to equilibrium quite at odds with the mixing approach just described. O. Lanford, for example, has produced a "rigorous derivation of the Boltzmann equation." Going to an idealized limit, the Boltzmann-Grad limit, Lanford imposes an initial probability distribution, and then shows that with probability one systems will evolve for a short time as described by the Boltzmann equation. Because the results can be proved only for very short times (less than the mean free time to the first collision), their applicability to the real world is again in question. As usual, interesting issues about time asymmetry arise, here in the form of the choice of the initial probability distribution.


Irreversibility

Why is it that, although the underlying dynamic principles are symmetrical in time, the thermodynamic laws describe a world asymmetrical in time, a world in which entropy spontaneously increases in one time direction but not the other? Merely introducing probabilities into the account by itself will not provide the grounds for understanding the physical origins of irreversibility.

Throughout the history of thermodynamics and statistical mechanics, the suggestion has been repeatedly made that the source of thermodynamic time asymmetry lies in the existence of some time-asymmetrical law governing the underlying dynamics. The recent invocation of time asymmetric GRW stochastic influences is the latest such proposal.

Sometimes it has been suggested that the entropic increase experienced by an "isolated" system is to be accounted for in terms of the fact that systems can never really be fully causally isolated from their external environment. Even the most carefully insulated system, for example, has its molecules' motion influenced by gravitational forces exerted by matter outside the system. Whether the fact that isolation is an idealization is really relevant to thermodynamic time asymmetry has been much debated. Of great importance to this debate is the existence of systems that seem to show the usual macroscopic entropic increase familiar from thermodynamics, but which are sufficiently isolated from their surrounding environments that a simple external trigger can have their microstates follow a reverse course, with the system recurring to its original nonequilibrium state (spin-echo experiments, for example). For these systems seem to show that a kind of entropic increase cannot be accounted for in terms of external interference with the system.

As noted above, it was Boltzmann's assistant, Dr. Schuetz, who first suggested a cosmological solution to the problem. Schuetz suggested that the universe as a whole is in a time-symmetrical equilibrium state, with our local portion of the cosmos in a rare fluctuation away from equilibrium. Such a region would be very likely, from a time-symmetrical probabilistic perspective, to evince higher entropy in one time direction but lower entropy in the other direction of time, since it is unlikely to be at the turning point of maximal deviation from equilibrium. Boltzmann then supplemented this with his assertion that the very meaning of the future is that it is the time direction in which entropy is increasing.

Current cosmological theories describe a very different sort of universe, one that, to the best of our knowledge, is in an overall nonequilibrium state and that has entropic increase in the same time direction in all its regions. In current Big Bang cosmology the universe is said to be spatially expanding from a singularity roughly fourteen billion years ago. Some theorists take the thermodynamic time asymmetry to have its roots in the cosmic expansion. The more general, though not universal, opinion is that this cannot be correct, since entropy would continue to increase even if the universe began to contract.

In the dominant opinion, rather, the source of entropic increase is found in a special physical condition of the universe just after the Big Bang. In these accounts the matter of the universe is taken to be, at that early date, in thermal equilibrium. But matter is thought to be smoothly distributed in space. This is a very low entropy state because of the fact that gravity, unlike intermolecular forces in a gas, is a purely attractive force. The theory goes on to propose a clumping of matter into dense galactic clusters, galaxies, and stars, leaving most of space almost devoid of matter. This results in an enormous increase in spatial-gravitational entropy. Matter so clumped goes into a lower entropy state than its original equilibrium, since it now consists of hot stars in cold interstellar space. The general increase of entropy from the Big Bang onward is then accounted for by positing both the usual time-symmetrical probability assumptions and initial low entropy for the universe as a whole.

One question that then arises is why the initial state should be one of such low entropy. Here one is up against the usual perplexities that arise if we ask for an answer to a why question about "the initial state of everything." Why is such a low-probability state the one we find? Should one posit many universes, of which our low-probability case is a rare example? Here one is reminded of the speculation of Schuetz about our region of the universe just being an improbable sample from the whole. Can one explain why we find ourselves in such a universe by some version of the anthropic principle, first used by Boltzmann to explain why we find ourselves in a low-entropy region of his speculated high-entropy universe? Can one attribute probabilities to initial singular states or to universes at all? Here one thinks of the criticism offered by D. Hume of the teleological argument for the existence of God.

The second law of thermodynamics is not concerned, of course, with the entropy change of the entire cosmos, but rather with the parallel-in-time entropic increases of small systems temporarily causally isolated from their external environments. The study of the connection between cosmic entropy increase and that of the "branch systems" was initiated by H. Reichenbach. Many of the arguments in the literature claiming to derive changes of entropy of branch systems that are parallel in time to the entropy increase of the cosmic whole are badly flawed, but a reasonable inference can likely be constructed using probabilistic posits that themselves do not smuggle time asymmetry into the derivation.

Thermodynamics and Statistical Mechanics

We often speak of an older theory being reduced to a newer theory, and it is often said that thermodynamics has been reduced to statistical mechanics. But, as we have learned in general, the relation of older theory to newer theory may be of some complexity and some subtlety.

Thermodynamics, traditionally, was not a theory framed in probabilistic terms. Its laws, especially the second law, could not be exactly true, as Maxwell noted, in the light of the new probabilistic account. Alternative ways of dealing with this problem are available. One way is to stick with traditional thermodynamics and offer an account of the relation between newer and older theory that is far from a simple derivation of the latter from the former. Another possibility is to use the new knowledge of the probabilistic aspects of thermal phenomena to construct a novel statistical thermodynamics that imports probabilistic elements directly into the older theory.

There must be a high degree of complexity in the relations between the concepts of the older theory (such as volume, pressure, temperature, and entropy) and those of the newer theory (such as concepts dealing with molecular constitution, the dynamics governing the molecules, and probabilistically framed concepts dealing either with the distribution of states of constituents of the individual system or with the distribution of microstates of systems in a collection of systems characterized by some macroscopic parameters).

Consider, for example, thermodynamic entropy. Associated with it are many distinct entropy concepts in statistical mechanics. Boltzmann entropy, for example, is defined as the fluctuating property of an individual system, defined in terms of the actual spatial and momentum distribution of the molecules of the system at a time. Gibbs's entropies, on the other hand, are defined in terms of some probability distribution imposed over some imagined ensemble of systems characterized by some specified constraints. To make matters even more complicated, there is Gibbs's fine-grained entropy, defined by the probability distribution alone and useful for describing the equilibrium states of systems, and Gibbs's coarse-grained entropy, whose definition requires a specification of some coarse-grained partition of the phase space as well as the probability distribution, and whose place is in characterizing the approach to equilibrium of nonequilibrium systems. Other notions of entropy, such as those defined in terms of topology rather than measure theory, exist as well.
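The three measure-theoretic entropies just distinguished are commonly written as follows. This is a sketch in standard textbook notation, which is assumed here rather than taken from the text: f is the one-particle distribution of a single system, \rho is the ensemble probability density on the phase space \Gamma, the C_i are the cells of a coarse-graining partition, and k is Boltzmann's constant. Additive constants and normalization conventions vary between presentations.

```latex
% Notation assumed for illustration: f, \rho, \Gamma, C_i, P_i are not defined in the text.
\begin{align*}
  S_B(t) &= -k \int f(\mathbf{x}, \mathbf{v}, t)\, \ln f(\mathbf{x}, \mathbf{v}, t)\; d^3x\, d^3v
      && \text{(Boltzmann: one system, actual distribution)} \\
  S_G[\rho] &= -k \int_\Gamma \rho(q, p)\, \ln \rho(q, p)\; dq\, dp
      && \text{(Gibbs, fine-grained: ensemble density alone)} \\
  S_{cg}[\rho] &= -k \sum_i P_i \ln P_i , \qquad P_i = \int_{C_i} \rho(q, p)\; dq\, dp
      && \text{(Gibbs, coarse-grained: density plus partition)}
\end{align*}
```

Written this way, the dependence of the coarse-grained entropy on the choice of the cells C_i is explicit. The fine-grained entropy has no such dependence, and under Hamiltonian evolution it stays constant, which is why it suits equilibrium but cannot track an approach to equilibrium.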

None of this complexity shows that one is wrong in thinking that in some appropriate sense, statistical mechanics explains the success of thermodynamics or that it might be plausible to speak of a reduction of thermodynamics to statistical mechanics. The complexity and subtlety of the relations between the two theories informs the philosopher of science of just how varied and complicated such reductive relations might be.

Philosophers outside the field of philosophy of physics might take some interest in the relationship that thermodynamics bears to the underlying physical description of the systems to which thermodynamic concepts are applied. A material object composed of atoms or molecules, for example, can exist in equilibrium with a system of electromagnetic radiation, leading physicists to speak of both such systems as having a common temperature. What this shows is that concepts such as entropy and temperature have a kind of functional role, with their meanings fixed by the role they play in a theory that is applicable to physical systems of many different kinds. This bears some analogy with the claim, so familiar in the philosophy of mind, that mental terms are functional and that mental states are multiply realizable in physical systems of varied natures.

The Direction of Time

The claim that our very notion of the asymmetry of time is rooted in entropic asymmetries of physical systems in time was first made by Boltzmann, as we have noted. The claim has often been repeated but remains controversial. Much needs to be done to provide a completely convincing case that our deepest intuitions about the difference between past and future are somehow grounded in entropic asymmetries.

A first question relates to what an entropic theory of the direction of time is claiming. It certainly cannot be that we find out which direction of time is the future by somehow checking up directly on the entropic behavior of systems around us, for that claim has little plausibility. So what does the claim come down to?

What intuitively distinguishes future from past? We think we have a direct insight into which of a pair of events is later than the other. We take it that we have asymmetric epistemological access into past and future, there being memories and records of the past and not of the future. We usually take it that causation goes from an earlier event as cause to a later event as effect. We are anxious about future events but not about past events, although we may regret the latter. We often think of the past as being over and done with and hence not subject to change, whereas the future is open to many possibilities. Some philosophers have argued that past events have determinate reality, whereas there is no such thing as a determinate being to the future.

The most plausible version of the entropic theory of the direction of time is best understood by looking at the analogy introduced by Boltzmann. What lies behind our intuitions that space is distinguished by an asymmetry because one direction is down and its opposite up? Surely it is the existence of gravitational force that fully accounts for the down-up distinction. It is gravity that explains why rocks fall down and, in our atmosphere, flames and helium balloons go up. Even the fact that we can tell, directly and without using our sensory awareness of the external world, which direction is down is explained in terms of the local direction of gravitational force. For it is the behavior of fluids in our semicircular canals that tells us which way is up, and the behavior of that fluid is entirely explained in terms of its gravitationally induced weight. In regions of the universe with no gravitational field, there is no distinction between the up and the down direction to be drawn.

The entropic theorist of the direction of time argues that the situation is exactly analogous to the case of down directionality and gravity. The claim is that we can account for all the intuitive differences by which we distinguish past from future by a scientific account at whose core are entropic asymmetries in the behavior of systems in time. If there were regions of the cosmos in which entropic changes were antiparallel to one another in time, the entropic theorist claims, the inhabitants of such regions would take opposite directions of time to be the future direction of time. And in regions of the cosmos in equilibrium, there would be no past-future distinction, although, of course, there would still be opposite directions in time.

There have been numerous proposals, starting with the seminal work of H. Reichenbach, to try to justify the claim that it is, indeed, entropic change that lies at the heart of any explanation of why we have memories and records of the past and not of the future, of why we think of causation as going from past to future, of why we have differential concerns about past and future, and of why we think of the past as determinate but think of the future as an open realm of mere possibilities. Despite much important work on this problem, however, the very possibility of constructing such entropic accounts remains controversial.

See also Causal Approaches to the Direction of Time; Counterfactuals; Physics and the Direction of Time.


Bibliography

Albert, D. Time and Chance. Cambridge, MA: Harvard University Press, 2000.

Brush, S. The Kind of Motion That We Call Heat. Amsterdam: North-Holland, 1976.

Brush, S., ed. Kinetic Theory. Oxford: Pergamon Press, 1965.

Ehrenfest, P. and T. The Conceptual Foundations of the Statistical Approach in Mechanics. Ithaca, NY: Cornell University Press, 1959.

Guttman, Y. The Concept of Probability in Statistical Physics. Cambridge, U.K.: Cambridge University Press, 1999.

Price, H. Time's Arrow and the Archimedean Point. Oxford: Oxford University Press, 1996.

Reichenbach, H. The Direction of Time. Berkeley: University of California Press, 1956.

Sklar, L. Physics and Chance: Philosophical Issues in the Foundations of Statistical Mechanics. Cambridge, U.K.: Cambridge University Press, 1993.

Lawrence Sklar (2005)

