# Modern Probability As Part of Mathematics

## Overview

Probability theory developed into a branch of abstract mathematics during the first 30 years of the twentieth century. Until the late nineteenth century probabilities were treated mostly in context, be it as the probability of testimony or arguments, of survival or death, of making errors in measurement, or in statistical mechanics. This is the era of classical probability. It was rife with paradoxes and had a low mathematical status. In the early twentieth century various efforts were made to develop a probability theory that was independent of applications and possessed a provably consistent structure. The theory that found near universal acceptance tied probability theory to measure theory. The Russian mathematician Andrei Kolmogorov (1903-1987) gave in 1933 a definitive axiomatic formulation of measure theoretic probability. Probability is now defined as a measure over an algebra of subsets of an abstract space, the space of elementary events.

## Background

Probability theory as mathematics goes back to the 1650s when Blaise Pascal (1623-1662) solved a problem about the fair division of an interrupted game of chance. At that time the expression "calculus of chances" was used. Soon afterwards a connection was made with the art of conjecture, an analysis of "probable arguments," in which the conclusion is not conclusively, but only partially, established by the premises. Towards the end of the eighteenth century the term "theory" was sometimes used, but "calculus of chances" or "calculus of probabilities" remained the dominant expression until the end of the nineteenth century.

When much of mathematics was made more abstract and rigorous in the course of the nineteenth century, the calculus of probabilities was left behind as a part of so-called mixed mathematics. Besides games of chance, various other applications were tried, like trustworthiness of witnesses, different forms of insurance, and the statistics of suicides, but the most prominent applications were in the theory of making errors in measurement and in statistical mechanics.

In the theory of errors probabilistic assumptions were made about the frequency distribution of errors to justify using, in the most simple case, the arithmetical mean as the best estimate. Probability also entered in the calculation of the probable error of an estimate.

Later in the nineteenth century probabilities were used in statistical mechanics. In this theory the temperature of a gas is identified with the mean kinetic energy of the gas molecules, which allows the application of classical mechanics. However, the laws of mechanics are temporally symmetric: for every mechanical process the reversed process is equally possible. Yet observation shows that two gases mix easily but do not spontaneously separate. Probabilities were introduced to reconcile the temporally symmetric treatment of gas phenomena within classical mechanics with the obvious time-directedness of mixing. Unmixing is not impossible, only highly improbable, and that is why it is never observed.

None of these applications implied that chance was anything real or that natural processes were indeterminate. In fact, due to the successes of science, determinism became more and more ingrained in the nineteenth century. Probabilities were only used because of ignorance of the true causes.

Ignorance was the hallmark of early, or classical, probability theory. With it came the principle of indifference, which says that two possibilities are equally probable if one is equally ignorant about them. Many inconsistencies arose in classical probability theory because of its reliance on the principle of indifference. Different formulations of the same problem led to different pairs of possibilities about which one would be equally ignorant, and thus the principle gave different results depending on how the problem had been formulated.
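A small sketch can make this dependence on formulation concrete. The scenario is a hypothetical illustration, not taken from the historical sources: a square is known only to have a side length between 0 and 1, and one asks for the probability that the side is at most 1/2. Being indifferent over side lengths and being indifferent over areas are two descriptions of the same ignorance, yet they give different answers. The function name `prob_side_at_most` and its parameters are assumptions made for this example.

```python
from fractions import Fraction


def prob_side_at_most(s, indifferent_over):
    """Probability that a square with side in [0, 1] has side <= s.

    indifferent_over="side": the side length is treated as uniform on [0, 1].
    indifferent_over="area": the area is treated as uniform on [0, 1];
    side <= s is the same event as area <= s*s.
    """
    if indifferent_over == "side":
        return s
    if indifferent_over == "area":
        return s * s
    raise ValueError("indifferent_over must be 'side' or 'area'")


half = Fraction(1, 2)
by_side = prob_side_at_most(half, "side")  # indifference over side lengths
by_area = prob_side_at_most(half, "area")  # indifference over areas

print(by_side, by_area)  # the two formulations disagree
```

Both calls describe the same event, but the first formulation assigns it probability 1/2 and the second 1/4.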

Another hallmark of classical probability theory was its concern with a finite number of alternative possibilities. Even the classical limit theorems preserve the finite nature of classical probability. Bernoulli's theorem, for instance, says that in coin tossing the relative frequency of heads approaches the probability of heads on a single toss, with probability going to 1, as the number of tosses grows indefinitely. It is formulated in terms of a limit of finitary probabilities rather than the probability of a limit. Later, in the twentieth century, after mathematicians had learned how to treat the probabilities of limits, results of this type concerning limits of finitary probabilities were called weak laws of large numbers.
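The difference between a limit of probabilities and the probability of a limit can be written out in modern notation. The symbols are assumptions introduced here for illustration: S_n is the number of heads in the first n tosses, p is the probability of heads on a single toss, and ε is an arbitrary positive tolerance.

```latex
% Weak law (Bernoulli's theorem): a limit of finitary probabilities
\[
\lim_{n \to \infty} P\left( \left| \frac{S_n}{n} - p \right| > \varepsilon \right) = 0
\quad \text{for every } \varepsilon > 0
\]

% Strong law: the probability of a limit
\[
P\left( \lim_{n \to \infty} \frac{S_n}{n} = p \right) = 1
\]
```

Each probability in the first statement concerns only the first n tosses, whereas the event in the second statement involves the entire infinite sequence at once.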

It may come as a surprise that David Hilbert (1862-1943), in calling for an axiomatic treatment of probability theory during his famous lecture of 1900 listing important open problems in mathematics, still discussed probability theory as an applied field, under the heading "Mathematical treatment of the axioms of physics." Hilbert mentioned probability in particular in the context of averages in statistical mechanics.

For Hilbert, a mathematical or physical theory constitutes a system of ideas possessing a certain structure. As the theory matures, certain key ideas emerge, serving as foundational principles from which the remaining results can be derived. But these are provisional: as the theory develops, further results will require the reformulation of the axiomatic foundations. In a logical axiomatic treatment of any part of mathematics, the three prime considerations are internal consistency, mutual independence, and completeness. The study of these properties for a mathematical theory is called metamathematics.

Internal consistency means that the various axioms do not contradict each other, and forms, for Hilbert, a proof of the existence of the mathematical concepts that are said to be implicitly defined by the axioms. Independence of axioms shows that none of the axioms are redundant or could have been derived as theorems. Completeness of a system of axioms means that the system is sufficiently strong to derive all results of the field as theorems.

At the heart of Hilbert's philosophical outlook stood his belief in the fundamental unity and harmony of mathematical ideas. One purpose of axiomatics is to show how the particular field is part of the whole of mathematics. The axiomatization of probability that was accepted 30 years after Hilbert's lecture does exactly that.

After Hilbert's call for a rigorous axiomatic treatment of probability theory, a number of his students worked on this problem. But the approach to the foundations of probability theory that attracted the most attention in the early twentieth century came from the German applied mathematician Richard von Mises (1883-1953). He developed an empirical frequency theory of probability, taking up earlier ideas of Wilhelm Lexis (1837-1914) and Heinrich Bruns (1848-1919). For von Mises, "the theory of probability is a science of the same order as geometry or theoretical mechanics. (...) just as the subject matter of geometry is the study of space phenomena, so probability theory deals with mass phenomena and repetitive events." To give a mathematical treatment of probability, von Mises considered an idealized situation, an infinite sequence of trials. Probability, according to his frequency theory, applies to the outcomes of an infinite sequence of trials if, first, the ratio of successes/trials has a limit, and, second, this limit is the same for all blindly chosen infinite subsequences. Von Mises calls such a sequence a collective.

The first condition corresponds to the idea that a probability is a stable frequency, although the stability may express itself only in the limit. The second condition is a randomness condition. A probability sequence should be highly irregular—no gambler who follows a gambling strategy, like betting heads every fifth time, or every time five tails have appeared, should be able to increase his odds of winning.
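The randomness condition can be illustrated on finite sequences, with the caveat that von Mises's definition concerns infinite sequences and limits, so a finite simulation can only suggest the idea. The sketch below assumes a fair coin coded as 1 for heads and 0 for tails; the helper names (`frequency`, `every_kth`, `after_run_of_tails`) are hypothetical. It compares a pseudo-random sequence with a strictly alternating one, which has the right overall frequency but fails under a simple selection rule.

```python
import random


def frequency(bits):
    """Relative frequency of heads (1s) in a finite list of 0/1 outcomes."""
    return sum(bits) / len(bits)


def every_kth(bits, k):
    """Selection rule: keep every k-th outcome (positions k, 2k, 3k, ...)."""
    return bits[k - 1::k]


def after_run_of_tails(bits, run):
    """Selection rule: keep each outcome that directly follows
    at least `run` tails in a row."""
    selected, tails = [], 0
    for b in bits:
        if tails >= run:
            selected.append(b)
        tails = tails + 1 if b == 0 else 0
    return selected


rng = random.Random(0)  # fixed seed so the run is repeatable
tosses = [rng.randint(0, 1) for _ in range(100_000)]
alternating = [i % 2 for i in range(100_000)]  # tails, heads, tails, heads, ...

# Pseudo-random tosses: selected subsequences keep a frequency near 1/2.
print(frequency(tosses))
print(frequency(every_kth(tosses, 5)))
print(frequency(after_run_of_tails(tosses, 5)))

# Alternating sequence: overall frequency is 1/2, but betting on every
# second toss wins every time, so the sequence is not random.
print(frequency(alternating))
print(frequency(every_kth(alternating, 2)))
```

The alternating sequence satisfies the first condition (a stable frequency) but violates the second, which is exactly the kind of regularity a gambling strategy could exploit.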

Various objections were raised against von Mises's frequency theory. Only later was it shown that the concept of blindly chosen subsequence can be defined in a consistent and satisfactory manner. The objection that a sequence which approaches its limit from above can be random was harder to answer. Such a sequence is not a typical probability sequence, since one expects a running average to fluctuate around the limit, not to hover constantly above it.

A more consequential problem with von Mises's approach may have been that it was very tedious to develop the known mathematics of probability theory within his framework. A probabilistic process in which the probability of an outcome is dependent on an earlier outcome has to be modeled, within von Mises's frequency theory, as a combination of dependent collectives. The measure theoretic approach of Andrei Kolmogorov would give a more elegant treatment of such dependencies.

The measure theoretic approach to probability derives from the measure theoretic study of asymptotic properties of sequences of natural numbers. Originally astronomers had studied these sequences in their efforts to prove that our solar system is a stable system in which planets could not suddenly run off into the depth of space. Around 1900 mathematicians started asking such questions as: How many rational numbers are there relative to all the real numbers? Or, put probabilistically, if one picks a real number at random what is the probability that it is rational? Or, formulated in measure theoretic terms, what is the measure of the set of rational numbers between 0 and 1, if the set of real numbers between 0 and 1 has measure 1?
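The answer to the last question is 0, and the standard argument is short. The following sketch uses notation introduced here as an assumption: q_1, q_2, ... is an enumeration of the rationals in [0, 1], I_n is an interval containing q_n, μ is Lebesgue measure, and ε is an arbitrary positive number.

```latex
\[
\mathbb{Q} \cap [0,1] = \{ q_1, q_2, q_3, \dots \}, \qquad
q_n \in I_n, \quad |I_n| = \frac{\varepsilon}{2^n}
\]
\[
\mu\bigl( \mathbb{Q} \cap [0,1] \bigr)
\le \sum_{n=1}^{\infty} |I_n|
= \sum_{n=1}^{\infty} \frac{\varepsilon}{2^n}
= \varepsilon
\]
```

Since ε can be chosen as small as one likes, the measure of the set of rationals must be 0, even though the set is infinite.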

In 1933 the Russian mathematician Andrei Kolmogorov published a book in German titled Grundbegriffe der Wahrscheinlichkeitsrechnung ("Basic Concepts of the Calculus of Probability"). This influential monograph transformed the character of the calculus of probabilities, moving it into mathematics from its previous state as a collection of calculations inspired by practical problems. Whereas von Mises's frequency theory had focused on the properties of a typical sequence obtained in sampling a sequence of independent trials, Kolmogorov axiomatized the structure of the underlying probabilistic process itself, and independence of successive trials is only a special condition on the probabilistic structure.

Kolmogorov's axiomatics starts with a basic set Ω, the space of elementary events. Events are identified with subsets of this space. In tossing a die, the space of elementary events consists of six elements, {1, 2, 3, 4, 5, 6}, corresponding to the number of eyes one can obtain. Getting an even number of eyes is then the subset {2, 4, 6}. Typically, however, the space of elementary events will be a product space, corresponding to various combinations of outcomes, as when a die is tossed repeatedly. Kolmogorov requires that the collection of all events, F, is an algebra. This means that if some subsets are events their union is also an event, and the complement of an event is also an event. Moreover, the set of all elementary events is an event. A probability is defined as a measure function on the algebra of events: each event should have a measure or probability, a number between 0 and 1. The largest event, the set of all elementary events, should have probability 1. The probability of the union of a finite or countable number of disjoint events should be the sum of the probabilities of these events.
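For the single toss of a die, these requirements can be checked directly. The sketch below is an illustration under two assumptions that are not in the text above: the die is fair, so every elementary event gets probability 1/6, and the algebra F is taken to be the collection of all subsets of Ω. The names `omega`, `events`, and `prob` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Space of elementary events for one toss of a die.
omega = frozenset({1, 2, 3, 4, 5, 6})


def powerset(s):
    """All subsets of s, each as a frozenset."""
    items = sorted(s)
    return [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]


# The algebra of events F: here simply every subset of omega.
events = set(powerset(omega))


def prob(event):
    """Probability measure for a fair die (an assumption of this sketch)."""
    return Fraction(len(event), len(omega))


even = frozenset({2, 4, 6})
low = frozenset({1})

# F is closed under union and complement, and contains omega itself.
assert (even | low) in events
assert (omega - even) in events
assert omega in events

# The measure is normalized and additive on disjoint events.
assert prob(omega) == 1
assert even.isdisjoint(low)
assert prob(even | low) == prob(even) + prob(low)

print(prob(even))  # probability of an even number of eyes
```

For a finite space like this one, countable additivity reduces to the finite additivity checked above. The distinction only matters for infinite spaces.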

A consequence of this approach is that an event can have probability 0 without being impossible. In the measure theoretic approach the link between probability 0 and impossibility is broken, just as the set of rational numbers within the real interval [0, 1] has measure zero but is obviously not empty. This led to a problem for Kolmogorov: how to define conditional probability for those cases where the conditioning event has probability 0. In classical probability theory conditional probability is defined as a ratio: P(A given B) = P(A and B)/P(B). But if the probability of the event B is 0, this ratio is undefined. In a startling innovation, Kolmogorov was able to define conditional probabilities as random variables, and to prove that all the defining characteristics of probability could be satisfied.
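The contrast between the two definitions can be sketched in modern notation, which is not necessarily the notation of Kolmogorov's monograph. Here the conditioning information is assumed to be given by a sub-collection 𝒢 of events (a sub-σ-algebra), and P(A | 𝒢) denotes a random variable that is measurable with respect to 𝒢.

```latex
% Classical ratio definition, available only when P(B) > 0
\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0
\]

% Measure theoretic definition: a random variable characterized by
\[
\int_{G} P(A \mid \mathcal{G}) \, dP = P(A \cap G)
\quad \text{for every } G \in \mathcal{G}
\]
```

The second condition never divides by the probability of a conditioning event, so it remains meaningful when individual conditioning events have probability 0. The price is that the random variable is determined only up to a set of probability 0.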

## Impact

Kolmogorov's measure theoretic axiomatization of probability theory opened up many new avenues of research, and it also allowed earlier work to be expressed more precisely. In 1906 Andrei Markov (1856-1922) had introduced what are now called Markov chains with discrete time: sequences of trials, generally on the same event space, in which the probability of an outcome depends solely on the outcome of the previous trial. Kolmogorov's work made it possible to define the Markov property precisely. Problems from physics motivated the generalization to stochastic processes with continuous time. The general theory of stochastic processes became the central object of study of modern probability theory.
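A discrete-time Markov chain can be sketched in a few lines. Because the next outcome depends only on the current one, a table of one-step transition probabilities is enough to compute the distribution after any number of trials. The two-state chain and its numbers below are purely illustrative assumptions, as are the names `TRANSITION`, `step`, and `after`.

```python
from fractions import Fraction

# Hypothetical two-state chain; the probabilities are illustrative only.
# TRANSITION[i][j] is the probability of moving from state i to state j
# in one trial. Each row sums to 1.
TRANSITION = [
    [Fraction(9, 10), Fraction(1, 10)],
    [Fraction(1, 2), Fraction(1, 2)],
]


def step(distribution):
    """Advance a probability distribution over the states by one trial."""
    n = len(TRANSITION)
    return [
        sum(distribution[i] * TRANSITION[i][j] for i in range(n))
        for j in range(n)
    ]


def after(distribution, trials):
    """Distribution over the states after the given number of trials."""
    for _ in range(trials):
        distribution = step(distribution)
    return distribution


start = [Fraction(1), Fraction(0)]  # the chain starts in state 0 for certain
two_steps = after(start, 2)
print(two_steps)  # exact probabilities of states 0 and 1 after two trials
```

Exact fractions are used instead of floating point so that the probabilities always sum to exactly 1.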

ZENO G. SWIJTINK