
# Bayes’ Theorem

With one posthumous publication on probability, Reverend Thomas Bayes inspired the development of a new approach to statistical inference known as Bayesian Inference. “An Essay Toward Solving a Problem in the Doctrine of Chances” was published in 1764, but its impact was not felt until nearly two hundred years after his death, when in the 1950s Bayesian statistics began to flourish. His work remains at the center of one of the main intellectual controversies of our time.

Bayes was the first to solve the problem of inverse probability. In its simplest form, given two events *A* and *B* with nonzero probability, the probability of *A* and *B* can be written as:

(1) Pr(*A* and *B*) = Pr(*A* | *B*) × Pr(*B*), or

(2) Pr(*A* and *B*) = Pr(*B* | *A*) × Pr(*A*)

Equating the right-hand sides of (1) and (2) and dividing through by Pr(*B*) yields:

(3) Pr(*A* | *B*) = Pr(*B* | *A*) × Pr(*A*)/Pr(*B*)

In words, given the conditional probability of *B* given *A*, Pr(*B* | *A*), one can obtain the reverse conditional probability of *A* given *B*, Pr(*A* | *B*). For example, given that *r* heads are observed in *n* coin flips, what is the probability of a head in a single coin flip? This allows one to work backwards from the outcome or effect to the probability of the cause. Viewed in this manner there is no controversy concerning Bayes’ theorem: it is a direct consequence of the laws of probability. However, viewing *A* as the parameters *θ* and *B* as the sample *D*, one obtains the following result from Bayes’ theorem:
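As a quick numerical illustration of equation (3), the following sketch checks the identity with a standard 52-card deck; the events and numbers are illustrative choices, not taken from the text.

```python
from fractions import Fraction

# Illustrative events: A = "card is a king", B = "card is a face card".
p_A = Fraction(4, 52)        # Pr(A): 4 kings in 52 cards
p_B = Fraction(12, 52)       # Pr(B): 12 face cards in 52 cards
p_B_given_A = Fraction(1)    # Pr(B | A): every king is a face card

# Equation (3): Pr(A | B) = Pr(B | A) * Pr(A) / Pr(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)           # 1/3: a third of the face cards are kings
```

Using `Fraction` keeps the arithmetic exact, so the result is the rational number 1/3 rather than a rounded decimal.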

(4) P(*θ* | *D*) = P(*D* | *θ*) × P(*θ*)/P(*D*)

where P(*θ* | *D*) = posterior distribution of the parameters given the information in the sample,

P(*D* | *θ*) = likelihood function summarizing the information in the sample,

P(*θ*) = prior distribution of the parameters before the data are observed,

and P(*D*) = normalizing constant ensuring a proper posterior distribution.

In words, (4) states that:

(5) posterior distribution ∝ likelihood × prior distribution,

where ∝ represents the relation “is proportional to.”

This relation is the foundation of Bayesian statistical inference, with the posterior distribution being the main component of statistical analysis. This provides a formal process of subjective learning from experience by showing how one can revise or update prior beliefs about parameters in the light of relevant sample evidence. The role of judgment or outside information in statistical modeling is made explicit in the Bayesian approach. The Bayesian approach views the parameters of the model as being random, and thus one can make meaningful probability statements about the parameters.
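Relation (5) can be made concrete with the coin-flip question posed earlier: observe *r* heads in *n* flips and update a uniform prior on *θ* = Pr(head) over a discrete grid. This is a minimal sketch; the grid size and the data (*r* = 7, *n* = 10) are illustrative assumptions, not values from the text.

```python
# Observe r heads in n flips; the data are an illustrative assumption.
n, r = 10, 7
grid = [i / 100 for i in range(101)]            # candidate values of theta
prior = [1.0 for _ in grid]                     # uniform prior: P(theta)
like = [t**r * (1 - t)**(n - r) for t in grid]  # binomial likelihood kernel
unnorm = [l * p for l, p in zip(like, prior)]   # likelihood x prior
norm = sum(unnorm)                              # plays the role of P(D)
posterior = [u / norm for u in unnorm]          # proper posterior on the grid

# The posterior concentrates near the observed frequency r/n = 0.7.
mode = grid[max(range(len(grid)), key=lambda i: posterior[i])]
print(mode)                                     # 0.7
```

The normalizing sum is exactly the role P(*D*) plays in equation (4): it rescales likelihood × prior so the posterior probabilities sum to one.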

There are two major issues of contention in the Bayesian approach. The first is the subjective, or reasonable degree of belief, view of probability, which differs from the classical view of probability as the limit of the relative frequency of an event occurring in infinite trials. The second issue is the necessity and choice of an accurate prior distribution incorporating known information. The controversy over views of probability is a philosophical one that has yet to be resolved. Bayesians have suggested a wide variety of possible approaches for obtaining the prior distribution. Because the Bayesian approach requires more thought and effort, the classical approach has a significant practical advantage: it is much easier to apply. The debate and interaction between these two contrasting approaches to statistical inference promises to lead to fruitful developments in statistical inference. Donald Gillies asks the interesting question, “Was Bayes a Bayesian?” and concludes, “Yes, he was a Bayesian, but a cautious and doubtful Bayesian” (1987, p. 328).

**SEE ALSO** *Bayesian Econometrics; Bayesian Statistics; Classical Statistical Analysis; Probability Theory; Statistics*

## BIBLIOGRAPHY

Bayes, Thomas. 1764. An Essay Towards Solving a Problem in the Doctrine of Chances. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, M.A. and F.R.S. *Philosophical Transactions of the Royal Society of London* 53: 370–418. Reprinted in *Biometrika* 45 (1958): 293–315, with a biographical note by G. A. Barnard.

Gillies, Donald A. 1987. Was Bayes a Bayesian? *Historia Mathematica* 14: 325–346.

Kennedy, Peter. 2003. The Bayesian Approach. In *A Guide to Econometrics,* 230–247. Cambridge, MA: MIT Press.

*William Veloce*

# Bayes' Theorem

Bayes' theorem deals with the role of new information in revising probability estimates. The theorem states that the probability of a hypothesis (the posterior probability) is a function of new evidence (the likelihood) and previous knowledge (the prior probability). The theorem is named after Thomas Bayes (1702–1761), a nonconformist minister who had an interest in mathematics. The basis of the theorem is contained in an essay published in the *Philosophical Transactions* of the Royal Society of London in 1763.

Bayes' theorem is a logical consequence of the product rule of probability, which states that the probability (P) of two events (A and B) both occurring—P(A,B)—is equal to the conditional probability of one event occurring given that the other has already occurred—P(A|B)—multiplied by the probability of the other event occurring—P(B). The derivation of the theorem is as follows:

P(A,B) = P(A|B)×P(B) = P(B|A)×P(A)

Thus: P(A|B) = P(B|A)×P(A)/P(B).

Bayes' theorem has been used frequently in the areas of diagnostic testing and in the determination of genetic predisposition. For example, suppose one wants to know the probability that a person with a particular genetic profile (B) will develop a particular tumor type (A)—that is, P(A|B). Previous knowledge leads to the assumption that the probability that any individual will develop the specific tumor—P(A)—is 0.1 and the probability that an individual has the particular genetic profile—P(B)—is 0.2. New evidence establishes that the probability that an individual with the tumor has the genetic profile of interest—P(B|A)—is 0.5.

Thus: P(A|B) = 0.5×0.1/0.2 = 0.25
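The arithmetic of the worked example can be checked in a few lines, using the probabilities given above:

```python
# Probabilities from the worked example in the text.
p_tumor = 0.1                  # P(A): prior probability of the tumor
p_profile = 0.2                # P(B): probability of the genetic profile
p_profile_given_tumor = 0.5    # P(B|A): probability of the profile given the tumor

# Bayes' theorem: P(A|B) = P(B|A) x P(A) / P(B)
p_tumor_given_profile = p_profile_given_tumor * p_tumor / p_profile
print(p_tumor_given_profile)   # 0.25
```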

The adoption of Bayes' theorem has led to the development of Bayesian methods for data analysis. Bayesian methods have been defined as "the explicit use of external evidence in the design, monitoring, analysis, interpretation and reporting" of studies (Spiegelhalter, 1999). The Bayesian approach to data analysis allows consideration of all possible sources of evidence in the determination of the posterior probability of an event. It is argued that this approach has more relevance to decision making than classical statistical inference, as it focuses on the transformation from initial knowledge to final opinion rather than on providing the "correct" inference.

In addition to its practical use in probability analysis, Bayes' theorem can be used as a normative model to assess how well people use empirical information to update the probability that a hypothesis is true.

George Wells

(see also: *Bayes, Thomas; Probability Model; Statistics for Public Health*)

## Bibliography

Spiegelhalter, D.; Myles, J.; Jones, D.; and Abrams, K. (1999). "An Introduction to Bayesian Methods in Health Technology Assessment." *British Medical Journal* 319:508–512.

—— (2000). "Bayesian Methods in Health Technology Assessment: A Review." *Health Technology Assessment* 4(38):1–130.

# Bayes's Theorem

**Bayes's theorem** A theorem used for calculating the conditional probability of an event, where the conditional probability Prob(*x*|*y*) is the probability of *x* given that *y* holds.

This is a method of probabilistic reasoning in which Prob(causes|symptoms) can be computed from knowledge of Prob(symptoms|causes); i.e., if we have statistical data on the occurrence of symptoms associated with a disease, we can find the probability that those symptoms correctly indicate the disease. A classic application of Bayes's theorem is found in the Prospector expert system, which successfully predicted the location of valuable mineral deposits.

The combinatorially large number of conditional probabilities that have to be computed by the method can be significantly reduced by using *Bayesian networks*, in which arcs between propositions encode causal influences and conditional-independence relations.
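The causes-from-symptoms reasoning described above can be sketched with a minimal two-node network (Disease → Symptom). All numbers here are illustrative assumptions, not data from Prospector or any real system.

```python
# A two-node network: Disease -> Symptom. Illustrative probabilities only.
p_disease = 0.01                    # Prob(disease): prior on the cause
p_symptom_given_disease = 0.9       # Prob(symptom | disease)
p_symptom_given_no_disease = 0.05   # Prob(symptom | no disease)

# Prob(symptom), by the law of total probability over the parent node:
p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_no_disease * (1 - p_disease))

# Bayes's theorem inverts the arc: Prob(cause | symptom).
p_disease_given_symptom = p_symptom_given_disease * p_disease / p_symptom
print(round(p_disease_given_symptom, 3))   # 0.154
```

Even with a highly informative symptom, the low prior keeps the posterior modest; in a larger network, the arcs' independence assumptions mean each such inversion needs only the node's local conditional probabilities.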