The likelihood function is important in nearly every part of statistical inference, but concern here is with just the likelihood principle, a very general and problematic concept of statistical evidence. [For discussion of other roles of the likelihood function, see ESTIMATION; HYPOTHESIS TESTING; SUFFICIENCY.]
The likelihood function is defined in terms of the probability law (or density function) assumed to represent a sampling or experimental situation: When the observation variables are fixed at the values actually observed, the resulting function of the unknown parameter(s) is the likelihood function. (More precisely, two such functions identical except for a constant factor are considered equivalent representations of the same likelihood function.)
The likelihood principle may be stated in two parts: (1) the likelihood function, determined by the sample observed in any given case, represents fully the evidence about parameter values available in those observations (this is the likelihood axiom); and (2) the evidence supporting one parameter value (or point) as against another is given by relative values of the likelihood function (likelihood ratios).
For example, suppose that a random sample of ten patients suffering from migraine are treated by an experimental drug and that four of them report relief. The sampling is binomial, and the investigator is interested in the unknown proportion, p, of the population of potential patients who would report relief. The likelihood function determined by the sample is a function of p,

L(p) = 210 p^4 (1 - p)^6,

whose graph is shown in Figure 1. This likelihood function has a maximum at p = .4 and becomes very small, approaching 0, as p approaches 0 or 1. Hence, according to the likelihood principle, values of p very near .4 are supported by the evidence in this sample, as against values of p very near 0 or 1, with very great strength, since the corresponding likelihood ratios (.4)^4(.6)^6/p^4(1 - p)^6 are very large.
Figure 1 - The likelihood function
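The likelihood function of the binomial example can be sketched numerically. This is a Python illustration, not part of the original article; the grid resolution and the comparison point p = .05 are arbitrary choices.

```python
from math import comb

# Likelihood for the binomial example: 4 of 10 patients report relief.
# L(p) = C(10, 4) * p^4 * (1 - p)^6
def likelihood(p):
    return comb(10, 4) * p**4 * (1 - p)**6

# A grid search locates the maximum at p = 4/10 = .4.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=likelihood)

# Likelihood ratio supporting p = .4 against a value near 0:
ratio = likelihood(0.4) / likelihood(0.05)
print(p_hat, ratio)
```

The ratio is large (about 260 here), which is the sense in which values near .4 are said to be strongly supported against values near 0.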
A different rule for sampling patients would be to treat and observe them one at a time until just four had reported relief. A possible outcome would be that just ten would be observed, with six reporting no relief and of course four reporting relief. The probability of that observed outcome is

84 p^4 (1 - p)^6.
This function of p differs from the previous one by only a constant factor and hence is considered to be an alternative, equivalent representation of the same likelihood function. The likelihood principle asserts that therefore the evidence about p in the two cases is the same, notwithstanding other differences in the two probability laws, which appear for other possible samples.
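The proportionality of the two probability laws at the observed outcome can be checked directly. A Python sketch (not in the original article); the grid of p values is arbitrary.

```python
from math import comb

def binom_prob(p):
    # Fixed-sample rule: probability of 4 reliefs in n = 10 patients,
    # C(10, 4) * p^4 * (1 - p)^6
    return comb(10, 4) * p**4 * (1 - p)**6

def negbinom_prob(p):
    # Inverse-sampling rule: probability that the 4th relief occurs on
    # trial 10, C(9, 3) * p^4 * (1 - p)^6
    return comb(9, 3) * p**4 * (1 - p)**6

# The ratio is the same constant, 210/84 = 2.5, for every p.
ratios = [binom_prob(p) / negbinom_prob(p) for p in (0.2, 0.4, 0.7)]
print(ratios)
```

Because the ratio does not involve p, the two laws determine the same likelihood function, which is the premise of the likelihood principle's conclusion here.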
Relation to other statistical theory. The likelihood principle is incompatible with the main body of modern statistical theory and practice, notably the Neyman-Pearson theory of hypothesis testing and of confidence intervals, and incompatible in general even with such well-known concepts as the standard error of an estimate and the significance level [see ESTIMATION; HYPOTHESIS TESTING].
To illustrate this incompatibility, observe that in the example two distinct sampling rules gave the same likelihood function, and hence the same evidence under the likelihood principle. On the other hand, different determinations of a lower 95 per cent confidence limit for p are required under the respective sampling rules, and the two confidence limits obtained are different. The likelihood principle, however, is given full formal justification and interpretation within Bayesian inference theories, and much interest in the principle stems from recently renewed interest and developments in such theories [see BAYESIAN INFERENCE].
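The same incompatibility shows up in significance levels, which depend on probabilities of unobserved outcomes and hence on the sampling rule. A Python sketch, not from the article; the null value p0 = .8 is a hypothetical choice for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

p0 = 0.8  # hypothetical null value, chosen only for illustration

# Binomial rule (n fixed at 10): evidence against p0 in the direction of
# small p is the tail probability P(X <= 4 | n = 10, p0).
pv_binom = sum(binom_pmf(k, 10, p0) for k in range(5))

# Inverse-sampling rule (observe until the 4th relief): small p means many
# trials, so the tail is P(N >= 10 | p0), i.e. at most 3 reliefs among the
# first 9 patients.
pv_nbinom = sum(binom_pmf(k, 9, p0) for k in range(4))

print(pv_binom, pv_nbinom)
```

The two tail probabilities differ (roughly .006 versus .003 here) even though the observed outcome, and so the likelihood function, is the same under both rules.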
Finally, on grounds independent of the crucial and controversial Bayesian concepts of prior or personal probability, interest in and support for the likelihood principle arise because most standard statistical theory fails to include (and implicitly excludes) any precise general concept of evidence in an observed sample, while several concepts of evidence that many statisticians consider appropriate have been found on analysis to entail the likelihood axiom. Some of these concepts have become part of a more or less coherent, widespread body of theory and practice in which the Neyman-Pearson approach is complemented by concepts of evidence often left implicit. Such concepts also appear as basic in some of Fisher's theories. When formulated as axioms and analyzed, these concepts have been discovered to be equivalent to the likelihood axiom and hence basically incompatible with, rather than possible complements to, the Neyman-Pearson theory.
General concepts of statistical evidence. The central one of these concepts is that of conditionality (or ancillarity), a concept that appeared first in rather special technical contexts. Another somewhat similar concept of evidence, which can be illustrated more simply here and which also entails the major part of the likelihood axiom, is the censoring axiom: Suppose that after interpretation of the outcome described with the second sampling rule of the example, it is discovered that the reserve supply of the experimental drug had been accidentally destroyed and is irreplaceable, so that no more than ten patients could have been treated with the supply on hand for the experiment. Is the interpretation of the outcome to be changed? In the hypothetical case that seven or more patients reported no relief before a fourth reported relief, the sampling plan could not have been carried through even to the necessary eleventh patient: The change of conditions makes unavailable ("censored") the information whether, if an outcome were to include more than six patients reporting no relief, that number would be seven, eight, or any specific larger number. But in fact the outcome actually observed was a physical event unaffected, except in a hypothetical sense, by the differences between the intended and realizable sampling plans. Many statisticians consider such a hypothetical distinction irrelevant to the evidence in an outcome. It follows readily from the general formulation of such a concept that the evidence in the observed outcome is characterized by just the function p^4(1 - p)^6. More generally, this censoring concept is seen to be the likelihood axiom, slightly weakened by disallowance of an arbitrary constant factor; the qualification is removable with adoption of another very weak 'sufficiency' concept concerning evidence (see Birnbaum 1961; 1962; Pratt 1961).
Interpreting evidence. The only method proposed for interpreting evidence just through likelihood functions, apart from Bayesian methods, is that stated as part (2) of the likelihood principle above. The brevity and informality of such statements and interpretations are typical of those given by their originator, R. A. Fisher (1925; 1956), and their leading proponent, G. A. Barnard (1947; 1949; 1962). 'Likelihood ratio' appears in such interpretations as a primitive term concerning statistical evidence, associated in each case with a nonnegative numerical value, with larger values representing qualitatively greater support for one parameter point against another, and with unity (representing 'no evidence') the only distinguished point on the scale. But likelihood ratio here is not subject to definition or interpretation in terms of other independently defined extramathematical concepts.
Only in the simplest case, where the parameter space has but two points (a case rare in practice but of real theoretical interest), are such interpretations of likelihood functions clearly plausible; and in this case they appear to many to be far superior to more standard methods, for example, significance tests. In such cases the likelihood function is represented by a single likelihood ratio.
In the principal case of larger parameter spaces, such interpretations can be seriously misleading with high probability and are considered unacceptable by most statisticians (see Stein 1962; Armitage 1963). Thus progress in clarifying the important problem of an adequate non-Bayesian concept of statistical evidence leaves the problem not only unresolved but in an anomalous state.
Another type of argument supporting the likelihood principle on non-Bayesian grounds is based upon axioms characterizing rational decision making in situations of uncertainty, rather than concepts of statistical evidence (see, for example, Cornfield 1966; Luce & Raiffa 1957).
Likelihoods in the form of normal densities. Attention is sometimes focused on cases where likelihood functions have the form of normal density functions. This form occurs in the very simple and familiar problem of inferences about the mean of a normal distribution with known variance (with ordinary sampling). Hence adoption of the likelihood axiom warrants and invites identification with this familiar problem of all other cases where such likelihood functions occur. In particular, the maximum likelihood estimator in any such case is thus related to the classical estimator of the normal mean (the sample mean), and the curvature of the likelihood function (or of its logarithm) at its maximum is thus related to the variance of that classical estimator. In a similar vein, transformations of parameters have been considered that tend to give likelihood functions a normal density shape (in problems with several parameters as well as those with only one). (See, for example, Anscombe 1964.)
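The connection between curvature and variance can be sketched on the running binomial example. A Python illustration (not in the article), using a numerical second derivative; the step size h is an arbitrary choice.

```python
from math import log

# Log-likelihood for the binomial example, constant factor dropped:
# log L(p) = 4 log p + 6 log(1 - p)
def loglik(p):
    return 4 * log(p) + 6 * log(1 - p)

p_hat, h = 0.4, 1e-5

# Numerical second derivative of the log-likelihood at its maximum.
curvature = (loglik(p_hat + h) - 2 * loglik(p_hat) + loglik(p_hat - h)) / h**2

# For a normal-shaped likelihood, -1/curvature plays the role of the
# variance of the classical estimator; here it matches
# p_hat * (1 - p_hat) / n = .4 * .6 / 10 = .024.
approx_var = -1 / curvature
print(curvature, approx_var)
```

The curvature is about -41.67, so the variance analogue is about .024, agreeing with the familiar binomial formula.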
Likelihoods in nonparametric problems. In nonparametric problems, there is no finite set of parameters that can be taken as the arguments of a likelihood function, and it may not be obvious that the likelihood axiom has meaning. However, "nonparametric" is a sometimes misleading name for a very broad mathematical model that includes all specific parametric families of laws among those allowed; hence in principle it is simple to imagine (although in practice formidably awkward to represent) the extremely inclusive parameter space containing a point representing each (absolutely) continuous law, and for each pair of such points a likelihood ratio (determined as usual from the observed sample).
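A likelihood ratio between two such "points" is just the ratio of the two joint densities at the observed sample. A Python sketch, not from the article; the two laws and the sample values are hypothetical choices for illustration.

```python
from math import exp, pi, sqrt

# Two fully specified continuous laws, standing in for two points of the
# enormous nonparametric parameter space (both chosen arbitrarily):
def std_normal_pdf(x):
    return exp(-x * x / 2) / sqrt(2 * pi)

def unit_exponential_pdf(x):
    return exp(-x) if x >= 0 else 0.0

sample = [0.3, 1.1, 0.7, 2.0]  # hypothetical observed sample

# Likelihood ratio of the normal law against the exponential law,
# determined as usual from the observed sample:
lr = 1.0
for x in sample:
    lr *= std_normal_pdf(x) / unit_exponential_pdf(x)
print(lr)
```

A ratio below 1 would support the exponential point against the normal one for this sample; the interpretation is exactly that of part (2) of the likelihood principle.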
[Other relevant material may be found in DISTRIBUTIONS, STATISTICAL, articles on SPECIAL CONTINUOUS DISTRIBUTIONS and SPECIAL DISCRETE DISTRIBUTIONS.]
Anscombe, F. J. 1964 Normal Likelihood Function. Annals of the Institute of Statistical Mathematics 26:1–19.
Armitage, Peter 1963 Sequential Medical Trials: Some Comments on F. J. Anscombe’s Paper. Journal of the American Statistical Association 58:384–387.
Barnard, G. A. 1947 [Review of] Sequential Analysis by Abraham Wald. Journal of the American Statistical Association 42:658–664.
Barnard, G. A. 1949 Statistical Inference. Journal of the Royal Statistical Society Series B 11:116–149.
Barnard, G. A.; Jenkins, G. M.; and Winsten, C. B. 1962 Likelihood Inference and Time Series. Journal of the Royal Statistical Society Series A 125:321–375. > Includes 20 pages of discussion.
Birnbaum, Allan 1961 On the Foundations of Statistical Inference: I. Binary Experiments. Annals of Mathematical Statistics 32:414–435.
Birnbaum, Allan 1962 On the Foundations of Statistical Inference. Journal of the American Statistical Association 57:269–326. > Includes 20 pages of discussion. See especially John W. Pratt’s comments on pages 314-315.
Cornfield, Jerome 1966 Sequential Trials, Sequential Analysis and the Likelihood Principle. American Statistician 20:18-23.
Cox, D. R. 1958 Some Problems Connected With Statistical Inference. Annals of Mathematical Statistics 29:357–372.
Fisher, R. A. (1925) 1950 Theory of Statistical Estimation. Pages 11.699a–11.725 in R. A. Fisher, Contributions to Mathematical Statistics. New York: Wiley. > First published in Volume 22, Part 5 of the Cambridge Philosophical Society, Proceedings.
Fisher, R. A. (1956) 1959 Statistical Methods and Scientific Inference. 2d ed., rev. New York: Hafner; London: Oliver & Boyd.
Luce, R. Duncan; and Raiffa, Howard 1957 Games and Decisions: Introduction and Critical Survey. A Study of the Behavioral Models Project, Bureau of Applied Social Research, Columbia University. New York: Wiley.
Pratt, John W. 1961 [Review of] Testing Statistical Hypotheses by E. L. Lehmann. Journal of the American Statistical Association 56:163–167.
Stein, Charles M. 1962 A Remark on the Likelihood Principle. Journal of the Royal Statistical Society Series A 125:565–573. > Includes five pages of comments by G. A. Barnard.
The method of maximum likelihood, originated by R. A. Fisher, estimates parameters in statistical models by maximizing the likelihood of observing the data with respect to the parameters of the model. The values taken by the parameters at the maximum are known as maximum likelihood estimates. This method is computationally equivalent to the method of least squares when the distribution of the observations about their theoretical means is the normal distribution.
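The computational equivalence with least squares can be sketched numerically. A Python illustration (not in the original text); the data values and the crude grid search are hypothetical choices made only to show that the two criteria select the same estimate.

```python
# Observations assumed normal about a common mean with known variance;
# the data below are hypothetical.
data = [2.1, 1.9, 2.4, 2.0, 2.6]

# Least squares: minimize sum((x - m)^2) over a grid of candidate means m.
grid = [i / 1000 for i in range(1000, 3001)]
ls_est = min(grid, key=lambda m: sum((x - m) ** 2 for x in data))

# Maximum likelihood with variance fixed at 1: the normal log-likelihood is
# log L(m) = -(n/2) log(2*pi) - sum((x - m)^2) / 2, so maximizing it is
# exactly minimizing the same sum of squares.
ml_est = max(grid, key=lambda m: -sum((x - m) ** 2 for x in data) / 2)

mean = sum(data) / len(data)
print(ls_est, ml_est, mean)
```

Both criteria pick out the sample mean (2.2 for these data), as the normal theory predicts.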