Fisher, R. A. (1890–1962)
Ronald Aylmer Fisher was a titan who bestrode two signature disciplines of twentieth-century science: population genetics (or the mathematical theory of evolution), of which he was a cofounder and principal architect, and mathematical statistics, in which he played a pivotal role. On the one hand, he led a revolution that replaced the Bayesian approach of inverse probability with one based solely on direct probabilities (i.e., probabilities of outcomes conditional on hypotheses). On the other hand, he unequivocally rejected the conception of statistics as decision making under uncertainty that his own work inspired. This rift in the new statistical orthodoxy has never healed. Thus, Fisher's conception of probability was at once frequentist and epistemic, his approach to statistics at once inferential and non-Bayesian, and the chief question his life's work poses is whether a consistent theory can be built along these lines.
After excelling in mathematics at the secondary level, Fisher won a scholarship to Cambridge University in 1909 and graduated in 1912 as a Wrangler (i.e., with honors) in the Mathematical Tripos, then spent another year at Cambridge studying statistical mechanics and quantum theory under the astronomer James Jeans. In the 1911 paper (unpublished at the time) "Mendelism and Biometry," he pointed the way to a synthesis of Mendelian genetics and Darwinian evolution.
Fisher received two important job offers in 1919: one as chief statistician under Karl Pearson at the Galton Laboratory of University College, London, and the other a temporary position at the Rothamsted (Agricultural) Experimental Station. Fisher was already on famously bad terms with Pearson, so he accepted the Rothamsted offer, which left him free to develop his own non-Bayesian approach to statistics without Pearson's supervision. Over the next fifteen years Fisher developed a world-renowned Department of Statistics at Rothamsted that became a training ground for many statisticians who disseminated his new methods far and wide. Fisher's "golden age of invention" at Rothamsted ended in 1933 when Karl Pearson retired and his department at University College was split into a Department of Statistics, with Egon S. Pearson (Karl's son) as head, and a Department of Eugenics, with Fisher as head. Relations between the two departments were never cordial. Further details may be found in the biography by Fisher's daughter, Joan Fisher Box (1978), who gives excellent sketches of his many and varied contributions as well as his side of the many protracted debates in which he engaged.
Fisher and the Bayesians
Although weaned on inverse probability at school (Fisher 1950, 27.248), Fisher came to regard Bayesian solutions as vitiated by the arbitrary and subjective character of prior distributions not squarely based on frequency data. Replying to criticism by Karl Pearson of a Bayesian solution he had proposed in his earliest published paper, he noted that the solution favored by Pearson "depends almost wholly upon the preconceived opinions of the computer and scarcely at all upon the actual data" (Fisher 1971–1974, 14.17). This led him to emphasize the need to "allow the data to speak for themselves," an injunction some of his followers carry to the extreme of deliberately ignoring, for example, all prior information bearing on the efficacy of a new medical treatment. To the Bayesians' palliative that whatever errors of estimation arise from use of an inappropriate prior will become negligible with accumulating data, he retorts that "it appears more natural to infer that it should be possible to draw valid conclusions from the data alone and without a priori assumptions." Then he adds, "we may question whether the whole difficulty has not arisen in an attempt to express in terms of the single concept of mathematical probability, a form of reasoning which requires for its exact statement different though equally well-defined concepts" (Fisher 1950, 24.287).
Of the alternative measures suitable for "supplying a natural order of preference" among competing estimates or hypotheses, Fisher recommended the likelihood function (LF) or the data distribution qua function of the unknown parameter(s) of one's model. Or, when the LF is undefined (i.e., when the probability of the observed outcome conditional on the alternative hypotheses cannot be computed from the model), significance tests are in order. Now the LF provides only relative probabilities and is nonadditive, but the logarithm of the LF is additive and this allows one to combine evidence from different (independent) sources. The value of the unknown parameter that maximizes the LF—the so-called maximum likelihood estimate (MLE)—when it exists and is unique, must then be the best supported value. Fisher's first task was to provide a rationale for this evidential use of the LF, which Pierre Simon de Laplace and Carl Gauss had drawn as a corollary of Bayesian conditioning, but that, from Fisher's perspective, "has no real connection with inverse probability" (Statistical Methods for Research Workers in Fisher 2003, p. 22).
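The additivity of the log-LF over independent samples can be checked numerically; the sketch below is purely illustrative (the Bernoulli model and sample values are arbitrary, not an example from Fisher):

```python
import math

def bernoulli_loglik(xs, p):
    """Log-likelihood of i.i.d. Bernoulli observations xs at parameter p."""
    return sum(math.log(p) if x == 1 else math.log(1.0 - p) for x in xs)

# Two independent samples and the pooled data
sample1 = [1, 0, 1, 1]
sample2 = [0, 0, 1]
p = 0.6

pooled = bernoulli_loglik(sample1 + sample2, p)
summed = bernoulli_loglik(sample1, p) + bernoulli_loglik(sample2, p)

# Independent evidence combines by adding log-likelihoods
assert abs(pooled - summed) < 1e-12
```

The same additivity fails for the LF itself, which multiplies rather than adds across independent sources.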
Theory of Estimation
The first thing that struck him is that, unlike the uniform prior Thomas Bayes and Laplace seemed to conjure out of ignorance, the MLE is invariant. That is, if a problem is reparametrized as ζ = g(θ), then the MLE of the new parameter is ζ̂ = g(θ̂), writing θ̂ (throughout) for the MLE of θ (De Groot 1986, p. 348). At the same time, he noted, unbiased estimators—those whose mean is equal to θ—are noninvariant, an unbiased estimator of θ being a biased estimator of θ² or θ⁻¹. His requirement of invariance is, in reality, a requirement of consistency, namely, that one's estimates and inferences not depend on which of several equivalent forms of a problem one adopts. This already brings Fisher closer to the position of his antagonist, Harold Jeffreys, or that of Jeffreys's worthy successor, Edwin T. Jaynes. Yet it never occurred to Fisher, as it did to Jeffreys and Jaynes, to use an invariant prior to represent, not pure ignorance, but a state of knowledge that is unaltered by a specifiable group of transformations. Knowing, for example, no more than that θ is a scale parameter, a suitable prior—the Jeffreys prior—would be one invariant under changes of scale.
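The noninvariance of unbiasedness is easy to see in a small simulation (an illustrative sketch, not Fisher's own example; the normal model and parameter values are arbitrary): the sample mean is unbiased for the population mean θ, yet its square is a biased estimator of θ².

```python
import random

random.seed(1)

# Normal model: xbar is unbiased for theta, but xbar**2 is a biased
# estimator of theta**2, since E[xbar^2] = theta^2 + sigma^2/n.
theta, sigma, n, trials = 2.0, 1.0, 10, 100_000

mean_sq = 0.0
for _ in range(trials):
    xbar = sum(random.gauss(theta, sigma) for _ in range(n)) / n
    mean_sq += xbar ** 2
mean_sq /= trials

bias = mean_sq - theta ** 2   # approaches sigma^2/n = 0.1, not 0
print(bias)
```

By contrast, the MLE transforms consistently: the MLE of θ² is simply the square of the MLE of θ.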
However, Fisher was not satisfied with this justification of MLEs, but insisted that "the reliance to be placed" on one "must depend on its frequency distribution" (Fisher 1950, 10.327). Thus, Gauss had shown that the arithmetic average (or sample mean) of a set of normally distributed errors of known variance, σ², is itself normally distributed about the population mean, μ, with variance σ²/n. Since a normal distribution is determined by its location parameter, μ, which locates the bell-shaped density curve along the x-axis, and its scale parameter, σ², which measures the spread, the variance presents itself as the uniquely suitable measure of the concentration of any estimator whose distribution is normal or asymptotically normal about the estimated parameter. What Fisher claimed to show in his seminal 1922 paper, "On the Mathematical Foundations of Theoretical Statistics" (Fisher 1971–1974, paper 18; Fisher 1950, paper 10), is that MLEs are the most concentrated. He dubbed such estimators of (asymptotically) smallest variance efficient.
One source of tension in Fisher is that his use of likelihood implies the irrelevance of outcomes that might have been but were not observed, and, at various places, he explicitly endorses this implication (Statistical Methods and Scientific Inference in Fisher 2003, pp. 71, 91; hereafter SMSI). For if, as he says, "the whole of the information supplied by a sample … is comprised in the likelihood" (p. 73), that is, in the LF of the outcome actually observed, then all other points of the sample space must be irrelevant. However, the sampling (or frequency) distribution of an estimator, T, depends on the whole sample space, and its use to compare estimators therefore violates this likelihood principle.
In the course of his investigation of the large-sample properties of MLEs, Fisher uncovered a class of statistics a knowledge of which renders all other statistics irrelevant for inferences about θ, and so he termed them sufficient for θ. In the classic 1922 paper, he showed that sufficient estimators are asymptotically efficient, thus linking a purely logico-informational requirement—that of utilizing all the information supplied by the data—with a performance characteristic—that of having maximal precision. In fact, he virtually equated the property of not wasting information with efficiency. Then he could describe the statistician's job succinctly in purely cognitive terms as that of effecting the maximum information-preserving reduction of the data (Fisher 1950, 26.366). Such a maximal reduction is called a minimal sufficient statistic and is mathematically a function of every other sufficient statistic. Philosophers will recognize sufficiency as a close relative of Rudolf Carnap's requirement of total evidence, and Fisher remarks that "our conclusions must be warranted by the whole of the data, since less than the whole may be to any degree misleading" (Fisher 1950, 26.54).
Fisher's claim that maximum likelihood estimation is "unequivocally superior" to all other methods (Fisher 1950, 24.287) would then be vindicated, at least for large samples, by showing that MLEs are sufficient (hence, asymptotically efficient). His proof of this in the 1922 paper was less than rigorous, as he candidly admitted (Fisher 1950, 10.323), and he offered improved versions in sequels to that paper. In the 1934 paper "On Two New Properties of Mathematical Likelihood" (Fisher 1950, paper 24), he presented a new criterion of sufficiency, namely, that the LF factors as
(1) p(x|θ) = g(T, θ)h(x)
which allows one to recognize a sufficient statistic at sight. This was of great importance because the property of utilizing all the information in one's data can be applied to estimators based on small samples. And Fisher's experimental work in genetics and agronomy (at Rothamsted) had impressed on him the great practical importance of statistical methods applicable to small samples, and, hence, of exact tests or estimates based on exact, as opposed to approximate, sampling distributions. In this he was also strongly influenced by W. S. Gossett's 1908 discovery of the exact distribution of the statistic

t = n^½(x̄ − μ)/s

where s² is the sample variance, which could then be used to test hypotheses about normal means using a small sample when the variance of the measurements is unknown. Thus, he came to view large sample theory, concerned with the never-never world of asymptotic behavior, as a mere preliminary to the study of small samples (SMSI, p. 163).
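For instance, the factorization criterion (1) is immediate for n independent Bernoulli trials, with T the number of successes (a standard textbook illustration, not Fisher's own example):

```latex
p(x \mid \theta) \;=\; \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i}
\;=\; \underbrace{\theta^{T} (1-\theta)^{\,n-T}}_{g(T,\,\theta)}
      \cdot \underbrace{1}_{h(x)},
\qquad T = \sum_{i=1}^{n} x_i
```

so T is sufficient for θ, as (1) requires.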
To facilitate the study of small samples, he introduced a quantitative measure of information. His leading idea was to measure the information an experiment with outcome variate X conveys about an unknown parameter θ by the precision (or inverse variance) of an MLE of θ. Earlier work of Karl Pearson and Francis Ysidro Edgeworth, the two leading figures of the British school of statisticians of the generation preceding Fisher's, had linked the precision of an estimator to the second derivative of the logarithm of the LF, ln p(x|θ), where x = (x₁, …, xₙ), which one denotes L(x|θ), or even L(θ). For example, to find the MLE of a binomial parameter, p, noting that the LF and its logarithm, L(p) = x ln p + (n − x)ln(1 − p) + constant, have the same maxima, one solves the likelihood equation,

L′(p) = x/p − (n − x)/(1 − p) = 0

whose solution is p̂ = x/n, the observed relative frequency of successes. Taking the second derivative, one finds:

L″(p) = −x/p² − (n − x)/(1 − p)²

whereupon replacing x by its mean, np, reduces this to −n/[p(1 − p)], whose negative reciprocal,

p(1 − p)/n

is the variance of p̂. "This formula," he declares, "supplies the most direct way known to me of finding the probable error of statistics," adding (with critical reference to Pearson) that "the above proof [not shown here] applies only to statistics obtained by the method of maximum likelihood" (Fisher 1950, 10.329).
Now one might hope to show that the Fisher information, defined by
(2) Iₙ(θ) = −E[L″(x|θ)]
imposes an upper limit on the precision of any estimator of θ for any given sample size n. To make a long, tangled story short, Edgeworth proved special cases of this using the Schwarz inequality, and Fisher extended his results (see Hald 1998, pp. 703–707, 716–719, 724–726, 734), offering a proof (again less than rigorous) that V(T) ≥ 1/Iₙ(θ). The first rigorous proofs came in the 1940s (Cramér 1946, p. 475; De Groot 1986, p. 425), and a general form of this so-called Cramér–Rao inequality reads:
(3) var(T) ≥ m′(θ)²/Iₙ(θ)
where m(θ) = E(T) = ∫T(x)p(x|θ)dx. One assumes that the density is defined on a nondegenerate interval that does not depend on θ and that T has (finite) moments up to second order. When m(θ) = θ, so that T is unbiased, (3) simplifies to var(T) ≥ 1/Iₙ(θ), as anticipated by Edgeworth and Fisher. Estimators that achieve this minimum variance bound are called MVB estimators, and this condition effectively replaces asymptotic efficiency since it applies to samples of all sizes. Cramér then proved (1946, pp. 499ff) that if an efficient (or MVB) estimator T of θ exists, then the likelihood equation has a unique solution given by T, and that if a sufficient estimator of θ exists, any solution of the likelihood equation will be a function of that estimator. These results round out Fisher's small-sample theory of estimation.
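That the binomial MLE attains the bound in (3) can be checked by simulation (an illustrative sketch; the model and parameter values are arbitrary). For n Bernoulli trials with success probability θ, the bound for an unbiased estimator is 1/Iₙ(θ) = θ(1 − θ)/n, which is exactly the variance of p̂ = x/n.

```python
import random

random.seed(7)

# Binomial model: the MLE p_hat = (number of successes)/n is unbiased,
# and its variance theta*(1 - theta)/n equals the bound 1/I_n(theta).
theta, n, trials = 0.3, 20, 100_000

estimates = []
for _ in range(trials):
    successes = sum(random.random() < theta for _ in range(n))
    estimates.append(successes / n)

mean = sum(estimates) / trials
var = sum((e - mean) ** 2 for e in estimates) / trials
bound = theta * (1.0 - theta) / n   # 1/I_n(theta) = 0.0105

print(var, bound)   # empirical variance of the MLE is close to the bound
```

So for this model the MLE is an MVB estimator at every sample size, not merely asymptotically.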
Fisher used his factorization criterion (1) for sufficient statistics to show that the distributions admitting a sufficient statistic are precisely those of the form:
(4) p(x|θ) = F(x)G(θ) exp[u(x)v(θ)]
provided that the range of X does not depend on θ, as it does for the uniform distribution on [0, θ] with θ unknown. Called the exponential class, (4) includes almost all the other distributions that figure prominently in applied probability and statistics, including the normal, Poisson, beta, gamma, and chi-squared distributions (and there is also a multiparameter form of (4)). Thus, the class (4) occupies a position of central importance, akin to that of the central limit theorem. Using a clever change of variable in the condition for equality in (3), Jaynes (2003, p. 519) shows that the exponential class is also the class of maxent distributions, those yielded by the principle of maximizing the (Shannon) entropy subject to one or more given mean value constraints. Thus, as Jaynes proclaims, "if we use the maximum entropy principle to assign sampling distributions, this automatically generates the distributions with the most desirable properties from the standpoint of … sampling theory (because the sampling variance of an estimator is then the minimum possible value)" (p. 520). Once again, the fruits of Fisher's own investigations drew him closer to the objectivist Bayesian position that he so vigorously opposed. Indeed, the maximum entropy formalism can be used to generate either data distributions or prior distributions and is supported by the kinds of consistency properties Fisher also endorsed. Mathematics makes strange bedfellows!
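For concreteness, the Poisson distribution exhibits form (4) directly (a standard identification, supplied here for illustration):

```latex
p(x \mid \theta) \;=\; \frac{\theta^{x} e^{-\theta}}{x!}
\;=\; \underbrace{\tfrac{1}{x!}}_{F(x)} \;
      \underbrace{e^{-\theta}}_{G(\theta)} \;
      \exp\bigl[\underbrace{x}_{u(x)} \,\underbrace{\ln\theta}_{v(\theta)}\bigr]
```

and the sufficient statistic for a sample is the sum of the observations.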
Fisher information, defined by (2) or, equivalently, by Iₙ(θ) = E[L′(x|θ)²] = var[L′(x|θ)], also plays a prominent role, as one would expect, in Fisher's theory of experimental design. Given multinomial data with category counts a₁, …, aₖ and category probabilities p₁(θ), …, pₖ(θ) that depend on a parameter θ, the Fisher information for a sample of one is:

I(θ) = Σᵢ pᵢ′(θ)²/pᵢ(θ)
Examples arise in genetics, especially linkage. For example, one may wish to compare the information about the linkage parameter θ (the recombination fraction) yielded by a double backcross, AB/ab × ab/ab, with that given by a single backcross. Under the former mating, the genotypes AB/ab, Ab/ab, aB/ab, ab/ab occur among the offspring with probabilities ½(1 − θ), ½θ, ½θ, and ½(1 − θ), and so

I(θ) = 1/[θ(1 − θ)]

while for the single backcross one similarly finds I(θ) = 1/[2θ(1 − θ)], or half the information yielded by the double backcross. Further refinements arise when there is dominance in one or both factors (see Edwards 1992, pp. 148–149). For more examples, see chapter 11 of The Design of Experiments (in Fisher 2003; hereafter DE) and Kenneth Mather's The Measurement of Linkage in Heredity (1938).
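The double-backcross computation can be verified numerically; the sketch below simply evaluates I(θ) = Σ pᵢ′(θ)²/pᵢ(θ) at the class probabilities quoted above (an illustrative check, not Fisher's own calculation):

```python
# Fisher information per offspring, I(theta) = sum_i p_i'(theta)**2 / p_i(theta),
# for the double-backcross class probabilities (1-theta)/2, theta/2, theta/2,
# (1-theta)/2 given in the text.

def info_double_backcross(theta):
    probs = [(1 - theta) / 2, theta / 2, theta / 2, (1 - theta) / 2]
    derivs = [-0.5, 0.5, 0.5, -0.5]   # d p_i / d theta for each class
    return sum(d * d / p for d, p in zip(derivs, probs))

theta = 0.1
closed_form = 1 / (theta * (1 - theta))   # = 1/[theta(1 - theta)]
assert abs(info_double_backcross(theta) - closed_form) < 1e-9
# A single backcross yields half as much: 1/[2 theta (1 - theta)]
```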
One comes, at last, to Fisher's second important measure for ordering hypotheses, namely, significance tests. The earliest significance tests were aimed at distinguishing a hypothesis of chance from one of cause or design (Hald 1998, §4.1). For example, is the perfect agreement of the wrong answers of two students on a multiple-choice test due to collusion or a mere coincidence? In the usage of Laplace, one compares the probability of such agreement on the two hypotheses, and when this probability is "incomparably greater" on the hypothesis of design, "we are led," he says, "to disbelieve" that of chance. Laplace readily extended this reasoning to the separation of "real" from "spurious" physical causes, as when he concluded that "the actual disposition of our planetary system," by which he meant that all six planets and their satellites move in the same direction as the earth and have inclinations to the ecliptic within a small neighborhood of zero, "would be infinitely small if it were due to chance" and so indicates a "regular cause" (§4.4). In the same vein, Gustav Kirchhoff concluded that the perfect coincidence of the sixty dark lines in the solar spectrum of iron with sixty bright lines of the spectrum obtained by heating iron filings in a Bunsen burner could not be due to chance but indicated the presence of iron in the sun.
In such cases, the probability of agreement on the hypothesis of design may be only qualitatively defined, but the logic is essentially that of a likelihood ratio test. Nor did Laplace speak in terms of rejecting the hypothesis of chance or prescribe a threshold of improbability beyond which belief gives way (or should give way) to disbelief. He took as his test criterion the tail area probability, that is, the probability of a deviation at least as large as that observed (Hald 1998, p. 25). Moreover, a low probability of observing so large a deviation by chance points to some alternative explanation that, however, need not be formulated beforehand. Rather, "by letting the remarkable feature [of the data] determine the statistic used in the test, we concentrate implicitly on an alternative hypothesis" (p. 67).
Fisher embraced most but not all these features. The locus classicus of his account is the famous treatment of the tea-tasting lady who claims to be able to tell whether milk or tea was added first to a mixture of the two (DE, chapter 2). Every serious student of inductive reasoning should read and reread this chapter with infinite care. Of great importance, too, is the fourth chapter of SMSI, "Some Misapprehensions about Tests of Significance."
To begin with, a significance test is, emphatically, not a decision rule (DE, §12.1; SMSI, §4.1), the differences between them being characterized as "many and wide" (SMSI, p. 80). Thus opens Fisher's trenchant critique of the Neyman-Pearson theory of testing. In choosing a test statistic, "the experimenter will rightly consider all points on which, in the light of current knowledge, the hypothesis may be imperfectly accurate, and will select tests … sensitive to these possible faults, rather than to others" (p. 50).
However, Fisher is clear that the hypothesis one chooses to test may be suggested by one's data (p. 82). Thus, in tossing a coin, the outcome may lead one to test the hypothesis that the coin is fair, that the trials are independent, or that the same coin was tossed each time. Each test will require a different reference set and a different measure of deviation from the null hypothesis. This point is further illustrated by examples from genetics, where departures from posited 9:3:3:1 Mendelian ratios for a hybrid cross may be due to linkage, partial dominance in one of the factors, linked lethals, or other causes. In such cases, the partitioning of the chi-squared statistic into orthogonal components allows one to pinpoint the source(s) of such a discrepancy (for illustrations of this method, see Mather 1938, chapter 4). This practice is markedly at odds with the Neyman-Pearsonite insistence on predesignating all the elements of a test. Fisher goes on to draw three more such contrasts between significance testing and the acceptance sampling paradigm that informs the Neyman-Pearsonite theory.
First, in acceptance sampling, the population of lots from which one is sampling is well defined and one has a real sequence of repeated trials, "whereas the only populations that can be referred to in a test of significance have no objective reality, being exclusively the product of the statistician's imagination through the hypothesis which he has decided to test" (SMSI, p. 81). Thus, a test is possible where no repetition of one's experiment is contemplated. However, Fisher's hypothetically infinite populations lead a shadowy existence and, as Jaynes (2003) remarks, it is hard to see how such imaginings can confer greater objectivity on one's methods.
Second, decisions are final, while conclusions are provisional. And, third, "in the field of pure research, no assessment of the cost of wrong conclusions … can conceivably be more than a pretence, and in any case … would be inadmissible and irrelevant in judging the state of scientific evidence" (DE, pp. 25–26; also see SMSI, pp. 106–107). Still, Fisher could easily have admitted the relevance of cost functions to the planning of an experiment while still denying their relevance to the weighing of the evidence that results.
The main thrust of Fisher's critique of the Neyman-Pearsonite theory, however, was to deny that the significance level, which measures the strength of the evidence against the null hypothesis of no difference, can be identified with the frequency with which the null hypothesis is erroneously rejected—with the Neyman-Pearsonite's "type I error probability" (SMSI, pp. 93–96). Varying Fisher's more complicated example, J. G. Kalbfleisch and D. A. Sprott (1976, p. 262) consider the composite hypothesis H that at least one of m coins is fair (m > 1). Each coin is tossed ten times, and if each shows 0, 1, 9, or 10 heads (with at least one showing 1 or 9), one can quote an exact significance level of 22 × 2⁻¹⁰ = .0215 against the fairness of each coin, hence evidence no stronger than this against H. (Intuitively, the evidence that all the coins are biased can be no stronger than the evidence that any particular one of them is biased.) However, the frequency of rejecting H using this criterion, even when H is "truest" (i.e., when all the coins are fair), is only .0215ᵐ, which, even for moderately large m, is much smaller than .0215. This leads Kalbfleisch and Sprott to conclude, with Fisher, that "the frequency with which a true hypothesis would be rejected by a test in repetitions of the experiment will not necessarily be indicative of the strength of the evidence against H" (p. 263). More generally, it may be nearly impossible to obtain strong evidence simultaneously ruling out all the simple constituents of a composite hypothesis (SMSI, p. 93), which prompts Fisher to conclude that "the infrequency with which, in particular circumstances, decisive evidence is obtained, should not be confused with the force, or cogency, of such evidence" (p. 96).
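The numbers in the Kalbfleisch–Sprott example can be verified directly (a quick check, not taken from the original paper):

```python
from math import comb

# Probability that a fair coin shows 0, 1, 9, or 10 heads in ten tosses:
# the exact significance level quoted against the fairness of each coin.
extreme = sum(comb(10, k) for k in (0, 1, 9, 10)) / 2 ** 10
print(round(extreme, 4))   # 0.0215  (= 22 * 2**-10)

# Frequency of rejecting H when all m coins are in fact fair
for m in (2, 5, 10):
    print(m, extreme ** m)   # far smaller than 0.0215
```

The rejection frequency shrinks geometrically in m while the quoted evidence against H stays fixed at .0215, which is precisely the wedge Fisher drives between error frequency and evidential force.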
Fisher, like Laplace, refrains from imposing a universal critical level of significance and almost always reports exact significance levels or tail area probabilities, but, unlike Laplace, he does speak of rejecting hypotheses, even though in most instances this is just shorthand for "regard the data as discordant or inconsistent with the hypothesis." Nevertheless, this language invited confusion with the different decision theoretic approach of Jerzy Neyman and Egon Pearson, and, in fact, misled generations of textbook writers, who regularly graft the Neyman-Pearson account of testing onto Fisher's and paper over the many and wide differences between them.
Fisher's crucial departure from Laplace is to construe significance levels as evidence against the null hypothesis. Like Karl Popper, he steadfastly refuses to concede that evidence sufficient to reject the null hypothesis at a stringent level of significance is evidence for the alternative hypothesis of interest. However, his own practice belies his precept. In testing for genetic linkage, rejection of the hypothesis of independent assortment is routinely followed by estimation of the recombination fraction, that is, the degree of association. And in the example of the tea-tasting lady, his language is that the lady "makes good her claim" when she classifies all the cups presented to her correctly (DE, p. 14). The reason he gives for denying that an experiment can do more than disprove the null hypothesis (p. 16) is that the alternative hypothesis that the lady can discriminate "is ineligible as a null hypothesis to be tested by experiment, because it is inexact." That reason is rather question-begging. The real reason, one suspects, is that Fisher wanted to be able to disprove a null hypothesis without providing evidence for any alternative hypothesis. The possibility of such purely negative significance tests has been at the heart of the controversies that have swirled about this topic (see Royall 1997, chapter 3, especially §3.9).
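The significance level of a perfect score in the tea-tasting experiment is a one-line computation, assuming the standard design of DE, chapter 2 (eight cups, four with milk added first):

```python
from math import comb

# Eight cups, four with milk added first; the lady must pick out the four
# milk-first cups. Under the null hypothesis of no discriminating ability,
# each of the C(8, 4) = 70 possible selections is equally likely.
p_perfect = 1 / comb(8, 4)
print(p_perfect)   # 1/70, about 0.0143: the level of a perfect score
```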
For Laplace, as we have seen, significance tests are extensions of likelihood ratio tests to rather amorphous, ill-defined alternatives. And for Fisher, too, they come into play when the LF is unavailable—a point that seems to have been lost on Neyman and Pearson, whose methodology assumes that outcome probabilities conditional on the alternative hypotheses can be computed from the model. However, for Fisher, the logic of a test is a probabilistic form of modus tollens. A hypothesis is rejected when the outcome it entails does not occur; similarly, it is rejected at a stringent level of significance when an outcome it predicts with high probability does not occur. And this eliminativist logic applies whether or not alternative hypotheses have entered the arena.
Kalbfleisch and Sprott (1976) also strongly insist that the alternative to, say, a null hypothesis of homogeneity may be too amorphous to admit specification. Significance tests allow one to postpone the hard work of formulating such an alternative until a significance test has demonstrated the need for one. No doubt, there are strong arguments on both sides, and the issue may be considered unresolved. An interesting case in point is provided by the maximum entropy method, wherein the signs and magnitudes of the deviations from expected values indicate a new mean value constraint that then leads to a new maxent distribution. The presence of such an additional constraint is indicated when the entropy of the current maxent distribution lies sufficiently far below the maximum allowed by the current mean value constraints. Ultimately, however, one must agree with Gossett (see Royall 1997, p. 68) that one cannot securely reject a hypothesis or a model unless or until one has a better-fitting one to put in its place (compare De Groot 1986, p. 523).
Critics of significance testing have also questioned the use of tail areas, which, as Fisher admits, "is not very defensible save as an approximation" (SMSI, p. 71), for it appears to make the import of what was observed depend on possible outcomes that were not observed. Actually, in cases where the measure of deviation is a continuous variate, like Pearson's chi-square or Gossett's n^½(x̄ − μ)/s, the probability of a deviation exactly as large as that observed is nil, and so one has no choice but to use a tail area. However, more to the point, tail areas give (approximately) the proportion of possible outcomes that agree with the hypothesis of cause, design, or efficacy as well as that observed, and this provides a sort of absolute standard of comparison, one that even allows one to compare the strength of the evidence in favor of hypotheses in disparate fields. In any case, the Laplacean logic of significance testing, which views such a test as an index of the evidence in favor of some hypothesis of design, averts a host of interpretive difficulties and fits well with a form of argument—the piling up of improbabilities—that occurs across a broad spectrum of the sciences.
No article of reasonable length could hope to touch on more than a fraction of Fisher's vast output and the many thorny issues raised therein. Nothing has been said here, for example, about Fisher's notorious third measure of uncertainty, namely, fiducial probabilities. A good place to start is with the example of Gossett's t-test (SMSI, pp. 84–86). Turn next to the critique of the fiducial argument by A. W. F. Edwards (1992, §10.5), and then to the excellent papers by Teddy Seidenfeld (1992) and Sandy L. Zabell (1992). Oscar Kempthorne somewhere remarked that it would require at least ten years of preliminary study before attempting a definitive account of Fisher's work in statistics alone, but the effort would be well repaid. The same may be said of his work in genetics and evolution.
One may view Fisher as a "foiled circuitous wanderer," for his heroic attempts to construct a comprehensive alternative to the Bayesian account of inductive reasoning drew him ever more firmly back into the Bayesian position he started from and then rejected. The question one must address, however, is not whether Fisher would ultimately have returned to the Bayesian fold had he lived, say, another decade, but whether the consistency requirements he endorsed force one "back to Bayes." As it has been seen, his position is close to the objectivist Bayesianism of Laplace, Jeffreys, and Jaynes at many points (see Zabell 1992, p. 381 and notes 42 and 56). At the same time, it has to be admitted that Fisher created almost single-handedly the conceptual framework and technical vocabulary all statisticians, whether Bayesian or non-Bayesian, utilize. For sheer fertility of invention, Fisher has few equals in the history of the mathematical sciences.
Cramér, Harald. Mathematical Methods of Statistics. Princeton, NJ: Princeton University Press, 1946.
De Groot, Morris. Probability and Statistics. Reading, MA: Addison-Wesley, 1986.
Edwards, A. W. F. Likelihood. 2nd ed. Baltimore, MD: Johns Hopkins University Press, 1992.
Hald, Anders. A History of Mathematical Statistics from 1750 to 1930. New York: Wiley, 1998.
Kalbfleisch, J. G., and D. A. Sprott. "On Tests of Significance." In Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science. Vol. 2, edited by W. L. Harper and C. A. Hooker. Dordrecht, Netherlands: D. Reidel, 1976.
Mather, Kenneth. The Measurement of Linkage in Heredity. London: Methuen, 1938.
Royall, Richard. Statistical Evidence: A Likelihood Paradigm. London: Chapman & Hall, 1997.
Works by Fisher
Contributions to Mathematical Statistics. New York: Wiley, 1950.
Collected Papers of R. A. Fisher, 5 vols, edited by J. H. Bennett. Adelaide, Australia: University of Adelaide, 1971–1974.
Statistical Inference and Analysis: Selected Correspondence of R. A. Fisher. Edited by J. H. Bennett. Oxford, U.K.: Clarendon Press, 1990.
Statistical Methods, Experimental Design, and Scientific Inference. New York: Oxford University Press, 2003. Reprints of the latest editions of Fisher's three books, Statistical Methods for Research Workers, The Design of Experiments, and Statistical Methods and Scientific Inference. All page references are to this edition.
Works about Fisher
Box, Joan Fisher. R. A. Fisher: The Life of a Scientist. New York: Wiley, 1978.
Seidenfeld, Teddy. Philosophical Problems of Statistical Inference: Learning from R. A. Fisher. Dordrecht, Netherlands: D. Reidel, 1979.
Seidenfeld, Teddy. "R. A. Fisher's Fiducial Argument." Statistical Science 7 (1992): 358–368.
Zabell, Sandy L. "R. A. Fisher and the Fiducial Argument." Statistical Science 7 (1992): 369–387.
Roger D. Rosenkrantz (2005)