
# Advances in the Field of Statistics

## Overview

Statistics provides a theoretical framework for analyzing numerical data and for drawing inferences from such data. Statistics uses the concept of probability—the likelihood that something will happen—to analyze data that are produced by probabilistic processes or to express the uncertainty in the inferences it draws.

In the first part of the twentieth century statisticians elaborated a number of theoretical frameworks within which statistical methods can be evaluated and compared. This led to a number of schools of statistical inference that disagree on basic principles and about the reliability of certain methods. Users of statistics are often happily unaware of these disagreements, and are taught a hybrid of various approaches that may serve them satisfactorily in practice.

## Background

In the course of the nineteenth century huge masses of data were gathered by state agencies as well as private organizations and individuals on social phenomena like poverty and suicide, on physical phenomena like the heights of mountain tops, daily rainfall, and agricultural yields, and on repeated measurements, like the speed of light in a vacuum or the intensity of the gravitational field. But the means to analyze these data were found to be wanting, and where they did exist they lacked an overarching theoretical framework within which they could be compared and justified.

Statistics was often called the science of mass phenomena, and the ideal statistical investigation involved a complete enumeration of the mass that was studied: all the suicides, all the poor, the yields of all the fields in a certain region. Sampling, or the representative method, came into use towards the end of the nineteenth century, and then only in the form of purposive selection, where the sample is carefully constructed to mimic the whole population, rather than the more modern form of random sampling that gives a statistical control of error bounds.

Within agricultural experimentation comparison of means was common, but researchers were unclear how variability of yields affected the firmness of the conclusions they drew. While studying anthropological data comparing the height of children with the average height of their parents, Englishman Francis Galton (1822-1911) went beyond a simple comparison of averages and introduced the concepts of regression (1885) and later of correlation. Galton was motivated in this work by his interest in eugenics, the idea that a country's population could be improved by having people with desirable characteristics have more children, and people with undesirable characteristics, such as mental illness, have fewer, or no, children. Later British statisticians, such as Karl Pearson (1857-1936), who founded the influential journal Biometrika and the Biometric Laboratory at University College, London, and statistician and geneticist Ronald A. Fisher (1890-1962), were similarly motivated.

In the 1890s Karl Pearson introduced a system of frequency curves that differed in shape from the normal distribution in symmetry and peakedness. He also introduced a statistical test, the chi-square test, to determine the most fitting curve for a body of data. Earlier statisticians had developed a barrage of tests for the identification and rejection of "outliers"—data that were suspected to have errors of such a size that an analysis would be better if they were first thrown away. But the rules lacked a convincing justification and were often incompatible.

By 1950 the situation had changed dramatically. A number of different theoretical frameworks for statistical analysis had been elaborated: by Ronald A. Fisher, who stressed the design and analysis of comparative experiments using randomization, significance testing, analysis of variance, and the likelihood method; by Jerzy Neyman (1894-1981) and Egon Pearson, Karl Pearson's son, who stressed the basic idea that statistical inference is trustworthy when it derives, with high frequency, true conclusions from true premises; and by Harold Jeffreys (1891-1989) and Bruno de Finetti, who elaborated an analysis of statistical inference in which uncertain conclusions are expressed as probabilities, and inference is approached through looking at conditional probabilities—the probability of a hypothesis that is conditional on the collected data. In the meantime, statistical analysis had become essential to all the empirical sciences, from psychology to physics, and moreover had become indispensable for control processes in industry and management as stressed by the American W. Edwards Deming (1900-1993).

Ronald A. Fisher is the central figure in the founding of modern statistics. Educated as a mathematician and geneticist in Cambridge, England, in the early 1910s, Fisher was hired as a statistician by the venerable Rothamsted Experimental Station, an agricultural research institute, to analyze the backlog of data the institute had collected. Soon, he realized that it is not possible to derive reliable estimates from experiments that are not well designed, nor is it possible to calculate a measure of the reliability of the estimates. He laid down three fundamental principles—randomization, replication, and local control—to be followed in designing experiments. Randomization means that treatments are randomly allocated within the group of comparable experimental subjects. Replication is the repetition of the same treatment, and local control refers to the insight that only subjects that agree on covariates should be directly compared. These three principles made a calculation of valid error bounds possible and minimized the variance of the estimates.
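The three principles can be illustrated with a small randomized block layout. The following Python sketch is only an illustration under stated assumptions, not Fisher's own procedure: the field names, plot labels, treatment names, and the function `randomized_block_design` are all hypothetical. Each block groups comparable plots (local control), every treatment appears once in every block (replication across blocks), and the order within a block is shuffled (randomization).

```python
import random


def randomized_block_design(blocks, treatments, seed=0):
    """Assign every treatment exactly once per block, in random order.

    blocks: dict mapping a block name to its list of plot labels.
    treatments: list of treatment names, one per plot in each block.
    seed: fixed only so that the illustrative layout is reproducible.
    """
    rng = random.Random(seed)
    layout = {}
    for block, plots in blocks.items():
        if len(plots) != len(treatments):
            raise ValueError("each block needs exactly one plot per treatment")
        order = list(treatments)
        rng.shuffle(order)  # randomization: chance decides which plot gets what
        layout[block] = dict(zip(plots, order))
    return layout


# Hypothetical example: two fields (blocks) of three comparable plots each.
blocks = {
    "north_field": ["n1", "n2", "n3"],
    "south_field": ["s1", "s2", "s3"],
}
treatments = ["control", "fertilizer_A", "fertilizer_B"]

layout = randomized_block_design(blocks, treatments)
for block, assignment in layout.items():
    print(block, assignment)
```

Because treatments are compared only within a block, differences between the two fields do not contaminate the treatment comparison, and the random allocation is what justifies the later calculation of error bounds.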

The design of experiments is possibly Fisher's most outstanding contribution to statistics, and it has been incorporated into the theory of the various schools of statistical inference. He was the first to draw a clear distinction between a population and a sample drawn from it, and he introduced the classification of statistics problems into model specification, estimation, and distribution.

In his later work Fisher stressed that there are various forms of quantitative statistical inference and that a monolithic structure of statistical inference, such as was developed by the two rival schools, the frequentist school of Neyman and Pearson and the Bayesian school of Jeffreys and de Finetti, is not possible. The nature of the problem dictates the assumptions one can objectively make. When one can make few assumptions, a test of significance is appropriate. Here, one calculates the probability of a deviation at least as large as the one observed, assuming the truth of a so-called null hypothesis. Alternatively, when one may assume a full parametric model, more powerful means of analysis come into play. The method of mathematical likelihood gives an ordering of rational belief over the possible values of the parameter, and the maximum likelihood estimate is the value best supported by the data. In rare instances one may use the method of fiducial probability. This controversial derivation depends on an inversion of a pivotal quantity, a function of parameters and data whose distribution is known and does not depend on the parameters. Fisher believed that future generations of statisticians might come up with further methods of inference, and he saw the future of statistics as necessarily open-ended.
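A small coin-tossing case shows the difference between a significance test and the likelihood method. The Python sketch below is an illustration only; the data (9 heads in 10 tosses), the fair-coin null hypothesis, and the function names are assumptions introduced here, not taken from Fisher's writings. The first function computes a one-sided significance level under the null hypothesis; the other two evaluate the binomial likelihood and locate its maximum on a grid.

```python
from math import comb


def binomial_tail_p_value(successes, trials, p_null=0.5):
    """Probability, under the null, of at least `successes` heads in `trials` tosses."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def binomial_likelihood(p, successes, trials):
    """Likelihood of heads-probability p given the observed counts."""
    return comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)


def grid_maximum_likelihood(successes, trials, steps=1000):
    """Approximate the maximum likelihood estimate by searching a grid on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: binomial_likelihood(p, successes, trials))


# Hypothetical data: 9 heads in 10 tosses.
p_value = binomial_tail_p_value(9, 10)   # 11/1024, roughly 0.011
mle = grid_maximum_likelihood(9, 10)     # the sample proportion, 0.9
print(f"p-value under a fair coin: {p_value:.4f}")
print(f"maximum likelihood estimate: {mle:.3f}")
```

The significance test needs only the null hypothesis, whereas the likelihood calculation presupposes the full binomial model, which is the contrast drawn in the paragraph above.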

Fisher occupies a position in between the two rival schools of probabilistic inference, the frequentist school of Neyman and Pearson and the Bayesian school of Jeffreys and de Finetti. The frequentist school rejects the idea that there is such a thing as statistical inference altogether, in the sense of data giving partial support to a hypothesis. The Bayesian school, named after English mathematician Thomas Bayes (1702-1761), relates all forms of statistical inference to the transition from a prior probability to a posterior probability given the data.

The Pole Jerzy Neyman was also trained as a mathematician and also worked, although briefly, for an agricultural research station. But Neyman had more of an affinity with a rigorous approach to mathematics. When Neyman came to England in the 1920s, he was disappointed by the low level of mathematical research in Karl Pearson's Biometric Laboratory. Being intrigued by Ronald Fisher's conceptual framework, he set out with his research partner, Egon Pearson, to provide a rigorous underpinning to Fisher's ideas, thus infuriating Fisher, who felt that the subtle points in his thinking were disregarded.

In a defense of his theory of statistical inference, Neyman argued that it would be better to speak of a theory of "inductive" behavior, since "inference" wrongly suggested that there is a logical relation of partial support between data and hypothesis. Neyman rejected significance testing and replaced it by hypothesis testing. In significance testing the exact level of significance is a measure of discordance between the data obtained and the null hypothesis. A low level of significance will tend to make an experimenter reject the null hypothesis, but a significance test cannot lead to the acceptance of the null hypothesis, since there are many other hypotheses under which the data may fail to be significant. In Neyman-Pearson hypothesis testing, one needs to have at least two hypotheses, and the goal of the statistician is to devise a data-dependent rule for the rejection of one of these hypotheses and acceptance of the other. Accepting means behaving as if a hypothesis is true without necessarily having a belief about the hypothesis, one way or another. The statistician will try to identify rules with good "operating characteristics," such as a high frequency of getting it right and a low frequency of making an error. Fisher believed that this theory may be appropriate for testing batches of lightbulbs in industrial quality control programs but not for scientific inference, where a scientist should weigh what the data mean for the various hypotheses he entertains.
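The idea of choosing a rule by its operating characteristics can be made concrete with two simple hypotheses about a coin. The Python sketch below is a minimal illustration under assumed values: 20 tosses, a null hypothesis of heads-probability 0.5, an alternative of 0.7, and a 5 percent cap on false rejections. None of these numbers or function names come from Neyman and Pearson. The rule is "reject the null hypothesis when the number of heads reaches a cutoff," and the code computes how often that rule errs under each hypothesis.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n tosses with heads-probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def error_rates(n, cutoff, p_null, p_alt):
    """Error frequencies of the rule 'reject the null when heads >= cutoff'.

    Returns (alpha, beta): alpha is the chance of rejecting a true null,
    beta is the chance of accepting the null when the alternative is true.
    """
    alpha = sum(binomial_pmf(k, n, p_null) for k in range(cutoff, n + 1))
    beta = sum(binomial_pmf(k, n, p_alt) for k in range(cutoff))
    return alpha, beta


def smallest_cutoff(n, p_null, max_alpha):
    """Smallest cutoff whose false-rejection rate stays within max_alpha."""
    for cutoff in range(n + 2):
        tail = sum(binomial_pmf(k, n, p_null) for k in range(cutoff, n + 1))
        if tail <= max_alpha:
            return cutoff
    raise ValueError("no cutoff satisfies the bound")


# Assumed setting: 20 tosses, null p = 0.5, alternative p = 0.7, alpha <= 0.05.
n, p_null, p_alt = 20, 0.5, 0.7
cutoff = smallest_cutoff(n, p_null, max_alpha=0.05)
alpha, beta = error_rates(n, cutoff, p_null, p_alt)
print(f"reject the null when heads >= {cutoff}")
print(f"type I error rate {alpha:.4f}, type II error rate {beta:.4f}")
```

The rule is fixed before the data are seen, and its merit lies entirely in these long-run error frequencies, not in any statement about how strongly a particular outcome supports either hypothesis.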

A framework for statistical inference in which Bayes's theorem is central was developed in England by the astrophysicist Harold Jeffreys and in Italy by Bruno de Finetti. Jeffreys tried to work out a version of objective Bayesianism in which the prior probability over the unknown parameters has an objective status dictated by the structure of the problem assuming further ignorance. Bruno de Finetti's version of Bayesian inference asserted famously that "probability does not exist." By this he meant that a statistician does not have to assume that there are stable frequencies or chances out there in nature in order to use the calculus of probability as a measure of personal uncertainty. We can measure this uncertainty by considering the various odds we are willing to accept on any event that we can observe in nature. Thus, when we repeatedly toss a coin, we are uncertain as to whether it will come up heads or tails, and we can express that uncertainty by a probability distribution. But talking of the chance of heads as an objective propensity inherent in the coin introduces a nonobservable property about which no bets can be settled. In his famous representation theorem, de Finetti showed that we can artificially introduce a probability distribution over such an unobservable parameter if we judge the sequences of heads and tails to be exchangeable—that is, that the probability is independent of the order of heads and tails. Especially after the 1950s the Bayesian school of scientific inference has come to great fruition and has become both conceptually and technically very sophisticated.
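Exchangeability can be checked directly in a small case. The Python sketch below is an illustration under an assumption that is not in the text: the bettor's predictive rule is the one implied by a uniform prior over the coin's unknown bias, so that after h heads and t tails the probability of heads on the next toss is (h + 1) / (h + t + 2). The function name and parameters are hypothetical. Multiplying the predictive probabilities along a sequence gives the probability of the whole sequence, and every reordering of the same tosses receives the same value.

```python
from fractions import Fraction
from itertools import permutations


def sequence_probability(sequence, prior_heads=1, prior_tails=1):
    """Probability of an exact H/T sequence under sequential Bayesian prediction.

    prior_heads and prior_tails act as pseudo-counts; the defaults (1, 1)
    correspond to the assumed uniform prior over the coin's bias.
    """
    heads, tails = prior_heads, prior_tails
    probability = Fraction(1)
    for toss in sequence:
        p_head = Fraction(heads, heads + tails)  # current predictive probability
        if toss == "H":
            probability *= p_head
            heads += 1
        elif toss == "T":
            probability *= 1 - p_head
            tails += 1
        else:
            raise ValueError("tosses must be 'H' or 'T'")
    return probability


# All distinct orderings of two heads and one tail get the same probability.
orderings = sorted(set("".join(p) for p in permutations("HHT")))
for ordering in orderings:
    print(ordering, sequence_probability(ordering))  # each prints 1/12
```

Only the counts of heads and tails matter, never their order, which is the judgment of exchangeability that the representation theorem starts from; each prediction along the way is itself a step from a prior to a posterior probability.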

## Impact

Users of statistical methods tend to be unaware of the great disputes that have occurred between the various schools of statistical inference. The popular statistical textbooks one studies in college tend to present a uniform hybrid theory of statistics in which hypothesis testing is interpreted as significance testing, but with the twist of the possibility of accepting a null hypothesis.

Notwithstanding these foundational disputes, the empire of probability greatly expanded in the first half of the twentieth century. Descriptive statistics became common fare in every newspaper, and statistical inference became indispensable to public health and medical research, to marketing and quality control in business, to accounting, to economic and meteorological forecasting, to polling and surveys, to sports, to weapon research and development, and to insurance. Indeed, for practitioners in many areas of the biological, social, and applied sciences, standardized procedures from inferential statistics virtually define what it means to use "the scientific method."

ZENO G. SWIJTINK