Statistics: History, Interpretation, and Application
Numerous jokes are associated with statistics and reflected in such caustic definitions as "Statistics is the use of methods to express in precise terms that which one does not know" and "Statistics is the art of going from an unwarranted assumption to a foregone conclusion." Then there is the time-worn remark attributed to the English statesman Benjamin Disraeli (1804–1881): "There are three kinds of lies: lies, damned lies, and statistics."
Statistics may refer to individual data, to complete sets of numbers, or to inferences made about a large population (of people or objects) from a representative sample of the population. The concern here is with inferential statistics. Its methodology is complex and subtle, and the risk of its abuse very real. The public is inundated with numbers from the market and from all kinds of interest groups, and there is no end in sight. It has been estimated that children growing up in a pervasive television culture are exposed to more statistics than sex and violence combined. It was another Englishman, the novelist and historian H. G. Wells (1866–1946), who said: "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write."
For those who understand, statistics is an exciting venture, a bold reaching out by the human mind to explore the unknown, to seek order in chaos, to harness natural forces for the benefit of all. Its development was integral to the rise of modern science and technology, its critical role recognized by the brilliant founders of new disciplines.
After a brief sketch of the history of statistical inference, this article offers a commentary on interpretations of statistics and concludes with a discussion of its applications that includes a case study of statistics in a scientific context.
Highlights of History
This quick survey of the history of statistics is presented in two sections, beginning with the emergence of statistical inference and then turning to the use of statistical concepts in philosophical speculation.
FROM STATISTICAL THINKING TO MATHEMATICAL STATISTICS. The normal distribution, which plays such a central role in statistics, was anticipated by Galileo Galilei (1564–1642) in his Dialogue concerning the Two Chief World Systems—Ptolemaic and Copernican (1632). He spoke of the errors in measuring the distance of a star as being symmetric (the observed distances equally likely to be too high as too low), the errors more likely to be small than large, and the actual distance as the one in which the greatest number of measurements concurred—a description of the bell-shaped curve. Discovered by Abraham de Moivre (1667–1754), the normal distribution was fully developed as the law of error in astronomy by Pierre-Simon de Laplace (1749–1827) and Carl Friedrich Gauss (1777–1855).
The statistical approach was applied to social phenomena by the Belgian astronomer Adolphe Quetelet (1796–1874), in what he called social physics, by analogy with celestial physics. He introduced the concept of the average man to show that observed regularities in the traits and behavior of groups followed the laws of probability. He strongly influenced Florence Nightingale (1820–1910), the British nursing pioneer and hospital reformer, who urged governments to keep good records and be guided by statistical evidence.
The fundamental contributions of the Scottish physicist James Clerk Maxwell (1831–1879) to electromagnetic theory and the kinetic theory of gases would lead to communications technology and ultimately to Albert Einstein's special theory of relativity and Max Planck's quantum hypothesis. Having learned of Quetelet's application of the statistical error law to social aggregates, Maxwell theorized that the same law governed the velocity of gas molecules. His work in statistical mechanics and statistical thermodynamics foreshadowed a new conception of reality in physics.
The Austrian monk Gregor Johann Mendel (1822–1884) carried out plant crossbreeding experiments, in the course of which he discovered the laws of heredity. Traits exist as paired basic units of heredity, now called genes. The pairs segregate in the reproductive cell, and the offspring receive one from each parent. Units corresponding to different traits recombine during reproduction independently of each other. Mendel presented his results at a scientific meeting in 1865 and published them in 1866, but they were ignored by the scientific community and he died unknown.
Statistical inference as a distinct discipline began with Francis Galton (1822–1911), a cousin of Charles Darwin, whose On the Origin of Species (1859) became the inspiration of Galton's life. The theory of evolution by natural selection offered Galton a new vision for humanity. He coined the term eugenics to express his belief that the conditions of humankind could best be improved by scientifically controlled breeding. He devoted himself to the exploration of human inheritance in extensive studies of variability in physical and mental traits, constructing what would become basic techniques of modern statistics, notably regression and correlation. In 1904 he established the Eugenics Record Office at University College, London, which in 1911 became the Galton Laboratory of National Eugenics, with Karl Pearson (1857–1936) appointed its director.
A man of classical learning and deep interest in social issues, Pearson was attracted to Galton's work in eugenics. Becoming absorbed in the study of heredity and evolution by the measurement and analysis of biologic variation, he developed a body of statistical techniques that includes the widely used chi-square test. In 1901 he founded the journal Biometrika. But he never accepted Mendel's newly rediscovered laws of inheritance involving hereditary units as yet unobserved, and engaged in a feud with Mendelian geneticists. Pearson was appointed the first professor of eugenics in 1911, with his Biometric Laboratory incorporated into the Galton Laboratory of National Eugenics, and the department became a world center for the study of statistics. When he retired in 1933, the department was split in two; his son Egon Pearson (1895–1980) obtained the chair in statistics, and Ronald A. Fisher (1890–1962) became professor of eugenics.
Trained in mathematics and physics, Fisher emerged as the greatest single contributor to the new disciplines of statistics and genetics and the mathematical theory of evolution. He did fundamental work in statistical inference, and developed the theory and methodology of experimental design, including the analysis of variance. Through his books Statistical Methods for Research Workers (1925), The Design of Experiments (1935), and Statistical Methods and Scientific Inference (1956), he created the path for modern inquiry in agronomy, anthropology, astronomy, bacteriology, botany, economics, forestry, genetics, meteorology, psychology, and public health. His breeding experiments with plants and animals and his mathematical research in genetics led to the publication of his classic work, The Genetical Theory of Natural Selection (1930), in which he showed Mendel's laws of inheritance to be the essential mechanism for Darwin's theory of evolution.
Egon Pearson collaborated with the Russian-born mathematician Jerzy Neyman (1894–1981) to formulate what is now the classical (Neyman-Pearson) theory of hypothesis testing, published in 1928. This is the theory used across a wide range of disciplines, providing what some call the null hypothesis method. Neyman left London in 1937 to become a strong force in establishing the field in the United States. Another major contributor to American statistics was the Hungarian-born mathematician Abraham Wald (1902–1950), founder of statistical decision theory and sequential analysis.
STATISTICS AND PHILOSOPHY. Statistical developments in the eighteenth century were intertwined with natural theology, because for many the observed stable patterns of long-run frequencies implied intelligent design in the universe. For Florence Nightingale in the nineteenth century, the study of statistics was the way to gain insight into the divine plan.
Francis Galton had a different view. For him the theory of evolution offered freedom of thought, liberating him from the weight of the design argument for the existence of a first cause that he had found meaningless. Karl Pearson, author of The Grammar of Science (1892), was an advocate of logical positivism, holding that scientific laws are but descriptions of sense experience and that nothing could be known beyond phenomena. He did not believe in atoms and genes. For him the unity of science consisted alone in its method, not in its material. Galton and Pearson gave the world statistics, and left as philosophical legacy their vision of eugenics.
James Clerk Maxwell was a thoughtful and devout Christian. He argued that freedom of the will, then under vigorous attack, was not inconsistent with the laws of nature being discovered by contemporary science. The statistical method, the only means to knowledge of a molecular universe, yielded information only about masses of aggregates, not about individuals. He urged recognition of the limits of science: "I have endeavored to show that it is the peculiar function of physical science to lead us to the confines of the incomprehensible, and to bid us behold and receive it in faith, till such time as the mystery shall open" (quoted in Porter 1986, p. 195).
In 1955 Fisher, by then Sir Ronald Fisher, said in a London radio address on the BBC: "It is one of the evils into which a nation can sometimes drift that, for about three generations in this country, the people have been taught to assume that scientists are the enemies of religion, and, naturally enough, that the faithful should be enemies of science" (Fisher 1974, p. 351). Scientists, he insisted, needed to be clear about the extent of their ignorance and not claim knowledge for which there was no real evidence. Fisher's advice remains sound at the start of the twenty-first century.
Interpretation: A Commentary
The following are comments on various aspects of statistics, painted of necessity in broad strokes, and concluding with some thoughts concerning the future.
STATISTICS AND THE PHILOSOPHY OF SCIENCE. Two distinct types of probability—objective and subjective—have been recognized since the emergence of the field in the seventeenth century. The classical (Neyman-Pearson) theory of hypothesis testing is based on the objective, frequentist interpretation. The subjective, degree-of-belief interpretation yields variations of so-called Bayesian inference. The latter involves combining observations with an assumed prior probability of a hypothesis to obtain an updated posterior probability, a procedure of enduring controversy. But the frequentist theory, as pointed out by its critics, does not provide any measure of the evidence contained in the data, only a choice between hypotheses. The American mathematical statistician Allan Birnbaum (1923–1976) did pioneering work to establish principles of statistical evidence in the frequentist framework, his two major related studies being "On the Foundations of Statistical Inference" (1962) and "Concepts of Statistical Evidence" (1969). Exploring the likelihood principle, Birnbaum reached the conclusion that some sort of confidence intervals were needed for the evaluation of evidence. A leading advocate of the subjective approach, of what he called personal probability, was another American statistician, Leonard J. Savage (1917–1971), author of the classic work The Foundations of Statistics (1954).
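The Bayesian updating step described above can be made concrete with a small sketch. It applies Bayes' rule to a single hypothesis and its complement; the function name and all of the numbers are hypothetical, chosen only for illustration and not taken from any study cited in this article.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Posterior probability of hypothesis H after seeing the data.

    prior              -- assumed prior probability of H (the contested input)
    p_data_given_h     -- probability of the observed data if H is true
    p_data_given_not_h -- probability of the observed data if H is false
    """
    # Total probability of the data under both possibilities.
    evidence = prior * p_data_given_h + (1.0 - prior) * p_data_given_not_h
    # Bayes' rule: P(H | data) = P(H) * P(data | H) / P(data).
    return prior * p_data_given_h / evidence


# Hypothetical inputs: a prior of 0.10, and data four times as likely
# under H (0.80) as under its complement (0.20).
updated = posterior(prior=0.10, p_data_given_h=0.80, p_data_given_not_h=0.20)
print(round(updated, 3))
```

Changing the assumed prior changes the posterior even when the data stay the same, which is one way to see why the choice of prior has been a point of enduring controversy.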
Statistics as commonly taught and used is that based on the frequentist theory. But there is lively interest in Bayesian inference, also the focus of serious study by philosophers (Howson and Urbach 1993). The entire subject has been engaging philosophers of science, giving rise to a new specialty called the philosophy of probability. An example is the edited volume Probability Is the Very Guide of Life: The Philosophical Uses of Chance (Kyburg and Thalos 2003), a collection of essays by philosophers of probability that explores aspects of probability as applied to practical issues of evidence, choice, and explanation—although without consensus on conceptual foundations. The title refers to a famous remark of Bishop Joseph Butler, one of the eighteenth-century natural theologians who saw statistical stability as a reflection of design and purpose in the universe (Butler 1736). Another edited volume, The Nature of Scientific Evidence: Statistical, Philosophical, and Empirical Considerations (Taper and Lele 2004), has contributions by statisticians, philosophers, and ecologists, with ecology used as the illustrative science. What remains clear is the persistent conflict between the frequentist and Bayesian approaches to inference. There is no unified theory of statistics.
STATISTICS IN THE FIELD. At the other end of the statistical spectrum is the approach expressed by the term exploratory data analysis (EDA), introduced by John W. Tukey (1915–2000), the most influential American statistician of the latter half of the twentieth century. Exploratory data analysis refers to probing the data by a variety of graphic and numeric techniques, with focus on the scientific issue at hand, rather than a rigid application of formulas. Tukey's textbook on EDA (1977) contains techniques that can be carried out with pencil and paper, but the approach is well suited to computer-based exploration of large data sets—the customary procedure. EDA is an iterative process, as tentative findings must be confirmed in precisely targeted studies, also called confirmatory data analysis. The aim is flexibility in the search for insight, with caution not to oversimplify the science, to be wary of pat solutions.
Practicing statisticians need to understand established theory, know the methods pertaining to their area of application, and be familiar with the relevant software. They must know enough about the subject matter to be able to ask intelligent questions and have a quick grasp of the problems presented to them. For effective communication they must be sensitive to the level of mathematical skills of the researchers seeking their assistance. It is easy to confuse and alienate with technical jargon, when the intention is to be of service. What is asked of them may range from short-term consultation—analysis of a small set of data, or help with answering a statistical reviewer's questions on a manuscript submitted for publication—to joining the research team of a long-range study that is being planned. Unless otherwise agreed, it is understood that frequentist theory will be used, with routine preliminary exploration of the data. A statistician who strongly prefers the Bayesian approach may recruit investigators interested in collaborating on Bayesian analysis of suitable scientific problems.
Some points to remember: Statistics is a tool—more precisely, a collection of tools. Creative researchers know a lot of facts and have hunches and ideas; they may seek interaction with a compatible statistician to help sort things out, and that is where the tools come in. Which ones are actually used may not matter so much in the end. On occasion, the statistician's real contribution may not even involve formal analysis. A mind trained in mathematics views problems from a special perspective, which in itself may trigger insight for the scientist immersed in the material. Other situations require structured research designs with specification of proposed methods of analysis. These include cooperative studies, such as large multinational clinical trials involving hundreds of investigators. But in any case and even in the most masterful hands, statistics can be no better than the quality of the underlying science.
THE FUTURE OF STATISTICS. The explosive growth of information technology, with its capacity to generate data globally at a fast pace and in great volume, presents the statistical profession with unprecedented opportunity and challenge. The question is not that of either/or, of theory versus practice, but of perspective and balance: Continue exploration on every front, but make what is established widely available. Apply what is known, and do it well. Make sure that wherever statistics is potentially useful, it is at hand.
A promising development here is the Cochrane Collaboration, founded in 1993, an independent international organization dedicated to making accurate, up-to-date information about health care interventions readily available around the globe (Cochrane Collaboration). The organization promotes the search for evidence in the form of randomized clinical trials and provides ongoing summary analyses. By late 2004 there were twelve Cochrane centers worldwide, functioning in six languages, serving as reference centers for 192 nations, and coordinating the work of thousands of investigators. Such a vast undertaking must use objective criteria and uniform statistical methods that can be precisely communicated. That is the strength of the standard frequency approach.
In the realm of theoretical advances, some economic constraints may be cause for concern. Young graduates in academic positions, often struggling in isolation while carrying heavy teaching loads, are under great pressure to produce publications, any publications, to attain job security and professional advancement. This may not be the wisest use of their intellectual potential. A man of wit, Tukey would say that one should do theory only if it is going to be immortal. By contrast, those in a practical setting, such as a large biostatistics department, have to cope with the endless flow of data to be analyzed, under the constant pressure of immutable deadlines. The loss of major research grants may put many jobs in jeopardy, including their own. There should be other, readily available and steady sources of support that provide time for reflection, to find and explore areas of interest that seem to offer promise down the road. Such a path should include attention to what is happening in philosophy and close involvement with a field of cutting-edge empirical research. The great founders of statistics were widely read, hands-on scientists.
Application of Statistics
In the last decades of the twentieth century statistics continued its vigorous growth into a strong presence not only in the sciences but also in political and social affairs. Its enormous range of applications, with specialized methodology for diverse disciplines, is reflected in the thirteen-volume Encyclopedia of Statistical Sciences, published between 1982 and 1999 (Kotz, Johnson, and Read). The term statistical science refers to statistical theory and its applications to the natural and social sciences and to science-based technology. The best general advice in the application of statistics is to proceed with care and suspend hasty judgment. This is illustrated by a case study of the diffusion of neonatal technology.
STATISTICS IN CONTEXT: A CASE STUDY. The role of statistics in the interplay of forces affecting technological innovation was explored in a case study in neonatal medicine, a specialty created by technology (Miké, Krauss, and Ross 1993, 1996, 1998). It is the story of transcutaneous oxygen monitoring (TCM) in neonatal intensive care, introduced as a scientific breakthrough in the late 1970s and rapidly adopted for routine use, but abandoned within a decade. The research project included interviews with executives and design engineers of ten companies marketing the device, with investigators who had pioneered the technology, and with directors of neonatal intensive care units (NICUs).
Supplemental oxygen, essential for the survival of premature infants, had been administered since the 1930s, first via incubators and then by mechanically assisted ventilation. But in the 1940s an eye disease often leading to blindness, initially called retrolental fibroplasia (RLF) and later renamed retinopathy of prematurity (ROP), became the major clinical problem of surviving prematurely born infants. Over fifty causes were suggested, and about half of these were formally evaluated, a few in prospective clinical trials. When in the mid-1950s supplemental oxygen was identified as the cause of ROP in two large randomized clinical trials, the recommended policy became to administer oxygen only as needed and in concentrations below 40 percent. By this time more than 10,000 children had been blinded by ROP worldwide.
But subsequent studies noted higher rates of mortality and brain damage in surviving infants, as the incidence of ROP persisted and then rose, with many malpractice suits brought on behalf of children believed to have been harmed by improper use of oxygen. There was an urgent need for better monitoring of oxygen in the NICU.
Measurement of oxygen tension in arterial blood by means of the polarographic Clark electrode had been possible since the 1960s. The procedure was only intermittent, however, and the related loss of blood harmful to tiny, critically ill newborns. The new technology of TCM involved a miniaturized version of the Clark electrode that could monitor oxygen continuously across the skin, bypassing the need for invasive blood sampling. But the device was difficult to use, babies were burned by the electrode, and ROP was not eliminated. Within years TCM was being replaced by pulse oximetry, a still more recent technology with problems of its own.
A number of issues emerged. Subsequent review found serious flaws in the two randomized clinical trials that had implicated oxygen, and a series of methodological errors was noted in the early studies of other possible causes. The effectiveness of TCM in the prevention of ROP had not been shown before the adoption of the technology, and results of a randomized trial finally published in 1987 were inconclusive. It became clear that the oxygen hypothesis was an oversimplified view. ROP had a complex etiology related to premature physiology, even as the patient population itself was changing, with the survival of smaller and smaller infants.
A mistaken view of disease physiology, coupled with preventive technology advocated by its pioneers, heralded by the media, and demanded by the public—with industry only too eager to comply—led to the adoption of an untested technology that was itself poorly understood by those charged with its use. There was no special concern with statistical assessment, reliance on regulations of the Food and Drug Administration (FDA) being the norm. And there is no clear-cut way to assign ultimate responsibility. The study concluded with the overarching theme of complexity and uncertainty.
SUMMING UP. Statistics is a powerful tool when in competent hands, one of the great intellectual achievements of the twentieth century. Ethical issues pertain to its misuse or lack of adequate use.
Elementary texts of applied statistics have traditionally been called "cookbooks," teaching mainly the "how" and not the "why." But in the present-day fast food culture hardly anyone cooks any more, and this applies equally to statistics. Computer software provides instant analysis of the data by a variety of techniques, allowing the user to pick and choose from the inevitable sprinkling of "significant" results (by definition of the meaning of P-value) to create a veneer of scientific respectability. Such meaningless and misleading activity, whatever the reason, can have harmful consequences. Another danger of abuse can come in the phrasing of questions in public opinion polls, known to affect the response, in a way that biases the results in favor of the sponsor's intended conclusion.
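The "inevitable sprinkling" of significant results can be illustrated with a short calculation. The sketch below assumes that every null hypothesis is true, that the tests are independent, and that each P-value is therefore uniformly distributed between 0 and 1; the function names, the 0.05 threshold, and the choice of twenty tests are illustrative assumptions, not part of the original discussion.

```python
import random


def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests falls
    below alpha when every null hypothesis is in fact true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate(n_tests, alpha=0.05, n_runs=10_000, seed=0):
    """Monte Carlo check of the formula above.

    Under a true null hypothesis a P-value is uniform on [0, 1], so each
    test is modeled here as a single uniform random draw.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        # One "study": run n_tests tests and ask whether any looks significant.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_runs


exact = chance_of_false_positive(20)
estimate = simulate(20)
print(f"exact: {exact:.3f}  simulated: {estimate:.3f}")
```

Under these assumptions, an analyst who runs twenty tests on data containing no real effect has roughly a 64 percent chance of finding at least one result with P below 0.05, which is why picking and choosing among such results is misleading.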
The ideal role of statistics is to be an integral part of the investigative process, to advise, assess, and warn of remaining uncertainties. The public needs to be informed and offer its support, so that the voice of statistics may be clearly heard in national life, over the cacophony of confusion and conflicting interests. This theme has been developed further in the framework of a proposed Ethics of Evidence, an approach for dealing with uncertainty in the context of contemporary culture (Miké 2003). The call for education and responsibility is its predominant message.
Bibliography
Birnbaum, Allan. (1962). "On the Foundations of Statistical Inference (with Discussion)." Journal of the American Statistical Association 57(298): 269–326.
Birnbaum, Allan. (1969). "Concepts of Statistical Evidence." In Philosophy, Science, and Method: Essays in Honor of Ernest Nagel, ed. Sidney Morgenbesser, Patrick Suppes, and Morton White. New York: St. Martin's Press.
Box, Joan Fisher. (1978). R. A. Fisher: The Life of a Scientist. New York: Wiley. Biography written by Fisher's daughter, who had also been his research assistant.
Butler, Joseph. (1736). The Analogy of Religion, Natural and Revealed, to the Constitution and Course of Nature. London: Printed for James, John, and Paul Knapton.
Fisher, R. A. (1930). The Genetical Theory of Natural Selection. Oxford: Clarendon Press. 2nd edition, New York: Dover, 1958.
Fisher, R. A. (1974). "Science and Christianity: Faith Is Not Credulity." In Collected Papers of R. A. Fisher, Vol. 5 (1948–1962), ed. J. H. Bennett. Adelaide, Australia: University of Adelaide. A 1955 talk given in London on BBC radio.
Fisher, R. A. (1990). Statistical Methods, Experimental Design, and Scientific Inference, ed. J. H. Bennett. Oxford: Oxford University Press. Three classic works by a founder of modern statistics, published in one volume.
Galilei, Galileo. (1967). Dialogue concerning the Two Chief World Systems—Ptolemaic and Copernican, trans. Stillman Drake. 2nd edition. Berkeley and Los Angeles: University of California Press. Includes description of the bell-shaped curve of error, known subsequently as the normal distribution.
Gigerenzer, Gerd; Zeno Swijtink; Theodore Porter; et al. (1989). The Empire of Chance: How Probability Changed Science and Everyday Life. Cambridge, UK: Cambridge University Press. Summary of a two-volume work by a team of historians and philosophers of science, written for a general audience.
Hacking, Ian. (2001). An Introduction to Probability and Inductive Logic. Cambridge, UK: Cambridge University Press. Introductory textbook for students of philosophy, with many examples, explaining different interpretations of probability.
Howson, Colin, and Peter Urbach. (1993). Scientific Reasoning: The Bayesian Approach, 2nd edition. Chicago: Open Court.
Kotz, Samuel; Norman L. Johnson; and Campbell B. Read, eds. (1982–1999). Encyclopedia of Statistical Sciences. 9 vols. plus supp. and 3 update vols. New York: Wiley.
Kruskal, William H., and Judith M. Tanur, eds. (1978). International Encyclopedia of Statistics. 2 vols. New York: Free Press.
Kyburg, Henry E., Jr., and Mariam Thalos, eds. (2003). Probability Is the Very Guide of Life: The Philosophical Uses of Chance. Chicago: Open Court. Collection of essays by philosophers of probability.
Miké, Valerie. (2003). "Evidence and the Future of Medicine." Evaluation & the Health Professions 26(2): 127–152. Presents the Ethics of Evidence in the context of contemporary medicine and culture.
Miké, Valerie; Alfred N. Krauss; and Gail S. Ross. (1993). "Reflections on a Medical Innovation: Transcutaneous Oxygen Monitoring in Neonatal Intensive Care." Technology and Culture 34(4): 894–922.
Miké, Valerie; Alfred N. Krauss; and Gail S. Ross. (1996). "Doctors and the Health Industry: A Case Study of Transcutaneous Oxygen Monitoring in Neonatal Intensive Care." Social Science & Medicine 42(9): 1247–1258.
Miké, Valerie; Alfred N. Krauss; and Gail S. Ross. (1998). "Responsibility for Clinical Innovation: A Case Study in Neonatal Medicine." Evaluation & the Health Professions 21(1): 3–26.
Pearson, Karl. (1991). The Grammar of Science. Bristol, UK: Thoemmes Press. 3rd edition, London: Adam and Charles Black, 1911. A classic work on the philosophy of science.
Porter, Theodore M. (1986). The Rise of Statistical Thinking, 1820–1900. Princeton, NJ: Princeton University Press. A general history, considering scientific and economic currents that gave rise to the field.
Savage, Leonard J. (1954). The Foundations of Statistics. New York: Wiley. 2nd edition, New York: Dover, 1972. A classic work by the advocate of personal probability.
Stigler, Stephen M. (1986). The History of Statistics: The Measurement of Uncertainty before 1900. Cambridge, MA: Harvard University Press, Belknap Press. A thoroughly researched history, with detailed discussion of the origin of statistical methods.
Stigler, Stephen M. (1999). Statistics on the Table: The History of Statistical Concepts and Methods. Cambridge, MA: Harvard University Press. A collection of essays, a sequel to the author's 1986 work.
Tanur, Judith M.; Frederick Mosteller; William H. Kruskal; et al., eds. (1972). Statistics: A Guide to the Unknown. San Francisco: Holden-Day. A collection of forty-four essays describing applications of statistics in everyday life, written for the general reader.
Taper, Mark L., and Subhash R. Lele, eds. (2004). The Nature of Scientific Evidence: Statistical, Philosophical, and Empirical Considerations. Chicago: University of Chicago Press. A collection of essays, with ecology used to illustrate problems in assessing scientific evidence.
Tukey, John W. (1977). Exploratory Data Analysis. Reading, MA: Addison-Wesley. Textbook for an approach championed by the author, a prominent statistician, with many simple examples that do not require use of a computer.
Wald, Abraham. (1971). Statistical Decision Functions, 2nd edition. New York: Chelsea Publishing. Classic work of the founder of the field.
Wald, Abraham. (2004). Sequential Analysis. New York: Dover. Another classic work by Wald, who also founded sequential analysis.
"The Cochrane Collaboration." Available from http://www.cochrane.org. Web site of the organization.