The articles under this heading provide an introduction to the field of statistics and to its history. The first article also includes a survey of the statistical articles in the encyclopedia. At the end of the second article there is a list of the biographical articles that are of relevance to statistics.
I. THE FIELD, by William H. Kruskal
II. THE HISTORY OF STATISTICAL METHOD, by M. G. Kendall
A scientist confronted with empirical observations goes from them to some sort of inference, decision, action, or conclusion. The end point of this process may be the confirmation or denial of some complicated theory; it may be a decision about the next experiment to carry out; or it may simply be a narrowing of the presumed range for some constant of nature. (The end point may even be the conclusion that the observations are worthless.) An end point is typically accompanied by a statement, or at least by a feeling, of how sure the scientist is of his new ground.
These inferential leaps are, of course, never made only in the light of the immediate observations. There is always a body of background knowledge and intuition, in part explicit and in part tacit. It is the essence of science that a leap to a false position (whether because of poor observational data, misleading background, or bad leaping form) is sooner or later corrected by future research.
Often the leaps are made without introspection or analysis of the inferential process itself, as a skilled climber might step from one boulder to another on easy ground. On the other hand, the slope may be steep and with few handholds; before moving, one wants to reflect on direction, where one's feet will be, and the consequences of a slip.
Statistics is concerned with the inferential process, in particular with the planning and analysis of experiments or surveys, with the nature of observational errors and sources of variability that obscure underlying patterns, and with the efficient summarizing of sets of data. There is a fuzzy boundary, to be discussed below, between statistics and other parts of the philosophy of science.
Problems of inference from empirical data arise, not only in scientific activity, but also in everyday life and in areas of public policy. For example, the design and analysis of the 1954 Salk vaccine tests in the United States were based on statistical concepts of randomization and control. Both private and public economic decisions sometimes turn on the meaning and accuracy of summary figures from complex measurement programs: the unemployment rate, the rate of economic growth, a consumer price index. Sometimes a lack of statistical background leads to misinterpretations of accident and crime statistics. Misinterpretations arising from insufficient statistical knowledge may also occur in the fields of military and diplomatic intelligence.
There is busy two-way intellectual traffic between statisticians and other scientists. Psychologists and physical anthropologists have instigated and deeply influenced developments in that branch of statistics called multivariate analysis; sociologists sometimes scold statisticians for not paying more attention to the inferential problems arising in surveys of human populations; some economists are at once consumers and producers of statistical methods.
Theoretical and applied statistics. Theoretical statistics is the formal study of the process leading from observations to inference, decision, or whatever be the end point, insofar as the process can be abstracted from special empirical contexts. This study is not the psychological one of how scientists actually make inferences or decisions; rather, it deals with the consequences of particular modes of inference or decision, and seeks normatively to find good modes in the light of explicit criteria.
Theoretical statistics must proceed in terms of a more or less formal language, usually mathematical, and in any specific area must make assumptions—weak or strong—on which to base the formal analysis. Far and away the most important mathematical language in statistics is that of probability, because most statistical thinking is in terms of randomness, populations, masses, the single event embedded in a large class of events. Even approaches like that of personal probability, in which single events are basic, use a highly probabilistic language. [See Probability.]
But theoretical statistics is not, strictly speaking, a branch of mathematics, although mathematical concepts and tools are of central importance in much of statistics. Some important areas of theoretical statistics may be discussed and advanced without recondite mathematics, and much notable work in statistics has been done by men with modest mathematical training. [For discussion of nonstatistical applications of mathematics in the social sciences, see, for example, Mathematics; Models, MATHEMATICAL; and the material on mathematical economics in Econometrics.]
Applied statistics, at least in principle, is the informed application of methods that have been theoretically investigated, the actual leap after the study of leaping theory. In fact, matters are not so simple. First, theoretical study of a statistical procedure often comes after its intuitive proposal and use. Second, there is almost no end to the possible theoretical study of even the simplest procedure. Practice and theory interact and weave together, so that many statisticians are practitioners one day (or hour) and theoreticians the next.
The art of applied statistics requires sensitivity to the ways in which theoretical assumptions may fail to hold, and to the effects that such failure may have, as well as agility in modifying and extending already studied methods. Thus, applied statistics in the study of public opinion is concerned with the design and analysis of opinion surveys. The main branch of theoretical statistics used here is that of sample surveys, although other kinds of theory may also be relevant; for example, the theory of Markov chains may be useful for panel studies, where the same respondents are asked their opinions at successive times. Again, applied statistics in the study of learning includes careful design and analysis of controlled laboratory experiments, whether with worms, rats, or humans. The statistical theories that enter might be those of experimental design, of analysis of variance, or of quantal response. Of course, nonstatistical, substantive knowledge about the empirical field (public opinion, learning, or whatever) is essential for good applied statistics.
Statistics is a young discipline, and the number of carefully studied methods, although steadily growing, is still relatively small. In the applications of statistics, therefore, one usually reaches a point of balance between thinking of a specific problem in formal terms, which are rarely fully adequate (few problems are standard), and using methods that are not as well understood as one might hope. (For a stimulating, detailed discussion of this theme, see Tukey 1962, where the term “data analysis” is used to mean something like applied statistics.)
The word “statistics” is sometimes used to mean, not a general approach like the one I have outlined, but—more narrowly—the body of specific statistical methods, with associated formulas, tables, and traditions, that are currently understood and used. Other uses of the word are common, but they are not likely to cause confusion. In particular, “statistics” often refers to a set of numbers describing some empirical field, as when one speaks of the mortality statistics of France in 1966. Again, “a statistic” often means some numerical quantity computed from basic observations.
Variability and error; patterns. If life were stable, simple, and routinely repetitious, there would be little need for statistical thinking. But there would probably be no human beings to do statistical thinking, because sufficient stability and simplicity would not allow the genetic randomness that is a central mechanism of evolution. Life is not, in fact, stable or simple, but there are stable and simple aspects to it. From one point of view, the goal of science is the discovery and elucidation of these aspects, and statistics deals with some general methods of finding patterns that are hidden in a cloud of irrelevancies, of natural variability, and of error-prone observations or measurements.
Most statistical thinking is in terms of variability and errors in observed data, with the aim of reaching conclusions about obscured underlying patterns. What is meant by natural variability and by errors of measurement? First, distinct experimental and observational units generally have different characteristics and behave in different ways: people vary in their aptitudes and skills; some mice learn more quickly than others. Second, when a quantity or quality is measured, there is usually an error of measurement, and this introduces a second kind of dispersion with which statistics deals: not only will students taught by a new teaching method react in different ways, but also the test that determines how much they learn cannot be a perfect measuring instrument; medical blood-cell counts made independently by two observers from the same slide will not generally be the same.
In any particular experiment or survey, some sources of variability may usefully be treated as constants; for example, the students in the teaching experiment might all be chosen from one geographical area. Other sources of variability might be regarded as random, for example, fluctuations of test scores among students in an apparently homogeneous group. More complex intermediate forms of variability are often present. The students might be subdivided into classes taught by different teachers. Insofar as common membership in a class with the same teacher has an effect, a simple but important pattern of dependence is present.
The variability concept is mirrored in the basic notion of a population from which one samples. The population may correspond to an actual population of men, mice, or machines; or it may be conceptual, as is a population of measurement errors. A population of numerical values defines a distribution, roughly speaking, and the notion of a random variable, fluctuating in its value according to this distribution, is basic. For example, if a student is chosen at random from a school and given a reading-comprehension test, the score on the test, considered in advance of student choice and test administration, is a random variable. Its distribution is an idealization of the totality of such scores if student choice and testing could be carried out a very large number of times without any changes because of the passage of time or because of interactions among students. [For a more precise formulation, see Probability.]
Although much statistical methodology may be regarded as an attempt to understand regularity through a cloud of obscuring variability, there are many situations in which the variability itself is the object of major interest. Some of these will be discussed below.
Planning. An important topic in statistics is that of sensible planning, or design, of empirical studies. In the above teaching example, some of the more formal aspects of design are the following: How many classes to each teaching method? How many students per class to be tested? Should variables other than test scores be used as well, for example, intelligence scores or personality ratings?
The spectrum of design considerations ranges from these to such subject-matter questions as the following: How should the teachers be trained in a new teaching method? Should teachers be chosen so that there are some who are enthusiastic and some who are skeptical of the new method? What test should be used to measure results?
No general theory of design exists to cover all, or even most, such questions. But there do exist many pieces of theory, and—more important—a valuable statistical point of view toward the planning of experiments.
History. The history of the development of statistics is described in the next article [see Statistics, article on THE HISTORY OF STATISTICAL METHOD]. It stresses the growth of method and theory; the history of statistics in the senses of vital statistics, government statistics, censuses, economic statistics, and the like, is described in relevant separate articles [see Census; Cohort analysis; Economic data; Government statistics; Life tables; MORTALITY; Population; Sociology, article on THE EARLY HISTORY OF SOCIAL RESEARCH; Vital statistics]. Two treatments of the history of statistics with special reference to the social sciences are by Lundberg (1940) and Lazarsfeld (1961).
It is important to distinguish between the history of the word “statistics” and the history of statistics in the sense of this article. The word “statistics” is related to the word “state,” and originally the activity called statistics was a systematic kind of comparative political science. This activity gradually centered on numerical tables of economic, demographic, and political facts, and thus “statistics” came to mean the assembly and analysis of numerical tables. It is easy to see how the more philosophical meaning of the word, used in this article, gradually arose. Of course, the abstract study of inference from observations has a long history under various names—such as the theory of errors and probability calculus—and only comparatively recently has the word “statistics” come to have its present meaning. Even now, grotesque misunderstandings abound—for example, thinking of statistics as the routine compilation of uninteresting sets of numbers, or thinking of statistics as mainly a collection of mathematical expressions.
Functions. My description of statistics is, of course, a personal one, but one that many statisticians would generally agree with. Almost any characterization of statistics would include the following general functions:
(1) to help in summarizing and extracting relevant information from data, that is, from observed measurements, whether numerical, classificatory, ordinal, or whatever;
(2) to help in finding and evaluating patterns shown by the data, but obscured by inherent random variability;
(3) to help in the efficient design of experiments and surveys;
(4) to help communication between scientists (if a standard procedure is cited, many readers will understand without need of detail).
There are some other roles that activities called “statistical” may, unfortunately, play. Two such misguided roles are
(1) to sanctify or provide seals of approval (one hears, for example, of thesis advisers or journal editors who insist on certain formal statistical procedures, whether or not they are appropriate);
(2) to impress, obfuscate, or mystify (for example, some social science research papers contain masses of undigested formulas that serve no purpose except that of indicating what a bright fellow the author is).
Some consulting statisticians use more or less explicit declarations of responsibility, or codes, in their relationships with “clients,” to protect themselves from being placed in the role of sanctifier. It is a good general rule that the empirical scientist use only statistical methods whose rationale is clear to him, even though he may not wish or be able to follow all details of mathematical derivation.
A general discussion, with an extensive bibliography, of the relationship between statistician and client is given by Deming (1965). In most applied statistics, of course, the statistician and the client are the same person.
An example. To illustrate these introductory comments, consider the following hypothetical experiment to study the effects of propaganda. Suppose that during a national political campaign in the United States, 100 college students are exposed to a motion picture film extolling the Democratic candidate, and 100 other students (the so-called control group) are not exposed to the film. Then all the students are asked to name their preferred candidate. Suppose that 95 of the first group prefer the Democratic candidate, while only 80 of the second group have that preference. What kinds of conclusions might one want about the effectiveness of the propaganda?
(There are, of course, serious questions about how the students are chosen, about the details of film and questionnaire administration, about possible interaction between students, about the artificiality of the experimental arrangement, and so on. For the moment, these questions are not discussed, although some will be touched on below.)
If the numbers preferring the Democratic candidate had been 95 and 5, a conclusion that a real effect was present would probably be reached without much concern about inferential methodology (although methodological questions would enter any attempt to estimate the magnitude of the effect). If, in contrast, the numbers had both been 95, the conclusion “no effect observed” would be immediate, although one might wonder about the possibility of observing the tie by chance even if an underlying effect were present. But by and large it is the middle ground that is of greatest statistical interest: for example, do 95 and 80 differ enough in the above context to suggest a real effect?
The simplest probability model for discussing the experiment is that of analogy with two weighted coins, each tossed 100 times. A toss of the coin corresponding to the propaganda is analogous to selecting a student at random, showing him the motion picture, and then asking him which candidate he prefers. A toss of the other coin corresponds to observing the preference of a student in the control group. “Heads” for a coin is analogous, say, to preference for the Democratic candidate. The hypothetical coins are weighted so that their probabilities of showing heads are unknown (and in general not one-half), and interest lies in the difference between these two unknown heads probabilities.
Suppose that the students are regarded as chosen randomly from some large population of students, and that for a random propagandized student there is a probability pA of Democratic preference, whereas a random nonpropagandized student has probability pB of Democratic preference. Suppose further that the individual observed expressions of political preference are statistically independent; roughly speaking, this means that, even if pA and pB were known, and it were also known which groups the students are in, prediction of one student's response from another's would be no better than prediction without knowing the other's response. (Lack of independence might arise in various ways, for example, if the students were able to discuss politics among themselves during the interval between the motion picture and the questionnaire.) Under the above conditions, the probabilities of various outcomes of the experiment, for any hypothetical values of pA and pB, may be computed in standard ways.
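As a rough illustration of how such outcome probabilities might be computed, the following Python sketch treats the two group counts as independent binomial variables, as in the two-coin analogy. The function names and the parameter values 0.9 and 0.8 are assumptions chosen only for illustration; they do not come from the experiment.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k 'heads' in n independent tosses with heads probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_outcome(k_a, k_b, p_a, p_b, n=100):
    """Joint probability of k_a and k_b Democratic preferences in two
    independent groups of n students each (the two-coin model)."""
    return binom_pmf(k_a, n, p_a) * binom_pmf(k_b, n, p_b)

# Hypothetical parameter values, chosen only for illustration.
print(prob_outcome(95, 80, p_a=0.9, p_b=0.8))
```

Any single outcome has a small probability; what matters for inference is how such probabilities change as the hypothetical values of pA and pB are varied.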
In fact, the underlying quantities of interest, the so-called parameters, pA and pB, are not known; if they were, there would be little or no reason to do the experiment. Nonetheless, it is of fundamental importance to think about possible values of the parameters and to decide what aspects are of primary importance. For example, is pA − pB basic? or perhaps pA/pB? or, again, perhaps (1 − pB)/(1 − pA), the ratio of probabilities of an expressed Republican preference (assuming that preference is between Democratic and Republican candidates only)? The choice makes a difference: if pA = .99 and pB = .95, use of a statistical procedure sensitive to pA − pB (= .04 in this example) might suggest that there is little difference between the parameters, whereas a procedure sensitive to (1 − pB)/(1 − pA) (in the example, .05/.01 = 5) might show a very large effect. These considerations are, unhappily, often neglected, and such neglect may result in a misdirected or distorted analysis. In recent discussions of possible relationships between cigarette smoking and lung cancer, controversy arose over whether ratios or differences of mortality rates were of central importance. The choice may lead to quite different conclusions.
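The arithmetic of this example can be checked with a short Python sketch. The function name effect_measures and the dictionary keys are assumptions made for illustration; the values .99 and .95 are those used in the text.

```python
def effect_measures(p_a, p_b):
    """Three ways of comparing two probabilities of Democratic preference."""
    return {
        "difference": p_a - p_b,                    # pA - pB
        "ratio": p_a / p_b,                         # pA / pB
        "complement_ratio": (1 - p_b) / (1 - p_a),  # ratio of Republican-preference probabilities
    }

measures = effect_measures(0.99, 0.95)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

The same pair of parameters looks nearly identical on the difference scale and very different on the scale of the complementary ratio, which is the point of the passage.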
Even apparently minor changes in graphical presentation may be highly important in the course of research. B. F. Skinner wrote of the importance to his own work of shifting from a graphical record that simply shows the times at which events occur (motion of a rat in a runway) to the logically equivalent cumulative record that shows the number of events up to each point of time. In the latter form, the rate at which events take place often becomes visually clear (see Skinner 1956, p. 225). This general area is called descriptive statistics, perhaps with the prefix “neo.” [See Statistics, DESCRIPTIVE; Graphic presentation; Tabular presentation.]
As suggested above, the assumption of statistical independence might well be wrong for various reasons. One is that the 100 students in each group might be made up of five classroom groups that hold political discussions. Other errors in the assumptions are quite possible. For example, the sampling of students might not be at random from the same population: there might be self-selection, perhaps with the more enterprising students attending the motion picture. Another kind of deviation from the original simple assumptions (in this case planned) might come from balancing such factors as sex and age by stratifying according to these factors and then selecting at random within strata.
When assumptions are in doubt, one has a choice of easing them (sometimes bringing about a more complex, but a more refined, analysis) or of studying the effects of errors in the assumptions on the analysis based on them. When these effects are small, the errors may be neglected. This topic, sometimes called robustness against erroneous assumptions of independence, distributional form, and so on, is difficult and important. [See Errors, article on EFFECTS OF ERRORS IN STATISTICAL ASSUMPTIONS.]
Another general kind of question relates to the design of the experiment. Here, for example, it may be asked in advance of the experiment whether groups of 100 students are large enough (or perhaps unnecessarily large); whether there is merit in equal group sizes; whether more elaborate structures (perhaps allowing explicitly for sex and age) are desirable; and so on. Questions of this kind may call for formal statistical reasoning, but answers must depend in large part on substantive knowledge. [See Experimental design.]
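One formal way of asking whether groups of 100 are large enough is to compute the approximate chance that a test would detect a difference of a given size. The Python sketch below uses a common normal approximation for a two-sided comparison of two proportions with equal group sizes; the function name, the 5 per cent level, and the parameter values 0.9 and 0.8 are assumptions for illustration, and the far tail of the test is ignored.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p_a, p_b, n, alpha=0.05):
    """Normal-approximation power of a two-sided test of pA = pB
    with n subjects per group (the negligible far tail is ignored)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    p_bar = (p_a + p_b) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n)                 # spread if pA = pB
    se_alt = sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / n)      # spread under the assumed values
    return nd.cdf((abs(p_a - p_b) - z_crit * se_null) / se_alt)

# Hypothetical true values, chosen only for illustration.
for n in (50, 100, 200, 400):
    print(n, round(approx_power(0.9, 0.8, n), 3))
```

Such a calculation answers only the formal part of the question; whether a difference of the assumed size is worth detecting is a matter of substantive knowledge.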
It is important to recognize that using better measurement methods or recasting the framework of the experiment may be far more important aspects of design than just increasing sample size. As B. F. Skinner said,
. . . we may reduce the troublesome variability by changing the condition of the experiment. By discovering, elaborating, and fully exploiting every relevant variable, we may eliminate in advance of measurement the individual differences which obscure the difference under analysis. (1956, p. 229)
In the propaganda experiment at hand, several such approaches come to mind. Restricting oneself to subjects of a given sex, age, kind of background, and so on, might bring out the effects of propaganda more clearly, perhaps at the cost of reduced generality for the results. Rather than by asking directly for political preference, the effects might be better measured by observing physiological reactions to the names or pictures of the candidates, or by asking questions about major political issues. It would probably be useful to try to follow the general principle of having each subject serve as his own control: to observe preference both before and after the propaganda and compare the numbers of switches in the two possible directions. (Even then, it would be desirable to keep the control group—possibly showing it a presumably neutral film—in order to find, and try to correct for, artificial effects of the experimental situation.)
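The comparison of switches in the two directions can be sketched as an exact sign test on the students who changed preference, which is the idea behind McNemar's test. In the Python sketch below, the function name and the counts 12 and 3 are hypothetical; the null assumption is that a switch is equally likely in either direction.

```python
from math import comb

def switch_test(to_dem, to_rep):
    """Two-sided exact sign test: are switches toward and away from the
    Democratic candidate equally likely? Non-switchers are ignored."""
    n = to_dem + to_rep
    k = max(to_dem, to_rep)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * tail)

# Hypothetical counts: 12 students switch toward, 3 away from, the Democratic candidate.
print(switch_test(12, 3))
```

Only the students who switch carry information in this comparison, which is one reason the control group mentioned above remains useful.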
Such questions are often investigated in side studies, ancillary or prior to the central one, and these pilot or instrumental studies are very important.
For the specific simple design with two groups, and making the simple assumptions, consider (conceptually in advance of the experiment) the two observed proportions of students expressing preference for the Democratic candidate, PA and PB, corresponding respectively to the propagandized and the control groups. These two random variables, together with the known group sizes, contain all relevant information from the experiment itself, in the sense that only the proportions, not the particular students who express one preference or another, are relevant. The argument here is one of sufficiency [see Sufficiency, where the argument and its limitations are discussed]. In practice the analysis might well be refined by looking at sex of student and other characteristics, but for the moment only the simple structure is considered.
In the notational convention to be followed here, random variables (here PA and PB) are denoted by capital letters, and the corresponding parameters (here pA and pB) by parallel lower-case letters.
Estimation. The random variables 100PA and 100PB have binomial probability distributions depending on pA, pB, and sample sizes, in this case 100 for each sample [see Distributions, Statistical, article on SPECIAL DISCRETE DISTRIBUTIONS]. The fundamental premise of most statistical methods is that pA and pB should be assessed on the basis of PA and PB in the light of their possible probability distributions. One of the simplest modes of assessment is that of point estimation, in which the result of the analysis for the example consists of two numbers (depending on the observations) that are regarded as reasonable estimates of pA and pB [see Estimation, article on POINT ESTIMATION]. In the case at hand, the usual (not the only) estimators are just PA and PB themselves, but even slight changes in viewpoint can make matters less clear. For example, suppose that a point estimator were wanted for pA/pB, the ratio of the two underlying probability parameters. It is by no means clear that PA/PB would be a good point estimator for this ratio.
Point estimators by themselves are usually inadequate in scientific practice, for some indication of precision is nearly always wanted. (There are, however, problems in which point estimators are, in effect, of primary interest: for example, in a handbook table of natural constants, or in some aspects of buying and selling.) An old tradition is to follow a point estimate by a “±” (plus-or-minus sign) and a number derived from background experience or from the data. The intent is thus to give an idea of how precise the point estimate is, of the spread or dispersion in its distribution. For the case at hand, one convention would lead to stating, as a modified estimator for pA,
PA ± √[PA(1 − PA)/100],
that is, the point estimator plus or minus an estimator of its standard deviation, a useful measure of dispersion. (The divisor, 100, is the sample size.) Such a device has the danger that there may be misunderstanding about the convention for the number following “±”; in addition, interpretation of the measure of dispersion may not be direct unless the distribution of the point estimator is fairly simple; the usual presumption is that the distribution is approximately of a form called normal [see Distributions, Statistical, article on SPECIAL CONTINUOUS DISTRIBUTIONS].
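For the observed counts of the example, the plus-or-minus convention just described (the observed proportion together with the square root of proportion times one minus proportion, divided by the sample size) might be computed as in this Python sketch. The function name estimate_with_se is an assumption made for illustration.

```python
from math import sqrt

def estimate_with_se(successes, n):
    """Point estimate of a proportion and the usual estimate of its standard deviation."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se

p_hat_a, se_a = estimate_with_se(95, 100)  # propagandized group
p_hat_b, se_b = estimate_with_se(80, 100)  # control group
print(f"group A: {p_hat_a:.2f} ± {se_a:.3f}")
print(f"group B: {p_hat_b:.2f} ± {se_b:.3f}")
```

For the control group the estimated standard deviation is 0.04; for the propagandized group it is roughly half that, because the proportion is closer to 1.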
To circumvent these problems, a confidence interval is often used, rather than a point estimator [see Estimation, article on CONFIDENCE INTERVALS AND REGIONS]. The interval is random (before the experiment), and it is so constructed that it covers the unknown true value of the parameter to be estimated with a preassigned probability, usually near 1. The confidence interval idea is very useful, although its subtlety has often led to misunderstandings in which the interpretation is wrongly given in terms of a probability distribution for the parameter.
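The sense in which the interval, not the parameter, is random can be illustrated by computing how often an interval procedure covers a fixed parameter value. The Python sketch below uses the Wilson score interval, one of several possible constructions and not necessarily the one the article has in mind; the function names and the hypothetical true value 0.9 are assumptions for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def wilson_interval(successes, n, level=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

def coverage(p, n, level=0.95):
    """Exact probability, over all possible counts, that the interval covers p."""
    total = 0.0
    for k in range(n + 1):
        lower, upper = wilson_interval(k, n, level)
        if lower <= p <= upper:
            total += comb(n, k) * p**k * (1 - p)**(n - k)
    return total

print(wilson_interval(95, 100))  # interval computed from the observed count
print(coverage(0.9, 100))        # coverage at a hypothetical true value
```

The coverage figure is a property of the procedure before the data are seen; it is not a probability statement about the parameter after a particular interval has been computed.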
There are, however, viewpoints in which this last sort of interpretation is valid, that is, in which the parameters of interest are themselves taken as random. The two most important of these viewpoints are Bayesian inference and fiducial inference [see Bayesian inference; Fiducial inference; Probability, article on INTERPRETATIONS]. Many variants exist, and controversy continues as the philosophical and practical aspects of these approaches are debated [see Likelihood for a discussion of related issues].
Hypothesis testing. In the more usual viewpoint another general approach is that of hypothesis (or significance) testing [see Hypothesis testing; Significance, tests of]. This kind of procedure might be used if it is important to ascertain whether pA and pB are the same or not. Hypothesis testing has two aspects: one is that of a two-decision procedure leading to one of two actions with known controlled chances of error. This first approach generalizes to that of decision theory and has generated a great deal of literature in theoretical statistics [see Decision theory]. In this theory of decision functions, costs of wrong decisions, as well as costs of observation, are explicitly considered. Decision theory is related closely to game theory, and less closely to empirical studies of decision making [see Game theory; Decision making].
The second aspect of hypothesis testing, and the commoner, is more descriptive. From its viewpoint a hypothesis test tells how surprising a set of observations is under some null hypothesis at test. In the example, one would compute how probable it is under the null hypothesis pA = pB that the actual results should differ by as much as or more than the observed 95 per cent and 80 per cent. (Only recently has it been stressed that one would also do well to examine such probabilities under a variety of hypotheses other than a traditional null one.) Sometimes, as in the propaganda example, it is rather clear at the start that some effect must exist. In other cases, for example, in the study of parapsychology, there may be serious question of any effect whatever.
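One conventional way of computing such a probability for the observed 95 and 80 is a two-sided test based on the normal approximation with a pooled estimate of the common proportion. The Python sketch below is only one of several reasonable procedures (an exact test would be another), and the function name is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(k_a, n_a, k_b, n_b):
    """Two-sided normal-approximation test of the null hypothesis pA = pB."""
    p_hat_a = k_a / n_a
    p_hat_b = k_b / n_b
    pooled = (k_a + k_b) / (n_a + n_b)  # estimate of the common proportion under the null
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_hat_a - p_hat_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_proportion_z_test(95, 100, 80, 100)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

Under these assumptions the standardized difference is a little above 3, and a difference at least this large would arise by chance roughly once in a thousand trials if pA and pB were equal.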
There are other modes of statistical analysis, for example, classification, selection, and screening [see Multivariate Analysis, article on CLASSIFICATION AND DISCRIMINATION; Screening and selection]. In the future there is likely to be investigation of a much wider variety of modes of analysis than now exists. Such investigation will mitigate the difficulty that standard modes of analysis, like hypothesis testing, often do not exactly fit the inferential needs of specific real problems. The standard modes must usually be regarded as approximate, and used with caution.
One pervasive difficulty of this kind surrounds what might be called exploration of data, or data dredging. It arises when a (usually sizable) body of data from a survey or experiment is at hand but either the analyst has no specific hypotheses about kinds of orderliness in the data or he has a great many. He will naturally wish to explore the body of data in a variety of ways with the hope of finding orderliness: he will try various graphical presentations, functional transformations, perhaps factor analysis, regression analysis, and other devices; in the course of this, he will doubtless carry out a number of estimations, hypothesis tests, confidence interval computations, and so on. A basic difficulty is that any finite body of data, even if wholly generated at random, will show orderliness of some kind if studied long and hard enough. Parallel to this, one must remember that most theoretical work on hypothesis tests, confidence intervals, and other inferential procedures looks at their behavior in isolation, and supposes that the procedures are selected in advance of data inspection. For example, if a hypothesis test is to be made of the null hypothesis that mean scores of men and women on an intelligence test are equal, and if a one-sided alternative is chosen after the fact in the same direction as that shown by the data, it is easy to see that the test will falsely show statistical significance, when the null hypothesis is true, twice as often as the analyst might expect.
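The doubling of the false-significance rate can be seen in a small simulation. The Python sketch below draws a standardized difference from a standard normal distribution, as it would behave approximately under a true null hypothesis, and compares a one-sided test whose direction was fixed in advance with one whose direction is chosen to agree with the data. The function name, the 5 per cent level, the number of trials, and the seed are assumptions for illustration.

```python
import random
from statistics import NormalDist

def false_positive_rates(n_trials=100_000, alpha=0.05, seed=0):
    """Simulate one-sided tests under a true null hypothesis."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    prechosen = 0  # direction fixed before seeing the data
    posthoc = 0    # direction chosen to match the data
    for _ in range(n_trials):
        z = rng.gauss(0.0, 1.0)  # standardized difference when the null is true
        if z > cutoff:
            prechosen += 1
        if abs(z) > cutoff:
            posthoc += 1
    return prechosen / n_trials, posthoc / n_trials

prechosen_rate, posthoc_rate = false_positive_rates()
print(prechosen_rate, posthoc_rate)
```

The first rate should come out near the nominal 5 per cent and the second near 10 per cent, since choosing the direction after the fact amounts to a two-sided test carried out at a one-sided critical value.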
On the other hand, it would be ridiculously rigid to refuse to use inferential tools in the exploration of data. Two general mitigating approaches are (1) the use of techniques (for example, multiple comparisons) that include explicit elements of exploration in their formulation [see Linear Hypotheses, article on Multiple comparisons], and (2) the splitting of the data into two parts at random, using one part for exploration with no holds barred and then carrying out formal tests or other inferential procedures on the second part.
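The second approach can be sketched in a few lines of Python. The function name and the even split are assumptions made for illustration; nothing in the discussion above fixes the proportion assigned to each part.

```python
import random


def split_for_exploration(records, seed=None):
    """Randomly divide records into an exploration part and a confirmation part."""
    rng = random.Random(seed)
    shuffled = list(records)   # copy, so the caller's data are left untouched
    rng.shuffle(shuffled)      # random order determines membership
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


if __name__ == "__main__":
    explore, confirm = split_for_exploration(range(10), seed=1)
    print("explore freely on:", sorted(explore))
    print("reserve for formal tests:", sorted(confirm))
```

The essential point is that the second part is not inspected until the hypotheses suggested by the first part have been written down.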
This area deserves much more research. Selvin and Stuart have given a statement of present opinions, and of practical advice [see Selvin & Stuart 1966; see also Survey analysis; Scaling and Statistical Analysis, Special Problems Of, article on Transformations Of Data, are also relevant].
Breadth of inference. Whatever the mode of analysis, it is important to remember that the inference to which a statistical method directly relates is limited to the population actually experimented upon or surveyed. In the propaganda example, if the students are sampled from a single university, then the immediate inference is to that university only. Wider inferences—and these are usually wanted—presumably depend on subject-matter background and on intuition. Of course, the breadth of direct inference may be widened, for example, by repeating the study at different times, in different universities, in different areas, and so on. But, except in unusual cases, a limit is reached, if only the temporal one that experiments cannot be done now on future students.
Thus, in most cases, a scientific inference has two stages: the direct inference from the sample to the sampled population, and the indirect inference from the sampled population to a much wider, and usually rather vague, realm. That is why it is so important to try to check findings in a variety of contexts, for example, to test psychological generalizations obtained from experiments within one culture in some very different culture.
Formalization and precise theoretical treatment of the second stage represent a gap in present-day statistics (except perhaps for adherents of Bayesian methodology), although many say that the second step is intrinsically outside statistics. The general question of indirect inference is often mentioned and often forgotten; an early explicit treatment is by von Bortkiewicz (1909); a modern discussion in the context of research in sexual behavior is given by Cochran, Mosteller, and Tukey (1954, pp. 18-19, 21-22, 30-31).
An extreme case of the breadth-of-inference problem is represented by the case study, for example, an intensive study of the history of a single psychologically disturbed person. Indeed, some authors try to set up a sharp distinction between the method of case studies and what they call statistical methods. I do not feel that the distinction is very sharp. For one thing, statistical questions of measurement reliability arise even in the study of a single person. Further, some case studies, for example, in anthropology, are of a tribe or some other group of individuals, so that traditional sampling questions might well arise in drawing inferences about the single (collective) case.
Proponents of the case study approach emphasize its flexibility, its importance in attaining subjective insight, and its utility as a means of conjecturing interesting theoretical structures. If there is good reason to believe in small relevant intercase variability, then, of course, a single case does tell much about a larger population. The investigator, however, has responsibility for defending an assumption about small intercase variability. [Further discussion will be found in Interviewing; Observation, article on SOCIAL OBSERVATION AND SOCIAL CASE STUDIES.]
Linear hypotheses. One way of classifying statistical topics is in terms of the kind of assumptions made, that is—looking toward applications—in terms of the structure of anticipated experiments or surveys for which the statistical methods will be used. The propaganda example, in which the central quantities are two proportions with integral numerators and denominators, falls under the general topic of the analysis of counted or qualitative data; this topic includes the treatment of so-called chi-square tests. Such an analysis would also be applicable if there were more than two groups, and it can be extended in other directions. [See Counted data.]
If, in the propaganda experiment, instead of proportions expressing one preference or the other, numerical scores on a multiquestion test were used to indicate quantitatively the leaning toward a candidate or political party, then the situation might come under the general rubric of linear hypotheses. To illustrate the ideas, suppose that there were more than two groups, say, four, of which the first was exposed to no propaganda, the second saw a motion picture, the third was given material to read, and the fourth heard a speaker, and that the scores of students under the four conditions are to be compared. Analysis-of-variance methods (many of which may be regarded as special cases of regression methods) are of central importance for such a study [see Linear Hypotheses, articles on Analysis Of Variance and Regression]. Multiple comparison methods are often used here, although, strictly speaking, they are not restricted to the analysis-of-variance context [see Linear Hypotheses, article on Multiple comparisons].
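As a minimal sketch of the simplest analysis-of-variance computation for such a four-group study, the following Python function forms the usual F ratio of the between-group mean square to the within-group mean square. The scores in the example are invented for illustration, and the function name is an assumption.

```python
def one_way_anova_f(groups):
    """Return the F ratio for a one-way layout given a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        # Variation of the group mean about the grand mean.
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        # Variation of individual scores about their own group mean.
        ss_within += sum((x - group_mean) ** 2 for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


if __name__ == "__main__":
    # Hypothetical scores: no propaganda, motion picture, reading, speaker.
    scores = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
    print(one_way_anova_f(scores))
```

The ratio would then be referred to an F distribution with 3 and 8 degrees of freedom in this example, under the usual assumptions of independent, normally distributed errors with equal variance.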
If the four groups differed primarily in some quantitative way, for example, in the number of sessions spent watching propaganda motion pictures, then regression methods in a narrower sense might come into play. One might, for example, suppose that average test score is roughly a linear function of number of motion picture sessions, and then center statistical attention on the constants (slope and intercept) of the linear function.
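The slope and intercept of such a linear function are commonly estimated by least squares. The following Python sketch shows that computation; the session counts and average scores are invented, and least squares is an assumed choice of fitting method, not one specified above.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y as a linear function of x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    sessions = [0, 1, 2, 3]            # hypothetical numbers of film sessions
    average_score = [50, 53, 56, 59]   # hypothetical average test scores
    slope, intercept = fit_line(sessions, average_score)
    print(f"slope = {slope}, intercept = {intercept}")
```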
Multivariate statistics. “Regression” is a word with at least two meanings. A meaning somewhat different from, and historically earlier than, that described just above appears in statistical theory for multivariate analysis, that is, for situations in which more than one kind of observation is made on each individual or unit that is measured [see Multivariate analysis]. For example, in an educational experiment on teaching methods, one might look at scores not only on a spelling examination, but on a grammar examination and on a reading-comprehension examination as well. Or in a physical anthropology study, one might measure several dimensions of each individual.
The simplest part of multivariate analysis is concerned with association between just two random variables and, in particular, with the important concept of correlation [see Statistics, Descriptive, article on Association; Multivariate Analysis, articles on Correlation]. These ideas extend to more than two random variables, and then new possibilities enter. An important one is that of partial association: how are spelling and grammar scores associated if reading comprehension is held fixed? The partial association notion is important in survey analysis, where a controlled experiment is often impossible [see Survey analysis; Experimental Design, article on Quasi-Experimental design].
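One common measure of partial association is the first-order partial correlation coefficient, computed from the three pairwise correlations. The Python sketch below uses that formula; the scores are invented so that spelling and grammar are correlated only through reading comprehension, and the function names are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Ordinary product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y with the linear effect of z removed from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # Hypothetical scores for four students.
    spelling = [12, 11, 12, 15]
    grammar = [12, 9, 16, 13]
    reading = [1, 2, 3, 4]
    print("raw correlation:", pearson(spelling, grammar))
    print("partial, reading held fixed:", partial_correlation(spelling, grammar, reading))
```

In this constructed example the raw correlation is positive while the partial correlation is essentially zero, which is the kind of contrast that makes the partial notion useful in survey analysis.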
Multivariate analysis also considers statistical methods bearing on the joint structure of the means that correspond to the several kinds of observations, and on the whole correlation structure.
Factor analysis falls in the multivariate area, but it has a special history and a special relationship with psychology [see Factor analysis]. Factor-analytic methods try to replace a number of measurements by a few basic ones, together with residuals having a simple probability structure. For example, one might hope that spelling, grammar, and reading-comprehension abilities are all proportional to some quantity not directly observable, perhaps dubbed “linguistic skill,” that varies from person to person, plus residuals or deviations that are statistically independent.
The standard factor analysis model is one of a class of models generated by a process called mixing of probability distributions [see Distributions, Statistical, article on Mixtures Of distributions]. An interesting model of this general sort, but for discrete, rather than continuous, observations, is that of latent structure [see Latent structure].
Another important multivariate topic is classification and discrimination, which is the study of how to assign individuals to two or more groups on the basis of several measurements per individual [see Multivariate Analysis, article on Classification And discrimination]. Less well understood, but related, is the problem of clustering, or numerical taxonomy: what are useful ways for forming groups of individuals on the basis of several measurements on each? [See Clustering.]
Time series. Related to multivariate analysis, because of its stress on modes of statistical dependence, is the large field of time series analysis, sometimes given a title that includes the catchy phrase “stochastic processes.” An observed time series may be regarded as a realization of an underlying stochastic process [see Time series]. The simplest sort of time series problem might arise when for each child in an educational experiment there is available a set of scores on spelling tests given each month during the school year. More difficult problems arise when there is no hope of observing more than a single series, for example, when the observations are on the monthly or yearly prices of wheat. In such cases, so common in economics, stringent structural assumptions are required, and even then analysis is not easy.
This encyclopedia's treatment of time series begins with a general overview, oriented primarily toward economic series. The overview is followed by a discussion of advanced methodology, mainly that of spectral analysis, which treats a time series as something like a radio signal that can be decomposed into subsignals at different frequencies, each with its own amount of energy. Next comes a treatment of cycles, with special discussion of how easy it is to be trapped into concluding that cycles exist when in fact only random variation is present. Finally, there is a discussion of the important technical problem raised by seasonal variation, and of adjustment to remove or mitigate its effect. The articles on business cycles should also be consulted [see Business cycles].
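One simple way in which wavelike patterns can arise from purely random input is through smoothing: a moving average of independent random numbers is serially correlated and so tends to drift up and down in smooth swings. The Python sketch below illustrates this under assumed settings (unit-normal noise, a window of five, and a fixed seed); it is an illustration of the trap, not a method taken from the articles mentioned.

```python
import random


def lag1_autocorrelation(series):
    """Sample correlation between each value and the next one."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in series)
    return num / den


def moving_average(series, window):
    """Unweighted moving average over consecutive blocks of length window."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def smoothing_demo(n=5000, window=5, seed=1):
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]  # no structure by construction
    smoothed = moving_average(noise, window)
    return lag1_autocorrelation(noise), lag1_autocorrelation(smoothed)


if __name__ == "__main__":
    raw, smooth = smoothing_demo()
    print(f"raw noise: {raw:.2f}, after smoothing: {smooth:.2f}")
```

The raw series should show almost no serial correlation, while the smoothed series should show a good deal, although nothing but random variation went into it.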
The topic of Markov chains might have been included under the time series category, but it is separate [see Markov chains]. The concept of a Markov chain is one of the simplest and most useful ways of relaxing the common assumption of independence. Methods based on the Markov chain idea have found application in the study of panels (for public opinion, budget analysis, etc.), of labor mobility, of changes in social class between generations, and so on [see, for example, Panel studies; Social mobility].
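As a minimal sketch of the idea, the Python code below carries a distribution over two social classes forward through successive generations, assuming that a child's class depends only on the parent's class. The transition probabilities are invented for illustration and are not drawn from any study.

```python
def step(distribution, transition):
    """Advance a probability distribution by one transition of the chain."""
    n = len(transition)
    return [sum(distribution[i] * transition[i][j] for i in range(n))
            for j in range(n)]


def after_generations(start, transition, generations):
    """Distribution over states after the given number of transitions."""
    dist = list(start)
    for _ in range(generations):
        dist = step(dist, transition)
    return dist


if __name__ == "__main__":
    # Hypothetical probabilities: row = parent's class, column = child's class.
    transition = [[0.9, 0.1],
                  [0.5, 0.5]]
    start = [1.0, 0.0]  # everyone begins in the first class
    print(after_generations(start, transition, 2))
```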
Sample surveys and related topics. The subject of sample surveys is important, both in theory and practice [see Sample surveys]. It originated in connection with surveys of economic and social characteristics of human populations, when samples were used rather than attempts at full coverage. But the techniques of sample surveys have been of great use in many other areas, for example in the evaluation of physical inventories of industrial equipment. The study of sample surveys is closely related to most of the other major fields of statistics, in particular to the design of experiments, but it is characterized by its emphasis on finite populations and on complex sampling plans.
Most academically oriented statisticians who think about sample surveys stress the importance of probability sampling—that is, of choosing the units to be observed by a plan that explicitly uses random numbers, so that the probabilities of possible samples are known. On the other hand, many actual sample surveys are not based upon probability sampling [for a discussion of the central issues of this somewhat ironical discrepancy, see Sample surveys, article on Nonprobability sampling].
Random numbers are important, not only for sample surveys, but for experimental design generally, and for simulation studies of many kinds [see Random numbers; Simulation].
An important topic in sample surveys (and, for that matter, throughout applied statistics) is that of nonsampling errors [see Errors, article on Nonsampling errors]. Such errors stem, for example, from nonresponse in public opinion surveys, from observer and other biases in measurement, and from errors of computation. Interesting discussions of these problems, and of many others related to sampling, are given by Cochran, Mosteller, and Tukey (1954).
Sociologists have long been interested in survey research, but with historically different emphases from those of statisticians [see Survey analysis; INTERVIEWING]. The sociological stress has been much less on efficient design and sampling variation and much more on complex analyses of highly multivariate data. There is reason to hope that workers in these two streams of research are coming to understand each other's viewpoint.
Nonparametric analysis and related topics. I remarked earlier that an important area of study is robustness, the degree of sensitivity of statistical methods to errors in assumptions. A particular kind of assumption error is that incurred when a special distributional form, for example, normality, is assumed when it does not in fact obtain. To meet this problem, one may seek alternate methods that are insensitive to form of distribution, and the study of such methods is called nonparametric analysis or distribution-free statistics [see Nonparametric statistics]. Such procedures as the sign test and many ranking methods fall into the nonparametric category.
For example, suppose that pairs of students, matched for age, sex, intelligence, and so on, form the experimental material, and that for each pair it is determined entirely at random, as by the throw of a fair coin, which member of the pair is exposed to one teaching method (A) and which to another (B). After exposure to the assigned methods, the students are given an examination; a pair is scored positive if the method A student has the higher score, negative if the method B student has. If the two methods are equally effective, the number of positive scores has a binomial distribution with basic probability 1/2. If, however, method A is superior, the basic probability is greater than 1/2; if method B is superior, less than 1/2. The number of observed positives provides a simple test of the hypothesis of equivalence and a basis for estimating the amount of superiority that one of the teaching methods may have. (The above design is, of course, only sensible if matching is possible for most of the students.)
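The test just described is the sign test, and its significance level follows directly from the binomial distribution with probability 1/2. The Python sketch below computes a two-sided level; the function name, the two-sided convention of doubling the smaller tail, and the assumption that tied pairs have already been set aside are choices made for illustration.

```python
from math import comb


def sign_test_p_value(positives, n_pairs):
    """Two-sided sign-test level for the given number of positive pairs.

    Assumes each pair is positive with probability 1/2 under the null
    hypothesis and that tied pairs have been excluded from n_pairs.
    """
    def tail(low, high):
        # Binomial(n_pairs, 1/2) probability of a count between low and high.
        return sum(comb(n_pairs, k) for k in range(low, high + 1)) / 2 ** n_pairs

    upper = tail(positives, n_pairs)   # as many positives as observed, or more
    lower = tail(0, positives)         # as few positives as observed, or fewer
    return min(1.0, 2 * min(upper, lower))


if __name__ == "__main__":
    # Hypothetical outcome: method A ahead in 9 of 10 matched pairs.
    print(sign_test_p_value(9, 10))
    print("estimated probability that A is ahead:", 9 / 10)
```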
The topic of order statistics is also discussed in one of the articles on nonparametric analysis, although order statistics are at least as important for procedures that do make sharp distributional assumptions [see Nonparametric statistics, article on Order statistics]. There is, of course, no sharp boundary line for distribution-free procedures. First, many procedures based on narrow distributional assumptions turn out in fact to be robust, that is, to maintain some or all of their characteristics even when the assumptions are relaxed. Second, most distribution-free procedures are only partly so; for example, a distribution-free test will typically be independent of distributional form as regards its level of significance but not so as regards power (the probability of rejecting the null hypothesis when it is false). Again, most nonparametric procedures are nonrobust against dependence among the observations.
Nonparametric methods often arise naturally when observational materials are inherently nonmetric, for example, when the results of an experiment or survey provide only rankings of test units by judges.
Sometimes the form of a distribution is worthy of special examination, and goodness-of-fit procedures are used [see Goodness of fit]. For example, a psychological test may be standardized to a particular population so that test scores over the population have very nearly a unit-normal distribution. If the test is then administered to the individuals of a sample from a different population, the question may arise of whether the score distribution for the different population is still unit normal, and a goodness-of-fit test of unit-normality may be performed. More broadly, an analogous test might be framed to test only normality, without specification of a particular normal distribution.
Some goodness-of-fit procedures, the so-called chi-square ones, may be regarded as falling under the counted-data rubric [see Counted data]. Others, especially when modified to provide confidence bands for an entire distribution, are usually studied under the banner of nonparametric analysis.
Dispersion. The study of dispersion, or variability, is a topic that deserves more attention than it often receives [see Variances, Statistical study OF]. For example, it might be of interest to compare several teaching methods as to the resulting heterogeneity of student scores. A particular method might give rise to a desirable average score by increasing greatly the scores of some students while leaving other students’ scores unchanged, thereby giving rise to great heterogeneity. Clearly, such a method has different consequences and applications than one that raises each student’s score by about the same amount.
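The contrast can be made concrete with invented scores. In the Python sketch below, two hypothetical teaching methods produce the same average score from the same baseline, but one raises every score by the same amount while the other raises only some scores; the numbers and names are assumptions for illustration only.

```python
from statistics import mean, pvariance


def summarize(scores):
    """Return the average and the (population) variance of a set of scores."""
    return mean(scores), pvariance(scores)


baseline = [50, 60, 70, 80]                         # hypothetical starting scores
uniform_method = [score + 10 for score in baseline]  # every student gains 10
uneven_method = [50, 60, 90, 100]                    # two students gain 20, two gain nothing

if __name__ == "__main__":
    print("uniform gains:", summarize(uniform_method))
    print("uneven gains: ", summarize(uneven_method))
```

A comparison of averages alone would not distinguish the two methods; the difference lies entirely in the dispersion.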
(Terminology may be confusing here. The traditional topic of analysis of variance deals in substantial part with means, not variances, although it does so by looking at dispersions among the means.)
Design. Experimental design has already been mentioned. It deals with such problems as how many observations to take for a given level of accuracy, and how to assign the treatments or factors to experimental units. For example, in the study of teaching methods, the experimental units may be school classes, cross-classified by grade, kind of school, type of community, and the like. Experimental design deals with formal aspects of the structure of an experimental layout; a basic principle is that explicit randomization should be used in assigning “treatments” (here methods of teaching) to experimental units (here classes). Sometimes it may be reasonable to suppose that randomization is inherent, supplied, as it were, by nature; but more often it is important to use so-called random numbers. Controversy centers on situations in which randomization is deemed impractical, unethical, or even impossible, although one may sometimes find clever ways to introduce randomization in cases where it seems hopeless at first glance. When randomization is absent, a term like “quasi experiment” may be used to emphasize its absence, and a major problem is that of obtaining as much protection as possible against the sources of bias that would have been largely eliminated by the unused randomization [see Experimental design, article on Quasi-Experimental design].
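Explicit randomization of the kind described can be sketched as follows in Python. The class labels, the two methods, and the balanced allocation (equal numbers of classes per method) are assumptions for illustration; a real design would usually randomize separately within grade, kind of school, and so on.

```python
import random


def assign_treatments(units, treatments, seed=None):
    """Randomly assign treatments to units in (nearly) equal numbers."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # the random order, not the investigator, decides
    # Deal the treatments out in rotation over the shuffled units.
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}


if __name__ == "__main__":
    classes = [f"class_{i}" for i in range(1, 9)]  # hypothetical school classes
    plan = assign_treatments(classes, ["method A", "method B"], seed=7)
    for unit in classes:
        print(unit, "->", plan[unit])
```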
An important aspect of the design of experiments is the use of devices to ensure both that a (human) subject does not know which experimental treatment he is subjected to, and that the investigator who is measuring or observing effects of treatments does not know which treatments particular observed individuals have had. When proper precautions are taken along these two lines, the experiment is called double blind. Many experimental programs have been vitiated by neglect of these precautions. First, a subject who knows that he is taking a drug that it is hoped will improve his memory, or reduce his sensitivity to pain, may well change his behavior in response to the knowledge of what is expected as much as in physiological response to the drug itself. Hence, whenever possible, so-called placebo treatments (neutral but, on the surface, indistinguishable from the real treatment) are administered to members of the control group. Second, an investigator who knows which subjects are having which treatments may easily, and quite unconsciously, have his observations biased by preconceived opinions. Problems may arise even if the investigator knows only which subjects are in the same group. Assignment to treatment by the use of random numbers, and random ordering of individuals for observation, are important devices to ensure impartiality.
The number of observations is traditionally regarded as fixed before sampling. In recent years, however, there have been many investigations of sequential designs in which observations are taken in a series (or in a series of groups of observations), with decisions made at each step whether to take further observations or to stop observing and turn to analysis [see Sequential analysis].
In many contexts a response (or its average value) is a function of several controlled variables. For example, average length of time to relearn the spellings of a list of words may depend on the number of prior learning sessions and the elapsed period since the last learning session. In the study of response surfaces, the structure of the dependence (thought of as the shape of a surface) is investigated by a series of experiments, typically with special interest in the neighborhood of a maximum or minimum [see Experimental design, article on Response surfaces].
Philosophy. Statistics has long had a neighborly relation with philosophy of science in the epistemological city, although statistics has usually been more modest in scope and more pragmatic in outlook. In a strict sense, statistics is part of philosophy of science, but in fact the two areas are usually studied separately.
What are some problems that form part of the philosophy of science but are not generally regarded as part of statistics? A central one is that of the formation of scientific theories, their careful statement, and their confirmation or degree of confirmation. This last is to be distinguished from the narrower, but better understood, statistical concept of testing hypotheses. Another problem that many statisticians feel lies outside statistics is that of the gap between sampled and target population.
There are other areas of scientific philosophy that are not ordinarily regarded as part of statistics. Concepts like explanation, causation, operationalism, and free will come to mind.
A classic publication dealing with both statistics and scientific philosophy is Karl Pearson's Grammar of Science (1892). Two more recent such publications are Popper's Logic of Scientific Discovery (1935) and Braithwaite's Scientific Explanation (1953). By and large, nowadays, writers calling themselves statisticians and those calling themselves philosophers of science often refer to each other, but communication is restricted and piecemeal. [See Science, article on The philosophy of science; see also Causation; Power; Prediction; Scientific explanation.]
Measurement is an important topic for statistics, and it might well be mentioned here because some aspects of measurement are clearly philosophical. Roughly speaking, measurement is the process of assigning numbers (or categories) to objects on the basis of some operation. A measurement or datum is the resulting number (or category). But what is the epistemological underpinning for this concept? Should it be broadened to include more general kinds of data than numbers and categories? What kind of operations should be considered?
In particular, measurement scales are important, both in theory and practice. It is natural to say of one object that it is twice as heavy as another (in pounds, grams, or whatever; the unit is immaterial). But it seems silly to say that one object has twice the temperature of another in any of the everyday scales of temperature (as opposed to the absolute scale), if only because the ratio changes when one shifts, say, from Fahrenheit to Centigrade degrees. On the other hand, it makes sense to say that one object is 100 degrees Fahrenheit hotter than another. Some measurements seem to make sense only insofar as they order units, for example, many subjective rankings; and some measurements are purely nominal or categorical, for example, country of birth. Some measurements are inherently circular, for example, wind direction or time of day. There has been heated discussion of the question of the meaningfulness or legitimacy of arithmetic manipulations of various kinds of measurements; does it make sense, for example, to average measurements of subjective loudness if the individual measurements give information only about ordinal relationships?
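The point about temperature can be verified by direct arithmetic. The Python sketch below converts two invented Fahrenheit readings to the Centigrade (Celsius) scale and compares ratios and differences; the particular readings are assumptions chosen for convenience.

```python
def fahrenheit_to_celsius(degrees_f):
    """Convert a Fahrenheit reading to the Centigrade (Celsius) scale."""
    return (degrees_f - 32.0) * 5.0 / 9.0


hot_f, cold_f = 100.0, 50.0  # hypothetical readings
hot_c, cold_c = fahrenheit_to_celsius(hot_f), fahrenheit_to_celsius(cold_f)

ratio_f = hot_f / cold_f          # 2 on the Fahrenheit scale
ratio_c = hot_c / cold_c          # a different number on the Centigrade scale
difference_f = hot_f - cold_f     # differences merely rescale by 5/9
difference_c = hot_c - cold_c

if __name__ == "__main__":
    print(f"ratio: {ratio_f:.2f} (Fahrenheit) vs {ratio_c:.2f} (Centigrade)")
    print(f"difference: {difference_f:.2f} F degrees = {difference_c:.2f} C degrees")
```

The ratio of the two readings depends on the scale, whereas the difference changes only by the fixed factor 5/9, which is why statements about differences are meaningful on such scales and statements about ratios are not.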
The following are some important publications that deal with measurement and that lead to the relevant literature at this date: Churchman and Ratoosh (1959); Coombs (1964); Pfanzagl (1959); Adams, Fagot, and Robinson (1965); Torgerson (1958); Stevens (1946); Suppes and Zinnes (1963). [See Statistics, Descriptive; also relevant are Psychometrics; Scaling; Utility.]
Communication and fallacies. There is an art of communication between statistician and nonstatistician scientist: the statistician must be always aware that the nonstatistician is in general not directly interested in technical minutiae or in the parochial jargon of statistics. In the other direction, consultation with a statistician often loses effectiveness because the nonstatistician fails to mention aspects of his work that are of statistical relevance. Of course, in most cases scientists serve as their own statisticians, in the same sense that people, except for hypochondriacs, serve as their own physicians most of the time.
Statistical fallacies are often subtle and may be committed by the most careful workers. A study of such fallacies has intrinsic interest and also aids in mitigating the communication problem just mentioned [see Fallacies, STATISTICAL; see also Errors, article on Nonsampling errors].
If statistics is defined broadly, in terms of the general study of the leap from observations to inference, decision, or whatever, then one can hardly quarrel with the desirability of a study so embracingly characterized. Criticisms of statistics, therefore, are generally in terms of a narrower characterization, often the kind of activity named “statistics” that the critic sees about him. If, for example, a professor in some scientific field sees colleagues publishing clumsy analyses that they call statistical, then the professor may understandably develop a negative attitude toward statistics. He may not have an opportunity to learn that the subject is broader and that it may be used wisely, elegantly, and effectively.
Criticisms of probability in statistics. Some criticisms, in a philosophical vein, relate to the very use of probability models in statistics. For example, some writers have objected to probability because of a strict determinism in their Weltanschauung. This view is rare nowadays, with the success of highly probabilistic quantum methods in physics, and with the utility of probability models for clearly deterministic phenomena, for example, the effect of rounding errors in complex digital calculations. The deterministic critic, however, would probably say that quantum mechanics and probabilistic analysis of rounding errors are just temporary expedients, to be replaced later by nonprobabilistic approaches. For example, Einstein wrote in 1947 that
. . . the statistical interpretation [as in quantum mechanics] . . . has a considerable content of truth. Yet I cannot seriously believe it because the theory is inconsistent with the principle that physics has to represent a reality. . . . I am absolutely convinced that one will eventually arrive at a theory in which the objects connected by laws are not probabilities, but conceived facts. . . . However, I cannot provide logical arguments for my conviction, but can only call on my little finger as a witness, which cannot claim any authority to be respected outside my own skin. (Quoted in Born 1949, p. 123)
Other critics find vitiating contradictions and paradoxes in the ideas of probability and randomness. For example, G. Spencer Brown sweepingly wrote that
. . . the concept of probability used in statistical science is meaningless in its own terms [and] . . . , however meaningful it might have been, its meaningfulness would nevertheless have remained fruitless because of the impossibility of gaining information from experimental results. (1957, p. 66)
This rather nihilistic position is unusual and hard to reconcile with the many successful applications of probabilistic ideas. (Indeed, Spencer Brown went on to make constructive qualifications.) A less extreme but related view was expressed by Percy W. Bridgman (1959, pp. 110-111). Both these writers were influenced by statistical uses of tables of random numbers, especially in the context of parapsychology, where explanations of puzzling results were sought in the possible misbehavior of random numbers. [See Random numbers; see also Parapsychology.]
Criticisms about limited utility. A more common criticism, notably among some physical scientists, is that they have little need for statistics because random variability in the problems they study is negligible, at least in comparison with systematic errors or biases. This position has also been taken by some economists, especially in connection with index numbers [see Index numbers, article on SAMPLING]. B. F. Skinner, a psychologist, has forcefully expressed a variant of this position: that there are so many important problems in which random variability is negligible that he will restrict his own research to them (see Skinner 1956 for a presentation of this rather extreme position). In fact, he further argues that the important problems in psychology as a field are the identification of variables that can be observed directly with negligible variability.
It often happens, nonetheless, that, upon detailed examination, random variability is more important than had been thought, especially for the design of future experiments. Further, careful experimental design can often reduce, or bring understanding of, systematic errors. I think that the above kind of criticism is sometimes valid—after all, a single satellite successfully orbiting the earth is enough to show that it can be done—but that usually the criticism represents unwillingness to consider statistical methods explicitly, or a semantic confusion about what statistics is.
Related to the above criticism is the view that statistics is fine for applied technology, but not for fundamental science. In his inaugural lecture at Birkbeck College at the University of London, David Cox countered this criticism. He said in his introduction,
. . . there is current a feeling that in some fields of fundamental research, statistical ideas are sometimes not just irrelevant, but may actually be harmful as a symptom of an over-empirical approach. This view, while understandable, seems to me to come from too narrow a concept of what statistical methods are about. (1961)
Cox went on to give examples of the use of statistics in fundamental research in physics, psychology, botany, and other fields.
Another variant of this criticism sometimes seen (Selvin 1957; Walberg 1966) is that such statistical procedures as hypothesis testing are of doubtful validity unless a classically arranged experiment is possible, complete with randomization, control groups, pre-establishment of hypotheses, and other safeguards. Without such an arrangement—which is sometimes not possible or practical—all kinds of bias may enter, mixing any actual effect with bias effects.
This criticism reflects a real problem of reasonable inference when a true experiment is not available [see Experimental design, article on Quasi-experimental design], but it is not a criticism unique to special kinds of inference. The problem applies equally to any mode of analysis, whether formal, informal, or intuitive. A spirited discussion of this topic is given by Kish (1959).
Humanistic criticisms. Some criticisms of statistics represent serious misunderstandings or are really criticisms of poor statistical method, not of statistics per se. For example, one sometimes hears the argument that statistics is inhuman, that “you can't reduce people to numbers,” that statistics (and perhaps science more generally) must be battled by humanists. This is a statistical version of an old complaint, voiced in one form by Horace Walpole, in a letter to H. S. Conway (1778): “This sublime age reduces everything to its quintessence; all periphrases and expletives are so much in disuse, that I suppose soon the only way to [go about] making love will be to say ‘Lie down.’”
A modern variation of this was expressed by W. H. Auden in the following lines:
Thou shalt not answer questionnaires
Or quizzes upon World-Affairs,
Nor with compliance
Take any test. Thou shalt not sit
With statisticians nor commit
A social science.
From “Under Which Lyre: A Reactionary Tract for the Times.” Reprinted from Nones, by W. H. Auden, by permission of Random House, Inc. Copyright 1946 by W. H. Auden.
Joseph Wood Krutch (1963) said, “I still think that a familiarity with the best that has been thought and said by men of letters is more helpful than all the sociologists' statistics” (“Through Happiness With Slide Rule and Calipers,” p. 14).
There are, of course, quite valid points buried in such captious and charming criticisms. It is easy to forget that things may be more complicated than they seem, that many important characteristics are extraordinarily difficult to measure or count, that scientists (and humanists alike) may lack professional humility, and that any set of measurements excludes others that might in principle have been made. But the humanistic attack is overdefensive and is a particular instance of what might be called the two-culture fallacy: the belief that science and the humanities are inherently different and necessarily in opposition.
Criticisms of overconcern with averages. Statisticians are sometimes teased about being interested only in averages, some of which are ludicrous: 2.35 children in an average family; or the rare disease that attacks people aged 40 on the average, with two cases, one a child of 2 and the other a man of 78. (Chuckles from the gallery.)
Skinner made the point by observing that “no one goes to the circus to see the average dog jump through a hoop significantly oftener than untrained dogs raised under the same circumstances . . .” (1956, p. 228). Krutch said that “Statistics take no account of those who prefer to hear a different drummer” (1963, p. 15).
In fact, although averages are important, statisticians have long been deeply concerned about dispersions around averages and about other aspects of distributions, for example, in extreme values [see Nonparametric statistics, article on Order statistics; and Statistical analysis, special problems of, article on Outliers].
In 1889 the criticism of averages was poetically made by Galton:
It is difficult to understand why statisticians commonly limit their inquiries to Averages, and do not revel in more comprehensive views. Their souls seem as dull to the charm of variety as that of the native of one of our flat English counties, whose retrospect of Switzerland was that, if its mountains could be thrown into its lakes, two nuisances would be got rid of at once. (p. 62)
Galton's critique was overstated even at its date, but it would be wholly inappropriate today.
Another passage from the same work by Galton refers to the kind of emotional resistance to statistics that was mentioned earlier:
Some people hate the very name of statistics, but I find them full of beauty and interest. Whenever they are not brutalized, but delicately handled by the higher methods, and are warily interpreted, their power of dealing with complicated phenomena is extraordinary. They are the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the Science of man. (1889, pp. 62-63)
One basic source of misunderstanding about averages is that an individual may be average in many ways, yet appreciably nonaverage in others. This was the central difficulty with Quetelet's historically important concept of the average man [see the biography of Quetelet]; a satirical novel about the point, by Robert A. Aurthur (1953), has appeared. The average number of children per family in a given population is meaningful and sometimes useful to know, for example, in estimating future population. There is, however, no such thing as the average family, if only because a family with an average number of children (assuming this number to be integral) would not be average in terms of the reciprocal of number of children. To put it another way, there is no reason to think that a family with the average number of children also has average income, or average education, or lives at the center of population of the country.
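The remark about reciprocals can be checked with a little arithmetic. The following is only a sketch, using a made-up list of family sizes (the numbers and variable names are illustrative assumptions, not data from any study): a family with the average number of children does not have the average value of the reciprocal.

```python
from statistics import mean

# Hypothetical numbers of children in five families (illustrative only).
children = [1, 2, 3, 4, 5]

# Average number of children per family.
mean_children = mean(children)

# Average of the reciprocals, taken over the same families.
mean_reciprocal = mean(1 / c for c in children)

# Reciprocal for a family that has exactly the average number of children.
reciprocal_of_mean = 1 / mean_children

print(mean_children)
print(round(mean_reciprocal, 4))
print(round(reciprocal_of_mean, 4))
```

With these made-up figures the average of the reciprocals is noticeably larger than the reciprocal of the average, so a family that is "average" on one scale is not average on the other.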
Criticisms of too much mathematics. The criticism is sometimes made—often by statisticians themselves—that statistics is too mathematical. The objection takes various forms, for example:
(1) Statisticians choose research problems because of their mathematical interest or elegance and thus do not work on problems of real statistical concern. (Sometimes the last phrase simply refers to problems of concern to the critic.)
(2) The use of mathematical concepts and language obscures statistical thinking.
(3) Emphasis on mathematical aspects of statistics tends to make statisticians neglect problems of proper goals, meaningfulness of numerical statistics, and accuracy of data.
Critiques along these lines are given by, for example, W. S. Woytinsky (1954) and Corrado Gini (1951; 1959). A similar attack appears in Lancelot Hogben's Statistical Theory (1957). What can one say of this kind of criticism, whether it comes from within or without the profession? It has a venerable history that goes back to the early development of statistics. Perhaps the first quarrel of this kind was in the days when the word “statistics” was used, in a different sense than at present, to mean the systematic study of states, a kind of political science. The dispute was between those “statisticians” who provided discursive descriptions of states and those who cultivated the so-called Tabellenstatistik, which ranged from typographically convenient arrangements of verbal summaries to actual tables of vital statistics. Descriptions of this quarrel are given by Westergaard (1932, pp. 12-15), Lundberg (1940), and Lazarsfeld (1961, especially p. 293).
The ad hominem argument—that someone is primarily a mathematician, and hence incapable of understanding truly statistical problems—has been and continues to be an unfortunately popular rhetorical device. In part it is probably a defensive reaction to the great status and prestige of mathematics.
In my view, a great deal of this kind of discussion has been beside the point, although some charges on all sides have doubtless been correct. If a part of mathematics proves helpful in statistics, then it will be used. As statisticians run onto mathematical problems, they will work on them, borrowing what they can from the store of current mathematical knowledge, and perhaps encouraging or carrying out appropriate mathematical research. To be sure, some statisticians adopt an unnecessarily mathematical manner of exposition. This may seem an irritating affectation to less mathematical colleagues, but who can really tell apart an affectation and a natural mode of communication?
An illuminating discussion about the relationship between mathematics and statistics, as well as about many other matters, is given by Tukey (1961).
Criticisms of obfuscation. Next, there is the charge that statistics is a meretricious mechanism to obfuscate or confuse: “Lies, damned lies, and statistics” (the origin of this canard is not entirely clear: see White 1964). A variant is the criticism that statistical analyses are impossible to follow, filled with unreadable charts, formulas, and jargon.
These points are often well taken of specific statistical or pseudostatistical writings, but they do not relate to statistics as a discipline. A popular book, How to Lie With Statistics (Huff 1954), is in fact a presentation of horrid errors in statistical description and analysis, although it could, of course, be used as a source for pernicious sophistry. It is somewhat as if there were a book called “How to Counterfeit Money,” intended as a guide to bank tellers—or the general public—in protecting themselves against false money.
George A. Lundberg made a cogent defense against one form of this criticism, in the following words:
. . . when we have to reckon with stupidity, incompetence, and illogic, the more specific the terminology and methods employed the more glaring will be the errors in the result. As a result, the errors of quantitative workers lend themselves more easily to detection and derision. An equivalent blunder by a manipulator of rhetoric may not only appear less flagrant, but may actually go unobserved or become a venerated platitude. (1940, p. 138)
Criticisms of sampling per se. One sometimes sees the allegation that it is impossible to make reasonable inferences from a sample to a population, especially if the sample is a small fraction of the population. A variant of this was stated by Joseph Papp: “The methodology . . . was not scientific: they used sampling and you can't draw a complete picture from samplings” (quoted in Kadushin 1966, p. 30).
This criticism has no justification except insofar as it impugns poor sampling methods. Samples have always been used, because it is often impractical or impossible to observe a whole population (one cannot test a new drug on every human being, or destructively test all electric fuses) or because it is more informative to make careful measurements on a sample than crude measurements on a whole population. Proper sampling—for which the absolute size of the sample is far more important than the fraction of the population it represents—is informative, and in constant successful use.
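The claim that absolute sample size matters far more than the sampling fraction can be illustrated numerically. The sketch below assumes simple random sampling without replacement and uses the textbook standard error of a sample proportion with the finite population correction; the function name and the particular population and sample sizes are illustrative choices, not figures from the text.

```python
import math

def se_proportion(p, n, N):
    """Standard error of a sample proportion under simple random sampling
    without replacement: sample size n, population size N, true proportion p."""
    fpc = (N - n) / (N - 1)  # finite population correction
    return math.sqrt(p * (1 - p) / n * fpc)

# A sample of 1,000 that is a tiny fraction (0.001 per cent) of its population.
small_fraction = se_proportion(0.5, 1_000, 100_000_000)

# A sample of 100 that is a large fraction (10 per cent) of its population.
large_fraction = se_proportion(0.5, 100, 1_000)

print(round(small_fraction, 4))
print(round(large_fraction, 4))
```

Under these assumptions the sample of 1,000 gives the smaller standard error, although it covers a far smaller share of its population than the sample of 100 does.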
Criticisms of intellectual imperialism. The criticism is sometimes made that statistics is not the whole of scientific method and practice. Skinner said:
. . . it is a mistake to identify scientific practice with the formalized constructions [italics added] of statistics and scientific method. These disciplines have their place, but it does not coincide with the place of scientific research. They offer a method of science but not, as is so often implied, the method. As formal disciplines they arose very late in the history of science, and most of the facts of science have been discovered without their aid. (1956, p. 221)
I know of few statisticians so arrogant as to equate their field with scientific method generally. It is, of course, true that most scientific work has been done without the aid of statistics, narrowly construed as certain formal modes of analysis that are currently promulgated. On the other hand, a good deal of scientific writing is concerned, one way or another, with statistics, in the more general sense of asking how to make sensible inferences.
Skinner made another, somewhat related point: that, because of the prestige of statistics, statistical methods have (in psychology) acquired the honorific status of a shibboleth (1956, pp. 221, 231). Statisticians are sorrowfully aware of the shibboleth use of statistics in some areas of scientific research, but the profession can be blamed for this only because of some imperialistic textbooks, many of them not by proper statisticians.
Other areas of statistics
The remainder of this article is devoted to brief discussions of those statistical articles in the encyclopedia that have not been described earlier.
Grouped observations. The question of grouped observations is sometimes of concern: in much theoretical statistics measurements are assumed to be continuous, while in fact measurements are always discrete, so that there is inevitable grouping. In addition, one often wishes to group measurements further, for simplicity of description and analysis. To what extent are discreteness and grouping an advantage, and to what extent a danger? [See Statistical analysis, special problems of, article on Grouped observations.]
Truncation and censorship. Often observations may reasonably follow some standard model except that observations above (or below) certain values are proscribed (truncated or censored). A slightly more complex example occurs in comparing entrance test scores with post-training scores for students in a course; those students with low entrance test scores may not be admitted and hence will not have post-training scores at all. Methods exist for handling such problems. [See Statistical analysis, special problems of, article on Truncation and censorship.]
Outliers. Very often a few observations in a sample will have unusually large or small values and may be regarded as outliers (or mavericks or wild values). How should one handle them? If they are carried along in an analysis, they may distort it. If they are arbitrarily suppressed, important information may be lost. Even if they are to be suppressed, what rule should be used? [See Statistical analysis, special problems of, article on Outliers.]
Transformations of data. Transformations of data are often very useful. For example, one may take the logarithm of reaction time, the square root of a test score, and so on. The purposes of such a transformation are (1) to simplify the structure of the data, for example by achieving additivity of two kinds of effects, and (2) to make the data more nearly conform with a well-understood statistical model, for example by achieving near-normality or constancy of variance. A danger of transformations is that one's inferences may be shifted to some other scale than the one of basic interest. [See Statistical analysis, special problems of, article on Transformations of data.]
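The first purpose, achieving additivity, can be sketched with a small invented example. The table below is hypothetical: mean reaction times are constructed so that a task effect and a condition effect multiply. On the raw scale the two effects interact; after taking logarithms the interaction contrast vanishes, that is, the effects add.

```python
import math

# Hypothetical 2x2 table of mean reaction times in milliseconds
# (rows = task, columns = condition), built so that the effects multiply:
# each cell is 200 * row_factor * column_factor.
times = [[200 * 1.0 * 1.0, 200 * 1.0 * 1.5],
         [200 * 2.0 * 1.0, 200 * 2.0 * 1.5]]

def interaction(table):
    """Interaction contrast of a 2x2 table; zero means the effects are additive."""
    return table[0][0] - table[0][1] - table[1][0] + table[1][1]

raw = interaction(times)
logged = interaction([[math.log(x) for x in row] for row in times])

print(raw)               # nonzero: effects are not additive on the raw scale
print(abs(logged) < 1e-9)  # True: effects are additive on the log scale
```

The same example also shows the danger noted above: conclusions about additivity now refer to log reaction time, not to reaction time itself.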
Approximations to distributions. Approximations to distributions are important in probability and statistics. First, one may want to approximate some theoretical distribution in order to have a simple analytic form or to get numerical values. Second, one may want to approximate empirical distributions for both descriptive and inferential purposes. [See Distributions, statistical, article on Approximations to distributions.]
Identifiability: mixtures of distributions. The problem of identification appears whenever a precise model for some phenomenon is specified and parameters of the model are to be estimated from empirical observations [see Statistical identifiability]. What may happen, and may even fail to be recognized, is that the parameters are fundamentally incapable of estimation from the kind of data in question. Consider, for example, a learning theory model in which the proportion of learned material retained after a lapse of time is the ratio of two parameters of the model. Then, even if the proportion could be observed without any sampling fluctuation or measurement error, one would not separately know the two parameters. Of course, the identification problem arises primarily in contexts that are complex enough so that immediate recognition of nonidentifiability is not likely. Sometimes there arises an analogous problem, which might be called identifiability of the model. A classic example appears in the study of accident statistics: some kinds of these statistics are satisfactorily fitted by the negative binomial distribution, but that distribution itself may be obtained as the outcome of several quite different, more fundamental models. Some of these models illustrate the important concept of mixtures. A mixture is an important and useful way of forming a new distribution from two or more statistical distributions. [See Distributions, statistical, article on Mixtures of distributions.]
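The learning-theory example can be made concrete in a few lines. This is a minimal sketch with an invented function and invented parameter values: two quite different parameter pairs yield exactly the same retained proportion, so even error-free observation of the proportion could not tell them apart.

```python
def retained_proportion(a, b):
    """Hypothetical model: the proportion retained is the ratio of two parameters."""
    return a / b

# Two different parameter settings (illustrative values only).
p_first = retained_proportion(1.0, 2.0)
p_second = retained_proportion(3.0, 6.0)

print(p_first, p_second)    # both settings predict the same proportion
print(p_first == p_second)  # True: the data cannot separate (a, b) from (3a, 3b)
```

Only the ratio is identifiable here; separating the two parameters would require some further kind of observation or an added constraint.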
Applications. Next described is a set of articles on special topics linked with specific areas of application, although most of these areas have served as motivating sources for general theory.
Quality control. Statistical quality control had its genesis in manufacturing industry, but its applications have since broadened [see Quality control, statistical]. There are three articles under this heading. The first is on acceptance sampling, where the usual context is that of “lots” of manufactured articles. Here there are close relations to hypothesis testing and to sequential analysis. The second is on process control (and so-called control charts), a topic that is sometimes itself called quality control, in a narrower sense than the usage here. The development of control chart concepts and methods relates to basic notions of randomness and stability, for an important normative concept is that of a process in control, that is, a process turning out a sequence of numbers that behave like independent, identically distributed random variables. The third topic is reliability and life testing, which also relates to matters more general than immediate engineering contexts. [The term “reliability” here has quite a different meaning than it has in the area of psychological testing; see Psychometrics.]
Government statistics. Government statistics are of great importance for economic, social, and political decisions [see Government statistics]. The article on that subject treats such basic issues as the use of government statistics for political propaganda, the problem of confidentiality, and the meaning and accuracy of official statistics. [Some related articles are Census; Economic data; Mortality; Population; Vital statistics.]
Index numbers. Economic index numbers form an important part of government statistical programs [see Index numbers]. The three articles on this topic discuss, respectively, theory, practical aspects of index numbers, and sampling problems.
Statistics as legal evidence. The use of statistical methods, and their results, in judicial proceedings has been growing in recent years. Trademark disputes have been illuminated by sample surveys; questions of paternity have been investigated probabilistically; depreciation and other accounting quantities that arise in quasi-judicial hearings have been estimated statistically. There are conflicts or apparent conflicts between statistical methods and legal concepts like those relating to hearsay evidence. [See Statistics as legal evidence.]
Statistical geography. Statistical geography, the use of statistical and other quantitative methods in geography, is a rapidly growing area [see Geography, article on Statistical geography]. Somewhat related is the topic of rank-size, in which are studied, empirically and theoretically, patterns of relationship between, for example, the populations of cities and their rankings from most populous down. Another example is the relationship between the frequencies of words and their rankings from most frequent down. [See Rank-size relations.]
Quantal response. Quantal response refers to a body of theory and method that might have been classed with counted data or under linear hypotheses with regression [see Quantal response]. An example of a quantal response problem would be one in which students are given one week, two weeks, and so on, of training (say 100 different students for each training period), and then proportions of students passing a test are observed. Of interest might be that length of training leading to exactly 50 per cent passing. Many traditional psychophysical problems may be regarded from this viewpoint [see Psychophysics].
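The training example can be sketched numerically. The data below are invented, and the method shown, straight-line interpolation between the two observed proportions that bracket 50 per cent, is only a crude stand-in for the model-based estimates that quantal response theory provides; the function name is hypothetical.

```python
# Hypothetical data: weeks of training and proportion of students passing
# (imagine 100 different students at each training length).
weeks = [1, 2, 3, 4, 5]
prop_passing = [0.10, 0.30, 0.45, 0.65, 0.90]

def level_for_target(x, p, target=0.5):
    """Linearly interpolate the level of x at which the proportion crosses target.

    Assumes the proportions are nondecreasing in x.
    """
    pairs = list(zip(x, p))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if p0 <= target <= p1 and p1 > p0:
            return x0 + (target - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("target proportion is not bracketed by the data")

estimate = level_for_target(weeks, prop_passing)
print(round(estimate, 2))  # estimated weeks of training for 50 per cent passing
```

With these figures the crossing falls between three and four weeks. A fitted response curve would use all of the observed proportions and would also allow for their sampling variability.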
Queues. The study of queues has been of importance in recent years; it is sometimes considered part of operations research, but it may also be considered a branch of the study of stochastic processes [see Queues; Operations research]. An example of queuing analysis is that of traffic flow at a street-crossing with a traffic light. The study has empirical, theoretical, and normative aspects.
Computation. Always intertwined with applied statistics, although distinct from it, has been computation [see Computation]. The recent advent of high-speed computers has produced a sequence of qualitative changes in the kind of computation that is practicable. This has had, and will continue to have, profound effects on statistics, not only as regards data handling and analysis, but also in theory, since many analytically intractable problems can now be attacked numerically by simulation on a high-speed computer [see Simulation].
Cybernetics. The currently fashionable term “cybernetics” is applied to a somewhat amorphous body of knowledge and research dealing with information processing and mechanisms, both living and nonliving [see Cybernetics and Homeostasis]. The notions of control and feedback are central, and the influence of the modern high-speed computer has been strong. Sometimes this area is taken to include communication theory and information theory [see Information theory, which stresses applications to psychology].
William H. Kruskal
Boehm, George A. W. 1964 The Science of Being Almost Certain. Fortune 69, no. 2:104-107, 142, 144, 146, 148.
Kac, Mark 1964 Probability. Scientific American 211, no. 3:92-108.
Kendall, M. G. 1950 The Statistical Approach. Economica New Series 17:127-145.
Kruskal, William H. (1965) 1967 Statistics, Moliere, and Henry Adams. American Scientist 55:416-428. → Previously published in Volume 9 of Centennial Review.
Weaver, Warren 1952 Statistics. Scientific American 186, no. 1:60-63.
Introductions to Probability and Statistics
Borel, Émile F. E. J. (1943) 1962 Probabilities and Life. Translated by M. Baudin. New York: Dover. → First published in French.
Gnedenko, Boris V.; and Khinchin, Aleksandr Ia. (1945) 1962 An Elementary Introduction to the Theory of Probability. Authorized edition. Translated from the 5th Russian edition, by Leo F. Boron, with the editorial collaboration of Sidney F. Mack. New York: Dover. → First published as Elementarnoe vvedenie v teoriiu veroiatnostei.
Moroney, M. J. (1951) 1958 Facts From Figures. 3d ed., rev. Harmondsworth (England): Penguin.
Mosteller, Frederick; Rourke, Robert E. K.; and Thomas, George B. Jr. 1961 Probability With Statistical Applications. Reading, Mass.: Addison-Wesley.
Tippett, L. H. C. (1943) 1956 Statistics. 2d ed. New York: Oxford Univ. Press.
Wallis, W. Allen; and Roberts, Harry V. 1962 The Nature of Statistics. New York: Collier. → Based on material presented in the authors' Statistics: A New Approach (1956).
Weaver, Warren 1963 Lady Luck: The Theory of Probability. Garden City, N.Y.: Doubleday.
Youden, W. J. 1962 Experimentation and Measurement. New York: Scholastic Book Services.
Mathematical Reviews. → Published since 1940.
Psychological Abstracts. → Published since 1927. Covers parts of the statistical literature.
Quality Control and Applied Statistics Abstracts. → Published since 1956.
Referativnyi zhurnal: Matematika. → Published since 1953.
Statistical Theory and Method Abstracts. → Published since 1959.
Zentralblatt für Mathematik und ihre Grenzgebiete. → Published since 1931.
Works Cited in the Text
Adams, Ernest W.; Fagot, Robert F.; and Robinson, Richard E. 1965 A Theory of Appropriate Statistics. Psychometrika 30:99-127.
Aurthur, Robert A. 1953 The Glorification of Al Toolum. New York: Rinehart.
Born, Max (1949) 1951 Natural Philosophy of Cause and Chance. Oxford: Clarendon.
Bortkiewicz, Ladislaus von 1909 Die statistischen Generalisationen. Scientia 5:102-121. → A French translation appears in a supplement to Volume 5, pages 58-75.
Braithwaite, R. B. 1953 Scientific Explanation: A Study of the Function of Theory, Probability and Law in Science. Cambridge Univ. Press. → A paperback edition was published in 1960 by Harper.
Bridgman, Percy W. 1959 The Way Things Are. Cambridge, Mass.: Harvard Univ. Press.
Brown, G. Spencer, see under Spencer Brown, G.
Churchman, Charles W.; and Ratoosh, Philburn (editors) 1959 Measurement: Definitions and Theories. New York: Wiley.
Cochran, William G.; Mosteller, Frederick; and Tukey, John W. 1954 Statistical Problems of the Kinsey Report on Sexual Behavior in the Human Male. Washington: American Statistical Association.
Coombs, Clyde H. 1964 A Theory of Data. New York: Wiley.
Cox, D. R. 1961 The Role of Statistical Methods in Science and Technology. London: Birkbeck College.
Deming, W. Edwards 1965 Principles of Professional Statistical Practice. Annals of Mathematical Statistics 36:1883-1900.
Galton, Francis 1889 Natural Inheritance. London and New York: Macmillan.
Gini, Corrado 1951 Caractère des plus récents développements de la méthodologie statistique. Statistica 11:3-11.
Gini, Corrado 1959 Mathematics in Statistics. Metron 19, no. 3/4:1-9.
Hogben, Lancelot T. 1957 Statistical Theory; the Relationship of Probability, Credibility and Error: An Examination of the Contemporary Crisis in Statistical Theory From a Behaviourist Viewpoint. London: Allen & Unwin.
Huff, Darrell 1954 How to Lie With Statistics. New York: Norton.
Kadushin, Charles 1966 Shakespeare & Sociology. Columbia University Forum 9, no. 2:25-31.
Kish, Leslie 1959 Some Statistical Problems in Research Design. American Sociological Review 24:328-338.
Krutch, Joseph Wood 1963 Through Happiness With Slide Rule and Calipers. Saturday Review 46, no. 44:12-15.
Lazarsfeld, Paul F. 1961 Notes on the History of Quantification in Sociology: Trends, Sources and Problems. Isis 52, part 2:277-333. → Also included in Harry Woolf (editor), Quantification, published by Bobbs-Merrill in 1961.
Lundberg, George A. 1940 Statistics in Modern Social Thought. Pages 110-140 in Harry E. Barnes, Howard Becker, and Frances B. Becker (editors), Contemporary Social Theory. New York: Appleton.
Pearson, Karl (1892) 1957 The Grammar of Science. 3d ed., rev. & enl. New York: Meridian. → The first and second editions (1892 and 1900) contain material not in the third edition.
Pfanzagl, J. 1959 Die axiomatischen Grundlagen einer allgemeinen Theorie des Messens. A publication of the Statistical Institute of the University of Vienna, New Series, No. 1. Würzburg (Germany): Physica-Verlag. → Scheduled for publication in English under the title The Theory of Measurement in 1968 by Wiley.
Popper, Karl R. (1935) 1959 The Logic of Scientific Discovery. Rev. ed. New York: Basic Books; London: Hutchinson. → First published as Logik der Forschung. A paperback edition was published in 1961 by Harper.
Selvin, Hanan C. 1957 A Critique of Tests of Significance in Survey Research. American Sociological Review 22:519-527. → See Volume 23, pages 85-86 and 199-200, for responses by David Gold and James M. Beshers.
Selvin, Hanan C.; and Stuart, Alan 1966 Data-dredging Procedures in Survey Analysis. American Statistician 20, no. 3:20-23.
Skinner, B. F. 1956 A Case History in Scientific Method. American Psychologist 11:221-233.
Spencer Brown, G. 1957 Probability and Scientific Inference. London: Longmans. → The author's surname is Spencer Brown, but common library practice is to alphabetize his works under Brown.
Stevens, S. S. 1946 On the Theory of Scales of Measurement. Science 103:677-680.
Suppes, Patrick; and Zinnes, Joseph L. 1963 Basic Measurement Theory. Volume 1, pages 1-76 in R. Duncan Luce, Robert R. Bush, and Eugene Galanter (editors), Handbook of Mathematical Psychology. New York: Wiley.
Torgerson, Warren S. 1958 Theory and Methods of Scaling. New York: Wiley.
Tukey, John W. 1961 Statistical and Quantitative Methodology. Pages 84-136 in Donald P. Ray (editor), Trends in Social Science. New York: Philosophical Library.
Tukey, John W. 1962 The Future of Data Analysis. Annals of Mathematical Statistics 33:1-67, 812.
Walberg, Herbert J. 1966 When Are Statistics Appropriate? Science 154:330-332. → Follow-up letters by Julian C. Stanley, “Studies of Nonrandom Groups,” and by Herbert J. Walberg, “Statistical Randomization in the Behavioral Sciences,” were published in Volume 155, on page 953, and Volume 156, on page 314, respectively.
Walpole, Horace (1778) 1904 [Letter] To the Hon. Henry Seymour Conway. Vol. 10, pages 337-338 in Horace Walpole, The Letters of Horace Walpole, Fourth Earl of Orford. Edited by Paget Toynbee. Oxford: Clarendon Press.
Westergaard, Harald L. 1932 Contributions to the History of Statistics. London: King.
White, Colin 1964 Unkind Cuts at Statisticians. American Statistician 18, no. 5:15-17.
Woytinsky, W. S. 1954 Limits of Mathematics in Statistics. American Statistician 8, no. 1:6-10, 18.
The broad river of thought that today is known as theoretical statistics cannot be traced back to a single source springing identifiably from the rock. Rather is it the confluence, over two centuries, of a number of tributary streams from many different regions. Probability theory originated at the gaming table; the collection of statistical facts began with state requirements of soldiers and money; marine insurance began with the wrecks and piracy of the ancient Mediterranean; modern studies of mortality have their roots in the plague pits of the seventeenth century; the theory of errors was created in astronomy, the theory of correlation in biology, the theory of experimental design in agriculture, the theory of time series in economics and meteorology, the theories of component analysis and ranking in psychology, and the theory of chi-square methods in sociology. In retrospect it almost seems as if every phase of human life and every science has contributed something of importance to the subject. Its history is accordingly the more interesting, but the more difficult, to write.
Up to about 1850 the word “statistics” was used in quite a different sense from the present one. It meant information about political states, the kind of material that is nowadays to be found assembled in the Statesman's Year-book. Such information was usually, although not necessarily, numerical, and, as it increased in quantity and scope, developed into tabular form. By a natural transfer of meaning, “statistics” came to mean any numerical material that arose in observation of the external world. At the end of the nineteenth century this usage was accepted. Before that time, there were, of course, many problems in statistical methodology considered under other names; but the recognition of their common elements as part of a science of statistics was of relatively late occurrence. The modern theory of statistics (an expression much to be preferred to “mathematical statistics”) is the theory of numerical information of almost every kind.
The characteristic feature of such numerical material is that it derives from a set of objects, technically known as a “population,” and that any particular variable under measurement has a distribution of frequencies over the members of the set. The height of man, for example, is not identical for every individual but varies from man to man. Nevertheless, we find that the frequency distribution of heights of men in a given population has a definite pattern that can be expressed by a relatively simple mathematical formula. Often the “population” may be conceptual but nonexistent, as for instance when we consider the possible tosses of a penny or the possible measurements that may be made of the transit times of a star. This concept of a distribution of measurements, rather than a single measurement, is fundamental to the whole subject. In consequence, points of statistical interest concern the properties of aggregates, rather than of individuals; and the elementary parts of theoretical statistics are much concerned with summarizing these properties in such measures as averages, index numbers, dispersion measures, and so forth.
The simpler facts concerning aggregates of measurements must, of course, have been known almost from the moment when measurements began to be made. The idea of regularity in the patterning of discrete repeatable chance events, such as dice throwing, emerged relatively early and is found explicitly in Galileo's work. The notion that measurements on natural phenomena should exhibit similar regularities, which are mathematically expressible, seems to have originated in astronomy, in connection with measurements on star transits. After some early false starts it became known that observations of a magnitude were subject to error even when the observer was trained and unbiased. Various hypotheses about the pattern of such errors were propounded. Simpson (1757) was the first to consider a continuous distribution, that is to say, a distribution of a variable that could take any values in a continuous range. By the end of the eighteenth century Laplace and Gauss had considered several such mathematically specified distributions and, in particular, had discovered the most famous of them all, the so-called normal distribution [see Distributions, statistical, article on Special continuous distributions].
In these studies there was assumed to be a “true” value underlying the distribution. Departures from this true value were “errors.” They were, so to speak, extraneous to the object of the study, which was to estimate this true value. Early in the nineteenth century a major step forward was taken with the recognition (especially by Quetelet) that living material also exhibited frequency distributions of definite pattern. Furthermore, Galton and Karl Pearson, from about 1880, showed that these distributions were often skew or asymmetrical, in the sense that the shape of the frequency curve for values above the mean was not the mirror image of the curve for values below the mean. In particular it became impossible to maintain that the deviations from the mean were “errors” or that there existed a “true” value; the frequency distribution itself was to be recognized as a fundamental property of the aggregate. Immediately, similar patterns of regularity were brought to light in nearly every branch of science—genetics, biology, meteorology, economics, sociology—and even in some of the arts: distributions of weak verse endings were used to date Shakespeare's plays, and the distribution of words has been used to discuss cases of disputed authorship.
Nowadays the concept of frequency distribution is closely bound up with the notion of probability distribution. Some writers of the twentieth century treat the two things as practically synonymous. Historically, however, the two were not always identified and to some extent pursued independent courses for centuries before coming together. We must go back several millenniums if we wish to trace the concept of probability to its source.
From very ancient times man gambled with primitive instruments, such as astragali and dice, and also used chance mechanisms for divinatory purposes. Rather surprisingly, it does not seem that the Greeks, Romans, or the nations of medieval Europe arrived at any clear notion of the laws of chance. Elementary combinatorics appears to have been known to the Arabs and to Renaissance mathematicians, but as a branch of algebra rather than in a probabilistic context. Nevertheless, chance itself was familiar enough, especially in gambling, which was widespread in spite of constant discouragement from church and state. Some primitive ideas of relative frequency of occurrence can hardly have failed to emerge, but a doctrine of chances was extraordinarily late in coming. The first record we have of anything remotely resembling the modern idea of calculating chances occurs in a medieval poem called De vetula. The famous mathematician and physicist Geronimo Cardano was the first to leave a manuscript in which the concept of laws of chance was explicitly set out (Ore 1953). Galileo left a fragment that shows that he clearly understood the method of calculating chances at dice. Not until the work of Huygens (1657), the correspondence between Pascal and Fermat, and the work of Jacques Bernoulli (1713) do we find the beginnings of a calculus of probability.
This remarkable delay in the mathematical formulation of regularity in events that had been observed by gamblers over thousands of years is probably to be explained by the philosophical and religious ideas of the times, at least in the Western world. To the ancients, events were mysterious; they could be influenced by superhuman beings but no being was in control of the universe. On the other hand, to the Christians everything occurred under the will of God, and in a sense there was no chance; it was almost impious to suppose that events happened under the blind laws of probability. Whatever the explanation may be, it was not until Europe had freed itself from the dogma of the medieval theologian that a calculus of probability became possible.
Once the theory of probability had been founded, it developed with great rapidity. Only a hundred years separates the two greatest works in this branch of the subject, Bernoulli's Ars conjectandi (1713) and Laplace's Théorie analytique des probabilités (1812). Bernoulli exemplified his work mainly in terms of games of chance, and subsequent mathematical work followed the same line. Montmort's work was concerned entirely with gaming, and de Moivre stated most of his results in similar terms, although actuarial applications were always present in his mind (see Todhunter [1865] 1949, pp. 78-134 for Montmort and pp. 135-193 for de Moivre). With Laplace, Condorcet, and rather later, Poisson, we begin to find probabilistic ideas applied to practical problems; for example, Laplace discussed the plausibility of the nebular hypothesis of the solar system in terms of the probability of the planetary orbits lying as nearly in a plane as they do. Condorcet (1785) was concerned with the probability of reaching decisions under various systems of voting, and Poisson (1837) was specifically concerned with the probability of reaching correct conclusions from imperfect evidence. A famous essay of Thomas Bayes (1764) broke new ground by its consideration of probability in inductive reasoning, that is to say, the use of the probabilities of observed events to compare the plausibility of hypotheses that could explain them [see Bayesian inference].
The linkage between classical probability theory and statistics (in the sense of the science of regularity in aggregates of natural phenomena) did not take place at any identifiable point of time. It occurred somewhere along a road with clearly traceable lines of progress but no monumental milestones. The critical point, however, must have been the realization that probabilities were not always to be calculated a priori, as in games of chance, but were measurable constants of the external world. In classical probability theory the probabilities of primitive events were always specified on prior grounds: dice were “fair” in the sense that each side had an equal chance of falling uppermost, cards were shuffled and dealt “at random,” and so on. A good deal of probability theory was concerned with the pure mathematics of deriving the probabilities of complicated contingent events from these more primitive events whose probabilities were known. However, when sampling from an observed frequency distribution, the basic probabilities are not known but are parameters to be estimated. It took some time, perhaps fifty years, for the implications of this notion to be fully realized. Once it was, statistics embraced probability and the subject was poised for the immense development that has occurred over the past century.
Once more, however, we must go back to another contributory subject: insurance, and particularly life insurance. Although some mathematicians, notably Edmund Halley, Abraham de Moivre, and Daniel Bernoulli, made important contributions to demography and insurance studies, for the most part actuarial science pursued a course of its own. The founders of the subject were John Graunt and William Petty. Graunt, spurred on by the information contained in the bills of mortality prepared in connection with the great plague (which hit England in 1665), was the first to reason about demographic material in a modern statistical way. Considering the limitations of his data, his work was a beautiful piece of reasoning. Before long, life tables were under construction and formed the basis of the somewhat intricate calculations of the modern actuary [see Life tables]. In the middle of the eighteenth century, some nations of the Western world began to take systematic censuses of population and to record causes of mortality, an example that was soon followed by all [see Census; Vital statistics]. Life insurance became an exact science. It not only contributed an observable frequency distribution with a clearly defined associated calculus; it also contributed an idea that was to grow into a dynamic theory of probability: the concept of a population moving through time in an evolutionary way. Here and there, too, we find demographic material stimulating statistical studies, for example, in the study of the mysteries of the sex ratio of human births.
1890-1940. If we have to choose a date at which the modern theory of statistics began, we may put it, somewhat arbitrarily, at 1890. Francis Galton was then 68 but still had twenty years of productive life before him. A professor of economics named Francis Ysidro Edgeworth (then age 45) was calling attention to statistical regularities in election results, Greek verse, and the mating of bees and was about to propound a remarkable generalization of the law of error. A young man named Karl Pearson (age 33) had just been joined by the biologist Walter Weldon at University College, London, and was meditating the lectures that ultimately became The Grammar of Science. A student named George Udny Yule, at the age of 20, had caught Pearson's eye. And in that year was born the greatest of them all, Ronald Aylmer Fisher. For the next forty years, notwithstanding Russian work in probability theory (notably the work of Andrei Markov and Aleksandr Chuprov), developments in theoretical statistics were predominantly English. At that point, there was something akin to an intellectual explosion in the United States and India. France was already pursuing an individual line in probability theory under the inspiration of Émile Borel and Paul Lévy, and Italy, under the influence of Corrado Gini, was also developing independently. But at the close of World War II the subject transcended all national boundaries and had become one of the accepted disciplines of the scientific, technological, and industrial worlds.
The world of 1890, with its futile power politics, its class struggles, its imperialism, and its primitive educational system, is far away. But it is still possible to recapture the intellectual excitement with which science began to extend its domain into humanitarian subjects. Life was as mysterious as ever, but it was found to obey laws. Human society was seen as subject to statistical inquiry, as an evolutionary entity under human control. It was no accident that Galton founded the science of eugenics and Karl Pearson took a militant part in some of the social conflicts of his time. Statistical science to them was a new instrument for the exploration of the living world, and the behavioral sciences at last showed signs of structure that would admit of mathematical analysis.
In London, Pearson and Weldon soon began to exhibit frequency distributions in all kinds of fields. Carl Charlier in Sweden, Jacobus Kapteyn and Johan van Uven in Holland, and Vilfredo Pareto in Italy, to mention only a few, contributed results from many different sciences. Pearson developed his system of mathematical curves to fit these observations, and Edgeworth and Charlier began to consider systems based on the sum of terms in a series analogous to a Taylor expansion. It was found that the normal curve did not fit most observed distributions but that it was a fair approximation to many of them.
Relationships between variables. About 1890, Pearson, stimulated by some work of Galton, began to investigate bivariate distributions, that is to say, the distribution in a two-way table of frequencies of members, each of which bore a value of two variables. The patterns, especially in the biological field where data were most plentiful, were equally typical. In much observed material there were relationships between variables, but they were not of a mathematically functional form. The length and breadth of oak leaves, for example, were dependent in the sense that a high value of one tended to occur with a high value of the other. But there was no formula expressing this relationship in the familiar deterministic language of physics. There had to be developed a new kind of relationship to describe this type of connection. In the theory of attributes this led to measures of association and contingency [see Statistics, descriptive]; in the theory of variables it led to correlation and regression [see Linear hypotheses; Multivariate analysis, articles on Correlation].
The theory of statistical relationship, and especially of regression, has been studied continuously and intensively ever since. Most writers on statistics have made contributions at one time or another. The work was still going strong in the middle of the twentieth century. Earlier writers, such as Pearson and Yule, were largely concerned with linear regression, in which the value of one variable is expressed as a linear function of the others plus a random term. Later authors extended the theory to cover several dependent variables and curvilinear cases; and Fisher in particular was instrumental in emphasizing the importance of rendering the explanatory variables independent, so far as possible.
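The linear regression described above can be illustrated with a minimal sketch, offered only as a modern illustration and not as the procedure of Pearson or Yule. It assumes a single explanatory variable and an ordinary least-squares fit; the function name fit_line and the sample figures are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x + random term.

    Returns (intercept, slope, residuals), where the residuals are the
    estimated random terms left over after the linear part is removed.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x about its mean, and sum of cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return intercept, slope, residuals


# Hypothetical leaf measurements (length, breadth), invented for illustration.
lengths = [8.0, 9.5, 11.0, 12.5, 14.0]
breadths = [4.1, 4.6, 5.8, 6.0, 7.2]
intercept, slope, residuals = fit_line(lengths, breadths)
print(f"breadth is roughly {intercept:.2f} + {slope:.2f} * length")
```

The residuals carry the part of the variation that the linear function does not account for, which is the sense in which the relationship is statistical rather than deterministic.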
Sampling. It was not long before statisticians were brought up against a problem that is still, in one form or another, basic to most of their work. In the majority of cases the data with which they were presented were only samples from a larger population. The problems then arose as to how reliable the samples were, how to estimate from them values of parameters describing the parent population, and, in general, what kinds of inference could be based on them.
Some intuitive ideas on the subject occur as far back as the eighteenth century; but the sampling problem, and the possibility of treating it with mathematical precision, was not fully appreciated until the twentieth century.
Classical error theory, especially the work of Carl Friedrich Gauss in the first half of the nineteenth century, had considered sampling distributions of a simple kind. For example, the chi-square distribution arose in 1875 when the German geodesist Friedrich Helmert worked out the distribution of sample variance for sampling from a normal population. The same chi-square distribution was independently rediscovered in 1900 by Karl Pearson in a quite different context, that of testing distributional goodness of fit [see Counted data; Goodness of fit]. In another direction, Pearson developed a wide range of asymptotic formulas for standard errors of sample quantities. The mathematics of many so-called small-sample distribution problems presented difficulties with which Pearson was unable to cope, despite valiant attempts. William Gosset, a student of Pearson's, produced in 1908 one of the most important statistical distributions under the pseudonym of “Student”; and this distribution, arising from a basic small-sample problem, is known as that of Student's t [see Distributions, Statistical].
It was Student and R. A. Fisher (beginning in 1913) who inaugurated a new era in the study of sampling distributions. Fisher himself made major contributions to the subject over the ensuing thirty years. In rapid succession he found the distribution, in samples from a normal population, of the correlation coefficient, regression coefficients, multiple correlation coefficients, and the ratio of variances known as F. Other writers, notably John Wishart in England, Harold Hotelling and Samuel Wilks in the United States, and S. N. Roy and R. C. Bose in India, added a large number of new results, especially in the field of multivariate analysis. More recently, T. W. Anderson has advanced somewhat farther the frontiers of knowledge in this rather difficult mathematical field.
Concurrently with these spectacular mathematical successes in the derivation of sampling distributions, methods were also devised for obtaining approximations. Again R. A. Fisher was in the lead with a paper (1928) introducing the so-called k-statistics, functions of sample values that have simplifying mathematical properties.
The question whether a sampling method is random is a subtle one. It does not always trouble an experimental scientist, when he can select his material by a chance mechanism. However, sometimes the data are provided by nature, and whether they are a random selection from the available population is difficult to determine. In the sampling of human beings difficulties are accentuated by the fact that people may react to the sampling process. As sampling methods spread to the social sciences, the problems of obtaining valid samples at low cost from a wide geographical scatter of human beings became increasingly important, and some new problems of respondent bias arose. In consequence, the sampling of humans for social inquiry has almost developed into a separate subject, dependent partly on psychological matters, such as how questions should be framed to avoid bias, and partly on expense. By 1960 sampling errors in social surveys were well under control; but many problems remained for exploration, notably those of drawing samples of individuals with relatively rare and specialized characteristics, such as retail pharmacists or sufferers from lung cancer. Designing a sample was accepted as just as much a matter of expertise as designing a house [see Interviewing; Sample surveys; Survey analysis].
The control of the sample and the derivation of sampling distributions were, of course, only means to an end, which was the drawing of accurate inferences from the sample that ultimately resulted. We shall say more about the general question of inference below, but it is convenient to notice here the emergence, between 1925 and 1935, of two branches of the subject: the theory of estimation, under the inspiration of Fisher, and the theory of hypothesis testing, under the inspiration of Karl Pearson's son Egon and Jerzy Neyman [see Estimation; Hypothesis testing].
Estimation. Up to 1914 (which, owing to World War I, actually means up to 1920), the then current ideas on estimation from a sample were intuitive and far from clear. For the most part, an estimate was constructed from a sample as though it were being constructed for a population (for example, the sample mean was an “obvious” estimate of the parent population mean). A few writers (Daniel Bernoulli, Laplace, Gauss, Markov, and Edgeworth) had considered the problem, asked the right questions, and sometimes found partial answers. Ideas on the subject were clarified and extended in a notable paper by Fisher (1925). He introduced the concepts of optimal estimators and of efficiency in estimation, and emphasized the importance of the so-called method of maximum likelihood as providing a very general technique for obtaining “best” estimators. These ideas were propounded to a world that was just about ripe for them, and the theory of estimation developed at a remarkable rate in the ensuing decades.
The related problem of gauging the reliability of an estimate, that is, of surrounding it with a band of error (which has associated with it a designated probability) led to two very different lines of development, the confidence intervals of Egon Pearson and Neyman and the “fiducial intervals” of Fisher, both originating between 1925 and 1930 [see Estimation, article on Confidence intervals and regions; Fiducial inference]. The two proceeded fairly amiably side by side for a few years, and at the time it seemed that they were equivalent; they certainly led to the same results in simpler cases. However, it became clear about 1935 that they were conceptually very different, and a great deal of argument developed which had not been resolved even at the time of Fisher's death in 1962. Fortunately the controversy, although embittered, did not impede progress. (Omitted at this point is any discussion of Bayesian methods, which may lead to intervals resembling superficially those of confidence and fiducial approaches; Bayesian methods are mentioned briefly below.) [See Bayesian inference for a detailed discussion.]
Hypothesis testing. In a like manner, the work of Neyman and Pearson (beginning in 1928) on the theory of statistical tests gave a very necessary clarity to procedures that had hitherto been vague and unsatisfactory. In probabilistic terms the older type of inference had been of this type: If a certain hypothesis were true, the probability that I should observe the actual sample that I have drawn, or one more extreme, is very small; therefore the hypothesis is probably untrue. Neyman and Pearson pointed out that a hypothesis could not be tested in vacuo but only in comparison with other hypotheses. They set up a theory of tests and (as in the case of estimation, with which this subject is intimately linked) discussed power, relative efficiency, and optimality of tests. Here also there was some controversy, but for the most part the Neyman-Pearson theory was generally accepted and had become standard practice by 1950.
Experimental design and analysis. Concurrently with developments in sampling theory, estimation, and hypothesis testing, there was growing rapidly, between 1920 and 1940, a theory of experimental design based again on the work of Fisher. Very early in his career, it had become clear to him that in multivariate situations the “explanation” of one variable in terms of a set of independent or explanatory variables was rendered difficult, if not impossible, where correlations existed among the explanatory variables themselves; for it then became impossible to say how much of an effect was attributable to a particular cause. This difficulty, which still bedevils the general theory of regression, could be overcome if the explanatory variables could be rendered statistically independent. (This, incidentally, was the genesis of the use of orthogonal polynomials in curvilinear regression analysis.) Fisher recognized that in experimental situations where the design of the experiment was, within limits, at choice, it could be arranged that the effects of different factors were “orthogonal,” that is, independent, so that they could be disentangled. From this notion, coupled with probabilistic interpretations of significance and the necessary mathematical tests, he built up a most remarkable system of experimental design. The new methods were tested at the Rothamsted Experimental Station in England but were rapidly spread by an active and able group of disciples into all scientific fields.
Some earlier work, particularly by Wilhelm Lexis in Germany at the close of the nineteenth century, had called attention to the fact that in sampling from nonhomogeneous populations the formulas of classical probability were a poor representation of the observed effects. This led to attempts to split the sampling variation into components; one, for example, representing the inevitable fluctuation of sampling, another representing the differences between the sections or subpopulations from which members were drawn. In Fisher's hands these ideas were extended and given precision in what is known as the analysis of variance, one of the most powerful tools of modern statistics. The methods were later extended to cover the simultaneous variation of several variables in the analysis of covariance. [See Linear hypotheses, article on Analysis of variance.]
It may be remarked, incidentally, that the problems brought up by these various developments in theoretical statistics have proved an immense challenge to mathematicians. Many branches of abstract mathematics (invariants, symmetric functions, groups, finite geometries, n-dimensional geometry, as well as the whole field of analysis) have been brought effectively into play in solving practical problems. After World War II the advent of the electronic computer was a vital adjunct to the solution of problems where even the resources of modern mathematics failed. Sampling experiments became possible on a scale never dreamed of before.
Recent developments. So much occurred in the statistical domain between 1920 and 1940 that it is not easy to give a clear account of the various currents of development. We may, however, pause at 1940 to look backward. In Europe, and to a smaller extent in the United States, World War II provided an interregnum, during which much was absorbed and a good deal of practical work was done, but, of necessity, theoretical developments had to wait, at least as far as publication was concerned. The theory of statistical distributions and of statistical relationship had been firmly established by 1940. In sampling theory many mathematical problems had been solved, and methods of approach to outstanding problems had been devised. The groundwork of experimental design had been firmly laid. The basic problems of inference had been explicitly set out and solutions reached over a fairly wide area. What is equally important for the development of the subject, there was about to occur a phenomenal increase in the number of statisticians in academic life, in government work, and in business. By 1945 the subject was ready for decades of vigorous and productive exploration.
Much of this work followed in the direct line of earlier work. The pioneers had left sizable areas undeveloped; and in consequence, work on distribution theory, sampling, and regression analysis continued in fair volume without any fundamental change in concept. Among the newer fields of attention we may notice in particular sequential analysis, decision function theory, multivariate analysis, time series and stochastic processes, statistical inference, and distribution-free, or nonparametric, methods [see Decision theory; Markov chains; Multivariate analysis; Nonparametric statistics; Queues; Sequential analysis; Time series].
Sequential analysis. During World War II it was realized by George Barnard in England and Abraham Wald in the United States that some types of sampling were wasteful in that they involved scrutinizing a sample of fixed size even if the examination of the first few members already indicated the decision to be made. This led to a theory of sequential sampling, in which the sample number is not fixed in advance but at each stage in the sampling a decision is made whether to continue or not. This work was applied with success to the control of the quality of manufactured products, and it was soon also realized that a great deal of scientific inquiry was, in fact, sequential in character. [See Quality control, statistical.]
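A sequential scheme of this kind may be sketched as follows. The sketch assumes one particular rule, a sequential probability ratio test with Wald's approximate stopping boundaries, applied to the inspection of items that are either defective or sound; the article itself does not specify the rule, and the function name sequential_test, the defect rates, and the error levels are all hypothetical choices made for illustration.

```python
import math


def sequential_test(observations, p0, p1, alpha=0.05, beta=0.05):
    """Sequential probability ratio test for a defect rate.

    p0 is the acceptable defect rate and p1 (> p0) the unacceptable one.
    alpha and beta are the nominal risks of wrongly rejecting and wrongly
    accepting. Returns (decision, number_inspected), where decision is
    "accept", "reject", or "continue" if the data ran out first.
    """
    if not 0 < p0 < p1 < 1:
        raise ValueError("need 0 < p0 < p1 < 1")
    # Wald's approximate boundaries on the log likelihood ratio.
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    log_ratio = 0.0
    n = 0
    for n, defective in enumerate(observations, start=1):
        if defective:
            log_ratio += math.log(p1 / p0)
        else:
            log_ratio += math.log((1 - p1) / (1 - p0))
        # After each item, decide whether to stop or to go on sampling.
        if log_ratio >= upper:
            return "reject", n
        if log_ratio <= lower:
            return "accept", n
    return "continue", n


# A run of sound items (0 = sound, 1 = defective) settles the matter early.
print(sequential_test([0] * 20, p0=0.1, p1=0.5))
```

With these illustrative settings a batch of twenty items need not be inspected in full: an unbroken run of sound items, or a couple of early defectives, is enough to stop.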
Decision functions. Wald was led to consider a more general approach, which linked up with Neyman's ideas on hypothesis testing and developed into a theory of decision functions. The basic idea was that at certain stages decisions have to be made, for example, to accept or reject a hypothesis. The object of the theory is to lay down a set of rules under which these decisions can be intelligently made; and, if it is possible to specify penalties for taking wrong decisions, to optimize the method of choice according to some criterion, such as minimizing the risk of loss. The theory had great intellectual attraction and even led some statisticians to claim that the whole of statistics was a branch of decision-function theory, a claim that was hotly resisted in some quarters and may or may not stand up to deeper examination.
Multivariate problems. By 1950 the mathematical development of some branches of statistical theory had, in certain directions, outrun their practical usefulness. This was true of multivariate analysis based on normal distributions. In the more general theory of multivariate problems, several lines of development were pursued. One part of the theory attempts to reduce the number of effective dimensions, especially by component analysis and, as developed by psychologists, factor analysis [see Factor analysis]. Another, known as canonical correlation analysis, attempts to generalize correlation to the relationship between two vector quantities. A third generalizes distribution theory and sampling to multidimensional cases. The difficulties are formidable, but a good deal of progress has been made. One problem has been to find practical data that would bear the weight of the complex analysis that resulted. The high-speed computer may be a valuable tool in further work in this field. [See Computation.]
Time series and stochastic processes. Perhaps the most extensive developments after World War II were in the field of time series and stochastic processes generally. The problem of analyzing a time series has particular difficulties of its own. The system under examination may have a trend present and may have seasonal fluctuations. The classical method of approach was to dissect the series into trend, seasonal movement, oscillatory effects, and residual; but there is always danger that an analysis of this kind is an artifact that does not correspond to the causal factors at work, so that projection into the future is unreliable. Even where trend is absent, or has been abstracted, the analysis of oscillatory movements is a treacherous process. Attempts to apply harmonic analysis to economic data, and hence to elicit “cycles,” were usually failures, owing to the fact that observed fluctuations were not regular in period, phase, or amplitude [see Time series, article on Cycles].
The basic work on time series was done by Yule between 1925 and 1930. He introduced what is now known as an autoregressive process, in which the value of the series at any point is a linear function of certain previous values plus a random residual. The behavior of the series is then determined, so to speak, partly by the momentum of past history and partly by unpredictable disturbance. In the course of this work Yule introduced serial correlations, which measure the relationship between terms of the series separated by specified time intervals. It was later realized that these functions are closely allied to the coefficients that arise in the Fourier analysis of the series.
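An autoregressive process and its serial correlations can be sketched in a few lines. This is an illustration only: the coefficients, the normally distributed disturbance, the zero starting values, and the function names are assumptions made for the example and are not taken from Yule's work.

```python
import random


def simulate_autoregressive(coeffs, n, noise_sd=1.0, seed=0):
    """Generate n terms in which each value is a linear function of the
    preceding len(coeffs) values plus a random residual.

    coeffs[0] multiplies the most recent value, coeffs[1] the one before,
    and so on. The series is started from zeros (an arbitrary choice).
    """
    if not coeffs:
        raise ValueError("need at least one coefficient")
    rng = random.Random(seed)
    p = len(coeffs)
    series = [0.0] * p
    for _ in range(n):
        recent = series[-p:][::-1]  # most recent value first
        momentum = sum(c * x for c, x in zip(coeffs, recent))
        disturbance = rng.gauss(0.0, noise_sd)
        series.append(momentum + disturbance)
    return series[p:]  # drop the artificial starting values


def serial_correlation(series, lag):
    """Correlation between terms of the series separated by `lag` steps."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant")
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom


# Hypothetical second-order scheme that tends to oscillate.
series = simulate_autoregressive([1.1, -0.5], n=500)
print([round(serial_correlation(series, k), 2) for k in range(6)])
```

Printing the serial correlations for successive lags shows how the dependence on past history decays with the separation between terms.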
World War II acted as a kind of incubatory period. Immediately afterward it was appreciated that Yule's method of analyzing oscillatory movements in time series was only part of a much larger field, which was not confined to movements through time. Earlier pioneer work by several writers, notably Louis Bachelier, Eugen Slutsky, and Andrei Markov, was brought together and formed the starting point of a new branch of probability theory. Any system that passes through a succession of states falls within its scope, provided that the transition from one state to the next is decided by a schedule of probabilities and is not purely deterministic. Such systems are known as stochastic processes. A very wide variety of situations falls within their scope, among them epidemics, stock control, traffic movements, and queues. They may be regarded as constituting a probability theory of movement, as distinct from the classical systems in which the generating mechanism behind the observations was constant and the successive observations were independent. From 1945 onward there was a continual stream of papers on the subject, many of which were contributed by Russian and French authors [see Markov chains; Queues].
Some philosophical questions. Common to all this work was a constant re-examination of the logic of the inferential processes involved. The problem of making meaningful statements about the world on the basis of examination of only a small part of it had exercised a series of thinkers from Bacon onward, notably George Boole, John Stuart Mill, and John Venn, but it remained essentially unsolved and was regarded by some as constituting more of a philosophical puzzle than a barrier to scientific advance. The specific procedures proposed by statisticians brought the issue to a head by defining the problem of induction much more exactly, and even by exposing situations where logical minds might well reach different conclusions from the same data. This was intellectually intolerable and necessitated some very searching probing into the rather intuitive arguments by which statisticians drew their conclusions in the earlier stages of development of the subject.
Discussion has been centered on the theory of probability, in which two attitudes may be distinguished: subjective and objective [see Probability, article on Interpretations]. Neither approach is free from difficulty. Both lead to the same calculus of probabilities in the deductive sense. The primary problem, first stated explicitly by Thomas Bayes, however, is one of induction, to which the calculus of probabilities makes no contribution except as a tool of analysis. Some authorities reject the Bayesian approach and seek for principles of inference elsewhere. Others, recognizing that the required prior probabilities necessitate certain assumptions, nevertheless can see no better way of tackling the problem if the relative acceptability of hypotheses is to be quantified at all [see Bayesian inference]. Fortunately for the development of theoretical statistics, the philosophical problems have remained in the background, stimulating argument and a penetrating examination of the inferential process but not holding up development. In practice it is possible for two competent statisticians to differ in the interpretation of data, although if they do, the reliability of the inference is often low enough to justify further experimentation. Such cases are not very frequent, but important instances do occur; a notable one is the interpretation of the undeniable observed relationship between intensity of smoking and cancer of the lung. Differences in interpretation are particularly liable to occur in economic and social investigations because of the difficulty of performing experiments or of isolating causal influences for separate study.
Robustness and nonparametric methods. The precision of inferences in probability is sometimes bought at the expense of rather restrictive assumptions about the population of origin. For example, Student's t-test depends on the supposition that the parent population is normal. Various attempts have been made to give the inferential procedures greater generality by freeing them from these restrictions. For example, certain tests can be shown to be “robust” in the sense that they are not very sensitive to deviations from the basic assumptions [see ERRORS]. Another interesting field is concerned with tests that depend on ranks, order statistics, or even signs, and are very largely independent of the form of the parent population. These so-called distribution-free methods, which are usually easy to apply, are often surprisingly efficient [see Nonparametric statistics].
The frontiers of the subject continue to extend. Problems of statistical relationship, of estimation in complicated models, of quantification and scaling in qualitative material, and of economizing in exploratory effort are as urgent and lively as ever. The theoretical statistician ranges from questions of galactic distribution to the properties of subatomic particles, suspended, like Pascal's man, between the infinitely large and the infinitely small. The greater part of the history of his subject lies in the future.
M. G. Kendall
[The following biographies present further details on specific periods in the history of statistical method. Early Period: Babbage; Bayes; Bernoulli family; Bienaymé; Galton; Gauss; Graunt; Laplace; Moivre; Petty; Poisson; Quetelet; Süssmilch. Modern Period: Benini; Bortkiewicz; Fisher, R. A.; Gini; Girshick; Gosset; Keynes, John Maynard; Körösy; Lexis; Lotka; Pearson; Spearman; Stouffer; Von Mises, Richard; Von Neumann; Wald; Wiener; Wilks; Willcox; Yule.]
There is no history of theoretical statistics or of statistical methodology. Westergaard 1932 is interesting as an introduction but is largely concerned with descriptive statistics. Walker 1929 has some valuable sketches of the formative period under Karl Pearson. Todhunter 1865 is a comprehensive guide to mathematical work up to Laplace and contains bibliographical information on many of the early works cited in the text of this article. David 1962 is a modern and lively account up to the time of de Moivre. The main sources for further reading are in obituaries and series of articles that appear from time to time in statistical journals, especially the “Studies in the History of Probability and Statistics” in Biometrika and occasional papers in the Journal of the American Statistical Association.
Bayes, Thomas (1764) 1958 An Essay Towards Solving A Problem in the Doctrine of Chances. Biometrika 45:296-315. → First published in Volume 53 of the Royal Society of London's Philosophical Transactions. A facsimile edition was published in 1963 by Hafner.
Bernoulli, Jacques (1713) 1899 Wahrscheinlichkeitsrechnung (Ars conjectandi). 2 vols. Leipzig: Engelmann. → First published posthumously in Latin.
Condorcet, Marie Jean Antoine Nicolas Caritat, De 1785 Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. Paris: Imprimerie Royale.
Czuber, Emanuel 1898 Die Entwicklung der Wahrscheinlichkeitstheorie und ihrer Anwendungen. Jahresbericht der Deutschen Mathematikervereinigung, Vol. 7, No. 2. Leipzig: Teubner.
David, F. N. 1962 Games, Gods and Gambling: The Origins and History of Probability and Statistical Ideas From the Earliest Times to the Newtonian Era. London: Griffin; New York: Hafner.
Fisher, R. A. 1925 Theory of Statistical Estimation. Cambridge Philosophical Society, Proceedings 22:700-725. → Reprinted in Fisher 1950.
Fisher, R. A. 1928 Moments and Product Moments of Sampling Distributions. London Mathematical Society, Proceedings 30:199-238. → Reprinted in Fisher 1950.
Fisher, R. A. (1920-1945) 1950 Contributions to Mathematical Statistics. New York: Wiley.
Huygens, Christiaan 1657 De ratiociniis in ludo aleae. Pages 521-534 in Frans van Schooten, Exercitationum mathematicarum. Leiden (Netherlands): Elsevir.
Kotz, Samuel 1965 Statistical Terminology—Russian vs. English—In the Light of the Development of Statistics in the USSR. American Statistician 19, no. 3:14-22.
Laplace, Pierre Simon (1812) 1820 Théorie analytique des probabilités. 3d ed., revised. Paris: Courcier.
Ore, Øystein 1953 Cardano: The Gambling Scholar. Princeton Univ. Press; Oxford Univ. Press. → Includes a translation from the Latin of Cardano's Book on Games of Chance by Sydney Henry Gould.
Pearson, Karl (1892) 1911 The Grammar of Science. 3d ed., rev. & enl. London: Black. → A paperback edition was published in 1957 by Meridian.
Poisson, Siméon Denis 1837 Recherches sur la probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilités. Paris: Bachelier.
Simpson, Thomas 1757 Miscellaneous Tracts on Some Curious and Very Interesting Subjects in Mechanics, Physical-astronomy, and Speculative Mathematics. London: Nourse.
Todhunter, Isaac (1865) 1949 A History of the Mathematical Theory of Probability From the Time of Pascal to That of Laplace. New York: Chelsea.
Walker, Helen M. 1929 Studies in the History of Statistical Method, With Special Reference to Certain Educational Problems. Baltimore: Williams & Wilkins.
Westergaard, Harald L. 1932 Contributions to the History of Statistics. London: King.
"Statistics." International Encyclopedia of the Social Sciences. . Encyclopedia.com. (October 20, 2016). http://www.encyclopedia.com/social-sciences/applied-and-social-sciences-magazines/statistics-0
Statistics is a field of knowledge that enables an investigator to derive and evaluate conclusions about a population from sample data. In other words, statistics allow us to make generalizations about a large group based on what we find in a smaller group. The field of statistics deals with gathering, selecting, and classifying data; interpreting and analyzing data; and deriving and evaluating the validity and reliability of conclusions based on data.
Strictly speaking, the term “parameter” describes a certain aspect of a population, while a “statistic” describes a certain aspect of a sample (a representative part of the population). In common usage, most people use the word “statistic” to refer to research figures and calculations, either from information based on a sample or an entire population.
Statistics means different things to different people. To a baseball fan, statistics are information about a pitcher's earned run average or a batter's slugging percentage or home run count. To a plant manager at a distribution company, statistics are daily reports on inventory levels, absenteeism, labor efficiency, and production. To a medical researcher investigating the effects of a new drug, statistics are evidence of the success of research efforts. And to a college student, statistics are the grades made on all the exams and quizzes in a course during the semester. Today, statistics and statistical analysis are used in practically every profession, and for managers in particular, statistics have become a most valuable tool.
A set of data is a population if decisions and conclusions based on these data can be made with absolute certainty. If population data is available, the risk of arriving at incorrect decisions is completely eliminated.
But a sample is only part of the whole population. For example, statistics from the U.S. Department of Commerce state that the rental vacancy rate during the second quarter of 2006 was 9.6 percent. However, the data used to calculate this vacancy rate was not derived from all owners of rental property, but rather only a segment (“sample” in statistical terms) of the total group (or “population”) of rental property owners. A population statistic is thus a set of measured or described observations made on each elementary unit. A sample statistic, in contrast, is a measure based on a representative group taken from a population.
QUANTITATIVE AND QUALITATIVE STATISTICS
Measurable observations are called quantitative observations. Examples of measurable observations include the annual salary drawn by a BlueCross/BlueShield underwriter or the age of a graduate student in an MBA program. Both are measurable and are therefore quantitative observations.
Observations that cannot be measured are termed qualitative. Qualitative observations can only be described. Anthropologists, for instance, often use qualitative statistics to describe how one culture varies from another. Marketing researchers have increasingly used qualitative statistical techniques to describe phenomena that are not easily measured, but can instead be described and classified into meaningful categories. Here, the distinction between a population of variates (a set of measured observations) and a population of attributes (a set of described observations) is important.
Values assumed by quantitative observations are called variates. These quantitative observations are further classified as either discrete or continuous. A discrete quantitative observation can assume only a limited number of values on a measuring scale. For example, the number of graduate students in an MBA investment class is considered discrete.
Some quantitative observations, on the other hand, can assume an infinite number of values on a measuring scale. These quantitative measures are termed continuous. How consumers feel about a particular brand is a continuous quantitative measure; the exact increments in feelings are not directly assignable to a given number. Consumers may feel more or less strongly about the taste of a hamburger, but it would be difficult to say that one consumer likes a certain hamburger twice as much as another consumer.
DESCRIPTIVE AND INFERENTIAL STATISTICS
Managers can apply some statistical technique to virtually every branch of public and private enterprise. These techniques are commonly separated into two broad categories: descriptive statistics and inferential statistics. Descriptive statistics are typically simple summary figures calculated from a set of observations. Poll results and economic data are commonly seen descriptive statistics. For example, when the American Automobile Association (AAA) reported in May 2008 that average gas prices had topped $4 per gallon in the United States, this was a statistic based on observations of gas prices throughout the United States.
Inferential statistics are used to apply conclusions about one set of observations to reach a broader conclusion or an inference about something that has not been directly observed. For example, inferential statistics could be used to show how strongly correlated gas prices and food prices are.
Data is a collection of any number of related observations. A collection of data is called a data set. Statistical data may consist of a very large number of observations. The larger the number of observations, the greater the need to present the data in a summarized form that may omit some details, but reveals the general nature of a mass of data.
Frequency distribution allows for the compression of data into a table. The table organizes the data into classes or groups of values describing characteristics of the data. For example, students' grade distribution is one characteristic of a graduate class.
A frequency distribution shows the number of observations from the data set that fall into each category describing this characteristic. The relevant categories are defined by the user based on what he or she is trying to accomplish; in the case of grades, the categories might be each letter grade (A, B, C, etc.), pass/fail/incomplete, or grade percentage ranges. If you can determine the frequency with which values occur in each category, you can construct a frequency distribution. A relative frequency distribution presents frequencies in terms of fractions or percentages. The sum of all relative frequencies equals 1.00 or 100 percent.
|Table 1 Frequency Distribution for a Class of 25 M.B.A. Students|
Table 1 illustrates both a frequency distribution and a relative frequency distribution. The frequency distribution gives a break down of the number of students in each grade category ranging from A to F, including “I” for incomplete. The relative frequency distribution takes that number and turns it into a percentage of the whole number.
The table shows us that five out of twenty-five students, or 20 percent, received an A in the class. The two distributions are simply two different ways of presenting the same data. This is an example of one of the advantages of statistics: the same data can be analyzed in several different ways.
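As a rough illustration of how such a table is built, the following Python sketch tallies a frequency distribution and a relative frequency distribution. The grade list is hypothetical (it is not the data behind Table 1) and was chosen only so that the class has 25 students; the function name `frequency_table` is likewise an illustrative choice.

```python
from collections import Counter

# Hypothetical letter grades for a class of 25 students (not the Table 1 data).
grades = list("AAAAA" "BBBBBBBB" "CCCCCCC" "DDD" "F" "I")

def frequency_table(values):
    """Return {category: (count, relative frequency)} for a list of values."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, n / total) for category, n in counts.items()}

table = frequency_table(grades)
for category, (count, share) in table.items():
    # Print each category with its count and its share of the whole class.
    print(f"{category}: {count:2d}  {share:.0%}")
```

The counts in the first column add up to the class size, and the shares in the second column add up to 1, which is a quick check that no observation was dropped or counted twice.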
Decisions and conclusions can often be made with absolute certainty if a single value that describes a certain aspect of a population is determined. As noted earlier, a parameter describes an entire population, whereas a statistic describes only a sample. The following are a few of the most common types of parameter measurements used.
Aggregate Parameter. An aggregate parameter can be computed only for a population of variates. The aggregate is the sum of the values of all the variates in the population. Industry-wide sales is an example of an aggregate parameter.
Proportion. A proportion refers to a fraction of the population that possesses a certain property. The proportion is the parameter used most often in describing a population of attributes, for example, the percentage of employees over age fifty.
Arithmetic Mean. The arithmetic mean is simply the average. It is obtained by dividing the sum of all variates in the population by the total number of variates. The arithmetic mean is used more often than the median and mode to describe the average variate in the population. It best describes values such as the average grade of a graduate student, the average yards gained per carry by a running back, and the average calories burned during a cardiovascular workout. It also has an interesting property: the sum of the deviations of the individual variates from their arithmetic mean is always equal to zero.
Median. The median is another way of determining the “average” variate in the population. It is especially useful when the population has a particularly skewed frequency distribution; in these cases the arithmetic mean can be misleading.
To compute the median for a population of variates, the variates must be arranged first in an increasing or decreasing order. The median is the middle variate if the number of the variates is odd. For example, if you have the distribution 1, 3, 4, 8, and 9, then the median is 4 (while the mean would be 5). If the number of variates is even, the median is the arithmetic mean of the two middle variates. In some cases (under a normal distribution) the mean and median are equal or nearly equal. However, in a skewed distribution where a few large values fall into the high end or the low end of the scale, the median describes the typical or average variate more accurately than the arithmetic mean does.
Consider a population of four people who have annual incomes of $2,000, $2,500, $3,500, and $300,000, an extremely skewed distribution. If we looked only at the arithmetic mean ($77,000), we would conclude that it is a fairly wealthy population on average. By contrast, in observing the median income ($3,000) we would conclude that it is overall a quite poor population, and one with great income disparity. In this example the median provides a much more accurate view of what is “average” in this population because the single income of $300,000 does not accurately reflect the majority of the population.
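The arithmetic in this example can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

incomes = [2_000, 2_500, 3_500, 300_000]  # the four incomes from the example

avg = mean(incomes)    # arithmetic mean: 308,000 divided by 4
mid = median(incomes)  # average of the two middle incomes, 2,500 and 3,500

# Deviations of each income from the mean; they always cancel out.
deviations = [x - avg for x in incomes]

print(avg, mid, sum(deviations))
```

The mean comes to 77,000 and the median to 3,000, as stated above, and the deviations from the mean sum to zero, which is the property of the arithmetic mean noted earlier.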
Mode. The mode is the most frequently appearing variate or attribute in a population. For example, say a class of thirty students is surveyed about their ages. The resulting frequency distribution shows us that ten students are 18 years old, sixteen students are 19 years old, and four are 20 or older. The mode for this group is 19 years, the age reported by sixteen students. In other words, the category with the most students is age 19.
MEASURE OF VARIATION
Another pair of parameters, the range and the standard deviation, measures the disparity among values of the various variates comprising the population. These parameters, called measures of variation, are designed to indicate the degree of uniformity among the variates.
The range is simply the difference between the highest and lowest variate. So, in a population with incomes ranging from $15,000 to $45,000, the range is $30,000 ($45,000 - $15,000 = $30,000).
The standard deviation is an important measure of variation because it lends itself to further statistical analysis and treatment. It measures the typical amount by which variates are spread around the mean. The standard deviation is a versatile tool based on yet another calculation called the variance. The variance for a population reflects how far data points are from the mean: it is the average of the squared deviations from the mean. The variance itself is rarely interpreted directly; it is typically used to calculate other statistics, such as the standard deviation, which is more useful in making sense of the data.
The standard deviation is a simple but powerful adaptation of the variance. It is found simply by taking the square root of the variance. The resulting figure can be used for a variety of analyses. For example, under a normal distribution, a distance of two standard deviations from the mean encompasses approximately 95 percent of the population, and three standard deviations cover 99.7 percent.
Thus, assuming a normal distribution, if a factory produces bolts with a mean length of 7 centimeters (2.8 inches) and the standard deviation is determined to be 0.5 centimeters (0.2 inches), we would know that 95 percent of the bolts fall between 6 centimeters (2.4 inches) and 8 centimeters (3.1 inches) long, and that 99.7 percent of the bolts are between 5.5 centimeters (2.2 inches) and 8.5 centimeters (3.3 inches). This information could be compared to the product specification tolerances to determine what proportion of the output meets quality control standards.
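The bolt intervals follow directly from the mean and the standard deviation, as the sketch below shows. The helper `sigma_interval` and the short list of bolt lengths are hypothetical, introduced only to illustrate the step from variance to standard deviation; the 95 and 99.7 percent figures apply only under the normal distribution assumed in the text.

```python
from statistics import pstdev, pvariance

def sigma_interval(mean_value, std_dev, k):
    """Return the interval from k standard deviations below the mean to k above."""
    return (mean_value - k * std_dev, mean_value + k * std_dev)

# Bolt example from the text: mean 7 cm, standard deviation 0.5 cm.
two_sigma = sigma_interval(7.0, 0.5, 2)    # covers about 95 percent of bolts
three_sigma = sigma_interval(7.0, 0.5, 3)  # covers about 99.7 percent of bolts
print(two_sigma, three_sigma)

# Hypothetical population of five bolt lengths in centimeters.
lengths = [6.5, 7.0, 7.5, 7.0, 7.0]
variance = pvariance(lengths)  # average squared deviation from the mean
std_dev = pstdev(lengths)      # square root of the population variance
print(variance, std_dev)
```

Comparing `two_sigma` or `three_sigma` with the specification tolerances gives a quick estimate of the share of output that falls within tolerance.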
Modern statistics may be regarded as an application of the theory of probability. A set is a collection of well-defined objects called elements of the set. The set may contain a limited or infinite number of elements. The set that consists of all elements in a population is referred to as the universal set.
Statistical experiments are those that contain two significant characteristics. One is that each experiment has several possible outcomes that can be specified in advance. The second is that we are uncertain about the outcome of each experiment. Examples of statistical experiments include rolling a die and tossing a coin. The set that consists of all possible outcomes of an experiment is called a sample space, and each element of the sample space is called a sample point.
Each sample point or outcome of an experiment is assigned a weight that measures the likelihood of its occurrence. This weight is called the probability of the sample point.
Probability is the chance that something will happen. In assigning weights or probabilities to the various sample points, two rules generally apply. The first is that the probability assigned to any sample point ranges from 0 to 1. Assigning a probability of 0 means that something can never happen; a probability of 1 indicates that something will always happen. The second rule is that the sum of probabilities assigned to all sample points in the sample space must be equal to 1 (e.g., in a coin flip, the probabilities are 0.5 for heads and 0.5 for tails).
In probability theory, an event is one or more of the possible outcomes of doing something. If we toss a coin, getting a head is one event and getting a tail is another. The activity that produces such an event is referred to in probability theory as an experiment. Events are said to be mutually exclusive if one, and only one, can take place at a time. When a list of the possible events that can result from an experiment includes every possible outcome, the list is said to be collectively exhaustive. The coin toss experiment is a good example of a collectively exhaustive list: the end result is either a head or a tail.
There are a few theoretical approaches to probability. Two common ones are the classical approach and the relative frequency approach. Classical probability defines the probability that an event will occur as the number of outcomes favorable to the occurrence of the event divided by the total number of possible outcomes. This approach is often not practical to apply in managerial situations because it makes assumptions that are unrealistic for many real-life applications: every possible outcome must be known in advance, and all outcomes must be equally likely. It assumes away situations that are very unlikely, but that could conceivably happen, such as a coin landing on its edge. It also describes only what to expect in the long run. Classical probability assigns a probability of 0.5 to heads on each flip of a fair coin, but this does not mean that ten flips will always produce exactly five heads and five tails; that exact split occurs in only about a quarter of such sequences.
The relative frequency approach is used in the insurance industry. The approach, often called the relative frequency of occurrence, defines probability as the observed relative frequency of an event in a very large number of trials, or the proportion of times that an event occurs in the long run when conditions are stable. It uses past occurrences to help predict future probabilities that the occurrences will happen again.
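The idea of a long-run relative frequency can be sketched with a small simulation. The example below uses simulated die rolls, not insurance data, and the function name, the seed, and the trial counts are all illustrative assumptions. For a fair die the classical approach gives 1/6 for rolling a six, which provides a value to compare against.

```python
import random

def relative_frequency(trials, seed=0):
    """Estimate P(rolling a six) as the share of sixes in simulated die rolls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials

# The estimate tends to settle near 1/6 as the number of trials grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

With only a hundred rolls the estimate can be noticeably off; with many more rolls it tends to settle close to 1/6, which is why the relative frequency approach calls for a very large number of trials under stable conditions.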
Actuaries use high-level mathematical and statistical calculations in order to help determine the risk that some people and some groups might pose to the insurance carrier. They perform these operations in order to get a better idea of how and when situations that would cause customers to file claims and cost the company money might occur. The value of this is that it gives the insurance company an estimate of how much to charge for insurance premiums. For example, customers who smoke cigarettes are in a higher risk group than those who do not smoke. The insurance company charges higher premiums to smokers to make up for the added risk.
The objective of sampling is to select that part which is representative of the entire population. Sample designs are classified into probability samples and nonprobability samples. A sample is a probability sample if each unit in the population is given some chance of being selected. The probability of selecting each unit must be known. With a probability sample, the risk of incorrect decisions and conclusions can be measured using the theory of probability.
A sample is a nonprobability sample when some units in the population are not given any chance of being selected, and when the probability of selecting any unit into the sample cannot be determined or is not known. For this reason, there is no means of measuring the risk of making erroneous conclusions derived from nonprobability samples. Since the reliability of the results of non-probability samples cannot be measured, such samples do not lend themselves to statistical treatment and analysis. Convenience and judgment samples are the most common types of non-probability samples.
Among its many other applications, sampling is used in some manufacturing and distributing settings as a means of quality control. For example, a sample of 5 percent may be inspected for quality from a predetermined number of units of a product. That sample, if drawn properly, should indicate the total percentage of quality problems for the entire population, within a known margin of error (e.g., an inspector may be able to say with 95 percent certainty that the product defect rate is 4 percent, plus or minus 1 percent).
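The text does not say how such a margin of error is calculated. One common textbook method, used here purely as an assumption, is the normal approximation for a sample proportion, in which the margin is the critical value times the square root of p(1 - p)/n. The function names and the value 1.96 for 95 percent confidence are choices made for this sketch, which also ignores any correction for sampling from a small, finite batch.

```python
import math

Z_95 = 1.96  # standard normal critical value for roughly 95 percent confidence

def margin_of_error(defect_rate, sample_size, z=Z_95):
    """Half-width of the normal-approximation interval for a proportion."""
    return z * math.sqrt(defect_rate * (1 - defect_rate) / sample_size)

def sample_size_needed(defect_rate, margin, z=Z_95):
    """Smallest sample size whose margin of error does not exceed `margin`."""
    return math.ceil(z**2 * defect_rate * (1 - defect_rate) / margin**2)

# How many units must be inspected to report "4 percent, plus or minus 1 percent"?
n = sample_size_needed(0.04, 0.01)
print(n, margin_of_error(0.04, n))
```

Under these assumptions, roughly fifteen hundred units would need to be inspected to support the statement in the example; a smaller sample would give a wider margin.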
In many companies, if the defect rate is too high, then the processes and machinery are checked for errors. When the errors are found to be human errors, then a statistical standard is usually set for the acceptable error percentage for laborers.
In sum, samples provide estimates of what we would discover if we knew everything about an entire population. By taking only a representative sample of the population and using appropriate statistical techniques, we can infer certain things, not with absolute precision, but certainly within specified levels of precision.
SEE ALSO Data Processing and Data Management; Forecasting; Models and Modeling; Planning; Statistical Process Control and Six Sigma
"Statistics." Encyclopedia of Management. . Encyclopedia.com. (October 20, 2016). http://www.encyclopedia.com/management/encyclopedias-almanacs-transcripts-and-maps/statistics
Statistics is that branch of mathematics devoted to the collection, compilation, display, and interpretation of numerical data. The term statistics actually has two quite different meanings. In one case, it can refer to any set of numbers that has been collected and then arranged in some format that makes them easy to read and understand. In the second case, the term refers to a variety of mathematical procedures used to determine what those data may mean, if anything.
An example of the first kind of statistic is the data on female African Americans in various age groups, shown in Table 1. The table summarizes some interesting information but does not, in and of itself, seem to have any particular meaning. An example of the second kind of statistic is the data collected during the test of a new drug, shown in Table 2. This table not only summarizes information collected in the experiment, but also, presumably, can be used to determine the effectiveness of the drug.
Populations and samples
Two fundamental concepts used in statistical analysis are population and sample. The term population refers to a complete set of individuals, objects, or events that belong to some category. For example, all of the players who are employed by major league baseball teams make up the population of professional major league baseball players. The term sample refers to some subset of a population that is representative of the total population. For example, one might go down the complete list of all major league baseball players and select every tenth name on the list. That subset of every tenth name would then make up a sample of all professional major league baseball players.
Words to Know
Deviation: The difference between any one measurement and the mean of the set of scores.
Histogram: A bar graph that shows the frequency distribution of a variable by means of solid bars without any space between them.
Mean: A measure of central tendency found by adding all the numbers in a set and dividing by the number of numbers.
Measure of central tendency: Average.
Measure of variability: A general term for any method of measuring the spread of measurements around some measure of central tendency.
Median: The middle value in a set of measurements when those measurements are arranged in sequence from least to greatest.
Mode: The value that occurs most frequently in any set of measurements.
Normal curve: A frequency distribution curve with a symmetrical, bell-shaped appearance.
Population: A complete set of individuals, objects, or events that belong to some category.
Range: The difference between the largest and smallest numbers in a set of observations.
Sample: A subset of actual observations taken from any larger set of possible observations.
Samples are important in statistical studies because it is almost never possible to collect data from all members in a population. For example, suppose one would like to know how many professional baseball players are Republicans and how many are Democrats. One way to answer that question would be to ask that question of every professional baseball player. However, it might be difficult to get in touch with every player and to get every player to respond. The larger the population, the more difficult it is to get data from every member of the population.
Most statistical studies, therefore, select a sample of individuals from a population to interview. One could use, for example, the every-tenth-name list mentioned above to collect data about the political parties to which baseball players belong. That approach would be easier and less expensive than contacting everyone in the population.
The problem with using samples, however, is to be certain that the members of the sample are typical of the members of the population as a whole. If someone decided to interview only those baseball players who live in New York City, for example, the sample would not be a good one. People who live in New York City may have very different political concerns than people who live in the rest of the country.
One of the most important problems in any statistical study, then, is to collect a fair sample from a population. That fair sample is called a random sample because it is arranged in such a way that everyone in the population has an equal chance of being selected. Statisticians have now developed a number of techniques for selecting random samples for their studies.
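The two ways of choosing a sample mentioned in this section can be sketched in Python. The roster below is hypothetical (750 placeholder names standing in for a real list of players), and the fixed seed is there only so that the example gives the same result each time it is run.

```python
import random

# Hypothetical roster standing in for the list of major league players.
roster = [f"player_{i:03d}" for i in range(1, 751)]

# Every tenth name on the list, as in the example given earlier.
every_tenth = roster[9::10]

# Random sample of the same size: each player has an equal chance of selection.
rng = random.Random(42)  # fixed seed only so the example is repeatable
random_sample = rng.sample(roster, k=len(every_tenth))

print(len(every_tenth), every_tenth[:3])
print(len(random_sample), random_sample[:3])
```

Taking every tenth name is simple, but its fairness depends on how the list happens to be ordered. Drawing names at random does not depend on the ordering, which is one reason random selection is the usual standard for a fair sample.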
Once data have been collected on some particular subject, those data must be displayed in some format that makes it easy for readers to see and understand. Table 1 makes it very easy for anyone who wants to know the number of female African Americans in any particular age group.
In general, the most common methods for displaying data are tables and charts or graphs. One of the most common types of graphs used is the display of data as a histogram. A histogram is a bar graph in which each bar represents some particular variable, and the height of each bar represents the number of cases of that variable. For example, one could make a histogram of the information in Table 1 by drawing six bars, one representing each of the six age groups shown in the table. The height of each bar would correspond to the number of individuals in each age group. The bar farthest to the left, representing the age group 0 to 19, would be much higher than any other bar because there are more individuals in that age group than in any other. The bar second from the right would be the shortest because it represents the age group with the fewest numbers of individuals.
Another way to represent data is called a frequency distribution curve. Suppose that the data in Table 1 were arranged so that the number of female African Americans for every age were represented. The table would have to show the number of individuals 1 year of age, those 2 years of age, those 3 years of age, and so on to the oldest living female African American. One could also make a histogram of these data. But a more efficient way would be to draw a line graph with each point on the graph standing for the number of individuals of each age. Such a graph would be called a frequency distribution curve because it shows the frequency (number of cases) for each different category (age group, in this case).
Many phenomena produce distribution curves that have a very distinctive shape, high in the middle and sloping off to either side. These distribution curves are sometimes called "bell curves" because their shape resembles a bell. For example, suppose you record the weights of 10,000 American 14-year-old boys. You would probably find that the largest number of those boys had a weight of perhaps 130 pounds. A smaller number might have weights of 150 or 110 pounds, a still smaller number, weights of 170 or 90 pounds, and very few boys would have weights of 190 or 70 pounds. The graph you get for this measurement probably has a peak at the center (around 130 pounds) with downward slopes on either side of the center. This graph would reflect a normal distribution of weights.
Table 1. Number of Female African Americans in Various Age Groups
Table 2. Statistics
Other phenomena do not exhibit normal distributions. At one time in the United States, the grades received by students in high school followed a normal distribution. The most common grade by far was a C, with fewer Bs and Ds, and fewer still As and Fs. In fact, grade distribution has for many years been used as an example of normal distribution.
Today, however, that situation has changed. The majority of grades received by students in high schools tend to be As and Bs, with fewer Cs, Ds and Fs. A distribution that is lopsided on one side or the other of the center of the graph is said to be a skewed distribution.
Measures of central tendency
Once a person has collected a mass of data, these data can be manipulated by a great variety of statistical techniques. Some of the most familiar of these techniques fall under the category of measures of central tendency. A measure of central tendency describes the average, or typical value, of a set of data. The problem is that the term average can have different meanings: mean, median, and mode among them.
In order to understand the differences of these three measures, consider a classroom consisting of only six students. A study of the six students shows that their family incomes are as follows: $20,000; $25,000; $20,000; $30,000; $27,500; and $150,000. What is the average income for the students in this classroom?
The measure of central tendency that most students learn in school is the mean. The mean for any set of numbers is found by adding all the numbers and dividing by the number of numbers. In this example, the mean would be equal to ($20,000 + $25,000 + $20,000 + $30,000 + $27,500 + $150,000) ÷ 6 = $272,500 ÷ 6 ≈ $45,417.
But how much useful information does this answer give about the six students in the classroom? The mean that has been calculated ($45,417) is greater than the family income of five of the six students. Another way of calculating central tendency is known as the median. The median value of a set of measurements is the middle value when the measurements are arranged in order from least to greatest. When there is an even number of measurements, the median is halfway between the middle two measurements. In the above example, the measurements can be rearranged from least to greatest: $20,000; $20,000; $25,000; $27,500; $30,000; $150,000. In this case, the middle two measurements are $25,000 and $27,500, and halfway between them is $26,250, the median in this case. You can see that the median in this example gives a better view of the family incomes for the classroom than does the mean.
A third measure of central tendency is the mode. The mode is the value most frequently observed in a study. In the household income study, the mode is $20,000 since it is the value found most often in the study. Each measure of central tendency has certain advantages and disadvantages and is used, therefore, under certain special circumstances.
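The three measures can be checked with a few lines of code. The following is a minimal sketch using Python's standard `statistics` module and the six family incomes from the example; the variable names are illustrative only.

```python
from statistics import mean, median, mode

# Family incomes of the six students, taken from the example above.
incomes = [20_000, 25_000, 20_000, 30_000, 27_500, 150_000]

income_mean = mean(incomes)      # sum of all values divided by how many there are
income_median = median(incomes)  # halfway between the two middle values (even count)
income_mode = mode(incomes)      # most frequently observed value

print(f"mean:   ${income_mean:,.0f}")
print(f"median: ${income_median:,.0f}")
print(f"mode:   ${income_mode:,.0f}")
```

The single very large income pulls the mean well above the other two measures, which is why the median or mode is often preferred for lopsided data such as incomes.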
Measures of variability
Suppose that a teacher gave the same test five different times to two different classes and obtained the following results: Class 1: 80 percent, 80 percent, 80 percent, 80 percent, 80 percent; Class 2: 60 percent, 70 percent, 80 percent, 90 percent, 100 percent. If you calculate the mean for both sets of scores, you get the same answer: 80 percent. But the collection of scores from which this mean was obtained was very different in the two cases. The way that statisticians have of distinguishing cases such as this is known as measuring the variability of the sample. As with measures of central tendency, there are a number of ways of measuring the variability of a sample.
Probably the simplest method for measuring variability is to find the range of the sample, that is, the difference between the largest and smallest observation. The range of measurements in Class 1 is 0, and the range in class 2 is 40 percent. Simply knowing that fact gives a much better understanding of the data obtained from the two classes. In class 1, the mean was 80 percent, and the range was 0, but in class 2, the mean was 80 percent, and the range was 40 percent.
Other measures of variability are based on the difference between any one measurement and the mean of the set of scores. This measure is known as the deviation. As you can imagine, the greater the difference among measurements, the greater the variability. In the case of Class 2 above, the deviation for the first measurement is 20 percent (80 percent − 60 percent), and the deviation for the second measurement is 10 percent (80 percent − 70 percent).
Probably the most common measures of variability used by statisticians are called the variance and standard deviation. Variance is defined as the mean of the squared deviations of a set of measurements. Calculating the variance is a somewhat complicated task. One has to find each of the deviations in the set of measurements, square each one, add all the squares, and divide by the number of measurements. In the example above, the variance would be equal to [(20)² + (10)² + (0)² + (10)² + (20)²] ÷ 5 = 1,000 ÷ 5 = 200.
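The variance calculation for Class 2 can be reproduced step by step. This is a minimal sketch in Python; it uses signed deviations (score minus mean), whereas the text quotes only their sizes, and it divides by the number of measurements as the text does. Many textbooks divide by one less than that number when the data are a sample from a larger population, which is what `statistics.variance` does.

```python
from statistics import mean, pvariance

class_2 = [60, 70, 80, 90, 100]  # Class 2 scores (percent) from the example

center = mean(class_2)
deviations = [score - center for score in class_2]  # signed distance from the mean
squared = [d ** 2 for d in deviations]               # squaring removes the sign
variance = sum(squared) / len(class_2)               # divide by n, as in the text

# Cross-check against the standard library's population variance.
assert variance == pvariance(class_2)
print(deviations, variance)
```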
For a number of reasons, the variance is used less often in statistics than is the standard deviation. The standard deviation is the square root of the variance, in this case, √200 = 14.1. The standard deviation is useful because in any normal distribution, a large fraction of the measurements (about 68 percent) are located within one standard deviation of the mean. Another 27 percent (for a total of 95 percent of all measurements) lie within two standard deviations of the mean.
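Both the standard deviation of the Class 2 scores and the quoted fractions for a normal distribution can be checked numerically. The sketch below relies only on Python's standard `statistics` module; `NormalDist` here models an idealized normal distribution, not the five test scores.

```python
from statistics import NormalDist, pstdev

class_2 = [60, 70, 80, 90, 100]   # Class 2 scores (percent) from the example
std_dev = pstdev(class_2)         # square root of the population variance

# Fraction of a normal distribution lying within one and two
# standard deviations of its mean.
standard_normal = NormalDist(mu=0, sigma=1)
within_one = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_two = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(f"standard deviation: {std_dev:.1f}")
print(f"within 1 sd: {within_one:.1%}, within 2 sd: {within_two:.1%}")
```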
Other statistical tests
Many other kinds of statistical tests have been invented to find out the meaning of data. Look at the data presented in Table 2. Those data were collected in an experiment to see if a new kind of drug was effective in curing a disease. The people in the experimental group received the drug, while those in the control group received a placebo, a pill that looked like the drug but contained nothing more than starch. The table shows the number of people who got better ("Improved") and those who didn't ("Not Improved") in each group. Was the drug effective in curing the disease?
You might try to guess the answer to that question just by looking at the table. But is the number 62 in the Experimental Group really significantly greater than the 45 in the Control Group? Statisticians use the term significant to indicate that some result has occurred more often than might be expected on the basis of chance alone.
Statistical tests have been developed to answer this question mathematically. In this example, the test is based on the fact that each group was made up of 100 people. Purely on the basis of chance alone, then, one might expect 50 people in each group to get better and 50 not to get better. If the data show results different from that distribution, the results could have been caused by the new drug.
The mathematical problem, then, is to compare the 62 observed in the first cell with the 50 expected, the 38 observed in the second cell with the 50 expected, the 45 observed in the third cell with the 50 expected, and the 55 observed in the fourth cell with the 50 expected.
At first glance, it would appear that the new medication was at least partially successful since the number of those who took it and improved (62) was greater than the number who took it and did not improve (38). But a type of statistical test called a chi square test will give a more precise answer. The chi square test involves comparing the observed frequencies in Table 2 with a set of expected frequencies that can be calculated from the number of individuals taking the tests. The value of chi square calculated can then be compared to values in a table to see how likely the results were due to chance or to some real effect of the medication.
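The chi square statistic is the sum, over all cells, of the squared difference between observed and expected counts divided by the expected count. The sketch below computes it in Python for the four cell counts quoted in the text, in two ways: with the simplified expectation of 50 per cell used above, and with expected counts derived from the row and column totals, which is the more usual convention for a table of this kind. The critical value 3.841 is the standard table value for one degree of freedom at the 5 percent level; the function and variable names are illustrative.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Cell counts quoted in the text, in the order: experimental improved,
# experimental not improved, control improved, control not improved.
observed = [62, 38, 45, 55]

# Expected counts as simplified in the text: 50 in every cell.
chi2_simplified = chi_square(observed, [50, 50, 50, 50])

# Expected counts from the table's own totals:
# (row total * column total) / grand total for each cell.
row_totals = [62 + 38, 45 + 55]
col_totals = [62 + 45, 38 + 55]
grand_total = sum(observed)
expected = [r * c / grand_total for r in row_totals for c in col_totals]
chi2_independence = chi_square(observed, expected)

CRITICAL_5_PERCENT_1_DF = 3.841  # standard chi square table value
print(chi2_simplified, chi2_independence)
print(chi2_independence > CRITICAL_5_PERCENT_1_DF)
```

A statistic larger than the critical value indicates that a difference this large would arise by chance less than 5 percent of the time if the drug had no effect.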
Another common technique used for analyzing numerical data is called the correlation coefficient. The correlation coefficient shows how closely two variables are related to each other. For example, many medical studies have attempted to determine the connection between smoking and lung cancer. The question is whether heavy smokers are more likely to develop lung cancer.
One way to do such studies is to measure the amount of smoking a person has done in her or his lifetime and compare it with the rate of lung cancer among those individuals. A mathematical formula allows the researcher to calculate the correlation coefficient between these two sets of data (rate of smoking and risk for lung cancer). That coefficient can range between 1.0, meaning the two are perfectly correlated, and −1.0, meaning the two have a perfect inverse relationship (when one is high, the other is low); a value near 0 means the two show little or no linear relationship.
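The formula most often meant by "correlation coefficient" is Pearson's r. The following Python sketch implements it directly; the smoking and cancer figures are invented purely for illustration and do not come from any study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(spread_x * spread_y)

# Hypothetical data, invented for this illustration only.
years_smoked = [0, 5, 10, 20, 30, 40]
cases_per_1000 = [1, 2, 4, 7, 11, 15]
r = pearson_r(years_smoked, cases_per_1000)

# The two extremes described in the text.
r_perfect = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # rises in lockstep
r_inverse = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # falls in lockstep

print(round(r, 3), r_perfect, r_inverse)
```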
The correlation test is a good example of the limitations of statistical analysis. Suppose that the correlation coefficient in the example above turned out to be 1.0. That number would mean that people who smoke the most are always the most likely to develop lung cancer. But what the correlation coefficient does not say is what the cause and effect relationship, if any, might be. It does not say that smoking causes cancer.
Chi square and correlation coefficient are only two of dozens of statistical tests now available for use by researchers. The specific kinds of data collected and the kinds of information a researcher wants to obtain from these data determine the specific test to be used.
"Statistics." UXL Encyclopedia of Science. Encyclopedia.com. (October 20, 2016). http://www.encyclopedia.com/science/encyclopedias-almanacs-transcripts-and-maps/statistics
STATISTICS, the scientific discipline that deals with the collection, classification, analysis, and interpretation of numerical facts or data, was invented primarily in the nineteenth and twentieth centuries in Western Europe and North America. In the eighteenth century, when the term came into use, "statistics" referred to a descriptive analysis of the situation of a political state—its people, resources, and social life. In the early nineteenth century, the term came to carry the specific connotation of a quantitative description and analysis of the various aspects of a state or other social or natural phenomenon. Many statistical associations were founded in the 1830s, including the Statistical Society of London (later the Royal Statistical Society) in 1833 and the American Statistical Association in 1839.
Early Use of Statistics
Although scientific claims were made for the statistical enterprise almost from the beginning, it had few characteristics of an academic discipline before the twentieth century, except as a "state science" or Staatswissenschaft in parts of central Europe. The role of statistics as a tool of politics, administration, and reform defined its character in the United States throughout the nineteenth century. Advocates of statistics, within government and among private intellectuals, argued that their new field would supply important political knowledge. Statistics could provide governing elites with concise, systematic, and authoritative information on the demographic, moral, medical, and economic characteristics of populations. In this view, statistical knowledge was useful, persuasive, and hence powerful, because it could capture the aggregate and the typical, the relationship between the part and the whole, and when data were available, their trajectory over time. It was particularly appropriate to describe the new arrays of social groups in rapidly growing, industrializing societies, the character and trajectory of social processes in far-flung empires, and the behavior and characteristics of newly mobilized political actors in the age of democratic revolutions.
One strand in this development was the creation of data sets and the development of rules and techniques of data collection and classification. In America, the earliest statistical works were descriptions of the American population and economy dating from the colonial period. British officials watched closely the demographic development of the colonies. By the time of the American Revolution (1775–1783), colonial leaders were aware of American demographic realities, and of the value of statistics. To apportion the tax burden and raise troops for the revolution, Congress turned to population and wealth measures to assess the differential capacities among the colonies. In 1787, the framers institutionalized the national population census to apportion seats among the states in the new Congress, and required that statistics on revenues and expenditures of the national state be collected and published by the new government. Almanacs, statistical gazetteers, and the routine publication of numerical data in the press signaled the growth of the field. Government activities produced election numbers, shipping data from tariff payments, value of land sales, and population distributions. In the early nineteenth century, reform organizations and the new statistical societies published data on the moral status of the society in the form of data on church pews filled, prostitutes arrested, patterns of disease, and drunkards reformed. The collection and publication of statistics thus expanded in both government and private organizations.
Professionalization of Statistics
The professionalization of the discipline began in the late nineteenth century. An International Statistical Congress, made up of government representatives from many states, met for the first time in 1853 and set about the impossible task of standardizing statistical categories across nations. In 1885, a new, more academic organization was created, the International Statistical Institute. Statistical work grew in the new federal agencies such as the Departments of Agriculture and Education in the 1860s and 1870s. The annual Statistical Abstract of the United States first appeared in 1878. The states began to create bureaus of labor statistics to collect data on wages, prices, strikes, and working conditions in industry, the first in Massachusetts in 1869; the federal Bureau of Labor, now the Bureau of Labor Statistics, was created in 1884. Statistical analysis became a university subject in the United States with Richmond Mayo Smith's text and course at Columbia University in the 1880s. Governments created formal posts for "statisticians" in government service, and research organizations devoted to the development of the field emerged. The initial claims of the field were descriptive, but soon, leaders also claimed the capacity to draw inferences from data.
Throughout the nineteenth century, a strong statistical ethic favored complete enumerations whenever possible, to avoid what seemed the speculative excess of early modern "political arithmetic." In the first decades of the twentieth century, there were increasingly influential efforts to define formal procedures of sampling. Agricultural economists in the U.S. Department of Agriculture were pioneers of such methods. By the 1930s, sampling was becoming common in U.S. government statistics. Increasingly, this was grounded in the mathematical methods of probability theory, which favored random rather than "purposive" samples. A 1934 paper by the Polish-born Jerzy Neyman, who was then in England but would soon emigrate to America, helped to define the methods of random sampling. At almost the same time, a notorious failure of indiscriminate large-scale polling in the 1936 election—predicting a landslide victory by Alf Landon over Franklin D. Roosevelt—gave credence to the more mathematical procedures.
Tools and Strategies
The new statistics of the twentieth century was defined not by an object of study—society—nor by counting and classifying, but by its mathematical tools, and by strategies of designing and analyzing observational and experimental data. The mathematics was grounded in an eighteenth-century tradition of probability theory, and was first institutionalized as a mathematical statistics in the 1890s by the English biometrician and eugenicist Karl Pearson. The other crucial founding figure was Sir R. A. Fisher, also an English student of quantitative biology and eugenics, whose statistical strategies of experimental design and analysis date from the 1920s. Pearson and Fisher were particularly influential in the United States, where quantification was associated with Progressive reforms of political and economic life. A biometric movement grew up in the United States under the leadership of scientists such as Raymond Pearl, who had been a postdoctoral student in Pearson's laboratory in London. Economics, also, was highly responsive to the new statistical methods, and deployed them to find trends, correlate variables, and detect and analyze business cycles. The Cowles Commission, set up in 1932 and housed at the University of Chicago in 1939, deployed and created statistical methods to investigate the causes of the worldwide depression of that decade. An international Econometric Society was established at about the same time, in 1930, adapting its name from Pearson's pioneering journal Biometrika.
Also prominent among the leading statistical fields in America were agriculture and psychology. Both had statistical traditions reaching back into the nineteenth century, and both were particularly receptive to new statistical tools. Fisher had worked out his methods of experimental design and tests of statistical significance with particular reference to agriculture. In later years he often visited America, where he was associated most closely with a statistical group at Iowa State University led by George Snedecor. The agriculturists divided their fields into many plots and assigned them randomly to experimental and control groups in order to determine, for example, whether a fertilizer treatment significantly increased crop yields. This strategy of collective experiments and randomized treatment also became the model for much of psychology, and especially educational psychology, where the role of the manure (the treatment) was now filled by novel teaching methods or curricular innovations to test for differences in educational achievement. The new experimental psychology was closely tied to strategies for sorting students using tests of intelligence and aptitude in the massively expanded public school systems of the late nineteenth and early twentieth centuries.
The methods of twentieth-century statistics also had a decisive role in medicine. The randomized clinical trial was also in many ways a British innovation, exemplified by a famous test of streptomycin in the treatment of tuberculosis just after World War II (1939–1945). It quickly became important also in America, where medical schools soon began to form departments of biostatistics. Statistics provided a means to coordinate treatments by many physicians in large-scale medical trials, which provided, in effect, a basis for regulating the practice of medicine. By the 1960s, statistical results had acquired something like statutory authority in the evaluation of pharmaceuticals. Not least among the sources of their appeal was the presumed objectivity of their results. The "gold standard" was a thoroughly impersonal process: a well-designed experiment generating quantitative results that could be analyzed by accepted statistical methods to give an unambiguous result.
Historical analysis was fairly closely tied to the field of statistics in the nineteenth century, when statistical work focused primarily on developing data and information systems to analyze "state and society" questions. Carroll Wright, first Commissioner of Labor, often quoted August L. von Schloezer's aphorism that "history is statistics ever advancing, statistics is history standing still." The twentieth-century turn in statistics to experimental design and the analysis of biological processes broke that link, which was tenuously restored with the development of cliometrics, or quantitative history, in the 1960s and 1970s. But unlike the social sciences of economics, political science, psychology, and sociology, the field of history did not fully restore its relationship with statistics, for example, by making such training a graduate degree requirement. Thus the use of statistical analysis and "statistics" in the form of data in historical writing has remained a subfield of American historical writing, as history has eschewed a claim to being a "scientific" discipline.
Statistics as a field embraces the scientific ideal. That ideal, which replaces personal judgment with impersonal law, resonates with an American political tradition reaching back to the eighteenth century. The place of science, and especially statistics, as a source of such authority grew enormously in the twentieth century, as a greatly expanded state was increasingly compelled to make decisions in public, and to defend them against challenges.
Anderson, Margo. The American Census: A Social History. New Haven, Conn.: Yale University Press, 1988.
Cassedy, James H. American Medicine and Statistical Thinking, 1800–1860. Cambridge, Mass.: Harvard University Press, 1984.
Cohen, Patricia Cline. A Calculating People: The Spread of Numeracy in Early America. Chicago: University of Chicago Press, 1982.
Cullen, M. J. The Statistical Movement in Early Victorian Britain: The Foundations of Empirical Social Research. New York: Barnes and Noble, 1975.
Curtis, Bruce. The Politics of Population: State Formation, Statistics, and the Census of Canada, 1840–1875. Toronto: University of Toronto Press, 2001.
Desrosières, Alain. The Politics of Large Numbers: A History of Statistical Reasoning (English translation of Alain Desrosières's 1993 study, La politique des grands nombres: Histoire de la raison statistique). Cambridge, Mass.: Harvard University Press, 1998.
Gigerenzer, G., et al. The Empire of Chance: How Probability Changed Science and Everyday Life. Cambridge, U.K.: Cambridge University Press, 1989.
Glass, D. V. Numbering the People: The Eighteenth-Century Population Controversy and the Development of Census and Vital Statistics in Britain. New York: D.C. Heath, 1973.
Marks, Harry M. The Progress of Experiment: Science and Therapeutic Reform in the United States, 1900–1990. New York: Cambridge University Press, 1997.
Morgan, Mary S. The History of Econometric Ideas. New York: Cambridge University Press, 1990.
Patriarca, Silvana. Numbers and Nationhood: Writing Statistics in Nineteenth-Century Italy. New York: Cambridge University Press, 1996.
Porter, Theodore M. The Rise of Statistical Thinking, 1820–1900. Princeton, N.J.: Princeton University Press, 1986.
———. Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. Princeton, N.J.: Princeton University Press, 1995.
Stigler, Stephen M. The History of Statistics: The Measurement of Uncertainty Before 1900. Cambridge, Mass.: Belknap Press of Harvard University Press, 1986.
———. Statistics on the Table: The History of Statistical Concepts and Methods. Cambridge, Mass.: Harvard University Press, 1999.
See also Census, Bureau of the; Demography and Demographic Trends.
"Statistics." Dictionary of American History. Encyclopedia.com. (October 20, 2016). http://www.encyclopedia.com/history/dictionaries-thesauruses-pictures-and-press-releases/statistics
Statistics is a discipline that deals with data: summarizing them, organizing them, finding patterns, and making inferences. Prior to 1850 the word statistics simply referred to sets of facts, usually numerical, that described aspects of the state; that meaning is still seen in the various sets of government statistics, for example the monthly report on the nation’s unemployment rate and the voluminous tables produced in the wake of each decennial census. During the twentieth century, as a result of the work of Karl Pearson, Ronald Fisher, Jerzy Neyman, Egon Pearson, John Tukey, and others, the term came to be used much more broadly to include theories and techniques for the presentation and analysis of such data and for drawing inferences from them. Two works by Stephen Stigler, The History of Statistics: The Measurement of Uncertainty before 1900 (1986) and Statistics on the Table: The History of Statistical Concepts and Methods (1999), offer broad and readable accounts of the history of statistics.
Although often taught in departments of mathematics, statistics is much more than a branch of applied mathematics. It uses the mathematics of probability theory in many of its applications and finite mathematics and the calculus to derive many of its basic theoretical concepts, but it is a separate discipline that requires its practitioners to understand data as well as mathematics.
In a sense, statistics is mainly concerned with variability. If every object of the same class were the same, we would have no need for statistics. If all peas were indeed alike, we could measure just one and know all about peas. If all families reacted similarly to an income supplement, we would have no need to mount a large-scale negative income tax experiment. If all individuals held the same opinion on an issue of the day, we would only need to ask one person’s opinion and we would need to take no particular care in how we chose that person. Variability, however, is a fact of life, and so statistics is needed to help reveal patterns in the face of variability.
Statistics is used in the collection of data in several ways. If the data are to be collected via an experiment, statistical theory directs how to design that experiment in such a way that it will yield maximum information. The principles of replication (to allow the measurement of variability), control (to eliminate known sources of extraneous variability), and randomization (to “even out” unknown sources of variation) as enunciated by Fisher in his 1935 book The Design of Experiments help ensure that if differences are found between those receiving experimental treatment(s) and those in control group(s), those differences can be attributed to the treatment(s) rather than to preexisting differences between the groups or to experimental error. If the data are to be collected via a sample survey, the principles of probability sampling ensure that the findings can be generalized to the population from which the sample was drawn. Variations on simple random sampling (which is analogous to drawing numbers out of a hat) take advantage of known properties of a population in order to make the sampling more efficient. The technique of stratified sampling is analogous to blocking in experimental design and takes advantage of similarities in units of the population to control variability.
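The difference between simple random sampling and stratified sampling can be sketched in a few lines of Python. The population below is hypothetical (700 "urban" and 300 "rural" units, invented for this illustration), the helper name `stratified_sample` is not from any library, and the fixed seed is there only to make the run repeatable.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 units, each tagged with a known stratum.
population = [("urban", i) for i in range(700)] + [("rural", i) for i in range(300)]

# Simple random sample: every unit has the same chance of selection,
# like drawing numbers out of a hat.
simple_sample = rng.sample(population, 100)

def stratified_sample(units, key, fraction, rng):
    """Draw the same fraction at random from each stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(key(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

stratified = stratified_sample(population, key=lambda u: u[0], fraction=0.1, rng=rng)

urban_simple = sum(1 for stratum, _ in simple_sample if stratum == "urban")
urban_stratified = sum(1 for stratum, _ in stratified if stratum == "urban")
print(urban_simple, urban_stratified)
```

The simple random sample contains roughly, but not exactly, 70 urban units; the stratified sample contains exactly 70 by construction, which removes one source of sample-to-sample variability.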
Once data are collected, via experiments, sample surveys, censuses, or other means, they rarely speak for themselves. There is variability, owing to the intrinsic variability of the units themselves or to their reactions to the experimental treatments, or to errors made in the measuring process itself. Statistical techniques for measuring the central tendency of a variable (e.g., means, medians) clear away variability and make it possible to view patterns and make comparisons across groups. Measures of the variability of a variable (e.g., ranges and standard deviations) give information on the spread of the data—within a group and in comparisons between groups. There are also summarization techniques of correlation and regression to display the patterns of relations between variables—for example, how does a nation’s GDP per capita relate to its literacy rate? These numerical techniques work hand in hand with graphical techniques (e.g., histograms, scattergrams) to reveal patterns in the data. Indeed, using numerical summaries without examining graphical representations of the data can often be misleading. Of course, there are many more complicated and sophisticated summary measures (e.g., multiple regression) and graphical techniques (e.g., residual plots) that aid in the summarization of data. Much of modern data analysis, especially as developed by John Tukey, relies on less conventional measures, on transformations of data, and on novel graphical techniques. Such procedures as correspondence analysis and data mining harness the power of modern computing to search for patterns in very large datasets.
Perhaps the most important use of statistics, however, is in making inferences. One is rarely interested merely in reactions of subjects in an experiment or the answers from members of a sample; instead one wishes to make generalizations to people who are like the experimental subjects or inferences about the population from which the sample was drawn. There are two major modes of making such inference.
Classical or frequentist inference (the mode that has been most often taught and used in the social sciences) conceptualizes the current experiment or sample as one from an infinite number of such procedures carried out in the same way. It then uses the principles codified by Fisher and refined by Neyman and Pearson to ask whether the differences found in an experiment or from a sample survey are sufficiently large to be unlikely to have happened by mere chance. Specifically it takes the stance of positing a null hypothesis that is the opposite of what the investigator believes to be true and has set out to prove. If the outcome of the experiment (or the sample quantity) or one more extreme is unlikely to have occurred if the null hypothesis is true, then the null hypothesis is rejected. Conventionally if the probability of the outcome (or one more extreme) occurring when the null hypothesis is true is less than .05 (or sometimes .01), then the result is declared “statistically significant.”
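The logic of "the outcome or one more extreme" can be made concrete with an exact binomial calculation. The Python sketch below uses a hypothetical experiment (60 successes in 100 trials, tested against a null hypothesis of a 50 percent success rate); the figures and the function name are invented for illustration.

```python
from math import comb

def binomial_upper_tail(successes, trials, p_null):
    """P(X >= successes) when X follows Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical outcome: 60 successes in 100 trials.
# Null hypothesis: the true success rate is 0.5.
p_value = binomial_upper_tail(60, 100, 0.5)
significant = p_value < 0.05   # conventional threshold mentioned in the text

print(f"p-value: {p_value:.4f}, significant at .05: {significant}")
```

This is a one-sided calculation; a two-sided test would also count outcomes equally far below 50.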
Frequentists also carry out estimation by putting a confidence interval around a quantity measured from the sample to infer what the corresponding quantity in the population is. For example, if a sample survey reports the percentage in the sample who favor a particular candidate to be 55 percent and gives a 95 percent confidence interval as 52 to 58 percent, the meaning is that a procedure has been followed that gives an interval that covers the true population percent 95 percent of the time. The frequentist does not know (and is not able to put a probability on) whether in any particular case the interval covers the true population percent—the confidence is in the procedure, not in the interval itself. Further, the interval takes into account only what is known as sampling error, the variation among the conceptually infinite number of replications of the current procedure. It does not take into account non-sampling error arising from such problems in data collection as poorly worded questions, nonresponse, and attrition from a sample.
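The interval in the survey example, and the sense in which "the confidence is in the procedure," can be illustrated with a short simulation. The Python sketch below uses the common normal-approximation interval for a proportion. The sample size of 1,056 is an assumption chosen so that the interval comes out near 52 to 58 percent (the text does not give one), and the "true" population percentage is known only because the surveys are simulated.

```python
import random
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, level=0.95):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95 percent
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

ASSUMED_N = 1056   # assumption: the text does not state the sample size
low, high = proportion_interval(0.55, ASSUMED_N)
print(f"95% interval: {low:.1%} to {high:.1%}")

# Repeat the whole procedure on many simulated surveys and count how
# often the resulting interval covers the true value.
rng = random.Random(1)
TRUE_P = 0.55          # assumed population value for the simulation
REPLICATIONS = 1000
covered = 0
for _ in range(REPLICATIONS):
    favor = sum(rng.random() < TRUE_P for _ in range(ASSUMED_N))
    rep_low, rep_high = proportion_interval(favor / ASSUMED_N, ASSUMED_N)
    covered += rep_low <= TRUE_P <= rep_high
coverage = covered / REPLICATIONS
print(f"share of intervals covering the true value: {coverage:.1%}")
```

Any single interval either covers the true value or does not; it is the long-run share of covering intervals that sits near 95 percent.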
In order for these mechanisms of classical statistics to be used appropriately, a probability mechanism (probability sampling or randomization) must have been used to collect the data. In the social sciences this caution is often ignored; statistical inference is performed on data collected via non-probabilistic means and even on complete enumerations. There is little statistical theory to justify such applications, although superpopulation models are sometimes invoked to justify them and social scientists often argue that the means by which the data were accumulated resemble a random process.
Since the 1970s there has been a major renewal of interest in what was historically called inverse probability and is currently called Bayesian inference (after the English nonconformist minister and mathematician Thomas Bayes [1701?–1761], whose work went unpublished during his lifetime). Admitting the experimenter’s or analyst’s subjective prior distribution formally into the analysis, Bayesian inference uses Bayes’ theorem (which is an accepted theorem of probability for both frequentists and Bayesians) to combine the prior distribution with the data from the current investigation to update the probability that the hypothesis being investigated is true. Note that Bayesians do speak of the probability of a hypothesis being true, while frequentists must phrase their conclusions in terms of the probability of outcomes when the null hypothesis is true. Further, Bayesians construct credibility intervals, for which, unlike the frequentists’ confidence intervals, it is proper to speak of the probability that the population quantity falls in the interval, because in the Bayesian stance population parameters are viewed as having probability distributions. For a frequentist, a population parameter is a fixed, albeit usually unknown, constant. Much of the revival of interest in Bayesian analysis has happened in the wake of advances in computing that make it possible to use approximations of previously intractable models.
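A small numerical sketch may help show how a prior distribution and data are combined. The Python code below applies Bayes' theorem on a grid of candidate values for a population proportion. Everything specific in it is an assumption made for illustration: the uniform prior, the hypothetical survey result (55 of 100 respondents in favor), and the helper names. A grid approximation is only one of several ways to carry out such a calculation.

```python
from math import exp, log

def grid_posterior(successes, trials, grid_size=999):
    """Posterior over a proportion on an evenly spaced grid, uniform prior."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]  # 0.001 ... 0.999
    prior = [1 / grid_size] * grid_size
    log_like = [successes * log(p) + (trials - successes) * log(1 - p) for p in grid]
    peak = max(log_like)  # subtract the peak to keep exp() well scaled
    unnormalized = [pr * exp(ll - peak) for pr, ll in zip(prior, log_like)]
    total = sum(unnormalized)
    return grid, [u / total for u in unnormalized]  # Bayes' theorem: prior x likelihood, normalized

def credibility_interval(grid, posterior, level=0.95):
    """Equal-tailed interval holding `level` of the posterior probability."""
    tail = (1 - level) / 2
    cumulative, low, high = 0.0, None, None
    for p, weight in zip(grid, posterior):
        cumulative += weight
        if low is None and cumulative >= tail:
            low = p
        if high is None and cumulative >= 1 - tail:
            high = p
    return low, high

# Hypothetical survey: 55 of 100 respondents favor the candidate.
grid, posterior = grid_posterior(55, 100)
prob_majority = sum(w for p, w in zip(grid, posterior) if p > 0.5)
low, high = credibility_interval(grid, posterior)
print(f"P(proportion > 0.5 | data) = {prob_majority:.2f}")
print(f"95% credibility interval: {low:.3f} to {high:.3f}")
```

Unlike a confidence interval, the interval printed here is a direct probability statement about the population proportion, given the chosen prior and the data.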
While the distinction between Bayesians and frequentists has been fairly sharp, Stephen E. Fienberg and Joseph B. Kadane (2001) note that the two schools are coming together, with Bayesians paying increasing attention to frequentist properties of Bayesian procedures and frequentists increasingly using hierarchical models.
Much more detailed descriptions of the field of statistics and its ramifications than are possible here are given by William H. Kruskal (1968) and by Fienberg and Kadane (2001).
SEE ALSO Bayes’ Theorem; Bayesian Econometrics; Classical Statistical Analysis; Econometric Decomposition; Mathematics in the Social Sciences; Methods, Quantitative; Path Analysis; Pearson, Karl; Probability; Random Samples; Recursive Models; Sampling; Surveys, Sample; Variance; Variance-Covariance Matrix
Fienberg, Stephen E., and Joseph B. Kadane. 2001. Statistics: The Field. In International Encyclopedia of the Social and Behavioral Sciences, ed. Neil J. Smelser and Paul B. Baltes, 15085–15090. Oxford, U.K.: Elsevier.
Fisher, Ronald A. 1935. The Design of Experiments. Edinburgh: Oliver and Boyd.
Kruskal, William H. 1968. Statistics: The Field. In International Encyclopedia of the Social Sciences, ed. David L. Sills, vol. 15, 206–224. New York: Macmillan.
Stigler, Stephen M. 1986. The History of Statistics: The Measurement of Uncertainty before 1900. Cambridge, MA: Harvard University Press.
Stigler, Stephen M. 1999. Statistics on the Table: The History of Statistical Concepts and Methods. Cambridge, MA: Harvard University Press.
Judith M. Tanur
"Statistics." International Encyclopedia of the Social Sciences. Encyclopedia.com. (October 20, 2016). http://www.encyclopedia.com/social-sciences/applied-and-social-sciences-magazines/statistics
Statistics is the set of mathematical tools and techniques that are used to analyze data. In genetics, statistical tests are crucial for determining if a particular chromosomal region is likely to contain a disease gene, for instance, or for expressing the certainty with which a treatment can be said to be effective.
Statistics is a relatively new science, with most of the important developments occurring within the last 100 years. Motivation for statistics as a formal scientific discipline came from a need to summarize and draw conclusions from experimental data. For example, Sir Ronald Aylmer Fisher, Karl Pearson, and Sir Francis Galton each made significant contributions to early statistics in response to their need to analyze experimental agricultural and biological data. One of Fisher's interests, for instance, was whether crop yield could be predicted from meteorological readings. This problem was one of several that motivated Fisher to develop some of the early methods of data analysis. Much of modern statistics can be categorized as exploratory data analysis, point estimation, or hypothesis testing.
The goal of exploratory data analysis is to summarize and visualize data and information in a way that facilitates the identification of trends or interesting patterns that are relevant to the question at hand. A fundamental exploratory data-analysis tool is the histogram, which describes the frequency with which various outcomes occur. Histograms summarize the distribution of the outcomes and facilitate the comparison of outcomes from different experiments. Histograms are usually plotted as bar plots, with the range of outcomes plotted on the x-axis and the frequency of each individual outcome represented by the height of a bar measured on the y-axis. For instance, one might use a histogram to describe the number of people in a population with each of the different genotypes for the ApoE alleles, which influence the risk of Alzheimer's disease.
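As a rough illustration of a histogram of genotype counts, the following Python sketch tallies a small set of made-up ApoE genotypes and prints one text bar per genotype. The genotype labels and the data are hypothetical and serve only to show the counting step.

```python
from collections import Counter

# Hypothetical ApoE genotypes observed in a sample of ten people (made-up data).
genotypes = ["e3/e3", "e3/e4", "e3/e3", "e2/e3", "e4/e4",
             "e3/e3", "e3/e4", "e3/e3", "e2/e3", "e3/e3"]

# Count how often each outcome occurs.
counts = Counter(genotypes)

# Text histogram: one bar per genotype, bar length equal to its frequency.
for genotype, n in sorted(counts.items()):
    print(f"{genotype:6s} {'#' * n} ({n})")
```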
The range of outcomes from an experiment is also described mathematically by its central tendency and its dispersion. Central tendency is a measure of the center of the distribution. This can be characterized by the mean (the arithmetic average) of the outcomes or by the median, which is the value above and below which the number of outcomes is the same. The mean of 3, 4, and 8 is 5, whereas the median is 4. The median length of response to a gene therapy trial might be 30 days, meaning as many people had less than 30 days' benefit as had more than that. The mean might be considerably more if, for instance, one person benefited for 180 days.
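The arithmetic above can be checked with Python's standard `statistics` module. The first list is the example from the text; the second is a hypothetical set of benefit durations, invented here to show how one long responder pulls the mean above the median.

```python
from statistics import mean, median

# The worked example from the text.
outcomes = [3, 4, 8]
print(mean(outcomes), median(outcomes))

# Hypothetical lengths of benefit (in days) for five trial participants;
# the single 180-day responder raises the mean but not the median.
benefit_days = [10, 20, 30, 40, 180]
print(mean(benefit_days), median(benefit_days))
```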
Dispersion is a measure of how spread out the outcomes of the random variable are from their mean. It is characterized by the variance or standard deviation. The spread of the data can often be as important as the central tendency in estimating the value of the results. For instance, suppose the median number of errors in a gene-sequencing procedure was 3 per 10,000 bases sequenced. This error rate might be acceptable if the range that was found in 100 trials was between 0 and 5 errors, but it would be unacceptable if the range was between 0 and 150 errors. The occasional large number of errors makes the data from any particular procedure suspect.
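A short Python sketch can show why the spread matters alongside the median. The two sets of error counts below are hypothetical; both have a median of 3, but they differ greatly in variance and standard deviation (computed here as sample statistics).

```python
from statistics import median, stdev, variance

# Hypothetical error counts per 10,000 bases from two sequencing procedures.
# Both have a median of 3 errors, but very different spread.
consistent = [0, 2, 3, 3, 5]
erratic = [0, 1, 3, 60, 150]

for name, errors in (("consistent", consistent), ("erratic", erratic)):
    print(f"{name}: median={median(errors)}, "
          f"variance={variance(errors):.1f}, sd={stdev(errors):.1f}")
```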
Another important concept in statistics is that of populations and samples. The population represents every possible experimental unit that could be measured. For example, every zebra on the continent of Africa might represent a population. If we were interested in the mean genetic diversity of zebras in Africa, it would be nearly impossible to actually analyze the DNA of every single zebra; neither can we sequence the entire DNA of any individual. Therefore we must take a random selection of some smaller number of zebras and some smaller amount of DNA, and then use the mean differences among these zebras to make inferences about the mean diversity in the entire population.
Any summary measure of the data, such as the mean or variance in a subset of the population, is called a sample statistic. The summary measure of the entire group is called a population parameter. Therefore, we use statistics to estimate parameters. Much of statistics is concerned with the accuracy of parameter estimates. This is the statistical science of point estimation.
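The relation between a sample statistic and a population parameter can be sketched in Python. The population below is simulated, so its "diversity scores", its size, and the sample size of 100 are all assumptions made for illustration; in practice the parameter would be unknown.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: a diversity score for each of 10,000 zebras.
population = [random.gauss(50, 10) for _ in range(10_000)]
parameter = mean(population)             # population parameter (usually unknown)

sample = random.sample(population, 100)  # simple random sample without replacement
statistic = mean(sample)                 # sample statistic, used as a point estimate

print(f"population mean = {parameter:.2f}, sample mean = {statistic:.2f}")
```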
The final major discipline of statistics is hypothesis testing. All scientific investigations begin with a motivating question. For example, do identical twins have a higher likelihood than fraternal twins of both developing alcoholism?
From the question, two types of hypotheses are derived. The first is called the null hypothesis. This is generally a theory about the value of one or more population parameters and is the status quo, or what is commonly believed or accepted. In the case of the twins, the null hypothesis might be that the rates of concordance (i.e., both twins are or are not alcoholic) are the same for identical and fraternal twins. The alternate hypothesis is generally what you are trying to show. This might be that identical twins have a higher concordance rate for alcoholism, supporting a genetic basis for this disorder. It is important to note that statistics cannot prove one or the other hypothesis. Rather, statistics provides evidence from the data that supports one hypothesis or the other.
Much of hypothesis testing is concerned with making decisions about the null and alternate hypotheses. You collect the data, estimate the parameter, calculate a test statistic that summarizes the value of the parameter estimate, and then decide whether the value of the test statistic would be expected if the null hypothesis were true or the alternate hypothesis were true. In our case, we collect data on alcoholism in a limited number of twins (which we hope accurately represent the entire twin population) and decide whether the results we obtain better match the null hypothesis (no difference in rates) or the alternate hypothesis (higher rate in identical twins).
Of course, there is always a chance that you have made the wrong decision—that you have interpreted your data incorrectly. In statistics, there are two types of errors that can be made. A type I error is when the conclusion was made in favor of the alternate hypothesis, when the null hypothesis was really true. A type II error refers to the converse situation, where the conclusion was made in favor of the null hypothesis when the alternate hypothesis was really true. Thus a type I error is when you see something that is not there, and a type II error is when you do not see something that is really there. In general, type I errors are thought to be worse than type II errors, since you do not want to spend time and resources following up on a finding that is not true.
How can we decide if we have made the right choice about accepting or rejecting our null hypothesis? These statistical decisions are often made by calculating a probability value, or p-value. P-values for many test statistics are easily calculated using a computer, thanks to the theoretical work of mathematical statisticians such as Jerzy Neyman.
A p-value is simply the probability of observing a test statistic as large as or larger than the one observed from your data, if the null hypothesis were really true. It is common in many statistical analyses to accept a type I error rate of one in twenty, or 0.05. This means that, when the null hypothesis is really true, there is no more than a one-in-twenty chance of rejecting it by mistake.
To see what this means, let us imagine that our data show that identical twins have a 10 percent greater likelihood of being concordant for alcoholism than fraternal twins. Is this a significant enough difference that we should reject the null hypothesis of no difference between twin types? By examining the number of individuals tested and the variance in the data, we can come up with an estimate of the probability that we could obtain this difference by chance alone, even if the null hypothesis were true. If this probability is less than 0.05—if the likelihood of obtaining this difference by chance is less than one in twenty—then we reject the null hypothesis in favor of the alternate hypothesis.
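One common way to carry out such a calculation is a one-sided two-proportion z-test based on the normal approximation; the source does not say which test the author had in mind, so the choice is an assumption. The Python sketch below also assumes hypothetical concordance rates of 40 percent for identical twins and 30 percent for fraternal twins, and tries two hypothetical sample sizes.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_two_proportion_p_value(x1, n1, x2, n2):
    """P-value for H0: p1 == p2 against H1: p1 > p2 (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)            # common rate assumed under the null
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se                         # test statistic
    return 1 - NormalDist().cdf(z)             # chance of a z this large or larger


# Hypothetical counts: concordant pairs out of pairs studied in each group.
p_small = one_sided_two_proportion_p_value(40, 100, 30, 100)
p_large = one_sided_two_proportion_p_value(400, 1000, 300, 1000)

print(f"100 pairs per group:  p = {p_small:.4f}")
print(f"1000 pairs per group: p = {p_large:.2e}")
```

Under these assumed counts, the same 10-point difference does not reach the 0.05 threshold with 100 pairs per group but does with 1,000 pairs per group, which shows how the number of individuals tested enters the decision.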
Prior to carrying out a scientific investigation and a statistical analysis of the resulting data, it is possible to get a feel for your chances of seeing something if it is really there to see. This is referred to as the power of a study and is simply one minus the probability of making a type II error. A commonly accepted power for a study is 80 percent or greater. That is, you would like to know that you have at least an 80 percent chance of seeing something if it is really there. Increasing the size of the random sample from the population is perhaps the best way to improve the power of a study. The closer your sample size is to the true population size, the more likely you are to see something if it is really there.
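The effect of sample size on power can be sketched with a standard normal approximation for a one-sided two-proportion z-test. The function name `approx_power`, the assumed true concordance rates (40 percent and 30 percent), and the sample sizes are hypothetical, and the formula is an approximation rather than an exact calculation.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p1, p2, n, alpha=0.05):
    """Approximate power of a one-sided two-proportion z-test, n per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)            # rejection cutoff
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n)          # spread if H0 is true
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)   # spread if H1 is true
    # Probability that the observed difference exceeds the cutoff under H1.
    return NormalDist().cdf(((p1 - p2) - z_alpha * se_null) / se_alt)


# Power is one minus the type II error rate; it rises with sample size.
for n in (100, 200, 400):
    print(n, round(approx_power(0.40, 0.30, n), 2))
```

Under these assumed rates, the approximation puts power below 50 percent with 100 pairs per group and above 80 percent with 400 pairs per group.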
Thus, statistics is a relatively new scientific discipline that uses both mathematics and philosophy for exploratory data analysis, point estimation, and hypothesis testing. The ultimate utility of statistics lies in making decisions about hypotheses and thereby drawing inferences about the answers to scientific questions.
see also Gene Discovery; Gene Therapy: Ethical Issues; Statistical Geneticist; Twins.
Jason H. Moore
Gonick, Larry, and Woollcott Smith. The Cartoon Guide to Statistics. New York: HarperCollins, 1993.
Jaisingh, Lloyd R. Statistics for the Utterly Confused. New York: McGraw-Hill, 2000.
Salsburg, David. The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. New York: W. H. Freeman, 2001.
HyperStat Online: An Introductory Statistics Book and Online Tutorial for Help in Statistics Courses. David M. Lane, ed. <http://davidmlane.com/hyperstat/>.
"Statistics." Genetics. Encyclopedia.com. (October 20, 2016). http://www.encyclopedia.com/medicine/medical-magazines/statistics
STATISTICS. The word statistics comes from the German Statistik and was coined by Gottfried Achenwall (1719–1772) in 1749. This term referred to a thorough, generally nonquantitative description of features of the state—its geography, peoples, customs, trade, administration, and so on. Hermann Conring (1606–1681) introduced this field of inquiry under the name Staatenkunde in the seventeenth century, and it became a standard part of the university curriculum in Germany and in the Netherlands. Recent histories of statistics in France, Italy, and the Netherlands have documented the strength of this descriptive approach. The descriptive sense of statistics continued throughout the eighteenth century and into the nineteenth century.
The numerical origins of statistics are found in distinct national traditions of quantification. In England, self-styled political and medical arithmeticians working outside government promoted numerical approaches to the understanding of the health and wealth of society. In Germany, the science of cameralism provided training and rationale for government administrators to count population and economic resources for local communities. In France, royal ministers, including the duke of Sully (1560–1641) and Jean-Baptiste Colbert (1619–1683), initiated statistical inquiries into state finance and population that were continued through the eighteenth century.
Alongside these quantitative studies of society, mathematicians developed probability theory, which made use of small sets of numerical data. The emergence of probability has been the subject of several recent histories, and its development was largely independent of statistics. The two traditions of collecting numbers and analyzing them using the calculus of probabilities merged only in the nineteenth century, creating the modern discipline of statistics.
The early modern field of inquiry that most closely resembles modern statistics was political arithmetic, created in the 1660s and 1670s by two Englishmen, John Graunt (1620–1674) and William Petty (1623–1687). Graunt's Natural and Political Observations Made upon the Bills of Mortality (1662) launched quantitative studies of population and society, which Petty labeled political arithmetic. In their work, they showed how numerical accounts of population could be used to answer medical and political questions such as the comparative mortality of specific diseases and the number of men of fighting age. Graunt developed new methods to calculate population from the numbers of christenings and burials. He created the first life table, a numerical table that showed how many individuals out of a given population survived at each year of life. Petty created sample tables to be used in Ireland to collect vital statistics and urged that governments collect regular and accurate accounts of the numbers of christenings, burials, and total population. Such accounts, Petty argued, would put government policy on a firm foundation.
Political arithmetic was originally associated with strengthening monarchical authority, but several other streams of inquiry flowed from Graunt's and Petty's early work. One tradition was medical statistics, which developed most fully in England during the eighteenth century. Physicians such as James Jurin (1684–1750) and William Black (1749–1829) advocated the collection and evaluation of numerical information about the incidence and mortality of diseases. Jurin pioneered the use of statistics in the 1720s to evaluate medical practice in his studies of the risks associated with smallpox inoculation. William Black coined the term medical arithmetic to refer to the tradition of using numbers to analyze the comparative mortality of different diseases. New hospitals and dispensaries such as the London Smallpox and Inoculation Hospital, established in the eighteenth century, provided institutional support for the collection of medical statistics; some treatments were evaluated numerically.
Theology provided another context for the development of statistics. Graunt had identified a constant birth ratio between males and females (14 to 13) and had used this as an argument against polygamy. The physician John Arbuthnot (1667–1735) argued in a 1710 article that this regularity was "an Argument for Divine Providence." Later writers, including William Derham (1657–1735), author of Physico-Theology (1713), and Johann Peter Süssmilch (1707–1767), author of Die Göttliche Ordnung (1741), made the stability of this statistical ratio a part of the larger argument about the existence of God.
One final area of statistics that flowed from Graunt's work and was the most closely associated with probability theory was the development of life (or mortality) tables. Immediately following the publication of Graunt's book, several mathematicians, including Christiaan Huygens (1629–1695), Gottfried Leibniz (1646–1716), and Edmund Halley (1656–1742), refined Graunt's table. Halley, for example, based his life table on numerical data from the town of Breslau that listed ages of death. (Graunt had to estimate ages of death.) In the eighteenth century, further modifications were introduced by the Dutchmen Willem Kersseboom (1690–1771) and Nicolaas Struyck (1686–1769), the Frenchman Antoine Deparcieux (1703–1768), the Swiss Leonhard Euler (1707–1783), and the Swede Pehr Wargentin (1717–1783). A French historian has recently argued that the creation of life tables was one of the leading achievements of the scientific revolution. Life tables were used to predict life expectancy and aimed to improve the financial soundness of annuities and tontines.
The administrative demands brought about by state centralization in early modern Europe also fostered the collection and analysis of numerical information about births, deaths, marriages, trade, and so on. In France, for example, Sébastien le Prestre de Vauban (1633–1707), adviser to Louis XIV (ruled 1643–1715), provided a model for the collection of this data in his census of Vézelay (1696), a small town in Burgundy. Although his recommendations were not adopted, a similar approach was pursued decades later by the Controller-General Joseph Marie Terray (1715–1778), who requested in 1772 that the provincial intendants collect accounts of births and deaths from parish clergy and forward them to Paris. Sweden created the most consistent system for the collection of vital statistics through parish clerks in 1749. Efforts in other countries failed. In England, two bills were put before Parliament in the 1750s to institute a census and to ensure the collection of vital statistics. Both bills were defeated because of issues concerning personal liberty. While these initiatives enjoyed mixed success, they all spoke to the desire to secure numerical information about the population. Regular censuses, which would provide data for statistical analysis, were not instituted until the nineteenth century.
See also Accounting and Bookkeeping; Census; Graunt, John; Mathematics; Petty, William.
Arbuthnot, John. "An Argument for Divine Providence Taken from the Regularity Observ'd in the Birth of Both Sexes." Philosophical Transactions 27 (1710–1712): 186–190.
Black, William. An Arithmetical and Medical Analysis of the Diseases and Mortality of the Human Species. London, 1789. Reprinted with an introduction by D. V. Glass. Farnborough, U.K., 1973.
Jurin, James. An Account of the Success of Inoculating the Small Pox in Great Britain with a Comparison between the Miscarriages in That Practice, and the Mortality of the Natural Small Pox. London, 1724.
Petty, William. The Economic Writings of Sir William Petty. Edited by Charles Henry Hull. 2 vols. Cambridge, U.K., 1899.
Bourguet, Marie-Noëlle. Déchiffrer la France: La statistique départementale à l'époque napoléonienne. Paris, 1988.
Buck, Peter. "People Who Counted: Political Arithmetic in the Eighteenth Century." Isis 73 (1982): 28–45.
——. "Seventeenth-Century Political Arithmetic: Civil Strife and Vital Statistics." Isis 68 (1977): 67–84.
Daston, Lorraine. Classical Probability in the Enlightenment. Princeton, 1988.
Dupâquier, Jacques. L'invention de la table de mortalité, de Graunt à Wargentin, 1662–1766. Paris, 1996.
Dupâquier, Jacques, and Michel Dupâquier. Histoire de la démographie. Paris, 1985.
Hacking, Ian. The Emergence of Probability. Cambridge, U.K., 1975.
——. The Taming of Chance. Cambridge, U.K., 1990.
Hald, Anders. A History of Probability and Statistics and Their Applications before 1750. New York, 1990.
Klep, Paul M. M., and Ida H. Stamhuis, eds. The Statistical Mind in a Pre-Statistical Era: The Netherlands, 1750–1850. Amsterdam, 2002.
Patriarca, Silvana. Numbers and Nationhood: Writing Statistics in Nineteenth-Century Italy. Cambridge, U.K., 1996.
Pearson, Karl. The History of Statistics in the 17th and 18th Centuries against the Changing Background of Intellectual, Scientific and Religious Thought. Edited by E. S. Pearson. London, U.K., 1978.
Porter, Theodore M. The Rise of Statistical Thinking, 1820–1900. Princeton, 1986.
Rusnock, Andrea. Vital Accounts: Quantifying Health and Population in Eighteenth-Century England and France. Cambridge, U.K., 2002.
"Statistics." Europe, 1450 to 1789: Encyclopedia of the Early Modern World. Encyclopedia.com. (October 20, 2016). http://www.encyclopedia.com/history/encyclopedias-almanacs-transcripts-and-maps/statistics
statistics, science of collecting and classifying a group of facts according to their relative number and determining certain values that represent characteristics of the group. The most familiar statistical measure is the arithmetic mean, which is an average value for a group of numerical observations. A second important statistic or statistical measure is the standard deviation, which is a measure of how much the individual observations are scattered about the mean. The chi-square test is a method of determining the odds for or against a given deviation from an expected statistical distribution. Other statistics indicate other characteristics of the group of observations. In addition to the problem of computing certain statistics for a particular group of observations, there is the problem of sampling. This is an attempt to determine for what larger group (called the population) of individuals or characteristics the statistics for this particular group (called the sample) would be a representative figure and how representative a figure it would be for a given larger group. This second problem of sampling can be solved only by resorting to the theory of probability and higher mathematics. In most applications of statistics to scientific and social research, insurance, and finance, the statistician is interested not only in the characteristics of the sample but also in those of some much larger population. Consequently, the theory of sampling is the most important part of statistical theory.
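The chi-square statistic mentioned above can be computed in a few lines of Python. The observed counts are hypothetical (60 heads and 40 tails in 100 tosses of a coin assumed fair), and the comparison uses the usual tabulated 0.05 critical value for one degree of freedom, 3.841.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 coin tosses, with 50 heads and 50 tails expected.
observed = [60, 40]
expected = [50, 50]
statistic = chi_square_statistic(observed, expected)

# Tabulated critical value at the 0.05 level with one degree of freedom.
CRITICAL_0_05_DF1 = 3.841

print(statistic, statistic > CRITICAL_0_05_DF1)
```

With these made-up counts the statistic is 4.0, slightly above the critical value, so a deviation this large would be judged unlikely under the expected distribution.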
See J. F. Freund, Modern Elementary Statistics (1988); D. S. Moore and G. P. McCabe, Introduction to the Practice of Statistics (1989); D. H. Sanders, Statistics (1989).
"statistics." The Columbia Encyclopedia, 6th ed. Encyclopedia.com. (October 20, 2016). http://www.encyclopedia.com/reference/encyclopedias-almanacs-transcripts-and-maps/statistics
"statistics." A Dictionary of Sociology. Encyclopedia.com. (October 20, 2016). http://www.encyclopedia.com/social-sciences/dictionaries-thesauruses-pictures-and-press-releases/statistics
"statistics." World Encyclopedia. Encyclopedia.com. (October 20, 2016). http://www.encyclopedia.com/environment/encyclopedias-almanacs-transcripts-and-maps/statistics-0
1. Numerical data relating to sets of individuals, objects, or phenomena. It is also the science of collecting, summarizing, and interpreting such data.
2. Quantities derived from data in order to summarize the properties of a sample. For example, the mean of a sample is a statistic that is a measure of location, while the standard deviation is a measure of variation.
"statistics." A Dictionary of Computing. Encyclopedia.com. (October 20, 2016). http://www.encyclopedia.com/computing/dictionaries-thesauruses-pictures-and-press-releases/statistics
sta·tis·tics /stəˈtistiks/ • pl. n. [treated as sing.] the practice or science of collecting and analyzing numerical data in large quantities, esp. for the purpose of inferring proportions in a whole from those in a representative sample.
"statistics." The Oxford Pocket Dictionary of Current English. Encyclopedia.com. (October 20, 2016). http://www.encyclopedia.com/humanities/dictionaries-thesauruses-pictures-and-press-releases/statistics