Statistics
The articles under this heading provide an introduction to the field of statistics and to its history. The first article also includes a survey of the statistical articles in the encyclopedia. At the end of the second article there is a list of the biographical articles that are of relevance to statistics.
I. THE FIELD  William H. Kruskal
II. THE HISTORY OF STATISTICAL METHOD  M. G. Kendall
I. THE FIELD
A scientist confronted with empirical observations goes from them to some sort of inference, decision, action, or conclusion. The end point of this process may be the confirmation or denial of some complicated theory; it may be a decision about the next experiment to carry out; or it may simply be a narrowing of the presumed range for some constant of nature. (The end point may even be the conclusion that the observations are worthless.) An end point is typically accompanied by a statement, or at least by a feeling, of how sure the scientist is of his new ground.
These inferential leaps are, of course, never made only in the light of the immediate observations. There is always a body of background knowledge and intuition, in part explicit and in part tacit. It is the essence of science that a leap to a false position (whether because of poor observational data, misleading background, or bad leaping form) is sooner or later corrected by future research.
Often the leaps are made without introspection or analysis of the inferential process itself, as a skilled climber might step from one boulder to another on easy ground. On the other hand, the slope may be steep and with few handholds; before moving, one wants to reflect on direction, where one's feet will be, and the consequences of a slip.
Statistics is concerned with the inferential process, in particular with the planning and analysis of experiments or surveys, with the nature of observational errors and sources of variability that obscure underlying patterns, and with the efficient summarizing of sets of data. There is a fuzzy boundary, to be discussed below, between statistics and other parts of the philosophy of science.
Problems of inference from empirical data arise, not only in scientific activity, but also in everyday life and in areas of public policy. For example, the design and analysis of the 1954 Salk vaccine tests in the United States were based on statistical concepts of randomization and control. Both private and public economic decisions sometimes turn on the meaning and accuracy of summary figures from complex measurement programs: the unemployment rate, the rate of economic growth, a consumer price index. Sometimes a lack of statistical background leads to misinterpretations of accident and crime statistics. Misinterpretations arising from insufficient statistical knowledge may also occur in the fields of military and diplomatic intelligence.
There is busy two-way intellectual traffic between statisticians and other scientists. Psychologists and physical anthropologists have instigated and deeply influenced developments in that branch of statistics called multivariate analysis; sociologists sometimes scold statisticians for not paying more attention to the inferential problems arising in surveys of human populations; some economists are at once consumers and producers of statistical methods.
Theoretical and applied statistics. Theoretical statistics is the formal study of the process leading from observations to inference, decision, or whatever be the end point, insofar as the process can be abstracted from special empirical contexts. This study is not the psychological one of how scientists actually make inferences or decisions; rather, it deals with the consequences of particular modes of inference or decision, and seeks normatively to find good modes in the light of explicit criteria.
Theoretical statistics must proceed in terms of a more or less formal language, usually mathematical, and in any specific area must make assumptions—weak or strong—on which to base the formal analysis. Far and away the most important mathematical language in statistics is that of probability, because most statistical thinking is in terms of randomness, populations, masses, the single event embedded in a large class of events. Even approaches like that of personal probability, in which single events are basic, use a highly probabilistic language. [See Probability.]
But theoretical statistics is not, strictly speaking, a branch of mathematics, although mathematical concepts and tools are of central importance in much of statistics. Some important areas of theoretical statistics may be discussed and advanced without recondite mathematics, and much notable work in statistics has been done by men with modest mathematical training. [For discussion of nonstatistical applications of mathematics in the social sciences, see, for example, Mathematics; Models, mathematical; and the material on mathematical economics in Econometrics.]
Applied statistics, at least in principle, is the informed application of methods that have been theoretically investigated, the actual leap after the study of leaping theory. In fact, matters are not so simple. First, theoretical study of a statistical procedure often comes after its intuitive proposal and use. Second, there is almost no end to the possible theoretical study of even the simplest procedure. Practice and theory interact and weave together, so that many statisticians are practitioners one day (or hour) and theoreticians the next.
The art of applied statistics requires sensitivity to the ways in which theoretical assumptions may fail to hold, and to the effects that such failure may have, as well as agility in modifying and extending already studied methods. Thus, applied statistics in the study of public opinion is concerned with the design and analysis of opinion surveys. The main branch of theoretical statistics used here is that of sample surveys, although other kinds of theory may also be relevant; for example, the theory of Markov chains may be useful for panel studies, where the same respondents are asked their opinions at successive times. Again, applied statistics in the study of learning includes careful design and analysis of controlled laboratory experiments, whether with worms, rats, or humans. The statistical theories that enter might be those of experimental design, of analysis of variance, or of quantal response. Of course, nonstatistical, substantive knowledge about the empirical field (public opinion, learning, or whatever) is essential for good applied statistics.
Statistics is a young discipline, and the number of carefully studied methods, although steadily growing, is still relatively small. In the applications of statistics, therefore, one usually reaches a point of balance between thinking of a specific problem in formal terms, which are rarely fully adequate (few problems are standard), and using methods that are not as well understood as one might hope. (For a stimulating, detailed discussion of this theme, see Tukey 1962, where the term “data analysis” is used to mean something like applied statistics.)
The word “statistics” is sometimes used to mean, not a general approach like the one I have outlined, but—more narrowly—the body of specific statistical methods, with associated formulas, tables, and traditions, that are currently understood and used. Other uses of the word are common, but they are not likely to cause confusion. In particular, “statistics” often refers to a set of numbers describing some empirical field, as when one speaks of the mortality statistics of France in 1966. Again, “a statistic” often means some numerical quantity computed from basic observations.
Variability and error; patterns. If life were stable, simple, and routinely repetitious, there would be little need for statistical thinking. But there would probably be no human beings to do statistical thinking, because sufficient stability and simplicity would not allow the genetic randomness that is a central mechanism of evolution. Life is not, in fact, stable or simple, but there are stable and simple aspects to it. From one point of view, the goal of science is the discovery and elucidation of these aspects, and statistics deals with some general methods of finding patterns that are hidden in a cloud of irrelevancies, of natural variability, and of error-prone observations or measurements.
Most statistical thinking is in terms of variability and errors in observed data, with the aim of reaching conclusions about obscured underlying patterns. What is meant by natural variability and by errors of measurement? First, distinct experimental and observational units generally have different characteristics and behave in different ways: people vary in their aptitudes and skills; some mice learn more quickly than others. Second, when a quantity or quality is measured, there is usually an error of measurement, and this introduces a second kind of dispersion with which statistics deals: not only will students taught by a new teaching method react in different ways, but also the test that determines how much they learn cannot be a perfect measuring instrument; medical blood-cell counts made independently by two observers from the same slide will not generally be the same.
In any particular experiment or survey, some sources of variability may usefully be treated as constants; for example, the students in the teaching experiment might all be chosen from one geographical area. Other sources of variability might be regarded as random: for example, fluctuations of test scores among students in an apparently homogeneous group. More complex intermediate forms of variability are often present. The students might be subdivided into classes taught by different teachers. Insofar as common membership in a class with the same teacher has an effect, a simple but important pattern of dependence is present.
The variability concept is mirrored in the basic notion of a population from which one samples. The population may correspond to an actual population of men, mice, or machines; or it may be conceptual, as is a population of measurement errors. A population of numerical values defines a distribution, roughly speaking, and the notion of a random variable, fluctuating in its value according to this distribution, is basic. For example, if a student is chosen at random from a school and given a reading-comprehension test, the score on the test, considered in advance of student choice and test administration, is a random variable. Its distribution is an idealization of the totality of such scores if student choice and testing could be carried out a very large number of times without any changes because of the passage of time or because of interactions among students. [For a more precise formulation, see Probability.]
Although much statistical methodology may be regarded as an attempt to understand regularity through a cloud of obscuring variability, there are many situations in which the variability itself is the object of major interest. Some of these will be discussed below.
Planning. An important topic in statistics is that of sensible planning, or design, of empirical studies. In the above teaching example, some of the more formal aspects of design are the following: How many classes to each teaching method? How many students per class to be tested? Should variables other than test scores be used as well, for example, intelligence scores or personality ratings?
The spectrum of design considerations ranges from these to such subject-matter questions as the following: How should the teachers be trained in a new teaching method? Should teachers be chosen so that there are some who are enthusiastic and some who are skeptical of the new method? What test should be used to measure results?
No general theory of design exists to cover all, or even most, such questions. But there do exist many pieces of theory, and—more important—a valuable statistical point of view toward the planning of experiments.
History. The history of the development of statistics is described in the next article [see Statistics, article on THE HISTORY OF STATISTICAL METHOD]. It stresses the growth of method and theory; the history of statistics in the senses of vital statistics, government statistics, censuses, economic statistics, and the like, is described in relevant separate articles [see Census; Cohort analysis; Economic data; Government statistics; Life tables; Mortality; Population; Sociology, article on THE EARLY HISTORY OF SOCIAL RESEARCH; Vital statistics]. Two treatments of the history of statistics with special reference to the social sciences are by Lundberg (1940) and Lazarsfeld (1961).
It is important to distinguish between the history of the word “statistics” and the history of statistics in the sense of this article. The word “statistics” is related to the word “state,” and originally the activity called statistics was a systematic kind of comparative political science. This activity gradually centered on numerical tables of economic, demographic, and political facts, and thus “statistics” came to mean the assembly and analysis of numerical tables. It is easy to see how the more philosophical meaning of the word, used in this article, gradually arose. Of course, the abstract study of inference from observations has a long history under various names—such as the theory of errors and probability calculus—and only comparatively recently has the word “statistics” come to have its present meaning. Even now, grotesque misunderstandings abound—for example, thinking of statistics as the routine compilation of uninteresting sets of numbers, or thinking of statistics as mainly a collection of mathematical expressions.
Functions. My description of statistics is, of course, a personal one, but one that many statisticians would generally agree with. Almost any characterization of statistics would include the following general functions:
(1) to help in summarizing and extracting relevant information from data, that is, from observed measurements, whether numerical, classificatory, ordinal, or whatever;
(2) to help in finding and evaluating patterns shown by the data, but obscured by inherent random variability;
(3) to help in the efficient design of experiments and surveys;
(4) to help communication between scientists (if a standard procedure is cited, many readers will understand without need of detail).
There are some other roles that activities called “statistical” may, unfortunately, play. Two such misguided roles are
(1) to sanctify or provide seals of approval (one hears, for example, of thesis advisers or journal editors who insist on certain formal statistical procedures, whether or not they are appropriate);
(2) to impress, obfuscate, or mystify (for example, some social science research papers contain masses of undigested formulas that serve no purpose except that of indicating what a bright fellow the author is).
Some consulting statisticians use more or less explicit declarations of responsibility, or codes, in their relationships with “clients,” to protect themselves from being placed in the role of sanctifier. It is a good general rule that the empirical scientist use only statistical methods whose rationale is clear to him, even though he may not wish or be able to follow all details of mathematical derivation.
A general discussion, with an extensive bibliography, of the relationship between statistician and client is given by Deming (1965). In most applied statistics, of course, the statistician and the client are the same person.
An example. To illustrate these introductory comments, consider the following hypothetical experiment to study the effects of propaganda. Suppose that during a national political campaign in the United States, 100 college students are exposed to a motion picture film extolling the Democratic candidate, and 100 other students (the so-called control group) are not exposed to the film. Then all the students are asked to name their preferred candidate. Suppose that 95 of the first group prefer the Democratic candidate, while only 80 of the second group have that preference. What kinds of conclusions might one want about the effectiveness of the propaganda?
(There are, of course, serious questions about how the students are chosen, about the details of film and questionnaire administration, about possible interaction between students, about the artificiality of the experimental arrangement, and so on. For the moment, these questions are not discussed, although some will be touched on below.)
If the numbers preferring the Democratic candidate had been 95 and 5, a conclusion that a real effect was present would probably be reached without much concern about inferential methodology (although methodological questions would enter any attempt to estimate the magnitude of the effect). If, in contrast, the numbers had both been 95, the conclusion “no effect observed” would be immediate, although one might wonder about the possibility of observing the tie by chance even if an underlying effect were present. But by and large it is the middle ground that is of greatest statistical interest: for example, do 95 and 80 differ enough in the above context to suggest a real effect?
The simplest probability model for discussing the experiment is that of analogy with two weighted coins, each tossed 100 times. A toss of the coin corresponding to the propaganda is analogous to selecting a student at random, showing him the motion picture, and then asking him which candidate he prefers. A toss of the other coin corresponds to observing the preference of a student in the control group. “Heads” for a coin is analogous, say, to preference for the Democratic candidate. The hypothetical coins are weighted so that their probabilities of showing heads are unknown (and in general not one-half), and interest lies in the difference between these two unknown heads probabilities.
Suppose that the students are regarded as chosen randomly from some large population of students, and that for a random propagandized student there is a probability p_{A} of Democratic preference, whereas a random nonpropagandized student has probability p_{B} of Democratic preference. Suppose further that the individual observed expressions of political preference are statistically independent; roughly speaking, this means that, even if p_{A} and p_{B} were known, and it were also known which groups the students are in, prediction of one student's response from another's would be no better than prediction without knowing the other's response. (Lack of independence might arise in various ways, for example, if the students were able to discuss politics among themselves during the interval between the motion picture and the questionnaire.) Under the above conditions, the probabilities of various outcomes of the experiment, for any hypothetical values of p_{A} and p_{B}, may be computed in standard ways.
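As a minimal sketch of such a computation, the following Python fragment evaluates the probability of one particular outcome under the two-coin model, assuming independent binomial counts in the two groups. The function names and the parameter values 0.9 and 0.8 are illustrative assumptions and are not taken from the discussion above.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n independent tosses with heads probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def outcome_probability(k_a: int, k_b: int, p_a: float, p_b: float, n: int = 100) -> float:
    """Joint probability of k_a and k_b Democratic preferences in two independent groups of size n."""
    return binomial_pmf(k_a, n, p_a) * binomial_pmf(k_b, n, p_b)


# Hypothetical parameter values, chosen only for illustration.
print(outcome_probability(95, 80, p_a=0.9, p_b=0.8))
```

Any single outcome has a small probability; what matters in the later analysis is how such probabilities change as the hypothetical values of p_{A} and p_{B} are varied.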
In fact, the underlying quantities of interest, the so-called parameters, p_{A} and p_{B}, are not known; if they were, there would be little or no reason to do the experiment. Nonetheless, it is of fundamental importance to think about possible values of the parameters and to decide what aspects are of primary importance. For example, is p_{A} - p_{B} basic? or perhaps p_{A}/p_{B}? or, again, perhaps (1 - p_{B})/(1 - p_{A}), the ratio of probabilities of an expressed Republican preference (assuming that preference is between Democratic and Republican candidates only)? The choice makes a difference: if p_{A} = .99 and p_{B} = .95, use of a statistical procedure sensitive to p_{A} - p_{B} (= .04 in this example) might suggest that there is little difference between the parameters, whereas a procedure sensitive to (1 - p_{B})/(1 - p_{A}) (in the example, .05/.01 = 5) might show a very large effect. These considerations are, unhappily, often neglected, and such neglect may result in a misdirected or distorted analysis. In recent discussions of possible relationships between cigarette smoking and lung cancer, controversy arose over whether ratios or differences of mortality rates were of central importance. The choice may lead to quite different conclusions.
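The arithmetic of the example can be checked with a short Python sketch. The function name and the dictionary keys are hypothetical labels introduced here for illustration; the three quantities are the difference, the ratio, and the ratio of complementary probabilities named in the text.

```python
def effect_measures(p_a: float, p_b: float) -> dict:
    """Three ways of comparing two probabilities of Democratic preference."""
    return {
        "difference": p_a - p_b,                    # p_A - p_B
        "ratio": p_a / p_b,                         # p_A / p_B
        "complement_ratio": (1 - p_b) / (1 - p_a),  # ratio of Republican-preference probabilities
    }


measures = effect_measures(0.99, 0.95)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

With p_{A} = .99 and p_{B} = .95 the difference is .04 and the ratio is close to 1, while the ratio of complementary probabilities is 5, so the same pair of parameters looks nearly equal on two scales and very different on the third.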
Even apparently minor changes in graphical presentation may be highly important in the course of research. B. F. Skinner wrote of the importance to his own work of shifting from a graphical record that simply shows the times at which events occur (motion of a rat in a runway) to the logically equivalent cumulative record that shows the number of events up to each point of time. In the latter form, the rate at which events take place often becomes visually clear (see Skinner 1956, p. 225). This general area is called descriptive statistics, perhaps with the prefix “neo.” [See Statistics, descriptive; Graphic presentation; Tabular presentation.]
As suggested above, the assumption of statistical independence might well be wrong for various reasons. One is that the 100 students in each group might be made up of five classroom groups that hold political discussions. Other errors in the assumptions are quite possible. For example, the sampling of students might not be at random from the same population: there might be self-selection, perhaps with the more enterprising students attending the motion picture. Another kind of deviation from the original simple assumptions (in this case planned) might come from balancing such factors as sex and age by stratifying according to these factors and then selecting at random within strata.
When assumptions are in doubt, one has a choice of easing them (sometimes bringing about a more complex, but a more refined, analysis) or of studying the effects of errors in the assumptions on the analysis based on them. When these effects are small, the errors may be neglected. This topic, sometimes called robustness against erroneous assumptions of independence, distributional form, and so on, is difficult and important. [See Errors, article on EFFECTS OF ERRORS IN STATISTICAL ASSUMPTIONS.]
Another general kind of question relates to the design of the experiment. Here, for example, it may be asked in advance of the experiment whether groups of 100 students are large enough (or perhaps unnecessarily large); whether there is merit in equal group sizes; whether more elaborate structures (perhaps allowing explicitly for sex and age) are desirable; and so on. Questions of this kind may call for formal statistical reasoning, but answers must depend in large part on substantive knowledge. [See Experimental design.]
It is important to recognize that using better measurement methods or recasting the framework of the experiment may be far more important aspects of design than just increasing sample size. As B. F. Skinner said,
. . . we may reduce the troublesome variability by changing the condition of the experiment. By discovering, elaborating, and fully exploiting every relevant variable, we may eliminate in advance of measurement the individual differences which obscure the difference under analysis. (1956, p. 229)
In the propaganda experiment at hand, several such approaches come to mind. Restricting oneself to subjects of a given sex, age, kind of background, and so on, might bring out the effects of propaganda more clearly, perhaps at the cost of reduced generality for the results. Rather than by asking directly for political preference, the effects might be better measured by observing physiological reactions to the names or pictures of the candidates, or by asking questions about major political issues. It would probably be useful to try to follow the general principle of having each subject serve as his own control: to observe preference both before and after the propaganda and compare the numbers of switches in the two possible directions. (Even then, it would be desirable to keep the control group—possibly showing it a presumably neutral film—in order to find, and try to correct for, artificial effects of the experimental situation.)
Such questions are often investigated in side studies, ancillary or prior to the central one, and these pilot or instrumental studies are very important.
For the specific simple design with two groups, and making the simple assumptions, consider (conceptually in advance of the experiment) the two observed proportions of students expressing preference for the Democratic candidate, P_{A} and P_{B}, corresponding respectively to the propagandized and the control groups. These two random variables, together with the known group sizes, contain all relevant information from the experiment itself, in the sense that only the proportions, not the particular students who express one preference or another, are relevant. The argument here is one of sufficiency [see Sufficiency, where the argument and its limitations are discussed]. In practice the analysis might well be refined by looking at sex of student and other characteristics, but for the moment only the simple structure is considered.
In the notational convention to be followed here, random variables (here P_{A} and P_{B}) are denoted by capital letters, and the corresponding parameters (here p_{A} and p_{B}) by parallel lower-case letters.
Estimation. The random variables 100P_{A} and 100P_{B} have binomial probability distributions depending on p_{A}, p_{B}, and sample sizes, in this case 100 for each sample [see Distributions, Statistical, article on SPECIAL DISCRETE DISTRIBUTIONS]. The fundamental premise of most statistical methods is that p_{A} and p_{B} should be assessed on the basis of P_{A} and P_{B} in the light of their possible probability distributions. One of the simplest modes of assessment is that of point estimation, in which the result of the analysis for the example consists of two numbers (depending on the observations) that are regarded as reasonable estimates of p_{A} and p_{B} [see Estimation, article on POINT ESTIMATION]. In the case at hand, the usual (not the only) estimators are just P_{A} and P_{B} themselves, but even slight changes in viewpoint can make matters less clear. For example, suppose that a point estimator were wanted for p_{A}/p_{B}, the ratio of the two underlying probability parameters. It is by no means clear that P_{A}/P_{B} would be a good point estimator for this ratio.
Point estimators by themselves are usually inadequate in scientific practice, for some indication of precision is nearly always wanted. (There are, however, problems in which point estimators are, in effect, of primary interest: for example, in a handbook table of natural constants, or in some aspects of buying and selling.) An old tradition is to follow a point estimate by a “±” (plus-or-minus sign) and a number derived from background experience or from the data. The intent is thus to give an idea of how precise the point estimate is, of the spread or dispersion in its distribution. For the case at hand, one convention would lead to stating, as a modified estimator for p_{A},
P_{A} ± \sqrt{P_{A}(1 - P_{A})/100},
that is, the point estimator plus or minus an estimator of its standard deviation, a useful measure of dispersion. (The divisor, 100, is the sample size.) Such a device has the danger that there may be misunderstanding about the convention for the number following “±”; in addition, interpretation of the measure of dispersion may not be direct unless the distribution of the point estimator is fairly simple; the usual presumption is that the distribution is approximately of a form called normal [see Distributions, Statistical, article on SPECIAL CONTINUOUS DISTRIBUTIONS].
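The plus-or-minus convention just described, a point estimate followed by an estimate of its standard deviation, can be sketched in Python for the observed counts of the propaganda example. The function name is a hypothetical label; the computation assumes the simple binomial model with sample size 100.

```python
from math import sqrt


def estimate_with_standard_error(successes: int, n: int) -> tuple:
    """Return the observed proportion and the estimated standard deviation of that proportion."""
    p_hat = successes / n
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, standard_error


for label, count in (("propagandized", 95), ("control", 80)):
    p_hat, se = estimate_with_standard_error(count, 100)
    print(f"{label}: {p_hat:.2f} ± {se:.3f}")
```

For 95 out of 100 the number following “±” is about .022, and for 80 out of 100 it is .040.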
To circumvent these problems, a confidence interval is often used, rather than a point estimator [see Estimation, article on CONFIDENCE INTERVALS AND REGIONS]. The interval is random (before the experiment), and it is so constructed that it covers the unknown true value of the parameter to be estimated with a preassigned probability, usually near 1. The confidence interval idea is very useful, although its subtlety has often led to misunderstandings in which the interpretation is wrongly given in terms of a probability distribution for the parameter.
There are, however, viewpoints in which this last sort of interpretation is valid, that is, in which the parameters of interest are themselves taken as random. The two most important of these viewpoints are Bayesian inference and fiducial inference [see Bayesian inference; Fiducial inference; Probability, article on INTERPRETATIONS]. Many variants exist, and controversy continues as the philosophical and practical aspects of these approaches are debated [see Likelihood for a discussion of related issues].
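One way to make the coverage property concrete is the following Python sketch. It uses the Wilson score interval, which is only one of several approximate constructions and is an assumption of this illustration rather than a method named in the article; the function names and the trial value p = 0.9 are likewise hypothetical. The second function computes, for a fixed true parameter, the probability before the experiment that the random interval covers it.

```python
from math import comb, sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple:
    """Approximate confidence interval for a binomial probability (Wilson score form)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


def coverage(p_true: float, n: int, confidence: float = 0.95) -> float:
    """Exact probability that the interval, computed from a binomial count, covers p_true."""
    total = 0.0
    for k in range(n + 1):
        low, high = wilson_interval(k, n, confidence)
        if low <= p_true <= high:
            total += comb(n, k) * p_true**k * (1 - p_true) ** (n - k)
    return total


print(wilson_interval(95, 100))   # interval computed from the observed 95 out of 100
print(coverage(0.9, 100))         # chance of covering a hypothetical true value 0.9
```

The probability statement attaches to the interval, which varies from sample to sample, and not to the fixed parameter; because the construction is approximate and the counts are discrete, the computed coverage is near, but not exactly equal to, the nominal level.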
Hypothesis testing. In the more usual viewpoint another general approach is that of hypothesis (or significance) testing [see Hypothesis testing; Significance, tests of]. This kind of procedure might be used if it is important to ascertain whether p_{A} and p_{B} are the same or not. Hypothesis testing has two aspects: one is that of a two-decision procedure leading to one of two actions with known controlled chances of error. This first approach generalizes to that of decision theory and has generated a great deal of literature in theoretical statistics [see Decision theory]. In this theory of decision functions, costs of wrong decisions, as well as costs of observation, are explicitly considered. Decision theory is related closely to game theory, and less closely to empirical studies of decision making [see Game theory; Decision making].
The second aspect of hypothesis testing, and the commoner, is more descriptive. From its viewpoint a hypothesis test tells how surprising a set of observations is under some null hypothesis at test. In the example, one would compute how probable it is under the null hypothesis p_{A} = p_{B} that the actual results should differ by as much as or more than the observed 95 per cent and 80 per cent. (Only recently has it been stressed that one would also do well to examine such probabilities under a variety of hypotheses other than a traditional null one.) Sometimes, as in the propaganda example, it is rather clear at the start that some effect must exist. In other cases, for example, in the study of parapsychology, there may be serious question of any effect whatever.
There are other modes of statistical analysis, for example, classification, selection, and screening [see Multivariate analysis, article on CLASSIFICATION AND DISCRIMINATION; Screening and selection]. In the future there is likely to be investigation of a much wider variety of modes of analysis than now exists. Such investigation will mitigate the difficulty that standard modes of analysis, like hypothesis testing, often do not exactly fit the inferential needs of specific real problems. The standard modes must usually be regarded as approximate, and used with caution.
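A rough version of this computation for the propaganda example can be sketched in Python. The sketch assumes a pooled two-proportion test with a normal approximation, which is a common textbook choice and not a procedure specified in the article; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(x_a: int, n_a: int, x_b: int, n_b: int) -> float:
    """Two-sided p-value for the null hypothesis p_A = p_B (pooled normal approximation)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)           # common probability estimated under the null
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:                                  # all or none prefer the candidate in both groups
        return 1.0
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


print(two_proportion_p_value(95, 100, 80, 100))  # observed 95 per cent versus 80 per cent
print(two_proportion_p_value(95, 100, 95, 100))  # a tie gives no evidence against the null
```

Under these assumptions a difference as large as that between 95 and 80 would arise by chance between one and two times in a thousand if the two probabilities were equal.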
One pervasive difficulty of this kind surrounds what might be called exploration of data, or datadredging. It arises when a (usually sizable) body of data from a survey or experiment is at hand but either the analyst has no specific hypotheses about kinds of orderliness in the data or he has a great many. He will naturally wish to explore the body of data in a variety of ways with the hope of finding orderliness: he will try various graphical presentations, functional transformations, perhaps factor analysis, regression analysis, and other de-vices; in the course of this, he will doubtless carry out a number of estimations, hypothesis tests, confidence interval computations, and so on. A basic difficulty is that any finite body of data, even if wholly generated at random, will show orderliness of some kind if studied long and hard enough. Parallel to this, one must remember that most theoretical work on hypothesis tests, confidence intervals, and other inferential procedures looks at their behavior in isolation, and supposes that the procedures are selected in advance of data inspection. For example, if a hypothesis test is to be made of the null hypothesis that mean scores of men and women on an intelligence test are equal, and if a one-sided alternative is chosen after the fact in the same direction as that shown by the data, it is easy to see that the test will falsely show statistical significance, when the null hypothesis is true, twice as often as the analyst might expect.
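The doubling of the error rate can be illustrated with a short Python sketch. It assumes, for simplicity, a test statistic that is standard normal when the null hypothesis is true; the function names, the nominal level of .05, the number of trials, and the random seed are all illustrative assumptions.

```python
import random
from statistics import NormalDist


def exact_post_hoc_error_rate(alpha: float = 0.05) -> float:
    """False-rejection probability when the one-sided direction is chosen to match the data."""
    critical = NormalDist().inv_cdf(1 - alpha)        # one-sided critical value
    return 2 * (1 - NormalDist().cdf(critical))       # either tail now leads to rejection


def simulated_post_hoc_error_rate(alpha: float = 0.05, trials: int = 20000, seed: int = 0) -> float:
    """Simulate the same procedure with test statistics drawn under a true null hypothesis."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)
        # The analyst tests in whichever direction the data point, so only |z| matters.
        if abs(z) > critical:
            rejections += 1
    return rejections / trials


print(exact_post_hoc_error_rate())       # twice the nominal level
print(simulated_post_hoc_error_rate())   # close to the exact value, up to sampling error
```

A nominal one-sided level of .05 thus becomes an actual false-rejection rate of .10 when the direction is picked after inspecting the data.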
On the other hand, it would be ridiculously rigid to refuse to use inferential tools in the exploration of data. Two general mitigating approaches are (1) the use of techniques (for example, multiple comparisons) that include explicit elements of exploration in their formulation [see Linear Hypotheses, article on Multiple comparisons], and (2) the splitting of the data into two parts at random, using one part for exploration with no holds barred and then carrying out formal tests or other inferential procedures on the second part.
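The second approach, random splitting, is simple to carry out. The sketch below is illustrative only; the function name `split_at_random` and the choice of an even split are assumptions, not part of the article.

```python
import random


def split_at_random(records, seed=None):
    """Split records at random into an exploration part and a confirmation part."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)          # random order, so the split is random
    half = len(shuffled) // 2
    exploration = shuffled[:half]  # explore freely, no holds barred
    confirmation = shuffled[half:] # reserve for formal tests chosen in advance
    return exploration, confirmation
```

The point of the design is that hypotheses suggested by the exploration part are tested on data that played no role in suggesting them.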
This area deserves much more research. Selvin and Stuart have given a statement of present opinions, and of practical advice [see Selvin & Stuart 1966; see also Survey analysis; Scaling and Statistical Analysis, Special Problems Of, article on Transformations Of Data, are also relevant].
Breadth of inference. Whatever the mode of analysis, it is important to remember that the inference to which a statistical method directly relates is limited to the population actually experimented upon or surveyed. In the propaganda example, if the students are sampled from a single university, then the immediate inference is to that university only. Wider inferences—and these are usually wanted—presumably depend on subject-matter background and on intuition. Of course, the breadth of direct inference may be widened, for example, by repeating the study at different times, in different universities, in different areas, and so on. But, except in unusual cases, a limit is reached, if only the temporal one that experiments cannot be done now on future students.
Thus, in most cases, a scientific inference has two stages: the direct inference from the sample to the sampled population, and the indirect inference from the sampled population to a much wider, and usually rather vague, realm. That is why it is so important to try to check findings in a variety of contexts, for example, to test psychological generalizations obtained from experiments within one culture in some very different culture.
Formalization and precise theoretical treatment of the second stage represent a gap in present-day statistics (except perhaps for adherents of Bayesian methodology), although many say that the second step is intrinsically outside statistics. The general question of indirect inference is often mentioned and often forgotten; an early explicit treatment is by von Bortkiewicz (1909); a modern discussion in the context of research in sexual behavior is given by Cochran, Mosteller, and Tukey (1954, pp. 18-19, 21-22, 30-31).
An extreme case of the breadth-of-inference problem is represented by the case study, for example, an intensive study of the history of a single psychologically disturbed person. Indeed, some authors try to set up a sharp distinction between the method of case studies and what they call statistical methods. I do not feel that the distinction is very sharp. For one thing, statistical questions of measurement reliability arise even in the study of a single person. Further, some case studies, for example, in anthropology, are of a tribe or some other group of individuals, so that traditional sampling questions might well arise in drawing inferences about the single (collective) case.
Proponents of the case study approach emphasize its flexibility, its importance in attaining subjective insight, and its utility as a means of conjecturing interesting theoretical structures. If there is good reason to believe in small relevant intercase variability, then, of course, a single case does tell much about a larger population. The investigator, however, has responsibility for defending an assumption about small intercase variability. [Further discussion will be found in Interviewing; Observation, article on SOCIAL OBSERVATION AND SOCIAL CASE STUDIES.]
Other topics
Linear hypotheses. One way of classifying statistical topics is in terms of the kind of assumptions made, that is—looking toward applications—in terms of the structure of anticipated experiments or surveys for which the statistical methods will be used. The propaganda example, in which the central quantities are two proportions with integral numerators and denominators, falls under the general topic of the analysis of counted or qualitative data; this topic includes the treatment of so-called chi-square tests. Such an analysis would also be applicable if there were more than two groups, and it can be extended in other directions. [See Counted data.]
If, in the propaganda experiment, instead of proportions expressing one preference or the other, numerical scores on a multiquestion test were used to indicate quantitatively the leaning toward a candidate or political party, then the situation might come under the general rubric of linear hypotheses. To illustrate the ideas, suppose that there were more than two groups, say, four, of which the first was exposed to no propaganda, the second saw a motion picture, the third was given material to read, and the fourth heard a speaker, and that the scores of students under the four conditions are to be compared. Analysis-of-variance methods (many of which may be regarded as special cases of regression methods) are of central importance for such a study [see Linear Hypotheses, articles on Analysis Of Variance and REGRESSION]. Multiple comparison methods are often used here, although, strictly speaking, they are not restricted to the analysis-of-variance context [see Linear Hypotheses, article on Multiple comparisons].
If the four groups differed primarily in some quantitative way, for example, in the number of sessions spent watching propaganda motion pictures, then regression methods in a narrower sense might come into play. One might, for example, suppose that average test score is roughly a linear function of number of motion picture sessions, and then center statistical attention on the constants (slope and intercept) of the linear function.
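As an illustration of centering attention on slope and intercept, here is a minimal least-squares sketch. Least squares is one common way of estimating the two constants; the article does not prescribe a method, and the function name `fit_line` and the data in the usage note are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y as a linear function of x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-products of deviations.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

With made-up average scores of 50, 53, 56, and 59 for groups that attended 0, 1, 2, and 3 sessions, `fit_line([0, 1, 2, 3], [50, 53, 56, 59])` gives a slope of 3 points per session and an intercept of 50.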
Multivariate statistics. “Regression” is a word with at least two meanings. A meaning somewhat different from, and historically earlier than, that described just above appears in statistical theory for multivariate analysis, that is, for situations in which more than one kind of observation is made on each individual or unit that is measured [see Multivariate analysis]. For example, in an educational experiment on teaching methods, one might look at scores not only on a spelling examination, but on a grammar examination and on a reading-comprehension examination as well. Or in a physical anthropology study, one might measure several dimensions of each individual.
The simplest part of multivariate analysis is concerned with association between just two random variables and, in particular, with the important concept of correlation [see Statistics, Descriptive, article on ASSOCIATION; Multivariate Analysis, articles on CORRELATION]. These ideas extend to more than two random variables, and then new possibilities enter. An important one is that of partial association: how are spelling and grammar scores associated if reading comprehension is held fixed? The partial association notion is important in survey analysis, where a controlled experiment is often impossible [see Survey analysis; Experimental Design, article on QUASI-Experimental design].
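One standard measure of partial association is the first-order partial correlation coefficient, computed from the three pairwise correlations. The sketch below is illustrative; the article does not single out this measure, and the simulated scores (in which spelling and grammar each equal reading comprehension plus independent noise) are an assumption made to show the effect.

```python
import math
import random


def pearson(a, b):
    """Ordinary product-moment correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def partial_correlation(x, y, z):
    """Correlation of x and y with the linear effect of z removed from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_scores(n=2000, seed=0):
    """Hypothetical data: spelling and grammar both depend on reading only."""
    rng = random.Random(seed)
    reading = [rng.gauss(0, 1) for _ in range(n)]
    spelling = [r + rng.gauss(0, 1) for r in reading]
    grammar = [r + rng.gauss(0, 1) for r in reading]
    return spelling, grammar, reading
```

In data generated this way, spelling and grammar show a substantial ordinary correlation, yet their partial correlation given reading comprehension is close to zero, because the whole of their association runs through the third variable.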
Multivariate analysis also considers statistical methods bearing on the joint structure of the means that correspond to the several kinds of observations, and on the whole correlation structure.
Factor analysis falls in the multivariate area, but it has a special history and a special relationship with psychology [see Factor analysis]. Factor-analytic methods try to replace a number of measurements by a few basic ones, together with residuals having a simple probability structure. For example, one might hope that spelling, grammar, and reading-comprehension abilities are all proportional to some quantity not directly observable, perhaps dubbed “linguistic skill,” that varies from person to person, plus residuals or deviations that are statistically independent.
The standard factor analysis model is one of a class of models generated by a process called mixing of probability distributions [see Distributions, Statistical, article on Mixtures Of distributions]. An interesting model of this general sort, but for discrete, rather than continuous, observations, is that of latent structure [see Latent structure].
Another important multivariate topic is classification and discrimination, which is the study of how to assign individuals to two or more groups on the basis of several measurements per individual [see Multivariate Analysis, article on Classification And discrimination]. Less well understood, but related, is the problem of clustering, or numerical taxonomy: what are useful ways for forming groups of individuals on the basis of several measurements on each? [See Clustering.]
Time series. Related to multivariate analysis, because of its stress on modes of statistical dependence, is the large field of time series analysis, sometimes given a title that includes the catchy phrase “stochastic processes.” An observed time series may be regarded as a realization of an underlying stochastic process [see Time series]. The simplest sort of time series problem might arise when for each child in an educational experiment there is available a set of scores on spelling tests given each month during the school year. More difficult problems arise when there is no hope of observing more than a single series, for example, when the observations are on the monthly or yearly prices of wheat. In such cases, so common in economics, stringent structural assumptions are required, and even then analysis is not easy.
This encyclopedia's treatment of time series begins with a general overview, oriented primarily toward economic series. The overview is followed by a discussion of advanced methodology, mainly that of spectral analysis, which treats a time series as something like a radio signal that can be decomposed into subsignals at different frequencies, each with its own amount of energy. Next comes a treatment of cycles, with special discussion of how easy it is to be trapped into concluding that cycles exist when in fact only random variation is present. Finally, there is a discussion of the important technical problem raised by seasonal variation, and of adjustment to remove or mitigate its effect. The articles on business cycles should also be consulted [see Business cycles].
The topic of Markov chains might have been included under the time series category, but it is separate [see Markov chains]. The concept of a Markov chain is one of the simplest and most useful ways of relaxing the common assumption of independence. Methods based on the Markov chain idea have found application in the study of panels (for public opinion, budget analysis, etc.), of labor mobility, of changes in social class between generations, and so on [see, for example, Panel studies; Social mobility].
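A small numerical sketch may make the Markov chain idea concrete for the social-mobility application. The three classes and the transition probabilities below are invented for illustration and are not drawn from any study; the only assumption of substance is the Markov one, that a child's class depends on the parent's class alone.

```python
# Hypothetical transition matrix: TRANSITION[i][j] is the probability that
# a parent in class i has a child in class j (0 = upper, 1 = middle, 2 = lower).
TRANSITION = [
    [0.6, 0.3, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]


def step(distribution, transition):
    """Advance a class distribution by one generation."""
    k = len(transition)
    return [
        sum(distribution[i] * transition[i][j] for i in range(k))
        for j in range(k)
    ]


def after_generations(start, transition, generations):
    """Class distribution after the given number of generations."""
    dist = list(start)
    for _ in range(generations):
        dist = step(dist, transition)
    return dist
```

Starting from a family known to be in the upper class, `after_generations([1.0, 0.0, 0.0], TRANSITION, 2)` gives the class probabilities for the grandchildren, approximately 0.43, 0.39, and 0.18 with these made-up figures.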
Sample surveys and related topics. The subject of sample surveys is important, both in theory and practice [see Sample surveys]. It originated in connection with surveys of economic and social characteristics of human populations, when samples were used rather than attempts at full coverage. But the techniques of sample surveys have been of great use in many other areas, for example in the evaluation of physical inventories of industrial equipment. The study of sample surveys is closely related to most of the other major fields of statistics, in particular to the design of experiments, but it is characterized by its emphasis on finite populations and on complex sampling plans.
Most academically oriented statisticians who think about sample surveys stress the importance of probability sampling—that is, of choosing the units to be observed by a plan that explicitly uses random numbers, so that the probabilities of possible samples are known. On the other hand, many actual sample surveys are not based upon probability sampling [for a discussion of the central issues of this somewhat ironical discrepancy, see Sample surveys, article on Nonprobability sampling].
Random numbers are important, not only for sample surveys, but for experimental design generally, and for simulation studies of many kinds [see Random numbers; Simulation].
An important topic in sample surveys (and, for that matter, throughout applied statistics) is that of nonsampling errors [see Errors, article on Nonsampling errors]. Such errors stem, for example, from nonresponse in public opinion surveys, from observer and other biases in measurement, and from errors of computation. Interesting discussions of these problems, and of many others related to sampling, are given by Cochran, Mosteller, and Tukey (1954).
Sociologists have long been interested in survey research, but with historically different emphases from those of statisticians [see Survey analysis; INTERVIEWING]. The sociological stress has been much less on efficient design and sampling variation and much more on complex analyses of highly multivariate data. There is reason to hope that workers in these two streams of research are coming to understand each other's viewpoint.
Nonparametric analysis and related topics. I remarked earlier that an important area of study is robustness, the degree of sensitivity of statistical methods to errors in assumptions. A particular kind of assumption error is that incurred when a special distributional form, for example, normality, is assumed when it does not in fact obtain. To meet this problem, one may seek alternate methods that are insensitive to form of distribution, and the study of such methods is called nonparametric analysis or distribution-free statistics [see Nonparametric statistics]. Such procedures as the sign test and many ranking methods fall into the nonparametric category.
For example, suppose that pairs of students (matched for age, sex, intelligence, and so on) form the experimental material, and that for each pair it is determined entirely at random, as by the throw of a fair coin, which member of the pair is exposed to one teaching method (A) and which to another (B). After exposure to the assigned methods, the students are given an examination; a pair is scored positive if the method A student has the higher score, negative if the method B student has. If the two methods are equally effective, the number of positive scores has a binomial distribution with basic probability 1/2. If, however, method A is superior, the basic probability is greater than 1/2; if method B is superior, less than 1/2. The number of observed positives provides a simple test of the hypothesis of equivalence and a basis for estimating the amount of superiority that one of the teaching methods may have. (The above design is, of course, only sensible if matching is possible for most of the students.)
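The sign test just described can be computed exactly from the binomial distribution with probability 1/2. The sketch below is a minimal illustration under two assumptions not stated in the text: tied pairs have already been dropped, and a two-sided p-value is wanted. The function name `sign_test` is hypothetical.

```python
from math import comb


def sign_test(n_positive, n_pairs):
    """Exact two-sided sign test for matched pairs (ties assumed removed).

    Returns (p_value, estimate), where estimate is the observed share of
    pairs in which the method A student scored higher.
    """
    def upper_tail(k):
        # P(X >= k) when X is binomial with n_pairs trials and probability 1/2.
        return sum(comb(n_pairs, i) for i in range(k, n_pairs + 1)) / 2 ** n_pairs

    # Use the more extreme of the two counts and double the tail,
    # which is valid because the null distribution is symmetric.
    larger = max(n_positive, n_pairs - n_positive)
    p_value = min(1.0, 2 * upper_tail(larger))
    estimate = n_positive / n_pairs
    return p_value, estimate
```

For instance, with 8 positive pairs out of 10, `sign_test(8, 10)` returns a p-value of 0.109375 and an estimated basic probability of 0.8, so this much evidence alone would not reject equivalence at the conventional 0.05 level.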
The topic of order statistics is also discussed in one of the articles on nonparametric analysis, although order statistics are at least as important for procedures that do make sharp distributional assumptions [see Nonparametric statistics, article on Order statistics]. There is, of course, no sharp boundary line for distribution-free procedures. First, many procedures based on narrow distributional assumptions turn out in fact to be robust, that is, to maintain some or all of their characteristics even when the assumptions are relaxed. Second, most distribution-free procedures are only partly so; for example, a distribution-free test will typically be independent of distributional form as regards its level of significance but not so as regards power (the probability of rejecting the null hypothesis when it is false). Again, most nonparametric procedures are nonrobust against dependence among the observations.
Nonparametric methods often arise naturally when observational materials are inherently non-metric, for example, when the results of an experiment or survey provide only rankings of test units by judges.
Sometimes the form of a distribution is worthy of special examination, and goodness-of-fit procedures are used [see Goodness of FIT]. For example, a psychological test may be standardized to a particular population so that test scores over the population have very nearly a unit-normal distribution. If the test is then administered to the individuals of a sample from a different population, the question may arise of whether the score distribution for the different population is still unit normal, and a goodness-of-fit test of unit-normality may be performed. More broadly, an analogous test might be framed to test only normality, without specification of a particular normal distribution.
Some goodness-of-fit procedures, the so-called chi-square ones, may be regarded as falling under the counted-data rubric [see Counted data]. Others, especially when modified to provide confidence bands for an entire distribution, are usually studied under the banner of nonparametric analysis.
Dispersion. The study of dispersion, or variability, is a topic that deserves more attention than it often receives [see Variances, Statistical study OF]. For example, it might be of interest to compare several teaching methods as to the resulting heterogeneity of student scores. A particular method might give rise to a desirable average score by increasing greatly the scores of some students while leaving other students’ scores unchanged, thereby giving rise to great heterogeneity. Clearly, such a method has different consequences and applications than one that raises each student’s score by about the same amount.
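A toy calculation shows how two methods can agree on the average and differ sharply in dispersion. The baseline scores and the two methods' effects below are invented for illustration.

```python
from statistics import mean, pvariance

# Hypothetical baseline scores for four students.
baseline = [50, 60, 70, 80]

# Method A: raises two students by 20 points and leaves the others unchanged.
method_a = [50, 60, 90, 100]

# Method B: raises every student by 10 points.
method_b = [score + 10 for score in baseline]

summary = {
    "A": (mean(method_a), pvariance(method_a)),
    "B": (mean(method_b), pvariance(method_b)),
}
```

Both methods give an average of 75, but the variance under method A is 425 against 125 under method B (the same as the baseline), so a comparison of means alone would miss the difference between them.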
(Terminology may be confusing here. The traditional topic of analysis of variance deals in substantial part with means, not variances, although it does so by looking at dispersions among the means.)
Design. Experimental design has already been mentioned. It deals with such problems as how many observations to take for a given level of accuracy, and how to assign the treatments or factors to experimental units. For example, in the study of teaching methods, the experimental units may be school classes, cross-classified by grade, kind of school, type of community, and the like. Experimental design deals with formal aspects of the structure of an experimental layout; a basic principle is that explicit randomization should be used in assigning “treatments” (here methods of teaching) to experimental units (here classes). Sometimes it may be reasonable to suppose that randomization is inherent, supplied, as it were, by nature; but more often it is important to use so-called random numbers. Controversy centers on situations in which randomization is deemed impractical, unethical, or even impossible, although one may sometimes find clever ways to introduce randomization in cases where it seems hopeless at first glance. When randomization is absent, a term like “quasi experiment” may be used to emphasize its absence, and a major problem is that of obtaining as much protection as possible against the sources of bias that would have been largely eliminated by the unused randomization [see Experimental design, article on QUASI-Experimental design].
An important aspect of the design of experiments is the use of devices to ensure both that a (human) subject does not know which experimental treatment he is subjected to, and that the investigator who is measuring or observing effects of treatments does not know which treatments particular observed individuals have had. When proper precautions are taken along these two lines, the experiment is called double blind. Many experimental programs have been vitiated by neglect of these precautions. First, a subject who knows that he is taking a drug that it is hoped will improve his memory, or reduce his sensitivity to pain, may well change his behavior in response to the knowledge of what is expected as much as in physiological response to the drug itself. Hence, whenever possible, so-called placebo treatments (neutral but, on the surface, indistinguishable from the real treatment) are administered to members of the control group. Second, an investigator who knows which subjects are having which treatments may easily, and quite unconsciously, have his observations biased by preconceived opinions. Problems may arise even if the investigator knows only which subjects are in the same group. Assignment to treatment by the use of random numbers, and random ordering of individuals for observation, are important devices to ensure impartiality.
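Explicit randomization of treatments to units can be sketched in a few lines. The example below assumes a balanced, completely randomized design (equal numbers of classes per teaching method) and uses pseudorandom numbers in place of a printed table of random numbers; the function name `assign_treatments` and the class labels are hypothetical.

```python
import random


def assign_treatments(units, treatments, seed=None):
    """Assign treatments to units at random, with equal numbers per treatment."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    # One label per unit, each treatment repeated equally often, then shuffled.
    labels = list(treatments) * (len(units) // len(treatments))
    rng.shuffle(labels)
    return dict(zip(units, labels))
```

For example, `assign_treatments(["class_1", "class_2", "class_3", "class_4"], ["A", "B"], seed=7)` gives two classes to each method, with the choice of which classes left to chance rather than to the investigator's judgment.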
The number of observations is traditionally regarded as fixed before sampling. In recent years, however, there have been many investigations of sequential designs in which observations are taken in a series (or in a series of groups of observations), with decisions made at each step whether to take further observations or to stop observing and turn to analysis [see Sequential analysis].
In many contexts a response (or its average value) is a function of several controlled variables. For example, average length of time to relearn the spellings of a list of words may depend on the number of prior learning sessions and the elapsed period since the last learning session. In the study of response surfaces, the structure of the dependence (thought of as the shape of a surface) is investigated by a series of experiments, typically with special interest in the neighborhood of a maximum or minimum [see Experimental design, article on Response surfaces].
Philosophy. Statistics has long had a neighborly relation with philosophy of science in the epistemological city, although statistics has usually been more modest in scope and more pragmatic in outlook. In a strict sense, statistics is part of philosophy of science, but in fact the two areas are usually studied separately.
What are some problems that form part of the philosophy of science but are not generally regarded as part of statistics? A central one is that of the formation of scientific theories, their careful statement, and their confirmation or degree of confirmation. This last is to be distinguished from the narrower, but better understood, statistical concept of testing hypotheses. Another problem that many statisticians feel lies outside statistics is that of the gap between sampled and target population.
There are other areas of scientific philosophy that are not ordinarily regarded as part of statistics. Concepts like explanation, causation, operationalism, and free will come to mind.
A classic publication dealing with both statistics and scientific philosophy is Karl Pearson's Grammar of Science (1892). Two more recent such publications are Popper's Logic of Scientific Discovery (1935) and Braithwaite's Scientific Explanation (1953). By and large, nowadays, writers calling themselves statisticians and those calling themselves philosophers of science often refer to each other, but communication is restricted and piecemeal. [See Science, article on The philosophy OF SCIENCE; see also CAUSATION; POWER; PREDICTION; Scientific explanation.]
Measurement is an important topic for statistics, and it might well be mentioned here because some aspects of measurement are clearly philosophical. Roughly speaking, measurement is the process of assigning numbers (or categories) to objects on the basis of some operation. A measurement or datum is the resulting number (or category). But what is the epistemological underpinning for this concept? Should it be broadened to include more general kinds of data than numbers and categories? What kind of operations should be considered?
In particular, measurement scales are important, both in theory and practice. It is natural to say of one object that it is twice as heavy as another (in pounds, grams, or whatever; the unit is immaterial). But it seems silly to say that one object has twice the temperature of another in any of the everyday scales of temperature (as opposed to the absolute scale), if only because the ratio changes when one shifts, say, from Fahrenheit to Centigrade degrees. On the other hand, it makes sense to say that one object is 100 degrees Fahrenheit hotter than another. Some measurements seem to make sense only insofar as they order units, for example, many subjective rankings; and some measurements are purely nominal or categorical, for example, country of birth. Some measurements are inherently circular, for example, wind direction or time of day. There has been heated discussion of the question of the meaningfulness or legitimacy of arithmetic manipulations of various kinds of measurements; does it make sense, for example, to average measurements of subjective loudness if the individual measurements give information only about ordinal relationships?
The following are some important publications that deal with measurement and that lead to the relevant literature at this date: Churchman and Ratoosh (1959); Coombs (1964); Pfanzagl (1959); Adams, Fagot, and Robinson (1965); Torgerson (1958); Stevens (1946); Suppes and Zinnes (1963). [See Statistics, Descriptive; also relevant are Psychometrics; Scaling; Utility.]
Communication and fallacies. There is an art of communication between statistician and nonstatistician scientist: the statistician must be always aware that the nonstatistician is in general not directly interested in technical minutiae or in the parochial jargon of statistics. In the other direction, consultation with a statistician often loses effectiveness because the nonstatistician fails to mention aspects of his work that are of statistical relevance. Of course, in most cases scientists serve as their own statisticians, in the same sense that people, except for hypochondriacs, serve as their own physicians most of the time.
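Two of these points lend themselves to a quick numerical check: ratios of everyday temperatures depend on the scale while differences convert consistently, and ordinary averaging misbehaves for circular measurements. The sketch below is illustrative only. The particular temperatures and wind directions are made up, and the circular mean shown (the direction of the resultant of unit vectors) is one common convention, not one the article specifies.

```python
import math


def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Centigrade (Celsius)."""
    return (f - 32.0) * 5.0 / 9.0


def circular_mean_degrees(angles):
    """Mean direction of angles in degrees, via the resultant unit vector."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0


# Ratio of two temperatures: 2 in Fahrenheit, about 3.78 in Centigrade.
ratio_f = 100.0 / 50.0
ratio_c = fahrenheit_to_celsius(100.0) / fahrenheit_to_celsius(50.0)

# A 100-degree Fahrenheit difference is the same Centigrade difference
# (about 55.6 degrees) wherever it sits on the scale.
diff_low = fahrenheit_to_celsius(150.0) - fahrenheit_to_celsius(50.0)
diff_high = fahrenheit_to_celsius(200.0) - fahrenheit_to_celsius(100.0)

# Wind directions of 350 and 10 degrees: the arithmetic mean is 180 (due
# south), while the circular mean is essentially 0 (due north).
arithmetic = (350.0 + 10.0) / 2.0
circular = circular_mean_degrees([350.0, 10.0])
```

Because of floating-point rounding, the circular mean here may print as a value extremely close to 0 or to 360; both denote the same direction.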
Statistical fallacies are often subtle and may be committed by the most careful workers. A study of such fallacies has intrinsic interest and also aids in mitigating the communication problem just mentioned [see Fallacies, STATISTICAL; see also Errors, article on Nonsampling errors].
Criticisms
If statistics is defined broadly, in terms of the general study of the leap from observations to inference, decision, or whatever, then one can hardly quarrel with the desirability of a study so embracingly characterized. Criticisms of statistics, therefore, are generally in terms of a narrower characterization, often the kind of activity named “statistics” that the critic sees about him. If, for example, a professor in some scientific field sees colleagues publishing clumsy analyses that they call statistical, then the professor may understandably develop a negative attitude toward statistics. He may not have an opportunity to learn that the subject is broader and that it may be used wisely, elegantly, and effectively.
Criticisms of probability in statistics. Some criticisms, in a philosophical vein, relate to the very use of probability models in statistics. For example, some writers have objected to probability because of a strict determinism in their Weltanschauung. This view is rare nowadays, with the success of highly probabilistic quantum methods in physics, and with the utility of probability models for clearly deterministic phenomena, for example, the effect of rounding errors in complex digital calculations. The deterministic critic, however, would probably say that quantum mechanics and probabilistic analysis of rounding errors are just temporary expedients, to be replaced later by nonprobabilistic approaches. For example, Einstein wrote in 1947 that
. . . the statistical interpretation [as in quantum mechanics] . . . has a considerable content of truth.
Yet I cannot seriously believe it because the theory is inconsistent with the principle that physics has to represent a reality. . . . I am absolutely convinced that one will eventually arrive at a theory in which the objects connected by laws are not probabilities, but conceived facts. . . . However, I cannot provide logical arguments for my conviction, but can only call on my little finger as a witness, which cannot claim any authority to be respected outside my own skin. (Quoted in Born 1949, p. 123)
Other critics find vitiating contradictions and paradoxes in the ideas of probability and randomness. For example, G. Spencer Brown sweepingly wrote that
. . . the concept of probability used in statistical science is meaningless in its own terms [and] . . . , however meaningful it might have been, its meaningfulness would nevertheless have remained fruitless because of the impossibility of gaining information from experimental results. (1957, p. 66)
This rather nihilistic position is unusual and hard to reconcile with the many successful applications of probabilistic ideas. (Indeed, Spencer Brown went on to make constructive qualifications.) A less extreme but related view was expressed by Percy W. Bridgman (1959, pp. 110-111). Both these writers were influenced by statistical uses of tables of random numbers, especially in the context of parapsychology, where explanations of puzzling results were sought in the possible misbehavior of random numbers. [See Random numbers; see also PARAPSYCHOLOGY.]
Criticisms about limited utility. A more common criticism, notably among some physical scientists, is that they have little need for statistics because random variability in the problems they study is negligible, at least in comparison with systematic errors or biases. This position has also been taken by some economists, especially in connection with index numbers [see Index numbers, article on SAMPLING]. B. F. Skinner, a psychologist, has forcefully expressed a variant of this position: that there are so many important problems in which random variability is negligible that he will restrict his own research to them (see Skinner 1956 for a presentation of this rather extreme position). In fact, he further argues that the important problems in psychology as a field are the identification of variables that can be observed directly with negligible variability.
It often happens, nonetheless, that, upon detailed examination, random variability is more important than had been thought, especially for the design of future experiments. Further, careful experimental design can often reduce, or bring understanding of, systematic errors. I think that the above kind of criticism is sometimes valid—after all, a single satellite successfully orbiting the earth is enough to show that it can be done—but that usually the criticism represents unwillingness to consider statistical methods explicitly, or a semantic confusion about what statistics is.
Related to the above criticism is the view that statistics is fine for applied technology, but not for fundamental science. In his inaugural lecture at Birkbeck College at the University of London, David Cox countered this criticism. He said in his introduction,
. . . there is current a feeling that in some fields of fundamental research, statistical ideas are sometimes not just irrelevant, but may actually be harmful as a symptom of an over-empirical approach. This view, while understandable, seems to me to come from too narrow a concept of what statistical methods are about. (1961)
Cox went on to give examples of the use of statistics in fundamental research in physics, psychology, botany, and other fields.
Another variant of this criticism sometimes seen (Selvin 1957; Walberg 1966) is that such statistical procedures as hypothesis testing are of doubtful validity unless a classically arranged experiment is possible, complete with randomization, control groups, pre-establishment of hypotheses, and other safeguards. Without such an arrangement—which is sometimes not possible or practical—all kinds of bias may enter, mixing any actual effect with bias effects.
This criticism reflects a real problem of reasonable inference when a true experiment is not available [see Experimental design, article on QUASI-Experimental design], but it is not a criticism unique to special kinds of inference. The problem applies equally to any mode of analysis—formal, informal, or intuitive. A spirited discussion of this topic is given by Kish (1959).
Humanistic criticisms. Some criticisms of statistics represent serious misunderstandings or are really criticisms of poor statistical method, not of statistics per se. For example, one sometimes hears the argument that statistics is inhuman, that “you can't reduce people to numbers,” that statistics (and perhaps science more generally) must be battled by humanists. This is a statistical version of an old complaint, voiced in one form by Horace Walpole, in a letter to H. S. Conway (1778): “This sublime age reduces everything to its quintessence; all periphrases and expletives are so much in disuse, that I suppose soon the only way to [go about] making love will be to say 'Lie down.'”
A modern variation of this was expressed by W. H. Auden in the following lines:
Thou shalt not answer questionnaires
Or quizzes upon World-Affairs,
Nor with compliance
Take any test. Thou shalt not sit
With statisticians nor commit
A social science.
From “Under Which Lyre: A Reactionary Tract for the Times.” Reprinted from Nones, by W. H. Auden, by permission of Random House, Inc. Copyright 1946 by W. H. Auden.
Joseph Wood Krutch (1963) said, “I still think that a familiarity with the best that has been thought and said by men of letters is more helpful than all the sociologists' statistics” (“Through Happiness With Slide Rule and Calipers,” p. 14).
There are, of course, quite valid points buried in such captious and charming criticisms. It is easy to forget that things may be more complicated than they seem, that many important characteristics are extraordinarily difficult to measure or count, that scientists (and humanists alike) may lack professional humility, and that any set of measurements excludes others that might in principle have been made. But the humanistic attack is overdefensive and is a particular instance of what might be called the two-culture fallacy: the belief that science and the humanities are inherently different and necessarily in opposition.
Criticisms of overconcern with averages. Statisticians are sometimes teased about being interested only in averages, some of which are ludicrous: 2.35 children in an average family; or the rare disease that attacks people aged 40 on the average—two cases, one a child of 2 and the other a man of 78. (Chuckles from the gallery.)
Skinner made the point by observing that “no one goes to the circus to see the average dog jump through a hoop significantly oftener than untrained dogs raised under the same circumstances . . .” (1956, p. 228). Krutch said that “Statistics take no account of those who prefer to hear a different drummer” (1963, p. 15).
In fact, although averages are important, statisticians have long been deeply concerned about dispersions around averages and about other aspects of distributions, for example, in extreme values [see Nonparametric statistics, article on Order statistics; and Statistical analysis, special problems of, article on Outliers].
In 1889 the criticism of averages was poetically made by Galton:
It is difficult to understand why statisticians commonly limit their inquiries to Averages, and do not revel in more comprehensive views. Their souls seem as dull to the charm of variety as that of the native of one of our flat English counties, whose retrospect of Switzerland was that, if its mountains could be thrown into its lakes, two nuisances would be got rid of at once. (p. 62)
Galton's critique was overstated even at its date, but it would be wholly inappropriate today.
Another passage from the same work by Galton refers to the kind of emotional resistance to statistics that was mentioned earlier:
Some people hate the very name of statistics, but I find them full of beauty and interest. Whenever they are not brutalized, but delicately handled by the higher methods, and are warily interpreted, their power of dealing with complicated phenomena is extraordinary. They are the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the Science of man. (1889, pp. 62-63)
One basic source of misunderstanding about averages is that an individual may be average in many ways, yet appreciably nonaverage in others. This was the central difficulty with Quetelet's historically important concept of the average man [see the biography of Quetelet]; a satirical novel about the point, by Robert A. Aurthur (1953), has appeared. The average number of children per family in a given population is meaningful and sometimes useful to know, for example, in estimating future population. There is, however, no such thing as the average family, if only because a family with an average number of children (assuming this number to be integral) would not be average in terms of the reciprocal of number of children. To put it another way, there is no reason to think that a family with the average number of children also has average income, or average education, or lives at the center of population of the country.
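The remark about reciprocals can be checked with a few lines of arithmetic. The sketch below uses a made-up list of family sizes (an illustrative assumption, not data from any survey) and shows that a family with exactly the average number of children is not average on the reciprocal scale.

```python
from fractions import Fraction

# Hypothetical numbers of children in six families (illustrative only).
children = [1, 1, 2, 2, 2, 4]

# Average number of children per family.
mean_children = Fraction(sum(children), len(children))

# Average of the reciprocals 1/children, computed exactly.
mean_reciprocal = sum(Fraction(1, c) for c in children) / len(children)

# Reciprocal for a family that has exactly the average number of children.
reciprocal_of_mean = 1 / mean_children

print("mean number of children:", mean_children)
print("mean of reciprocals:", mean_reciprocal)
print("reciprocal for the 'average' family:", reciprocal_of_mean)
```

With these figures the average is integral (two children), yet the two-child family's reciprocal differs from the average reciprocal, which is the point made in the text.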
Criticisms of too much mathematics. The criticism is sometimes made—often by statisticians themselves—that statistics is too mathematical. The objection takes various forms, for example:
(1) Statisticians choose research problems because of their mathematical interest or elegance and thus do not work on problems of real statistical concern. (Sometimes the last phrase simply refers to problems of concern to the critic.)
(2) The use of mathematical concepts and language obscures statistical thinking.
(3) Emphasis on mathematical aspects of statistics tends to make statisticians neglect problems of proper goals, meaningfulness of numerical statistics, and accuracy of data.
Critiques along these lines are given by, for example, W. S. Woytinsky (1954) and Corrado Gini (1951; 1959). A similar attack appears in Lancelot Hogben's Statistical Theory (1957). What can one say of this kind of criticism, whether it comes from within or without the profession? It has a venerable history that goes back to the early development of statistics. Perhaps the first quarrel of this kind was in the days when the word “statistics” was used, in a different sense than at present, to mean the systematic study of states, a kind of political science. The dispute was between those “statisticians” who provided discursive descriptions of states and those who cultivated the so-called Tabellenstatistik, which ranged from typographically convenient arrangements of verbal summaries to actual tables of vital statistics. Descriptions of this quarrel are given by Westergaard (1932, pp. 12-15), Lundberg (1940), and Lazarsfeld (1961, especially p. 293).
The ad hominem argument—that someone is primarily a mathematician, and hence incapable of understanding truly statistical problems—has been and continues to be an unfortunately popular rhetorical device. In part it is probably a defensive reaction to the great status and prestige of mathematics.
In my view, a great deal of this kind of discussion has been beside the point, although some charges on all sides have doubtless been correct. If a part of mathematics proves helpful in statistics, then it will be used. As statisticians run onto mathematical problems, they will work on them, borrowing what they can from the store of current mathematical knowledge, and perhaps encouraging or carrying out appropriate mathematical research. To be sure, some statisticians adopt an unnecessarily mathematical manner of exposition. This may seem an irritating affectation to less mathematical colleagues, but who can really tell apart an affectation and a natural mode of communication?
An illuminating discussion about the relationship between mathematics and statistics, as well as about many other matters, is given by Tukey (1961).
Criticisms of obfuscation. Next, there is the charge that statistics is a meretricious mechanism to obfuscate or confuse: “Lies, damned lies, and statistics” (the origin of this canard is not entirely clear: see White 1964). A variant is the criticism that statistical analyses are impossible to follow, filled with unreadable charts, formulas, and jargon.
These points are often well taken of specific statistical or pseudostatistical writings, but they do not relate to statistics as a discipline. A popular book, How to Lie With Statistics (Huff 1954), is in fact a presentation of horrid errors in statistical description and analysis, although it could, of course, be used as a source for pernicious sophistry. It is somewhat as if there were a book called “How to Counterfeit Money,” intended as a guide to bank tellers—or the general public—in protecting themselves against false money.
George A. Lundberg made a cogent defense against one form of this criticism, in the following words:
. . . when we have to reckon with stupidity, incompetence, and illogic, the more specific the terminology and methods employed the more glaring will be the errors in the result. As a result, the errors of quantitative workers lend themselves more easily to detection and derision. An equivalent blunder by a manipulator of rhetoric may not only appear less flagrant, but may actually go unobserved or become a venerated platitude. (1940, p. 138)
Criticisms of sampling per se. One sometimes sees the allegation that it is impossible to make reasonable inferences from a sample to a population, especially if the sample is a small fraction of the population. A variant of this was stated by Joseph Papp: “The methodology . . . was not scientific: they used sampling and you can't draw a complete picture from samplings” (quoted in Kadushin 1966, p. 30).
This criticism has no justification except insofar as it impugns poor sampling methods. Samples have always been used, because it is often impractical or impossible to observe a whole population (one cannot test a new drug on every human being, or destructively test all electric fuses) or because it is more informative to make careful measurements on a sample than crude measurements on a whole population. Proper sampling—for which the absolute size of the sample is far more important than the fraction of the population it represents—is informative, and in constant successful use.
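The claim that absolute sample size matters far more than the sampling fraction can be illustrated numerically. The sketch below uses the standard textbook formula for the standard error of a proportion under simple random sampling without replacement, including the finite population correction. The formula and the figures chosen are assumptions brought in for illustration and are not taken from the article.

```python
import math

def standard_error(p, n, population_size):
    """Standard error of a sample proportion under simple random sampling
    without replacement, including the finite population correction."""
    fpc = (population_size - n) / (population_size - 1)
    return math.sqrt(p * (1 - p) / n * fpc)

p = 0.5  # illustrative true proportion (the worst case for precision)

# 1,000 people sampled from 100 million: a sampling fraction of 0.001%.
se_small_fraction = standard_error(p, n=1_000, population_size=100_000_000)

# 1,000 people sampled from 100 thousand: a sampling fraction of 1%.
se_medium_fraction = standard_error(p, n=1_000, population_size=100_000)

# 100 people sampled from 1,000: a sampling fraction of 10%.
se_large_fraction = standard_error(p, n=100, population_size=1_000)

print(round(se_small_fraction, 4), round(se_medium_fraction, 4),
      round(se_large_fraction, 4))
```

The two samples of 1,000 have nearly the same precision despite a thousandfold difference in sampling fraction, while the 10 per cent sample of only 100 is markedly less precise.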
Criticisms of intellectual imperialism. The criticism is sometimes made that statistics is not the whole of scientific method and practice. Skinner said:
. . . it is a mistake to identify scientific practice with the formalized constructions [italics added] of statistics and scientific method. These disciplines have their place, but it does not coincide with the place of scientific research. They offer a method of science but not, as is so often implied, the method. As formal disciplines they arose very late in the history of science, and most of the facts of science have been discovered without their aid. (1956, p. 221)
I know of few statisticians so arrogant as to equate their field with scientific method generally. It is, of course, true that most scientific work has been done without the aid of statistics, narrowly construed as certain formal modes of analysis that are currently promulgated. On the other hand, a good deal of scientific writing is concerned, one way or another, with statistics, in the more general sense of asking how to make sensible inferences.
Skinner made another, somewhat related point: that, because of the prestige of statistics, statistical methods have (in psychology) acquired the honorific status of a shibboleth (1956, pp. 221, 231). Statisticians are sorrowfully aware of the shibboleth use of statistics in some areas of scientific research, but the profession can be blamed for this only because of some imperialistic textbooks—many of them not by proper statisticians.
Other areas of statistics
The remainder of this article is devoted to brief discussions of those statistical articles in the encyclopedia that have not been described earlier.
Grouped observations. The question of grouped observations is sometimes of concern: in much theoretical statistics measurements are assumed to be continuous, while in fact measurements are always discrete, so that there is inevitable grouping. In addition, one often wishes to group measurements further, for simplicity of description and analysis. To what extent are discreteness and grouping an advantage, and to what extent a danger? [See Statistical analysis, special problems of, article on Grouped observations.]
Truncation and censorship. Often observations may reasonably follow some standard model except that observations above (or below) certain values are proscribed (truncated or censored). A slightly more complex example occurs in comparing entrance test scores with post-training scores for students in a course; those students with low entrance test scores may not be admitted and hence will not have post-training scores at all. Methods exist for handling such problems. [See Statistical analysis, special problems of, article on Truncation and censorship.]
Outliers. Very often a few observations in a sample will have unusually large or small values and may be regarded as outliers (or mavericks or wild values). How should one handle them? If they are carried along in an analysis, they may distort it. If they are arbitrarily suppressed, important information may be lost. Even if they are to be suppressed, what rule should be used? [See Statistical analysis, special problems of, article on Outliers.]
Transformations of data. Transformations of data are often very useful. For example, one may take the logarithm of reaction time, the square root of a test score, and so on. The purposes of such a transformation are (1) to simplify the structure of the data, for example by achieving additivity of two kinds of effects, and (2) to make the data more nearly conform with a well-understood statistical model, for example by achieving near-normality or constancy of variance. A danger of transformations is that one's inferences may be shifted to some other scale than the one of basic interest. [See Statistical analysis, special problems of, article on Transformations of data.]
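A small numerical sketch may make the first purpose, additivity, concrete. It assumes a hypothetical two-way table in which each response is the product of a row effect and a column effect (all numbers are invented for illustration). On the raw scale the two effects interact; on the logarithmic scale they simply add.

```python
import math

# Hypothetical multiplicative model: response = row_effect * column_effect.
row_effects = [2.0, 5.0]
col_effects = [3.0, 7.0]
table = [[r * c for c in col_effects] for r in row_effects]

def interaction(t):
    """Interaction contrast of a 2x2 table; zero means the effects are additive."""
    return t[0][0] - t[0][1] - t[1][0] + t[1][1]

# On the raw scale the contrast is far from zero.
raw_interaction = interaction(table)

# After a log transformation the contrast vanishes (up to rounding error).
log_table = [[math.log(v) for v in row] for row in table]
log_interaction = interaction(log_table)

print("raw scale:", raw_interaction)
print("log scale:", log_interaction)
```

The caution in the text still applies: conclusions drawn from the log table concern ratios of responses, not differences on the original scale.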
Approximations to distributions. Approximations to distributions are important in probability and statistics. First, one may want to approximate some theoretical distribution in order to have a simple analytic form or to get numerical values. Second, one may want to approximate empirical distributions for both descriptive and inferential purposes. [See Distributions, statistical, article on Approximations to distributions.]
Identifiability—mixtures of distributions. The problem of identification appears whenever a precise model for some phenomenon is specified and parameters of the model are to be estimated from empirical observations [see Statistical identifiability]. What may happen—and may even fail to be recognized—is that the parameters are fundamentally incapable of estimation from the kind of data in question. Consider, for example, a learning theory model in which the proportion of learned material retained after a lapse of time is the ratio of two parameters of the model. Then, even if the proportion could be observed without any sampling fluctuation or measurement error, one would not separately know the two parameters. Of course, the identification problem arises primarily in contexts that are complex enough so that immediate recognition of nonidentifiability is not likely. Sometimes there arises an analogous problem, which might be called identifiability of the model. A classic example appears in the study of accident statistics: some kinds of these statistics are satisfactorily fitted by the negative binomial distribution, but that distribution itself may be obtained as the outcome of several quite different, more fundamental models. Some of these models illustrate the important concept of mixtures. A mixture is an important and useful way of forming a new distribution from two or more statistical distributions. [See Distributions, statistical, article on Mixtures of distributions.]
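The learning-theory example can be made concrete in a few lines. In the sketch below the parameter names `a` and `b` and their values are hypothetical; the only feature taken from the text is that the observable quantity is the ratio of two parameters. Quite different parameter pairs yield exactly the same observable, so no amount of error-free data could distinguish them.

```python
from fractions import Fraction

def retained_proportion(a, b):
    """Observable quantity in the hypothetical model: the ratio of two parameters."""
    return Fraction(a, b)

# Three different (a, b) pairs, invented for illustration.
candidates = [(2, 4), (3, 6), (50, 100)]

# Collect the distinct observable values these parameter pairs produce.
observed = {retained_proportion(a, b) for a, b in candidates}

print("distinct parameter pairs:", len(candidates))
print("distinct observable values:", len(observed))
```

Only the ratio is identifiable here; pinning down `a` and `b` separately would require additional information of a different kind.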
Applications. Next described is a set of articles on special topics linked with specific areas of application, although most of these areas have served as motivating sources for general theory.
Quality control. Statistical quality control had its genesis in manufacturing industry, but its applications have since broadened [see Quality control, statistical]. There are three articles under this heading. The first is on acceptance sampling, where the usual context is that of “lots” of manufactured articles. Here there are close relations to hypothesis testing and to sequential analysis. The second is on process control (and so-called control charts), a topic that is sometimes itself called quality control, in a narrower sense than the usage here. The development of control chart concepts and methods relates to basic notions of randomness and stability, for an important normative concept is that of a process in control, that is, a process turning out a sequence of numbers that behave like independent, identically distributed random variables. The third topic is reliability and life testing, which also relates to matters more general than immediate engineering contexts. [The term “reliability” here has quite a different meaning than it has in the area of psychological testing; see Psychometrics.]
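As a rough sketch of the control chart idea, the code below sets limits from a stretch of baseline measurements and flags later points that fall outside them. The three-standard-deviation limits are a common convention adopted here as an assumption, the measurements are invented, and practical charts often estimate variability differently (for example, from subgroup ranges).

```python
import statistics

# Hypothetical baseline measurements from a period when the process was judged stable.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.0]

center = statistics.mean(baseline)
sigma = statistics.stdev(baseline)  # sample standard deviation of the baseline

# Conventional three-sigma control limits (an assumption, not from the article).
lower, upper = center - 3 * sigma, center + 3 * sigma

# Later measurements are compared with the limits.
new_points = [10.1, 9.9, 10.6, 10.0]
out_of_control = [x for x in new_points if not (lower <= x <= upper)]

print("limits:", round(lower, 3), round(upper, 3))
print("points signalling trouble:", out_of_control)
```

A point outside the limits is evidence that the process is no longer behaving like the stable sequence observed during the baseline period, and it prompts a search for an assignable cause.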
Government statistics. Government statistics are of great importance for economic, social, and political decisions [see Government statistics]. The article on that subject treats such basic issues as the use of government statistics for political propaganda, the problem of confidentiality, and the meaning and accuracy of official statistics. [Some related articles are Census; Economic data; Mortality; Population; Vital statistics.]
Index numbers. Economic index numbers form an important part of government statistical programs [see Index numbers]. The three articles on this topic discuss, respectively, theory, practical aspects of index numbers, and sampling problems.
Statistics as legal evidence. The use of statistical methods, and their results, in judicial proceedings has been growing in recent years. Trademark disputes have been illuminated by sample surveys; questions of paternity have been investigated probabilistically; depreciation and other accounting quantities that arise in quasi-judicial hearings have been estimated statistically. There are conflicts or apparent conflicts between statistical methods and legal concepts like those relating to hearsay evidence. [See Statistics as legal evidence.]
Statistical geography. Statistical geography, the use of statistical and other quantitative methods in geography, is a rapidly growing area [see Geography, article on Statistical geography]. Somewhat related is the topic of rank-size, in which are studied—empirically and theoretically—patterns of relationship between, for example, the populations of cities and their rankings from most populous down. Another example is the relationship between the frequencies of words and their rankings from most frequent down. [See Rank-size relations.]
Quantal response. Quantal response refers to a body of theory and method that might have been classed with counted data or under linear hypotheses with regression [see Quantal response]. An example of a quantal response problem would be one in which students are given one week, two weeks, and so on, of training (say 100 different students for each training period), and then proportions of students passing a test are observed. Of interest might be that length of training leading to exactly 50 per cent passing. Many traditional psychophysical problems may be regarded from this viewpoint [see Psychophysics].
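The training example can be sketched numerically. The code below takes invented pass proportions for four training periods and estimates the length of training at which half the students would pass by interpolating linearly on the logit scale between the two periods that bracket 50 per cent. This interpolation is a simple stand-in chosen for illustration; a full quantal response analysis would ordinarily fit a probit or logit curve to all the data.

```python
import math

def logit(p):
    """Log odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))

# Hypothetical results: weeks of training and proportion of 100 students passing.
weeks = [1, 2, 3, 4]
passing = [0.20, 0.40, 0.65, 0.85]

def median_effective_training(weeks, passing):
    """Interpolate linearly on the logit scale to find where 50% pass."""
    points = list(zip(weeks, passing))
    for (w0, p0), (w1, p1) in zip(points, points[1:]):
        if p0 <= 0.5 <= p1:
            z0, z1 = logit(p0), logit(p1)
            # logit(0.5) is 0, so solve for the week at which the line crosses 0.
            return w0 + (0 - z0) / (z1 - z0) * (w1 - w0)
    raise ValueError("50% is not bracketed by the observed proportions")

estimate = median_effective_training(weeks, passing)
print("estimated weeks of training for 50% passing:", round(estimate, 2))
```

With these figures the estimate falls between two and three weeks. Observed proportions of exactly 0 or 1 would need separate handling, since the logit is undefined there.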
Queues. The study of queues has been of importance in recent years; it is sometimes considered part of operations research, but it may also be considered a branch of the study of stochastic processes [see Queues; Operations research]. An example of queuing analysis is that of traffic flow at a street-crossing with a traffic light. The study has empirical, theoretical, and normative aspects.
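The traffic-light example lends itself to a small simulation. The sketch below is a deliberately crude model in which every rule and parameter is an assumption made for illustration: time advances in one-second steps, a car arrives in each second with a fixed probability, and one waiting car departs per second while the light is green.

```python
import random

def simulate_crossing(seconds, arrival_prob, green, red, seed=0):
    """Simulate one approach to a traffic light, second by second.

    Each second a car arrives with probability arrival_prob; while the
    light is green, one waiting car leaves per second.
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    queue = arrived = served = longest = 0
    for t in range(seconds):
        if rng.random() < arrival_prob:
            queue += 1
            arrived += 1
        is_green = (t % (green + red)) < green
        if is_green and queue > 0:
            queue -= 1
            served += 1
        longest = max(longest, queue)
    return {"arrived": arrived, "served": served,
            "waiting": queue, "longest": longest}

# One simulated hour with a 30-second green phase and a 30-second red phase.
result = simulate_crossing(seconds=3600, arrival_prob=0.2, green=30, red=30)
print(result)
```

Varying the green and red durations in such a model is one way to explore the normative question of how a light should be timed, alongside theoretical analysis and empirical traffic counts.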
Computation. Always intertwined with applied statistics, although distinct from it, has been computation [see Computation]. The recent advent of high-speed computers has produced a sequence of qualitative changes in the kind of computation that is practicable. This has had, and will continue to have, profound effects on statistics, not only as regards data handling and analysis, but also in theory, since many analytically intractable problems can now be attacked numerically by simulation on a high-speed computer [see Simulation].
Cybernetics. The currently fashionable term “cybernetics” is applied to a somewhat amorphous body of knowledge and research dealing with information processing and mechanisms, both living and nonliving [see Cybernetics and Homeostasis]. The notions of control and feedback are central, and the influence of the modern high-speed computer has been strong. Sometimes this area is taken to include communication theory and information theory [see Information theory, which stresses applications to psychology].
William H. Kruskal
BIBLIOGRAPHY
General articles
Boehm, George A. W. 1964 The Science of Being Almost Certain. Fortune 69, no. 2:104-107, 142, 144, 146, 148.
Kac, Mark 1964 Probability. Scientific American 211, no. 3:92-108.
Kendall, M. G. 1950 The Statistical Approach. Economica New Series 17:127-145.
Kruskal, William H. (1965) 1967 Statistics, Molière, and Henry Adams. American Scientist 55:416-428. → Previously published in Volume 9 of Centennial Review.
Weaver, Warren 1952 Statistics. Scientific American 186, no. 1:60-63.
Introductions to probability and statistics
Borel, Émile F. E. J. (1943) 1962 Probabilities and Life. Translated by M. Baudin. New York: Dover. → First published in French.
Gnedenko, Boris V.; and Khinchin, Aleksandr Ia. (1945) 1962 An Elementary Introduction to the Theory of Probability. Authorized edition. Translated from the 5th Russian edition, by Leo F. Boron, with the editorial collaboration of Sidney F. Mack. New York: Dover. → First published as Elementarnoe vvedenie v teoriiu veroiatnostei.
Moroney, M. J. (1951) 1958 Facts From Figures. 3d ed., rev. Harmondsworth (England): Penguin.
Mosteller, Frederick; Rourke, Robert E. K.; and Thomas, George B. Jr. 1961 Probability With Statistical Applications. Reading, Mass.: Addison-Wesley.
Tippett, L. H. C. (1943) 1956 Statistics. 2d ed. New York: Oxford Univ. Press.
Wallis, W. Allen; and Roberts, Harry V. 1962 The Nature of Statistics. New York: Collier. → Based on material presented in the authors' Statistics: A New Approach (1956).
Weaver, Warren 1963 Lady Luck: The Theory of Probability. Garden City, N.Y.: Doubleday.
Youden, W. J. 1962 Experimentation and Measurement. New York: Scholastic Book Services.
Abstracting journals
Mathematical Reviews. → Published since 1940.
Psychological Abstracts. → Published since 1927. Covers parts of the statistical literature.
Quality Control and Applied Statistics Abstracts. → Published since 1956.
Referativnyi zhurnal: Matematika. → Published since 1953.
Statistical Theory and Method Abstracts. → Published since 1959.
Zentralblatt für Mathematik und ihre Grenzgebiete. → Published since 1931.
Works cited in the text
Adams, Ernest W.; Fagot, Robert F.; and Robinson, Richard E. 1965 A Theory of Appropriate Statistics. Psychometrika 30:99-127.
Aurthur, Robert A. 1953 The Glorification of Al Toolum. New York: Rinehart.
Born, Max (1949) 1951 Natural Philosophy of Cause and Chance. Oxford: Clarendon.
Bortkiewicz, Ladislaus von 1909 Die statistischen Generalisationen. Scientia 5:102-121. → A French translation appears in a supplement to Volume 5, pages 58-75.
Braithwaite, R. B. 1953 Scientific Explanation: A Study of the Function of Theory, Probability and Law in Science. Cambridge Univ. Press. → A paperback edition was published in 1960 by Harper.
Bridgman, Percy W. 1959 The Way Things Are. Cambridge, Mass.: Harvard Univ. Press.
Brown, G. Spencer, see under Spencer Brown, G.
Churchman, Charles W.; and Ratoosh, Philburn (editors) 1959 Measurement: Definitions and Theories. New York: Wiley.
Cochran, William G.; Mosteller, Frederick; and Tukey, John W. 1954 Statistical Problems of the Kinsey Report on Sexual Behavior in the Human Male. Washington: American Statistical Association.
Coombs, Clyde H. 1964 A Theory of Data. New York: Wiley.
Cox, D. R. 1961 The Role of Statistical Methods in Science and Technology. London: Birkbeck College.
Deming, W. Edwards 1965 Principles of Professional Statistical Practice. Annals of Mathematical Statistics 36:1883-1900.
Galton, Francis 1889 Natural Inheritance. London and New York: Macmillan.
Gini, Corrado 1951 Caractère des plus récents développements de la méthodologie statistique. Statistica 11:3-11.
Gini, Corrado 1959 Mathematics in Statistics. Metron 19, no. 3/4:1-9.
Hogben, Lancelot T. 1957 Statistical Theory; the Relationship of Probability, Credibility and Error: An Examination of the Contemporary Crisis in Statistical Theory From a Behaviourist Viewpoint. London: Allen & Unwin.
Huff, Darrell 1954 How to Lie With Statistics. New York: Norton.
Kadushin, Charles 1966 Shakespeare & Sociology. Columbia University Forum 9, no. 2:25-31.
Kish, Leslie 1959 Some Statistical Problems in Research Design. American Sociological Review 24:328-338.
Krutch, Joseph Wood 1963 Through Happiness With Slide Rule and Calipers. Saturday Review 46, no. 44:12-15.
Lazarsfeld, Paul F. 1961 Notes on the History of Quantification in Sociology: Trends, Sources and Problems. Isis 52, part 2:277-333. → Also included in Harry Woolf (editor), Quantification, published by Bobbs-Merrill in 1961.
Lundberg, George A. 1940 Statistics in Modern Social Thought. Pages 110-140 in Harry E. Barnes, Howard Becker, and Frances B. Becker (editors), Contemporary Social Theory. New York: Appleton.
Pearson, Karl (1892) 1957 The Grammar of Science. 3d ed., rev. & enl. New York: Meridian. → The first and second editions (1892 and 1900) contain material not in the third edition.
Pfanzagl, J. 1959 Die axiomatischen Grundlagen einer allgemeinen Theorie des Messens. A publication of the Statistical Institute of the University of Vienna, New Series, No. 1. Würzburg (Germany): Physica-Verlag. → Scheduled for publication in English under the title The Theory of Measurement in 1968 by Wiley.
Popper, Karl R. (1935) 1959 The Logic of Scientific Discovery. Rev. ed. New York: Basic Books; London: Hutchinson. → First published as Logik der Forschung. A paperback edition was published in 1961 by Harper.
Selvin, Hanan C. 1957 A Critique of Tests of Significance in Survey Research. American Sociological Review 22:519-527. → See Volume 23, pages 85-86 and 199-200, for responses by David Gold and James M. Beshers.
Selvin, Hanan C.; and Stuart, Alan 1966 Data-dredging Procedures in Survey Analysis. American Statistician 20, no. 3:20-23.
Skinner, B. F. 1956 A Case History in Scientific Method. American Psychologist 11:221-233.
Spencer Brown, G. 1957 Probability and Scientific Inference. London: Longmans. → The author's surname is Spencer Brown, but common library practice is to alphabetize his works under Brown.
Stevens, S. S. 1946 On the Theory of Scales of Measurement. Science 103:677-680.
Suppes, Patrick; and Zinnes, Joseph L. 1963 Basic Measurement Theory. Volume 1, pages 1-76 in R. Duncan Luce, Robert R. Bush, and Eugene Galanter (editors), Handbook of Mathematical Psychology. New York: Wiley.
Torgerson, Warren S. 1958 Theory and Methods of Scaling. New York: Wiley.
Tukey, John W. 1961 Statistical and Quantitative Methodology. Pages 84-136 in Donald P. Ray (editor), Trends in Social Science. New York: Philosophical Library.
Tukey, John W. 1962 The Future of Data Analysis. Annals of Mathematical Statistics 33:1-67, 812.
Walberg, Herbert J. 1966 When Are Statistics Appropriate? Science 154:330-332. → Follow-up letters by Julian C. Stanley, “Studies of Nonrandom Groups,” and by Herbert J. Walberg, “Statistical Randomization in the Behavioral Sciences,” were published in Volume 155, on page 953, and Volume 156, on page 314, respectively.
Walpole, Horace (1778) 1904 [Letter] To the Hon. Henry Seymour Conway. Vol. 10, pages 337-338 in Horace Walpole, The Letters of Horace Walpole, Fourth Earl of Orford. Edited by Paget Toynbee. Oxford: Clarendon Press.
Westergaard, Harald L. 1932 Contributions to the History of Statistics. London: King.
White, Colin 1964 Unkind Cuts at Statisticians. American Statistician 18, no. 5:15-17.
Woytinsky, W. S. 1954 Limits of Mathematics in Statistics. American Statistician 8, no. 1:6-10, 18.
II. THE HISTORY OF STATISTICAL METHOD
The broad river of thought that today is known as theoretical statistics cannot be traced back to a single source springing identifiably from the rock. Rather is it the confluence, over two centuries, of a number of tributary streams from many different regions. Probability theory originated at the gaming table; the collection of statistical facts began with state requirements of soldiers and money; marine insurance began with the wrecks and piracy of the ancient Mediterranean; modern studies of mortality have their roots in the plague pits of the seventeenth century; the theory of errors was created in astronomy, the theory of correlation in biology, the theory of experimental design in agriculture, the theory of time series in economics and meteorology, the theories of component analysis and ranking in psychology, and the theory of chi-square methods in sociology. In retrospect it almost seems as if every phase of human life and every science has contributed something of importance to the subject. Its history is accordingly the more interesting, but the more difficult, to write.
Early history
Up to about 1850 the word “statistics” was used in quite a different sense from the present one. It meant information about political states, the kind of material that is nowadays to be found assembled in the Statesman's Year-book. Such information was usually, although not necessarily, numerical, and, as it increased in quantity and scope, developed into tabular form. By a natural transfer of meaning, “statistics” came to mean any numerical material that arose in observation of the external world. At the end of the nineteenth century this usage was accepted. Before that time, there were, of course, many problems in statistical methodology considered under other names; but the recognition of their common elements as part of a science of statistics was of relatively late occurrence. The modern theory of statistics (an expression much to be preferred to “mathematical statistics”) is the theory of numerical information of almost every kind.
The characteristic feature of such numerical material is that it derives from a set of objects, technically known as a “population,” and that any particular variable under measurement has a distribution of frequencies over the members of the set. The height of man, for example, is not identical for every individual but varies from man to man. Nevertheless, we find that the frequency distribution of heights of men in a given population has a definite pattern that can be expressed by a relatively simple mathematical formula. Often the “population” may be conceptual but nonexistent, as for instance when we consider the possible tosses of a penny or the possible measurements that may be made of the transit times of a star. This concept of a distribution of measurements, rather than a single measurement, is fundamental to the whole subject. In consequence, points of statistical interest concern the properties of aggregates, rather than of individuals; and the elementary parts of theoretical statistics are much concerned with summarizing these properties in such measures as averages, index numbers, dispersion measures, and so forth.
The simpler facts concerning aggregates of measurements must, of course, have been known almost from the moment when measurements began to be made. The idea of regularity in the patterning of discrete repeatable chance events, such as dice throwing, emerged relatively early and is found explicitly in Galileo's work. The notion that measurements on natural phenomena should exhibit similar regularities, which are mathematically expressible, seems to have originated in astronomy, in connection with measurements on star transits. After some early false starts it became known that observations of a magnitude were subject to error even when the observer was trained and unbiased. Various hypotheses about the pattern of such errors were propounded. Simpson (1757) was the first to consider a continuous distribution, that is to say, a distribution of a variable that could take any values in a continuous range. By the end of the eighteenth century Laplace and Gauss had considered several such mathematically specified distributions and, in particular, had discovered the most famous of them all, the so-called normal distribution [see Distributions, statistical, article on Special continuous distributions].
In these studies there was assumed to be a “true” value underlying the distribution. Departures from this true value were “errors.” They were, so to speak, extraneous to the object of the study, which was to estimate this true value. Early in the nineteenth century a major step forward was taken with the recognition (especially by Quetelet) that living material also exhibited frequency distributions of definite pattern. Furthermore, Galton and Karl Pearson, from about 1880, showed that these distributions were often skew or asymmetrical, in the sense that the shape of the frequency curve for values above the mean was not the mirror image of the curve for values below the mean. In particular it became impossible to maintain that the deviations from the mean were “errors” or that there existed a “true” value; the frequency distribution itself was to be recognized as a fundamental property of the aggregate. Immediately, similar patterns of regularity were brought to light in nearly every branch of science—genetics, biology, meteorology, economics, sociology—and even in some of the arts: distributions of weak verse endings were used to date Shakespeare's plays, and the distribution of words has been used to discuss cases of disputed authorship.
Nowadays the concept of frequency distribution is closely bound up with the notion of probability distribution. Some writers of the twentieth century treat the two things as practically synonymous. Historically, however, the two were not always identified and to some extent pursued independent courses for centuries before coming together. We must go back several millenniums if we wish to trace the concept of probability to its source.
From very ancient times man gambled with primitive instruments, such as astragali and dice, and also used chance mechanisms for divinatory purposes. Rather surprisingly, it does not seem that the Greeks, Romans, or the nations of medieval Europe arrived at any clear notion of the laws of chance. Elementary combinatorics appears to have been known to the Arabs and to Renaissance mathematicians, but as a branch of algebra rather than in a probabilistic context. Nevertheless, chance itself was familiar enough, especially in gambling, which was widespread in spite of constant discouragement from church and state. Some primitive ideas of relative frequency of occurrence can hardly have failed to emerge, but a doctrine of chances was extraordinarily late in coming. The first record we have of anything remotely resembling the modern idea of calculating chances occurs in a fifteenth-century poem called De vetula. The famous mathematician and physicist Geronimo Cardano was the first to leave a manuscript in which the concept of laws of chance was explicitly set out (Ore 1953). Galileo left a fragment that shows that he clearly understood the method of calculating chances at dice. Not until the work of Huygens (1657), the correspondence between Pascal and Fermat, and the work of Jacques Bernoulli (1713) do we find the beginnings of a calculus of probability.
This remarkable delay in the mathematical formulation of regularity in events that had been observed by gamblers over thousands of years is probably to be explained by the philosophical and religious ideas of the times, at least in the Western world. To the ancients, events were mysterious; they could be influenced by superhuman beings but no being was in control of the universe. On the other hand, to the Christians everything occurred under the will of God, and in a sense there was no chance; it was almost impious to suppose that events happened under the blind laws of probability. Whatever the explanation may be, it was not until Europe had freed itself from the dogma of the medieval theologian that a calculus of probability became possible.
Once the theory of probability had been founded, it developed with great rapidity. Only a hundred years separates the two greatest works in this branch of the subject, Bernoulli's Ars conjectandi (1713) and Laplace's Théorie analytique des probabilités (1812). Bernoulli exemplified his work mainly in terms of games of chance, and subsequent mathematical work followed the same line. Montmort's work was concerned entirely with gaming, and de Moivre stated most of his results in similar terms, although actuarial applications were always present in his mind (see Todhunter [1865] 1949, pp. 78-134 for Montmort and pp. 135-193 for de Moivre). With Laplace, Condorcet, and rather later, Poisson, we begin to find probabilistic ideas applied to practical problems; for example, Laplace discussed the plausibility of the nebular hypothesis of the solar system in terms of the probability of the planetary orbits lying as nearly in a plane as they do. Condorcet (1785) was concerned with the probability of reaching decisions under various systems of voting, and Poisson (1837) was specifically concerned with the probability of reaching correct conclusions from imperfect evidence. A famous essay of Thomas Bayes (1764) broke new ground by its consideration of probability in inductive reasoning, that is to say, the use of the probabilities of observed events to compare the plausibility of hypotheses that could explain them [see Bayesian inference].
The linkage between classical probability theory and statistics (in the sense of the science of regularity in aggregates of natural phenomena) did not take place at any identifiable point of time. It occurred somewhere along a road with clearly traceable lines of progress but no monumental milestones. The critical point, however, must have been the realization that probabilities were not always to be calculated a priori, as in games of chance, but were measurable constants of the external world. In classical probability theory the probabilities of primitive events were always specified on prior grounds: dice were “fair” in the sense that each side had an equal chance of falling uppermost, cards were shuffled and dealt “at random,” and so on. A good deal of probability theory was concerned with the pure mathematics of deriving the probabilities of complicated contingent events from these more primitive events whose probabilities were known. However, when sampling from an observed frequency distribution, the basic probabilities are not known but are parameters to be estimated. It took some time, perhaps fifty years, for the implications of this notion to be fully realized. Once it was, statistics embraced probability and the subject was poised for the immense development that has occurred over the past century.
Once more, however, we must go back to another contributory subject: insurance, and particularly life insurance. Although some mathematicians, notably Edmund Halley, Abraham de Moivre, and Daniel Bernoulli, made important contributions to demography and insurance studies, for the most part actuarial science pursued a course of its own. The founders of the subject were John Graunt and William Petty. Graunt, spurred on by the information contained in the bills of mortality prepared in connection with the great plague (which hit England in 1665), was the first to reason about demographic material in a modern statistical way. Considering the limitations of his data, his work was a beautiful piece of reasoning. Before long, life tables were under construction and formed the basis of the somewhat intricate calculations of the modern actuary [see Life tables]. In the middle of the eighteenth century, some nations of the Western world began to take systematic censuses of population and to record causes of mortality, an example that was soon followed by all [see Census; Vital statistics]. Life insurance became an exact science. It not only contributed an observable frequency distribution with a clearly defined associated calculus; it also contributed an idea that was to grow into a dynamic theory of probability: the concept of a population moving through time in an evolutionary way. Here and there, too, we find demographic material stimulating statistical studies, for example, in the study of the mysteries of the sex ratio of human births.
Modern history
1890-1940. If we have to choose a date at which the modern theory of statistics began, we may put it, somewhat arbitrarily, at 1890. Francis Galton was then 68 but still had twenty years of productive life before him. A professor of economics named Francis Ysidro Edgeworth (then age 45) was calling attention to statistical regularities in election results, Greek verse, and the mating of bees and was about to propound a remarkable generalization of the law of error. A young man named Karl Pearson (age 35) had just been joined by the biologist Walter Weldon at University College, London, and was meditating the lectures that ultimately became The Grammar of Science. A student named George Udny Yule, at the age of 20, had caught Pearson's eye. And in that year was born the greatest of them all, Ronald Aylmer Fisher. For the next forty years, notwithstanding Russian work in probability theory (notably the work of Andrei Markov and Aleksandr Chuprov), developments in theoretical statistics were predominantly English. At that point, there was something akin to an intellectual explosion in the United States and India. France was already pursuing an individual line in probability theory under the inspiration of Émile Borel and Paul Lévy, and Italy, under the influence of Corrado Gini, was also developing independently. But at the close of World War II the subject transcended all national boundaries and had become one of the accepted disciplines of the scientific, technological, and industrial worlds.
The world of 1890, with its futile power politics, its class struggles, its imperialism, and its primitive educational system, is far away. But it is still possible to recapture the intellectual excitement with which science began to extend its domain into humanitarian subjects. Life was as mysterious as ever, but it was found to obey laws. Human society was seen as subject to statistical inquiry, as an evolutionary entity under human control. It was no accident that Galton founded the science of eugenics and Karl Pearson took a militant part in some of the social conflicts of his time. Statistical science to them was a new instrument for the exploration of the living world, and the behavioral sciences at last showed signs of structure that would admit of mathematical analysis.
In London, Pearson and Weldon soon began to exhibit frequency distributions in all kinds of fields. Carl Charlier in Sweden, Jacobus Kapteyn and Johan van Uven in Holland, and Vilfredo Pareto in Italy, to mention only a few, contributed results from many different sciences. Pearson developed his system of mathematical curves to fit these observations, and Edgeworth and Charlier began to consider systems based on the sum of terms in a series analogous to a Taylor expansion. It was found that the normal curve did not fit most observed distributions but that it was a fair approximation to many of them.
Relationships between variables. About 1890, Pearson, stimulated by some work of Galton, began to investigate bivariate distributions, that is to say, the distribution in a two-way table of frequencies of members, each of which bore a value of two variables. The patterns, especially in the biological field where data were most plentiful, were equally typical. In much observed material there were relationships between variables, but they were not of a mathematically functional form. The length and breadth of oak leaves, for example, were dependent in the sense that a high value of one tended to occur with a high value of the other. But there was no formula expressing this relationship in the familiar deterministic language of physics. There had to be developed a new kind of relationship to describe this type of connection. In the theory of attributes this led to measures of association and contingency [see Statistics, descriptive]; in the theory of variables it led to correlation and regression [see Linear hypotheses; Multivariate analysis, articles on Correlation].
The theory of statistical relationship, and especially of regression, has been studied continuously and intensively ever since. Most writers on statistics have made contributions at one time or another. The work was still going strong in the middle of the twentieth century. Earlier writers, such as Pearson and Yule, were largely concerned with linear regression, in which the value of one variable is expressed as a linear function of the others plus a random term. Later authors extended the theory to cover several dependent variables and curvilinear cases; and Fisher in particular was instrumental in emphasizing the importance of rendering the explanatory variables independent, so far as possible.
Sampling. It was not long before statisticians were brought up against a problem that is still, in one form or another, basic to most of their work. In the majority of cases the data with which they were presented were only samples from a larger population. The problems then arose as to how reliable the samples were, how to estimate from them values of parameters describing the parent population, and, in general, what kinds of inference could be based on them.
Some intuitive ideas on the subject occur as far back as the eighteenth century; but the sampling problem, and the possibility of treating it with mathematical precision, was not fully appreciated until the twentieth century.
Classical error theory, especially the work of Carl Friedrich Gauss in the first half of the nineteenth century, had considered sampling distributions of a simple kind. For example, the chi-square distribution arose in 1875 when the German geodesist Friedrich Helmert worked out the distribution of sample variance for sampling from a normal population. The same chi-square distribution was independently rediscovered in 1900 by Karl Pearson in a quite different context, that of testing distributional goodness of fit [see Counted data; Goodness of fit]. In another direction, Pearson developed a wide range of asymptotic formulas for standard errors of sample quantities. The mathematics of many so-called small-sample distribution problems presented difficulties with which Pearson was unable to cope, despite valiant attempts. William Gosset, a student of Pearson's, produced in 1908 one of the most important statistical distributions under the pseudonym of “Student”; and this distribution, arising from a basic small-sample problem, is known as that of Student's t [see Distributions, statistical].
It was Student and R. A. Fisher (beginning in 1913) who inaugurated a new era in the study of sampling distributions. Fisher himself made major contributions to the subject over the ensuing thirty years. In rapid succession he found the distribution, in samples from a normal population, of the correlation coefficient, regression coefficients, multiple correlation coefficients, and the ratio of variances known as F. Other writers, notably John Wishart in England, Harold Hotelling and Samuel Wilks in the United States, and S. N. Roy and R. C. Bose in India, added a large number of new results, especially in the field of multivariate analysis. More recently, T. W. Anderson has advanced somewhat farther the frontiers of knowledge in this rather difficult mathematical field.
Concurrently with these spectacular mathematical successes in the derivation of sampling distributions, methods were also devised for obtaining approximations. Again R. A. Fisher was in the lead with a paper (1928) introducing the so-called k-statistics, functions of sample values that have simplifying mathematical properties.
The question whether a sampling method is random is a subtle one. It does not always trouble an experimental scientist, when he can select his material by a chance mechanism. However, sometimes the data are provided by nature, and whether they are a random selection from the available population is difficult to determine. In the sampling of human beings difficulties are accentuated by the fact that people may react to the sampling process. As sampling methods spread to the social sciences, the problems of obtaining valid samples at low cost from a wide geographical scatter of human beings became increasingly important, and some new problems of respondent bias arose. In consequence, the sampling of humans for social inquiry has almost developed into a separate subject, dependent partly on psychological matters, such as how questions should be framed to avoid bias, and partly on expense. By 1960 sampling errors in social surveys were well under control; but many problems remained for exploration, notably those of drawing samples of individuals with relatively rare and specialized characteristics, such as retail pharmacists or sufferers from lung cancer. Designing a sample was accepted as just as much a matter of expertise as designing a house [see Interviewing; Sample surveys; Survey analysis].
The control of the sample and the derivation of sampling distributions were, of course, only means to an end, which was the drawing of accurate inferences from the sample that ultimately resulted. We shall say more about the general question of inference below, but it is convenient to notice here the emergence, between 1925 and 1935, of two branches of the subject: the theory of estimation, under the inspiration of Fisher, and the theory of hypothesis testing, under the inspiration of Karl Pearson's son Egon and Jerzy Neyman [see Estimation; Hypothesis testing].
Estimation. Up to 1914 (which, owing to World War I, actually means up to 1920), the then current ideas on estimation from a sample were intuitive and far from clear. For the most part, an estimate was constructed from a sample as though it were being constructed for a population (for example, the sample mean was an “obvious” estimate of the parent population mean). A few writers (Daniel Bernoulli, Laplace, Gauss, Markov, and Edgeworth) had considered the problem, asked the right questions, and sometimes found partial answers. Ideas on the subject were clarified and extended in a notable paper by Fisher (1925). He introduced the concepts of optimal estimators and of efficiency in estimation, and emphasized the importance of the so-called method of maximum likelihood as providing a very general technique for obtaining “best” estimators. These ideas were propounded to a world that was just about ripe for them, and the theory of estimation developed at a remarkable rate in the ensuing decades.
The related problem of gauging the reliability of an estimate, that is, of surrounding it with a band of error (which has associated with it a designated probability) led to two very different lines of development, the confidence intervals of Egon Pearson and Neyman and the “fiducial intervals” of Fisher, both originating between 1925 and 1930 [see Estimation, article on Confidence intervals and regions; Fiducial inference]. The two proceeded fairly amiably side by side for a few years, and at the time it seemed that they were equivalent; they certainly led to the same results in simpler cases. However, it became clear about 1935 that they were conceptually very different, and a great deal of argument developed which had not been resolved even at the time of Fisher's death in 1962. Fortunately the controversy, although embittered, did not impede progress. (Omitted at this point is any discussion of Bayesian methods, which may lead to intervals resembling superficially those of confidence and fiducial approaches; Bayesian methods are mentioned briefly below.) [See Bayesian inference for a detailed discussion.]
Hypothesis testing. In a like manner, the work of Neyman and Pearson (beginning in 1928) on the theory of statistical tests gave a very necessary clarity to procedures that had hitherto been vague and unsatisfactory. In probabilistic terms the older type of inference had been of this type: If a certain hypothesis were true, the probability that I should observe the actual sample that I have drawn, or one more extreme, is very small; therefore the hypothesis is probably untrue. Neyman and Pearson pointed out that a hypothesis could not be tested in vacuo but only in comparison with other hypotheses. They set up a theory of tests and, as in the case of estimation (with which this subject is intimately linked), discussed power, relative efficiency, and optimality of tests. Here also there was some controversy, but for the most part the Neyman-Pearson theory was generally accepted and had become standard practice by 1950.
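The older style of reasoning described above can be illustrated with a small calculation. The sketch below is only an illustration, not a procedure taken from the works cited: the coin-tossing data (9 heads in 10 tosses) and the function name binomial_tail are assumptions introduced here. It computes the probability, under the hypothesis of a fair coin, of a result at least as extreme as the one observed.

```python
from math import comb

def binomial_tail(n: int, k: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p).

    This is the chance, if the hypothesis "success probability is p" were
    true, of seeing k or more successes in n trials.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical data (an assumption for illustration): 9 heads in 10 tosses.
tail = binomial_tail(10, 9)
print(f"Probability of 9 or more heads from a fair coin: {tail:.4f}")
```

A small tail probability is what the older argument took as grounds for doubting the hypothesis. The calculation names no alternative hypothesis, which is the gap that Neyman and Pearson pointed out.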
Experimental design and analysis. Concurrently with developments in sampling theory, estimation, and hypothesis testing, there was growing rapidly, between 1920 and 1940, a theory of experimental design based again on the work of Fisher. Very early in his career, it had become clear to him that in multivariate situations the “explanation” of one variable in terms of a set of dependent or explanatory variables was rendered difficult, if not impossible, where correlations existed among the explanatory variables themselves; for it then became impossible to say how much of an effect was attributable to a particular cause. This difficulty, which still bedevils the general theory of regression, could be overcome if the explanatory variables could be rendered statistically independent. (This, incidentally, was the genesis of the use of orthogonal polynomials in curvilinear regression analysis.) Fisher recognized that in experimental situations where the design of the experiment was, within limits, at choice, it could be arranged that the effects of different factors were “orthogonal,” that is, independent, so that they could be disentangled. From this notion, coupled with probabilistic interpretations of significance and the necessary mathematical tests, he built up a most remarkable system of experimental design. The new methods were tested at the Rothamsted Experimental Station in England but were rapidly spread by an active and able group of disciples into all scientific fields.
Some earlier work, particularly by Wilhelm Lexis in Germany at the close of the nineteenth century, had called attention to the fact that in sampling from nonhomogeneous populations the formulas of classical probability were a poor representation of the observed effects. This led to attempts to split the sampling variation into components; one, for example, representing the inevitable fluctuation of sampling, another representing the differences between the sections or subpopulations from which members were drawn. In Fisher's hands these ideas were extended and given precision in what is known as the analysis of variance, one of the most powerful tools of modern statistics. The methods were later extended to cover the simultaneous variation of several variables in the analysis of covariance. [See Linear hypotheses, article on Analysis of variance.]
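The splitting of variation into components can be made concrete with a short sketch of a one-way layout. The data and the function name variance_components are hypothetical, chosen only for illustration. The total sum of squares about the grand mean is the sum of a between-group part and a within-group part, and the ratio of the corresponding mean squares is the variance ratio known as F.

```python
def variance_components(groups):
    """Split the total sum of squares into between-group and within-group parts.

    `groups` is a list of lists of measurements, one inner list per
    subpopulation. Returns (between, within, total).
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means about the grand mean, weighted by group size.
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values about their own group mean.
    within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    total = sum((v - grand_mean) ** 2 for v in all_values)
    return between, within, total

# Hypothetical measurements from three subpopulations.
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
between, within, total = variance_components(data)

k = len(data)                    # number of groups
n = sum(len(g) for g in data)    # total number of observations
f_ratio = (between / (k - 1)) / (within / (n - k))
print(between, within, total, f_ratio)
```

In this made-up example almost all of the variation lies between the groups, so the variance ratio is large.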
It may be remarked, incidentally, that the problems brought up by these various developments in theoretical statistics have proved an immense challenge to mathematicians. Many branches of abstract mathematics (invariants, symmetric functions, groups, finite geometries, n-dimensional geometry, as well as the whole field of analysis) have been brought effectively into play in solving practical problems. After World War II the advent of the electronic computer was a vital adjunct to the solution of problems where even the resources of modern mathematics failed. Sampling experiments became possible on a scale never dreamed of before.
Recent developments. So much occurred in the statistical domain between 1920 and 1940 that it is not easy to give a clear account of the various currents of development. We may, however, pause at 1940 to look backward. In Europe, and to a smaller extent in the United States, World War II provided an interregnum, during which much was absorbed and a good deal of practical work was done, but, of necessity, theoretical developments had to wait, at least as far as publication was concerned. The theory of statistical distributions and of statistical relationship had been firmly established by 1940. In sampling theory many mathematical problems had been solved, and methods of approach to outstanding problems had been devised. The groundwork of experimental design had been firmly laid. The basic problems of inference had been explicitly set out and solutions reached over a fairly wide area. What is equally important for the development of the subject, there was about to occur a phenomenal increase in the number of statisticians in academic life, in government work, and in business. By 1945 the subject was ready for decades of vigorous and productive exploration.
Much of this work followed in the direct line of earlier work. The pioneers had left sizable areas undeveloped; and in consequence, work on distribution theory, sampling, and regression analysis continued in fair volume without any fundamental change in concept. Among the newer fields of attention we may notice in particular sequential analysis, decision function theory, multivariate analysis, time series and stochastic processes, statistical inference, and distribution-free, or nonparametric, methods [see Decision theory; Markov chains; Multivariate analysis; Nonparametric statistics; Queues; Sequential analysis; Time series].
Sequential analysis. During World War II it was realized by George Barnard in England and Abraham Wald in the United States that some types of sampling were wasteful in that they involved scrutinizing a sample of fixed size even if the examination of the first few members already indicated the decision to be made. This led to a theory of sequential sampling, in which the sample number is not fixed in advance but at each stage in the sampling a decision is made whether to continue or not. This work was applied with success to the control of the quality of manufactured products, and it was soon also realized that a great deal of scientific inquiry was, in fact, sequential in character. [See Quality control, statistical.]
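As a rough illustration of sampling in which the sample number is not fixed in advance, the sketch below follows the general form of Wald's sequential probability ratio test for a proportion of defective items. The article gives no formulas, so the details are assumptions made here for illustration: the function name sprt, the two hypothesized defect rates p0 and p1, the error rates alpha and beta, and the use of Wald's approximate stopping boundaries.

```python
from math import log

def sprt(observations, p0, p1, alpha=0.05, beta=0.05):
    """Sequential probability ratio test for a Bernoulli proportion.

    Each observation is 1 (defective) or 0 (good). After every item the
    accumulated log likelihood ratio of H1 (rate p1) against H0 (rate p0)
    is compared with two boundaries; sampling stops at the first crossing.
    Returns (decision, number_of_items_inspected).
    """
    upper = log((1 - beta) / alpha)  # at or above this, decide for H1
    lower = log(beta / (1 - alpha))  # at or below this, decide for H0
    llr = 0.0
    n = 0
    for n, x in enumerate(observations, start=1):
        llr += log(p1 / p0) if x else log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "undecided", n

# Hypothetical batch of 30 items with no defectives, testing a 5 per cent
# defect rate (H0) against a 20 per cent defect rate (H1).
decision, inspected = sprt([0] * 30, p0=0.05, p1=0.20)
print(decision, inspected)
```

With these made-up figures the decision is reached well before all thirty items have been examined, which is the economy that motivated the sequential approach.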
Decision functions. Wald was led to consider a more general approach, which linked up with Neyman's ideas on hypothesis testing and developed into a theory of decision functions. The basic idea was that at certain stages decisions have to be made, for example, to accept or reject a hypothesis. The object of the theory is to lay down a set of rules under which these decisions can be intelligently made; and, if it is possible to specify penalties for taking wrong decisions, to optimize the method of choice according to some criterion, such as minimizing the risk of loss. The theory had great intellectual attraction and even led some statisticians to claim that the whole of statistics was a branch of decision-function theory, a claim that was hotly resisted in some quarters and may or may not stand up to deeper examination.
Multivariate problems. By 1950 the mathematical development of some branches of statistical theory had, in certain directions, outrun their practical usefulness. This was true of multivariate analysis based on normal distributions. In the more general theory of multivariate problems, several lines of development were pursued. One part of the theory attempts to reduce the number of effective dimensions, especially by component analysis and, as developed by psychologists, factor analysis [see Factor analysis]. Another, known as canonical correlation analysis, attempts to generalize correlation to the relationship between two vector quantities. A third generalizes distribution theory and sampling to multidimensional cases. The difficulties are formidable, but a good deal of progress has been made. One problem has been to find practical data that would bear the weight of the complex analysis that resulted. The high-speed computer may be a valuable tool in further work in this field. [See Computation.]
Time series and stochastic processes. Perhaps the most extensive developments after World War II were in the field of time series and stochastic processes generally. The problem of analyzing a time series has particular difficulties of its own. The system under examination may have a trend present and may have seasonal fluctuations. The classical method of approach was to dissect the series into trend, seasonal movement, oscillatory effects, and residual; but there is always danger that an analysis of this kind is an artifact that does not correspond to the causal factors at work, so that projection into the future is unreliable. Even where trend is absent, or has been abstracted, the analysis of oscillatory movements is a treacherous process. Attempts to apply harmonic analysis to economic data, and hence to elicit “cycles,” were usually failures, owing to the fact that observed fluctuations were not regular in period, phase, or amplitude [see Time series, article on Cycles].
The basic work on time series was done by Yule between 1925 and 1930. He introduced what is now known as an autoregressive process, in which the value of the series at any point is a linear function of certain previous values plus a random residual. The behavior of the series is then determined, so to speak, partly by the momentum of past history and partly by unpredictable disturbance. In the course of this work Yule introduced serial correlations, which measure the relationship between terms of the series separated by specified time intervals. It was later realized that these functions are closely allied to the coefficients that arise in the Fourier analysis of the series.
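A minimal sketch may help fix the two ideas in this paragraph. It simulates the simplest autoregressive process, in which each value is a multiple of the immediately preceding value plus a random residual, and then computes serial correlations at several lags. The coefficient 0.7, the series length, the random seed, and the function names simulate_ar1 and serial_correlation are assumptions chosen for illustration; Yule's own schemes involved more than one previous value.

```python
import random

def simulate_ar1(phi, n, seed=0):
    """Generate n terms of x[t] = phi * x[t-1] + e[t], with e[t] standard normal."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        # Momentum of past history plus an unpredictable disturbance.
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def serial_correlation(x, lag):
    """Correlation between terms of the series separated by `lag` time steps."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom

series = simulate_ar1(phi=0.7, n=5000)
for lag in (1, 2, 3):
    print(lag, round(serial_correlation(series, lag), 3))
```

For a first-order process of this kind the serial correlation at lag k is close to the k-th power of the coefficient in a long series, so the printed values should fall away roughly geometrically.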
World War II acted as a kind of incubatory period. Immediately afterward it was appreciated that Yule's method of analyzing oscillatory movements in time series was only part of a much larger field, which was not confined to movements through time. Earlier pioneer work by several writers, notably Louis Bachelier, Eugen Slutsky, and Andrei Markov, was brought together and formed the starting point of a new branch of probability theory. Any system that passes through a succession of states falls within its scope, provided that the transition from one state to the next is decided by a schedule of probabilities and is not purely deterministic. Such systems are known as stochastic processes. A very wide variety of situations falls within their scope, among them epidemics, stock control, traffic movements, and queues. They may be regarded as constituting a probability theory of movement, as distinct from the classical systems in which the generating mechanism behind the observations was constant and the successive observations were independent. From 1945 onward there was a continual stream of papers on the subject, many of which were contributed by Russian and French authors [see Markov chains; Queues].
Some philosophical questions. Common to all this work was a constant re-examination of the logic of the inferential processes involved. The problem of making meaningful statements about the world on the basis of examination of only a small part of it had exercised a series of thinkers from Bacon onward, notably George Boole, John Stuart Mill, and John Venn, but it remained essentially unsolved and was regarded by some as constituting more of a philosophical puzzle than a barrier to scientific advance. The specific procedures proposed by statisticians brought the issue to a head by defining the problem of induction much more exactly, and even by exposing situations where logical minds might well reach different conclusions from the same data. This was intellectually intolerable and necessitated some very searching probing into the rather intuitive arguments by which statisticians drew their conclusions in the earlier stages of development of the subject.
Discussion has been centered on the theory of probability, in which two attitudes may be distinguished: subjective and objective [see Probability, article on Interpretations]. Neither approach is free from difficulty. Both lead to the same calculus of probabilities in the deductive sense. The primary problem, first stated explicitly by Thomas Bayes, however, is one of induction, to which the calculus of probabilities makes no contribution except as a tool of analysis. Some authorities reject the Bayesian approach and seek for principles of inference elsewhere. Others, recognizing that the required prior probabilities necessitate certain assumptions, nevertheless can see no better way of tackling the problem if the relative acceptability of hypotheses is to be quantified at all [see Bayesian inference]. Fortunately for the development of theoretical statistics, the philosophical problems have remained in the background, stimulating argument and a penetrating examination of the inferential process but not holding up development. In practice it is possible for two competent statisticians to differ in the interpretation of data, although if they do, the reliability of the inference is often low enough to justify further experimentation. Such cases are not very frequent, but important instances do occur; a notable one is the interpretation of the undeniable observed relationship between intensity of smoking and cancer of the lung. Differences in interpretation are particularly liable to occur in economic and social investigations because of the difficulty of performing experiments or of isolating causal influences for separate study.
Robustness and nonparametric methods. The precision of inferences in probability is sometimes bought at the expense of rather restrictive assumptions about the population of origin. For example, Student's t-test depends on the supposition that the parent population is normal. Various attempts have been made to give the inferential procedures greater generality by freeing them from these restrictions. For example, certain tests can be shown to be “robust” in the sense that they are not very sensitive to deviations from the basic assumptions [see ERRORS]. Another interesting field is concerned with tests that depend on ranks, order statistics, or even signs, and are very largely independent of the form of the parent population. These so-called distribution-free methods, which are usually easy to apply, are often surprisingly efficient [see Nonparametric statistics].
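As an illustration of a test that depends only on signs, the exact sign test can be sketched in a few lines of Python. This is a minimal sketch, not a method taken from the article: the function name `sign_test` and the sample differences are assumptions introduced here. The test asks only how many paired differences are positive and how many are negative, so no assumption about the form of the parent population is needed.

```python
from math import comb


def sign_test(diffs):
    """Exact two-sided sign test of the hypothesis that the median difference is zero.

    Only the signs of the differences are used; zero differences are dropped.
    (Hypothetical helper written for illustration.)
    """
    pos = sum(1 for d in diffs if d > 0)
    neg = sum(1 for d in diffs if d < 0)
    n = pos + neg          # number of non-zero differences
    k = min(pos, neg)      # count of the rarer sign
    # Under the null hypothesis each sign is equally likely, so the count
    # of either sign follows a Binomial(n, 1/2) distribution.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented paired differences: seven positive, one negative.
example = [1.2, 0.4, -0.3, 2.1, 0.8, 1.5, 0.9, 0.2]
p_value = sign_test(example)
print(p_value)
```

With seven positive signs out of eight, the two-sided probability is 18/256, about 0.07. A t-test on the same differences would use their magnitudes as well, at the price of the normality assumption.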
The frontiers of the subject continue to extend. Problems of statistical relationship, of estimation in complicated models, of quantification and scaling in qualitative material, and of economizing in exploratory effort are as urgent and lively as ever. The theoretical statistician ranges from questions of galactic distribution to the properties of subatomic particles, suspended, like Pascal's man, between the infinitely large and the infinitely small. The greater part of the history of his subject lies in the future.
M. G. Kendall
[The following biographies present further details on specific periods in the history of statistical method. Early Period: Babbage; Bayes; Bernoulli family; Bienaymé; Galton; Gauss; Graunt; Laplace; Moivre; Petty; Poisson; Quetelet; Süssmilch. Modern Period: Benini; Bortkiewicz; Fisher, R. A.; Gini; Girshick; Gosset; Keynes, John Maynard; Körösy; Lexis; Lotka; Pearson; Spearman; Stouffer; von Mises, Richard; von Neumann; Wald; Wiener; Wilks; Willcox; Yule.]
BIBLIOGRAPHY
There is no history of theoretical statistics or of statistical methodology. Westergaard 1932 is interesting as an introduction but is largely concerned with descriptive statistics. Walker 1929 has some valuable sketches of the formative period under Karl Pearson. Todhunter 1865 is a comprehensive guide to mathematical work up to Laplace and contains bibliographical information on many of the early works cited in the text of this article. David 1962 is a modern and lively account up to the time of de Moivre. The main sources for further reading are in obituaries and series of articles that appear from time to time in statistical journals, especially the “Studies in the History of Probability and Statistics” in Biometrika and occasional papers in the Journal of the American Statistical Association.
Bayes, Thomas (1764) 1958 An Essay Towards Solving A Problem in the Doctrine of Chances. Biometrika 45:296-315. → First published in Volume 53 of the Royal Society of London's Philosophical Transactions. A facsimile edition was published in 1963 by Hafner.
Bernoulli, Jacques (1713) 1899 Wahrscheinlichkeitsrechnung (Ars conjectandi). 2 vols. Leipzig: Engelmann. → First published posthumously in Latin.
Condorcet, Marie Jean Antoine Nicolas Caritat, De 1785 Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. Paris: Imprimerie Royale.
Czuber, Emanuel 1898 Die Entwicklung der Wahrscheinlichkeitstheorie und ihrer Anwendungen. Jahresbericht der Deutschen Mathematikervereinigung, Vol. 7, No. 2. Leipzig: Teubner.
David, F. N. 1962 Games, Gods and Gambling: The Origins and History of Probability and Statistical Ideas From the Earliest Times to the Newtonian Era. London: Griffin; New York: Hafner.
Fisher, R. A. 1925 Theory of Statistical Estimation. Cambridge Philosophical Society, Proceedings 22:700-725. → Reprinted in Fisher 1950.
Fisher, R. A. 1928 Moments and Product Moments of Sampling Distributions. London Mathematical Society, Proceedings 30:199-238. → Reprinted in Fisher 1950.
Fisher, R. A. (1920-1945) 1950 Contributions to Mathematical Statistics. New York: Wiley.
Huygens, Christiaan 1657 De ratiociniis in ludo aleae. Pages 521-534 in Frans van Schooten, Exercitationum mathematicarum. Leiden (Netherlands): Elsevier.
Kotz, Samuel 1965 Statistical Terminology—Russian vs. English—In the Light of the Development of Statistics in the USSR. American Statistician 19, no. 3:14-22.
Laplace, Pierre Simon (1812) 1820 Théorie analytique des probabilités. 3d ed., revised. Paris: Courcier.
Ore, Øystein 1953 Cardano: The Gambling Scholar. Princeton Univ. Press; Oxford Univ. Press. → Includes a translation from the Latin of Cardano's Book on Games of Chance by Sydney Henry Gould.
Pearson, Karl (1892) 1911 The Grammar of Science. 3d ed., rev. & enl. London: Black. → A paperback edition was published in 1957 by Meridian.
Poisson, Siméon Denis 1837 Recherches sur la probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilités. Paris: Bachelier.
Simpson, Thomas 1757 Miscellaneous Tracts on Some Curious and Very Interesting Subjects in Mechanics, Physical-astronomy, and Speculative Mathematics. London: Nourse.
Todhunter, Isaac (1865) 1949 A History of the Mathematical Theory of Probability From the Time of Pascal to That of Laplace. New York: Chelsea.
Walker, Helen M. 1929 Studies in the History of Statistical Method, With Special Reference to Certain Educational Problems. Baltimore: Williams & Wilkins.
Westergaard, Harald L. 1932 Contributions to the History of Statistics. London: King.
Statistics
STATISTICS. The word statistics comes from the German Statistik and was coined by Gottfried Achenwall (1719–1772) in 1749. This term referred to a thorough, generally nonquantitative description of features of the state—its geography, peoples, customs, trade, administration, and so on. Hermann Conring (1606–1681) introduced this field of inquiry under the name Staatenkunde in the seventeenth century, and it became a standard part of the university curriculum in Germany and in the Netherlands. Recent histories of statistics in France, Italy, and the Netherlands have documented the strength of this descriptive approach. The descriptive sense of statistics continued throughout the eighteenth century and into the nineteenth century.
The numerical origins of statistics are found in distinct national traditions of quantification. In England, self-styled political and medical arithmeticians working outside government promoted numerical approaches to the understanding of the health and wealth of society. In Germany, the science of cameralism provided training and rationale for government administrators to count population and economic resources for local communities. In France, royal ministers, including the duke of Sully (1560–1641) and Jean-Baptiste Colbert (1619–1683), initiated statistical inquiries into state finance and population that were continued through the eighteenth century.
Alongside these quantitative studies of society, mathematicians developed probability theory, which made use of small sets of numerical data. The emergence of probability has been the subject of several recent histories and its development was largely independent of statistics. The two traditions of collecting numbers and analyzing them using the calculus of probabilities did not merge until the nineteenth century, thus creating the modern discipline of statistics.
The early modern field of inquiry that most closely resembles modern statistics was political arithmetic, created in the 1660s and 1670s by two Englishmen, John Graunt (1620–1674) and William Petty (1623–1687). Graunt's Natural and Political Observations Made upon the Bills of Mortality (1662) launched quantitative studies of population and society, which Petty labeled political arithmetic. In their work, they showed how numerical accounts of population could be used to answer medical and political questions such as the comparative mortality of specific diseases and the number of men of fighting age. Graunt developed new methods to calculate population from the numbers of christenings and burials. He created the first life table, a numerical table that showed how many individuals out of a given population survived at each year of life. Petty created sample tables to be used in Ireland to collect vital statistics and urged that governments collect regular and accurate accounts of the numbers of christenings, burials, and total population. Such accounts, Petty argued, would put government policy on a firm foundation.
Political arithmetic was originally associated with strengthening monarchical authority, but several other streams of inquiry flowed from Graunt's and Petty's early work. One tradition was medical statistics, which developed most fully in England during the eighteenth century. Physicians such as James Jurin (1684–1750) and William Black (1749–1829) advocated the collection and evaluation of numerical information about the incidence and mortality of diseases. Jurin pioneered the use of statistics in the 1720s to evaluate medical practice in his studies of the risks associated with smallpox inoculation. William Black coined the term medical arithmetic to refer to the tradition of using numbers to analyze the comparative mortality of different diseases. New hospitals and dispensaries such as the London Smallpox and Inoculation Hospital, established in the eighteenth century, provided institutional support for the collection of medical statistics; some treatments were evaluated numerically.
Theology provided another context for the development of statistics. Graunt had identified a constant birth ratio between males and females (14 to 13) and had used this as an argument against polygamy. The physician John Arbuthnot (1667–1735) argued in a 1710 article that this regularity was "an Argument for Divine Providence." Later writers, including William Derham (1657–1735), author of Physico-Theology (1713), and Johann Peter Süssmilch (1707–1767), author of Die Göttliche Ordnung (1741), made the stability of this statistical ratio a part of the larger argument about the existence of God.
One final area of statistics that flowed from Graunt's work and was the most closely associated with probability theory was the development of life (or mortality) tables. Immediately following the publication of Graunt's book, several mathematicians, including Christiaan Huygens (1629–1695), Gottfried Leibniz (1646–1716), and Edmund Halley (1656–1742), refined Graunt's table. Halley, for example, based his life table on numerical data from the town of Breslau that listed ages of death. (Graunt had to estimate ages of death.) In the eighteenth century, further modifications were introduced by the Dutchmen Willem Kersseboom (1690–1771) and Nicolaas Struyck (1686–1769), the Frenchman Antoine Deparcieux (1703–1768), the Swiss-born Leonhard Euler (1707–1783), and the Swede Pehr Wargentin (1717–1783). A French historian has recently argued that the creation of life tables was one of the leading achievements of the scientific revolution. Life tables were used to predict life expectancy and aimed to improve the financial soundness of annuities and tontines.
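The arithmetic behind a life table can be sketched briefly in Python. This is only an illustrative sketch under stated assumptions: the cohort size, the deaths per interval, and the function names `life_table` and `life_expectancy` are invented here and are not Graunt's or Halley's figures or methods. The table records how many members of a starting cohort are still alive at the beginning of each age interval, and the expectation of life is approximated by assuming that deaths fall, on average, at the middle of an interval.

```python
def life_table(cohort, deaths_by_interval):
    """Return the number of survivors at the start of each age interval,
    followed by the number left after the last interval."""
    survivors = [cohort]
    for deaths in deaths_by_interval:
        survivors.append(survivors[-1] - deaths)
    return survivors


def life_expectancy(survivors, interval_years):
    """Approximate expectation of life at the start of the table.

    Assumes deaths occur on average at mid-interval, so the years lived
    in an interval are the mean of the survivors at its two ends times
    the interval length.
    """
    person_years = sum(
        (start + end) / 2 * interval_years
        for start, end in zip(survivors, survivors[1:])
    )
    return person_years / survivors[0]


# Invented figures: 100 births followed over four 20-year intervals.
table = life_table(100, [40, 30, 20, 10])
print(table)
print(life_expectancy(table, 20))
```

An annuity seller could use such a table to estimate how many years of payments a purchaser of a given age is likely to collect, which is why accurate ages at death mattered so much to the later refinements.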
The administrative demands brought about by state centralization in early modern Europe also fostered the collection and analysis of numerical information about births, deaths, marriages, trade, and so on. In France, for example, Sébastien le Prestre de Vauban (1633–1707), adviser to Louis XIV (ruled 1643–1715), provided a model for the collection of this data in his census of Vézelay (1696), a small town in Burgundy. Although his recommendations were not adopted, a similar approach was pursued decades later by the Controller-General Joseph Marie Terray (1715–1778), who requested in 1772 that the provincial intendants collect accounts of births and deaths from parish clergy and forward them to Paris. Sweden created the most consistent system for the collection of vital statistics through parish clerks in 1749. Efforts in other countries failed. In England, two bills were put before Parliament in the 1750s to institute a census and to ensure the collection of vital statistics. Both bills were defeated because of issues concerning personal liberty. While these initiatives enjoyed mixed success, they all spoke to the desire to secure numerical information about the population. Regular censuses, which would provide data for statistical analysis, were not instituted until the nineteenth century.
See also Accounting and Bookkeeping ; Census ; Graunt, John ; Mathematics ; Petty, William .
BIBLIOGRAPHY
Primary Sources
Arbuthnot, John. "An Argument for Divine Providence Taken from the Regularity Observ'd in the Birth of Both Sexes." Philosophical Transactions 27 (1710–1712): 186–190.
Black, William. An Arithmetical and Medical Analysis of the Diseases and Mortality of the Human Species. London, 1789. Reprinted with an introduction by D. V. Glass. Farnborough, U.K., 1973.
Jurin, James. An Account of the Success of Inoculating the Small Pox in Great Britain with a Comparison between the Miscarriages in That Practice, and the Mortality of the Natural Small Pox. London, 1724.
Petty, William. The Economic Writings of Sir William Petty. Edited by Charles Henry Hull. 2 vols. Cambridge, U.K., 1899.
Secondary Sources
Bourguet, Marie-Noëlle. Déchiffrer la France: La statistique départementale à l'époque napoléonienne. Paris, 1988.
Buck, Peter. "People Who Counted: Political Arithmetic in the Eighteenth Century." Isis 73 (1982): 28–45.
——. "Seventeenth-Century Political Arithmetic: Civil Strife and Vital Statistics." Isis 68 (1977): 67–84.
Daston, Lorraine. Classical Probability in the Enlightenment. Princeton, 1988.
Dupâquier, Jacques. L'invention de la table de mortalité, de Graunt à Wargentin, 1662–1766. Paris, 1996.
Dupâquier, Jacques, and Michel Dupâquier. Histoire de la démographie. Paris, 1985.
Hacking, Ian. The Emergence of Probability. Cambridge, U.K., 1975.
——. The Taming of Chance. Cambridge, U.K., 1990.
Hald, Anders. A History of Probability and Statistics and Their Applications before 1750. New York, 1990.
Klep, Paul M. M., and Ida H. Stamhuis, eds. The Statistical Mind in a Pre-Statistical Era: The Netherlands, 1750–1850. Amsterdam, 2002.
Patriarca, Silvana. Numbers and Nationhood: Writing Statistics in Nineteenth-Century Italy. Cambridge, U.K., 1996.
Pearson, Karl. The History of Statistics in the 17th and 18th Centuries against the Changing Background of Intellectual, Scientific and Religious Thought. Edited by E. S. Pearson. London, U.K., 1978.
Porter, Theodore M. The Rise of Statistical Thinking, 1820–1900. Princeton, 1986.
Rusnock, Andrea. Vital Accounts: Quantifying Health and Population in Eighteenth-Century England and France. Cambridge, U.K., 2002.
Andrea Rusnock
Statistics
Official Statistics
Prior to the 19th century, statistical data on Jews were obtained irregularly, either from mere estimates, or as a by-product from administrative records specifically relating to Jews. As modern official statistics developed in Europe and American countries during the 19th century, they began to provide some statistical information on Jewish inhabitants. But enumeration of the number of Jews in some European countries, before the latter part of the 19th century, is considered to be incomplete. The growth of official statistics in general, and statistics on Jews in particular, was a gradual process. During the 20th century, official statistics on the number of Jews in the general population have been compiled in some Asian and African countries. The most favorable conditions for statistical information on Jews from official sources prevailed in the first decades of the 20th century until World War II. The majority of Jews were then living in countries – especially in Eastern and Central Europe – that rather regularly collected and published vital and migratory statistics, in addition to census data on Jews as a distinct group within the general population. These data not only supplied the overall numbers of the Jewish populations but also reflected their composition and demographic patterns. The Jews were distinguished in three ways: by religion, by ethnic group (termed "nationality" in Eastern Europe), and by language, i.e., according to the use of Yiddish or Ladino. Sometimes all three criteria were used concurrently by the same country. During the Nazi ascendancy, some countries made counts of persons of "Jewish descent." The wide-ranging changes in the period after World War II also affected the quantity and quality of statistics on Jews.
On the one hand, the State of Israel has provided competent and detailed statistics on both its Jewish and non-Jewish inhabitants, and on the other hand, there has been a great reduction in the volume of official statistics on Diaspora Jewries. The Holocaust and subsequent migrations diminished the numerical importance of the Jews in Eastern and Central Europe. In addition, the policy of the new Communist regimes in that part of the world was to discontinue religious and, in some countries, ethnic, classification in official statistics. In the West, when religious information is not collected, this is attributed to "separation of church and state." Nevertheless, some liberal and democratic Western countries have developed a tradition of either distinguishing or not distinguishing religious groups in their official statistics (e.g., Canada, the Netherlands, Switzerland differentiate; the U.S., Belgium, and France do not). A new circumstance which, in recent decades, has complicated the collection of data on Jews, is the increased number of "marginal" Jews who are apt to conceal their Jewish identity and indicate on statistical returns that they are "without religion." At present about 70% of Diaspora Jewry, i.e., more than 50% of world Jewry, live in countries without regular official statistics on Jews. Even where such inquiries are made in Diaspora countries, the information published on the composition of the Jewish population is usually meager, often no more than a geographical breakdown cross-classified by sex. The situation in the major countries of Jewish residence in the Diaspora is as follows: the U.S., which has the largest Jewish population of any country, does not distinguish Jews in its decennial population census. Some figures on the number and residential distribution of the Jews were obtained by a "census of religious bodies," but this was last taken in 1936. The separate classification of Jews in U.S. immigration statistics was discontinued in 1943. 
There are no official vital statistics on religious groups in the U.S. (except for marriage and divorce data collected in two states).
After World War II the U.S.S.R., another major center of Jewish population, had two population censuses, in 1959 and 1970. The published results distinguished Jews as well as the many other ethnic groups in the Soviet Union. The number of Jews in the U.S.S.R. recorded by the censuses was contested by Jewish circles as being too low. However, it should be remembered that there were conceptual and practical problems of identification of Jews in the U.S.S.R. In any case, no reliable means exist for making alternative estimates because there is no statistical information on the manifold changes in the Jewish population which took place during and since the Holocaust on the territory of the U.S.S.R. (which was enlarged after World War II). In France and Great Britain there are virtually no official statistics on Jews, and those in Argentina are scanty.
Of the Diaspora countries with several hundred thousand Jews after World War II, Canada has had the most detailed official statistics on Jews. But even in this case, conceptual difficulties affected the results of the 1951 and 1961 censuses, relevant vital statistics on Jews no longer extend to all provinces, and the separate designation of Jews was recently omitted from the immigration statistics.
Jewish institutions in several countries made successful efforts to use, as they became available, the electronically processed material of official statistics for preparing special tabulations on Jews in response to Jewish initiatives.
Jewish-Sponsored Data Collection
In countries where there are no official statistics on the Jewish population, the only practical way to obtain any numerical information about it is through Jewish-sponsored data collection. The customary method, local community surveys, has been used sporadically over the last few decades, especially in the U.S. In the case of large Jewish groups, these surveys are necessarily sample studies. Many improvements have been incorporated in the technique of some Jewish surveys to make them more sophisticated. However, isolated community surveys have essential shortcomings, e.g., the local focus, the differences in content and method between the various studies, and the fact that they are conducted at different times even within the same country. Hence their usefulness for countrywide or larger statistical syntheses is very limited. "Marginal" Jews who have little desire to identify themselves as Jews and who have few or no organizational ties with the Jewish community are now not few in number in many Diaspora countries. While official statistics may not adequately identify individuals in the general population as Jews, Jewish-sponsored surveys have difficulty in reaching the total number of Jews in a community. In the collection of demographic data the concept "Jewish" should be construed in the widest sense. But in the tabulations various categories within the Jewish population should be distinguished according to attachment to Jewish practices, mixed marriages, etc. At any rate, the customary "master list," i.e., the combined information on Jews from various institutional and organizational records, is often insufficient as the sole source for surveying a Jewish population.
Another field of Jewish-sponsored statistical activity is the collection of vital statistics. These, however, often reflect only those activities which take place under the auspices of Jewish religious institutions, e.g., synagogue marriages, circumcisions, and burials with religious ceremonies. But the marriages, births, and deaths of Jews which are not accompanied by religious ceremonies are unrecorded in the statistics of Jewish institutions. The increasing assimilation and secularization of Diaspora Jews, and the consequent absence of "marginal" Jews from the data collected by Jewish institutions, are apt to vitiate the data's demographic value. In some Diaspora countries, interested organizations make counts of Jewish immigrants who have received assistance as well as estimates of the total number of Jewish immigrants. Sociological and socio-psychological investigations which supply data on Jews have only limited demographic value because their subjects are often unrepresentative of the entire Jewish population or their figures are too small. In a few European countries where the Jewish communities are recognized by public law, permanent population registers are kept by the community.
From the 1960s, the Institute of Contemporary Jewry of The Hebrew University, Jerusalem, designed a new and more efficient type of Jewish-sponsored population survey. These surveys inquired into demographic, economic, and social characteristics as well as aspects of Jewish identity, and permitted many cross-tabulations of population characteristics. They were preferably on a countrywide basis, with improvements in sampling technique and especially designed to include "marginal" Jews. The first survey of this type was taken in Italy in 1965. Better information on Jewish vital statistics is also partially obtainable from population surveys. Jewish-sponsored surveys are not only substitutes for nonexistent governmental statistics on Jews but are, in fact, the only means of investigating aspects of Jewish identity. Jewish-sponsored data collection on topics other than population statistics usually relates to the working of Jewish institutions and organizations, international, national, and local. In general, the data are collected within the framework of the respective agencies themselves.
Research Activities
The copious statistical material on Jews which accumulated before World War II encouraged scholars and others to compile comparative statistics of various countries, and to analyze the available data in detail. Among the major contributors to the field of Jewish demographic research have been A. *Nossig and J. *Jacobs, toward the end of the 19th century; and in the 20th century, A. *Ruppin, J. *Thon, B. *Blau, J. Segall, F. Theilhaber, I. Koralnik, L. *Hersch, J. *Lestschinsky, H.S. Linfield, A. *Tartakower, and R. *Bachi. Important centers for demographic and statistical research on the Jews were the Bureau fuer Statistik der Juden (Berlin) and *YIVO. Periodicals of importance in this field were Zeitschrift fuer Demographie und Statistik der Juden; Bleter far Yidishe Demografye, Statistik un Ekonomik; and Shriftn far Ekonomik un Statistik.
The period after World War II has seen not only the diminution of official data on Jews, but also the passing of the previous generation of scholars in Jewish statistics. The present scholarly emphasis in Jewish population statistics has partially shifted, of necessity, from the analysis and comparison of available data to the methodology and promotion of data collection. Several Jewish research institutions have been engaged primarily in statistical and demographic work on a local and national level: the Bureau of Social and Economic Research of the Canadian Jewish Congress, Montreal; the Statistical and Demographic Research Unit of the Board of Deputies of British Jews, London; and Instituto de Investigaciones Sociales of the Asociación Mutual Israelita Argentina (AMIA), Buenos Aires. Some permanent institutions for social and historical research on the Jews which have given part of their attention to statistical and demographic matters are: Centre National des Hautes Etudes Juives, Brussels; Communauté, Paris; Oficina Latinoamericana of the American Jewish Committee and Centro de Estudios Sociales of the Delegación de Asociaciones Israelitas Argentinas (DAIA), Buenos Aires; and the Jewish Museum of the Czech State, Prague.
In some cases, scholars have carried out ad hoc demographic and social surveys of local Jewish populations at the invitation of the community leadership. There are many such instances in the U.S., the most notable through to the mid-1960s being the surveys taken in Washington (1956), Los Angeles (1959 and 1965), Providence (1963), Camden Area (1964), Boston (1965), and Springfield (1966). Elsewhere, local surveys were taken in recent years in São Paulo (Brazil); Melbourne (Australia); Leeds and Edgware (England); Brussels (Belgium); and Wroclaw (Poland). For Dutch Jewry, a survey based on records only, without home visits, was made in 1954; a similar survey of Dutch Jewry took place in 1966. Counts based on community population registers are available for the Jews in Vienna, Austria, in the German Federal Republic, and to some extent in Italy and the Netherlands. Additional countrywide sample surveys of Jewish populations were planned for the U.S., France, and other countries (in the U.S. under the auspices of the Council of Jewish Federations and Welfare Funds).
Israel has a very active Central Bureau of Statistics (headed by R. Bachi until 1972), whose work also illuminates important aspects of Jewish demography in the Diaspora. On the international level, the Division of Jewish Demography and Statistics in the Institute of Contemporary Jewry of The Hebrew University, Jerusalem, also headed by R. Bachi, has advanced the study of Jewish demography throughout the world by encouraging and coordinating data collection and research, refining methodology, developing technical services (world bibliography, documentation center), and training specialists. It is also the seat of the Association for Jewish Demography and Statistics, which serves as the international organization for interested scholars and laymen. Other international Jewish research bodies whose activities include some statistical work are: The Institute of Jewish Affairs (in London since 1966), YIVO, and *Yad Vashem.
[Usiel Oscar Schmelz]
Sources
The amount and quality of documentation on Jewish population size and characteristics are far from satisfactory. Reviewing the sources since 1990, however, one finds that important new data and estimates have become available for several countries through official population censuses and Jewish-sponsored sociodemographic surveys. National censuses yielded results on Jewish populations in Ireland, the Czech Republic, and India (1991); Romania and Bulgaria (1992); the Russian Republic and Macedonia (1994); Israel (1995); Canada, South Africa, Australia, and New Zealand (1996 and 2001); Belarus, Azerbaijan, Kazakhstan, and Kyrgyzstan (1999); Brazil, Mexico, Switzerland, Estonia, Latvia, and Tajikistan (2000); the United Kingdom, Hungary, Croatia, Lithuania, and Ukraine (2001); and the Russian Republic and Georgia (2002). Permanent national population registers, including information on the Jewish religious, ethnic or national group, exist in several European countries (Switzerland, Norway, Finland, Estonia, Latvia, and Lithuania), and in Israel.
In addition, independent sociodemographic studies have provided most valuable information on Jewish demography and socioeconomic stratification as well as on Jewish identification. Surveys were conducted over the last several years in South Africa (1991 and 1998); Mexico (1991 and 2000); Lithuania (1993); the United Kingdom and Chile (1995); Venezuela (1998–99); Israel, Hungary, the Netherlands, and Guatemala (1999); Moldova and Sweden (2000); France and Turkey (2002); and Argentina (2003 and 2004). In the United States, important new insights were provided by two large surveys, the National Jewish Population Survey (NJPS, 2000–01) and the American Jewish Identity Survey (AJIS, 2001). Several further Jewish population studies were separately conducted in major cities in the United States (notably in New York City in 2002) and in other countries. Additional evidence on Jewish population trends can be obtained from the systematic monitoring of membership registers, vital statistics, and migration records available from Jewish communities and other Jewish organizations in many countries or cities, notably in the United Kingdom, Germany, Italy, Buenos Aires, and São Paulo. Detailed data on Jewish immigration routinely collected in Israel help in the assessment of changing Jewish population sizes in other countries. Some of this ongoing research is part of a coordinated effort aimed at updating the profile of world Jewry.
Following an International Conference on Jewish Population Problems held in Jerusalem in 1987, initiated by the late Roberto Bachi of the Hebrew University and sponsored by major Jewish organizations worldwide, an International Scientific Advisory Committee (ISAC) was established, chaired by Sidney Goldstein of Brown University. An Initiative on Jewish Demography, sponsored by the Jewish Agency under the chairmanship of Sallai Meridor, led to an international conference held in Jerusalem in 2002 and to an effort of data collection and analysis implemented over the years 2003–2005. The Jewish People Policy Planning Institute (JPPPI), chaired by Ambassador Dennis Ross, provides a framework for policy analyses and suggestions, including Jewish population issues.
Definitions
A major problem with Jewish population estimates periodically circulated by individual scholars or Jewish organizations is a lack of coherence and uniformity in the definitional criteria followed – when the issue of defining the Jewish population is addressed at all. Simply put, the quantitative study of Jewish populations can rely only on operational, not normative, definitional criteria. Three major concepts must be considered in order to put the study of Jewish demography on serious comparative ground.
In most countries outside of Israel, the core Jewish population includes all those who, when asked, identify themselves as Jews; or, if the respondent is a different person in the same household, are identified by him/her as Jews. This is an intentionally comprehensive and pragmatic approach reflecting the nature of most available sources of data on Jewish population. In countries other than Israel, such data often derive from population censuses or social surveys, where interviewees have the option to decide how to answer relevant questions on religious or ethnic preferences. Such a definition of a person as a Jew, reflecting subjective feelings, broadly overlaps but does not necessarily coincide with halakhah (rabbinic law) or other normatively binding definitions. Inclusion does not depend on any measure of that person's Jewish commitment or behavior in terms of religiosity, beliefs, knowledge, communal affiliation, or otherwise. The core Jewish population includes all converts to Judaism by any procedure as well as other people who declare they are Jewish. Also included are persons of Jewish parentage who claim no current religious or ethnic identity. Persons of Jewish parentage who adopted another religion are excluded, as are other individuals who in censuses or surveys explicitly identify with a non-Jewish group without having converted out. In the State of Israel, personal status is subject to the rulings of the Ministry of the Interior, which relies on criteria established by rabbinical authorities. In Israel, therefore, the core Jewish population does not simply express subjective identification but reflects definite legal rules, those of halakhah. Documentation to prove a person's Jewish status may include non-Jewish sources.
The question whether Jewish identification according to this core definition can or should be mutually exclusive with other religious corporate identities emerged on a major scale in the course of the 2000–01 NJPS. The solution chosen – admittedly after much debate – was to allow for Jews with multiple religious identities to be included under certain circumstances in the standard definition of Jewish population. In the latter survey, at least in the version initially processed and circulated by UJC, "a Jew is defined as a person whose religion is Judaism, or whose religion is Jewish and something else, or who has no religion and has at least one Jewish parent or a Jewish upbringing, or who has a non-monotheistic religion and has at least one Jewish parent or a Jewish upbringing." A category of Persons of Jewish Background (PJBs) was introduced: some of these were included in the Jewish population count and others were not. By the same token, Jews with multiple ethnic identities were included in the standard Jewish population count in Canada. The adoption of such extended criteria by the research community tends to stretch Jewish population definitions further than had usually been done in the past and beyond the above-mentioned typical core definition. These procedures tend to limit actual comparability of the same Jewish population over time and of different Jewish populations at the same time.
The enlarged Jewish population includes the sum of (a) the core Jewish population; (b) all other persons of Jewish parentage who – by core Jewish population criteria – are not Jewish currently (or at the time of investigation); and (c) all of the respective further non-Jewish household members (spouses, children, etc.). Non-Jews with Jewish background, as far as they can be ascertained, include: (a) persons who have themselves adopted another religion, even though they may also claim to be Jewish by ethnicity or religion – with the caveat just mentioned for recent U.S. and Canadian data; and (b) other persons with Jewish parentage who disclaim being Jews. As noted, some PJBs who do not pertain to the core Jewish population naturally belong under the enlarged definition. It is customary in sociodemographic surveys to consider the religio-ethnic identification of parents. Some censuses, however, do ask about more distant ancestry. For both conceptual and practical reasons, the enlarged definition does not include other non-Jewish relatives who lack a Jewish background and live in exclusively non-Jewish households.
The *Law of Return, Israel's distinctive legal framework for the acceptance and absorption of new immigrants, awards Jewish new immigrants immediate citizenship and other civil rights. According to the current, amended version of the Law of Return, a Jew is any person born to a Jewish mother or converted to Judaism (regardless of denomination – Orthodox, Conservative, or Reform), who does not have another religious identity. By ruling of Israel's Supreme Court, conversion from Judaism, as in the case of some ethnic Jews who currently identify with another religion, entails loss of eligibility for Law of Return purposes. The law as such does not affect a person's Jewish status – which, as noted, is adjudicated by Israel's Ministry of the Interior and rabbinical authorities – but only the specific benefits available under the Law of Return. The law extends its provisions to all current Jews, their children, and grandchildren, as well as to the respective Jewish or non-Jewish spouses. As a result of its three-generation and lateral extension, the Law of Return applies to a large population, one of significantly wider scope than the core and enlarged Jewish populations defined above. It is actually quite difficult to estimate what the total size of the Law of Return population could be. These higher estimates in some of the major countries reach values double or three times as high as those for the core Jewish population.
The significant involvement of major Jewish organizations in Israel and in the U.S. – such as the Jewish Agency, the American Joint Distribution Committee, HIAS or UJC – in sponsoring data collection tends to complicate research issues. Organizations are motivated by the needs of their constituencies more than by neutral analytic criteria. In turn, the understandable interest of organizations to continue functioning and securing budgetary resources tends to bring them to take care of Jewish populations increasingly closer to the enlarged than to the core definition.
For further developments see *Population; Vital Statistics.
[Sergio DellaPergola (2^{nd} ed.)]
bibliography:
A. Nossig (ed.), Juedische Statistik (1903); Bureau fuer Statistik der Juden, Statistik der Juden (1918); Israel, Central Bureau of Statistics, Official Statistics in Israel (1963; Heb., 1966^{2}); R. Bachi, in: La vie juive dans l'Europe contemporaine (1965); idem, in: JJSO, 8 no. 2 (1966), 142–9; U.O. Schmelz, ibid., 8 no. 1 (1966), 49–63; idem, Jewish Demography and Statistics (1961), bibliography for 1920–60; U.O. Schmelz and P. Glikson, Jewish Population Studies 1961–1968 (1970). add. bibliography: U.O. Schmelz, P. Glikson, and S.J. Gould (eds.), Studies in Jewish Demography: Survey for 1969–1971 (1975), 60–97; M. Corinaldi, "Jewish Identity," chap. 2, in: M. Corinaldi, Jewish Identity: The Case of Ethiopian Jewry (1998); S. DellaPergola and L. Cohen (eds.), World Jewish Population: Trends and Policies (1992); B.A. Kosmin, S. Goldstein, J. Waksberg, N. Lerer, A. Keysar, and J. Scheckner, Highlights of the CJF 1990 National Jewish Population Survey (1991); L. Kotler-Berkowitz, S.M. Cohen, J. Ament, V. Klaff, F. Mott, and D. Peckerman-Neuman, with L. Blass, D. Bursztyn, and D. Marker, The National Jewish Population Survey 2000–01: Strength, Challenge, and Diversity in the American Jewish Population (2003); S. DellaPergola, Jewish Demography: Facts, Outlook, Challenges, JPPPI Alert Paper 2 (2003); The Jewish People Policy Planning Institute Assessment 2005, Executive Report 2 (2005); S. DellaPergola, World Jewish Population, American Jewish Year Book, 100 (New York, 2005), 87–122.
Statistics
Statistics is that branch of mathematics devoted to the collection, compilation, display, and interpretation of numerical data. In general, the field can be divided into two major subgroups, descriptive statistics and inferential statistics. The former subject deals primarily with the accumulation and presentation of numerical data, while the latter focuses on predictions that can be made based on those data.
Some fundamental concepts
Two fundamental concepts used in statistical analysis are population and sample. The term population refers to a complete set of individuals, objects, or events that belong to some category. For example, all of the players who are employed by Major League Baseball teams make up the population of professional major league baseball players. The term sample refers to some subset of a population that is representative of the total population. For example, one might go down the complete list of all major league baseball players and select every tenth name. That subset of every tenth name would then make up a sample of all professional major league baseball players.
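The every-tenth-name procedure described above is known as systematic sampling, and it can be sketched in a few lines of Python. The roster below is a hypothetical stand-in (750 placeholder names, not real player data), and `every_kth` is an illustrative helper. Such a sample is representative only if the order of the list is unrelated to whatever is being measured.

```python
# Hypothetical roster: placeholder names, not real player data.
population = [f"player_{i:03d}" for i in range(1, 751)]

def every_kth(items, k, start=0):
    """Return a systematic sample: every k-th item, beginning at index `start`."""
    return items[start::k]

# Start at index 9 so the sample holds the 10th, 20th, 30th ... names.
sample = every_kth(population, 10, start=9)

print(len(population), len(sample))
print(sample[:3])
```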
Another concept of importance in statistics is the distinction between discrete and continuous data. Discrete variables are numbers that can have only certain specific numerical values that can be clearly separated from each other. For example, the number of professional major league baseball players is a discrete variable. There may be 400, 410, 475, or 615 professional baseball players, but never 400.5, 410.75, or 615.895.
Continuous variables may take any value whatsoever within a range. The readings on a thermometer are an example of a continuous variable. The temperature can range from 10°C to 10.1°C to 10.2°C to 10.3°C (about 50°F) and so on upward or downward. Also, if a sufficiently accurate thermometer is available, even finer divisions, such as 10.11°C, 10.12°C, and 10.13°C, can be made. Methods for dealing with discrete and continuous variables are somewhat different from each other in statistics.
In some cases, it is useful to treat continuous variables as discrete variables, and vice versa. For example, it might be helpful in some kind of statistical analysis to assume that temperatures can assume only discrete values, such as 5°C, 10°C, 15°C (41°F, 50°F, 59°F) and so on. It is important in making use of that statistical analysis, then, to recognize that this kind of assumption has been made.
Collecting data
The first step in doing a statistical study is usually to obtain raw data. As an example, suppose that a researcher wants to know the number of female African-Americans in each of six age groups (1 to 19; 20 to 29; 30 to 39; 40 to 49; 50 to 59; and 60+ years) in the United States. One way to answer that question would be to do a population survey, that is, to interview every single female African-American in the United States, and ask what her age is. Quite obviously, such a study would be very difficult and very expensive to complete. In fact, it would probably be impossible to do.
A more reasonable approach is to select a sample of female African-Americans that is smaller than the total population and to interview this sample. Then, if the sample is drawn so as to be truly representative of the total population, the researcher can draw some conclusions about the total population based on the findings obtained from the smaller sample.
Descriptive statistics
Perhaps the simplest way to report the results of the study described above is to make a table. The advantage of constructing a table of data is that a reader can get a general idea about the findings of the study in a brief glance (Table 1).
Graphical representation
The table shown above is one way of representing the frequency distribution of a sample or population. A frequency distribution is any method for summarizing data that shows the number of individuals or individual cases present in each given interval of measurement. In the table above, there are 5,382,025 female African-Americans in the age group 0 to 19; 2,982,305 in the age group 20 to 29; 2,587,550 in the age group 30 to 39; and so on.
A common method for expressing frequency distributions in an easy-to-read form is a graph. Among the kinds of graphs used for the display of data are histograms, bar graphs, and line graphs. A histogram is a graph that consists of solid bars without any space between them. The width of the bars corresponds to one of the variables being presented, and the height of the bars to a second variable. If one constructed a histogram based on the table shown above, the graph would have six bars, one for each of the six age groups included in the study. The height of the six bars would correspond to the frequency found for each group. The first bar (ages 0 to 19) would be nearly twice as high as the second (20 to 29) and third (30 to 39) bars since there are nearly twice as many individuals in the first group as in the second or third. The fourth, fifth, and sixth bars would be nearly the same height since there are about the same numbers of individuals in each of these three groups.
Another kind of graph that can be constructed from a histogram is a frequency polygon. A frequency polygon can be made by joining the midpoints of the tops of the bars in a histogram to each other.
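As a rough illustration of the histogram described in this section, the sketch below prints one text bar per age group, using the frequencies and group labels as they are printed in Table 1. The bar width of 40 characters is an arbitrary choice made for display only.

```python
# Frequencies copied from Table 1 (labels as printed in the table).
frequencies = {
    "1-19": 5_382_025,
    "20-29": 2_982_305,
    "30-39": 2_587_550,
    "40-49": 1_567_735,
    "50-59": 1_335_235,
    "60+": 1_606_335,
}

total = sum(frequencies.values())
tallest = max(frequencies.values())

for group, count in frequencies.items():
    # Bar length is proportional to the group's frequency.
    bar = "#" * round(40 * count / tallest)
    print(f"{group:>6} {bar} {count / total:.1%}")
```

The first bar comes out nearly twice as long as the second and third, and the last three bars are of similar length, matching the description in the text.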
Distribution curves
Finally, think of a histogram in which the vertical bars are very narrow, and then very, very narrow. As one connects the midpoints of these bars, the frequency polygon begins to look like a smooth curve, perhaps like a high, smoothly shaped hill. A curve of this kind is known as a distribution curve.
Table 1. Number of female African-Americans in various age groups. (Thomson Gale.)
Age | Number |
---|---|
1 – 19 | 5,382,025 |
20 – 29 | 2,982,305 |
30 – 39 | 2,587,550 |
40 – 49 | 1,567,735 |
50 – 59 | 1,335,235 |
60 + | 1,606,335 |
Probably the most familiar kind of distribution curve is one with a peak in the middle of the graph that falls off equally on both sides of the peak. This kind of distribution curve is known as a normal curve. Normal curves result from a number of random events that occur in the world. For example, suppose one were to flip a penny a thousand times, count how many times heads came up, and then repeat that experiment many times over. A graph of the number of heads found in each experiment would look very much like a normal distribution curve, with a peak at equal numbers of heads and tails. That means that, if one were to flip a penny many times, the most likely result would be roughly equal numbers of heads and tails. Some other distribution of heads and tails, such as 10% heads and 90% tails, would occur much less often.
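The coin-flipping example can be tried out numerically. The sketch below assumes a fair coin simulated with Python's `random` module and repeats the thousand-flip experiment 2,000 times. Both the number of repetitions and the seed are arbitrary choices made for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable; the value is arbitrary

FLIPS = 1000        # flips per experiment
EXPERIMENTS = 2000  # hypothetical number of repetitions

# Count the heads in each experiment of 1,000 fair flips.
heads_counts = [
    sum(random.random() < 0.5 for _ in range(FLIPS))
    for _ in range(EXPERIMENTS)
]

mean_heads = sum(heads_counts) / EXPERIMENTS
near_half = sum(450 <= h <= 550 for h in heads_counts) / EXPERIMENTS

print(f"mean heads per experiment: {mean_heads:.1f}")
print(f"share of experiments with 450 to 550 heads: {near_half:.3f}")
print(f"fewest and most heads seen: {min(heads_counts)}, {max(heads_counts)}")
```

The counts cluster tightly around 500 heads, and results as lopsided as 10% heads essentially never appear. A histogram of `heads_counts` would show the bell shape described above.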
Frequency distributions that are not symmetrical are said to be skewed. In a skewed distribution curve, the number of cases on one side of the maximum is much smaller than the number of cases on the other side of the maximum. The graph might start out at zero and rise very sharply to its maximum point and then drop down again on a very gradual slope to zero on the other side. Depending on which side the gradual slope lies, the graph is said to be skewed to the left or to the right.
Other kinds of frequency distributions
Bar graphs look very much like histograms except that gaps are left between adjacent bars. This difference is based on the fact that bar graphs are usually used to represent discrete data and the space between bars is a reminder of the discrete character of the data represented.
Line graphs can also be used to represent continuous data. If one were to record the temperature once an hour all day long, a line graph could be constructed with the hours of day along the horizontal axis of the graph and the various temperatures along the vertical axis. The temperature found for each hour could then be plotted on the graph as a point and the points then connected with each other. The assumption of such a graph is that the temperature varied continuously between the observed readings and that those temperatures would fall along the continuous line drawn on the graph.
A circle graph, or pie chart, can also be used to graph data. A circle graph shows how the total number of individuals, cases, or events is divided up into various categories. For example, a circle graph showing the population of female African-Americans in the United States would be divided into six pie-shaped segments: one (0 to 19) nearly twice as large as each of the next two (20 to 29 and 30 to 39), and three about equal in size and smaller than the other three.
Measures of central tendency
Both statisticians and non-statisticians talk about averages all the time. However, the term average can have a number of different meanings. In the field of statistics, therefore, workers prefer to use the term measure of central tendency for the concept of an average. One way to understand how various measures of central tendency (different kinds of “average”) differ from each other is to consider a classroom consisting of only six students. A study of the six students shows that their family incomes are as follows: $20,000; $25,000; $20,000; $30,000; $27,500; and $150,000. What is the average income for the students in this classroom?
The measure of central tendency that most students learn in school is the mean. The mean for any set of numbers is found by adding all the numbers and dividing by the quantity of numbers. In this example, the mean would be equal to ($20,000 + $25,000 + $20,000 + $30,000 + $27,500 + $150,000) ÷ 6 = $45,417. However, how much useful information does this answer give about the six students in the classroom? The mean that has been calculated ($45,417) is greater than the household income of five of the six students' families.
Another way of calculating central tendency is known as the median. The median value of a set of measurements is the middle value when the measurements are arranged in order from least to greatest. When there is an even number of measurements, the median is half way between the middle two measurements. In the above example, the measurements can be rearranged from least to greatest: $20,000; $20,000; $25,000; $27,500; $30,000; and $150,000. In this case, the middle two measurements are $25,000 and $27,500, and half way between them is $26,250, the median in this case. One can see that the median in this example gives a better view of the household incomes for the classroom than does the mean.
A third measure of central tendency is the mode. The mode is the value most frequently observed in a study. In the household income study, the mode is $20,000 since it is the value found most often in the study. Each measure of central tendency has certain advantages and disadvantages and is used, therefore, under certain special circumstances.
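The three measures can be checked against the classroom example with Python's standard `statistics` module. This is only a small sketch. The variable name `incomes` is chosen for illustration.

```python
from statistics import mean, median, mode

# Family incomes of the six students in the classroom example.
incomes = [20_000, 25_000, 20_000, 30_000, 27_500, 150_000]

print(round(mean(incomes)))  # 45417, pulled upward by the single $150,000 income
print(median(incomes))       # 26250.0, halfway between $25,000 and $27,500
print(mode(incomes))         # 20000, the only value that appears twice
```

The single very large income moves the mean far more than it moves the median or the mode, which is why the median is often preferred for income data.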
Measures of variability
Suppose that a teacher gave the same test to two different classes and obtained the following results: Class 1: 80%, 80%, 80%, 80%, and 80%; and Class 2: 60%, 70%, 80%, 90%, and 100%. If one calculates the mean for both sets of scores, the same answer results: 80%. However, the collection of scores from which this mean was obtained was very different in the two cases. The way that statisticians have of distinguishing cases such as this is known as measuring the variability of the sample. As with measures of central tendency, there are a number of ways of measuring the variability of a sample.
Probably the simplest method is to find the range of the sample, that is, the difference between the largest and smallest observation. The range of measurements in Class 1 is 0, and the range in class 2 is 40%. Simply knowing that fact gives a much better understanding of the data obtained from the two classes. In Class 1, the mean was 80%, and the range was 0, but in Class 2, the mean was 80%, and the range was 40%.
Other measures of variability are based on the difference between any one measurement and the mean of the set of scores. This measure is known as the deviation. As one can imagine, the greater the difference among measurements, the greater the variability. In the case of Class 2 above, the deviation for the first measurement is 20% (80% – 60%), and the deviation for the second measurement is 10% (80% – 70%).
Probably the most common measures of variability used by statisticians are the variance and standard deviation. Variance is defined as the mean of the squared deviations of a set of measurements. Calculating the variance is a somewhat complicated task. One has to find each of the deviations in the set of measurements, square each one, add all the squares, and divide by the number of measurements. In the example above, the variance would be equal to [(20)^{2} + (10)^{2} + (0)^{2} + (10)^{2} + (20)^{2}] ÷ 5 = 200.
For a number of reasons, the variance is used less often in statistics than is the standard deviation. The standard deviation is the square root of the variance, in this case, √200, or about 14.1%. In a normal distribution, a large fraction of the measurements (about 68%) is located within one standard deviation of the mean. Another 27% (for a total of 95% of all measurements) lie within two standard deviations of the mean.
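The range, variance, and standard deviation for the two classes can be worked out as follows. The sketch uses `pvariance` and `pstdev` from Python's `statistics` module, which divide by the number of measurements as the definition above does. The related functions `variance` and `stdev` divide by one less than that number and would give slightly larger values.

```python
from statistics import pvariance, pstdev

class_1 = [80, 80, 80, 80, 80]
class_2 = [60, 70, 80, 90, 100]

for name, scores in (("Class 1", class_1), ("Class 2", class_2)):
    spread = max(scores) - min(scores)  # range: largest minus smallest score
    var = pvariance(scores)             # mean of the squared deviations
    sd = pstdev(scores)                 # square root of the variance
    print(f"{name}: range={spread}, variance={var}, std dev={sd:.1f}")
```

Both classes have the same mean of 80%, but Class 1 shows no spread at all, while Class 2 has a variance of 200 and a standard deviation of about 14.1.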
Inferential statistics
Expressing a collection of data in some useful form, as described above, is often only the first step in a statistician’s work. The next step will be to decide what conclusions, predictions, and other statements, if any, can be made based on those data. A number of sophisticated mathematical techniques have now been developed to make these judgments.
An important fundamental concept used in inferential statistics is that of the null hypothesis. A null hypothesis is a statement made by a researcher at the beginning of an experiment that says, essentially, that nothing is happening in the experiment. That is, nothing other than chance variation is going on during the experiment. At the conclusion of the experiment, the researcher submits his or her data to some kind of statistical analysis to see whether the data are consistent with the null hypothesis, that is, whether nothing other than normal statistical variability has taken place in the experiment. If the data are consistent with the null hypothesis, then there is no evidence that the experiment had any effect on the subjects. If the data would be very unlikely under the null hypothesis, the null hypothesis is rejected, and the researcher is justified in putting forth some alternative hypothesis that will explain the effects that were observed. The role of statistics in this process is to provide mathematical tests for deciding whether or not the null hypothesis should be rejected.
A simple example of this process is deciding on the effectiveness of a new medication. In testing such medications, researchers usually select two groups, one the control group and one the experimental group. The control group does not receive the new medication; it receives a neutral substance instead. The experimental group receives the medication. The null hypothesis in an experiment of this kind is that the medication will have no effect and that both groups will respond in exactly the same way, whether they have been given the medication or not.
Suppose that the results of one experiment of this kind were as shown in Table 2, with the numbers shown being the number of individuals who improved or did not improve after taking part in the experiment.
Table 2. Statistics. (Thomson Gale.)
Group | Improved | Not improved | Total |
---|---|---|---|
Experimental group | 62 | 38 | 100 |
Control group | 45 | 55 | 100 |
Total | 107 | 93 | 200 |
At first glance, it would appear that the new medication was at least partially successful, since more people improved in the experimental group (62) than in the control group (45). However, a statistical test is available that will give a more precise answer, one that will express the probability (90%, 75%, 50%, etc.) of seeing a difference at least this large if the null hypothesis were true. This test, called the chi-square test, involves comparing the observed frequencies in the table above with a set of expected frequencies that can be calculated from the row and column totals. The value of chi square calculated can then be compared to values in a table to see how likely it is that the results were due to chance rather than to some real effect of the medication.
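For the counts in Table 2, the chi-square calculation can be sketched as follows. The expected frequency for each cell is taken as (row total × column total) ÷ grand total, and no continuity correction is applied. Both are assumptions of this sketch, not details given in the text. The probability is obtained from `math.erfc`, a shortcut that is valid only for a 2 × 2 table, which has one degree of freedom.

```python
import math

# Observed counts from Table 2: [improved, not improved]
observed = [
    [62, 38],  # experimental group
    [45, 55],  # control group
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Count expected in this cell if the medication had no effect.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

# Upper-tail probability of chi-square with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.2f}, p = {p_value:.3f}")
```

The statistic comes out at about 5.8, which is above 3.84, the value usually tabulated for one degree of freedom at the 5% level. A difference this large would therefore be fairly unusual if the medication had no effect.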
Another example of a statistical test is called the Pearson correlation coefficient. The Pearson correlation coefficient is a way of determining the extent to which two variables are somehow associated, or correlated, with each other. For example, many medical studies have attempted to determine the connection between smoking and lung cancer. One way to do such studies is to measure the amount of smoking a person has done in her or his lifetime and compare it with the rate of lung cancer among those individuals. A mathematical formula allows the researcher to calculate the Pearson correlation coefficient between these two sets of data, rate of smoking and risk for lung cancer. That coefficient can range between 1.0, meaning the two are perfectly correlated, and -1.0, meaning the two have a perfectly inverse relationship (when one is high, the other is low). A value near 0 means there is little or no linear relationship between the two.
The correlation test is a good example of the limitations of statistical analysis. Suppose that the Pearson correlation coefficient in the example above turned out to be 1.0. That number would mean that people who smoke the most are always the most likely to develop lung cancer. However, what the correlation coefficient does not say is what the cause and effect relationship, if any, might be. It does not say that smoking causes cancer.
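The coefficient itself is simple to compute. The sketch below writes out the usual formula by hand. The smoking and risk figures are invented for illustration and are not real medical data, and the names `pearson_r`, `pack_years`, and `risk_score` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from the two means.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(ss_x * ss_y)

# Invented illustrative numbers, not real medical data.
pack_years = [0, 5, 10, 20, 30, 40]
risk_score = [1, 2, 2, 5, 6, 9]

print(round(pearson_r(pack_years, risk_score), 3))
print(pearson_r([1, 2, 3], [2, 4, 6]))  # perfectly correlated
print(pearson_r([1, 2, 3], [6, 4, 2]))  # perfectly inverse
```

Even a coefficient close to 1.0 for the invented data would show only that the two lists rise together, not that one causes the other.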
Chi square and correlation coefficient are only two of dozens of statistical tests now available for use by researchers. The specific kinds of data collected and the kinds of information a researcher wants to obtain from these data determine the specific test to be used.
KEY TERMS
Continuous variable —A variable that may take any value whatsoever.
Deviation —The difference between any one measurement and the mean of the set of scores.
Discrete variable —A number that can have only certain specific numerical values that can be clearly separated from each other.
Frequency polygon —A type of frequency distribution graph that is made by joining the midpoints of the tops of the bars in a histogram to each other.
Histogram —A bar graph that shows the frequency distribution of a variable by means of solid bars without any space between them.
Mean —A measure of central tendency found by adding all the numbers in a set and dividing by the quantity of numbers.
Measure of central tendency —Average.
Measure of variability —A general term for any method of measuring the spread of measurements around some measure of central tendency.
Median —The middle value in a set of measurements when those measurements are arranged in sequence from least to greatest.
Mode —The value that occurs most frequently in any set of measurements.
Normal curve —A frequency distribution curve with a symmetrical, bell-shaped appearance.
Null hypothesis —A statistical statement that nothing unusual is taking place in an experiment.
Population —A complete set of individuals, objects, or events that belong to some category.
Range —The difference between the largest and smallest observations in a set of measurements.
Standard deviation —The square root of the variance.
See also Accuracy.
Resources
BOOKS
Hastie, T., et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer Verlag, 2001.
Montgomery, Douglas C. Applied Statistics and Probability for Engineers. Hoboken, NJ: Wiley, 2007.
Newman, Isadore, et al. Conceptual Statistics for Beginners. Lanham, MD: University Press of America, 2006.
Rice, John A. Mathematical Statistics and Data Analysis. Belmont, CA: Thomson/Brooks/Cole, 2007.
Walpole, Ronald, and Raymond Myers, et al. Probability and Statistics for Engineers and Scientists. Englewood Cliffs, NJ: Prentice Hall, 2002.
David E. Newton
Statistics
Statistics is a field of knowledge that enables an investigator to derive and evaluate conclusions about a population from sample data. In other words, statistics allow us to make generalizations about a large group based on what we find in a smaller group. The field of statistics deals with gathering, selecting, and classifying data; interpreting and analyzing data; and deriving and evaluating the validity and reliability of conclusions based on data.
Strictly speaking, the term “parameter” describes a certain aspect of a population, while a “statistic” describes a certain aspect of a sample (a representative part of the population). In common usage, most people use the word “statistic” to refer to research figures and calculations, either from information based on a sample or an entire population.
Statistics means different things to different people. To a baseball fan, statistics are information about a pitcher's earned run average or a batter's slugging percentage or home run count. To a plant manager at a distribution company, statistics are daily reports on inventory levels, absenteeism, labor efficiency, and production. To a medical researcher investigating the effects of a new drug, statistics are evidence of the success of research efforts. And to a college student, statistics are the grades made on all the exams and quizzes in a course during the semester. Today, statistics and statistical analysis are used in practically every profession, and for managers in particular, statistics have become a most valuable tool.
A set of data is a population if it includes every member of the group under study, so that conclusions about that group do not have to be inferred from only a part of it. If population data are available, the risk of arriving at incorrect decisions because of sampling error is eliminated.
But a sample is only part of the whole population. For example, statistics from the U.S. Department of Commerce state that the rental vacancy rate during the second quarter of 2006 was 9.6 percent. However, the data used to calculate this vacancy rate was not derived from all owners of rental property, but rather only a segment (“sample” in statistical terms) of the total group (or “population”) of rental property owners. A population statistic is thus a set of measured or described observations made on each elementary unit. A sample statistic, in contrast, is a measure based on a representative group taken from a population.
QUANTITATIVE AND QUALITATIVE STATISTICS
Measurable observations are called quantitative observations. Examples of measurable observations include the annual salary drawn by a BlueCross/BlueShield underwriter or the age of a graduate student in an MBA program. Both are measurable and are therefore quantitative observations.
Observations that cannot be measured are termed qualitative. Qualitative observations can only be described. Anthropologists, for instance, often use qualitative statistics to describe how one culture varies from another. Marketing researchers have increasingly used qualitative statistical techniques to describe phenomena that are not easily measured, but can instead be described and classified into meaningful categories. Here, the distinction between a population of variates (a set of measured observations) and a population of attributes (a set of described observations) is important.
Values assumed by quantitative observations are called variates. These quantitative observations are further classified as either discrete or continuous. A discrete quantitative observation can assume only a limited number of values on a measuring scale. For example, the number of graduate students in an MBA investment class is considered discrete.
Some quantitative observations, on the other hand, can assume an infinite number of values on a measuring scale. These quantitative measures are termed continuous. How consumers feel about a particular brand is a continuous quantitative measure; the exact increments in feelings are not directly assignable to a given number. Consumers may feel more or less strongly about the taste of a hamburger, but it would be difficult to say that one consumer likes a certain hamburger twice as much as another consumer.
DESCRIPTIVE AND INFERENTIAL STATISTICS
Managers can apply some statistical technique to virtually every branch of public and private enterprise. These techniques are commonly separated into two broad categories: descriptive statistics and inferential statistics. Descriptive statistics are typically simple summary figures calculated from a set of observations. Poll results and economic data are commonly seen descriptive statistics. For example, when the American Automobile Association (AAA) reported in May 2008 that average gas prices had topped $4 per gallon in the United States, this was a statistic based on observations of gas prices throughout the United States.
Inferential statistics are used to apply conclusions about one set of observations to reach a broader conclusion or an inference about something that has not been directly observed. For example, inferential statistics could be used to show how strongly correlated gas prices and food prices are.
FREQUENCY DISTRIBUTION
Data is a collection of any number of related observations. A collection of data is called a data set. Statistical data may consist of a very large number of observations. The larger the number of observations, the greater the need to present the data in a summarized form that may omit some details, but reveals the general nature of a mass of data.
Frequency distribution allows for the compression of data into a table. The table organizes the data into classes or groups of values describing characteristics of the data. For example, students' grade distribution is one characteristic of a graduate class.
A frequency distribution shows the number of observations from the data set that fall into each category describing this characteristic. The relevant categories are defined by the user based on what he or she is trying to accomplish; in the case of grades, the categories might be each letter grade (A, B, C, etc.), pass/fail/incomplete, or grade percentage ranges. If you can determine the frequency with which values occur in each category, you can construct a frequency distribution. A relative frequency distribution presents frequencies in terms of fractions or percentages. The sum of all relative frequencies equals 1.00 or 100 percent.
Table 1 Frequency Distribution for a Class of 25 M.B.A. Students
Grade Scale | Student/Grade Frequency | Relative Frequency |
A | 5 | 20% |
B | 12 | 48% |
C | 4 | 16% |
D | 2 | 8% |
F | 1 | 4% |
I (Incomplete) | 1 | 4% |
TOTAL | 25 | 100% |
Table 1 illustrates both a frequency distribution and a relative frequency distribution. The frequency distribution gives a breakdown of the number of students in each grade category ranging from A to F, including “I” for incomplete. The relative frequency distribution takes that number and turns it into a percentage of the whole number.
The table shows us that five out of twenty-five students, or 20 percent, received an A in the class. The two columns are simply two different ways of presenting the same data. This illustrates one of the advantages of statistics: the same data can be analyzed in several different ways.
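The construction of Table 1 can be sketched in a few lines of Python. This is only an illustration: the raw list of grades is hypothetical, rebuilt here so that its counts match the table, and the variable names are assumptions rather than anything from the source.

```python
from collections import Counter

# Hypothetical raw grade list, reconstructed so its counts match Table 1.
grades = ["A"] * 5 + ["B"] * 12 + ["C"] * 4 + ["D"] * 2 + ["F"] * 1 + ["I"] * 1

# Frequency distribution: number of observations in each category.
frequency = Counter(grades)

# Relative frequency distribution: each count as a fraction of the total.
total = sum(frequency.values())
relative_frequency = {grade: count / total for grade, count in frequency.items()}

for grade in ["A", "B", "C", "D", "F", "I"]:
    print(f"{grade}: {frequency[grade]:2d}  {relative_frequency[grade]:.0%}")
print("TOTAL:", total)
```

Because every observation falls into exactly one category, the relative frequencies necessarily add up to 1.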
PARAMETERS
Decisions and conclusions can often be made with absolute certainty if a single value that describes a certain aspect of a population is determined. As noted earlier, a parameter describes an entire population, whereas a statistic describes only a sample. The following are a few of the most common types of parameter measurements used.
Aggregate Parameter. An aggregate parameter can be computed only for a population of variates. The aggregate is the sum of the values of all the variates in the population. Industry-wide sales is an example of an aggregate parameter.
Proportion. A proportion refers to a fraction of the population that possesses a certain property. The proportion is the parameter used most often in describing a population of attributes, for example, the percentage of employees over age fifty.
Arithmetic Mean. The arithmetic mean is simply the average. It is obtained by dividing the sum of all variates in the population by the total number of variates. The arithmetic mean is used more often than the median and mode to describe the average variate in the population. It best describes values such as the average grade of a graduate student, the average yards gained per carry by a running back, and the average calories burned during a cardiovascular workout. It also has an interesting property: the sum of the deviations of the individual variates from their arithmetic mean is always equal to zero.
Median. The median is another way of determining the “average” variate in the population. It is especially useful when the population has a particularly skewed frequency distribution; in these cases the arithmetic mean can be misleading.
To compute the median for a population of variates, the variates must be arranged first in an increasing or decreasing order. The median is the middle variate if the number of the variates is odd. For example, if you have the distribution 1, 3, 4, 8, and 9, then the median is 4 (while the mean would be 5). If the number of variates is even, the median is the arithmetic mean of the two middle variates. In some cases (under a normal distribution) the mean and median are equal or nearly equal. However, in a skewed distribution where a few large values fall into the high end or the low end of the scale, the median describes the typical or average variate more accurately than the arithmetic mean does.
Consider a population of four people who have annual incomes of $2,000, $2,500, $3,500, and $300,000—an extremely skewed distribution. If we looked only at the arithmetic mean ($77,000), we would conclude that it is a fairly wealthy population on average. By contrast, in observing the median income ($3,000) we would conclude that it is overall a quite poor population, and one with great income disparity. In this example the median provides a much more accurate view of what is “average” in this population because the single income of $300,000 does not accurately reflect the majority of the sample.
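The four-income example can be checked with Python's standard `statistics` module. The sketch below is illustrative only; the variable names are assumptions. It also verifies the property noted earlier, that deviations from the arithmetic mean sum to zero.

```python
from statistics import mean, median

# The four annual incomes from the example above.
incomes = [2_000, 2_500, 3_500, 300_000]

average = mean(incomes)   # pulled upward by the single large income
middle = median(incomes)  # mean of the two middle values, 2,500 and 3,500

# Deviations of each variate from the arithmetic mean.
deviations = [income - average for income in incomes]

print("mean:", average)
print("median:", middle)
print("sum of deviations:", sum(deviations))
```

The mean lands far above three of the four incomes, while the median stays close to them, which is why the median is preferred for strongly skewed data.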
Mode. The mode is the most frequently appearing variate or attribute in a population. For example, say a class of thirty students is surveyed about their ages. The resulting frequency distribution shows us that ten students are 18 years old, sixteen students are 19 years old, and four are 20 or older. The mode for this group is age 19, because the category with the most students (sixteen) is age 19.
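When the data are already summarized as a frequency distribution, the mode is simply the category with the largest count. A minimal Python sketch, with illustrative names and with "20 or older" kept as a single category exactly as in the example:

```python
# Frequency distribution from the age survey of thirty students.
age_counts = {"18": 10, "19": 16, "20 or older": 4}

# The mode is the category whose frequency is largest.
mode_category = max(age_counts, key=age_counts.get)

print("students surveyed:", sum(age_counts.values()))
print("mode:", mode_category, "with", age_counts[mode_category], "students")
```

Note that the mode is the category (age 19), not the count of students in it.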
MEASURE OF VARIATION
Another pair of parameters, the range and the standard deviation, measures the disparity among values of the various variates comprising the population. These parameters, called measures of variation, are designed to indicate the degree of uniformity among the variates.
The range is simply the difference between the highest and lowest variate. So, in a population with incomes ranging from $15,000 to $45,000, the range is $30,000 ($45,000 - $15,000 = $30,000).
The standard deviation is an important measure of variation because it lends itself to further statistical analysis and treatment. It measures the typical amount by which variates are spread around the mean. The standard deviation is based on another calculation called the variance. The variance for a population reflects how far data points are from the mean, but it is expressed in squared units, so it is typically used to calculate other statistics, such as the standard deviation, rather than interpreted directly. The standard deviation is more useful in making sense of the data.
The standard deviation is a simple but powerful adaptation of the variance. It is found simply by taking the square root of the variance. The resulting figure can be used for a variety of analyses. For example, under a normal distribution, a distance of two standard deviations from the mean encompasses approximately 95 percent of the population, and three standard deviations cover 99.7 percent.
Thus, assuming a normal distribution, if a factory produces bolts with a mean length of 7 centimeters (2.8 inches) and the standard deviation is determined to be 0.5 centimeters (0.2 inches), we would know that 95 percent of the bolts fall between 6 centimeters (2.4 inches) and 8 centimeters (3.1 inches) long, and that 99.7 percent of the bolts are between 5.5 centimeters (2.2 inches) and 8.5 centimeters (3.3 inches). This information could be compared to the product specification tolerances to determine what proportion of the output meets quality control standards.
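The bolt example can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch under the same assumption as the text, namely that bolt lengths are normally distributed; the specification limits of 6.2 and 7.8 centimeters are hypothetical, chosen only to show how a tolerance check might look.

```python
from statistics import NormalDist

# Bolt lengths in centimeters, assumed normal as in the example above.
bolt_length = NormalDist(mu=7.0, sigma=0.5)

def interval(k):
    """Return the range lying within k standard deviations of the mean."""
    return (bolt_length.mean - k * bolt_length.stdev,
            bolt_length.mean + k * bolt_length.stdev)

def coverage(k):
    """Return the share of bolts expected inside interval(k)."""
    low, high = interval(k)
    return bolt_length.cdf(high) - bolt_length.cdf(low)

for k in (2, 3):
    print(k, "standard deviations:", interval(k), f"{coverage(k):.2%}")

# Hypothetical specification tolerance: 6.2 cm to 7.8 cm.
within_spec = bolt_length.cdf(7.8) - bolt_length.cdf(6.2)
print(f"share within hypothetical tolerance: {within_spec:.2%}")
```

The two-standard-deviation figure comes out slightly above 95 percent, which is why the text describes it as approximate.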
PROBABILITY
Modern statistics may be regarded as an application of the theory of probability. A set is a collection of well-defined objects called elements of the set. The set may contain a limited or infinite number of elements. The set that consists of all elements in a population is referred to as the universal set.
Statistical experiments are those that contain two significant characteristics. One is that each experiment has several possible outcomes that can be specified in advance. The second is that we are uncertain about the outcome of each experiment. Examples of statistical experiments include rolling a die and tossing a coin. The set that consists of all possible outcomes of an experiment is called a sample space, and each element of the sample space is called a sample point.
Each sample point or outcome of an experiment is assigned a weight that measures the likelihood of its occurrence. This weight is called the probability of the sample point.
Probability is the chance that something will happen. In assigning weights or probabilities to the various sample points, two rules generally apply. The first is that the probability assigned to any sample point ranges from 0 to 1. Assigning a probability of 0 means that something can never happen; a probability of 1 indicates that something will always happen. The second rule is that the sum of probabilities assigned to all sample points in the sample space must be equal to 1 (e.g., in a coin flip, the probabilities are 0.5 for heads and 0.5 for tails).
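The two rules can be expressed as a small check on any proposed assignment of probabilities to sample points. The sketch below is illustrative; the function name and the example sample spaces are assumptions, and exact fractions are used so that the sum can be compared with 1 without rounding error.

```python
from fractions import Fraction

def is_valid_assignment(probabilities):
    """Check the two rules: each weight is in [0, 1] and all weights sum to 1."""
    values = list(probabilities.values())
    each_in_range = all(0 <= p <= 1 for p in values)
    return each_in_range and sum(values) == 1

# Sample spaces for a coin toss and for rolling a fair die.
coin = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
die = {face: Fraction(1, 6) for face in range(1, 7)}

# An invalid assignment: each weight is between 0 and 1, but the sum is 6/5.
bad = {"heads": Fraction(3, 5), "tails": Fraction(3, 5)}

print(is_valid_assignment(coin), is_valid_assignment(die), is_valid_assignment(bad))
```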
In probability theory, an event is one or more of the possible outcomes of doing something. If we toss a coin and it comes up heads, that outcome is an event. The activity that produces such an event is referred to in probability theory as an experiment. Events are said to be mutually exclusive if one, and only one, can take place at a time. When a list of the possible events that can result from an experiment includes every possible outcome, the list is said to be collectively exhaustive. The coin toss experiment is a good example of a collectively exhaustive list: the end result is either a head or a tail.
There are a few theoretical approaches to probability. Two common ones are the classical approach and the relative frequency approach. Classical probability defines the probability that an event will occur as the number of outcomes favorable to the occurrence of the event divided by the total number of possible outcomes, where every outcome is assumed to be equally likely. This approach is often not practical to apply in managerial situations because that assumption is unrealistic for many real-life applications. It also assumes away situations that are very unlikely, but that could conceivably happen. Note that the classical approach does not say that a coin flipped ten times will always show exactly five heads and five tails. It says only that each flip has a 0.5 chance of coming up heads; in any actual run of ten flips, the observed split will often differ from five and five.
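The classical definition, favorable outcomes divided by total outcomes, can be applied directly to ten coin flips by listing every possible sequence. The following Python sketch assumes a fair coin, so that all sequences are equally likely; the variable names are illustrative.

```python
from itertools import product

# Every possible sequence of ten flips, e.g. ('H', 'T', 'H', ...).
outcomes = list(product("HT", repeat=10))

# Outcomes favorable to the event "exactly five heads".
favorable = [sequence for sequence in outcomes if sequence.count("H") == 5]

# Classical probability: favorable outcomes divided by total outcomes.
probability = len(favorable) / len(outcomes)

print(len(favorable), "of", len(outcomes), "sequences have exactly five heads")
print(f"probability: {probability:.4f}")
```

Under these assumptions an exact five-and-five split is the single most likely result, yet it occurs in only about one run of ten flips out of four.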
The relative frequency approach is used in the insurance industry. The approach, often called the relative frequency of occurrence, defines probability as the observed relative frequency of an event in a very large number of trials, or the proportion of times that an event occurs in the long run when conditions are stable. It uses past occurrences to help predict future probabilities that the occurrences will happen again.
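The idea of a relative frequency settling down over a very large number of trials can be illustrated by simulation. The sketch below uses Python's pseudorandom generator as a stand-in for a fair coin; the function name and the fixed seed are assumptions made so the run is repeatable.

```python
import random

def relative_frequency_of_heads(trials, seed=0):
    """Simulate `trials` fair coin flips and return the share that were heads."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

# The observed relative frequency tends toward 0.5 as the trials grow.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

With only ten flips the observed share can be far from 0.5; with a hundred thousand it is typically within a fraction of a percentage point.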
Actuaries use high-level mathematical and statistical calculations in order to help determine the risk that some people and some groups might pose to the insurance carrier. They perform these operations in order to get a better idea of how and when situations that would cause customers to file claims and cost the company money might occur. The value of this is that it gives the insurance company an estimate of how much to charge for insurance premiums. For example, customers who smoke cigarettes are in a higher risk group than those who do not smoke. The insurance company charges higher premiums to smokers to make up for the added risk.
SAMPLING
The objective of sampling is to select that part which is representative of the entire population. Sample designs are classified into probability samples and nonprobability samples. A sample is a probability sample if each unit in the population is given some chance of being selected. The probability of selecting each unit must be known. With a probability sample, the risk of incorrect decisions and conclusions can be measured using the theory of probability.
A sample is a nonprobability sample when some units in the population are not given any chance of being selected, and when the probability of selecting any unit into the sample cannot be determined or is not known. For this reason, there is no means of measuring the risk of making erroneous conclusions derived from nonprobability samples. Since the reliability of the results of nonprobability samples cannot be measured, such samples do not lend themselves to statistical treatment and analysis. Convenience and judgment samples are the most common types of nonprobability samples.
Among its many other applications, sampling is used in some manufacturing and distributing settings as a means of quality control. For example, a sample of 5 percent may be inspected for quality from a predetermined number of units of a product. That sample, if drawn properly, should indicate the total percentage of quality problems for the entire population, within a known margin of error (e.g., an inspector may be able to say with 95 percent certainty that the product defect rate is 4 percent, plus or minus 1 percent).
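A statement such as "4 percent, plus or minus 1 percent, with 95 percent certainty" can be related to sample size through the usual normal approximation for a proportion. The Python sketch below is one common way to do this, not necessarily the method an inspector would use; it assumes a simple random sample from a large batch and ignores any finite-population correction. The function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Normal-approximation margin of error for an observed proportion p."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 percent
    return z * sqrt(p * (1 - p) / n)

def sample_size_for(p, margin, confidence=0.95):
    """Smallest sample size whose margin of error does not exceed `margin`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Units to inspect for a 4 percent defect rate, plus or minus 1 percent.
needed = sample_size_for(0.04, 0.01)
print("units to inspect:", needed)
print("resulting margin:", round(margin_of_error(0.04, needed), 4))
```

Under this approximation the required number of inspected units depends on the defect rate and the desired margin, not on what percentage of the batch the sample represents.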
In many companies, if the defect rate is too high, then the processes and machinery are checked for errors. When the errors are found to be human errors, then a statistical standard is usually set for the acceptable error percentage for laborers.
In sum, samples provide estimates of what we would discover if we knew everything about an entire population. By taking only a representative sample of the population and using appropriate statistical techniques, we can infer certain things, not with absolute precision, but certainly within specified levels of precision.
SEE ALSO Data Processing and Data Management; Forecasting; Models and Modeling; Planning; Statistical Process Control and Six Sigma
Statistics
Statistics is that branch of mathematics devoted to the collection, compilation, display, and interpretation of numerical data. In general, the field can be divided into two major subgroups, descriptive statistics and inferential statistics. The former subject deals primarily with the accumulation and presentation of numerical data, while the latter focuses on predictions that can be made based on those data.
Some fundamental concepts
Two fundamental concepts used in statistical analysis are population and sample. The term population refers to a complete set of individuals, objects, or events that belong to some category. For example, all of the players who are employed by Major League Baseball teams make up the population of professional major league baseball players. The term sample refers to some subset of a population that is representative of the total population. For example, one might go down the complete list of all major league baseball players and select every tenth name. That subset of every tenth name would then make up a sample of all professional major league baseball players.
Number of Female African-Americans in Various Age Groups
Age | Number |
0 - 19 | 5,382,025 |
20 - 29 | 2,982,305 |
30 - 39 | 2,587,550 |
40 - 49 | 1,567,735 |
50 - 59 | 1,335,235 |
60 + | 1,606,335 |
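Selecting every tenth name from a complete list, as in the baseball example, is easy to express in Python. The roster below is entirely hypothetical (placeholder names and an arbitrary length of 750), and the helper name is an assumption.

```python
# Hypothetical roster standing in for the complete list of players.
roster = [f"Player {number:03d}" for number in range(1, 751)]

def every_kth(items, k, start=0):
    """Return every k-th item of the list, beginning at index `start`."""
    return items[start::k]

# Take the 10th, 20th, 30th, ... names on the list.
sample = every_kth(roster, 10, start=9)

print(len(sample), "of", len(roster), "players selected")
print(sample[:3])
```

Whether such a sample is representative depends on how the list is ordered; the sketch shows only the mechanics of the selection.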
Another concept of importance in statistics is the distinction between discrete and continuous data. Discrete variables are numbers that can have only certain specific numerical values that can be clearly separated from each other. For example, the number of professional major league baseball players is a discrete variable. There may be 400 or 410 or 475 or 615 professional baseball players, but never 400.5, 410.75, or 615.895.
Continuous variables may take any value whatsoever. The readings on a thermometer are an example of a continuous variable. The temperature can range from 10°C to 10.1°C to 10.2°C to 10.3°C (about 50°F) and so on upward or downward. Also, if a sufficiently accurate thermometer is available, even finer divisions, such as 10.11°C, 10.12°C, and 10.13°C, can be made. Methods for dealing with discrete and continuous variables are somewhat different from each other in statistics.
In some cases, it is useful to treat continuous variables as discrete variables, and vice versa. For example, it might be helpful in some kind of statistical analysis to assume that temperatures can assume only discrete values, such as 5°C, 10°C, 15°C (41°F, 50°F, 59°F) and so on. It is important in making use of that statistical analysis, then, to recognize that this kind of assumption has been made.
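Treating a continuous temperature as a discrete one amounts to rounding each reading to the nearest allowed value. A minimal Python sketch follows; the readings are made-up illustrative values and the function name is an assumption.

```python
def to_nearest_step(value, step=5):
    """Round a continuous reading to the nearest multiple of `step`.

    Python's round() sends exact halves to the nearest even integer,
    so a reading exactly midway between two steps may round down.
    """
    return step * round(value / step)

# Made-up thermometer readings in degrees Celsius.
readings_c = [7.6, 10.11, 12.4, 13.1]
binned = [to_nearest_step(reading) for reading in readings_c]

print(binned)
```

Three quite different readings collapse into the same 10°C value, which is exactly the loss of detail the text warns should be kept in mind.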
Collecting data
The first step in doing a statistical study is usually to obtain raw data. As an example, suppose that a researcher wants to know the number of female African-Americans in each of six age groups (0-19; 20-29; 30-39; 40-49; 50-59; and 60+) in the United States. One way to answer that question would be to do a population survey, that is, to interview every single female African-American in the United States and ask what her age is. Quite obviously, such a study would be very difficult and very expensive to complete. In fact, it would probably be impossible to do.
A more reasonable approach is to select a sample of female African-Americans that is smaller than the total population and to interview this sample. Then, if the sample is drawn so as to be truly representative of the total population, the researcher can draw some conclusions about the total population based on the findings obtained from the smaller sample.
Descriptive statistics
Perhaps the simplest way to report the results of the study described above is to make a table. The advantage of constructing a table of data is that a reader can get a general idea about the findings of the study in a brief glance.
Graphical representation
The table shown above is one way of representing the frequency distribution of a sample or population. A frequency distribution is any method for summarizing data that shows the number of individuals or individual cases present in each given interval of measurement. In the table above, there are 5,382,025 female African-Americans in the age group 0-19; 2,982,305 in the age group 20-29; 2,587,550 in the age group 30-39; and so on.
A common method for expressing frequency distributions in an easy-to-read form is a graph. Among the kinds of graphs used for the display of data are histograms, bar graphs, and line graphs. A histogram is a graph that consists of solid bars without any space between them. The width of the bars corresponds to one of the variables being presented, and the height of the bars to a second variable. If we constructed a histogram based on the table shown above, the graph would have six bars, one for each of the six age groups included in the study. The height of the six bars would correspond to the frequency found for each group. The first bar (ages 0-19) would be nearly twice as high as the second (20-29) and third (30-39) bars since there are nearly twice as many individuals in the first group as in the second or third. The fourth, fifth, and sixth bars would be nearly the same height since there are about the same numbers of individuals in each of these three groups.
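The relative bar heights described above can be previewed with a rough text chart. The Python sketch below uses the figures from the age-group table; the 40-character scale and the variable names are arbitrary choices, and the bars are drawn sideways rather than upright.

```python
# Frequencies from the age-group table.
age_groups = {
    "0 - 19": 5_382_025,
    "20 - 29": 2_982_305,
    "30 - 39": 2_587_550,
    "40 - 49": 1_567_735,
    "50 - 59": 1_335_235,
    "60 +": 1_606_335,
}

# Scale every bar against the largest group (40 characters wide).
largest = max(age_groups.values())
for group, count in age_groups.items():
    bar = "#" * round(40 * count / largest)
    print(f"{group:>7} | {bar} {count:,}")

# How many times taller the first bar is than the second.
ratio_first_to_second = age_groups["0 - 19"] / age_groups["20 - 29"]
print(f"first bar is {ratio_first_to_second:.2f} times the second")
```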
Another kind of graph that can be constructed from a histogram is a frequency polygon. A frequency polygon can be made by joining the midpoints of the top lines of each bar in a histogram to each other.
Distribution curves
Finally, think of a histogram in which the vertical bars are very narrow, and then very, very narrow. As one connects the midpoints of these bars, the frequency polygon begins to look like a smooth curve, perhaps like a high, smoothly shaped hill. A curve of this kind is known as a distribution curve.
Probably the most familiar kind of distribution curve is one with a peak in the middle of the graph that falls off equally on both sides of the peak. This kind of distribution curve is known as a "normal" curve. Normal curves result from a number of random events that occur in the world. For example, suppose you were to flip a penny a thousand times and count how many times heads came up, and then repeat that experiment again and again. If you plotted how often each count of heads occurred, you would find something very close to a normal distribution curve, with a peak at equal heads and tails. That means that, if you were to flip a penny many times, you would most commonly expect roughly equal numbers of heads and tails. Some other distribution of heads and tails, such as 10% heads and 90% tails, would occur much less often.
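The shape described above can be computed exactly rather than by flipping pennies. The following Python sketch assumes a fair coin and uses the binomial count of sequences with a given number of heads; the function name is an assumption.

```python
from math import comb

def probability_of_heads(k, n=1000):
    """Exact chance of getting k heads in n flips of a fair coin."""
    return comb(n, k) / 2 ** n

# The count of heads with the highest probability (the peak of the curve).
most_likely = max(range(1001), key=probability_of_heads)

print("most likely number of heads:", most_likely)
print("chance of exactly 500 heads:", probability_of_heads(500))
print("chance of exactly 100 heads:", probability_of_heads(100))
```

The probabilities rise to a single peak at 500 heads and fall off symmetrically on either side, while a 10 percent share of heads is so unlikely that it would effectively never be observed.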
Frequency distributions that are not symmetrical about their peak are said to be skewed. In a skewed distribution curve, the number of cases on one side of the maximum is much smaller than the number of cases on the other side of the maximum. The graph might start out at zero and rise very sharply to its maximum point and then drop down again on a very gradual slope to zero on the other side. Depending on where the gradual slope is, the graph is said to be skewed to the left or to the right.
Other kinds of frequency distributions
Bar graphs look very much like histograms except that gaps are left between adjacent bars. This difference is based on the fact that bar graphs are usually used to represent discrete data and the space between bars is a reminder of the discrete character of the data represented.
Line graphs can also be used to represent continuous data. If one were to record the temperature once an hour all day long, a line graph could be constructed with the hours of day along the horizontal axis of the graph and the various temperatures along the vertical axis. The temperature found for each hour could then be plotted on the graph as a point and the points then connected with each other. The assumption of such a graph is that the temperature varied continuously between the observed readings and that those temperatures would fall along the continuous line drawn on the graph.
A circle graph, or "pie chart," can also be used to graph data. A circle graph shows how the total number of individuals, cases or events is divided up into various categories. For example, a circle graph showing the population of female African-Americans in the United States would be divided into pie-shaped segments, one (0-19) twice as large as the next two (20-29 and 30-39), and three about equal in size and smaller than the other three.
Measures of central tendency
Both statisticians and non-statisticians talk about "averages" all the time. But the term average can have a number of different meanings. In the field of statistics, therefore, workers prefer to use the term "measure of central tendency" for the concept of an "average." One way to understand how various measures of central tendency (different kinds of "average") differ from each other is to consider a classroom consisting of only six students. A study of the six students shows that their family incomes are as follows: $20,000; $25,000; $20,000; $30,000; $27,500; $150,000. What is the "average" income for the students in this classroom?
The measure of central tendency that most students learn in school is the mean. The mean for any set of numbers is found by adding all the numbers and dividing by the quantity of numbers. In this example, the mean would be equal to ($20,000 + $25,000 + $20,000 + $30,000 + $27,500 + $150,000) ÷ 6 = $45,417. But how much useful information does this answer give about the six students in the classroom? The mean that has been calculated ($45,417) is greater than the household income of five of the six students.
Another way of calculating central tendency is known as the median. The median value of a set of measurements is the middle value when the measurements are arranged in order from least to greatest. When there are an even number of measurements, the median is half way between the middle two measurements. In the above example, the measurements can be rearranged from least to greatest: $20,000; $20,000; $25,000; $27,500; $30,000; $150,000. In this case, the middle two measurements are $25,000 and $27,500, and half way between them is $26,250, the median in this case. You can see that the median in this example gives a better view of the household incomes for the classroom than does the mean.
Group | Improved | Not Improved | Total |
Experimental Group | 62 | 38 | 100 |
Control Group | 45 | 55 | 100 |
Total | 107 | 93 | 200 |
A third measure of central tendency is the mode. The mode is the value most frequently observed in a study. In the household income study, the mode is $20,000 since it is the value found most often in the study. Each measure of central tendency has certain advantages and disadvantages and is used, therefore, under certain special circumstances.
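All three measures of central tendency for the six family incomes can be computed with Python's standard `statistics` module. This is a brief illustrative sketch; the variable names are assumptions.

```python
from statistics import mean, median, mode

# Family incomes of the six students in the example classroom.
incomes = [20_000, 25_000, 20_000, 30_000, 27_500, 150_000]

mean_income = mean(incomes)      # sum of the incomes divided by six
median_income = median(incomes)  # half way between the two middle values
mode_income = mode(incomes)      # the value that occurs most often

print("mean:", round(mean_income))
print("median:", median_income)
print("mode:", mode_income)
```

Five of the six incomes lie below the mean, whereas the median and mode both fall among the typical values.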
Measures of variability
Suppose that a teacher gave the same test to two different classes and obtained the following results:
Class 1: 80%, 80%, 80%, 80%, 80%
Class 2: 60%, 70%, 80%, 90%, 100%
If you calculate the mean for both sets of scores, you get the same answer: 80%. But the collection of scores from which this mean was obtained was very different in the two cases. The way that statisticians have of distinguishing cases such as this is known as measuring the variability of the sample. As with measures of central tendency, there are a number of ways of measuring the variability of a sample.
Probably the simplest method is to find the range of the sample, that is, the difference between the largest and smallest observation. The range of measurements in Class 1 is 0, and the range in class 2 is 40%. Simply knowing that fact gives a much better understanding of the data obtained from the two classes. In class 1, the mean was 80%, and the range was 0, but in class 2, the mean was 80%, and the range was 40%.
Other measures of variability are based on the difference between any one measurement and the mean of the set of scores. This measure is known as the deviation. As you can imagine, the greater the difference among measurements, the greater the variability. In the case of Class 2 above, the deviation for the first measurement is 20% (80%-60%), and the deviation for the second measurement is 10% (80%-70%).
Probably the most common measures of variability used by statisticians are the variance and standard deviation. Variance is defined as the mean of the squared deviations of a set of measurements. Calculating the variance is a somewhat complicated task. One has to find each of the deviations in the set of measurements, square each one, add all the squares, and divide by the number of measurements. In the example above, the variance would be equal to [(20)² + (10)² + (0)² + (10)² + (20)²] ÷ 5 = 200.
For a number of reasons, the variance is used less often in statistics than is the standard deviation. The standard deviation is the square root of the variance, in this case, √200 ≈ 14.1. The standard deviation is useful because in any normal distribution, a large fraction of the measurements (about 68%) are located within one standard deviation of the mean. Another 27% (for a total of 95% of all measurements) lie within two standard deviations of the mean.
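The range, variance, and standard deviation for the two classes can be reproduced as follows. The sketch uses `pvariance` and `pstdev` from Python's `statistics` module because the calculation in the text divides by the number of measurements; the module's `variance` and `stdev` functions divide by one less than that number and would give different figures. The helper name is an assumption.

```python
from statistics import mean, pstdev, pvariance

class_1 = [80, 80, 80, 80, 80]
class_2 = [60, 70, 80, 90, 100]

def summarize(scores):
    """Return the mean and three measures of variability for a set of scores."""
    return {
        "mean": mean(scores),
        "range": max(scores) - min(scores),
        "variance": pvariance(scores),  # mean of the squared deviations
        "std_dev": pstdev(scores),      # square root of the variance
    }

print("Class 1:", summarize(class_1))
print("Class 2:", summarize(class_2))
```

Both classes share the same mean, but every measure of variability is zero for Class 1 and clearly positive for Class 2.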
Inferential statistics
Expressing a collection of data in some useful form, as described above, is often only the first step in a statistician's work. The next step will be to decide what conclusions, predictions, and other statements, if any, can be made based on those data. A number of sophisticated mathematical techniques have now been developed to make these judgments.
An important fundamental concept used in inferential statistics is that of the null hypothesis. A null hypothesis is a statement made by a researcher at the beginning of an experiment that says, essentially, that nothing is happening in the experiment. That is, nothing other than natural events is going on during the experiment. At the conclusion of the experiment, the researcher submits his or her data to some kind of statistical analysis to see whether the data are consistent with the null hypothesis, that is, whether nothing other than normal statistical variability has taken place in the experiment. If the data are consistent with the null hypothesis, then the experiment has provided no evidence of an effect on the subjects. If the data are very unlikely under the null hypothesis, the null hypothesis is rejected, and the researcher is justified in putting forth some alternative hypothesis that will explain the effects that were observed. The role of statistics in this process is to provide mathematical tests for deciding whether or not the null hypothesis should be rejected.
A simple example of this process is deciding on the effectiveness of a new medication. In testing such medications, researchers usually select two groups, one the control group and one the experimental group. The control group does not receive the new medication; it receives a neutral substance instead. The experimental group receives the medication. The null hypothesis in an experiment of this kind is that the medication will have no effect and that both groups will respond in exactly the same way, whether they have been given the medication or not.
Suppose that the results of one experiment of this kind were those given in the table above listing the experimental and control groups, with the numbers shown being the number of individuals who improved or did not improve after taking part in the experiment.
At first glance, it would appear that the new medication was at least partially successful, since more individuals improved in the experimental group (62) than in the control group (45). But a statistical test is available that will give a more precise answer, one that will express the probability (90%, 75%, 50%, etc.) that a difference at least this large would arise by chance alone if the null hypothesis were true. This test, called the chi square test, involves comparing the observed frequencies in the table above with a set of expected frequencies that can be calculated from the row and column totals of the table. The value of chi square calculated can then be compared to values in a table to see how likely the results were due to chance and how likely to some real effect of the medication.
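The chi square calculation for the medication table can be worked through step by step. The Python sketch below computes the statistic by hand, without a continuity correction, and the shortcut used for the probability applies only to a two-by-two table (one degree of freedom); the variable names are assumptions.

```python
from math import erfc, sqrt

# Observed counts: rows are experimental and control groups,
# columns are improved and not improved.
observed = [[62, 38], [45, 55]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Expected count in each cell if the medication made no difference.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi square: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)

# Chance of a value at least this large under the null hypothesis.
# This closed form is valid only for one degree of freedom.
p_value = erfc(sqrt(chi_square / 2))

print("expected counts:", expected)
print("chi square:", round(chi_square, 3))
print("p-value:", round(p_value, 4))
```

A small probability here means that a difference this large would rarely arise by chance alone, which counts as evidence against the null hypothesis.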
Another example of a statistical test is called the Pearson correlation coefficient. The Pearson correlation coefficient is a way of determining the extent to which two variables are somehow associated, or correlated, with each other. For example, many medical studies have attempted to determine the connection between smoking and lung cancer. One way to do such studies is to measure the amount of smoking a person has done in her or his lifetime and compare the rate of lung cancer among those individuals. A mathematical formula allows the researcher to calculate the Pearson correlation coefficient between these two sets of data: rate of smoking and risk for lung cancer. That coefficient can range between 1.0, meaning the two are perfectly correlated, and -1.0, meaning the two have an inverse relationship (when one is high, the other is low).
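The formula behind the Pearson correlation coefficient can be written out in a few lines. The Python sketch below uses small made-up number lists, not medical data, simply to show the two extreme values of the coefficient; the function name is an assumption.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from the two means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / sqrt(spread_x * spread_y)

# Made-up illustrative values.
x = [1, 2, 3, 4, 5]
rises_with_x = [2, 4, 6, 8, 10]
falls_with_x = [10, 8, 6, 4, 2]

print(pearson_r(x, rises_with_x))
print(pearson_r(x, falls_with_x))
```

Real data almost never fall exactly on a line, so observed coefficients usually lie somewhere between these two extremes.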
The correlation test is a good example of the limitations of statistical analysis. Suppose that the Pearson correlation coefficient in the example above turned out to be 1.0. That number would mean that people who smoke the most are always the most likely to develop lung cancer. But what the correlation coefficient does not say is what the cause and effect relationship, if any, might be. It does not say that smoking causes cancer.
Chi square and correlation coefficient are only two of dozens of statistical tests now available for use by researchers. The specific kinds of data collected and the kinds of information a researcher wants to obtain from these data determine the specific test to be used.
See also Accuracy.
David E. Newton
KEY TERMS
Continuous variable: A variable that may take any value within a given range.
Deviation: The difference between any one measurement and the mean of the set of scores.
Discrete variable: A variable that can take only certain specific numerical values that are clearly separated from each other.
Frequency polygon: A type of frequency distribution graph that is made by joining the midpoints of the top lines of each bar in a histogram to each other.
Histogram: A bar graph that shows the frequency distribution of a variable by means of solid bars without any space between them.
Mean: A measure of central tendency found by adding all the numbers in a set and dividing by the quantity of numbers.
Measure of central tendency: Average.
Measure of variability: A general term for any method of measuring the spread of measurements around some measure of central tendency.
Median: The middle value in a set of measurements when those measurements are arranged in sequence from least to greatest.
Mode: The value that occurs most frequently in any set of measurements.
Normal curve: A frequency distribution curve with a symmetrical, bell-shaped appearance.
Null hypothesis: A statistical statement that nothing unusual is taking place in an experiment.
Population: A complete set of individuals, objects, or events that belong to some category.
Range: The difference between the largest and smallest values in a set of measurements.
Standard deviation: The square root of the variance.
Statistics
Statistics is that branch of mathematics devoted to the collection, compilation, display, and interpretation of numerical data. The term statistics actually has two quite different meanings. In one case, it can refer to any set of numbers that has been collected and then arranged in some format that makes them easy to read and understand. In the second case, the term refers to a variety of mathematical procedures used to determine what those data may mean, if anything.
An example of the first kind of statistic is the data on female African Americans in various age groups, shown in Table 1. The table summarizes some interesting information but does not, in and of itself, seem to have any particular meaning. An example of the second kind of statistic is the data collected during the test of a new drug, shown in Table 2. This table not only summarizes information collected in the experiment, but also, presumably, can be used to determine the effectiveness of the drug.
Populations and samples
Two fundamental concepts used in statistical analysis are population and sample. The term population refers to a complete set of individuals, objects, or events that belong to some category. For example, all of the players who are employed by major league baseball teams make up the population of professional major league baseball players. The term sample refers to some subset of a population that is representative of the total population. For example, one might go down the complete list of all major league baseball players and select every tenth name on the list. That subset of every tenth name would then make up a sample of all professional major league baseball players.
Words to Know
Deviation: The difference between any one measurement and the mean of the set of scores.
Histogram: A bar graph that shows the frequency distribution of a variable by means of solid bars without any space between them.
Mean: A measure of central tendency found by adding all the numbers in a set and dividing by the number of numbers.
Measure of central tendency: Average.
Measure of variability: A general term for any method of measuring the spread of measurements around some measure of central tendency.
Median: The middle value in a set of measurements when those measurements are arranged in sequence from least to greatest.
Mode: The value that occurs most frequently in any set of measurements.
Normal curve: A frequency distribution curve with a symmetrical, bell-shaped appearance.
Population: A complete set of individuals, objects, or events that belong to some category.
Range: The difference between the largest and smallest numbers in a set of observations.
Sample: A subset of actual observations taken from any larger set of possible observations.
Samples are important in statistical studies because it is almost never possible to collect data from all members in a population. For example, suppose one would like to know how many professional baseball players are Republicans and how many are Democrats. One way to answer that question would be to ask that question of every professional baseball player. However, it might be difficult to get in touch with every player and to get every player to respond. The larger the population, the more difficult it is to get data from every member of the population.
Most statistical studies, therefore, select a sample of individuals from a population to interview. One could use, for example, the every-tenth-name list mentioned above to collect data about the political parties to which baseball players belong. That approach would be easier and less expensive than contacting everyone in the population.
The problem with using samples, however, is to be certain that the members of the sample are typical of the members of the population as a whole. If someone decided to interview only those baseball players who live in New York City, for example, the sample would not be a good one. People who live in New York City may have very different political concerns than people who live in the rest of the country.
One of the most important problems in any statistical study, then, is to collect a fair sample from a population. That fair sample is called a random sample because it is arranged in such a way that everyone in the population has an equal chance of being selected. Statisticians have now developed a number of techniques for selecting random samples for their studies.
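The difference between the every-tenth-name list and a random sample can be sketched in a few lines of Python. The roster below is a hypothetical stand-in for the real list of players, and its size of 750 is an assumption made only so the example runs.

```python
import random

# Hypothetical roster standing in for the full population of players.
population = [f"Player {i}" for i in range(1, 751)]

# Systematic sample: every tenth name on the list.
every_tenth = population[9::10]

# Simple random sample of the same size: every player has an equal
# chance of being chosen. The fixed seed only makes the sketch repeatable.
rng = random.Random(42)
random_sample = rng.sample(population, k=len(every_tenth))

print(len(every_tenth), len(random_sample))
print(every_tenth[:3])
print(random_sample[:3])
```

The sample method draws without replacement, so no player can appear in the random sample twice.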
Displaying data
Once data have been collected on some particular subject, those data must be displayed in some format that makes them easy for readers to see and understand. Table 1 makes it very easy for anyone who wants to know the number of female African Americans in any particular age group.
In general, the most common methods for displaying data are tables and charts or graphs. One of the most common types of graphs used is the histogram. A histogram is a bar graph in which each bar represents one category or interval of a variable, and the height of each bar represents the number of cases in that category. For example, one could make a histogram of the information in Table 1 by drawing six bars, one representing each of the six age groups shown in the table. The height of each bar would correspond to the number of individuals in each age group. The bar farthest to the left, representing the age group 0 to 19, would be much higher than any other bar because there are more individuals in that age group than in any other. The bar second from the right would be the shortest because it represents the age group with the fewest individuals.
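A rough text version of such a histogram can be produced with the short Python sketch below. It uses the counts from Table 1, draws the bars horizontally rather than vertically for simplicity, and assumes an arbitrary scale of one symbol per 200,000 people.

```python
# Counts copied from Table 1.
age_groups = {
    "0–19": 5_382_025,
    "20–29": 2_982_305,
    "30–39": 2_587_550,
    "40–49": 1_567_735,
    "50–59": 1_335_235,
    "60+": 1_606_335,
}

SCALE = 200_000  # assumed scale: one '#' per 200,000 people

def bar(count, scale=SCALE):
    """Return a row of '#' characters whose length is proportional to count."""
    return "#" * round(count / scale)

# One bar per age group, in the same order as the table.
for group, count in age_groups.items():
    print(f"{group:>6} | {bar(count)}")

# The groups with the longest and shortest bars.
tallest = max(age_groups, key=age_groups.get)
shortest = min(age_groups, key=age_groups.get)
print(tallest, shortest)
```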
Another way to represent data is called a frequency distribution curve. Suppose that the data in Table 1 were arranged so that the number of female African Americans for every age were represented. The table would have to show the number of individuals 1 year of age, those 2 years of age, those 3 years of age, and so on to the oldest living female African American. One could also make a histogram of these data. But a more efficient way would be to draw a line graph with each point on the graph standing for the number of individuals of each age. Such a graph would be called a frequency distribution curve because it shows the frequency (number of cases) for each different category (age group, in this case).
Many phenomena produce distribution curves that have a very distinctive shape, high in the middle and sloping off to either side. These distribution curves are sometimes called "bell curves" because their shape resembles a bell. For example, suppose you record the weights of 10,000 American 14-year-old boys. You would probably find that the most common weight was perhaps 130 pounds. A smaller number of boys might have weights of 150 or 110 pounds, a still smaller number weights of 170 or 90 pounds, and very few boys weights of 190 or 70 pounds. The graph you get for this measurement probably has a peak at the center (around 130 pounds) with downward slopes on either side of the center. This graph would reflect a normal distribution of weights.
Table 1. Number of Female African Americans in Various Age Groups
Age | Number |
0–19 | 5,382,025 |
20–29 | 2,982,305 |
30–39 | 2,587,550 |
40–49 | 1,567,735 |
50–59 | 1,335,235 |
60+ | 1,606,335 |
Table 2. Statistics
Group | Improved | Not Improved | Total |
Experimental group | 62 | 38 | 100 |
Control group | 45 | 55 | 100 |
Total | 107 | 93 | 200 |
Other phenomena do not exhibit normal distributions. At one time in the United States, the grades received by students in high school followed a normal distribution. The most common grade by far was a C, with fewer Bs and Ds, and fewer still As and Fs. In fact, grade distribution has for many years been used as an example of normal distribution.
Today, however, that situation has changed. The majority of grades received by students in high schools tend to be As and Bs, with fewer Cs, Ds and Fs. A distribution that is lopsided on one side or the other of the center of the graph is said to be a skewed distribution.
Measures of central tendency
Once a person has collected a mass of data, these data can be manipulated by a great variety of statistical techniques. Some of the most familiar of these techniques fall under the category of measures of central tendency. By measures of central tendency, we mean what the average of a set of data is. The problem is that the term average can have different meanings—mean, median, and mode among them.
In order to understand the differences among these three measures, consider a classroom consisting of only six students. A study of the six students shows that their family incomes are as follows: $20,000; $25,000; $20,000; $30,000; $27,500; and $150,000. What is the average income for the students in this classroom?
The measure of central tendency that most students learn in school is the mean. The mean for any set of numbers is found by adding all the numbers and dividing by the number of numbers. In this example, the mean would be equal to ($20,000 + $25,000 + $20,000 + $30,000 + $27,500 + $150,000) ÷ 6 = $272,500 ÷ 6 = $45,417 (rounded to the nearest dollar).
But how much useful information does this answer give about the six students in the classroom? The mean that has been calculated ($45,417) is greater than the household income of five of the six students. Another way of calculating central tendency is known as the median. The median value of a set of measurements is the middle value when the measurements are arranged in order from least to greatest. When there are an even number of measurements, the median is half way between the middle two measurements. In the above example, the measurements can be rearranged from least to greatest: $20,000; $20,000; $25,000; $27,500; $30,000; $150,000. In this case, the middle two measurements are $25,000 and $27,500, and half way between them is $26,250, the median in this case. You can see that the median in this example gives a better view of the household incomes for the classroom than does the mean.
A third measure of central tendency is the mode. The mode is the value most frequently observed in a study. In the household income study, the mode is $20,000 since it is the value found most often in the study. Each measure of central tendency has certain advantages and disadvantages and is used, therefore, under certain special circumstances.
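The three averages for the six household incomes can be checked with Python's standard statistics module. This is only a convenience sketch; the variable names are chosen here for illustration.

```python
from statistics import mean, median, mode

# Family incomes of the six students, in dollars.
incomes = [20_000, 25_000, 20_000, 30_000, 27_500, 150_000]

mean_income = mean(incomes)      # sum of the values divided by their count
median_income = median(incomes)  # midpoint of the two middle values
mode_income = mode(incomes)      # most frequently occurring value

print(round(mean_income), median_income, mode_income)
```

The single very large income pulls the mean well above the median and the mode, which is why the mean describes this classroom poorly.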
Measures of variability
Suppose that a teacher gave the same test five different times to two different classes and obtained the following results: Class 1: 80 percent, 80 percent, 80 percent, 80 percent, 80 percent; Class 2: 60 percent, 70 percent, 80 percent, 90 percent, 100 percent. If you calculate the mean for both sets of scores, you get the same answer: 80 percent. But the collection of scores from which this mean was obtained was very different in the two cases. The way that statisticians have of distinguishing cases such as this is known as measuring the variability of the sample. As with measures of central tendency, there are a number of ways of measuring the variability of a sample.
Probably the simplest method for measuring variability is to find the range of the sample, that is, the difference between the largest and smallest observations. The range of measurements in Class 1 is 0, and the range in Class 2 is 40 percent. Simply knowing that fact gives a much better understanding of the data obtained from the two classes. In Class 1, the mean was 80 percent, and the range was 0, but in Class 2, the mean was 80 percent, and the range was 40 percent.
Other measures of variability are based on the difference between any one measurement and the mean of the set of scores. This measure is known as the deviation. As you can imagine, the greater the difference among measurements, the greater the variability. In the case of Class 2 above, the deviation for the first measurement is 20 percent (80 percent − 60 percent), and the deviation for the second measurement is 10 percent (80 percent − 70 percent).
Probably the most common measures of variability used by statisticians are called the variance and standard deviation. Variance is defined as the mean of the squared deviations of a set of measurements. Calculating the variance is a somewhat complicated task. One has to find each of the deviations in the set of measurements, square each one, add all the squares, and divide by the number of measurements. In the example above, the variance would be equal to (20² + 10² + 0² + 10² + 20²) ÷ 5 = 1,000 ÷ 5 = 200.
For a number of reasons, the variance is used less often in statistics than is the standard deviation. The standard deviation is the square root of the variance, in this case, √200 = 14.1. The standard deviation is useful because in any normal distribution, a large fraction of the measurements (about 68 percent) are located within one standard deviation of the mean. Another 27 percent (for a total of 95 percent of all measurements) lie within two standard deviations of the mean.
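The range, variance, and standard deviation for the two classes can be reproduced with a short Python sketch. It uses the population forms pvariance and pstdev, which divide by the number of measurements as the text does, and it uses the standard library's NormalDist to check the 68 and 95 percent figures. The helper value_range is a name invented for this example.

```python
from statistics import NormalDist, pstdev, pvariance

class_1 = [80, 80, 80, 80, 80]
class_2 = [60, 70, 80, 90, 100]

def value_range(scores):
    """Difference between the largest and smallest observations."""
    return max(scores) - min(scores)

range_1 = value_range(class_1)
range_2 = value_range(class_2)

# Mean of the squared deviations, and its square root.
variance_2 = pvariance(class_2)
std_dev_2 = pstdev(class_2)

# Fraction of a normal distribution lying within one and two
# standard deviations of the mean.
standard_normal = NormalDist()
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(range_1, range_2, variance_2, round(std_dev_2, 1))
print(round(within_one_sd, 3), round(within_two_sd, 3))
```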
Other statistical tests
Many other kinds of statistical tests have been invented to find out the meaning of data. Look at the data presented in Table 2. Those data were collected in an experiment to see if a new kind of drug was effective in curing a disease. The people in the experimental group received the drug, while those in the control group received a placebo, a pill that looked like the drug but contained nothing more than starch. The table shows the number of people who got better ("Improved") and those who didn't ("Not Improved") in each group. Was the drug effective in curing the disease?
You might try to guess the answer to that question just by looking at the table. But is the number 62 in the experimental group really significantly greater than the 45 in the control group? Statisticians use the term significant to indicate that a result differs from what was expected by more than could reasonably be attributed to chance alone.
Statistical tests have been developed to answer this question mathematically. In this example, the test starts from the totals in Table 2: of the 200 people in the study, 107 improved and 93 did not. If the drug had no effect, one would expect the same proportion of people to improve in each group of 100, that is, about 53.5 people in each group to get better and about 46.5 not to get better. If the data show results different from that distribution, the results could have been caused by the new drug.
The mathematical problem, then, is to compare the 62 observed in the first cell with the 53.5 expected, the 38 observed in the second cell with the 46.5 expected, the 45 observed in the third cell with the 53.5 expected, and the 55 observed in the fourth cell with the 46.5 expected.
At first glance, it would appear that the new medication was at least partially successful, since more people improved in the experimental group (62) than in the control group (45). But a type of statistical test called a chi square test will give a more precise answer. The chi square test involves comparing the observed frequencies in Table 2 with a set of expected frequencies that can be calculated from the row and column totals of the table. The calculated value of chi square can then be compared with values in a table to see how likely it is that the results were due to chance rather than to some real effect of the medication.
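The chi square calculation for Table 2 can be sketched in Python as follows. This sketch assumes the common form of the test in which each expected frequency is the row total times the column total divided by the grand total, and it applies no continuity correction. The figure 3.841 is the standard tabled value for a 0.05 significance level with one degree of freedom.

```python
from math import erfc, sqrt

# Observed counts from Table 2: rows are the experimental and control
# groups, columns are "Improved" and "Not Improved".
observed = [[62, 38], [45, 55]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count in each cell if the drug made no difference.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi square: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# With one degree of freedom (a 2-by-2 table), the probability of a
# chi square value at least this large under the null hypothesis
# equals erfc(sqrt(chi_square / 2)).
p_value = erfc(sqrt(chi_square / 2))

CRITICAL_VALUE_05 = 3.841  # tabled value, 0.05 level, 1 degree of freedom
print(round(chi_square, 2), round(p_value, 3), chi_square > CRITICAL_VALUE_05)
```

Because the calculated value exceeds the tabled value, a difference this large would be unusual if the drug had no effect.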
Another common technique used for analyzing numerical data is called the correlation coefficient. The correlation coefficient shows how closely two variables are related to each other. For example, many medical studies have attempted to determine the connection between smoking and lung cancer. The question is whether heavy smokers are more likely to develop lung cancer.
One way to do such studies is to measure the amount of smoking a person has done in her or his lifetime and compare the rate of lung cancer among those individuals. A mathematical formula allows the researcher to calculate the correlation coefficient between these two sets of data—rate of smoking and risk for lung cancer. That coefficient can range between 1.0, meaning the two are perfectly correlated, and −1.0, meaning the two have an inverse relationship (when one is high, the other is low).
The correlation test is a good example of the limitations of statistical analysis. Suppose that the correlation coefficient in the example above turned out to be 1.0. That number would mean that people who smoke the most are always the most likely to develop lung cancer. But what the correlation coefficient does not say is what the cause and effect relationship, if any, might be. It does not say that smoking causes cancer.
Chi square and correlation coefficient are only two of dozens of statistical tests now available for use by researchers. The specific kinds of data collected and the kinds of information a researcher wants to obtain from these data determine the specific test to be used.
Statistics
Statistics is the mathematical science of collecting, organizing, summarizing, and interpreting information in a numerical form. There are two main branches of statistics. Descriptive statistics summarizes particular data about a given situation through the collection, organization, and presentation of those data. Inferential statistics is used to test hypotheses, to make predictions, and to draw conclusions, often about larger groups than the one from which the data have been collected.
Statistics enables the discernment of patterns and trends, and causes and effects in a world that may otherwise seem made up of random events and phenomena. Much of the current debate over issues like global warming, the effects of industrial pollution, and conservation of endangered species rests on statistics. Arguments for and against the human impact on the environment can be made or broken based on the quality of statistics supporting the argument, the interpretation of those data, or both. Providing accurate statistics to help researchers and policy-makers come to the most sound environmental conclusions is the domain of statistical ecology in particular, and environmental science in general.
Basic concepts
Statistical analysis begins with the collection of data—the values and measurements describing an event or phenomenon. Researchers collect two types of data: qualitative and quantitative. Qualitative data refers to information that cannot be ascribed a numerical value but that can be counted.
For example, suppose a city wants to gauge the success of its curbside recycling program. Researchers might survey residents to better understand their attitudes toward recycling. The survey might ask whether or not people actually recycle, or whether they feel the program is adequate. The results of that survey (the number of people in the city who recycle, the number who think the program could be improved, and so on) would provide qualitative data on the city's program. Quantitative data, by contrast, refers to information that can be ascribed numeric values, such as how many tons each of glass and paper are recycled over a certain period of time. A more detailed breakdown might include data on the recycling habits of individual residents, obtained by measuring the amount of recycled materials people place in curbside bins.
Sorting through and weighing the contents of every recycling bin in a town would be not only time consuming but expensive. When researchers are unable to collect data on every member of a population or group under consideration, they collect data from a sample, or subset, of that population. In this case, the population would be everyone who recycles in the city.
To bring the scope of the study to a manageable size, the researchers might study the recycling habits of a particular neighborhood as a sample of all recyclers. A certain neighborhood's habits may not reflect the behavior of an entire city, however. To avoid this type of potential bias, researchers try to take random samples. That is, researchers collect data in such a way that each member of the population stands an equal chance of being selected for the sample. In this case, investigators might collect data on the contents of one bin from every block in the city.
Researchers must also define the particular characteristics of a population they intend to study. A measurement on a population that characterizes one of its features is called a parameter. A statistic is a characteristic of or fact about a sample. Researchers often use statistics from a sample to estimate the values of an entire population's parameters when it is impossible or impractical to collect data on every member of the population. The aim of random sampling is to help make that estimate as accurate as possible.
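The relationship between a parameter and a statistic can be illustrated with a small simulation. Everything in this sketch is hypothetical: the population of 10,000 households, the uniform spread of weekly recycling weights between 0 and 20 kilograms, and the sample size of 200 are all assumptions made for the example.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: weekly recycling weight (kg) for every household.
population = [rng.uniform(0, 20) for _ in range(10_000)]

# Parameter: a value describing the whole population.
parameter = mean(population)

# Statistic: the same measure computed from a random sample.
sample = rng.sample(population, k=200)
statistic = mean(sample)

print(round(parameter, 2), round(statistic, 2))
```

In practice the parameter is unknown because the whole population cannot be measured; the simulation computes it only to show how close the sample estimate comes.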
Descriptive statistics
Once the data have been collected, they must be organized into a readily understandable form. Organizing data according to groups in a table is called a frequency distribution. Graphic representations of data include bar charts, histograms, frequency polygons (line graphs), and pie charts.
Measures of central tendency, or the average, provide an idea of what the "typical" value is for a group of data. There are three measures of central tendency: the mean, the median, and the mode. The mean, which is what most people refer to when they use the term "average," is derived by adding all the values in a data set and then dividing the resulting sum by the number of values. The median is the middle value in a data set when all the numbers are arranged in ascending or descending order. When there is an even number of values, the median is derived by calculating the mean of the two middle values. The mode is the value that most often occurs in a set of data.
Although averages describe a "typical" member of a group or set of data, it can also be helpful to know about the exceptions. Statisticians have therefore devised several measures of variability, which describe the extent to which the values in a data set differ from one another. The range of a data set is the difference between the highest and lowest values. Deviation is the difference between any one measure and the mean.
Range and deviation provide information on the variability of the individual members of a group, but there are also ways to describe the variability of the group as a whole if, for example, a statistician wants to compare the variability of two sets of data. Variance is derived from squaring the deviations of a set of measures and then calculating the mean of those squares. Standard deviation is the most common statistic used to describe a data set's variability because it can be expressed in the same units as the original data. Standard deviation is derived by calculating the square root of the variance.
Inferential statistics
Inferential statistics is largely concerned with predicting the probability—the likelihood (or not)—of certain outcomes, and establishing relationships or links between different variables. Variables are the changing factors or measurements that can affect the outcome of a study or experiment.
Inferential statistics is particularly important in fields such as ecological and environmental studies. For example, there are chemicals contained in cigarette smoke that are considered to be carcinogenic. Researchers rely in no small part on the methods of inferential statistics to justify such a conclusion.
The process begins by establishing a statistical link. To use a common example, health experts begin noticing an elevated incidence of lung cancer among cigarette smokers. The experts may suspect that the cause of the cancer is a particular chemical in the cigarette smoke (if there are 40 suspected chemical carcinogens in the smoke, each chemical must be evaluated separately). Thus, there is a suspected association, or possible relationship, between a chemical in the smoke and lung cancer.
The next step is to examine the type of correlation that exists, if any. Correlation is the statistical measure of association, that is, the extent or degree to which two or more variables (a potential chemical carcinogen in cigarette smoke and lung cancer, in this case) are related. If statistical evidence shows that lung cancer rates consistently rise among a group of smokers who smoke cigarettes containing the suspected chemical compared with a group of nonsmokers (who are similar in other ways, such as age and general health), then researchers may say that a correlation exists.
Correlation does not prove a cause and effect relationship, however. The reason, in part, is the possible presence of confounders, other variables that might cause or contribute to the observed effect. Therefore, before proposing a cause-effect relationship between a chemical in cigarette smoke and lung cancer, researchers would have to consider whether other contributing factors, such as diet, exposure to environmental toxins, stress, or genetics, may have contributed to the onset of lung cancer in the study population. For example, do some smokers also live in homes containing asbestos? Are there high levels of naturally occurring carcinogens such as radon in the work or home environment?
Teasing out the many possible confounders in the real world can be extremely difficult, so although statistics based on such observations are useful in establishing correlation, researchers must find a way to limit confounders to better determine whether a cause-effect relationship exists. Much of the available information on environmentally related causes and effects is verified with data from lab experiments; in a lab setting, variables can be better controlled than in the field.
Statistically and scientifically, cause and effect can never be proved or disproved with complete certainty. Researchers test hypotheses, or explanations for observed phenomena, with an approach that may at first appear backwards. They begin by positing a null hypothesis, which states that the factor under study has no effect, the opposite of what the researchers expect to find. For example, researchers testing a chemical (called, for example, chemical x) in cigarette smoke might start with a null hypothesis such as: "exposure to chemical x does not produce cancer in lab mice." If the results of the experiment are inconsistent with the null hypothesis, then researchers are justified in rejecting it and advancing an alternative hypothesis. To establish that there is an effect, an experiment of this nature would rely on comparing an experimental group (mice exposed to chemical x, in this case) with a control group, an unexposed group used as a standard for comparison.
The next step is to determine whether the results are statistically significant. Researchers calculate a test's or experiment's P value, the probability of obtaining results at least as extreme as those observed if chance alone were at work (that is, if the null hypothesis were true). Frequently, the results of an experiment or test are deemed statistically significant if the P value is equal to or less than 0.05. A P value of 0.05 means that random processes or statistical variability alone would produce results this extreme only about five times in 100. It does not mean that researchers are 95% sure they have documented a real cause and effect; a small P value indicates only that chance is an unlikely explanation for the results.
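One way to obtain a P value for a small experiment of this kind is Fisher's exact test, a technique not named in the text and used here only as an illustration. The sketch below counts, under the null hypothesis, how many ways the observed cancers could have fallen among the exposed mice at least as often as they actually did. The mouse counts (7 cancers among 10 exposed mice, 2 among 10 unexposed mice) are hypothetical, and the function name fisher_one_sided is an assumption.

```python
from math import comb

def fisher_one_sided(exposed_cases, exposed_n, control_cases, control_n):
    """One-sided Fisher exact P value for a 2-by-2 table.

    Returns the probability, if exposure made no difference, that the
    exposed group would contain at least `exposed_cases` of the cases.
    """
    total_n = exposed_n + control_n
    total_cases = exposed_cases + control_cases
    max_cases = min(total_cases, exposed_n)
    # Number of ways to choose which animals form the exposed group.
    ways_total = comb(total_n, exposed_n)
    # Ways in which k cases and (exposed_n - k) non-cases end up exposed,
    # summed over every k at least as large as the observed count.
    ways_extreme = sum(
        comb(total_cases, k) * comb(total_n - total_cases, exposed_n - k)
        for k in range(exposed_cases, max_cases + 1)
    )
    return ways_extreme / ways_total

# Hypothetical results: 7 of 10 exposed mice and 2 of 10 control mice
# developed cancer.
p_value = fisher_one_sided(7, 10, 2, 10)
significant = p_value <= 0.05
print(round(p_value, 4), significant)
```

A result at or below the 0.05 threshold would lead researchers to reject the null hypothesis in this made-up experiment, though it would not by itself establish cause and effect.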
Other important considerations include whether the results can be confirmed and are reliable. Findings are considered confirmed if another person running the same test or experiment can produce the same results. Reliability means that the same results can be reproduced in similar studies.
[Darrin Gunkel]
Statistics
STATISTICS, the scientific discipline that deals with the collection, classification, analysis, and interpretation of numerical facts or data, was invented primarily in the nineteenth and twentieth centuries in Western Europe and North America. In the eighteenth century, when the term came into use, "statistics" referred to a descriptive analysis of the situation of a political state—its people, resources, and social life. In the early nineteenth century, the term came to carry the specific connotation of a quantitative description and analysis of the various aspects of a state or other social or natural phenomenon. Many statistical associations were founded in the 1830s, including the Statistical Society of London (later the Royal Statistical Society) in 1833 and the American Statistical Association in 1839.
Early Use of Statistics
Although scientific claims were made for the statistical enterprise almost from the beginning, it had few characteristics of an academic discipline before the twentieth century, except as a "state science" or Staatswissenschaft in parts of central Europe. The role of statistics as a tool of politics, administration, and reform defined its character in the United States throughout the nineteenth century. Advocates of statistics, within government and among private intellectuals, argued that their new field would supply important political knowledge. Statistics could provide governing elites with concise, systematic, and authoritative information on the demographic, moral, medical, and economic characteristics of populations. In this view, statistical knowledge was useful, persuasive, and hence powerful, because it could capture the aggregate and the typical, the relationship between the part and the whole, and when data were available, their trajectory over time. It was particularly appropriate to describe the new arrays of social groups in rapidly growing, industrializing societies, the character and trajectory of social processes in far-flung empires, and the behavior and characteristics of newly mobilized political actors in the age of democratic revolutions.
One strand in this development was the creation of data sets and the development of rules and techniques of data collection and classification. In America, the earliest statistical works were descriptions of the American population and economy dating from the colonial period. British officials watched closely the demographic development of the colonies. By the time of the American Revolution (1775–1783), colonial leaders were aware of American demographic realities, and of the value of statistics. To apportion the tax burden and raise troops for the revolution, Congress turned to population and wealth measures to assess the differential capacities among the colonies. In 1787, the framers institutionalized the national population census to apportion seats among the states in the new Congress, and required that statistics on revenues and expenditures of the national state be collected and published by the new government. Almanacs, statistical gazetteers, and the routine publication of numerical data in the press signaled the growth of the field. Government activities produced election numbers, shipping data from tariff payments, value of land sales, and population distributions. In the early nineteenth century, reform organizations and the new statistical societies published data on the moral status of the society in the form of data on church pews filled, prostitutes arrested, patterns of disease, and drunkards reformed. The collection and publication of statistics thus expanded in both government and private organizations.
Professionalization of Statistics
The professionalization of the discipline began in the late nineteenth century. An International Statistical Congress, made up of government representatives from many states, met for the first time in 1853 and set about the impossible task of standardizing statistical categories across nations. In 1885, a new, more academic organization was created, the International Statistical Institute. Statistical work grew in the new federal agencies such as the Departments of Agriculture and Education in the 1860s and 1870s. The annual Statistical Abstract of the United States first appeared in 1878. The states began to create bureaus of labor statistics to collect data on wages, prices, strikes, and working conditions in industry, the first in Massachusetts in 1869; the federal Bureau of Labor, now the Bureau of Labor Statistics, was created in 1884. Statistical analysis became a university subject in the United States with Richmond Mayo Smith's text and course at Columbia University in the 1880s. Governments created formal posts for "statisticians" in government service, and research organizations devoted to the development of the field emerged. The initial claims of the field were descriptive, but soon, leaders also claimed the capacity to draw inferences from data.
Throughout the nineteenth century, a strong statistical ethic favored complete enumerations whenever possible, to avoid what seemed the speculative excess of early modern "political arithmetic." In the first decades of the twentieth century, there were increasingly influential efforts to define formal procedures of sampling. Agricultural economists in the U.S. Department of Agriculture were pioneers of such methods. By the 1930s, sampling was becoming common in U.S. government statistics. Increasingly, this was grounded in the mathematical methods of probability theory, which favored random rather than "purposive" samples. A 1934 paper by the Polish-born Jerzy Neyman, who was then in England but would soon emigrate to America, helped to define the methods of random sampling. At almost the same time, a notorious failure of indiscriminate large-scale polling in the 1936 election—predicting a landslide victory by Alf Landon over Franklin D. Roosevelt—gave credence to the more mathematical procedures.
Tools and Strategies
The new statistics of the twentieth century was defined not by an object of study—society—nor by counting and classifying, but by its mathematical tools, and by strategies of designing and analyzing observational and experimental data. The mathematics was grounded in an eighteenth-century tradition of probability theory, and was first institutionalized as a mathematical statistics in the 1890s by the English biometrician and eugenicist Karl Pearson. The other crucial founding figure was Sir R. A. Fisher, also an English student of quantitative biology and eugenics, whose statistical strategies of experimental design and analysis date from the 1920s. Pearson and Fisher were particularly influential in the United States, where quantification was associated with Progressive reforms of political and economic life. A biometric movement grew up in the United States under the leadership of scientists such as Raymond Pearl, who had been a postdoctoral student in Pearson's laboratory in London. Economics, also, was highly responsive to the new statistical methods, and deployed them to find trends, correlate variables, and detect and analyze business cycles. The Cowles Commission, set up in 1932 and housed at the University of Chicago in 1939, deployed and created statistical methods to investigate the causes of the worldwide depression of that decade. An international Econometric Society was established at about the same time, in 1930, adapting its name from Pearson's pioneering journal Biometrika.
Also prominent among the leading statistical fields in America were agriculture and psychology. Both had statistical traditions reaching back into the nineteenth century, and both were particularly receptive to new statistical tools. Fisher had worked out his methods of experimental design and tests of statistical significance with particular reference to agriculture. In later years he often visited America, where he was associated most closely with a statistical group at Iowa State University led by George Snedecor. The agriculturists divided their fields into many plots and assigned them randomly to experimental and control groups in order to determine, for example, whether a fertilizer treatment significantly increased crop yields. This strategy of collective experiments and randomized treatment also became the model for much of psychology, and especially educational psychology, where the role of the manure (the treatment) was now filled by novel teaching methods or curricular innovations to test for differences in educational achievement. The new experimental psychology was closely tied to strategies for sorting students using tests of intelligence and aptitude in the massively expanded public school systems of the late nineteenth and early twentieth centuries.
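The randomized plot design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ten plots, the yield figures, and the function names assign_plots and permutation_p_value are hypothetical, and the exact permutation test is one standard way to judge such a comparison, not necessarily the procedure the agriculturists themselves used.

```python
import random
from itertools import combinations


def assign_plots(plot_ids, n_treated, rng):
    """Randomly pick n_treated plots for the treatment; the rest are controls."""
    treated = set(rng.sample(plot_ids, n_treated))
    control = [p for p in plot_ids if p not in treated]
    return sorted(treated), control


def permutation_p_value(yields, treated_ids):
    """One-sided exact permutation test.

    Counts the share of all possible treated groups of the same size whose
    total yield is at least as large as the observed treated total. With a
    fixed group size, comparing totals is equivalent to comparing the
    difference in group means.
    """
    observed = sum(yields[p] for p in treated_ids)
    splits = list(combinations(sorted(yields), len(treated_ids)))
    as_extreme = sum(1 for s in splits if sum(yields[p] for p in s) >= observed)
    return as_extreme / len(splits)


# Hypothetical field of ten plots; the seed only makes the demo repeatable.
plot_ids = list(range(10))
treated, control = assign_plots(plot_ids, 5, random.Random(0))

# Made-up yields: a small plot-to-plot variation plus an assumed
# fertilizer effect of 10 units on every treated plot.
yields = {p: 20 + p % 3 + (10 if p in treated else 0) for p in plot_ids}

p_value = permutation_p_value(yields, treated)
print(treated, control, p_value)
```

Because every treated plot out-yields every control plot in this made-up data, only the observed split among the 252 possible ways of choosing five plots out of ten reaches the observed total, so the p-value is 1/252. Real yields would overlap far more, and the same count would then give a larger p-value.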
The methods of twentieth-century statistics also had a decisive role in medicine. The randomized clinical trial was also in many ways a British innovation, exemplified by a famous test of streptomycin in the treatment of tuberculosis just after World War II (1939–1945). It quickly became important also in America, where medical schools soon began to form departments of biostatistics. Statistics provided a means to coordinate treatments by many physicians in large-scale medical trials, which provided, in effect, a basis for regulating the practice of medicine. By the 1960s, statistical results had acquired something like statutory authority in the evaluation of pharmaceuticals. Not least among the sources of their appeal was the presumed objectivity of their results. The "gold standard" was a thoroughly impersonal process: a well-designed experiment generating quantitative results that could be analyzed by accepted statistical methods to give an unambiguous result.
Historical analysis was fairly closely tied to the field of statistics in the nineteenth century, when statistical work focused primarily on developing data and information systems to analyze "state and society" questions. Carroll Wright, first Commissioner of Labor, often quoted August L. von Schloezer's aphorism that "history is statistics ever advancing, statistics is history standing still." The twentieth-century turn in statistics to experimental design and the analysis of biological processes broke that link, which was tenuously restored with the development of cliometrics, or quantitative history, in the 1960s and 1970s. But unlike the social sciences of economics, political science, psychology, and sociology, the field of history did not fully restore its relationship with statistics, for example, by making such training a graduate degree requirement. Thus the use of statistical analysis, and of "statistics" in the form of data, has remained a subfield of American historical writing, as history has eschewed a claim to being a "scientific" discipline.
Statistics as a field embraces the scientific ideal. That ideal, which replaces personal judgment with impersonal law, resonates with an American political tradition reaching back to the eighteenth century. The place of science, and especially statistics, as a source of such authority grew enormously in the twentieth century, as a greatly expanded state was increasingly compelled to make decisions in public, and to defend them against challenges.
BIBLIOGRAPHY
Anderson, Margo. The American Census: A Social History. New Haven, Conn.: Yale University Press, 1988.
Cassedy, James H. American Medicine and Statistical Thinking, 1800–1860. Cambridge, Mass.: Harvard University Press, 1984.
Cohen, Patricia Cline. A Calculating People: The Spread of Numeracy in Early America. Chicago: University of Chicago Press, 1982.
Cullen, M. J. The Statistical Movement in Early Victorian Britain: The Foundations of Empirical Social Research. New York: Barnes and Noble, 1975.
Curtis, Bruce. The Politics of Population: State Formation, Statistics, and the Census of Canada, 1840–1875. Toronto: University of Toronto Press, 2001.
Desrosières, Alain. The Politics of Large Numbers: A History of Statistical Reasoning. Cambridge, Mass.: Harvard University Press, 1998. English translation of La politique des grands nombres: Histoire de la raison statistique (1993).
Gigerenzer, G., et al. The Empire of Chance: How Probability Changed Science and Everyday Life. Cambridge, U.K.: Cambridge University Press, 1989.
Glass, D. V. Numbering the People: The Eighteenth-Century Population Controversy and the Development of Census and Vital Statistics in Britain. New York: D.C. Heath, 1973.
Marks, Harry M. The Progress of Experiment: Science and Therapeutic Reform in the United States, 1900–1990. New York: Cambridge University Press, 1997.
Morgan, Mary S. The History of Econometric Ideas. New York: Cambridge University Press, 1990.
Patriarca, Silvana. Numbers and Nationhood: Writing Statistics in Nineteenth-Century Italy. New York: Cambridge University Press, 1996.
Porter, Theodore M. The Rise of Statistical Thinking, 1820–1900. Princeton, N.J.: Princeton University Press, 1986.
———. Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. Princeton, N.J.: Princeton University Press, 1995.
Stigler, Stephen M. The History of Statistics: The Measurement of Uncertainty Before 1900. Cambridge, Mass.: Belknap Press of Harvard University Press, 1986.
———. Statistics on the Table: The History of Statistical Concepts and Methods. Cambridge, Mass.: Harvard University Press, 1999.
Margo Anderson
Theodore M. Porter
See also Census, Bureau of the; Demography and Demographic Trends.
Statistics
The invention of statistics involved the recognition of a distinct and widely applicable set of procedures based on mathematical probability for studying mass phenomena.
theoretical debates
Pierre-Simon Laplace (1749–1827) is generally considered one of the fathers of inverse probability, which he applied mainly to astronomy. He built on the earlier work of Abraham de Moivre (1667–1754) and Jakob Bernoulli (1654–1705). The development of mathematical probability, although motivated by problems in insurance, the social sciences, and astronomy, was in fact closely linked to the analysis of games of chance. But instead of relying on the law of large numbers (according to which the larger the sample, the less the conclusion will be liable to error), Laplace further developed an intuition of Thomas Bayes (1701–1761), who replaced the ordinary assumption of equally likely causes with the principle of a priori probabilities, under which each of the possible causes of an event is assigned its own prior probability. A direct probability is the chance that a fair coin tossed ten times will yield six heads and four tails; the inverse probability is the probability that the coin is unfair once it is known that six heads and four tails appeared in ten tosses. This approach allowed Laplace to indicate, for example, the most likely causes (climate, biology, etc.) of the higher rate of male births in London than in Paris.
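The coin example can be worked through numerically. The sketch below is illustrative only: the source does not say what "unfair" means or which prior to use, so the single alternative hypothesis (a coin that lands heads with probability 0.7), the equal prior weights, and the function names binomial_pmf and posterior_unfair are all assumptions introduced here.

```python
from math import comb


def binomial_pmf(heads, tosses, p_heads):
    """Direct probability: chance of exactly `heads` heads in `tosses` tosses."""
    return comb(tosses, heads) * p_heads**heads * (1 - p_heads) ** (tosses - heads)


def posterior_unfair(heads, tosses, p_unfair, prior_unfair):
    """Inverse probability via Bayes' rule with two candidate causes:
    a fair coin (p = 0.5) and one assumed unfair coin (p = p_unfair)."""
    like_fair = binomial_pmf(heads, tosses, 0.5)
    like_unfair = binomial_pmf(heads, tosses, p_unfair)
    weighted_unfair = like_unfair * prior_unfair
    weighted_fair = like_fair * (1 - prior_unfair)
    return weighted_unfair / (weighted_unfair + weighted_fair)


# Direct probability: six heads and four tails from a fair coin.
direct = binomial_pmf(6, 10, 0.5)

# Inverse probability: how plausible is the assumed unfair coin, given
# that six heads and four tails were observed? (Assumed prior: 50/50.)
inverse = posterior_unfair(6, 10, p_unfair=0.7, prior_unfair=0.5)

print(f"direct probability:  {direct:.4f}")
print(f"inverse probability: {inverse:.4f}")
```

The direct probability is 210/1024, about 0.205. Under the assumptions above the inverse probability comes out just under one half, and it moves if either the assumed bias or the prior is changed, which is exactly the dependence on a priori probabilities that the passage describes.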
The application of probability to the social sciences centered on the work of the Belgian Adolphe Quetelet (1796–1874). As a follower of the law of large numbers, he supported general censuses rather than sample studies, whose selection he considered arbitrary. He was reluctant to group together as homogenous data that he believed were not. Social scientists were thus encouraged to gather as much data as possible. Quetelet's name is tightly bound to the notion of the average person, with the statistical average turned into an ideal social type: for example, the average height of the soldier, or the average income, age, and so on of a criminal or a drunk. He used probabilities to estimate the propensity of the average person to commit a crime. Quetelet saw in the regularity of crime the proof that statistical-social laws are true when applied to the whole society, although they may be false for a single individual. This approach reflected the nineteenth-century positivistic ideal of a science able to manage society. The liberal notion of equality is also reflected in the average person: in principle no a priori distinctions are made between individuals, but their social attitudes, as "scientifically" proved, can protect society from deviance. Deviations from the average (and from normality, as Émile Durkheim added later [1895]) cancel themselves out when a large enough number of cases is considered. In this view, statistics confirmed the stability of bourgeois society (Quetelet was writing in the immediate aftermath of the 1848 revolution) while trying to identify regularities in the apparent chaos that accompanied the fall of the ancien régime and the onset of the Industrial Revolution.
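The claim that deviations from the average cancel out as the number of cases grows is the law of large numbers, and it can be checked exactly for the simplest case. The following sketch is an illustration, not Quetelet's own calculation: the fair-coin model, the 0.1 tolerance, and the function name prob_large_deviation are assumptions chosen for brevity.

```python
from fractions import Fraction
from math import comb


def prob_large_deviation(n):
    """Exact chance that the share of heads in n fair tosses differs
    from one half by more than 0.1."""
    # |k/n - 1/2| > 1/10 is equivalent to 5 * |2k - n| > n, which keeps
    # the comparison in integers and avoids rounding error.
    favourable = sum(comb(n, k) for k in range(n + 1) if 5 * abs(2 * k - n) > n)
    return Fraction(favourable, 2**n)


for n in (10, 100, 1000):
    print(n, float(prob_large_deviation(n)))
```

With ten tosses the chance of such a deviation is 11/32, roughly one in three; with a hundred and then a thousand tosses it shrinks rapidly toward zero. The result concerns the aggregate only: it says nothing about any single toss, which parallels the distinction drawn above between laws that hold for the whole society and the behavior of a single individual.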
The judgment of homogeneity, however, could be made either on external grounds or on internal evidence. One solution consisted in developing a test of homogeneity internal to the data, accomplished, for example, by the German statistician Wilhelm Lexis (1837–1914); another solution was to develop a methodology that acted as a surrogate for experimental control in the social sciences. The three main contributors to the latter approach were Francis Galton, Francis Ysidro Edgeworth, and Karl Pearson. Galton (1822–1911), a romantic English traveler with a medical background, is known principally for his Hereditary Genius (1869). Unlike Quetelet, Galton seemed more interested in the exceptional than in the average. In his approach, the homogeneity of data was the starting point and not the aim, as it was for Quetelet; in fact, once a stable homogenous group had been identified, Galton raised the question of identifying deviation from the average. For example, he classified how well hundreds of people exercised a particular talent, as evaluated against a preset scale. On this ground he identified the probability that certain characteristics of peas could be reproduced according to hereditary laws, and he extended this conclusion to human beings.
Edgeworth (1845–1926) corrected Galton's approach using Laplace's analysis of inverse probability. Edgeworth divided a population into subgroups and tested their homogeneity. By doing so, he anticipated the modern t-test and the analysis of variance. Pearson (1857–1936) went beyond Edgeworth's conclusions by considering homogenous groups and corresponding curves to be mental constructs rather than real phenomena. Pearson's philosophy of science, as expressed in his Grammar of Science (1892), constituted the ground on which he developed his analysis of skew curves, which were outside the bounds of the "normal distribution" studied by his predecessors.
The social and political implications of this new generation's studies were important. Galton is generally considered the founder of eugenics, the evolutionary doctrine holding that the condition of human beings can be improved through a scientific process of breeding rather than through education. Galton concluded that the upper classes hold their rank based not on greater economic means but on superior biological characteristics. Pearson further developed this approach in what became a form of social Darwinism: on the ground of a "real scientific knowledge" the state had to promote efficient reproduction of individuals, beyond personal beliefs and market competition.
Beginning in the 1860s and particularly during the last quarter of the nineteenth century, however, increasing criticism of positivism led to attacks on social and statistical determinism. Individual free will was opposed to "social laws" and statistical averages; Lexis and Georg Friedrich Knapp (1842–1926) in Germany, their students, and most of the Russian statisticians criticized universal statistical laws. They identified national paths of economic and demographic growth, and, by the same token, they stressed the role of individual freedom in social dynamics. According to Knapp, because every individual is different from every other individual, the notion of variation should replace that of statistical error.
Tightly linked with national specificities, regional and monographic analysis enjoyed increasing success in the last quarter of the nineteenth century. These studies were mostly developed in Germany and Russia where federalism (in the former case) or local governments (the zemstvos, in the latter case) encouraged studies on local economic conditions.
From a theoretical point of view, however, these studies raised a serious problem: in the absence of regular homogenous censuses, academic statisticians were rather skeptical about inference from samples largely gathered by administrative (above all local) statistical offices. Classical histories of statistics have contended that the theory of sampling was roughly constructed by the Norwegian statistician Anders Kiaer (1838–1919) in the 1890s and fully developed by Jerzy Neyman (1894–1981) in 1934. But the practice and theory of sampling were first developed in Russia, where, starting in the 1870s, several statistical bureaus of local self-government organizations (the zemstvos) developed monographic studies on the local population. Most of these studies were "partial" in the sense that they covered only a part of the population. During the following years, the best method of selecting the sample was discussed in the meetings of the Russian statisticians as well as in their main publications. The first solution considered was that of a completely random selection; unfortunately, this approach required that general censuses be carried out at the same time, so that the representativeness of the sample could be tested against them. Starting in 1887 a majority of Russian statisticians supported a "reasoned" selection of the sample based on the investigator's knowledge of the main characteristics of the local economy and society. In the following years and up through the outbreak of World War I, Alexander Chuprov (1874–1926), Pafnuty Chebyshev (1821–1894), and Andrei Markov (1856–1922) strongly contributed to the development of sampling and probability theory in general.
Nevertheless, not only in Russia, but throughout Europe, the use of statistics in local and monographic studies highlighted the difficulty of balancing the reliability of studies and the need to limit their costs. The trade-off between cost–benefit analysis and sampling significance expressed two different notions of the use of scientific knowledge in the public sphere.
statisticians and bureaucrats
Before the nineteenth century, state statistics were limited to state budgeting and demographic concerns. It was only with the increasing economic and social activity of administrations that calculus and statistics acquired major social, political, and organizational roles. The nineteenth century was a period of increasing enthusiasm for statistics as a tool for a scientific management of politics. The positivist ideal and the reformist attitude of most of the European governments contributed to this success. Statistics entered general newspapers, and statistical societies multiplied after the 1830s (an era of social reforms in several European countries) either as sections of broader associations (e.g., the Academy of Sciences in France, the British Association for the Advancement of Science) or as independent associations (e.g., the London or Birmingham Statistical Society, the American Statistical Association). These different outcomes either reflected the scientific debate (statistics as an independent science or as a general method for other sciences) or had political origins. For example, in Russia, up to World War I, local statisticians were forbidden to gather as a group and were obliged to find a place in the general meetings of naturalists' associations.
International conferences aimed to offer the image of an international "objective" and homogenous science, and as such, held out the promise of the "scientific." Statisticians complained about the "ignorance" of professional politicians and the differences in the organization of national statistics. In France the statistical apparatus was highly centralized, whereas in Britain, Germany, and Russia, local administrations on the one hand and academicians on the other hand played major roles in the organization of statistical enquiries. Statisticians soon claimed not merely an executive but also a decision-making role. In most nineteenth-century societies, this problem was deepened by the fact that statisticians mostly came from a different social group than did top-rank bureaucrats and politicians. Statistical analysis thus became a forum for professional, social, and political debates. For example, French and British statisticians focused on health problems in the burgeoning cities, as well as the living and working conditions of workers, and called for social and political reforms. For their part, Russian statisticians stressed the need for agrarian reforms and recommended the redistribution of noble and state lands to the peasants. World War I pushed specialists' ambitions to their apogee, while the ensuing interwar period was marked by the success of administrative bureaucracies.
See also Demography; Galton, Francis; Quetelet, Lambert Adolphe Jacques.
bibliography
Beaud, Jean-Pierre, and Jean-Guy Prévost. "La forme est le fond: La structuration des appareils statistiques nationaux (1800–1945)." Revue de synthèse 118, no. 4 (1997): 419–456.
Blum, Alain, and Martine Mespoulet. L'anarchie bureaucratique: Pouvoir et statistique sous Staline. Paris, 2003.
Brian, Eric. La mesure de l'Etat: Administrateurs et géomètres au XVIIIe siècle. Paris, 1994.
Dale, Andrew I. A History of Inverse Probability: From Thomas Bayes to Karl Pearson. 2nd ed. New York, 1999.
Desrosières, Alain. The Politics of Large Numbers: A History of Statistical Reasoning. Translated by Camille Naish. Cambridge, Mass., 1998. Translation of La politique des grands nombres: Histoire de la raison statistique.
MacKenzie, Donald A. Statistics in Britain, 1865–1930: The Social Construction of Scientific Knowledge. Edinburgh, 1981.
Patriarca, Silvana. Numbers and Nationhood: Writing Statistics in Nineteenth-Century Italy. Cambridge, U.K., 1996.
Perrot, Jean-Claude. Une histoire intellectuelle de l'économie politique. Paris, 1992.
Porter, Theodore M. The Rise of Statistical Thinking, 1820–1900. Princeton, N.J., 1986.
Stanziani, Alessandro. L'économie en révolution: Le cas russe, 1870–1930. Paris, 1998.
Stigler, Stephen M. The History of Statistics: The Measurement of Uncertainty before 1900. Cambridge, Mass., 1986.
Tooze, J. Adam. Statistics and the German State, 1900–1945: The Making of Modern Economic Knowledge. Cambridge, U.K., 2001.
Alessandro Stanziani