This article will be mainly concerned with statistical fallacies, but it should be noted that most other fallacious types of reasoning can be carried over into statistics.
Most fallacies seem foolish when pinpointed, but they are not the prerogative of fools and statisticians. Great men make mistakes, and when they admit them remorsefully, they reveal a facet of their greatness. The reason for mentioning the mistakes of eminent people in this article is to make it more fun to read.
Many fallacies, statistical or otherwise, have their origin in wishful thinking, laziness, and busyness. These conditions lead to oversimplification, the desire to win an argument at all costs (even at the cost of overcomplication), failure to listen to the opposition, too-ready acceptance of authority, too-ready rejection of it, too-ready acceptance of the printed word (even in newspapers), too-great reliance on a machine or formal system or formula (deus ex machina), and too-ready rejection of them (diabolus ex machina). These emotionally determined weaknesses are not themselves fallacies, but they provoke them. For example, they provoke special pleading, the use of language in more than one sense without notice of the ambiguity (if the argument leads to a desirable conclusion), the insistence that a method used successfully in one field of research is the only appropriate one in another, the distortion of judgment, and the forgetting of the need for judgment.
A logical or syntactical fallacy. We begin with an example of a fallacious argument in which the conclusion is correct:
“No cat has no tail. One cat has one more tail than no cat. Therefore one cat has one tail.”
A good technique for exposing fallacious reasoning is to use the same form of argument in order to deduce an obviously false result:
“No cat has eight tails. One cat has one more tail than no cat. Therefore one cat has nine tails.”
The fallacy can be explained by careful attention to syntax, specifically by noting that the following two propositions have been confused: (1) It is false that any cat has eight tails, and (2) the object named “no cat” has eight tails. P. M. S. Blackett once said, exaggerating somewhat, that a physicist is satisfied with an argument if it leads to a result that he believes to be true.
Arguments from authority. The book Popular Fallacies by Alfred S. E. Ackermann is more concerned with fallacies of fact than of reasoning, and here many fallacies depend on the acceptance of authority. It is interesting to see that the author was himself misled by authority on at least two occasions.
First, he argues that it is a fallacy “that cigarette smoking is especially pernicious,” appealing to the opinions of several authorities: for example, “Of the various forms of smoking, cigarette smoking is the most wholesome, preferably without a holder,” according to Sir Robert Armstrong-Jones, F.R.C.P., in the Daily Mail, January 1, 1927 (Ackermann  1950, pp. 174–175). (The current medical opinion is that, of cigarettes, cigars, and pipes, cigarettes are the least wholesome, at any rate in regard to lung cancer. Of course, Armstrong-Jones might be right after all.)
Then Ackermann refers to the thesis “that there is a prospect of atomic energy being of practical use.” Lord Rutherford is quoted, from the Evening News, September 11, 1933, as saying that “anyone who expects a source of power from the transformation of these atoms is talking moonshine” (ibid., pp. 708–709).
It would be unfair to blame Ackermann for relying on these authorities, but it is useful to hold in mind that the highest authorities can be wrong, even when they are emphatic in their opinions. Their desire not to seem too academic should sometimes be allowed for, especially when they hold an administrative appointment.
What should the question really be? When Gertrude Stein was on her deathbed, one of her friends asked her, “What is the answer?” After a few seconds she whispered back, “What is the question?”
It is important for the statistician to satisfy himself that a right question is being asked, by inquiring into the purposes behind the question. Chambers (1965) states that when a member of Parliament asked for some inland revenue figures that were not available from the published statistics, his invariable rule was to find out the purpose for which the information was needed. More often than not he found that the figures sought were irrelevant, that other figures already published were more helpful, or that the M.P. was misguided over the whole business.
It is often reasonable to make exploratory investigations without a clear purpose in mind. The fallacy we have just pointed out is the assumption that the questioner necessarily asks for information that is very relevant to his purposes, whether those purposes are clear or vague. The fallacy of giving the “right answer to the wrong question” is further discussed by A. W. Kimball (1957).
Ignoring the “exposure base.” Consider, for example, the reports on traffic deaths that are issued after public holidays. Many readers conclude from the increased number of deaths that it is more dangerous to drive on public holidays than on ordinary days. The conclusion may or may not be correct, but the reasoning is fallacious, since it ignores the fact that many more people drive automobiles on public holidays and thus more people are exposed to the possibility of an accident. If holidays and ordinary days are compared on the basis of deaths per passenger mile, it might turn out that the holiday death rate is lower, since the reduction of the average speed caused by the volume of traffic also may reduce the seriousness, if not the number, of accidents.
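The point can be made with a toy computation. The figures below are invented purely for illustration, not actual traffic statistics: the raw holiday death count is higher, yet the death rate per passenger mile is lower.

```python
# Invented figures illustrating the exposure-base point: more holiday
# deaths in total, yet a lower holiday death *rate* per passenger mile.
holiday_deaths, holiday_passenger_miles = 600, 8_000_000_000
ordinary_deaths, ordinary_passenger_miles = 400, 4_000_000_000

holiday_rate = holiday_deaths / holiday_passenger_miles
ordinary_rate = ordinary_deaths / ordinary_passenger_miles

assert holiday_deaths > ordinary_deaths   # the raw count is higher...
assert holiday_rate < ordinary_rate       # ...but the rate is lower
print(f"holiday rate:  {holiday_rate:.2e} deaths per passenger mile")
print(f"ordinary rate: {ordinary_rate:.2e} deaths per passenger mile")
```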
“Deus ex machina”: the precision fallacy. When we know a machine or formal system that can produce an exact answer to a question, we are tempted to provide an answer and inquire no further. But exact methods often produce exact answers to wrong questions.
One of the main aims of statistical technique is to fight the danger of wishful thinking and achieve a measure of objectivity in probability statements. But absolute objectivity and precision are seldom if ever attainable: there is always, or nearly always, a need for judgment in any application of statistical methods. (This point is especially emphasized in Good 1950.) Inexperienced statisticians often overestimate the degree of precision and objectivity that can be attained. An elementary form of the precision fallacy, which is less often committed by statisticians than by others, is the use of an average without reference to “spread.” A related trap is to gauge closeness by some measure of spread but to ignore systematic errors (bias) [see ERRORS, article on NONSAMPLING ERRORS].
The use of an average without a measure of spread can be especially misleading, or even comic, when the sample is small and the spread is therefore large. But even if the mean and standard deviation of the population are given, they can be misleading if the population is very skew. For skew populations it is better to give some of the quantiles [see STATISTICS, DESCRIPTIVE, article on LOCATION AND DISPERSION].
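A small numerical illustration of how skewness makes the mean misleading. The sample below is invented (say, incomes in thousands of dollars): one large value drags the mean far from any typical member, while the median and quartiles remain representative.

```python
import statistics

# Invented, strongly skew sample (say, incomes in thousands of dollars);
# one large value drags the mean far above any typical member.
sample = [8, 9, 10, 10, 11, 12, 14, 15, 18, 250]

mean = statistics.mean(sample)        # 35.7
median = statistics.median(sample)    # 11.5
quartiles = statistics.quantiles(sample, n=4)  # three quartile cut points
print(mean, median, quartiles)
```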
Randomization. An example of the precision fallacy occurs in connection with the important technique of randomization. Let us consider the famous tea-tasting experiment (see Fisher 1935, chapter 1; Good 1956). A lady claims to be able to tell by tasting, with better than random chance, whether the milk is put into her tea first or last. We decide to test her by giving her twenty cups of tea to taste, in ten of which the milk is poured first, and in ten last. If the lady gets many more than ten of her assertions right and if we have not randomized the order of the twenty trials, we might suspect that whatever sequence we had selected for some psychological reason, the lady might have tended to guess the same sequence for similar psychological reasons. So we randomize the order and can then apparently make use of the hypergeometric tail-area probability as a precise, objective, and effectively complete summary of the statistical significance of the experiment [see EXPERIMENTAL DESIGN and RANDOM NUMBERS].
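The hypergeometric tail-area probability mentioned here can be computed directly. The sketch below assumes twenty cups with exactly ten milk-first, and counts how many of the ten milk-first cups the lady identifies correctly; under the null hypothesis of pure guessing, that count is hypergeometric.

```python
from math import comb

def tail_prob(correct, milk_first=10, cups=20):
    """P(at least `correct` right identifications of the milk-first cups)
    under the null hypothesis of pure guessing (hypergeometric tail)."""
    total = comb(cups, milk_first)
    return sum(comb(milk_first, k) * comb(cups - milk_first, milk_first - k)
               for k in range(correct, milk_first + 1)) / total

print(tail_prob(9))   # ≈ 0.000547: 9 or 10 right out of 10 is strong evidence
print(tail_prob(6))   # ≈ 0.33: 6 right out of 10 could easily be chance
```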
If the number of cups of tea is very large, then the importance of the above criticism of randomization will usually be negligible. But long experiments are expensive, and in the statistical design of experiments the expense can never be ignored.
Randomization in itself is by no means a fallacious technique; what is fallacious is the notion that without the suppression of information, it can lead to a precise tail-area probability relevant only to a null hypothesis.
The suppression of information. In its crudest forms the suppression of information is often at least as wicked as an outright lie. We shall later refer to some of the cruder forms. But we have just seen that randomization loses its precision unless some information is suppressed, and we shall now argue more forcibly that it is a fallacy to suppose that the suppression of information is always culpable.
One way of seeing this is in terms of digital communication. When an electrical “pulse” of a given shape is liable to have been attenuated and distorted by noise, some circuitry is often incorporated for the purpose of re-forming (“regenerating”) the pulse. This circuitry, as it were, accepts the hypothesis that the pulse is supposed to be present rather than absent. Since noise in the electronic system makes the probability less than one that the supposed pulse is present, the regeneration loses some information. But allowing for the nature of the subsequent communication channel, it can be proved that the loss is often more than compensated. (This will not be proved here, but it is not surprising to common sense.) In pedagogy the corresponding principle is that simplification is necessary when teaching beginners. In statistics the corresponding device is known as the reduction of the data, that is, the reduction of a mass of data to a more easily assimilable form. If the statistics are “sufficient,” then there is no loss of information, but we often have to be satisfied with “insufficient” statistics in order to make an effective reduction of the data. Thus it is fallacious to say that the suppression of information is always a statistical crime. (Note that apparently sufficient statistics might not really be so if the model is wrong. People sometimes publish, say, only a mean and variance of a sample, and this prevents readers from checking the validity of the model for which these statistics would be sufficient.) [See SUFFICIENCY.]
Terminological ambiguities. Important examples of terminological ambiguities occur both in the philosophy of statistics and in its practical applications. Often they are as obvious as the “no cat” ambiguity, once they are pointed out. But before they are pointed out, they lead to a great deal of argument at cross purposes. Thus, many of the problems in the philosophy of probability clear themselves up as soon as we distinguish between various kinds of probability (see Good 1959a). (We shall not discuss here whether they can all be reduced to a single kind or whether they all “exist.” But they are all talked about.) [See PROBABILITY, article on INTERPRETATIONS.]
There is tautological, or mathematical, probability, which occurs in mathematical theories and requires no operational definition. It occurs also in the definition of a “simple statistical hypothesis,” which is a hypothesis for which some probabilities of the form P(E|H) are assigned by definition. (Here E represents an event or a proposition asserting that an event obtains, and the vertical stroke stands for “given” or “assuming.”) There are physical, or material, probabilities, or chances. These relate to tautological probabilities by means of the linguistic axiom that to say that H is true is to say that the physical probability of E is P(E|H), for some class of events, E. There are logical probabilities, or credibilities. There are subjective, or personal, probabilities, which are the intensities of conviction that a man will use for betting purposes, after mature consideration. There are multisubjective probabilities, belonging to groups of people. And there are psychological probabilities, which are the probabilities that people behave as if they accept, even before applying any criterion to test their consistency. By confusing pairs of these six kinds of probability, fifteen different kinds of fallacy can be generated. For example, it is often said that there is no sense in talking about the probability that a population parameter has a certain value, for “it either has the value or it does not, and therefore the probability is either 0 or 1.” It need hardly be mentioned, to those who are not choked with emotion, that the probability need not be 0 or 1 when it is interpreted not as a physical probability but as a logical or as a subjective probability.
Even physical probabilities can be confused with each other, since they can be mistakenly referred to the same event. For example, apparent variations in the incidence of some crime or disease from one place or time to another are very often found to be due to variations in the methods of classification. Adultery would appear to increase enormously if Christ’s definition were suddenly to be accepted in the law: “Whosoever looketh on a woman to lust after her hath committed adultery with her already in his heart” (Matthew 5:28). Since partners to adultery do not often turn in official reports, perhaps a better example is that of crime records. An example with documentation, quoted by Wallis and Roberts (1956), is that of felonies in New York. It was alleged that there had been an increase of 34.8 per cent from 1949 to 1950, but later it appeared that this was at least largely due to a revised method of classification.
This class of practical statistical fallacies is extremely common in the social sciences, and one should be very much on guard against it. As a further example, two standard definitions of the number of unemployed in the United States differ by a factor of over 3, namely, the “average monthly rate” and the “total annual rate.” Putting it roughly, one measure for any given year is the average monthly number of people unemployed; the other, larger measure is the number of people who were unemployed at any time during the year. [See EMPLOYMENT AND UNEMPLOYMENT.]
An example of a terminological fallacy is the confusion of “some” and “all.” It is perpetrated by John Hughlings Jackson in the following excerpt: “To coin the word, verbalising, to include all ways in which words serve, I would assert that both halves of the brain are alike in that each serves in verbalising. That the left half does is evident, because damage of it makes a man speechless. That the right does is inferable, because the speechless man understands all I say to him in ordinary matters” (quoted in Penfield & Roberts 1959, p. 62). Some damage to the left hemisphere seems here to have been confused with destruction of all of it.
Another example of the “some and all” fallacy is to assume that since some poems are better than others in the opinion of any reasonable judge, then, given any set of poems, one of them must be the best. More generally, the possibility of partial ordering is easily overlooked. But sometimes the assumption of partial ordering, although truer than complete ordering, is too complicated for a given application. In a beauty competition, for example, each girl might be the best of her kind, but it might be essential to award the prize to only one of them.
In some social surveys respondents are asked to rank several objects in order of merit. An alternate design, which will often be less watered down by the need to reach decisions in doubtful cases, is to ask for comparisons of pairs of objects but to permit “no comparison” as a response for any given pair.
Ignoring a relevant concomitant variable. “The death rate in the American Army in peacetime is lower than that in New York City. Therefore leave New York and join the army.” The fallacy is that the methods of selection for the army are biased toward longevity, both by age and by health, and these clearly relevant variables have been ignored. The fallacy can also be categorized as failure to control for exposure, since many inhabitants of New York City are subject to the possibility of death from infant diseases, chronic diseases, and old age, whereas very few men in the army are so exposed. The example can also be regarded as one of “biased sampling,” a category of fallacy to be considered later.
Ignoring half of a contingency table. It is commonly believed that government scientists in the United Kingdom earn more on the average than university teachers. But, as Rowe (1962) pointed out, the average age of university teachers is less than that of government scientists, because a large proportion of lecturers leave the universities before their mid-thirties. Rowe showed that the median earnings of university teachers above the age of 35 are greater than those of government scientists. But he did not estimate what these men would have earned in government service. Thus, although he refuted the original argument, he did not ask the really relevant question. This question is very difficult to answer. A possible approach, which would shed some light, would be to find out the distribution of salaries as a function of job, age, and intelligence quotient.
The perennial problem here is which covariates to choose and where to stop choosing them, since the list of possibilities is typically impracticably large. A related issue is that the greater the number of conditioning classificatory variables (dimensionality of the contingency table), the fewer the cases in the relevant cross-classification cell. This is one of the unsolved problems of actuarial science, where the problem of estimating probabilities in multidimensional contingency tables is philosophically basic (for some discussion of this problem, with references, see Good 1965).
It sometimes happens that a fact is almost universally ignored, although in retrospect it is clearly highly relevant. In the late 1950s people in the United States were arguing that the standard of college teaching staffs was deteriorating, since the proportion of newly employed teachers who held a doctorate was decreasing. What was overlooked was that the proportion of teachers who took their doctorates after becoming teachers was increasing. Cartter (1965) states that Bernard Berelson was almost alone in his correct interpretation of the situation.
Biased sample. At one time, most known quasi-stellar radio sources lay approximately in a plane, and this seemed to one writer to have deep cosmological significance. But these radio sources could not be definitely identified with optical sources unless they were located with great accuracy, and for this purpose they had to be occulted by the moon. Also, as it happened, most of the observations had been made from the same observatory. Hence there was a very strong bias in the sampling of the sources (this was mentioned by D. W. Dewhurst in a lecture in Oxford on January 28, 1965). As Sir Arthur Eddington once pointed out, if you catch fish with a net having a 6-inch mesh, you are liable to formulate the hypothesis that all fish are more than 6 inches in length. [See ERRORS, article on NONSAMPLING ERRORS.]
It is sometimes overlooked that atrocity stories usually form a biased sample. Newspapers tend to report the atrocities of political opponents more than those of friends. An exception was the Nazi atrocities, which were so great that the evidence for them had to be overwhelming before they could be believed. (For example, there appears to be no reference to them in the 1951 edition of the Encyclopaedia Britannica.)
Sometimes inferences from a sample are biased because of seasonal variations. According to Starnes (1962), Democratic Secretary of Labor Willard Wirtz stated just before an election that over “four and a half million more Americans have jobs than when this Administration took office in January of 1961.” Wirtz later admitted that the figure should have been 1.224 million, and he said, “It isn’t proper to compare January figures with October figures without a seasonal adjustment.” Similarly, the Republican governor of New York, Nelson A. Rockefeller, once referred to a “net increase of 450,000 jobs” since he had taken office. The figure is worthless because it again ignores the adjustment for seasonal variations.
Bias is difficult to avoid in social surveys, for example, in the use of questionnaires, where poor wording is frequent and where one sometimes (especially in political and commercial surveys) finds tendentious wording.
Even with an unbiased sample, it is possible to get a biased conclusion by computing the significance level of various tests of the null hypothesis and selecting the one most favorable to one’s wishes. Although these tests are based on the same sample and are therefore statistically interdependent, there will be a reasonable probability that one out of twenty such tests will reach a 5 per cent significance level. A suggestion of how to combine such “parallel” tests is given by Good (1958a).
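As a rough illustration (assuming, contrary to the situation described in the text, that the twenty tests were statistically independent), the chance that at least one of them reaches the 5 per cent level by luck alone is already about two-thirds:

```python
# If twenty significance tests were independent (they are not, when computed
# from the same sample, but this gives the flavor), the chance that at least
# one reaches the 5 per cent level under the null hypothesis is:
p_at_least_one = 1 - 0.95 ** 20
print(round(p_at_least_one, 3))  # ≈ 0.642
```

With dependent tests computed from one sample the exact figure differs, but the qualitative danger of picking the most favorable test remains.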
The suppression of the uninteresting. Suppose we have done an experiment, and it reaches a significance level of 5 per cent. Should we reject the null hypothesis? Perhaps the experiment has been performed by others without significant results. If these other experiments were taken into account, the total significance of all the experiments combined might be negligible. Moreover, the other results might have been unpublished because they were nonsignificant and therefore uninteresting. This explains why some apparent medical advances do not fulfill their early promise. The published statistics are biased in favor of what is interesting. As one physician said, “Hasten to use the remedy before it is too late” (Good 1958b, p. 283; Sterling 1959).
Sample too small. One of the most frequent and elementary statistical fallacies is the reliance on too small a sample. In 1933 Meduna, believing that schizophrenia and epilepsy were incompatible because of the rarity of their joint occurrence, started to induce convulsions in mental patients by chemical means. Consequently, the beneficial effect of convulsions on depressives was eventually accidentally discovered. Meduna’s sample was too small, and in fact it has now been found that schizophrenia and epilepsy are positively correlated (Slater & Beard 1963). One moral of this story is that experiments can be worth trying without theoretical reason to believe that they might be successful.
Misleading use of graphs and pictures. Graphs and pictures are often used in newspapers in the hope of misleading readers who are not experienced in interpreting them. Sometimes graphs are inadequately labeled; sometimes the scale is chosen so as to make a small slope appear large; sometimes the graph is drawn on a board and the board is pictured in perspective so as to accentuate the most recent slope; sometimes too little of a time series is shown, and the graph is started at a trough (a device that is useful for salesmen of stocks, when they wish the public to invest in a particular equity).
A useful method for misleading with pictures is to depict, say, salaries by means of objects such as cash boxes whose linear dimensions are proportional to the salaries. In this way an increase is made to appear much larger than it really is. Another useful method for misleading the public is attributed by Huff (1954) to the First National Bank of Boston. The bank represented governmental expenditure by means of a map of the United States in which states of low population densities were shaded to indicate that total government spending was equal to the combined income of the people of those states. The hope was that the reader would get the impression that federal spending, as a fraction of the total income of the United States, was equal to the total area of the shaded states divided by the whole area of the country. [See GRAPHIC PRESENTATION.]
“Smaller” versus “smaller than necessary.” The confusion of “smaller” with “smaller than necessary” will be illustrated in a hereditary context, and an oversimplification of the theory of natural selection will be pointed out. Let us suppose that it is true that intelligent people tend to have fewer children than less intelligent people and that the level of intelligence is hereditary. (We are not here concerned with whether and where this supposition is true, nor with the precise interpretation of “intelligent.”) It then appears to follow that the average level of intelligence will necessarily decline. This fallacy will be perpetrated on most readers of Chapter 5 of the book by the eminent zoologist Peter B. Medawar (1960, p. 86), in spite of the words italicized by us in the following quotation: “If innately unintelligent people tend to have larger families, then, with some qualifications, we can infer that the average level of intelligence will decline.” In order to show that the argument without the qualification is invalid it is sufficient to use a mathematical model that, for other purposes, would be much oversimplified (see Behrens 1963). Imagine a population in which 10 per cent of men are intelligent and 90 per cent are unintelligent and that, on the average, 100 intelligent fathers have 46 sons, of whom 28 are intelligent and 18 unintelligent, whereas 100 unintelligent fathers have 106
Table 1 – Hypothetical proportions of intelligent and unintelligent sons

                          Sons (per 100 fathers)
   Fathers           Intelligent   Unintelligent   Total
   Intelligent            28             18          46
   Unintelligent           8             98         106
sons, of whom 98 are unintelligent and 8 are intelligent. It will be seen from Table 1 that the proportion of intelligent males would remain steady in expectation.
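The arithmetic behind Table 1 can be checked directly. Starting from 1,000 fathers (100 intelligent, 900 unintelligent) and the stated numbers of sons per 100 fathers, the proportion of intelligent males in the next generation stays at 10 per cent:

```python
# Check of the hypothetical model in Table 1: per 1000 fathers,
# 100 are intelligent and 900 are unintelligent.
int_fathers, unint_fathers = 100, 900

# Per 100 intelligent fathers: 28 intelligent sons, 18 unintelligent.
# Per 100 unintelligent fathers: 8 intelligent sons, 98 unintelligent.
int_sons = int_fathers * 28 / 100 + unint_fathers * 8 / 100     # 100
unint_sons = int_fathers * 18 / 100 + unint_fathers * 98 / 100  # 900

proportion = int_sons / (int_sons + unint_sons)
print(proportion)  # 0.1 -- steady at 10 per cent, despite smaller families
```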
But now it must be determined whether the right question is being asked. Suppose we were convinced that the general level of intelligence was decreasing, and we made suggestions accordingly for encouraging the more intelligent to have more children. Should we not put these suggestions forward even if the general level of intelligence were increasing? Would we not like to see the rate of increase also increase? Yes, of course. Looked at from this point of view, we might not fully agree with Medawar’s arguments, but we might well agree with some of his recommendations.
“Regression fallacy.” If we select a short or tall person at random, the chances are that his relatives will be closer, on the average, than he is to the mean height of the population. Francis Galton described this phenomenon as “regression.” If now we consider the heights of the sons of tall men and of short men, we might infer that the variability of heights is decreasing with time. This would be an example of the regression fallacy. One way of seeing that the argument must be fallacious is by considering the heights of the parents of short and tall people: we would then infer that the variability of heights is increasing with time!
Wallis and Roberts (1956) mention several other examples of the regression fallacy. One is the widespread belief that the second year in the major leagues is an unlucky one for new baseball players who have successfully finished their first year.
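The two-way nature of regression can be seen in a simulation. The model below, in which father's and son's heights share a common family component so that both generations have identical distributions, is an invented illustration, not Galton's data:

```python
import random
random.seed(1)

# Invented model: father's and son's heights share a common family
# component, so they are correlated, but the two generations have
# identical distributions -- nothing is shrinking or growing.
N = 100_000
fathers, sons = [], []
for _ in range(N):
    family = random.gauss(175, 5)            # family mean height, in cm
    fathers.append(random.gauss(family, 5))  # father = family mean + noise
    sons.append(random.gauss(family, 5))     # son    = family mean + noise

# Sons of very tall fathers average well below their fathers...
sons_of_tall = [s for f, s in zip(fathers, sons) if f > 185]
mean_son = sum(sons_of_tall) / len(sons_of_tall)

# ...but, selecting the other way, fathers of very tall sons do too.
fathers_of_tall = [f for f, s in zip(fathers, sons) if s > 185]
mean_father = sum(fathers_of_tall) / len(fathers_of_tall)

print(mean_son, mean_father)   # both lie between 175 and 185
```

Regression appears in whichever direction we select, so it cannot by itself show that variability is changing over time.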
Invalid use of formulas or theorems. The use of formulas or theorems in situations where they are not valid is a special case of the deus ex machina class of fallacies and is very frequent. The following are a few examples.
Implicit assumption of independence. In an experiment consisting of n trials, each successful with probability p, is the variance (the square of the standard deviation) of the number of successes equal to np(1 − p), as it would be if independence held? (An example would be the quality inspection of items on an assembly line.) The formula is so familiar that it is tempting to assume that it is always a good approximation. But familiarity breeds mistakes. For a Markov chain the variance can be quite different (see, for example, Good 1963), as it can also be when sampling features of children in families or fruit on trees.
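A simulation sketch of the point: a two-state Markov chain whose stationary success probability is one-half, but in which each trial tends to repeat its predecessor, has a variance far larger than np(1 − p). The transition probabilities below are invented for illustration.

```python
import random
random.seed(0)

# Invented two-state Markov chain: success probability one-half in the
# stationary distribution, but each trial tends to repeat its predecessor.
def successes(n, switch=0.1):
    state = random.random() < 0.5        # start in a random state
    total = 0
    for _ in range(n):
        if random.random() < switch:     # change state with probability 0.1
            state = not state
        total += state
    return total

n, reps = 1000, 2000
counts = [successes(n) for _ in range(reps)]
mean = sum(counts) / reps
var = sum((c - mean) ** 2 for c in counts) / reps
print(mean, var, n * 0.5 * 0.5)   # variance far exceeds np(1 - p) = 250
```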
Another example of a fallacious assumption of independence relates to the variability of physiological traits. Why, even if there were only eight traits, each trichotomized into equal thirds, only one person out of 3⁸ = 6561 would be in the middle (normal) group for all eight traits!
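The arithmetic of the trichotomized-traits example, assuming the eight traits independent:

```python
# Eight independent traits, each split into equal thirds: the chance of
# falling in the middle ("normal") third on all eight is (1/3)**8.
p_all_middle = (1 / 3) ** 8
print(3 ** 8, p_all_middle)   # 6561, ≈ 0.000152 -- one person in 6561
```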
Assuming form determines distribution. Let nij be the frequency of the “dinome,” that is, pair of adjacent digits (i, j), in a sequence of N random sampling digits (i, j = 0, 1, 2, …, 9). Clearly the overlapping pairs number N − 1, so that Σnij = N − 1, and the “serial test” statistic ψ² = [100/(N − 1)] Σ[nij − (N − 1)/100]², summed over all hundred dinomes, has the typographical form of a chi-squared statistic.
It has been erroneously assumed at least four times in the statistical literature that ψ² has asymptotically (for large N) a tabular chi-squared distribution. In one case this led to the unfair rejection of a method of producing pseudorandom numbers. Presumably the erroneous distribution arose from the typographical identity of the expression for ψ² with the familiar statistic of the chi-square test. (For references to three of these papers and to a paper that gives a correct method of using ψ², see Good 1963.) The misapplication of the above so-called serial test is particularly disastrous when working with binary digits (0 and 1), that is, with base 2.
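The dinome count and a chi-squared-form statistic computed from it can be sketched as follows. The normalization below is one plausible version (the precise form of ψ² varies in the literature), and in any case the text's point stands: whatever its form suggests, the null distribution of such a statistic over overlapping pairs is not tabular chi-squared.

```python
from collections import Counter
import random
random.seed(2)

# Overlapping pairs ("dinomes") of adjacent digits in a random sequence.
digits = [random.randrange(10) for _ in range(10_000)]
pairs = Counter(zip(digits, digits[1:]))     # there are N - 1 of them
N = len(digits)
expected = (N - 1) / 100                     # expected count per dinome

# Chi-squared-*form* statistic over the 100 possible dinomes; its actual
# null distribution is NOT the tabular chi-squared, because the
# overlapping pairs are not independent.
psi2 = sum((pairs.get((i, j), 0) - expected) ** 2 / expected
           for i in range(10) for j in range(10))
print(psi2)
```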
Assuming the winner leads half the time. There is a fallacy in assuming that in a long sequence of statistically independent fair games of chance between two players the ultimate winner will be in the lead about half the time. This is a misapplication of the law of large numbers. That it is a fallacy depends on one of the most surprising theorems in the theory of probability, the so-called arc sine law. In fact, the probability that a specified player will be in the lead for less than a fraction x of the time is approximately (2/π) arc sin √x (see, for example, Feller 1950–1966, vol. 1, p. 251). This implies that however long the game, it is much more likely that a specified player will be ahead most of the time or behind most of the time than that he will be about even; for example, the probability that a specific player will be ahead 90 per cent or more of the time, or behind 90 per cent or more of the time, is about .40, while the probability that the player will be ahead between 40 and 60 per cent of the time is only about .13. As Feller says, the arc sine law “should serve as a warning to those who easily discover ‘obvious’ secular trends” in economic and social phenomena.
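A simulation makes the arc sine law vivid. The sketch below plays many fair coin-tossing games and records the fraction of steps on which one player is in the lead; the extreme fractions are far more common than the even ones, in rough agreement with (2/π) arc sin √x:

```python
import random
from math import asin, pi, sqrt
random.seed(3)

def lead_fraction(n=1000):
    """Fraction of steps on which player A is strictly in the lead,
    in a fair game of n independent ±1 steps."""
    lead = total = 0
    for _ in range(n):
        total += 1 if random.random() < 0.5 else -1
        lead += total > 0
    return lead / n

trials = [lead_fraction() for _ in range(4000)]
extreme = sum(f <= 0.1 or f >= 0.9 for f in trials) / len(trials)
middle = sum(0.4 <= f <= 0.6 for f in trials) / len(trials)

# Arc sine law: P(fraction <= x) ≈ (2/pi) * asin(sqrt(x)).
predicted_extreme = 2 * (2 / pi) * asin(sqrt(0.1))
print(extreme, predicted_extreme)   # both near .40
print(middle)                        # near .13
```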
The “maturity of the chances.” An elementary misapplication of the law of large numbers, or “law of averages,” is known as the maturity of the chances. In World War I many soldiers took shelter in bomb craters on the grounds that two bombs seldom hit the same spot. For the same reason P. S. Milner-Barry, the British chess master, decided to retain his London flat after it was bombed in World War II. As a matter of fact it was bombed again. At roulette tables, it is said, the chips pile up on the color that has not occurred much in recent spins. Of course, in practice, if a coin came down heads fifty times running, it would be more likely than not, in logical probability, to come down heads on the next spin, not less likely. In fact, it would probably be double-headed. There are circumstances, of course, when an event is less likely to occur soon after it has just occurred: this would be true for some kinds of accidents and in many situations where one is sampling without replacement. Usually the question is basically empirical, but the expression “maturity of the chances,” or “Monte Carlo fallacy,” usually refers to sequences of events that are statistically independent, at least to a good approximation.
Law of large numbers misapplied to pairs. A mnemonic for the fallacy of misapplying the law of large numbers when considering pairs of objects selected from a set is the well-known “birthday problem.” If 24 people are selected at random, then it is more likely than not that at least one pair of them will have the same birthday (that is, month and day). This is simple to prove, but a good intuitive “reason” for it is that the number of pairs of people in a group of 24 people is 276, and exp (-276/365) < ½. (The crude argument here is based on a Poisson approximation to the probability of no “successes” in 276 roughly independent trials with common success probability 1/365.) The result is true a fortiori if births are not distributed uniformly over days of the year.
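The exact birthday probability and the Poisson heuristic from the text can both be computed directly, assuming birthdays uniform and independent:

```python
from math import comb, exp, prod

def p_shared_birthday(k, days=365):
    """Exact probability that at least two of k people share a birthday,
    assuming birthdays uniform and independent."""
    return 1 - prod((days - i) / days for i in range(k))

print(p_shared_birthday(24))          # ≈ 0.538 > 1/2
print(1 - exp(-comb(24, 2) / 365))    # Poisson heuristic: 276 pairs, ≈ 0.53
```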
C. R. Hewitt, in a “Science Survey” program of the British Broadcasting Corporation in February, 1951, stated that the probability is less than 1/64,000,000,000 that two fingerprints of different people will be indistinguishable. From this he inferred that no two people have indistinguishable fingerprints and thus committed the birthday fallacy. The argument is fallacious even if we ignore resemblances of fingerprints among relatives, since the number of pairs of people in the world exceeds 4,000,000,000,000,000,000. The conclusion might be correct.
A similar fallacy arises in connection with precognition. Suppose, entirely unrealistically, that there is just one remarkable and well-documented case of somebody in the world having an apparently precognitive dream. How small must the apparent probability be in order that the report, if true, should by itself convince us of precognition? Presumably its reciprocal should be at least of the order of the population of the world times the number of dream experiences of a man times the number of his waking experiences. This triple product might be as large as 1,000,000,000,000,000,000,000,000,000 (or 10^27). This informal application of statistics should discourage a too-ready assumption that the evidence from apparently precognitive dreams is overwhelming. A formal application of statistical methods to this problem is very difficult. This discussion is not intended to undermine a belief in the possibility of precognition, but it is a plea for a better evaluation of the evidence.
Failure to use precise notation. An example of the fallacy of failure to use sufficiently explicit notation is given by the “fiducial argument.” The purpose of R. A. Fisher’s fiducial argument (1956, pp. 52–54) was to produce a final (posterior) distribution for a parameter without assuming an initial (prior) distribution for it. This was ambitious, to say the least, since de nihilo nihilum.
The argument starts from a parametric distribution for a random variable, X. Fisher selected an example of which the following is a special case. For each positive number x0, suppose that
P(X > x0 | θ) = exp(-θx0),
where θ is a positive parameter in whose value and final distribution we are interested. Writing x0 = u/θ, we get P(Xθ > u | θ) = exp(-u). From this it can be proved, using the usual axioms of probability theory (although Fisher omitted the proof), that P(Xθ > u) = exp(-u), where now the probability is unconditional, for any positive number, u, provided that an initial distribution for θ is assumed to exist. (It is not necessary to assume that this distribution is in any sense known.) Hence P(θ > u/X) = exp(-u). Fisher infers from this that
P(θ > θ0 | X = x) = exp(-xθ0),
where θ0 is any real positive number. But this last equation does not follow from the axioms of probability unless the initial probability density of θ is proportional to 1/θ. The fallacy in the fiducial argument was due to Fisher’s failure to indicate what is “given” in his probability notation. So great was Fisher’s authority that there are still many statisticians who make use of the fiducial argument; thus the analysis given here is currently considered controversial [see FIDUCIAL INFERENCE].
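The role of the 1/θ density can be checked numerically. The following sketch (an illustration, not Fisher's own reasoning; the grid limits and step count are arbitrary) integrates prior times likelihood for the sampling density θ exp(-θx) and compares the posterior tail with the fiducial value exp(-xθ0):

```python
# Numeric check: with sampling density f(x | theta) = theta * exp(-theta*x),
# a prior density proportional to 1/theta reproduces the fiducial tail
# probability exp(-x * theta0); a flat prior does not.
import math

def posterior_tail(x, t0, prior, lo=1e-6, hi=40.0, steps=200_000):
    """Riemann-sum estimate of P(theta > t0 | X = x) for an unnormalized prior."""
    h = (hi - lo) / steps
    total = tail = 0.0
    for k in range(steps):
        theta = lo + (k + 0.5) * h
        w = prior(theta) * theta * math.exp(-theta * x)  # prior times likelihood
        total += w
        if theta > t0:
            tail += w
    return tail / total

x, t0 = 2.0, 1.0
fiducial = posterior_tail(x, t0, prior=lambda t: 1 / t)
flat = posterior_tail(x, t0, prior=lambda t: 1.0)
print(f"1/theta prior: {fiducial:.4f}; exp(-x*t0) = {math.exp(-x * t0):.4f}")
print(f"flat prior:    {flat:.4f}")
```

Only the 1/θ prior makes the two displayed numbers agree; the flat prior gives a visibly different tail probability.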
Assuming order of operations reversible. An example of the fallacy of assuming that the order of two mathematical operations can be interchanged is the assumption that the expectation of a square is equal to the square of the expectation. This occurs in M. J. Moroney (1951, p. 250), where he says that evidently the expected value of chi-square for a multinomial distribution is zero: each deviation of an observed from an expected frequency has zero expectation, but the expectation of its square does not.
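Both points are easy to verify numerically. A sketch using hypothetical examples (a fair die, and a simulated equiprobable multinomial; neither appears in Moroney):

```python
# First, for a fair die the expectation of the square exceeds the square of
# the expectation by exactly the variance.
faces = [1, 2, 3, 4, 5, 6]
mean = sum(faces) / 6                      # E[X] = 3.5
mean_sq = sum(f * f for f in faces) / 6    # E[X^2] = 91/6, about 15.17
print(mean ** 2, mean_sq)                  # 12.25 versus about 15.17

# Second, the chi-square statistic for an equiprobable multinomial has
# expectation k - 1, not zero: each deviation has mean zero, but its
# square does not.
import random
random.seed(1)

k, n, trials = 4, 100, 2000
expected = n / k
stats = []
for _ in range(trials):
    counts = [0] * k
    for _ in range(n):
        counts[random.randrange(k)] += 1
    stats.append(sum((c - expected) ** 2 / expected for c in counts))

avg = sum(stats) / trials
print(f"average chi-square is about {avg:.2f} (k - 1 = {k - 1})")
```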
Correlation and causation. Positive correlation does not imply causation, in either direction. There is a positive correlation between the number of maiden aunts one has and the proportion of calcium in one’s bones. But you cannot acquire more maiden aunts by eating calcium tablets. (Younger people tend to have more maiden aunts and more bone calcium.) In the New Hebrides people in good health are lousier than people with fever. The advice to acquire lice cannot rationally be given, since lice avoid hot bodies [see Huff 1954, p. 99; see also CAUSATION].
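The maiden-aunt example can be mimicked by a toy simulation with a common cause (all numbers invented for illustration):

```python
# A hypothetical common-cause simulation: Z drives both X and Y, which never
# influence each other, yet X and Y come out substantially correlated.
import random
random.seed(7)

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

z = [random.gauss(0, 1) for _ in range(20_000)]   # common cause, e.g. youth
x = [zi + random.gauss(0, 1) for zi in z]         # e.g. number of maiden aunts
y = [zi + random.gauss(0, 1) for zi in z]         # e.g. bone calcium
r = corr(x, y)
print(f"corr(x, y) is about {r:.2f}, with no causal link either way")
```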
Zero correlation does not imply statistical independence, although it does so for a bivariate normal distribution and for some other special families of distributions.
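A minimal counterexample, using the standard construction of a symmetric variable and its square (not from the original text):

```python
# Zero correlation without independence: X uniform on {-1, 0, 1} and Y = X^2
# are uncorrelated, yet Y is completely determined by X.
xs = [-1, 0, 1]                                # equally likely values of X
ys = [v * v for v in xs]                       # Y = X^2
ex = sum(xs) / 3                               # E[X] = 0
ey = sum(ys) / 3                               # E[Y] = 2/3
exy = sum(a * b for a, b in zip(xs, ys)) / 3   # E[XY] = E[X^3] = 0
cov = exy - ex * ey
print(cov)  # 0.0, although P(Y = 1 | X = 1) = 1 while P(Y = 1) = 2/3
```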
If there is a positive correlation between A and B and also between B and C, this does not imply that the correlation between A and C is positive, even for a trivariate normal distribution. But the implication does follow if the sum of the squares of the first two correlation coefficients exceeds unity.
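Both claims can be checked with a few lines of arithmetic (the particular correlation values are hypothetical):

```python
# r(A,B) = r(B,C) = 0.6 > 0 can coexist with r(A,C) = -0.2 < 0, because
# 0.6^2 + 0.6^2 = 0.72 does not exceed 1.
import math

rab, rbc, rac = 0.6, 0.6, -0.2
# The 3x3 correlation matrix is valid iff its determinant is non-negative
# (the 2x2 leading minors are automatically positive here).
det = (1 - rbc ** 2) - rab * (rab - rac * rbc) + rac * (rab * rbc - rac)
print(f"determinant = {det:.3f}")  # positive: such a distribution exists

def min_rac(r1, r2):
    """Smallest r(A,C) compatible with r(A,B) = r1 and r(B,C) = r2."""
    return r1 * r2 - math.sqrt((1 - r1 ** 2) * (1 - r2 ** 2))

print(f"{min_rac(0.6, 0.6):.2f}")  # -0.28: a negative r(A,C) is allowed
print(f"{min_rac(0.8, 0.8):.2f}")  # 0.28: 0.64 + 0.64 > 1 forces r(A,C) > 0
```

The lower bound in `min_rac` is positive exactly when the sum of the two squared correlations exceeds unity, which is the sufficient condition stated above.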
If the time order is wrong, then causation is unlikely, to say the least. In one survey vaccination was found to be positively correlated with various infectious diseases, when one looked at different districts in India. This was used by antivaccinationists for propaganda. If they had not been emotionally involved, they would probably have noticed that in several districts increased vaccination had followed an increase in the incidence of disease (Chambers 1965).
Post hoc, ergo propter hoc (“after this, therefore because of this”). D. O. Moberg, in a lecture in Oxford on February 2, 1965, stated that premarital intercourse seemed to be positively correlated with divorce and inferred that the propensity to divorce was increased by premarital intercourse. The inference might be correct, but an equally good explanation is that premarital intercourse and divorce are both largely consequences of the same attitude toward the institution of matrimony. It is also possible that untruthful responses are associated with a propensity to divorce or with a propensity to avoid divorce.
Ecological correlation. Suppose we find that in American cities the illiteracy rate and the percentage of foreign-born are associated. This does not imply the same association for individuals (see Goodman 1959). It would even be possible that every foreign-born person was highly literate. Cities might attract foreign-born people and also attract or produce illiteracy.
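A toy construction (all figures invented) shows how the city-level association can arise with no individual-level association at all:

```python
# Hypothetical cities: every foreign-born resident is literate, yet the
# city-level correlation between foreign-born share and illiteracy rate is
# strongly positive, because the same cities also attract or produce
# illiteracy among the native-born.
def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

foreign_share = [i / 20 for i in range(1, 11)]       # 5% up to 50%
native_illiteracy = [i / 30 for i in range(1, 11)]   # among natives only
city_illiteracy = [(1 - p) * q                       # no foreign-born is illiterate
                   for p, q in zip(foreign_share, native_illiteracy)]

r = corr(foreign_share, city_illiteracy)
print(f"city-level correlation is about {r:.2f}")
```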
Wrong criteria for suboptimization. Granted that in most decision problems it is not so much a matter of optimization as of “suboptimization,” that is, of approximate optimization, there is still an acute problem in choosing what to suboptimize. Various fallacies arise through choosing a wrong criterion or through not using a criterion at all (see Koopman 1956; Good 1962). Often a criterion is selected from too narrow a point of view, ignoring questions of consistency with higher-level criteria. For example, when coeducation at New College, Oxford, was being discussed at another Oxford college, the question of the relative requirements for education of men and women was ignored, but the effect on the atmosphere of the senior common room was mentioned. Another fallacy is to ignore the “spillover,” or side effects, of some project. Sometimes, when an urgent decision is required, the cost in delay of detailed theory is unjustifiably ignored. At other times the cost of the theory is said to be too heavy, and the fact is overlooked that the results of this theory might be valuable in similar circumstances in the future and that the training of the theoretician is important. Sometimes the criterion of profitability is given too little weight, sometimes too much (see also McKean 1958; Hitch & McKean 1954).
Statistics of statistical fallacies. There is some unpublished work by Christopher Scott on the statistics of statistical fallacies and errors for the specialized field of sample surveys conducted by mail. Scott read the 117 articles and research reports that had been written in English on this topic up to the end of 1960. He excluded 22 of the reports either because they were duplicates of others or because they gave almost no details of method. Of the remaining 95 articles, he found one or more definite errors in 54 and definite shortcomings in another 13. Among the definite errors there were 14 cases in which the experimental variable was not successfully isolated, that is, a change in technique was reported as causing a change in the result, whereas the latter change could reasonably be ascribed to variation in some concomitant variable.
There were 9 cases in which obviously relevant data, such as sample size or response rate, were not reported, and 7 cases in which a necessary significance test was not given. There is not space here for further details; it is hoped that they will be published elsewhere.
For misuses of the chi-square test, see Lewis and Burke (1949).
Good fallacies. It is a fallacy to suppose that all fallacies are bad. A clearly self-contradictory epigram can be a neat way of conveying truth or advice, to everybody except quibblers. For example:
“Only a half-truth can be expressed in a nutshell.”
“Everything in moderation.”
“It would be a non sequitur if it were not a tautology.”
“Races in which people were immortal became extinct by natural selection.”
“There’s nothing wrong with chess players that not being people wouldn’t put right.”
In this article it has been necessary to omit reference to many kinds of fallacies. A more complete listing is given in the categorization of logical and statistical fallacies by Good (1959b).
IRVING JOHN GOOD
Further literature on fallacies is mentioned in Good 1959b. In particular, Thouless 1932, on fallacies in ordinary reasoning, and chapter 3 of Wallis & Roberts 1956, on fallacies in statistics, are both very useful. Wagemann 1935 also gives an interesting general treatment.
ACKERMANN, ALFRED S. E. (1907) 1950 Popular Fallacies: A Book of Common Errors, Explained and Corrected With Copious References to Authorities. 4th ed. London: Old Westminster Press.
BEHRENS, D. J. 1963 High IQ, Low Fertility? Statistical “Non Sequitur.” Mensa Correspondence (London) no. 50:6 only.
CARTTER, ALLAN M. 1965 A New Look at the Supply of College Teachers. Educational Record 46:267–277.
CHAMBERS, S. PAUL 1965 Statistics and Intellectual Integrity. Journal of the Royal Statistical Society Series A 128:1–15.
FELLER, WILLIAM 1950–1966 An Introduction to Probability Theory and Its Applications. 2 vols. New York: Wiley. → The second edition of Volume 1 was published in 1957.
FISHER, R. A. (1935) 1960 The Design of Experiments. 7th ed. New York: Hafner; London: Oliver & Boyd.
FISHER, R. A. (1956) 1959 Statistical Methods and Scientific Inference. 2d ed., rev. New York: Hafner; London: Oliver & Boyd.
GOOD, I. J. 1950 Probability and the Weighing of Evidence. London: Griffin.
GOOD, I. J. 1956 Which Comes First, Probability or Statistics? Journal of the Institute of Actuaries 82:249–255.
GOOD, I. J. 1958a Significance Tests in Parallel and in Series. Journal of the American Statistical Association 53:799–813.
GOOD, I. J. 1958b How Much Science Can You Have at Your Fingertips? IBM Journal of Research and Development 2:282–288.
GOOD, I. J. 1959a Kinds of Probability. Science New Series 129:443–447.
GOOD, I. J. (1959b) 1962 A Classification of Fallacious Arguments and Interpretations. Technometrics 4:125–132. → First published in Volume 11 of Methodos.
GOOD, I. J. 1960–1961 The Paradox of Confirmation. British Journal for the Philosophy of Science 11:145–149; 12:63–64.
GOOD, I. J. (1962) 1965 How Rational Should a Manager Be? Pages 88–98 in Executive Readings in Management Science. Edited by Martin K. Starr. New York: Macmillan. → First published in Volume 8 of Management Science.
GOOD, I. J. 1963 Quadratics in Markov-chain Frequencies, and the Binary Chain of Order 2. Journal of the Royal Statistical Society Series B 25:383–391.
GOOD, I. J. 1965 The Estimation of Probabilities: An Essay in Modern Bayesian Methods. Cambridge, Mass.: M.I.T. Press.
GOODMAN, LEO A. 1959 Some Alternatives to Ecological Correlation. American Journal of Sociology 64:610–625.
HITCH, CHARLES; and MCKEAN, RONALD 1954 Suboptimization in Operations Problems. Volume 1, pages 168–186 in Operations Research for Management. Edited by Joseph F. McCloskey and Florence N. Trefethen. Baltimore: Johns Hopkins Press.
HUFF, DARRELL 1954 How to Lie With Statistics. New York: Norton. → Also published in paperback edition.
JACKSON, JOHN H. (1931) 1958 Selected Writings of John Hughlings Jackson. Vol. 2. Edited by James Taylor. New York: Basic Books.
KIMBALL, A. W. 1957 Errors of the Third Kind in Statistical Consulting. Journal of the American Statistical Association 52:133–142.
KOOPMAN, B. O. 1956 Fallacies in Operations Research. Journal of the Operations Research Society of America 4:422–426.
LEWIS, D.; and BURKE, C. J. 1949 The Use and Misuse of the Chi-square Test. Psychological Bulletin 46:433–489. → Discussions of the article may be found in subsequent issues of this bulletin: 47:331–337, 338–340, 341–346, 347–355; 48:81–82.
MCKEAN, RONALD N. 1958 The Criterion Problem. Pages 25–49 in Ronald N. McKean, Efficiency in Government Through Systems Analysis. New York: Wiley.
MEDAWAR, PETER B. 1960 The Future of Man. New York: Basic Books; London: Methuen.
MORONEY, M. J. (1951) 1958 Facts From Figures. 3d ed., rev. Harmondsworth (England): Penguin.
PENFIELD, WILDER; and ROBERTS, LAMAR 1959 Speech and Brain-mechanisms. Princeton Univ. Press.
ROWE, P. 1962 What the Dons Earn. The Sunday Times (London) October 21.
SLATER, ELIOT; and BEARD, A. W. 1963 The Schizophrenia-like Psychoses of Epilepsy: Psychiatric Aspects. British Journal of Psychiatry 109:95–112.
STARNES, RICHARD 1962 Age of Falsehood. Trenton Evening Times December 19.
STERLING, THEODORE D. 1959 Publication Decisions and Their Possible Effects on Inferences Drawn From Tests of Significance – Or Vice Versa. Journal of the American Statistical Association 54:30–34.
THOULESS, ROBERT H. (1932) 1947 How to Think Straight. New York: Simon & Schuster. → First published as Straight and Crooked Thinking.
WAGEMANN, ERNST F. (1935) 1950 Narrenspiegel der Statistik: Die Umrisse eines statistischen Weltbildes. 3d ed. Salzburg (Austria): Verlag “Das Bergland-Buch.”
WALLIS, W. ALLEN; and ROBERTS, HARRY V. 1956 Statistics: A New Approach. Glencoe, Ill.: Free Press. → An abridged paperback edition was published in 1965 by the Free Press.