William Sealy Gosset
Gosset, William Sealy
The impact of W. S. Gosset (1876–1937) on the social sciences was entirely indirect. He was, however, one of the pioneers in the development of modern statistical method and its application to the design and analysis of experiments. He is far better known to the scientific world under the pseudonym of “Student” than under his own name. Indeed all his papers except one appeared under the pseudonym.
He was the son of Colonel Frederic Gosset of the Royal Engineers, the descendant of an old Huguenot family that left France after the revocation of the Edict of Nantes. Gosset was a scholar of Winchester—that is, a boy who was awarded a prize on the basis of a competitive examination to pay for part or all of his education—which shows that his exceptional mental powers had developed early. From Winchester he went, again as a scholar, to New College, Oxford, where he obtained first class degrees in mathematics and natural science.
On leaving Oxford in the autumn of 1899 he joined the famous brewing firm of Guinness in Dublin. He remained with Guinness all his life, ultimately becoming, in 1935, chief brewer at Park Royal, the firm’s newly established brewery in London.
At that time scientific methods and laboratory determinations were beginning to be seriously applied to brewing, and this naturally led Gosset to study error functions and to see the need for adequate methods to deal with small samples in examining the relations between the quality of the raw materials of beer, such as barley and hops, the conditions of production, and the finished article. The importance of controlling the quality of barley ultimately led him to study the design of agricultural field trials.
In 1904 he drew up for the directors the first report on “The Application of the Law of Error.” This emphasized the importance of the theory of probability in setting “an exact value on the results of our experiments; many of which lead to results which are probable but not certain.” He used only the classical theory of errors, such as is found in G. B. Airy’s On the Algebraical and Numerical Theory of Errors of Observations (1861) and M. Merriman’s A Text-book on the Method of Least Squares (1884). But he observed that if X and Y are both measured from their mean, there are often considerable differences between $\sum (X + Y)^2$ and $\sum (X - Y)^2$; in other words, he was feeling his way toward the notion of correlation, although he had not yet heard of the correlation coefficient.
His first meeting with Karl Pearson took place in 1905, and in 1906/1907 he was sent for a year’s specialized study in London, where he worked at, or in close contact with, the biometric laboratory at University College.
Mathematical statistics. Gosset was once described by Sir Ronald Fisher as the “Faraday of statistics.” The comparison is apt, for he was not a profound mathematician but had a superb intuitive faculty that enabled him to grasp general principles and see their relevance to practical ends.
His first mathematical paper was “On the Error of Counting With a Haemacytometer” (1943, pp. 1-10); here he derived afresh the Poisson distribution as a limiting form of the binomial and fitted it to four series of counts of yeast cells. The derivation presented no particular difficulty (it had in fact been obtained before by several investigators), but it was characteristic of him to see immediately the correct method of dealing with a practical problem. One of these series has become world famous owing to its inclusion as an example in Fisher’s Statistical Methods for Research Workers (1925).
His next paper, “The Probable Error of a Mean” (1943, pp. 11-34), brought him more fame, in the course of time, than any other work that he did, for it provided the basis of Student’s t-test.
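The limiting argument can be checked numerically. The following is a minimal sketch, not Gosset's own calculation: it holds the mean count fixed, lets the number of binomial trials grow, and reports the largest gap between the binomial and Poisson probabilities. The mean of 4 cells per square and the function names are assumptions chosen for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials with success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    """Poisson probability of a count k when the expected count is `mean`."""
    return exp(-mean) * mean**k / factorial(k)

def max_gap(n, mean, k_max=20):
    """Largest absolute difference between the two laws over counts 0..k_max."""
    p = mean / n  # keep n * p fixed at `mean` while n grows
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mean))
               for k in range(k_max + 1))

mean_count = 4.0  # hypothetical mean number of cells per square, not Gosset's data
gaps = {n: max_gap(n, mean_count) for n in (20, 200, 2000, 20000)}
for n, gap in gaps.items():
    print(f"n = {n:>5}: largest gap = {gap:.2e}")
```

The gap shrinks roughly in proportion to 1/n, which is the sense in which the Poisson law is a limiting form of the binomial.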
In his work at the brewery he had been struck by the importance of knowing the accuracy of the mean of a small sample. The usual procedure at the time was to compute the sample average and standard deviation, $\bar{x}$ and $s$, and to proceed as if $\bar{x}$ were normally distributed with the same mean as that of the population and with standard deviation $s/\sqrt{n}$, where $n$ is sample size. The difficulty here is that $s$ is a fallible estimate of the true population standard deviation. Gosset’s intuition told him that the usual procedure, based on large sample considerations, would, for small samples, give a spuriously high impression of how accurately the population mean is estimated.
By a combination of exceptional clearheadedness and simple algebra, he obtained the first four moments of the distribution of $s^2$. He then proceeded to fit the Pearson curve that has these moments. His results showed that the curve has to be of Type III (essentially the gamma or $\chi^2$ distribution), and he found the distribution of $s^2$ to be $C\,(s^2/\sigma^2)^{(n-3)/2}\exp(-ns^2/2\sigma^2)\,d(s^2/\sigma^2)$. He then showed that the correlation coefficient between $\bar{x}^2$ and $s^2$ was zero, and assuming absolute independence (which does not necessarily follow but was true in this case), he deduced the probability distribution of $z = (\bar{x} - \mu)/s$, where $\mu$ is the true mean. With a mere change of notation this is the t-distribution. Here $s^2$ is defined to be $(1/n)\sum (x - \bar{x})^2$, so that $t = z\sqrt{n-1}$.
He then checked the adequacy of this distribution by drawing 750 samples of 4 from W. R. Macdonell’s data on the height and middle-finger length of 3,000 criminals and by working out the standard deviations of both variates in each sample (see Macdonell 1902). This he did by shuffling 3,000 pieces of cardboard on which the results had been written, possibly the earliest work in statistical research that led to the development of the Monte Carlo method.
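A modern analogue of the cardboard experiment is easy to sketch. The code below is an illustration under stated assumptions, not a reconstruction of Gosset's procedure: it draws samples of 4 from a standard normal population (rather than from Macdonell's measurements), computes z = (mean − μ)/s with divisor n in s, and compares the share of samples with |z| > 1 against two predictions, the normal-theory one and the exact one from the t-distribution with 3 degrees of freedom. The closed-form distribution function used for 3 degrees of freedom is a standard result; the seed, threshold, and names are arbitrary.

```python
import random
from math import atan, erfc, pi, sqrt

def student_z(sample, mu):
    """z = (mean - mu) / s, with s computed using divisor n."""
    n = len(sample)
    mean = sum(sample) / n
    s = sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return (mean - mu) / s

def t3_cdf(t):
    """Distribution function of t with 3 degrees of freedom (closed form)."""
    u = t / sqrt(3)
    return 0.5 + (atan(u) + u / (1 + u * u)) / pi

rng = random.Random(1908)           # arbitrary seed, for reproducibility
n, draws, threshold = 4, 20000, 1.0
zs = [student_z([rng.gauss(0, 1) for _ in range(n)], 0.0) for _ in range(draws)]

observed = sum(abs(z) > threshold for z in zs) / draws
exact = 2 * (1 - t3_cdf(threshold * sqrt(n - 1)))  # uses t = z * sqrt(n - 1)
naive = erfc(threshold * sqrt(n) / sqrt(2))        # treats s / sqrt(n) as exact

print(f"observed share of |z| > 1 : {observed:.3f}")
print(f"t-distribution prediction : {exact:.3f}")
print(f"normal-theory prediction  : {naive:.3f}")
```

The normal-theory figure is several times too small for samples of 4, which is the spuriously high impression of accuracy that Gosset suspected.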
Later in his paper on “Probable Error of a Correlation Coefficient” (1943, pp. 35-42), Gosset used the 750 correlation coefficients of the two variables. Here his remarkable intuition again led him to a correct answer. By correlating the height measurements of one sample with the middle-finger lengths of the next, he was able to obtain 750 values of $r$, the sample correlation coefficient, for which $\rho$, the true population correlation coefficient, was presumably zero. He noticed that the observed distribution of $r$ was approximately rectangular. If it were a Pearson curve it would have to be Type II, that is, $C(1 - r^2)^{\lambda}$, and his result from the 750 samples suggested $\lambda = k(n - 4)$. He guessed that $k = \tfrac{1}{2}$ and confirmed the result by taking 750 samples of 8, to which $C(1 - r^2)^2$ gave an excellent fit. Six years later Fisher proved that all these brilliant conjectures for the distribution of $s^2$, $t$, and $r$ when $\rho = 0$ were indeed correct.
The correlation coefficient between the two measurements in the 3,000 criminals was 0.66. Gosset also examined two sets of 750 samples of sizes 4 and 8 and one set of 100 samples of 30, for which the true value must have been close to 0.66. He could see from his results that the standard deviation given by $(1 - \rho^2)/\sqrt{n}$ was too small and that the distribution could not be of Pearson type except when $\rho = 0$. He succeeded in obtaining the exact distribution of $r$ for any $\rho$ in samples of 2, but the general solution for $\rho \neq 0$ had to await the publication of Fisher’s famous paper in 1915.
Gosset’s fourth paper (1943, pp. 43-48) dealt with the distribution of the means of samples not drawn at random. His brewing experience had repeatedly drawn his attention to the fact that successive observations were not uncorrelated. Here he supposed a sample of $n$ values to be drawn in such a way that the correlation between every pair of observations is the same (say $\rho$), so that $\rho$ is effectively an intraclass correlation. He used the algebraical methods of his second paper to determine the first four moments of the mean, in this case employing as an illustration some data published by Greenwood and White (1909) in which 2,000 phagocytic counts had been grouped in samples of 25. Both the original counts and the distribution of means could be fitted by Pearson Type I (Beta) curves. However, the observed values of $\beta_1$ and $(\beta_2 - 3)$ for the distribution of means were bigger than would have been anticipated if the usual theory for independent observations had been valid. The modified theory produced much better agreement.
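The conjectured form, a density proportional to (1 − r²) raised to the power (n − 4)/2 when the true correlation is zero, can be checked by simulation. The sketch below assumes independent standard normal variates in place of Macdonell's measurements; the function names, seed, and the choice to compare the share of |r| below 0.5 are illustrative assumptions.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def share_within(n, half_width, draws, rng):
    """Fraction of simulated r with |r| < half_width when the true correlation is 0."""
    hits = 0
    for _ in range(draws):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # independent of xs
        hits += abs(pearson_r(xs, ys)) < half_width
    return hits / draws

def conjectured_share(n, half_width, steps=20000):
    """Mass of C(1 - r^2)^((n - 4)/2) on (-half_width, half_width), midpoint rule."""
    lam = (n - 4) / 2
    def area(limit):
        h = limit / steps
        return 2 * h * sum((1 - ((i + 0.5) * h) ** 2) ** lam for i in range(steps))
    return area(half_width) / area(1.0)

rng = random.Random(1908)  # arbitrary seed
for n in (4, 8):
    simulated = share_within(n, 0.5, 10000, rng)
    print(f"n = {n}: simulated {simulated:.3f}, conjectured {conjectured_share(n, 0.5):.3f}")
```

For samples of 4 the exponent is zero, so the density is flat and half of the values should fall within ±0.5, which is the rectangular shape Gosset observed.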
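The second moment shows why the usual theory understates the spread of such means. In modern notation (not necessarily Gosset's own), if each observation has variance σ² and every pair has correlation ρ, the variance of the mean of n observations follows by summing the n variances and the n(n − 1) covariances:

```latex
\operatorname{Var}(\bar{x})
  = \frac{1}{n^{2}}\Bigl[\, n\sigma^{2} + n(n-1)\rho\,\sigma^{2} \Bigr]
  = \frac{\sigma^{2}}{n}\,\bigl[\, 1 + (n-1)\rho \,\bigr]
```

As a purely hypothetical illustration, with samples of 25 and ρ = 0.1 the bracket equals 3.4, so the variance of the mean would be more than three times the value σ²/n given by the theory for independent observations.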
Gosset published five more mathematical papers between 1909 and 1921 ([1913; 1914; 1917; 1919; 1921] 1943, pp. 53-89). With the possible exception of the first of these, they are still of interest. The 1921 paper gave for the first time the correction for ties in calculating Spearman’s rank correlation coefficient.
Agricultural and other biometric studies. It was natural, owing to the high importance of barley quality in brewing, that Gosset should have become interested in agricultural problems. His active interest seems to have started in 1905 when he was first asked for advice by E. S. Beaven, a maltster who had started experimental work in the 1890s. From then onward there was a constant interchange of correspondence and ideas between them, in which the mathematical insight of the younger man supplemented the experimental experience of the older.
Gosset’s first meeting with Fisher was at Rothamsted in August 1922; each had the greatest admiration for the other’s work and doubtless each had considerable influence on the development of the other’s ideas on experimental design. Toward the end of Gosset’s life they had a difference of opinion about the relative merits of random and systematic arrangements, but this did not affect the high regard that they always had for one another.
In 1911 Gosset examined the results of some uniformity trials carried out by Mercer and Hall at Rothamsted (1943, pp. 49-52). In the most important of these, an acre of wheat had been harvested in 1/500-acre plots. Gosset showed from the results how advantage could be taken of the correlation between the yields of adjacent plots to increase the accuracy of varietal comparisons, and he showed that for a given acreage greater accuracy could be obtained with smaller plots rather than with larger plots.
As early as 1912 and 1913 Beaven had invented the “chessboard” design, and experiments had been laid down, each with eight varieties of barley on yard-square plots, in three centers. These were essentially “block designs,” with each variety occurring once in each block; but within the block, the arrangement was balanced rather than random. At this time Gosset discovered the correct estimate of error per plot for the varietal comparisons, precisely the same result as would be obtained from an analysis of variance. He compared every possible pair of varieties and calculated for each pair $\sum (d - \bar{d})^2$, $d$ being the difference in one block. He added these results together for all $n$ varieties and divided by $\tfrac{1}{2}n(n - 1)(m - 1)$, where $m$ is the number of blocks. These experiments were discontinued during World War I, but in 1923 Gosset and Fisher discovered, independently, the analysis of variance method of obtaining the result. In a letter to Gosset, Fisher proved the algebraical equivalence of Gosset’s original method and the new one.
These chessboard designs were small-scale work. For field trials, Gosset and Beaven favored the “half-drill strip method,” in which two varieties were compared on an area of about an acre. In this method, the two varieties are sown in long strips (CAACCAAC, etc.), there being an integral number of “sandwiches” (such as CAAC).
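The algebraic equivalence can be verified on a small table. The sketch below uses hypothetical yields (not Beaven's data) for four varieties in three blocks. With the divisor exactly as quoted above, the pairwise quantity comes out as twice the residual mean square of a blocks-by-varieties analysis of variance, that is, as the estimated variance of a difference between two plots; the function names are assumptions.

```python
from itertools import combinations

# Hypothetical yields: one row per block, one column per variety.
yields = [
    [41.0, 38.5, 44.2, 40.1],
    [39.2, 37.0, 43.5, 38.8],
    [42.3, 40.1, 45.9, 41.0],
]

def pairwise_error(table):
    """Sum of squared deviations of the differences over all pairs of varieties,
    divided by (1/2) n (n - 1) (m - 1) as in the text."""
    m, n = len(table), len(table[0])
    total = 0.0
    for i, j in combinations(range(n), 2):
        d = [row[i] - row[j] for row in table]   # difference in each block
        d_bar = sum(d) / m
        total += sum((x - d_bar) ** 2 for x in d)
    return total / (0.5 * n * (n - 1) * (m - 1))

def residual_mean_square(table):
    """Residual mean square after removing block and variety means."""
    m, n = len(table), len(table[0])
    grand = sum(map(sum, table)) / (m * n)
    block_means = [sum(row) / n for row in table]
    variety_means = [sum(row[j] for row in table) / m for j in range(n)]
    sse = sum((table[b][j] - block_means[b] - variety_means[j] + grand) ** 2
              for b in range(m) for j in range(n))
    return sse / ((m - 1) * (n - 1))

print(pairwise_error(yields), 2 * residual_mean_square(yields))
```

The two printed numbers agree to rounding error for any table of this shape, because the block effects cancel in every within-block difference.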
The error of the varietal comparison was obtained from the variances of the differences (C — A) either in individual strips or in sandwiches. In one such experiment, described by Gosset, on something more than an acre, the standard error of a varietal mean was found to be about 0.6 per cent. Gosset was later criticized by Fisher for preferring this method to randomized strips or randomized sandwiches. Gosset welcomed the advances in the science of agricultural experimentation that came from Fisher and his school. His own attitude was a very practical one, based on his extensive experience in Ireland experimenting with barley.
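The sandwich calculation can be sketched as follows. The strip yields are invented for illustration and the helper name is an assumption; the code takes one difference (C − A) per sandwich and estimates the standard error of the mean difference from the spread of those differences.

```python
from math import sqrt
from statistics import mean, stdev

layout = "CAACCAACCAACCAAC"  # four sandwiches of the form CAAC
# Hypothetical strip yields in arbitrary units, one per half-drill strip.
strip_yields = [10.2, 9.8, 10.0, 10.4,
                10.1, 9.9, 9.7, 10.3,
                10.6, 10.0, 10.1, 10.5,
                9.9, 9.6, 9.8, 10.2]

def sandwich_differences(layout, strip_yields):
    """Difference (mean of the C strips) - (mean of the A strips) in each sandwich."""
    diffs = []
    for start in range(0, len(layout), 4):
        if layout[start:start + 4] != "CAAC":
            raise ValueError("layout must consist of whole CAAC sandwiches")
        c1, a1, a2, c2 = strip_yields[start:start + 4]
        diffs.append((c1 + c2) / 2 - (a1 + a2) / 2)
    return diffs

diffs = sandwich_differences(layout, strip_yields)
mean_diff = mean(diffs)
std_error = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean difference
print(f"mean difference C - A: {mean_diff:.3f}, standard error: {std_error:.3f}")
```

Because each sandwich places the two varieties symmetrically about its center, a steady fertility trend along the field cancels within the sandwich and does not inflate the differences.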
A good account of much of this kind of work is given in Gosset’s most important paper on agricultural experimentation, “On Testing Varieties of Cereals” (1943, pp. 90-114). The paper also describes some large-scale work carried out by the department of agriculture in Ireland during 1901-1906 to find the best variety of barley to grow in that country. Here two varieties, Archer and Goldthorpe, were carried right through the whole period and each tested on two-acre plots in a large number of centers. With 50 pairs of plots of this size, the standard error of the comparison was still about 10 per cent to 15 per cent. However, the result was based on wide experience. In the half-drill strip experiment the corresponding standard error was only 1 per cent, but the result applied only to an acre, in one place, under very particular conditions of soil and season.
While it was important to plan yield trials in such a way as to reduce experimental error and to obtain an accurate estimate of it, it was only by comparison and analysis of the results from a number of soils, seasons, and climates that one could judge the relative value of different varieties or different treatments. Further, products must also be subjected to tests of quality. Conclusions drawn in one center could in any case be applicable only to the particular conditions under which the trials were carried out. While he insisted that “experiments must be capable of being considered to be a random sample of the population to which the conclusions are to be applied,” in an individual center he often preferred balanced (that is, systematic) arrangements to randomized ones. He liked the Latin square, because of its combination of balance (to eliminate soil heterogeneity) with a random element, thus conforming to all the principles of allowed witchcraft (1943, pp. 199-215). He was less happy about randomized blocks because he felt that a balanced arrangement within the blocks often gave a greater accuracy than did a random one. Further, he was unwilling to accept the result of the toss of a coin, or its equivalent, if the arrangement so obtained was biased in relation to already available knowledge of the fertility gradients of the experimental area. In his last paper, “Comparison Between Balanced and Random Arrangements of Field Plots” (1943, pp. 193-215), he wrote:
It is of course perfectly true that in the long run, taking all possible arrangements, exactly as many misleading conclusions will be drawn as are allowed for in the tables, and anyone prepared to spend a blameless life in repeating an experiment would doubtless confirm this; nevertheless it would be pedantic to continue with an arrangement of plots known beforehand to be likely to lead to a misleading conclusion. (p. 202)
He thought that an experimenter with a knowledge of his job could arrange the treatments within a block so that real error, that is, the variance of the different treatment means that would be obtained with dummy treatments in a uniformity trial, would be less than if the treatments had been randomized. This statement was no doubt often true in the domain in which he worked, but its general validity has often been questioned. He distinguished between the real error as here defined and the calculated error, that is, the error variance of the treatment mean, that would be obtained from usual analysis of variance procedures. He maintained, perfectly correctly, that if the real error were reduced by balancing, the calculated error would be too high. In his last paper, he showed, in addition, that in this situation experiments that have a real error less than the calculated one fail to give as many “significant” results as those that have a greater error, if the real treatment differences are small. When, however, the real treatment differences are large, the reverse is the case. Therefore, if balanced arrangements have a small real error, they will less often miss large real differences and more often miss small ones. He regarded this as a positive advantage; where real differences in a particular center were small, he was satisfied to have an upper limit to his error because he thought that only by collating results from different centers could he arrive at the truth. Where real differences were small, even if statistically significant, the results at different centers were likely to be conflicting.
This last paper was written in reply to one by Barbacki and Fisher (1936), which purported to show that the half-drill strip method is less accurate than the corresponding randomized arrangement. Gosset was right in maintaining that these authors were in error, for they had not compared like with like in the actual data they had examined, a uniformity trial carried out by Wiebe (1935). However, the data were not very good for deciding the question, for as subsequently shown by Yates (1939), owing to defective drilling they contained a periodic fluctuation, two drill-widths wide. Gosset would almost certainly have welcomed the combination of balance and randomization achieved by some of the designs invented since his day, which are likely to give a gain in accuracy similar to that obtained by his systematic designs over randomized blocks and at the same time are free from difficulties in error estimation.
In an article on Gosset, Sir Ronald Fisher praised “Student’s” work on genetical evolutionary theory (see Gosset [1907-1938] 1943, pp. 181-191). He concluded: “In spite of his many activities it is the ‘Student’ of ‘Student’s’ test of significance who has won, and deserved to win, a unique place in the history of scientific method” (Fisher 1939, p. 8).
J. O. Irwin
[For the historical context of Gosset’s work, see Distributions, Statistical; Statistics, article on The History of Statistical Method; and the biographies of Fisher, R. A.; and Pearson. For discussion of the subsequent development of his ideas, see Estimation; Experimental Design; Hypothesis Testing.]
(1907–1938) 1943 “Student’s” Collected Papers. Edited by E. S. Pearson and John Wishart. London: University College, Biometrika Office. → William S. Gosset wrote under the pseudonym “Student.” The 1943 edition contains all the articles cited in the text.
Airy, George B. (1861) 1879 On the Algebraical and Numerical Theory of Errors of Observations and the Combination of Observations. 3d ed. London: Macmillan.
Barbacki, S.; and Fisher, R. A. 1936 A Test of the Supposed Precision of Systematic Arrangements. Annals of Eugenics 7:189–193.
Fisher, R. A. (1925) 1958 Statistical Methods for Research Workers. 13th ed. New York: Hafner. → Previous editions were published by Oliver & Boyd.
Fisher, R. A. 1939 “Student.” Annals of Eugenics 9:1–9.
Greenwood, M. Jr.; and White, J. D. C. 1909 On the Frequency Distribution of Phagocytic Counts. Biometrika 6:376–401.
Macdonell, W. R. 1902 On Criminal Anthropometry and the Identification of Criminals. Biometrika 1:177–227.
Merriman, Mansfield (1884) 1911 A Text-book on the Method of Least Squares. 8th ed. New York: Wiley.
Wiebe, G. A. 1935 Variation and Correlation in Grain Yield Among 1,500 Wheat Nursery Plots. Journal of Agricultural Research 50:331–357.
Yates, F. 1939 The Comparative Advantages of Systematic and Randomized Arrangements in the Design of Agricultural and Biological Experiments. Biometrika 30:440–466.
Gosset, William Sealy
(also “Student”)
(b. Canterbury, England, 13 June 1876; d. Beaconsfield, England, 16 October 1937)
The eldest son of Col. Frederic Gosset and Agnes Sealy, Gosset studied at Winchester College and New College, Oxford. He read mathematics and chemistry and took a first-class degree in natural sciences in 1899. In that year he joined Arthur Guinness and Sons, the brewers, in Dublin. Perceiving the need for more accurate statistical analysis of a variety of processes, from barley production to yeast fermentation, he urged the firm to seek mathematical advice. In 1906 he was therefore sent to work under Karl Pearson at University College, London. In the next few years Gosset made his most notable contributions to statistical theory, publishing under the pseudonym “Student.” He remained with Guinness throughout his life, working mostly in Dublin, although he moved to London to take charge of a new brewery in 1935. He married Marjory Surtees in 1906; they had two children.
All of Gosset’s theoretical work was prompted by practical problems arising at the brewery. The most famous example is his 1908 paper, “The Probable Error of a Mean.” He had to estimate the mean value of some characteristic in a population on the basis of very small samples. The theory for large samples had been worked out from the time of Gauss a century earlier, but when in practice large samples could not be obtained economically, there was no accurate theory of estimation. If an n-fold sample gives values $X_1, X_2, \ldots, X_n$, the sample mean
$$m = \frac{1}{n}\sum_{i=1}^{n} X_i$$
is used to estimate the true mean. How reliable is the estimate? Let it be supposed that the characteristic of interest is normally distributed with unknown mean $\mu$ and variance $\sigma^2$. The sample variance is
$$s^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - m)^2.$$
It was usual to take $s$ as an estimate of $\sigma$; if it is assumed that $\sigma = s$, then for any error $e$, the probability that $|m - \mu| \le e$ can be computed; and thus the reliability of the estimate of the mean can be assessed. But if $n$ is small, $s$ is an erratic estimator of $\sigma$; and hence the customary measure of accuracy is invalid for small samples.
Gosset analyzed the distribution of the statistic $z = (m - \mu)/s$. This is asymptotically normal as $n$ increases but differs substantially from the normal for small samples. Experimental results $m$ and $s$ map possible values of $z$ onto possible values of $\mu$. Through this mapping a probability that $|m - \mu| \le e$ is obtained. In particular, for any large probability, say 95 percent, Gosset could compute an error $e$ such that it is 95 percent probable that $|m - \mu| \le e$.
R. A. Fisher observed that the derived statistic $t = (n - 1)^{1/2} z$ can be computed for all $n$ more readily than $z$ can be. What came to be called Student’s t-test of statistical hypotheses consists in rejecting a hypothesis if and only if the probability, derived from $t$, of erroneous rejection is small. In the theory of testing later advanced by Jerzy Neyman and Egon S. Pearson, Student’s t-test is shown to be optimum. In the competing theory of fiducial probability advanced by R. A. Fisher, $t$ is equally central.
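The relation between the two statistics is easy to confirm numerically. The sketch below assumes a small made-up sample and a hypothesized mean, computes z with the divisor-n standard deviation, rescales it by the square root of n − 1, and compares the result with the t statistic in its now usual form, which uses the divisor n − 1 in the standard deviation. All data and names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

sample = [4.9, 5.3, 5.1, 4.6, 5.4, 5.0]  # hypothetical measurements
mu0 = 4.8                                 # hypothesized true mean (assumption)

n = len(sample)
m = mean(sample)
s = sqrt(sum((x - m) ** 2 for x in sample) / n)  # divisor n, as Gosset defined s

z = (m - mu0) / s
t_from_z = sqrt(n - 1) * z

# The familiar textbook form: stdev() uses divisor n - 1.
t_usual = (m - mu0) / (stdev(sample) / sqrt(n))

print(f"z = {z:.4f}, t from z = {t_from_z:.4f}, usual t = {t_usual:.4f}")
```

The two values of t coincide because the factor √(n − 1) exactly converts the divisor-n standard deviation into the standard error of the mean based on divisor n − 1.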
Gosset was perhaps lucky that he hit on the statistic which has proved basic for the statistical analysis of the normal distribution. His real insight lies in his observation that the sampling distribution of such statistics is fundamental for inference. In particular, it paved the way for the analysis of variance, which was to occupy such an important place in the next generation of statistical workers.
Gosset’s “Student’s” Collected Papers were edited by E. S. Pearson and John Wishart (Cambridge-London, 1942; 2nd ed., 1947).
For further biography, consult E. S. Pearson, “Student as Statistician,” in Biometrika, 30 (1938), 210–250; and “Studies in the History of Probability and Statistics, XVII,” ibid., 54 (1967), 350–353; and “...XX,” ibid., 55 (1968), 445–457.