# Student’s T-Statistic

In measuring social and economic progress, statistical methods that account for uncertainty of various magnitudes are used for inference and decision-making. One task in statistical inference is to estimate a population mean from a sample of observations. For example, one may want to estimate the mean retail price (μ) of a gallon of unleaded gasoline in a large city based on prices from a few gas stations. A natural estimate of μ is the arithmetic sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ of the observed prices $X_1, \ldots, X_n$. The estimator $\bar{X}$ has many desirable statistical properties, especially when $X_1, \ldots, X_n$ are independent and identically distributed as a normal (Gaussian) random variable with mean μ and variance σ². From properties of the normal distribution, the following Z-statistic

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \tag{1}$$

is a standard normal random variable with mean 0 and variance 1. This Z-statistic has been used to make statistical inferences and decisions when the variance σ² is known or the number of observations n is large. In most applications, however, σ² is unknown and n may be small. William Sealy Gosset (1876–1937), a chemist at the brewery of Arthur Guinness Sons and Co. (Boland 1984), studied the small-sample properties of

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \tag{2}$$

where $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is the sample variance. Gosset’s work gave birth to Student’s T-statistic. Gosset published his work in *Biometrika* in 1908 under the pseudonym “Student,” drawing on his 1904 report to the company titled “The Application of the Law of Error to Work of the Brewery.” Gosset’s employer objected to making public work done for the company but allowed him to publish it under a pseudonym. Gosset’s original statistic was

$$z = \frac{\bar{X} - \mu}{s}, \qquad s^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2 \tag{3}$$

The current formulation of Student’s T-statistic in expression (2) is due to the English statistician Ronald Aylmer Fisher (1925).
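
Gosset’s statistic and Fisher’s formulation differ only by a constant factor, because one variance estimate divides by n and the other by n − 1. A short derivation, assuming Gosset’s statistic has the form $z = (\bar{X} - \mu)/s$ with $s^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$:

$$S^2 = \frac{n}{n-1}\,s^2 \quad\Longrightarrow\quad T = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\bar{X} - \mu}{s}\,\sqrt{n}\,\sqrt{\frac{n-1}{n}} = z\sqrt{n-1}.$$

So tables for one statistic can be converted to the other by the factor $\sqrt{n-1}$.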

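Student’s T-statistic was one of the important breakthroughs in statistical science in the twentieth century (Kotz and Johnson 1992). The density function of Student’s T-statistic with v (integer) degrees of freedom is

$$f(t) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\;\Gamma\left(\frac{v}{2}\right)}\left(1 + \frac{t^2}{v}\right)^{-(v+1)/2}, \qquad -\infty < t < \infty, \tag{4}$$
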
where Γ(a) is the Gamma function defined as

$$\Gamma(a) = \int_0^\infty x^{a-1} e^{-x}\,dx, \qquad a > 0. \tag{5}$$

The density functions of the Z-statistic and Student’s T-statistic are both bell-shaped curves, with Student’s T having heavier tails. The density function of Student’s T-statistic relative to the density function of the standard normal is shown in Figure 1.

The distribution of Student’s T-statistic is Student’s T-distribution, and statistical tests based on Student’s T-statistic are known as Student’s T-tests. The expectation (mean) of Student’s T-statistic is 0 for v > 1, and its variance is v/(v − 2) for v > 2. When v = 1, Student’s T-statistic is a Cauchy random variable, which has neither a mean nor a variance; when v = 2, the variance is infinite. Student’s T-statistic defined in expression (2) has n − 1 degrees of freedom.
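
The density formula and the moment results above can be checked numerically. The following is a minimal sketch of our own, not part of the original entry: it codes the density with Python’s `math.gamma`, compares it with the standard normal density at t = 3 to show the heavier tail, and approximates the total area and the variance for v = 5 with Simpson’s rule (a quadrature choice we assume here; any accurate integrator would do).

```python
import math


def t_pdf(t: float, v: int) -> float:
    """Student's T density with v degrees of freedom, written from the density formula."""
    c = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return c * (1 + t * t / v) ** (-(v + 1) / 2)


def normal_pdf(z: float) -> float:
    """Standard normal density, for comparison."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def simpson(f, a: float, b: float, n: int = 20_000) -> float:
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


v = 5
# Heavier tails: at t = 3 the T density is several times the normal density.
print(f"f_T(3) = {t_pdf(3, v):.5f}, phi(3) = {normal_pdf(3):.5f}")

# The density is symmetric, so integrate over [0, 200] and double;
# for v = 5 the mass and second moment beyond |t| = 200 are negligible.
area = 2 * simpson(lambda t: t_pdf(t, v), 0.0, 200.0)
variance = 2 * simpson(lambda t: t * t * t_pdf(t, v), 0.0, 200.0)
print(f"area = {area:.6f}, variance = {variance:.4f}, v/(v-2) = {v / (v - 2):.4f}")
```

For v ≤ 2 the variance integral diverges, so the same numerical check would not settle on a finite value, consistent with the statements above.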

Student’s T-statistic has been used for estimation and decision-making in many fields of science, including agriculture, biology, economics, public health, and zoology. As a simple illustration, to estimate the average retail price per gallon of unleaded gasoline in a large city, the author of this entry conducted an informal random sample survey on his way home (a distance of seventeen miles) at the end of April 2006. Excluding duplicates, he observed prices of \$2.999, \$2.879, \$2.959, \$2.839, \$2.899, \$3.019, and \$2.919. The sample mean ($\bar{X}$) of these seven observations was \$2.930. Under the assumption that gasoline prices are normally distributed, the sample mean is a good point estimator of the underlying mean μ. Because the sample is random, a point estimate alone does not convey how precise it is; a more informative approach is to construct an interval estimator. A classic method is to find the 100(1 − α)% confidence interval for the underlying mean gasoline price based on Student’s T-statistic:

$$\left(\bar{X} - t_{1-\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}},\ \ \bar{X} + t_{1-\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}\right), \tag{6}$$

where $t_{1-\alpha/2,\,n-1}$ is the critical value that satisfies

$$P\left(|T| \ge t_{1-\alpha/2,\,n-1}\right) = \alpha$$

and α is the probability that the confidence interval does not cover the true underlying mean μ. Critical values of Student’s T-statistic for various values of α and degrees of freedom are available in statistical software packages and in the tables of most basic statistics textbooks. A commonly used confidence level for interval estimation is 95% (α = 0.05). From our sample observations (n = 7, $\bar{X}$ = 2.930, sample standard deviation S ≈ 0.065, and $t_{0.975,\,6}$ = 2.447), the 95% confidence interval for μ is

$$2.930 \pm 2.447 \times \frac{0.065}{\sqrt{7}} = 2.930 \pm 0.060,$$

which is (2.870, 2.990).
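
The interval calculation can be reproduced with a few lines of Python using only the standard library’s `statistics` module. This sketch takes the critical value 2.447 from the text rather than computing it. Carried at full precision (mean ≈ 2.9304, S ≈ 0.0652), the upper limit comes out near 2.991 instead of the 2.990 obtained from the rounded values 2.930 and 0.065; the difference is rounding only.

```python
import math
import statistics

prices = [2.999, 2.879, 2.959, 2.839, 2.899, 3.019, 2.919]
n = len(prices)

mean = statistics.mean(prices)   # sample mean X-bar
s = statistics.stdev(prices)     # sample standard deviation S (divisor n - 1)
se = s / math.sqrt(n)            # estimated standard error of X-bar
t_crit = 2.447                   # t_{0.975, 6}, as quoted in the text

lower, upper = mean - t_crit * se, mean + t_crit * se
print(f"mean = {mean:.4f}, S = {s:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```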

In addition to estimation, Student’s T-statistic is also useful for hypothesis testing. The hypothesis to be tested is called the null hypothesis. For example, the null hypothesis may be $H_0: \mu = \mu_0$, and the alternative hypothesis can be set as $H_a: \mu \ne \mu_0$ (a two-sided test). One may commit two types of errors in testing statistical hypotheses. Rejecting the null hypothesis when it is true is a Type I error, and accepting (failing to reject) the null hypothesis when it is false is a Type II error. The probability of a Type I error is denoted by α, the same value used above to construct a confidence interval. The probability of a Type II error is denoted by β. For a fixed sample size, the Type I and Type II error probabilities are inversely related: lowering one raises the other. The power of the test is 1 − β, the probability of rejecting the null hypothesis when it is false. The maximum probability of a Type I error is the size of the test, and the critical (rejection) region for a two-sided Student’s T-test of size α with v degrees of freedom is $\{|T| \ge t_{1-\alpha/2,\,v}\}$. The p-value is the probability, computed under the null hypothesis, of obtaining a test statistic at least as contradictory to the null hypothesis as the observed Student’s T-statistic. Using the seven observations above, we can conduct a two-sided Student’s T-test of $H_0: \mu = 3.000$ versus $H_a: \mu \ne 3.000$ by constructing the test statistic:

$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{2.930 - 3.000}{0.065/\sqrt{7}} = -2.8493.$$

The absolute value of the observed Student’s T-statistic satisfies $|T| > 2.447 = t_{1-0.05/2,\,6}$. That is, the observed Student’s T-statistic (−2.8493) lies in the critical region, which leads to rejecting the null hypothesis. A very popular way of performing a statistical hypothesis test is to compute the p-value. The two-sided p-value in the above example is $2P(T \ge 2.8493) = 0.0292$, which is less than the commonly used size of 0.05. Therefore, one would reject the null hypothesis μ = 3.000 and conclude that the underlying mean retail price of unleaded gasoline in that city was different from \$3.000 per gallon.
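
The test can be sketched in Python as follows. As an assumption of this illustration, the p-value is obtained by integrating the T density numerically with Simpson’s rule; statistical packages use more refined methods. At full precision the statistic is about −2.824 and the p-value about 0.030, rather than −2.8493 and 0.0292, which presumably reflect the rounded values 2.930 and 0.065; the conclusion at the 0.05 level is the same. The last line checks that the rounded statistic reproduces the p-value in the text.

```python
import math
import statistics


def t_pdf(t: float, v: int) -> float:
    """Student's T density with v degrees of freedom."""
    c = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return c * (1 + t * t / v) ** (-(v + 1) / 2)


def two_sided_p(t_obs: float, v: int, steps: int = 2_000) -> float:
    """P(|T| >= |t_obs|) = 1 - 2 * (integral of the density from 0 to |t_obs|), via Simpson's rule."""
    b = abs(t_obs)
    h = b / steps
    total = t_pdf(0.0, v) + t_pdf(b, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 1 - 2 * total * h / 3


prices = [2.999, 2.879, 2.959, 2.839, 2.899, 3.019, 2.919]
mu0 = 3.000
n = len(prices)

t_obs = (statistics.mean(prices) - mu0) / (statistics.stdev(prices) / math.sqrt(n))
p_value = two_sided_p(t_obs, n - 1)
reject = abs(t_obs) > 2.447  # critical value t_{0.975, 6} quoted in the text
print(f"T = {t_obs:.4f}, p = {p_value:.4f}, reject H0 at size 0.05: {reject}")

# Plugging in the rounded statistic from the text reproduces its p-value.
print(f"p for |T| = 2.8493: {two_sided_p(2.8493, 6):.4f}")
```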

The simple example above on estimation and hypothesis testing is based on a two-sided Student’s T-statistic. Similar estimation and hypothesis testing can be carried out with one-sided alternatives. In planning a scientific investigation, researchers need to decide how many observations are needed to control both Type I and Type II errors, and Student’s T-statistic plays a fundamental role in such designs. Alan Agresti and Barbara Finlay (1997) provide an introduction to statistical estimation and hypothesis testing.
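
As a rough planning aid, a common normal-approximation formula gives the number of observations needed for a two-sided one-sample test of size α to detect a shift δ in the mean with power 1 − β: n ≈ ((z₁₋α/₂ + z₁₋β) σ/δ)². This is not the entry’s own method: an exact T-based calculation uses the noncentral T-distribution and typically calls for a slightly larger n when n is small. The function name `approx_sample_size` and the planning values σ = 0.065 and δ = 0.05 are hypothetical.

```python
import math
from statistics import NormalDist


def approx_sample_size(sigma: float, delta: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Hypothetical helper: normal-approximation n for a two-sided one-sample test.

    Slightly understates the n that a T-test needs when n is small.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    n = ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2
    return math.ceil(n)


# Hypothetical planning values: sigma close to the gasoline sample SD, shift of 5 cents.
print(approx_sample_size(0.065, 0.05))
print(approx_sample_size(0.065, 0.05, power=0.90))
```

Raising the desired power, or shrinking δ, increases the required n, which reflects the trade-off between Type I and Type II errors described above.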

Student’s T-statistic can be extended in many directions. For example, if $X_i$ denotes the difference between a gas station’s prices on two different occasions, one can conduct a paired Student’s T-test to quantify the changes. To compare gasoline prices in two cities statistically, one may construct a two-sample Student’s T-statistic

$$T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} \tag{7}$$

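where $n_1$ and $n_2$ are the numbers of observations from sample (city) one and two, respectively, $\bar{X}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} X_{ji}$, and $S_j^2 = \frac{1}{n_j - 1}\sum_{i=1}^{n_j}(X_{ji} - \bar{X}_j)^2$ for $j = 1, 2$. If the two samples have equal variances, one can form the pooled Student’s T-statistic as

$$T = \frac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \tag{8}$$
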
where

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} \tag{9}$$

is the pooled sample variance. Under the null hypothesis of equal means ($H_0: \mu_1 = \mu_2$), the pooled two-sample Student’s T-statistic follows Student’s T-distribution with $n_1 + n_2 - 2$ degrees of freedom if the observations $X_{1i}$ and $X_{2i}$ are independent and normally distributed with a common variance.
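
Both two-sample statistics can be computed with Python’s `statistics` module. This is a minimal sketch on hypothetical prices from two cities, invented for illustration and not data from the entry. For the unpooled statistic it also reports the Welch–Satterthwaite approximate degrees of freedom, a standard approximation not discussed in the entry, because without equal variances that statistic is only approximately T-distributed.

```python
import math
import statistics


def welch_t(x1, x2):
    """Unpooled two-sample T and its Welch-Satterthwaite approximate degrees of freedom."""
    a = statistics.variance(x1) / len(x1)  # S_1^2 / n_1
    b = statistics.variance(x2) / len(x2)  # S_2^2 / n_2
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (len(x1) - 1) + b ** 2 / (len(x2) - 1))
    return t, df


def pooled_t(x1, x2):
    """Pooled two-sample T (equal variances assumed) with n_1 + n_2 - 2 degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * statistics.variance(x1) + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical prices in dollars per gallon; not data from the entry.
city_1 = [2.90, 2.92, 2.94, 2.96, 2.98]
city_2 = [2.96, 3.00, 3.04]
print("unpooled:", welch_t(city_1, city_2))
print("pooled:  ", pooled_t(city_1, city_2))
```

A paired T-test, as described for one station’s prices on two occasions, is simply the one-sample statistic of expression (2) applied to the within-station differences.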

For more advanced statistical inference, such as correlation analysis, linear regression, and generalized linear models, an estimated parameter divided by its estimated standard error follows, exactly or approximately, Student’s T-distribution. Hence, one can perform statistical inference on the estimated parameters based on Student’s T-statistic.
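
In simple linear regression with independent normal errors, for instance, the slope estimate divided by its standard error has a T-distribution with n − 2 degrees of freedom under the hypothesis that the true slope is zero. The sketch below computes this by hand on hypothetical data chosen only to make the arithmetic easy to follow; `slope_t` is an illustrative name, not a library function.

```python
import math


def slope_t(x, y):
    """Least-squares slope b, its standard error, and T = b / SE(b) with n - 2 degrees of freedom."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)  # residual variance estimate divided by Sxx
    return b, se_b, b / se_b, n - 2


# Hypothetical data, chosen only to illustrate the calculation.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, se_b, t, df = slope_t(x, y)
print(f"b = {b:.3f}, SE(b) = {se_b:.4f}, T = {t:.4f} on {df} df")
```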

As the sample size increases, the degrees of freedom increase and Student’s T-statistic converges in distribution to the standard normal random variable (Z). Asymptotically, one may therefore replace Student’s T-test with the Z-test when the degrees of freedom are large (a common rule of thumb is n ≥ 30). Student’s T-statistic is widely used for small-sample analysis. One of the fundamental assumptions in deriving the distribution of Student’s T-statistic is the normality of the observations $X_i$, an assumption that is difficult to check with a small sample. Many nonparametric (distribution-free) procedures have been proposed for statistical inference without assuming normality of $X_i$ (Hollander and Wolfe 1999). For inference on more than two samples, Fisher (1925) extended the two-sample Student’s T-test to the analysis of variance (ANOVA) when the observations are normally distributed. Without assuming normality, many rank-based techniques have been developed as nonparametric counterparts of the two-sample Student’s T-test and Fisher’s ANOVA (Hollander and Wolfe 1999). Of the great many statistical methods developed in the twentieth century, Student’s T-statistic is one of the most widely used, not only by professional statisticians but by all scientists involved in data analysis and decision-making.
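
The convergence to the normal distribution shows up directly in the critical values. This sketch, our own and using only the standard library, finds $t_{0.975,\,v}$ by bisection on a Simpson’s-rule approximation of P(|T| ≤ c) and compares it with the normal value from `statistics.NormalDist`. It should reproduce the 2.447 quoted earlier for v = 6 and shrink toward 1.960 as v grows. The helper names are illustrative.

```python
import math
from statistics import NormalDist


def t_pdf(t: float, v: int) -> float:
    """Student's T density; lgamma avoids overflow of the Gamma function for large v."""
    log_c = math.lgamma((v + 1) / 2) - math.lgamma(v / 2) - 0.5 * math.log(v * math.pi)
    return math.exp(log_c) * (1 + t * t / v) ** (-(v + 1) / 2)


def central_prob(c: float, v: int, steps: int = 2_000) -> float:
    """P(|T| <= c) = 2 * integral of the density over [0, c], via Simpson's rule."""
    h = c / steps
    total = t_pdf(0.0, v) + t_pdf(c, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 2 * total * h / 3


def t_critical(alpha: float, v: int) -> float:
    """Find c with P(|T| >= c) = alpha by bisection; c plays the role of t_{1-alpha/2, v}."""
    lo, hi = 0.0, 50.0  # for alpha = 0.05, 50 exceeds t_{0.975, v} for every v >= 1
    for _ in range(40):
        mid = (lo + hi) / 2
        if central_prob(mid, v) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for v in (6, 30, 120):
    print(f"v = {v:3d}: t_crit = {t_critical(0.05, v):.3f}")
print(f"normal : z_crit = {NormalDist().inv_cdf(0.975):.3f}")
```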

SEE ALSO Descriptive Statistics; Distribution, Normal; Hypothesis and Hypothesis Testing; Probability Distributions; Test Statistics

## BIBLIOGRAPHY

Agresti, Alan, and Barbara Finlay. 1997. *Statistical Methods for the Social Sciences*. 3rd ed. Upper Saddle River, NJ: Prentice Hall.

Boland, Philip J. 1984. “A Biographical Glimpse of William Sealy Gosset.” *American Statistician* 38: 179–183.

Fisher, Ronald A. 1925. *Statistical Methods for Research Workers*. Edinburgh, U.K.: Oliver and Boyd.

Hollander, Myles, and Douglas A. Wolfe. 1999. *Nonparametric Statistical Methods*. 2nd ed. New York: Wiley.

Kotz, Samuel, and Norman L. Johnson, eds. 1992. *Breakthroughs in Statistics: Methodology and Distribution*. Vol. 2. New York: Springer.

Student (William Sealy Gosset). 1908. “The Probable Error of a Mean.” *Biometrika* 6 (1): 1–25.

Dejian Lai