Chi-Square Test
Studies often collect data on categorical variables that can be summarized as a series of counts. These counts are commonly arranged in a tabular format known as a contingency table. For example, a study designed to determine whether there is an association between cigarette smoking and asthma might collect data that could be assembled into a 2×2 table. In this case, the two columns could be defined by whether the subject smoked or not, while the rows could represent whether or not the subject experienced symptoms of asthma. The cells of the table would contain the number of observations or patients as defined by these two variables.
The chi-square test statistic can be used to evaluate whether there is an association between the rows and columns in a contingency table. More specifically, this statistic can be used to determine whether there is any difference between the study groups in the proportions of the risk factor of interest. Returning to our example, the chi-square statistic could be used to test whether the proportion of individuals who smoke differs by asthmatic status.
The chi-square test statistic is designed to test the null hypothesis that there is no association between the rows and columns of a contingency table. This statistic is calculated by first obtaining, for each cell in the table, the expected number of events that will occur if the null hypothesis is true. When the observed number of events deviates significantly from the expected counts, then it is unlikely that the null hypothesis is true, and it is likely that there is a row-column association. Conversely, a small chi-square value indicates that the observed values are similar to the expected values, leading us to conclude that the null hypothesis is plausible. The general formula used to calculate the chi-square (χ²) test statistic is as follows:

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}, \qquad df = (r - 1)(c - 1) \tag{1}$$

where $O_i$ = observed count in category $i$; $E_i$ = expected count in category $i$ under the null hypothesis; $df$ = degrees of freedom; and $c$ and $r$ represent the number of columns and rows in the contingency table.

Table 1. Observed values for data presented in a two-by-two table

| Variable 2 | Variable 1: Yes | Variable 1: No | Total |
|---|---|---|---|
| Yes | a | b | a+b |
| No | c | d | c+d |
| Total | a+c | b+d | n |

Source: Courtesy of author.
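As a rough illustration of the formula, the sketch below sums (O − E)²/E over the cells of a table and derives the degrees of freedom from the table's dimensions. The function names and the small tables of counts are assumptions made for this example only; they do not come from the article.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of two equally shaped tables."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total


def degrees_of_freedom(table):
    """Return (r - 1) * (c - 1) for a table with r rows and c columns."""
    r = len(table)
    c = len(table[0])
    return (r - 1) * (c - 1)


# Illustrative counts only (hypothetical, not taken from the article).
expected = [[10.0, 20.0], [30.0, 40.0]]
same = [[10, 20], [30, 40]]       # observed counts equal the expected counts
shifted = [[15, 15], [25, 45]]    # same margins, but cells depart from expectation

stat_same = chi_square_statistic(same, expected)        # no deviation at all
stat_shifted = chi_square_statistic(shifted, expected)  # grows with the deviation
df = degrees_of_freedom(same)
```

When the observed counts match the expected counts exactly the statistic is zero, and it grows as the cells move away from expectation, which is the behavior the paragraph above describes.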
The value of the chi-square statistic cannot be negative and can assume values from zero to infinity. The p-value for this test statistic is based on the chi-square probability distribution and is generally extracted from published tables or estimated using computer software programs. The p-value is the probability, if the null hypothesis were true, of obtaining a chi-square statistic at least as extreme as the one observed. As with the t and F distributions, there is a different chi-square distribution for each possible value of degrees of freedom. Chi-square distributions with a small number of degrees of freedom are highly skewed; however, this skewness is attenuated as the number of degrees of freedom increases. In general, the degrees of freedom for tests of hypothesis that involve an r×c contingency table is equal to (r−1)×(c−1); thus for any 2×2 table, the degrees of freedom is equal to one. A chi-square distribution with one degree of freedom is the distribution of the square of a standard normal variable, and, consequently, either the chi-square or standard normal table can be used to determine the corresponding p-value.

Table 2. Expected values for data presented in a two-by-two table

| Variable 2 | Variable 1: Yes | Variable 1: No | Total |
|---|---|---|---|
| Yes | (a+b)(a+c)/n | (a+b)(b+d)/n | a+b |
| No | (c+d)(a+c)/n | (c+d)(b+d)/n | c+d |
| Total | a+c | b+d | n |

Source: Courtesy of author.
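The link between one degree of freedom and the standard normal distribution can be sketched in a few lines. Assuming the statistic is distributed as the square of a standard normal variable Z, the upper-tail probability is P(|Z| ≥ √x), which Python's standard `math.erfc` gives directly. The function name is an assumption made for this example.

```python
import math


def p_value_one_df(chi_square):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    With one degree of freedom the statistic is distributed as Z**2 for a
    standard normal Z, so P(Z**2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if chi_square < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(chi_square / 2.0))


# 1.96 is the familiar two-sided 5 percent cutoff of the standard normal
# table, so its square should give a p-value of about 0.05 here.
p_at_cutoff = p_value_one_df(1.96 ** 2)
```

This shortcut applies only to one degree of freedom; larger tables need the chi-square distribution for the appropriate degrees of freedom.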
The chi-square test is most widely used to conduct tests of hypothesis that involve data that can be presented in a 2×2 table. Indeed, this tabular format is a feature of the case-control study design that is commonly used in public health research. Within this contingency table, we could denote the observed counts as shown in Table 1. Under the null hypothesis of no association between the two variables, the expected number in each cell is calculated from the observed row and column totals using the formulas outlined in Table 2.
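The expected-count rule in Table 2 (row total times column total, divided by n) can be sketched as follows. The function name and the counts a, b, c, d are hypothetical and chosen only to make the arithmetic easy to check by hand.

```python
def expected_counts(observed):
    """Expected cell counts under no association: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


# Hypothetical 2x2 counts, for illustration only.
a, b, c, d = 10, 20, 30, 40
expected = expected_counts([[a, b], [c, d]])

# The first cell should agree with the Table 2 entry (a+b)(a+c)/n.
first_cell = (a + b) * (a + c) / (a + b + c + d)
```

Because the expected counts are built from the margins, each row and column of the expected table adds up to the same total as in the observed table.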
The use of the chi-square test can be illustrated by using hypothetical data from a study investigating the association between smoking and asthma among adults observed in a community health clinic. The results obtained from classifying 150 individuals are shown in Table 3. As Table 3 shows, among asthmatics the proportion of smokers was 40 percent (20/50), while the corresponding proportion among asymptomatic individuals was 22 percent (22/100). By applying the formula presented in Table 2, for the observed cell counts of 20, 30, 22, and 78 (Table 3) the corresponding expected counts are 14, 36, 28, and 72. The observed and expected counts can then be used to calculate the chi-square test statistic as outlined in Equation 1. The resulting value of the chi-square test statistic is approximately 5.36, and the associated p-value for this chi-square distribution that has one degree of freedom is 0.02. Therefore, if there were truly no association between smoking and asthma, there would be a 2 in 100 probability of observing a difference in proportions at least as large as 18 percentage points (40 percent minus 22 percent) by chance alone. We would therefore conclude that the observed difference in the proportions is unlikely to be explained by chance alone, and consider this result statistically significant.

Table 3. Hypothetical data showing chi-square test

| Symptoms of asthma | Ever smoked cigarettes: Yes | Ever smoked cigarettes: No | Total |
|---|---|---|---|
| Yes | 20 | 30 | 50 |
| No | 22 | 78 | 100 |
| Total | 42 | 108 | 150 |

Source: Courtesy of author.
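For readers who want to follow the arithmetic, the calculation can be written out from the observed and expected counts quoted above. The first expected count, for example, is the product of its row and column totals divided by the sample size, and the remaining cells follow the same pattern. Intermediate terms are rounded to three decimals.

$$E_{\text{asthma, smoker}} = \frac{50 \times 42}{150} = 14$$

$$\chi^2 = \frac{(20-14)^2}{14} + \frac{(30-36)^2}{36} + \frac{(22-28)^2}{28} + \frac{(78-72)^2}{72} \approx 2.571 + 1.000 + 1.286 + 0.500 \approx 5.36$$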
Because the construction of the chi-square test makes use of discrete data to estimate a continuous distribution, some authors will apply a continuity correction when calculating this statistic. Specifically,

$$\chi^2 = \sum_{i} \frac{\left(|O_i - E_i| - 0.5\right)^2}{E_i}$$

where $|O_i - E_i|$ is the absolute value of the difference between $O_i$ and $E_i$, and the term 0.5 in the numerator is often referred to as Yates' correction factor. This correction factor serves to reduce the chi-square value and, therefore, increases the resulting p-value. It has been suggested that this correction yields an overly conservative test that may fail to reject a false null hypothesis. However, as long as the sample size is large, the effect of the correction factor is negligible.
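To see the size of the correction on the smoking and asthma example, the sketch below computes the statistic with and without the 0.5 adjustment. The function names are assumptions made for this example, and the p-value uses the standard normal relationship that holds for one degree of freedom.

```python
import math


def chi_square_2x2(observed, expected, yates=False):
    """Chi-square statistic for a 2x2 table, optionally with Yates' correction."""
    correction = 0.5 if yates else 0.0
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (abs(o - e) - correction) ** 2 / e
    return total


def p_value_one_df(chi_square):
    """Upper-tail p-value for 1 degree of freedom via the standard normal."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# Observed and expected counts from the smoking and asthma example.
observed = [[20, 30], [22, 78]]
expected = [[14.0, 36.0], [28.0, 72.0]]

plain = chi_square_2x2(observed, expected)                   # about 5.36
corrected = chi_square_2x2(observed, expected, yates=True)   # about 4.50
p_plain = p_value_one_df(plain)
p_corrected = p_value_one_df(corrected)
```

In this example the corrected statistic is smaller and its p-value larger, although both remain below the conventional 0.05 threshold.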
When there is a small number of counts in the table, the use of the chi-square test statistic may not be appropriate. Specifically, it has been recommended that this test not be used if any cell in the table has an expected count of less than one, or if more than 20 percent of the cells have an expected count that is less than five. Under this scenario, Fisher's exact test is recommended for conducting tests of hypothesis.
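A minimal sketch of this decision is shown below. It assumes the commonly stated form of the rule (no expected count below one, and no more than 20 percent of expected counts below five). For Fisher's exact test it uses one common two-sided convention: sum the hypergeometric probabilities of all tables with the same margins that are no more probable than the observed table. The function names and the small table are hypothetical.

```python
from math import comb


def chi_square_is_appropriate(expected):
    """Apply the rule of thumb to a table of expected counts."""
    cells = [e for row in expected for e in row]
    any_below_one = any(e < 1 for e in cells)
    share_below_five = sum(e < 5 for e in cells) / len(cells)
    return not any_below_one and share_below_five <= 0.20


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= p_observed * (1 + 1e-9))


# Hypothetical small table, for illustration only: observed [[3, 1], [1, 3]]
# has every expected count equal to 2, so the rule of thumb fails.
small_expected = [[2.0, 2.0], [2.0, 2.0]]
use_chi_square = chi_square_is_appropriate(small_expected)
p_fisher = fisher_exact_two_sided(3, 1, 1, 3)
```

For the smoking and asthma example, all expected counts are well above five, so the same check would allow the chi-square test.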
Paul J. Villeneuve
(see also: Normal Distributions; Probability Model; Sampling; Statistics for Public Health; T-Test)
Bibliography
Cochran, W. G. (1954). "Some Methods for Strengthening the Common χ² Test." Biometrics 10:417–451.
Grizzle, J. E. (1967). "Continuity Correction in the χ² Test for 2×2 Tables." The American Statistician 21:28–32.
Pagano, M., and Gauvreau, K. (2000). Principles of Biostatistics, 2nd edition. Pacific Grove, CA: Duxbury Press.
Rosner, B. (2000). Fundamentals of Biostatistics, 5th edition. Pacific Grove, CA: Duxbury Press.