significance tests

Updated May 29 2018

significance tests A variety of statistical techniques are used in empirical social research to establish whether a relationship between variables observed in a sample may be inferred to hold in the population from which the sample was drawn. These techniques assess how rare, unusual, or unexpected the results obtained would be if there were no such relationship in the population. Significance tests form the central core of statistical inference: they are the analytical technique by which statistical inferences about the relationship between two or more variables in a sample can be generalized to a population. They are therefore subject to the same conditions concerning sampling procedures (notably random sampling) as apply to all statistical inference.

The starting-point is the null hypothesis, which states that any observed differences in the research results are due to sampling error rather than to genuine differences: in other words, there is no relationship in the population between the variables under test. In a simple test of the relationship between two dichotomous variables, the difference between the category proportions in the sample is compared with the size of its standard error. The standard error depends both on the sample size and on the variability of the dependent variable in the population (which is itself estimated from the sample). The ratio of the sample difference to its standard error is the test statistic, and it is assessed against certain predetermined levels of significance. These depend on the level of error deemed acceptable in drawing inferences about the relationship between the variables in the population from the relationship observed in the sample. The null hypothesis is rejected if the test statistic falls within a range that would have only a very small probability of occurring if the null hypothesis were true, as deduced from a theoretical distribution such as the normal distribution (the so-called region of rejection).
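The two-proportion test just described can be sketched in Python as follows. This is a minimal illustration rather than a prescribed method: it assumes samples large enough for the normal approximation, estimates the standard error from the pooled sample proportion (the usual choice when the null hypothesis is assumed true), and uses a hypothetical function name and hypothetical counts.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a: int, n_a: int, successes_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-tailed p-value) for H0: the two population proportions are equal."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under H0 both samples share one proportion, so pool them to estimate it.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    # The standard error reflects both the variability and the two sample sizes.
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    # Test statistic: the sample difference measured in standard errors.
    z = (p_a - p_b) / se
    # Probability, under H0, of a statistic at least this far from zero in either tail.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 of 100 respondents in group A vs 45 of 100 in group B.
z, p = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.2f}, p = {p:.3f}")  # prints: z = 2.12, p = 0.034
```

Here p is below 0.05 but above 0.01, so with these made-up figures the null hypothesis would be rejected at the 5 per cent level but not at the 1 per cent level.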

Significance tests offer various levels of significance or confidence: they state that a particular statistical result would occur by chance less than one time in a thousand (the 0.001 level), less than one time in a hundred (the 0.01 level), or less than one time in twenty (the 0.05 level). In a related way, we can choose to state, with a 5 per cent probability of error, the range within which the difference between the variable proportions in the population will fall. Or we can reduce our probability of error to 1 per cent, which has the effect of widening the range within which we expect to find the difference in the population. The former range is referred to as the 95 per cent confidence interval (corresponding to the 5 per cent, or 0.05, level of significance) and the latter as the 99 per cent confidence interval (corresponding to the 1 per cent, or 0.01, level). Depending on the degree of certainty required by the researcher, results at any of these levels might be accepted as significant, but the ‘less than one chance in a thousand’ level is regarded as the most stringent, since results significant at that level would so rarely arise by chance alone that chance becomes a very implausible explanation of them.
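The trade-off between probability of error and width of range can be illustrated with a normal-approximation confidence interval for a difference between two proportions. This Python sketch assumes large samples and uses the unpooled standard error (no null hypothesis is assumed when estimating a range); the function name and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(successes_a, n_a, successes_b, n_b, confidence=0.95):
    """Normal-approximation interval for the population difference p_a - p_b."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Unpooled standard error of the difference between the sample proportions.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided critical value: about 1.96 for 95 per cent, about 2.58 for 99 per cent.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_a - p_b
    return diff - z_crit * se, diff + z_crit * se


# Lowering the probability of error from 5 to 1 per cent widens the interval.
for level in (0.95, 0.99):
    low, high = diff_confidence_interval(60, 100, 45, 100, confidence=level)
    print(f"{level:.0%} interval: {low:.3f} to {high:.3f}")
```

With these hypothetical counts the 99 per cent interval is wider than the 95 per cent one and, unlike it, includes zero, which suggests the difference is significant at the 5 per cent level but not at the 1 per cent level.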

Having established the level of significance in which he or she is interested, the researcher can then consult published tables to find (or have the computer calculate) the critical value of the ratio of the difference between the sample proportions to its standard error that must be exceeded for the difference to be statistically significant: that is, for the sample difference to be large enough to suggest that a difference also exists in the population. If this value is exceeded, one may conclude that there is evidence to reject the null hypothesis of no difference between the proportions in the population. The probability of wrongly rejecting a null hypothesis that is in fact true (a Type I error) equals whatever level of significance has been chosen (usually 5 per cent or 1 per cent).
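In place of published tables, critical values of the standard normal distribution can be computed directly. A minimal Python sketch, assuming a two-tailed z test; the helper names and the test statistic of 2.12 are hypothetical.

```python
from statistics import NormalDist


def two_tailed_critical_value(alpha: float) -> float:
    """Value |z| must exceed to be significant at level alpha (two-tailed, normal)."""
    # alpha is split equally between the upper and lower tails.
    return NormalDist().inv_cdf(1 - alpha / 2)


def reject_null(z: float, alpha: float) -> bool:
    """Decision rule: reject H0 when |z| exceeds the critical value."""
    return abs(z) > two_tailed_critical_value(alpha)


# A hypothetical test statistic checked against the three conventional levels.
for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(two_tailed_critical_value(alpha), 2), reject_null(2.12, alpha))
```

The critical values come out as roughly 1.96, 2.58, and 3.29, so a statistic of 2.12 would be significant at the 5 per cent level only.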

There are two major types of significance test, belonging to two major branches of statistics: parametric and non-parametric statistics. In parametric statistics, assumptions are made about the underlying distributional form of the variables in the population (for example, that they are normally distributed); non-parametric tests make few or no such assumptions. Examples of parametric tests of significance are the standard normal deviate (SND) tests, or z-score tests, and the t-test. The best-known example of a non-parametric test of significance is the Mann-Whitney U-test.
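The non-parametric character of the Mann-Whitney U-test can be seen in how its statistic is built: only the ordering of the values matters, not their distribution. The Python sketch below counts pairwise comparisons and then uses a large-sample normal approximation with no correction for ties; for small samples, published tables or exact methods are normally used instead. Function name and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample A, approximate two-tailed p) for two independent samples."""
    # Over all pairs, count how often a value from A exceeds one from B (ties count half).
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in sample_a
        for b in sample_b
    )
    n_a, n_b = len(sample_a), len(sample_b)
    # Under H0, U has mean n_a * n_b / 2; this variance ignores any tie correction.
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p_value


# Hypothetical ordinal scores for two small groups.
u, p = mann_whitney_u([3, 5, 6, 8, 9], [1, 2, 4, 7])
print(u, round(p, 3))
```

A parametric alternative such as the t-test would instead compare the group means, relying on assumptions about the shape of the population distribution.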

There is a large number of tests: for testing means, proportions, variances, correlations, and goodness of fit; for one-sample, two-sample, and multiple-sample studies; and for nominal, ordinal, and interval scales. Most of those used by sociologists are available in the SPSS package, but a textbook should be consulted to ensure that a test appropriate to the data-set is chosen. The two best-known tests are probably the chi-square test, which makes only some very simple assumptions about underlying distributional form and is suitable for simple nominal variables; and Spearman's rank correlation coefficient, one of the earliest to be developed, which is widely used for ordinal variables (and for interval variables converted to ranks).
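The two statistics named above can be computed in a few lines. This Python sketch is illustrative only: the chi-square function returns the Pearson statistic without any continuity correction (it must still be compared with a chi-square table for (rows − 1) × (columns − 1) degrees of freedom), and the Spearman function uses the shortcut formula that is valid only when there are no tied values. Function names and data are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def spearman_rho(x, y):
    """Spearman's rank correlation via 1 - 6*sum(d^2) / (n(n^2 - 1)); assumes no ties."""
    def ranks(values):
        # Rank 1 goes to the smallest value.
        order = sorted(range(len(values)), key=lambda k: values[k])
        r = [0] * len(values)
        for rank, index in enumerate(order, start=1):
            r[index] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical 2 x 2 table (1 degree of freedom; 5 per cent critical value is about 3.84).
print(round(chi_square_statistic([[60, 40], [45, 55]]), 2))
# Hypothetical paired rankings by two judges.
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```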

Statistical significance is not the same thing as the substantive importance of a research finding, which is determined by theory, policy perspectives, and other considerations. And although there is a relationship between statistical significance and the size (or strength) of an association, correlation, difference between samples, and so on, there is no simple equivalence between the two: with a large enough sample, even a very small effect will reach statistical significance. A research finding may therefore be trivially small and relate to an unimportant subject, yet still attain statistical significance. Consequently, some critics have argued that tests of statistical significance are often used unthinkingly and wrongly, and given undue weight in reports of research findings.
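The dependence of significance on sample size can be demonstrated by holding a small difference fixed and increasing the sample. This Python sketch assumes a two-proportion z test with equal group sizes and a pooled standard error; the 51 versus 50 per cent figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_for_difference(p_a, p_b, n_per_group):
    """Two-proportion z statistic for equal group sizes, using a pooled standard error."""
    # With equal group sizes the pooled proportion is the simple average.
    pooled = (p_a + p_b) / 2
    se = sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    return (p_a - p_b) / se


# The same one-percentage-point difference at increasing sample sizes.
for n in (100, 10_000, 1_000_000):
    z = z_for_difference(0.51, 0.50, n)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    print(f"n = {n:>9,} per group: z = {z:5.2f}, p = {p:.4f}")
```

The effect size never changes, yet the difference is far from significant with 100 cases per group and overwhelmingly significant with a million.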

significance test

Updated May 17 2018

significance test A statistical procedure whereby a quantity computed from data samples (the test statistic) is compared with theoretical values of standard probability distributions. Formally it is a comparison between a null-hypothesis, H0 (for example, that there is no difference between the means of two populations), and an alternative hypothesis, H1 (that a real difference exists). If H0 is assumed to be true, the probability distribution of the test statistic can be computed or tabulated. If the test statistic exceeds the critical value corresponding to a probability level of α per cent, the null-hypothesis is rejected at the α per cent significance level. The most commonly used levels of significance are 5%, 1%, and 0.1%. Care must be taken to specify exactly what alternative hypothesis is being tested. Tests involving both tails of the probability distribution are known as two-tailed tests; those involving only one tail are one-tailed tests. See also analysis of variance, goodness-of-fit tests, Student's t distribution, chi-squared distribution, multiple-range tests.
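The practical difference between one-tailed and two-tailed tests can be sketched for a test statistic that follows the standard normal distribution under H0; statistics with other distributions (t, chi-squared) would need the corresponding distribution instead. In this Python sketch α is written as a proportion (0.05 for 5%), and the function name and the value z = 1.8 are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def p_value(z: float, tails: int = 2) -> float:
    """p-value of a standard-normal test statistic z.

    tails=2: H1 says the population means differ in either direction.
    tails=1: H1 says the first mean is larger (upper tail only).
    """
    if tails == 2:
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    if tails == 1:
        return 1 - STANDARD_NORMAL.cdf(z)
    raise ValueError("tails must be 1 or 2")


# Critical values: a one-tailed test puts all of alpha in one tail, so its cut-off is lower.
for alpha in (0.05, 0.01, 0.001):
    one = STANDARD_NORMAL.inv_cdf(1 - alpha)
    two = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    print(f"alpha = {alpha}: one-tailed critical z = {one:.2f}, two-tailed = {two:.2f}")
```

For example, z = 1.8 gives a one-tailed p of about 0.036 but a two-tailed p of about 0.072, so it is significant at 5% only if the directional alternative was specified in advance; this is why the alternative hypothesis must be stated exactly.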