Stratification of Data



In public health, "stratification" is defined as the process of partitioning data into distinct or nonoverlapping groups. These distinct groups can represent, among other things, treatment regimens, geographical regions, or study centers. Although this definition is seemingly straightforward, stratification is a term that can be used to characterize either the design of a study (e.g., stratified sampling), or alternatively, an analytic approach (stratified analysis) that can be applied to data that has already been collected. In both cases, stratification is used because the study population consists of subpopulations or subdomains that are of particular interest to the researcher.

Stratified sampling is an approach used to ensure that an adequate number of individuals or entities are sampled so that comparisons of a parameter of interest can be made between two or more groups (strata) within a population. For example, a social worker may be interested in comparing the prevalence of drug use between those who live on the streets and the general population. This could be evaluated by sampling a sufficient number of individuals in each of the two groups and performing the relevant statistical test to determine whether the prevalence of drug use was equal in these two groups. In such a study, sampling is best approached using a stratified design because different recruitment strategies are needed to collect data from these two groups. That is, it is unlikely that homeless individuals could be contacted using telephone or voting lists. In some cases, data can only be divided into strata after they have been collected; this technique is referred to as poststratification. For example, a health professional might be interested in determining differences in cigarette smoking between male and female smokers in order to devise a program to reduce smoking in teenagers. Such information could readily be extracted from an existing health survey that collected information on the smoking habits of both sexes.
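As a rough illustration of the sampling idea, the following Python sketch draws a fixed-size simple random sample from each stratum. The population records, the group labels, the sample size of 30, and the function name stratified_sample are all hypothetical and chosen only for the example; in practice each stratum would likely need its own recruitment method and sampling frame.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, n_per_stratum, seed=0):
    """Draw a simple random sample of up to n_per_stratum records per stratum."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[stratum_of(rec)].append(rec)
    # Sample without replacement inside each stratum separately.
    return {
        stratum: rng.sample(members, min(n_per_stratum, len(members)))
        for stratum, members in by_stratum.items()
    }


# Hypothetical sampling frame: a small street-dwelling group and a much
# larger general population (illustrative numbers only).
population = (
    [{"id": i, "group": "street"} for i in range(50)]
    + [{"id": i, "group": "general"} for i in range(50, 5050)]
)

sample = stratified_sample(population, lambda r: r["group"], n_per_stratum=30)
sizes = {group: len(members) for group, members in sample.items()}
print(sizes)  # {'street': 30, 'general': 30}
```

A simple random sample of 60 people from this frame would be expected to contain fewer than one street-dwelling individual, which is why the small group is sampled separately.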

As described at the outset, stratification is an important analytic tool in studies of public health. As an illustration, investigators conducting a study of smoking and lung cancer may elect to stratify study participants by gender in order to determine whether females are more susceptible to the effects of smoking than males. Notable differences in the risk of lung cancer due to smoking between men and women would indicate that gender modifies the effect of cigarette smoking on the risk of developing lung cancer. In this instance, the terms "effect-modifier" or "interaction variable" could be applied to describe the role of gender in the relation between smoking and lung cancer.
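A minimal sketch of how effect modification might show up in a stratified analysis is given below. The counts are invented for illustration and are not drawn from any study; risk_ratio is a hypothetical helper that computes the ratio of the risk among smokers to the risk among nonsmokers within one stratum.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed divided by risk in the unexposed."""
    # Algebraically equal to (a / n1) / (c / n0); written this way so the
    # integer products keep the arithmetic exact for the toy numbers.
    return (exposed_cases * unexposed_total) / (unexposed_cases * exposed_total)


# Hypothetical lung cancer counts among smokers (exposed) and
# nonsmokers (unexposed), stratified by gender.
strata = {
    "women": dict(exposed_cases=60, exposed_total=1000,
                  unexposed_cases=5, unexposed_total=1000),
    "men": dict(exposed_cases=40, exposed_total=1000,
                unexposed_cases=10, unexposed_total=1000),
}

rr_by_gender = {gender: risk_ratio(**counts) for gender, counts in strata.items()}
print(rr_by_gender)  # {'women': 12.0, 'men': 4.0}
```

With these made-up numbers the stratum-specific risk ratios differ markedly (12 versus 4), which is the pattern that would suggest gender acts as an effect modifier. A real analysis would also test whether such a difference exceeds what chance alone could produce.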

Stratified analysis can also be used to assess whether the variable upon which the strata are based confounds the relationship between the outcome and the factor of primary interest. A confounding variable is a factor that is associated with both the factor of primary interest and the outcome under study. Failure to control for a confounding variable will bias inferences drawn about the relationship between the factor of primary interest and the outcome variable. With stratified analysis, if the overall effect estimated using data pooled from the different strata is approximately equal to the stratum-specific estimates of effect, this indicates that the stratification variable does not confound the result. Alternatively, if the stratum-specific estimates of risk are similar to each other, yet different from the risk estimate based on the entirety of the data, then this indicates that the stratification variable is a confounder.
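The comparison of pooled and stratum-specific estimates can be sketched in Python as follows. The two strata and all counts are hypothetical, constructed so that the stratification variable is related to both exposure and outcome; risk_ratio is an illustrative helper, not a function from any statistics library.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed divided by risk in the unexposed."""
    return (exposed_cases * unexposed_total) / (unexposed_cases * exposed_total)


# Hypothetical counts. In stratum "Z=1" most people are exposed and the
# baseline risk is high; in stratum "Z=0" the reverse holds.
strata = {
    "Z=1": dict(exposed_cases=160, exposed_total=800,
                unexposed_cases=20, unexposed_total=200),
    "Z=0": dict(exposed_cases=4, exposed_total=200,
                unexposed_cases=8, unexposed_total=800),
}

# Stratum-specific estimates of effect.
stratum_rr = {name: risk_ratio(**counts) for name, counts in strata.items()}

# Crude (pooled) estimate: add the counts across strata, ignoring Z.
pooled = {
    key: sum(counts[key] for counts in strata.values())
    for key in ("exposed_cases", "exposed_total",
                "unexposed_cases", "unexposed_total")
}
crude_rr = risk_ratio(**pooled)

print(stratum_rr)          # {'Z=1': 2.0, 'Z=0': 2.0}
print(round(crude_rr, 2))  # 5.86
```

Here the stratum-specific risk ratios agree with each other (2.0 in both strata) but differ from the crude estimate of about 5.86, the pattern described above as indicating that the stratification variable is a confounder.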

Stratification can also be used within the context of a randomized controlled trial. For example, in some clinical studies, patients may be divided into subgroups (strata) based on factors that are thought to be related to outcome. Within each stratum, patients could be randomly assigned to different treatment groups (e.g., placebo or treatment). This approach would permit the effectiveness of the different treatments to be compared within each stratum, while also ensuring the treatment and control groups are similar with respect to the postulated risk factors upon which the stratification was based.
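One simple way to picture stratified randomization is sketched below: patients are grouped by stratum, shuffled within each stratum, and then assigned to arms in alternation so that the arms are balanced inside every stratum. The patient records, the "stage" stratification factor, and the function name stratified_randomize are assumptions made for the example; actual trials commonly use more elaborate schemes such as permuted blocks.

```python
import random
from collections import Counter, defaultdict


def stratified_randomize(patients, stratum_of,
                         arms=("treatment", "placebo"), seed=0):
    """Randomly assign patients to arms separately within each stratum."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_stratum = defaultdict(list)
    for patient in patients:
        by_stratum[stratum_of(patient)].append(patient)
    assignment = {}
    for members in by_stratum.values():
        rng.shuffle(members)  # random order within the stratum
        for position, patient in enumerate(members):
            # Alternating over the shuffled order balances the arms.
            assignment[patient["id"]] = arms[position % len(arms)]
    return assignment


# Hypothetical trial: 12 early-stage and 8 late-stage patients.
patients = [{"id": i, "stage": "early" if i < 12 else "late"}
            for i in range(20)]

assignment = stratified_randomize(patients, lambda p: p["stage"])
counts = Counter((p["stage"], assignment[p["id"]]) for p in patients)
print(counts[("early", "treatment")], counts[("early", "placebo")])  # 6 6
print(counts[("late", "treatment")], counts[("late", "placebo")])    # 4 4
```

Because assignment is balanced within each stratum, the treatment and placebo groups end up with the same mix of early-stage and late-stage patients.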

Paul J. Villeneuve

(see also: Sampling; Statistics for Public Health; Survey Research Methods)