Confounding, Confounding Factors
The word confounding has been used to refer to at least three distinct concepts. In the oldest and most widespread usage, confounding is a source of bias in estimating causal effects. This bias is sometimes informally described as a mixing of the effects of extraneous factors (called confounders) with the effect of interest. This usage predominates in nonexperimental research, especially in epidemiology and sociology. In a second and more recent usage originating in statistics, confounding is a synonym for a change in an effect measure upon stratification or adjustment for extraneous factors (a phenomenon called noncollapsibility or Simpson's paradox). In a third usage, originating in the experimental-design literature, confounding refers to the inseparability of main effects and interactions under a particular design. The three concepts are closely related and are not always distinguished from one another. In particular, the concepts of confounding as a bias in effect estimation and as noncollapsibility are often treated as equivalent, even though they are not. Only the former concept will be described here.
CONFOUNDING AS A BIAS IN EFFECT ESTIMATION
A classic discussion of confounding in which explicit reference is made to "confounded effects" is found in John Stuart Mill's A System of Logic (1843). In it, Mill lays out the primary issues and acknowledges Francis Bacon as a forerunner in dealing with them. Mill lists a requirement for an experiment intended to determine causal relations: "…none of the circumstances [of the experiment] that we do know shall have effects susceptible of being confounded with those of the agents whose properties we wish to study [emphasis added]."
In Mill's time, the word experiment referred to an observation in which some circumstances were under the control of the observer, as it still does in ordinary English, rather than to the notion of a comparative trial. Nonetheless, Mill's requirement suggests that a comparison is to be made between two things: the outcome of one's "experiment" (which is, essentially, an uncontrolled trial) and what one would expect the outcome to be if the agents one wished to study had been absent. If the outcome is not as one would expect in the absence of the study agents, then Mill's requirement ensures that the unexpected outcome was not brought about by extraneous "circumstances" (factors). If, however, those circumstances do bring about the unexpected outcome, and that outcome is mistakenly attributed to effects of the study agents, then the mistake is one of confounding (or confusion) of the extraneous effects with the agent effects.
Much of the modern literature follows the same informal conceptualization given by Mill. Terminology is now more specific: "treatment" is used to refer to an agent administered by the investigator, and "exposure" is often used to denote an unmanipulated agent. The chief development beyond Mill is that the expectation for the outcome in the absence of the study exposure is now almost always explicitly derived from observation of a control group that is untreated or unexposed. For example, D. Clayton and M. Hills (1993) state of observational studies:
there is always the possibility that an important influence on the outcome … differs systematically between the comparison [exposed and unexposed] groups. It is then possible [that] part of the apparent effect of exposure is due to these differences, [in which case] the comparison of the exposure groups is said to be confounded [emphasis in the original].
In fact, confounding is also possible in randomized experiments owing to systematic improprieties in treatment allocation, administration, and compliance. A further and somewhat controversial point is that confounding (as per Mill's original definition) can also occur in perfect randomized trials due to random differences between comparison groups.
THE COUNTERFACTUAL APPROACH
Various mathematical formalizations of confounding have been proposed for use in statistical analyses. Perhaps the one closest to Mill's concept is based on the counterfactual model for causal effects. Suppose one wishes to consider how a health-status (outcome) measure of a population would change in response to an intervention (population treatment). More precisely, suppose one's objective is to determine the effect that applying a treatment $x_1$ had or would have on an outcome measure $\mu$ relative to applying treatment $x_0$ to a specific target population A. For example, A could be a cohort of breast-cancer patients, treatment $x_1$ could be a new hormone therapy, $x_0$ could be a placebo therapy, and the measure $\mu$ could be a five-year survival probability. The treatment $x_1$ is sometimes called the index treatment, and $x_0$ is sometimes called the control or reference treatment (which is often a standard or placebo treatment).
The counterfactual model posits that, in population A, $\mu$ will equal $\mu_{A1}$ if $x_1$ is applied and $\mu_{A0}$ if $x_0$ is applied. The causal effect of $x_1$ relative to $x_0$ is defined as the change from $\mu_{A0}$ to $\mu_{A1}$, which might be measured as $\mu_{A1} - \mu_{A0}$ or $\mu_{A1}/\mu_{A0}$. If A is given treatment $x_1$, then $\mu$ will equal $\mu_{A1}$ and $\mu_{A1}$ will be observable, but $\mu_{A0}$ will be unobserved. Suppose, however, we expect $\mu_{A0}$ to equal $\mu_{B0}$, where $\mu_{B0}$ is the value of the outcome $\mu$ observed or estimated for a population B that was administered treatment $x_0$. The latter population is sometimes called the control or reference population. Confounding is said to be present if in fact $\mu_{A0} \neq \mu_{B0}$, for then there must be some difference between populations A and B (other than treatment) that is affecting $\mu$.
If confounding is present, a naïve (crude) association measure obtained by substituting $\mu_{B0}$ for $\mu_{A0}$ in an effect measure will not equal the effect measure, and the association measure is said to be confounded. For example, suppose $\mu_{B0} \neq \mu_{A0}$. Then $\mu_{A1} - \mu_{B0}$, which measures the association of treatments with outcomes across the populations, is confounded for $\mu_{A1} - \mu_{A0}$, which measures the effect of treatment $x_1$ on population A. Thus, saying an association measure such as $\mu_{A1} - \mu_{B0}$ is confounded for an effect measure such as $\mu_{A1} - \mu_{A0}$ is synonymous with saying the two measures are not equal.
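A minimal Python sketch of these definitions follows. The survival probabilities are made up and come from no study. The gap between the crude association $\mu_{A1} - \mu_{B0}$ and the effect $\mu_{A1} - \mu_{A0}$ reduces algebraically to $\mu_{A0} - \mu_{B0}$, so the association is confounded exactly when that quantity is nonzero. The names `Outcomes`, `causal_difference`, `causal_ratio`, `crude_difference`, and `is_confounded` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    """Potential values of the outcome measure mu for one population.

    mu1 is the value under the index treatment x1 and mu0 the value under the
    reference treatment x0. In reality only one of the two is ever observed.
    """
    mu1: float
    mu0: float


def causal_difference(target: Outcomes) -> float:
    """Effect of x1 relative to x0 on the target population: mu_A1 - mu_A0."""
    return target.mu1 - target.mu0


def causal_ratio(target: Outcomes) -> float:
    """The same effect on the ratio scale: mu_A1 / mu_A0."""
    return target.mu1 / target.mu0


def crude_difference(index: Outcomes, reference: Outcomes) -> float:
    """Association mu_A1 - mu_B0: the reference population's mu_B0
    is substituted for the unobserved mu_A0."""
    return index.mu1 - reference.mu0


def is_confounded(target: Outcomes, reference: Outcomes, tol: float = 1e-12) -> bool:
    """Counterfactual criterion: confounded if and only if mu_A0 != mu_B0."""
    return abs(target.mu0 - reference.mu0) > tol


# Hypothetical five-year survival probabilities (illustration only).
A = Outcomes(mu1=0.70, mu0=0.60)  # A receives x1, so mu_A0 is counterfactual
B = Outcomes(mu1=0.65, mu0=0.50)  # B receives x0, so only mu_B0 is observed

bias = crude_difference(A, B) - causal_difference(A)  # equals mu_A0 - mu_B0
print(f"crude {crude_difference(A, B):.2f}, effect {causal_difference(A):.2f}, "
      f"bias {bias:.2f}, confounded: {is_confounded(A, B)}")
```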
The preceding counterfactual approach to confounding gradually emerged through attempts to separate effect measures into a component due to the effect of interest and a component due to extraneous effects. One noteworthy aspect of this approach is that confounding depends on the outcome measure. For example, suppose populations A and B have different five-year survival probabilities $\mu$ under placebo treatment $x_0$; that is, suppose $\mu_{B0} \neq \mu_{A0}$, so that $\mu_{A1} - \mu_{B0}$ is confounded for the actual effect $\mu_{A1} - \mu_{A0}$ of treatment on five-year survival. It is then still possible that ten-year survival, $\mu'$, under the placebo would be identical in both populations; that is, $\mu'_{A0}$ could equal $\mu'_{B0}$, so that $\mu'_{A1} - \mu'_{B0}$ is not confounded for the actual effect of treatment on ten-year survival. (We should generally expect no confounding for 200-year survival, since no treatment is likely to raise the 200-year survival probability of human patients above zero.)
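The dependence on the outcome measure can be illustrated with a short Python sketch. The placebo survival probabilities below are invented. By assumption, A and B differ in five-year survival under $x_0$, share the same ten-year survival $\mu'$, and both have zero 200-year survival. The dictionary and function names are hypothetical.

```python
# Hypothetical placebo (x0) survival probabilities for populations A and B.
# These numbers are invented; they are not data from any study.
placebo_survival = {
    "five_year": {"A": 0.60, "B": 0.50},  # mu_A0 != mu_B0
    "ten_year": {"A": 0.30, "B": 0.30},   # mu'_A0 == mu'_B0
    "200_year": {"A": 0.00, "B": 0.00},   # no human patient survives 200 years
}


def crude_confounded(measure: str, tol: float = 1e-12) -> bool:
    """mu_A1 - mu_B0 is confounded for mu_A1 - mu_A0 iff mu_A0 != mu_B0."""
    s = placebo_survival[measure]
    return abs(s["A"] - s["B"]) > tol


for measure in placebo_survival:
    print(measure, "confounded" if crude_confounded(measure) else "not confounded")
```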
A second noteworthy point is that confounding depends on the target population of inference. The preceding example, with A as the target, had different five-year survivals $\mu_{A0}$ and $\mu_{B0}$ for A and B under placebo therapy. Hence $\mu_{A1} - \mu_{B0}$ was confounded for the effect $\mu_{A1} - \mu_{A0}$ of treatment on population A. A lawyer or ethicist may also be interested in what effect the hormone treatment would have had on population B. Writing $\mu_{B1}$ for the (unobserved) outcome of B under treatment, this effect on B may be measured by $\mu_{B1} - \mu_{B0}$. Substituting $\mu_{A1}$ for the unobserved $\mu_{B1}$ yields $\mu_{A1} - \mu_{B0}$. This measure of association is confounded for $\mu_{B1} - \mu_{B0}$ (the effect of treatment $x_1$ on five-year survival in population B) if and only if $\mu_{A1} \neq \mu_{B1}$. Thus, the same measure of association, $\mu_{A1} - \mu_{B0}$, may be confounded for the effect of treatment on neither, one, or both of populations A and B. It may or may not be confounded for the effect of treatment on other targets.
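A small Python sketch makes the role of the target explicit. It takes all four potential outcomes as given, although in practice they are never jointly observed. It then checks the single association $\mu_{A1} - \mu_{B0}$ against the effect in A and the effect in B. The function name and the numbers in the usage lines are hypothetical.

```python
def confounded_for_targets(mu_A1, mu_A0, mu_B1, mu_B0, tol=1e-12):
    """Report, per target population, whether the association
    mu_A1 - mu_B0 is confounded for that population's causal effect."""
    association = mu_A1 - mu_B0
    return {
        # Effect on A is mu_A1 - mu_A0; it differs from the association iff mu_A0 != mu_B0.
        "A": abs(association - (mu_A1 - mu_A0)) > tol,
        # Effect on B is mu_B1 - mu_B0; it differs from the association iff mu_A1 != mu_B1.
        "B": abs(association - (mu_B1 - mu_B0)) > tol,
    }


# Hypothetical five-year survival values, one line per case.
print(confounded_for_targets(0.70, 0.50, 0.70, 0.50))  # neither
print(confounded_for_targets(0.70, 0.60, 0.70, 0.50))  # A only
print(confounded_for_targets(0.80, 0.50, 0.70, 0.50))  # B only
print(confounded_for_targets(0.70, 0.60, 0.65, 0.50))  # both
```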
CONFOUNDERS (CONFOUNDING FACTORS)
A third noteworthy aspect of the counterfactual formalization of confounding is that it invokes no explicit differences (imbalances) between populations A and B with respect to circumstances or covariates that might influence $\mu$. Clearly, if $\mu_{A0}$ and $\mu_{B0}$ differ, then A and B must differ with respect to factors that influence $\mu$. This observation has led some authors to define confounding as the presence of such covariate differences between the compared populations. Nonetheless, confounding is only a consequence of these covariate differences. In fact, A and B may differ profoundly with respect to covariates that influence $\mu$, and yet confounding may be absent. In other words, a covariate difference between A and B is a necessary but not sufficient condition for confounding, because the impacts of the various covariate differences may balance each other out, leaving no confounding.
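The balancing-out can be checked numerically. The Python sketch below rests on three hypothetical assumptions:

- A and B differ only in the mix of two binary covariates.
- Stratum-specific placebo survival is the same in both populations.
- The two covariates are independent within each population.

B has more older patients (60 versus 50 percent), which lowers survival. It also has more patients with a protective factor (70 versus 50 percent), which raises survival. The numbers were chosen so that the two imbalances cancel and $\mu_{A0} = \mu_{B0}$. All names and values are illustrative.

```python
from itertools import product

# Hypothetical placebo (x0) survival by stratum, assumed identical in A and B.
# Keys are (older, protective_factor); "older" lowers survival, the factor raises it.
survival_x0 = {(0, 0): 0.80, (1, 0): 0.60, (0, 1): 0.90, (1, 1): 0.70}


def joint(p_older, p_protective):
    """Joint covariate distribution, assuming the two covariates are independent."""
    return {
        (older, prot): (p_older if older else 1 - p_older)
        * (p_protective if prot else 1 - p_protective)
        for older, prot in product((0, 1), repeat=2)
    }


def mu_x0(dist):
    """Population survival under x0: stratum survivals weighted by the covariate mix."""
    return sum(survival_x0[z] * w for z, w in dist.items())


dist_A = joint(p_older=0.5, p_protective=0.5)
dist_B = joint(p_older=0.6, p_protective=0.7)  # B differs on both covariates

mu_A0, mu_B0 = mu_x0(dist_A), mu_x0(dist_B)
print(f"mu_A0 = {mu_A0:.3f}, mu_B0 = {mu_B0:.3f}")
```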
Suppose now that populations A and B differ with respect to certain covariates, and that these differences have led to confounding of an association measure for the effect measure of interest. The responsible covariates are then termed confounders of the association measure. In the above example, $\mu_{A1} - \mu_{B0}$ is confounded for the effect $\mu_{A1} - \mu_{A0}$, and the factors responsible for the confounding (i.e., the factors that led to $\mu_{A0} \neq \mu_{B0}$) are the confounders. It can be deduced that a variable cannot be a confounder unless two conditions hold: it can affect the outcome parameter $\mu$ within treatment groups, and it is distributed differently among the compared populations. These two necessary conditions are sometimes offered together as a definition of a confounder. Nonetheless, counterexamples show that the two conditions are not sufficient for a variable with more than two levels to be a confounder.
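One such counterexample can be written out directly. In this hypothetical Python sketch, a three-level covariate Z (say, disease severity) is the only factor on which A and B differ. Its level-specific placebo survival is assumed to be the same in both populations. Z affects survival and is distributed differently in A and B. Yet the weighted averages $\mu_{A0}$ and $\mu_{B0}$ coincide, so Z is not a confounder of $\mu_{A1} - \mu_{B0}$. All values are invented.

```python
# Hypothetical placebo (x0) survival by level of Z, assumed equal in A and B.
survival_x0_by_z = [0.90, 0.70, 0.50]  # low, medium, high severity

# Hypothetical distributions of Z in the two populations.
p_z_A = [0.25, 0.50, 0.25]
p_z_B = [0.50, 0.00, 0.50]


def mu_x0(survival_by_level, p_level):
    """Population survival under x0: level-specific survival weighted by the Z mix."""
    return sum(s * p for s, p in zip(survival_by_level, p_level))


affects_outcome = len(set(survival_x0_by_z)) > 1   # necessary condition 1
distributed_differently = p_z_A != p_z_B            # necessary condition 2
mu_A0 = mu_x0(survival_x0_by_z, p_z_A)
mu_B0 = mu_x0(survival_x0_by_z, p_z_B)
is_confounder = abs(mu_A0 - mu_B0) > 1e-9           # both conditions hold, yet False

print(affects_outcome, distributed_differently, is_confounder)
```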
PREVENTION OF CONFOUNDING
Perhaps the most obvious way to avoid confounding in estimating $\mu_{A1} - \mu_{A0}$ is to obtain a reference population B for which $\mu_{B0}$ is known to equal $\mu_{A0}$. Among epidemiologists, such a population is sometimes said to be comparable to or exchangeable with A with respect to the outcome under the reference treatment. In practice, such a population may be difficult or impossible to find. Thus, an investigator may attempt to construct such a population, or to construct exchangeable index and reference populations. These constructions may be viewed as design-based methods for the control of confounding.
Perhaps no approach is more effective for preventing confounding by a known factor than restriction. For example, gender imbalances cannot confound a study restricted to women. However, restriction has several drawbacks: restriction on enough factors can reduce the number of available subjects to unacceptably low levels, and it may greatly reduce the generalizability of results as well. Matching the treatment populations on confounders overcomes these drawbacks and, if successful, can be as effective as restriction. For example, gender imbalances cannot confound a study in which the compared groups have identical proportions of women. Unfortunately, differential losses to observation may undo the initial covariate balance produced by matching.
Neither restriction nor matching prevents (although either may diminish) imbalances on unrestricted, unmatched, or unmeasured covariates. In contrast, randomization offers a means of dealing with confounding by covariates not accounted for by the design. It must be emphasized, however, that this solution is only probabilistic and subject to severe constraints in practice. Randomization is not always feasible or ethical. Moreover (as mentioned earlier), many practical problems, such as differential loss and noncompliance, can lead to confounding in comparisons of the groups actually receiving treatments $x_1$ and $x_0$.

One somewhat controversial solution to noncompliance problems is intent-to-treat analysis, which defines the comparison groups A and B by treatment assigned rather than treatment received. Confounding may, however, affect even intent-to-treat analyses, and (contrary to widespread misperceptions) the bias in those analyses can be away from the null (exaggerating an effect). For example, the assignments may not always be random, as when blinding is insufficient to prevent treatment providers from violating the protocol. And, purely by bad luck, randomization may itself produce allocations with severe covariate imbalances between the groups (and consequent confounding), especially if the study size is small. Blocked (matched) randomization can help ensure that random imbalances on the blocking factors will not occur, but it does not guarantee balance of unblocked factors.
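The small-study point, and the protection offered by blocking, can be explored with a rough Python simulation. It assumes the following, all of them hypothetical:

- a single binary covariate (for example, gender) carried by exactly half of the subjects;
- two arms of equal size;
- an arbitrary threshold of 0.2 for a "severe" difference in covariate prevalence between arms.

All function names are hypothetical. With 20 subjects, an exact hypergeometric calculation puts the chance of such an imbalance under simple randomization at about 18 percent. With 1,000 subjects it essentially never occurs. Randomizing separately within levels of the covariate removes imbalance on that covariate entirely, though not on unblocked factors.

```python
import random


def covariate_gap(covariate, arm):
    """Absolute difference in covariate prevalence between the two arms."""
    treated = [c for c, a in zip(covariate, arm) if a == 1]
    control = [c for c, a in zip(covariate, arm) if a == 0]
    return abs(sum(treated) / len(treated) - sum(control) / len(control))


def simple_randomization(n, rng):
    """Assign exactly half of n subjects to each arm, completely at random."""
    arm = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(arm)
    return arm


def blocked_randomization(covariate, rng):
    """Randomize separately within each covariate level (the blocking factor)."""
    arm = [0] * len(covariate)
    for level in set(covariate):
        members = [i for i, c in enumerate(covariate) if c == level]
        labels = [1] * (len(members) // 2) + [0] * (len(members) - len(members) // 2)
        rng.shuffle(labels)
        for i, a in zip(members, labels):
            arm[i] = a
    return arm


def share_severely_imbalanced(n, trials=1000, threshold=0.2, seed=1):
    """Fraction of simple randomizations whose covariate gap exceeds the threshold."""
    rng = random.Random(seed)
    covariate = [1] * (n // 2) + [0] * (n - n // 2)  # half the subjects carry it
    severe = sum(covariate_gap(covariate, simple_randomization(n, rng)) > threshold
                 for _ in range(trials))
    return severe / trials


for n in (20, 1000):
    print(n, share_severely_imbalanced(n))

cov20 = [1] * 10 + [0] * 10
print("blocked gap:", covariate_gap(cov20, blocked_randomization(cov20, random.Random(7))))
```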
ADJUSTMENT FOR CONFOUNDING
Design-based methods are often infeasible or insufficient to prevent confounding. Thus, there has been an enormous amount of work devoted to analytic adjustments for confounding. With a few exceptions, these methods are based on observed covariate distributions in the compared populations. Such methods can successfully control confounding only to the extent that enough confounders are adequately measured. In addition, many methods employ parametric models at some stage, and their success may thus depend on the faithfulness of the model to reality. These issues cannot be covered in depth here, but a few basic points are worth noting.
The simplest and most widely trusted methods of adjustment begin with stratification on confounders. A covariate cannot be responsible for confounding within internally homogeneous strata of the covariate. For example, gender imbalances cannot confound observations within a stratum composed solely of women. More generally, comparisons within strata cannot be confounded by a covariate that is unassociated with treatment within strata. This is so regardless of whether the covariate was used to define the strata. Thus, one need not stratify on all confounders in order to control confounding. Furthermore, if one has accurate background information on relations among the confounders, one may use this information to identify sets of covariates sufficient for control of confounding.
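The following Python sketch shows stratification at work on invented numbers. It assumes gender is the only confounder, so that within each gender stratum the placebo survival observed in B can stand in for that stratum's unobserved $\mu_{A0}$. The crude comparison is confounded by the very different gender mix of A and B, but the stratum-specific comparisons are not. Reweighting B's stratum-specific results to A's gender distribution recovers the effect on A. This reweighting is direct standardization, one of several possible adjustment methods. Variable names are hypothetical.

```python
# Hypothetical five-year survival data, stratified by gender (invented numbers).
strata = ("women", "men")
mix_A = {"women": 0.8, "men": 0.2}    # A received the index treatment x1
mix_B = {"women": 0.2, "men": 0.8}    # B received the reference treatment x0
mu1_A = {"women": 0.80, "men": 0.60}  # observed in A under x1
mu0_B = {"women": 0.70, "men": 0.50}  # observed in B under x0


def average(mu, mix):
    """Population-level outcome: stratum outcomes weighted by the stratum mix."""
    return sum(mu[s] * mix[s] for s in strata)


crude = average(mu1_A, mix_A) - average(mu0_B, mix_B)         # mu_A1 - mu_B0
within = {s: mu1_A[s] - mu0_B[s] for s in strata}             # one per stratum
standardized = average(mu1_A, mix_A) - average(mu0_B, mix_A)  # estimates mu_A1 - mu_A0

print(f"crude {crude:.2f}, standardized {standardized:.2f}, "
      f"within strata {({s: round(d, 2) for s, d in within.items()})}")
```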
Some controversy has arisen about adjustment for covariates in randomized trials. Although Fisher asserted that randomized comparisons were "unbiased," he also pointed out that they could be confounded in the sense used here. Resolution comes from noting that Fisher's use of the word unbiased referred to the design and was not meant to guide analysis of a given trial. Once the trial is underway and the actual treatment allocation is completed, the unadjusted treatment-effect estimate will be biased if a covariate that influences the outcome is associated with treatment. This bias can be removed by adjustment for that covariate.
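Fisher's distinction between the design and a particular allocation can be illustrated with a hypothetical Python sketch. It assumes a noise-free outcome model, chosen only so the arithmetic is exact: a baseline of 0.5, a true treatment effect of 0.10, and a binary prognostic covariate that adds 0.30.

- Averaged over many random allocations, the unadjusted difference is close to the true effect. This reflects the design-based sense of "unbiased."
- Take one realized allocation that happens to put 7 of the 10 covariate-positive subjects in the treated arm. Here the unadjusted difference is 0.22, while stratifying on the covariate returns 0.10.

Names and numbers are illustrative only.

```python
import random
from statistics import fmean

EFFECT, BETA = 0.10, 0.30  # hypothetical true treatment effect and covariate effect


def outcome(treated, covariate):
    """Noise-free outcome model, assumed only to keep the arithmetic exact."""
    return 0.5 + EFFECT * treated + BETA * covariate


def arm_mean(cov, arm, treated, level=None):
    """Mean outcome in one arm, optionally restricted to one covariate level."""
    return fmean(outcome(treated, c) for c, a in zip(cov, arm)
                 if a == treated and (level is None or c == level))


def unadjusted(cov, arm):
    return arm_mean(cov, arm, 1) - arm_mean(cov, arm, 0)


def adjusted(cov, arm):
    """Stratify on the covariate and weight stratum differences by stratum size.
    Assumes both arms are represented in every stratum."""
    return sum((arm_mean(cov, arm, 1, z) - arm_mean(cov, arm, 0, z))
               * cov.count(z) / len(cov) for z in set(cov))


cov = [1] * 10 + [0] * 10  # 20 subjects, half with the prognostic covariate

# Design-based view: average the unadjusted estimate over many random allocations.
rng = random.Random(2024)
allocations = [rng.sample([1] * 10 + [0] * 10, 20) for _ in range(5000)]
design_average = fmean(unadjusted(cov, a) for a in allocations)

# One realized allocation with a chance imbalance on the covariate.
realized = [1] * 7 + [0] * 3 + [1] * 3 + [0] * 7

print(f"average over allocations {design_average:.3f}")
print(f"this trial: unadjusted {unadjusted(cov, realized):.2f}, "
      f"adjusted {adjusted(cov, realized):.2f}")
```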
Sander Greenland
(See also: Bias)
Bibliography
Bross, I. D. J. (1967). "Pertinency of an Extraneous Variable." Journal of Chronic Diseases 20:487–495.
Clayton, D., and Hills, M. (1993). Statistical Models in Epidemiology. New York: Oxford University Press.
Fisher, R. A. (1935). The Design of Experiments. Edinburgh: Oliver & Boyd.
Greenland, S., and Robins, J. M. (1986). "Identifiability, Exchangeability, and Epidemiological Confounding." International Journal of Epidemiology 15:413–419.
Greenland, S.; Robins, J. M.; and Pearl, J. (1999). "Confounding and Collapsibility in Causal Inference." Statistical Science 14:29–46.
Greenland, S., and Rothman, K. J. (1998). "Measures of Effect and Measures of Association." Modern Epidemiology, 2nd edition, eds. K. J. Rothman and S. Greenland. Philadelphia: Lippincott.
Groves, E. R., and Ogburn, W. F. (1928). American Marriage and Family Relationships. New York: Henry Holt.
Kitagawa, E. M. (1955). "Components of a Difference between Two Rates." Journal of the American Statistical Association 50:1168–1194.
Miettinen, O. S. (1972). "Components of the Crude Risk Ratio." American Journal of Epidemiology 96:168–172.
Mill, J. S. (1843). A System of Logic, Ratiocinative and Inductive. London: Longmans Green.
Pearl, J. (2000). Causality. New York: Cambridge University Press.
Robins, J. M. (1998). "Correction for Non-Compliance in Equivalence Trials." Statistics in Medicine 17:269–302.
Rothman, K. J. (1977). "Epidemiologic Methods in Clinical Trials." Cancer 39:1771–1775.
Yule, G. U. (1903). "Notes on the Theory of Association of Attributes in Statistics." Biometrika 2:121–134.
"Confounding, Confounding Factors." Encyclopedia of Public Health. Encyclopedia.com. 17 Oct. 2018 <http://www.encyclopedia.com>.