Multicollinearity
A multiple regression is said to exhibit multicollinearity when the explanatory variables are correlated with one another. Almost all multiple regressions have some degree of multicollinearity. The extent to which multicollinearity is a problem is widely misunderstood. Multicollinearity is not a violation of the classical statistical assumptions underlying multiple regression. Specifically, multicollinearity does not cause either biased coefficients or incorrect standard errors. For this reason, while identifying multicollinearity can be helpful in understanding the outcome of a regression, “corrections” to reduce multicollinearity are rarely appropriate.
In the regression model
y_{i} = b_{1}x_{i1} + b_{2}x_{i2} + … + b_{K}x_{iK} + e_{i}
there is multicollinearity if the x variables are correlated with one another, as is usually the case. The consequence of such correlation is that the estimates of regression coefficients are less precise than they would be absent such correlation. For example, in the regression y_{i} = a + b_{1}x_{i1} + b_{2}x_{i2} + e_{i} with n observations, the variance of the estimated coefficient b̂_{1} can be thought of as
var(b̂_{1}) = σ^{2} / [n · var(x_{1}) · (1 – corr(x_{1}, x_{2})^{2})]
where σ^{2} is the variance of the error term e_{i}.
When x_{1} and x_{2} have a high correlation, corr(x_{1}, x_{2}), the uncertainty about b_{1} will be large. Because the formulas for reporting standard errors reflect this, such uncertainty will be correctly reflected in the reported regression statistics.
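As a rough numerical check, the sketch below compares two ways of computing the variance of b̂_{1} in the two-regressor case: the relevant element of σ^{2}(X′X)^{-1} with the intercept partialled out, and the standard expression σ^{2} / [n · var(x_{1}) · (1 – corr(x_{1}, x_{2})^{2})]. The simulated data, the function names, and the choice σ^{2} = 1 are assumptions made only for illustration; they are not part of the original entry.

```python
import math
import random

def centered_moments(x1, x2):
    """Return (S11, S22, S12): sums of squares and cross-products about the means."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    return s11, s22, s12

def var_b1_matrix(x1, x2, sigma2):
    """Var(b1-hat) as the (1,1) element of sigma^2 * (X'X)^{-1},
    where X holds the demeaned x1 and x2 (intercept partialled out)."""
    s11, s22, s12 = centered_moments(x1, x2)
    return sigma2 * s22 / (s11 * s22 - s12 ** 2)

def var_b1_formula(x1, x2, sigma2):
    """Var(b1-hat) = sigma^2 / (n * var(x1) * (1 - corr(x1, x2)^2))."""
    n = len(x1)
    s11, s22, s12 = centered_moments(x1, x2)
    var_x1 = s11 / n                      # population-style variance (divide by n)
    corr = s12 / math.sqrt(s11 * s22)     # sample correlation
    return sigma2 / (n * var_x1 * (1 - corr ** 2))

def simulate(rho, n=200, seed=0):
    """Hypothetical data: x2 is built to have population correlation rho with x1."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x1]
    return x1, x2

results = {}
for rho in (0.0, 0.5, 0.9, 0.99):
    x1, x2 = simulate(rho)
    results[rho] = (var_b1_matrix(x1, x2, 1.0), var_b1_formula(x1, x2, 1.0))
    print(f"rho={rho:4.2f}  matrix={results[rho][0]:.5f}  formula={results[rho][1]:.5f}")
```

The two columns should agree up to rounding, and the variance should be far larger at a correlation of 0.99 than at 0, which is the loss of precision described above.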
Fundamentally, a regression estimates the effect of one explanatory variable holding constant the other explanatory variables. If two or more variables tend to move together in the available data, in which case the data will be multicollinear, then very little evidence is available about the effect of any single variable, as is reflected in the variance formula above.
The only “cures” for multicollinearity are (1) to find data with less correlation among the explanatory variables, or (2) to use a priori information to specify a value for the coefficient of one of the correlated variables, and by so doing avoid the need to separately estimate the effect of each variable.
If one explanatory variable equals a linear combination of other explanatory variables (for example, if x_{1} = x_{2} + x_{3}) the regression has perfect multicollinearity. Perfect multicollinearity makes it impossible to estimate the regression model, as indicated by the infinite variance in the formula above. However, perfect multicollinearity almost always indicates an error in specifying the model. One common error is the dummy variable trap, in which a complete set of dummy variables and an intercept, or more than one complete set of dummy variables, are included in a regression. For example, including a variable for female gender (coded 1/0), a variable for male gender, and an intercept would cause the regression to fail.
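The dummy variable trap can be illustrated with a small sketch. With an intercept, a female dummy, and a male dummy, the intercept column equals the sum of the two dummies, so X′X is singular and cannot be inverted. The five-person sample and the helper names `gram` and `det3` are hypothetical, chosen only to keep the arithmetic exact.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def gram(columns):
    """X'X for a design matrix supplied as a list of columns."""
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]

female = [1, 0, 1, 1, 0]              # hypothetical sample of five people
male = [1 - f for f in female]        # complement of the female dummy
intercept = [1] * len(female)

# Intercept plus a complete set of dummies: intercept = female + male in every row.
trap = gram([intercept, female, male])
det_trap = det3(trap)

# Dropping one dummy removes the exact linear dependence.
ok = gram([intercept, female])
det_ok = ok[0][0] * ok[1][1] - ok[0][1] * ok[1][0]

print(det_trap, det_ok)
```

A zero determinant for the first design means the normal equations have no unique solution; the second design, with one dummy omitted, has a nonzero determinant and can be estimated.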
Because of limits on the numerical accuracy of computer arithmetic, a high degree of multicollinearity can lead to numerical, as opposed to statistical, errors in computing regression results. This is rarely a problem with modern software, which typically includes internal checks for such errors.
One indication of substantial multicollinearity is that individual coefficients are insignificant but sets of coefficients are jointly significant. For example, a set of indicators of underlying socioeconomic status (e.g., mother’s education and father’s education) may be jointly significant even though no single indicator is significant. In such situations, investigators sometimes drop all but one indicator. While not strictly rigorous, such a procedure is not harmful so long as the coefficient on the retained variable is interpreted as a proxy for the entire set of socioeconomic indicators rather than as the effect of the specific variable that was retained. (One might retain only mother’s education, but interpret the effect loosely as “parents’ education.”)
Another indication of multicollinearity that is sometimes used is a high variance inflation factor (VIF), which measures the increase in variance of b̂_{i} due to correlation between x_{i} and the other explanatory variables. In the example above, the VIF is 1/(1 – corr(x_{1}, x_{2})^{2}).
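For the two-regressor case, the VIF depends only on the correlation between the two explanatory variables. The short sketch below tabulates 1/(1 – corr^{2}) for a few illustrative correlations; the function name and the chosen correlation values are assumptions for demonstration only.

```python
def vif_two_regressors(corr):
    """VIF = 1 / (1 - corr^2) for a regression with two explanatory variables."""
    if abs(corr) >= 1:
        raise ValueError("perfect multicollinearity: the VIF is infinite")
    return 1.0 / (1.0 - corr ** 2)

vifs = {r: vif_two_regressors(r) for r in (0.0, 0.5, 0.7, 0.9, 0.99)}
for r, v in vifs.items():
    print(f"corr={r:4.2f}  VIF={v:7.2f}")
```

The factor is 1 when the regressors are uncorrelated and grows without bound as the correlation approaches 1 in absolute value, which is why the perfectly collinear case is excluded.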
SEE ALSO Least Squares, Ordinary; Principal Components; Properties of Estimators (Asymptotic and Exact)
Richard Startz
"Multicollinearity." International Encyclopedia of the Social Sciences. . Retrieved September 24, 2018 from Encyclopedia.com: http://www.encyclopedia.com/socialsciences/appliedandsocialsciencesmagazines/multicollinearity
multicollinearity
Multicollinearity occurs where there is a high correlation between two or more independent variables in a regression analysis. There is considerable disagreement about the degree of correlation that must exist between independent variables before they are considered to be multicollinear. Extreme multicollinearity (for example, a correlation of 0.70 or higher between two independent variables) has adverse effects on the standard errors of the regression coefficients (and hence on tests of their statistical significance and confidence intervals).
"multicollinearity." A Dictionary of Sociology. . Retrieved September 24, 2018 from Encyclopedia.com: http://www.encyclopedia.com/socialsciences/dictionariesthesaurusespicturesandpressreleases/multicollinearity