Econometric Decomposition




Ronald Oaxaca (1973) and Alan Blinder (1973) introduced a statistical tool that enables social scientists to identify how much of the difference in the outcomes of two groups (e.g., the black-white wage gap) a particular observable characteristic can explain. The tool, known as a decomposition, also provides an estimate of the contribution of discrimination to that difference. Prior to Oaxaca and Blinder's innovation, researchers were only able to identify the collective contribution of all observable differences in the characteristics of two groups. The decomposition has become a required tool in many social science disciplines. It is used to explain pay differences between men and women, public and private sector workers, and union and nonunion workers. Most recently, the decomposition has been applied to explaining pay differences between older and younger workers and between people with and without disabilities, as well as the pay disadvantage that gays, lesbians, and bisexuals experience (see, for example, Rodgers 2005, Badgett 2006, Baldwin and Johnson 2006, and Adams and Neumark 2006).

Since Oaxaca and Blinder's seminal work, numerous extensions have been developed. Using the white-black wage gap as the example, this entry summarizes the technique's major extensions and limitations.


Oaxaca (1973) combines log-earnings function estimates for blacks and whites and standardizes the error term to construct the following expression:

where Dt denotes the total log earnings differential. On the right-hand side, the first term is the explained gap (the portion explained by differences in measured characteristics). The second term is the residual gap (the portion attributed to differences in rates of compensation to the characteristics). The remaining two terms are generally ignored, as the decomposition is usually done at the means; otherwise, the sum of the last three terms is considered the residual gap. The residual gap is interpreted as the contribution of discrimination and of characteristics that have been excluded from the model, that is, characteristics that both predict wages and are correlated with race.
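
As a sketch only, in assumed notation that is not taken from the original expression: when the decomposition is done at the means, the OLS residuals average to zero within each group's regression, so only the first two terms remain. Writing the mean log wage of group g as \bar{y}_g, the mean characteristics (including a constant) as \bar{X}_g, and the estimated coefficients as \hat{\beta}_g, and choosing the white coefficients as weights (one possible choice among several), the identity is:

```latex
\bar{y}_w - \bar{y}_b
  = \underbrace{(\bar{X}_w - \bar{X}_b)'\hat{\beta}_w}_{\text{explained gap}}
  + \underbrace{\bar{X}_b'(\hat{\beta}_w - \hat{\beta}_b)}_{\text{residual gap}}
```

The identity holds because \bar{y}_g = \bar{X}_g'\hat{\beta}_g for each group when the regression includes an intercept.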

Interpreting the residual gap as discrimination requires that the model contain all of the factors that predict wages. Otherwise, discrimination's estimated contribution is biased. Little theoretical guidance exists on the selection of the characteristics that should be included. For example, some researchers control for racial differences in occupational outcomes, yet these outcomes are themselves influenced by discrimination. Another major issue, the index number problem, is that the choice of weights (the white or the black coefficients used to value the differences in characteristics) is arbitrary. This is a problem when the weights differ across groups, because each choice generates a different decomposition. Some efforts have attempted to use economic theory to provide guidance on the choice of weights (see, for example, Cotton 1988 and Neumark 1988). In practice, researchers either present results under different weighting structures or present their preferred specification and state in a note that the results are not sensitive to the choice of weights.
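
The sensitivity to the choice of weights can be illustrated with a minimal sketch. It assumes a single characteristic (years of schooling), a decomposition at the means, and entirely made-up data; the function names `ols_line` and `decompose` are hypothetical and are not drawn from Oaxaca or Blinder.

```python
from statistics import mean


def ols_line(x, y):
    """Return (intercept, slope) from a one-regressor OLS fit of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def decompose(x_w, y_w, x_b, y_b, weights="white"):
    """Split the mean log-wage gap into explained and residual parts.

    `weights` selects whose coefficient values the gap in characteristics.
    """
    a_w, b_w = ols_line(x_w, y_w)
    a_b, b_b = ols_line(x_b, y_b)
    xw_bar, xb_bar = mean(x_w), mean(x_b)
    gap = mean(y_w) - mean(y_b)
    if weights == "white":
        explained = b_w * (xw_bar - xb_bar)
        residual = (a_w - a_b) + xb_bar * (b_w - b_b)
    elif weights == "black":
        explained = b_b * (xw_bar - xb_bar)
        residual = (a_w - a_b) + xw_bar * (b_w - b_b)
    else:
        raise ValueError("weights must be 'white' or 'black'")
    return gap, explained, residual


# Made-up illustration: schooling (x) and log wages (y) for each group.
x_white = [10, 12, 12, 14, 16, 16]
y_white = [2.0, 2.3, 2.4, 2.6, 3.0, 2.9]
x_black = [10, 10, 12, 12, 14, 16]
y_black = [1.9, 2.0, 2.1, 2.2, 2.3, 2.6]

for w in ("white", "black"):
    gap, explained, residual = decompose(x_white, y_white, x_black, y_black, w)
    print(w, round(gap, 4), round(explained, 4), round(residual, 4))
```

Both weighting schemes reproduce the same total gap, but they split it differently between the explained and residual parts, which is why researchers often report results under more than one set of weights.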


An extension developed by Chinhui Juhn, Kevin Murphy, and Brooks Pierce (1991) decomposes time-series changes in the wage gap into four components. For example, a narrowing or widening in the white-black wage gap from year t to year t′ can be written as

The change in the actual wage gap is decomposed into (1) changes in measured characteristics holding the coefficients or prices fixed; (2) changes in prices holding characteristics fixed; (3) the contribution of shifts in central tendency or the movement of the average black in the white distribution; and (4) the contribution of shifts in spread, or changes in the variance of wages (see Juhn, Murphy, and Pierce 1991 for a detailed description of how these components are constructed). Term 3 measures changes in the position of blacks in the white residual wage distribution due to changes in unmeasured race-specific factors (e.g., discrimination). Term 4 measures changes in residual white inequality, the wage disadvantage for having a position below the mean in the white residual wage distribution. Even this decomposition is subject to the index number problem: different decompositions can be constructed by using different base years or by substituting the black prices for the estimated white prices. In practice, researchers use the average across all years as the base to avoid possible extremes within any given year.
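
The four components can be sketched numerically. The sketch below assumes the gap in each year can be written as the characteristics gap times white prices plus the white residual standard deviation times the gap in mean standardized residuals, and it uses the initial year as the base. The dictionary keys, the function names `jmp_terms` and `gap`, and all numbers are hypothetical illustrations, not values or notation from Juhn, Murphy, and Pierce.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def gap(year):
    """Wage gap implied by one year's inputs (assumed form, see lead-in)."""
    return dot(year["dx"], year["beta"]) + year["sigma"] * year["dtheta"]


def jmp_terms(base, later):
    """Split the change in the gap between two years into four terms.

    Each year is a dict with hypothetical keys:
      dx     - white-minus-black gaps in mean characteristics
      beta   - white coefficients (prices)
      dtheta - white-minus-black gap in mean standardized residuals
      sigma  - standard deviation of white residuals
    """
    d_dx = [x1 - x0 for x0, x1 in zip(base["dx"], later["dx"])]
    d_beta = [b1 - b0 for b0, b1 in zip(base["beta"], later["beta"])]
    return {
        # (1) characteristics change, prices held at base-year values
        "characteristics": dot(d_dx, base["beta"]),
        # (2) prices change, characteristics held at later-year values
        "prices": dot(later["dx"], d_beta),
        # (3) movement of blacks within the white residual distribution
        "position": (later["dtheta"] - base["dtheta"]) * base["sigma"],
        # (4) change in the spread of the white residual distribution
        "spread": later["dtheta"] * (later["sigma"] - base["sigma"]),
    }


# Made-up inputs for two years.
year_t = {"dx": [1.0, 0.5], "beta": [0.08, 0.02], "dtheta": 0.6, "sigma": 0.40}
year_t1 = {"dx": [0.7, 0.4], "beta": [0.10, 0.02], "dtheta": 0.5, "sigma": 0.45}

terms = jmp_terms(year_t, year_t1)
for name, value in terms.items():
    print(name, round(value, 4))
print("total change", round(gap(year_t1) - gap(year_t), 4))
```

Swapping the roles of the base and later year in the fixed quantities yields a different but equally valid split, which is the index number problem in this setting.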

Wing Suen (1997) identifies another potential limitation of this decomposition. The procedure generates biased results if wage inequality (Term 4) and the percentile ranks (Term 3) are not independent of one another. As wage inequality expands, the term that measures the contribution of unobservable prices increases while the term capturing movements in the position of blacks falls. This problem is greatest at the tails of the distribution. As inequality rises, the tails become fatter, artificially moving blacks up in the white distribution. The bias will be larger at the lowest percentiles because of the skewed shape of wage distributions, but bias could also be present at segments of the distribution where mass points exist. Mass points are wages that are common to a significant portion of the population.

William Rodgers III (2005) constructs distribution-specific approaches to address this potential bias. His extension of the Juhn, Murphy, and Pierce (1991) residual wage procedure starts by estimating a log wage equation for year t using only whites. He then uses the estimated coefficients to construct white and black residual distributions. With these distributions, Rodgers finds the white residual wage that equals the median black residual wage. This location is denoted as the qth quantile. Then, using the year t′ white residual distribution, Rodgers finds the white residual that corresponds to the qth quantile. This residual is interpreted as the predicted year t′ black wage residual, assuming that the median black's initial year t position is preserved. The actual change, the predicted change, and the ratio of the two are then constructed. This local approach can be performed at any quantile of the wage distribution, breaking the correlation between wage inequality and percentile rank.
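
The steps of this local approach can be sketched as follows. This is a minimal illustration under stated assumptions, not Rodgers's implementation: residuals are taken as given lists, the quantile is located with a simple empirical distribution function, the ratio is taken as predicted over actual change, and the function names and data are hypothetical.

```python
from bisect import bisect_right
from math import ceil
from statistics import median


def white_quantile_of(value, white_resid):
    """Share of white residuals at or below `value` (empirical CDF)."""
    ordered = sorted(white_resid)
    return bisect_right(ordered, value) / len(ordered)


def white_resid_at(q, white_resid):
    """White residual located at quantile q (inverse empirical CDF)."""
    ordered = sorted(white_resid)
    rank = ceil(round(q * len(ordered), 12))  # rounding guards float noise
    return ordered[min(len(ordered) - 1, max(0, rank - 1))]


def local_comparison(white_t0, black_t0, white_t1, black_t1):
    """Compare actual and predicted changes in the median black residual."""
    start = median(black_t0)
    q = white_quantile_of(start, white_t0)      # position in initial year
    predicted = white_resid_at(q, white_t1)     # same position, later year
    actual_change = median(black_t1) - start
    predicted_change = predicted - start
    ratio = predicted_change / actual_change if actual_change else None
    return q, actual_change, predicted_change, ratio


# Made-up residuals: the white distribution doubles in spread.
white_t0 = [-0.4, -0.2, 0.0, 0.2, 0.4]
white_t1 = [-0.8, -0.4, 0.0, 0.4, 0.8]
black_t0 = [-0.5, -0.2, -0.1]
black_t1 = [-0.9, -0.5, -0.3]

print(local_comparison(white_t0, black_t0, white_t1, black_t1))
```

Replacing `median` with another quantile of the black residual distribution would apply the same comparison elsewhere in the distribution.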


At first glance, decomposing within-group differences seems like a trivial exercise, but Oaxaca and Michael Ransom (1999) show that applying the typical wage decomposition techniques within groups leads to unidentified estimates. Lack of identification occurs because one cannot identify the separate contributions of the dummy variables that are included in the model. It is only possible to identify the relative effects of the dummy variables on the gap. The size of the residual wage gap depends on the omitted reference group chosen by the researcher (see Oaxaca and Ransom 1999 for a detailed description of this econometric problem).

For example, a decomposition of the racial wage gap in the jth occupation can be written as:

where the first three terms on the right-hand side measure occupation j's unexplained racial gap. The last term is the predicted racial gap due to observable racial differences in characteristics.

The typical approach defines the unexplained portion of the wage gap as:

The expression represents, for occupation j, racial differences in the coefficients after removing the adjusted wage difference between the average black and white in the excluded occupation (the difference in each regression's constants). The β̂js denote the race-specific coefficients on the jth occupation dummy variable in the black and white log wage equations. The α̂s are the constants from the black and white log wage equations.

The black-white wage gap for the jth occupation is not identified because it depends on the selection of the omitted reference group of any dummy variable contained in the regression. The estimates of the βs, the coefficients on the predictor variables (e.g., education, potential experience, industry), and of the α̂s, the intercepts, are not robust to the choice of the omitted reference group. In particular, the α̂s will change when different omitted groups are specified.
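
The dependence on the omitted group can be seen with simple arithmetic. The sketch below assumes an additive log wage model with occupation and region dummies, and assumes the unexplained gap for occupation j is measured as the black-minus-white intercept difference plus the black-minus-white difference in occupation coefficients. All coefficient values, as well as the names `switch_reference` and `typical_gap`, are hypothetical.

```python
def switch_reference(alpha, region_coefs, new_ref):
    """Re-express the same fitted model with a different omitted region.

    `region_coefs` maps every region to its coefficient, with the currently
    omitted region set to 0.0. Fitted values are unchanged: the intercept
    absorbs the new reference region's coefficient.
    """
    shift = region_coefs[new_ref]
    new_alpha = alpha + shift
    new_coefs = {r: c - shift for r, c in region_coefs.items()}
    return new_alpha, new_coefs


def typical_gap(alpha_b, alpha_w, beta_b, beta_w):
    """Assumed 'typical' unexplained gap for each occupation."""
    return {j: (alpha_b - alpha_w) + (beta_b[j] - beta_w[j]) for j in beta_b}


# Hypothetical estimates with "north" as the omitted region.
alpha_b, alpha_w = 2.00, 2.20
beta_b = {"clerical": 0.10, "professional": 0.40}
beta_w = {"clerical": 0.15, "professional": 0.50}
region_b = {"north": 0.0, "south": -0.15}
region_w = {"north": 0.0, "south": -0.05}

gaps_north = typical_gap(alpha_b, alpha_w, beta_b, beta_w)

# Same models, now with "south" as the omitted region.
alpha_b2, _ = switch_reference(alpha_b, region_b, "south")
alpha_w2, _ = switch_reference(alpha_w, region_w, "south")
gaps_south = typical_gap(alpha_b2, alpha_w2, beta_b, beta_w)

print(gaps_north)
print(gaps_south)
```

The level of each occupation's gap moves with the reference region, while the difference between any two occupations' gaps stays the same, which matches the point that only relative effects are identified.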

To achieve identification, William Horrace and Oaxaca (2001) construct three estimators. One of the estimators is written as follows:

where the β̂s and α̂s are defined as earlier. The remaining term denotes the average characteristics of African Americans in occupation j, and (θ̂b − θ̂w) denotes the difference between the black and white coefficients on those characteristics. This estimator avoids the identification problem because the changes in the coefficients (θ̂b − θ̂w) offset any changes in the intercepts (α̂b − α̂w). One potential drawback of this estimator is that the predicted racial wage gap varies with the average characteristics of black workers in each occupation. To deal with this potential problem, Horrace and Oaxaca use the means of blacks across all occupations.

Horrace and Oaxaca's third estimator provides information about the significance of the ordered occupation wage gaps. The relative wage gap in the jth occupation can be written as:

Horrace and Oaxaca take advantage of the fact that γ̂j ≥ 0 and create the normalization e^(−γ̂j) ∈ [0, 1]. This normalization expresses the wage gaps as a percentage of the largest normalized wage gap (1.0). The estimator removes racial differences for all the excluded reference groups for all dummy variables (α̂b − α̂w) and the omitted occupation. The standard errors on the differences between the wage gaps are used to determine whether these differences are statistically significant and whether or not the order statistic has any statistical meaning.

SEE ALSO Discrimination


Adams, Scott, and David Neumark. 2006. Age Discrimination in U.S. Labor Markets: A Review of the Evidence. In Handbook on the Economics of Discrimination, ed. William M. Rodgers III, 187–214. Northampton, MA: Edward Elgar.

Badgett, M. V. Lee. 2006. Discrimination Based on Sexual Orientation: A Review of the Literature. In Handbook on the Economics of Discrimination, ed. William M. Rodgers III, 161–186. Northampton, MA: Edward Elgar.

Baldwin, Marjorie L., and William G. Johnson. 2006. A Critical Review of Studies of Discrimination Against Workers with Disabilities. In Handbook on the Economics of Discrimination, ed. William M. Rodgers III, 119–160. Northampton, MA: Edward Elgar.

Blau, Francine, and Andrea H. Beller. 1992. Black-White Earnings Over the 1970s and 1980s: Gender Differences in Trends. Review of Economics and Statistics 74 (2): 276–286.

Blinder, Alan. 1973. Wage Discrimination: Reduced Form and Structural Estimates. Journal of Human Resources 8 (4): 436–455.

Cotton, Jeremiah. 1988. On the Decomposition of Wage Differentials. Review of Economics and Statistics 70 (2): 236–243.

Horrace, William, and Ronald Oaxaca. 2001. Inter-Industry Wage Differentials and the Gender Wage Gap: An Identification Problem. Industrial and Labor Relations Review 54: 611–618.

Juhn, Chinhui, Kevin Murphy, and Brooks Pierce. 1991. Accounting for the Slowdown in Black-White Wage Convergence. In Workers and Their Wages: Changing Patterns in the United States, ed. Marvin Kosters, 107–143. Washington, DC: American Enterprise Institute Press.

Neumark, David. 1988. Employers' Discriminatory Behavior and the Estimation of Wage Discrimination. Journal of Human Resources 23 (3): 279–295.

Oaxaca, Ronald. 1973. Male-Female Wage Differentials in Urban Labor Markets. International Economic Review 14 (3): 693–709.

Oaxaca, Ronald, and Michael Ransom. 1999. Identification in Detailed Wage Decompositions. Review of Economics and Statistics 81 (1): 154–157.

Rodgers, William M., III. 2005. Male White-Black Wage Gaps, 1979–1994: A Distributional Analysis. Southern Economic Journal 72 (4): 773–793.

Rodgers, William M., III. 2006. Handbook on the Economics of Discrimination. Northampton, MA: Edward Elgar.

Suen, Wing. 1997. Decomposing Wage Residuals: Unmeasured Skill or Statistical Artifact? Journal of Labor Economics 15 (3, part 1): 555–566.

William M. Rodgers III

