Hierarchical Linear Models



Hierarchical linear models are applicable in situations where data have been collected from two (or more) different levels. Sociology's initial interest in such multilevel relationships can be traced back to Durkheim's research into the impact of community on suicide (Durkheim [1898] 1951). More recently, these models have been related to the topic of contextual analysis (Boyd and Iversen 1979), where researchers are interested in investigating linkages between micro-level and macro-level variables. Sociological theories have been classified into three groups according to the degree to which they incorporate multilevel variables (Coleman 1986). In one group, variation in a dependent variable is explained through independent variables obtained from the same social level (e.g., country, community, individual). In a second group, attempts are made to account for differences in a dependent variable at one level by examining variation in an independent variable at a higher level; and in a third group, variations in a dependent variable are explained by variations in an independent variable at a lower level. Theories that fall into either the second or third group are multilevel theories and can be explored using hierarchical linear models.


A wide variety of hierarchical models can be specified. However, in order to outline the basic features of such models, a simple example will be developed. Assume that a researcher is interested in modeling the length of hospital stay (LOS) for a specific individual (Yi) as a function of the severity of that individual's illness (Xi) and the bed occupancy rate of the institution in which that individual is hospitalized (Gj). In this hypothetical model we have one criterion (or dependent) variable, Yi, at the micro level, one micro-level predictor (or independent) variable, Xi, and one macro-level predictor (or independent) variable, Gj. This produces a two-level hierarchical model. The technique is quite flexible and can be expanded to include multiple predictor variables at the micro level, the macro level, or both, as well as additional levels of nesting. In the given example, an index of individual comorbidity could be included as an additional micro-level predictor, type of hospital (e.g., public vs. private) could be included as an additional macro-level predictor, and a third level based on the gross national product (GNP) of the country in which the hospital is located could be added to create a three-level model.

The first step in developing hierarchical models is to specify a model for the micro-level variables that is identical for all contexts. In the present example a linear model relating LOS as a function of severity of illness is specified for each of the hospitals.

Yij = ß0j + ß1jXij + εij (1)

where j = 1, 2, . . . , J denotes the macro-level contexts (e.g., the hospitals) and i = 1, 2, . . . , nj denotes micro-level observations within contexts (e.g., individuals within hospitals). The intercepts from Equation 1 (ß0j) provide estimates of the expected LOS in hospital j for an individual whose severity of illness is zero, whereas the slopes (ß1j) provide estimates of the effect on LOS of a unit change in severity of illness in hospital j. Finally, the εij's represent random errors or residuals. It is assumed that these errors are normally distributed within each context with a mean of zero and a constant variance σ2. This is a standard linear model with the exception that the coefficients (i.e., the ßj's) are allowed to vary across contexts (hospitals).
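The within-context step can be sketched in code. The following simulation is an illustration added here, not part of the original presentation: the hospital counts, coefficient values, and data are all hypothetical, and numpy is assumed to be available. Equation 1 is fit separately within each context by ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(42)
n_hospitals, n_patients = 20, 40

# Hypothetical macro-level predictor: bed occupancy rate per hospital
occupancy = rng.uniform(0.6, 1.0, n_hospitals)

# True context-specific coefficients: intercepts and slopes vary by hospital
beta0 = 2.0 + 5.0 * occupancy + rng.normal(0, 0.5, n_hospitals)   # ß0j
beta1 = 0.5 + 1.5 * occupancy + rng.normal(0, 0.2, n_hospitals)   # ß1j

# Micro-level data: severity of illness (Xij) and length of stay (Yij)
severity = rng.normal(5.0, 2.0, (n_hospitals, n_patients))
los = (beta0[:, None] + beta1[:, None] * severity
       + rng.normal(0, 1.0, (n_hospitals, n_patients)))

# Fit Equation 1 by OLS separately within each hospital
b0_hat = np.empty(n_hospitals)
b1_hat = np.empty(n_hospitals)
for j in range(n_hospitals):
    b1_hat[j], b0_hat[j] = np.polyfit(severity[j], los[j], deg=1)
```

The arrays `b0_hat` and `b1_hat` then play the role of the varying intercepts and slopes that the second-level model tries to explain.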

In situations where separate regression equations are estimated for various contexts, four different patterns can emerge. These patterns are depicted in Figures 1a, 1b, 1c, and 1d. In Figure 1a, the functional relationship between the micro-level variables is identical for all the contexts, and thus the intercepts and slopes are the same for all contexts. In Figure 1b, the degree of linear relationship between the micro-level variables is equivalent across contexts; however, the initial "location" (i.e., the intercept) of this relationship varies across contexts. In Figure 1c, the degree of linear relationship between the micro-level variables varies as a function of context, although the initial "location" is consistent across contexts. Finally, in Figure 1d, both the initial location and the relationship between the micro-level variables vary significantly across contexts.

Systematic differences across contexts are reflected in three of the figures (viz., Figures 1b, 1c, and 1d). The presence of these differences leads to questions of whether there are contextual or macro-level variables that could be associated with the varying micro-level coefficients (i.e., the slopes and/or intercepts). Questions of this type are addressed by specifying a second-level model. For example, if there is significant variation among the micro-level coefficients, then this variation could be modeled as a function of contextual or macro-level variables as follows:

ß0j = γ00 + γ01Gj + U0j (2)

ß1j = γ10 + γ11Gj + U1j (3)

where Gj is a contextual (or macro-level) variable, γ00 and γ10 are the intercepts from the second-level models, γ01 and γ11 are the slopes from the second-level models, and U0j and U1j are the second-level residuals. It is assumed that the residuals are distributed multivariate normal with mean vector 0 and variance–covariance matrix T. In the present example, Equation 2 would be used to model differences across hospitals in the intercepts of the micro-level equations (cf. Figures 1b and 1d), whereas Equation 3 would be used to model differences across hospitals in the slopes of the micro-level equations (cf. Figures 1c and 1d).
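The "slopes as outcomes" logic of Equations 2 and 3 can be illustrated with a small simulation (all names and numbers here are hypothetical, and numpy is assumed): context-specific coefficients are generated from a macro-level predictor and then recovered by second-level regressions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_hospitals = 200

# Hypothetical macro-level predictor Gj: bed occupancy rate
G = rng.uniform(0.6, 1.0, n_hospitals)

# True second-level parameters (γ's), chosen arbitrarily for illustration
gamma00, gamma01 = 2.0, 5.0   # intercept model (Equation 2)
gamma10, gamma11 = 0.5, 1.5   # slope model (Equation 3)

# Context-specific coefficients with second-level residuals U0j, U1j
beta0 = gamma00 + gamma01 * G + rng.normal(0, 0.3, n_hospitals)   # ß0j
beta1 = gamma10 + gamma11 * G + rng.normal(0, 0.1, n_hospitals)   # ß1j

# Second-level OLS regressions recover the γ's up to sampling error
g01_hat, g00_hat = np.polyfit(G, beta0, deg=1)
g11_hat, g10_hat = np.polyfit(G, beta1, deg=1)
```

Here the ß's are treated as directly observed; in practice they would first be estimated from micro-level data, which is why the more refined weighting schemes described below are needed.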

Depending on the actual variability of the micro-level coefficients (i.e., the ßj's), different second-level models would be justified. For example, in situations where there is no variation in the slopes across contexts (see Figure 1b), the inclusion of Gj in Equation 3 would not be meaningful given that ß1j is the same across all contexts. Similarly, in situations where there is no variation in the intercepts across contexts (see Figure 1c), the inclusion of Gj in Equation 2 would not be meaningful given that ß0j is the same across all contexts.

By substituting Equations 2 and 3 into Equation 1, we can obtain a single equation form of the hierarchical model as follows:

Yij = γ00 + γ01Gj + γ10Xij + γ11GjXij + (U0j + U1jXij + εij) (4)

The model represented by Equation 4 is a mixed model with both fixed coefficients (viz., the γ's) and random coefficients (viz., the U's and the ε's). Further, since the random coefficients are allowed to covary across contexts, it can be called a variance component model.
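The substitution can be verified numerically. In this small sketch (hypothetical parameter values, numpy assumed), the same responses are computed first through the two-stage form of Equations 1–3 and then through the single mixed-model equation, and the two agree exactly:

```python
import numpy as np

rng = np.random.default_rng(7)
gamma00, gamma01, gamma10, gamma11 = 2.0, 5.0, 0.5, 1.5  # illustrative γ's

G = 0.8                                            # macro predictor, one context
U0, U1 = rng.normal(0, 0.3), rng.normal(0, 0.1)    # second-level residuals
X = rng.normal(5.0, 2.0, 50)                       # micro predictor Xij
e = rng.normal(0, 1.0, 50)                         # micro residuals εij

# Two-stage form: Equations 2 and 3 feed Equation 1
beta0 = gamma00 + gamma01 * G + U0
beta1 = gamma10 + gamma11 * G + U1
y_two_stage = beta0 + beta1 * X + e

# Single-equation mixed-model form (Equation 4)
y_mixed = (gamma00 + gamma01 * G + gamma10 * X + gamma11 * G * X
           + U0 + U1 * X + e)

assert np.allclose(y_two_stage, y_mixed)
```

The cross-level product term GjXij is what carries the interaction between the macro-level predictor and the micro-level slope.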

The approach to investigating relationships occurring across hierarchical levels represented by the equations above is not new. Burstein and colleagues (1978) discussed a similar approach under the conceptualization of "slopes as outcomes." Conceptually, this is an accurate description, given that the regression coefficients estimated within each context at the micro level are used as criterion (or dependent) measures in the macro-level (or second-level) model (cf. Equations 2 and 3). While this conceptualization of the relationship between micro- and macro-level variables has been understood for a number of years, concerns were long expressed about the adequacy of estimating such models using traditional statistical techniques (viz., ordinary least squares, OLS). Statistical advances throughout the 1980s improved the estimation procedures for these models (for reviews see Burstein et al. 1989; Raudenbush 1988), and these advances resulted in several software packages developed specifically for the estimation of hierarchical linear models (e.g., GENMOD, HLM, ML3, and VARCL).


In estimating the various components of the hierarchical linear model, a distinction is made among fixed effects, random effects, and variance components. Specifically, fixed effects are those parameter estimates that are assumed to be constant across contexts (e.g., the γ's from Equations 2 and 3), whereas random effects are parameter estimates that are free to vary across contexts (e.g., ß0j and ß1j from Equation 1). Hierarchical linear models also allow for the estimation of the variance components of the model. These include (1) the variance of the residuals from the micro-level model (i.e., the variance of the εij's identified as σ2 above); (2) the variance of the second-level residuals (i.e., U0j and U1j); and (3) the covariance of the second-level residuals (i.e., the covariance of U0j and U1j). The variance–covariance matrix of the second-level residuals was previously defined as T.

Estimation of Fixed Effects. One approach that could be used to estimate the γ's from Equations 2 and 3 is traditional OLS regression. However, because the precision with which the micro-level coefficients are estimated varies across contexts, the usual OLS assumption of equal error variances (i.e., homoscedasticity) is violated. To deal with this violation, the second-level regression coefficients (the γ's) are estimated using a more sophisticated procedure, generalized least squares (GLS). GLS provides weighted estimates of the second-level regression coefficients such that contexts in which the micro-level parameters (the slopes and intercepts) are estimated more precisely receive more weight in estimating the second-level regressions.
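A minimal sketch of the weighting idea follows; this is not the full GLS machinery, only its core step of weighting each context's estimated slope by the inverse of its sampling variance when fitting the second-level regression. All data and values are simulated and hypothetical, and numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100

G = rng.uniform(0.6, 1.0, n)          # macro-level predictor Gj
true_slope = 0.5 + 1.5 * G            # Equation 3, residual omitted for clarity

# Contexts differ (e.g., in sample size), so slope estimates differ in precision
var_j = rng.uniform(0.01, 0.5, n)     # sampling variance of each estimated ß1j
b1_hat = true_slope + rng.normal(0, np.sqrt(var_j))

# Precision-weighted (GLS-style) regression: weight = 1 / sampling variance
W = 1.0 / var_j
X = np.column_stack([np.ones(n), G])
XtWX = X.T @ (W[:, None] * X)
XtWy = X.T @ (W * b1_hat)
g10_hat, g11_hat = np.linalg.solve(XtWX, XtWy)   # weighted estimates of γ10, γ11
```

Contexts with large `var_j` contribute little to `g11_hat`, which is exactly the behavior the text describes.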

Estimation of Variance–Covariance Components. The variance–covariance components of the model include the variance of the micro-level residuals (σ2) and the variances and covariance of the second-level residuals (the elements of T). These components are used in the GLS estimation of the fixed effects of the second-level model. However, their values are typically not known and must be estimated. The best methods for doing this are iterative methods that alternately estimate the parameters of the models and the variance–covariance components until convergence is reached. Hierarchical linear models typically employ the EM algorithm (Dempster et al. 1977), which produces maximum likelihood estimates of these variance–covariance components.

Estimation of Random Effects. The simplest way of estimating the coefficients for the micro-level model (i.e., Equation 1) is to compute an OLS regression for a specific context. In the present example, this would involve obtaining a regression equation relating expected LOS to severity of illness for all individuals within a specific hospital. If there are reasonably large sample sizes within each context, this analysis would provide relatively precise estimates of the coefficients of interest. These estimates will not be stable, however, if sample sizes are smaller. Further, inspection of the second-level models reveals that there is a second estimate of the coefficients from the micro-level models. Thus, for any particular observational unit there are two separate estimates of the micro-level regression coefficients: one from the micro-level regressions themselves and the other from the second-level regression model. The question that this leaves is which of these provides a more accurate estimate of the population parameters for the particular observational unit.

Rather than forcing a choice between one of these two estimates, hierarchical linear models use empirical Bayes estimation procedures (Morris 1983) to compute an optimally weighted combination of the two estimates. The empirical Bayes estimates are a weighted composite of the two estimates discussed above. The micro-level regression coefficients (the ßj's) estimated by OLS are weighted according to the precision with which they are estimated (i.e., their reliability). In cases where the OLS estimates are not very reliable (e.g., due to small sample size), the empirical Bayes procedure allots greater weight to the second-level estimates. Essentially, then, the weighted composite "shrinks" the micro-level estimate toward the second-level estimate, with the level of shrinkage being determined by the reliability of the micro-level estimate. It has been demonstrated that, in general, the empirical Bayes estimates have smaller mean squared errors than OLS estimates.
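The shrinkage idea can be sketched directly. The example below uses a simplified random-intercept setting with the between-context variance (τ2) and within-context variance (σ2) treated as known; the data are simulated and all values are hypothetical, with numpy assumed. Each context's OLS estimate (here, a sample mean) is pulled toward the overall estimate in proportion to its unreliability.

```python
import numpy as np

rng = np.random.default_rng(5)
tau2, sigma2 = 1.0, 4.0        # between- and within-context variances (assumed known)
grand_mean = 10.0              # second-level estimate to shrink toward

n_j = np.array([5, 5, 50, 50])                         # small and large contexts
mu_j = grand_mean + rng.normal(0, np.sqrt(tau2), 4)    # true context means

# OLS (per-context) estimates: sample means
ybar = np.array([rng.normal(m, np.sqrt(sigma2), n).mean()
                 for m, n in zip(mu_j, n_j)])

# Reliability of each OLS estimate, and the empirical Bayes composite
lam = tau2 / (tau2 + sigma2 / n_j)
eb = lam * ybar + (1 - lam) * grand_mean   # shrinks toward the overall mean
```

The small-sample contexts (low `lam`) are shrunk heavily toward `grand_mean`, while the large-sample contexts retain most of their own OLS estimates.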

Statistical Tests. A variety of statistical tests for hypothesis testing are provided by the various computer programs used to estimate hierarchical linear models. For example, HLM (Bryk et al. 1994) computes a t-test to evaluate the hypothesis that the second-level regression parameters depart significantly from zero. In addition, chi-square tests are provided for tests of whether or not there is significant variation in the second-level residuals. These latter tests allow the researcher to determine the model that best fits the observed data. For example, it might be that there is no significant variation in the slopes across contexts; however, there might be significant variation in the intercepts (as in Figure 1b).


Centering. Often, as in the present example, interpretation of the intercepts is not straightforward, since a value of zero for the independent variable (in the present case, severity of illness) is not meaningful. In situations like this, it is possible to "center" the independent variable as a deviation from the mean level of that variable in the sample as follows:

Yij = ß0j + ß1j(Xij − X̄) + εij

where X̄ is the sample mean of the independent variable.

With this specification, the intercepts now represent estimates of the expected length of stay for individuals in a specific hospital whose severity of illness is at the mean. The interpretation of the other parameters remains unaltered.
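A quick numerical illustration of centering (simulated, hypothetical data; numpy assumed): after subtracting the sample mean of the predictor, the fitted slope is unchanged while the intercept becomes the predicted response at the mean of the predictor.

```python
import numpy as np

rng = np.random.default_rng(9)
severity = rng.normal(5.0, 2.0, 200)                   # hypothetical Xij
los = 3.0 + 1.2 * severity + rng.normal(0, 1.0, 200)   # hypothetical Yij

# Uncentered and centered OLS fits
slope_u, int_u = np.polyfit(severity, los, deg=1)
centered = severity - severity.mean()
slope_c, int_c = np.polyfit(centered, los, deg=1)

# The slope is identical; the centered intercept equals the
# uncentered model's prediction at the mean severity
assert np.isclose(slope_u, slope_c)
assert np.isclose(int_c, int_u + slope_u * severity.mean())
```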

Longitudinal Data. Hierarchical linear models can also be used to analyze longitudinal data collected in order to examine questions regarding the assessment of change (Bryk and Raudenbush 1987). Under this approach, there are repeated observations within an observational unit and there is a sample of different units. This allows for a two-level conceptualization of development such that change in the individual units is modeled as a function of time and differences in the patterns of change across individual units can be modeled as a function of measurable characteristics of the individual units. Under this conceptualization, interest is in between-individual (unit) differences in within-individual (unit) change.
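In code, the longitudinal case simply reassigns the roles: repeated observations are the level-1 units and time is the micro-level predictor. The sketch below simulates hypothetical growth data (all values invented; numpy assumed) and fits a growth curve per subject, the first stage of the two-level analysis.

```python
import numpy as np

rng = np.random.default_rng(11)
n_subjects, n_waves = 50, 6
time = np.arange(n_waves)                    # measurement occasions 0..5

# Each subject has an individual starting level and growth rate
level = rng.normal(20.0, 3.0, n_subjects)    # ß0j: status at time 0
growth = rng.normal(1.0, 0.4, n_subjects)    # ß1j: individual rate of change

y = (level[:, None] + growth[:, None] * time
     + rng.normal(0, 1.0, (n_subjects, n_waves)))

# Level-1 fits: one growth curve per subject; the slope is the change rate
rates = np.array([np.polyfit(time, y[j], deg=1)[0]
                  for j in range(n_subjects)])

# Between-subject differences in change would then be modeled at level 2,
# e.g., by regressing `rates` on measured subject characteristics
```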

Statistical Software. As previously noted, a number of different software programs have been specifically developed in order to estimate hierarchical linear models. Kreft and colleagues (1994) reviewed five of the then-available packages. While they recommended ML3 (Prosser et al. 1991) for the "serious" user, they concluded that HLM's main advantage is its ease of use. Since that time, both programs have been updated, and Windows 95 versions are now available (viz., HLM 4 and MLwiN). Information on the latest versions of these programs is available from the following Web sites: http://www.ssicentral.com/hlm/mainhlm.htm for HLM 4, and http://www.ioe.ac.uk/mlwin for MLwiN.


Hierarchical linear models provide statistically sophisticated ways of dealing with analyses in which data are obtained from multiple levels. Such data are common in sociological research, especially in investigations of contextual effects or longitudinal designs. For more detailed discussions of hierarchical linear models, the interested reader is directed to Bryk and Raudenbush (1992), Goldstein (1995), or Hox (1995). (At the time of publication, a complete, on-line version of this text was available from: http://ioe.ac.uk/multilevel.what-new.html.)


Boyd, L. H., and G. R. Iversen 1979 Contextual Analysis: Concepts and Statistical Techniques. Belmont, Calif.: Wadsworth.

Bryk, A. S., and S. W. Raudenbush 1987 "Application of Hierarchical Linear Models to Assessing Change." Psychological Bulletin 101:147–158.

——1992 Hierarchical Linear Models: Applications and Data Analysis Methods. Newbury Park, Calif.: Sage.

——, and R. J. Congdon 1994 Hierarchical Linear Modeling with the HLM/2L and HLM/3L Programs. Chicago: Scientific Software International.

Burstein, L., K. S. Kim, and G. Delandshere 1989 "Multilevel Investigations of Systematically Varying Slopes: Issues, Alternatives, and Consequences." In R. D. Bock, ed., Multilevel Analysis of Educational Data. New York: Academic Press.

Burstein, L., R. L. Linn, and F. J. Capell 1978 "Analyzing Multilevel Data in the Presence of Heterogeneous Within-Class Regressions." Journal of Educational Statistics 3:347–383.

Coleman, J. S. 1986 "Social Theory, Social Research, and a Theory of Action." American Journal of Sociology 91:1309–1336.

Dempster, A. P., N. M. Laird, and D. B. Rubin 1977 "Maximum Likelihood from Incomplete Data Via the EM Algorithm." Journal of the Royal Statistical Society, Series B 39:1–8.

Durkheim, E. (1898) 1951 Suicide. Glencoe, Ill.: Free Press.

Goldstein, H. 1995 Multilevel Statistical Models. New York: Halstead.

Hox, J. J. 1995 Applied Multilevel Analysis. Amsterdam: TT-Publikaties.

Kreft, I., J. de Leeuw, and R. van der Leeden 1994 "Review of Five Multilevel Analysis Programs: BMDP-5V, GENMOD, HLM, ML3, VARCL." The American Statistician 48:324–335.

Morris, C. 1983 "Parametric Empirical Bayes Inference: Theory and Applications." Journal of the American Statistical Association 78:47–65.

Prosser, R., J. Rasbash, and H. Goldstein 1991 ML3: Software for Three-Level Analysis. London: Institute of Education, University of London.

Raudenbush, S. W. 1988 "Educational Applications of Hierarchical Linear Models: A Review." Journal of Educational Statistics 13:85–116.

George Alder