Levels of Analysis

Determining the level of analysis is usually straightforward, but whether to, or how to, draw inferences from one level of analysis to another is a difficult problem for which there is no general solution. The cases used as the units in an analysis determine the level of analysis. These cases may be quite varied, for example, countries, political parties, advertisements, families, or individuals. Thus, analysis may occur at the individual level, family level, advertisement level, and so forth.

The types of variables used at any one level of analysis, however, may be quite different. As an example, in studying the determinants of individuals' attitudes toward public education, the individuals (the units of analysis) may be described in terms of their sex and race (measures of individual properties), whether they attended a public or private college, and the region of the country in which they reside (measures of the collectives to which they belong). The analysis in this example is at the individual level because the cases used are individuals who are described in terms of individual properties and the properties of the collectives to which they belong.

This article focuses on (1) the types of variables used to describe the properties of collectives and members and the use of these variables at different levels of analysis; (2) problems that arise when using relationships at one level of analysis to make inferences about relationships at another level of analysis; (3) a statistical model that explicates these problems; (4) some useful data analytic techniques to use when data at two or more levels of analysis are available; and (5) proposed solutions or partial solutions to the problem of cross level inference when data at only a single level of analysis are available.


Lazarsfeld and Menzel (1969) propose a typology of the kinds of properties (variables) that describe "collectives" and "members." For example, in discussing the properties of collectives, Lazarsfeld and Menzel distinguish between analytical, structural, and global properties.

Analytical properties are obtained by performing some mathematical operation upon some property of each single member. These properties are typically referred to as aggregate variables. Examples are the percentage of blacks in cities, the sex ratio for different counties, and the Gini Index as a measure of inequality of incomes in organizations.
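As a minimal sketch of how an analytical property is built from member-level data, the following Python fragment computes a Gini index from the incomes of an organization's members. The function name and the income figures are hypothetical, and the mean-absolute-difference form of the index is one common definition, not necessarily the one used in any particular study.

```python
def gini(values):
    """Gini index as the mean absolute difference between all pairs of
    members, divided by twice the mean: sum|xi - xj| / (2 * n^2 * mean)."""
    n = len(values)
    mean = sum(values) / n
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2 * n * n * mean)


# Hypothetical member incomes for two organizations (the collectives).
equal_org = [20, 20, 20, 20]
unequal_org = [0, 0, 0, 100]

# One number per collective, obtained by an operation on a property
# of each single member: an analytical (aggregate) variable.
gini_equal = gini(equal_org)
gini_unequal = gini(unequal_org)
```

The result is a single value describing the collective, even though every input is a property of an individual member.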

Structural properties of collectives are obtained by performing some operation on data about the relations of each member to some or all of the others. Such measures are common in network analysis. Friendship density, for example, could be defined as the relative number of pairs of members of a collective who are directly connected by friendship ties. Since the total number of potential ties in a group with N members is N(N−1)/2, one measure of density is the total number of ties divided by this number.
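The density measure just described can be sketched in a few lines of Python. The function name, the member labels, and the treatment of ties as undirected pairs are assumptions made for illustration.

```python
def friendship_density(n_members, ties):
    """Number of distinct friendship ties divided by the number of
    potential ties, N(N-1)/2. Ties are treated as undirected pairs."""
    if n_members < 2:
        raise ValueError("density needs at least two members")
    # Drop self-ties and count (a, b) and (b, a) as the same tie.
    distinct = {frozenset(pair) for pair in ties if pair[0] != pair[1]}
    potential = n_members * (n_members - 1) / 2
    return len(distinct) / potential


# Hypothetical group of four members with two distinct ties.
ties = [("a", "b"), ("b", "c"), ("b", "a")]
density = friendship_density(4, ties)  # 2 ties out of 6 potential ties
```

Directed friendship choices would call for a different denominator, N(N−1), so the undirected assumption matters.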

Global properties of collectives do not use information about the properties of individual members either singly or in relationship to one another. Having a democratic or nondemocratic form of government is a global property of collectives. Being a private rather than a public school is a global property of a school. The proportion of gross national product (GNP) spent on education is a global property of countries.

Thus, variables that describe collectives can be based on summary data concerning single members of those collectives, the relationships of members to other members, or some global characteristic of the collective itself. Turning to variables that describe the properties of members of collectives, there are four major types: absolute, relational, comparative, and contextual.

Absolute properties are obtained without making use either of information about the characteristics of the collective or of information about the relationships of the member being described to other members. Thus, sex, level of education, and income are absolute properties of individuals.

Relational properties of members are computed from information about the substantive relationships between them and other members. For example, the number of friends an individual has at school or the number of family members is a property of the individual based on other members in the collective.

Comparative properties characterize a member by a comparison between his or her value on some (absolute or relational) property and the distribution of this property over the entire collective to which the person belongs. A person's class rank and birth order are comparative properties.

Contextual properties describe members by a property of the collective to which they belong. For example, being from a densely populated census tract or a school with a certain percentage of nonwhite students is a contextual property describing the context in which the member acts. Contextual properties are characteristics of collectives that are applied to members.

Contextual variables remind us that the level of analysis is determined by the cases used as the units of analysis, not by the level of the phenomena described by a particular variable. Thus, all the variables that describe a collective may be used at the individual level of analysis as well as those that describe individual properties; for example, a person's attitude may be predicted on the basis of the percentage of blacks in the person's school (contextual/analytic), whether or not the school is private or public (contextual/global), the density of friendships at the school (contextual/structural), the person's sex (absolute), his or her class standing (comparative), and the number of friendship choices he or she receives (relational).
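To make the point concrete, here is a small Python sketch that assembles an individual-level data set in which each row is a student, yet the columns mix absolute, relational, comparative, and contextual properties. All names, schools, grades, and friendship nominations are invented for illustration.

```python
schools = {"north": {"private": False}, "south": {"private": True}}
students = [
    {"name": "ana", "school": "north", "sex": "F", "gpa": 3.9},
    {"name": "ben", "school": "north", "sex": "M", "gpa": 3.1},
    {"name": "cam", "school": "south", "sex": "M", "gpa": 3.5},
    {"name": "dee", "school": "south", "sex": "F", "gpa": 3.7},
    {"name": "eli", "school": "south", "sex": "M", "gpa": 2.8},
]
# Friendship nominations as (chooser, chosen) pairs.
nominations = [("ben", "ana"), ("cam", "dee"), ("eli", "dee"), ("dee", "cam")]


def build_rows(students, schools, nominations):
    """One row per student: the unit of analysis stays the individual."""
    rows = []
    for s in students:
        peers = [p for p in students if p["school"] == s["school"]]
        rows.append({
            "name": s["name"],
            "sex": s["sex"],  # absolute
            "choices_received": sum(
                1 for _, chosen in nominations if chosen == s["name"]
            ),  # relational
            "class_rank": 1 + sum(
                1 for p in peers if p["gpa"] > s["gpa"]
            ),  # comparative
            "school_private": schools[s["school"]]["private"],  # contextual/global
            "school_pct_female": 100 * sum(
                p["sex"] == "F" for p in peers
            ) / len(peers),  # contextual/analytical
        })
    return rows


rows = build_rows(students, schools, nominations)
```

The number of rows equals the number of students, not the number of schools, which is what fixes the level of analysis.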


The section above describes abstractly two different levels of analysis, the collective (or aggregate) and the member (or individual). Sociologists typically distinguish levels concretely depending on the units of analysis; for example, the units may be schools, advertisements, children's stories, or riots. To make inferences from relationships discovered at one level of analysis to relationships at another level is not logically valid, and sociologists have labeled such inferences "fallacies." Still, at times one may be able to argue for the reasonableness of such inferences. These arguments may be based on statistical considerations (Achen and Shively 1995; Duncan and Davis 1953; Goodman 1953, 1959; King 1997) or on rationales that closely tie relationships at one level with those at another (Durkheim [1897] 1966; Dornbusch and Hickman 1959).

Disaggregative fallacies (often called ecological fallacies) are the classic case of cross level fallacies. Robinson (1950) brought them to the attention of sociologists. He cites two cases of cross level inferences, both of which involve making inferences about relationships at the individual level based on relationships discovered at the aggregate level. Robinson noted that the Pearson product moment correlation between the percent black and the percent illiterate in 1930 for the Census Bureau's nine geographical divisions was 0.95 and for states it was 0.77, while the correlation (measured by phi) on the individual level between being illiterate or not and being black or not was only 0.20. The relationship between percent illiterate and percent foreign-born was negative for regions and states (−0.62 and −0.53, respectively), while the relationship between being illiterate and being foreign-born at the individual level was positive (0.12).

Robinson demonstrated that relationships at one level of analysis do not have to be the same as those at another level. To assume that they must be the same or even that they must be quite similar is a logical fallacy.
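A small synthetic example shows how far apart the two levels can be. The Python sketch below uses invented counts for three regions (not Robinson's census data) and compares the correlation between regional percentages with the phi coefficient computed from the pooled individual-level table.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product moment correlation between two lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def phi(n11, n10, n01, n00):
    """Phi coefficient for a 2x2 table of counts (first index X, second Y)."""
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    return (n11 * n00 - n10 * n01) / sqrt(row1 * row0 * col1 * col0)


# Invented counts per region: (X and Y, X only, Y only, neither).
regions = {
    "A": (2, 8, 8, 82),
    "B": (10, 20, 20, 50),
    "C": (26, 24, 24, 26),
}

# Aggregate level: one pair of percentages per region.
pct_x = [100 * (c[0] + c[1]) / sum(c) for c in regions.values()]
pct_y = [100 * (c[0] + c[2]) / sum(c) for c in regions.values()]
ecological_r = pearson(pct_x, pct_y)

# Individual level: pool the regional tables into one 2x2 table.
pooled = [sum(cell) for cell in zip(*regions.values())]
individual_phi = phi(*pooled)
```

With these counts the regional percentages are perfectly correlated, while the individual-level association is weak (phi is roughly 0.17).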

Aggregative fallacies occur in the opposite direction, that is, when one assumes that relationships existing at the individual level must exist at the aggregate level. Robinson's results show that the positive relationship between being foreign born and being illiterate at the individual level may not be mirrored at the state level.

Universal fallacies (Alker 1969) occur when researchers assume that relationships based on the total population must be true for subsamples of the whole. It may be true, for example, that the relationship between population density and the crime rates of cities for all cities in the United States is not the same for southern cities or for cities with a population of over one million. Here, the fallacy is to assume that a relationship based on the total population must hold for selected subpopulations.

Similarly, one might commit a selective fallacy (Alker 1969) by assuming that relationships based on a particular sample of cities must hold for all cities. If the selected cities are a random sample of cities, this is a problem of statistical inference, but if they are selected on some other basis (e.g., size), then making inferences to all cities is a selective fallacy.

Cross-modality fallacies occur when the inference is from one distinct type of unit to another distinct type of unit. A cross-modality fallacy occurs when trends in advertisement content are used to make inferences about trends in the attitudes of individuals, or designs on pottery are used to infer the level of need for achievement in different cultures. (Aggregative and disaggregative fallacies are cross-modality fallacies because groups and individuals are distinct units. But these fallacies have traditionally been classified separately.)

Cross-sectional fallacies occur when one makes inferences from cross-sectional relationships (relationships based on units of analysis from a single point in time) to longitudinal relationships. For example, if unemployment rates and crime rates are positively related at the city level, this fallacy is committed by inferring that increases over time in the unemployment rate are related to increases in the crime rate over time.

Longitudinal fallacies occur when one makes inferences from longitudinal relationships (relationships based on units of analysis across time units) to cross-sectional ones, that is, inferring from a relationship between unemployment rates and crime rates over time to the relationship between these rates over units such as cities, counties, or states at a given point in time.

In all of their varied manifestations, cross-level inferences are not logically valid inferences (Skyrms 1975). That is, relationships on one level of analysis are not necessarily the same as those on another level. They may not even be similar. In the final section of this article, however, we note that data at one level of analysis may serve as evidence for relationships at another level of analysis even if they do not strictly imply such a relationship.


This section presents the results of a mathematical demonstration of why disaggregative and aggregative inferences are fallacies, that is, why results at the aggregate level are not necessarily mirrored at the individual level. The derivation of this model is not shown here but may be found in several sources (Duncan et al. 1961; Alker 1969; Hannan 1971; Robinson 1950). Readers who prefer can skip to the next section without loss of continuity.

The individual level or total correlation (rtxy) between two variables (X and Y) can be written as a function of the correlation between group means (the aggregate level correlation: rbxy), the correlation of individual scores within groups (a weighted average of the correlations within each of the groups: rwxy), and the correlation ratios for the two variables, X and Y. The correlation ratio is the ratio of the variance between groups (the variance of the group means: Vbx) to the total variance (variance of the individual scores: Vtx), written here as η²x = Vbx/Vtx for X and η²y = Vby/Vty for Y. Thus, the individual level or total correlation can be written (equation 1):

$$r_{txy} = r_{wxy}\sqrt{(1-\eta_x^2)(1-\eta_y^2)} + r_{bxy}\,\eta_x\,\eta_y$$

Similarly, the individual level regression coefficient can be written as a function of the within-group regression coefficient, the group level (between group) regression coefficient, and the correlation ratio for variable X (equation 2):

$$b_{tyx} = (1-\eta_x^2)\,b_{wyx} + \eta_x^2\,b_{byx}$$

It is a simple algebraic exercise to derive formulas for rbxy and bbyx (the aggregate level correlation and regression coefficients) in terms of correlation ratios, and correlation and regression coefficients at other levels. These formulas clearly demonstrate why one cannot use the ecological or group level correlation or regression coefficients to estimate individual level relationships: The individual level relationships are a function of group level relationships, within-group level relationships, and correlation ratios. This approach can be extended to include other levels of analysis, for example, individuals on one level, counties on another, states on another, and time as yet another level (see, e.g., Alker 1969; Duncan et al. 1961).
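The decomposition described in this section can be checked numerically. The Python sketch below uses invented scores for three groups and verifies that the total correlation equals the within-group correlation weighted by the square root of (1 − η²x)(1 − η²y), plus the between-group correlation weighted by ηx ηy, and that the total regression coefficient is the corresponding mixture of the within and between coefficients. The between-group quantities are weighted by group size, which is an assumption needed for the identity to hold exactly.

```python
from math import sqrt


def cross_products(pairs):
    """Return (Sxx, Syy, Sxy): sums of squares and cross-products
    of (x, y) pairs taken about their own means."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxx, syy, sxy


# Invented data: within each group Y falls as X rises,
# while group means of X and Y rise together.
groups = [
    [(1, 5), (2, 4), (3, 3)],
    [(4, 8), (5, 7), (6, 6)],
    [(7, 11), (8, 10), (9, 9), (10, 8)],
]

everyone = [pair for group in groups for pair in group]
txx, tyy, txy = cross_products(everyone)  # total (individual level)

wxx = wyy = wxy = 0.0
between_pairs = []
for group in groups:
    gxx, gyy, gxy = cross_products(group)  # deviations from group means
    wxx, wyy, wxy = wxx + gxx, wyy + gyy, wxy + gxy
    mean_x = sum(x for x, _ in group) / len(group)
    mean_y = sum(y for _, y in group) / len(group)
    # Each member is replaced by the group means (size-weighted between part).
    between_pairs.extend([(mean_x, mean_y)] * len(group))
bxx, byy, bxy = cross_products(between_pairs)

r_total = txy / sqrt(txx * tyy)
r_within = wxy / sqrt(wxx * wyy)
r_between = bxy / sqrt(bxx * byy)
eta2_x, eta2_y = bxx / txx, byy / tyy  # correlation ratios

r_from_parts = (r_within * sqrt((1 - eta2_x) * (1 - eta2_y))
                + r_between * sqrt(eta2_x * eta2_y))

b_total, b_within, b_between = txy / txx, wxy / wxx, bxy / bxx
b_from_parts = (1 - eta2_x) * b_within + eta2_x * b_between
```

In this example the within-group correlation is negative and the between-group correlation is strongly positive, so neither one alone reproduces the total correlation.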


Obtaining data at the different levels of analysis solves the problem of inferring relationships from one level to another. Researchers in this situation know the relationship at both levels for their data. Such data also provide additional information about the relationships at different levels of analysis.

O'Brien (1998) shows that when individual level data are aggregated to create summary measures at the aggregate level (e.g., means or rates for aggregates), then it is possible to estimate the reliability of the aggregate level measures. Further, when two or more of the aggregate level measures are based on samples of the same respondents within each aggregate, correlated errors between aggregate level measures are likely to occur. This correlated error can be measured and the aggregate level relationships can be corrected for this spurious correlation as well as for unreliability in the aggregate level measures.

In some situations one may want to argue that the best measures of individual level "effects" (in a causal sense) are provided by analyses at the individual level that include as predictors relevant individual properties and the properties of the collective to which the individuals belong (contextual variables). Estimates of these individual level relationships are then "controlled" for group level effects (Alwin and Otto 1977).

Since the relationship between group level means may reflect nothing more than the relationship between variables at the individual level, it has been suggested that the best estimate of group level "effects" compares the regression coefficient for the group means and for the individual scores. Lincoln and Zeitz (1980) show how this may be done in a single regression equation while at the same time controlling for other relevant variables. For both of these techniques to work, some stringent assumptions must be met, including assumptions of no measurement error in the independent variables and of a common within-group regression coefficient. Most importantly, these techniques depend upon having data from different levels of analysis.

The introduction of Hierarchical Linear Models (HLM) provides a flexible method for examining the relationship of individual level variables to group level variables (Bryk and Raudenbush 1992). These models allow for the relationships between individual level variables to vary within different groups and for differences between these relationships to be predicted by group level characteristics. It might be the case, for example, that the relationship between socioeconomic standing and student achievement differs depending upon class size and whether the school is private or public. These models allow for the prediction of different relationships for different individuals based on group level characteristics. These models may be extended to several levels of analysis in which members are nested within collectives.
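For the achievement example, a two-level model of this kind might be written as follows. The notation is the conventional one for such models and the variable names (SES, SIZE, PRIVATE) are labels chosen here for illustration; this is a sketch of the general form, not a specification taken from any particular study.

```latex
% Level 1: student i within school j
\begin{align}
Y_{ij} &= \beta_{0j} + \beta_{1j}\,\mathrm{SES}_{ij} + r_{ij} \\
% Level 2: school-level predictors of the intercept and the SES slope
\beta_{0j} &= \gamma_{00} + \gamma_{01}\,\mathrm{SIZE}_{j} + \gamma_{02}\,\mathrm{PRIVATE}_{j} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}\,\mathrm{SIZE}_{j} + \gamma_{12}\,\mathrm{PRIVATE}_{j} + u_{1j}
\end{align}
```

Here the coefficients γ11 and γ12 express how the relationship between socioeconomic standing and achievement differs with class size and school sector, and the terms u0j and u1j allow schools to vary around the predicted intercept and slope.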


Even when data at only one level of analysis are available, cross level inferences can be and often are made by sociologists. There is no absolute stricture against making such inferences, but when researchers make them, they need to do so with some awareness of their limitations.

While both Duncan and Davis (1953) and Goodman (1953, 1959) maintain that it is generally inappropriate to use aggregate level (ecological) relationships to make inferences about individual level relationships, they each propose strategies that set bounds on the possible relationships that could exist at one level of analysis given relationships that exist at the other. The bounds are designed for use with aggregated data (analytic measures), and in some circumstances these techniques are useful.
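The logic of setting bounds can be sketched briefly in Python. Within any one area, the number of people who have both characteristics cannot exceed the smaller of the two marginal counts and cannot fall below the overlap that the marginals force. Summing these limits over areas bounds the individual-level figure. The function name and the area counts are invented for illustration.

```python
def cell_bounds(n, n_x, n_y):
    """Bounds on the number of people with both X and Y in one area,
    given the area size n and the marginal counts n_x and n_y."""
    lower = max(0, n_x + n_y - n)  # overlap forced by the marginals
    upper = min(n_x, n_y)          # cannot exceed either marginal
    return lower, upper


# Invented areas: (population, number with X, number with Y).
areas = [(100, 90, 80), (100, 20, 10), (100, 60, 70)]

bounds = [cell_bounds(*area) for area in areas]
total_lower = sum(low for low, _ in bounds)
total_upper = sum(high for _, high in bounds)
total_x = sum(n_x for _, n_x, _ in areas)

# Bounds on the individual-level proportion of X members who have Y.
rate_lower = total_lower / total_x
rate_upper = total_upper / total_x
```

The bounds are informative only when the marginals are lopsided in at least some areas; with marginals near 50 percent everywhere they can be too wide to be useful.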

Goodman (1953, 1959) suggested a technique called "ecological regression," which became the most widely used method for making inferences from aggregate level data to individual level relationships when group level variables are "analytical properties of collectives." Goodman's method has been extended by a number of authors and summarized in the work of Achen and Shively (1995). King (1997) has proposed a statistical "solution to the problem of ecological inference," but the success of that solution is controversial (Freedman et al. 1998).
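The basic idea of ecological regression can be illustrated with a short Python sketch. If the rate of some outcome is the same for members of a group in every area, and likewise for nonmembers, then the proportion with the outcome in an area is a linear function of the proportion of members in that area. The intercept then estimates the nonmember rate and the intercept plus the slope estimates the member rate. The function name and the data are invented, and the data are constructed so that the constancy assumption holds exactly, which real data need not do.

```python
def ecological_regression(prop_members, prop_outcome):
    """Least-squares line of area outcome proportions on area membership
    proportions. Returns (nonmember_rate, member_rate), which are
    meaningful only if the rates are constant across areas."""
    n = len(prop_members)
    mean_x = sum(prop_members) / n
    mean_y = sum(prop_outcome) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(prop_members, prop_outcome))
    sxx = sum((x - mean_x) ** 2 for x in prop_members)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, intercept + slope


# Invented areas built from a nonmember rate of 0.2 and a member rate of 0.6,
# so each outcome proportion equals 0.2 + 0.4 * membership proportion.
prop_members = [0.1, 0.3, 0.5, 0.8]
prop_outcome = [0.24, 0.32, 0.40, 0.52]

nonmember_rate, member_rate = ecological_regression(prop_members, prop_outcome)
```

When the rates themselves vary with the composition of the area, the estimates can be badly biased and may even fall outside the range from 0 to 1.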

"Theory" may also allow one to make cross level inferences. For example, Dornbusch and Hickman (1959) tested Riesman et al.'s contention (1950) that other-directedness in individuals declined in the United States during the first half of the twentieth century. They obviously could not interview individuals throughout the first half of the century, so they turned to advertisements in a women's magazine (Ladies Home Journal, 1890–1956) to examine whether these ads increasingly used themes of other-directedness. Their units of analysis were advertisements, but they explicitly stated that they wanted to make inferences about changes in the other-directedness of individuals. Is this justified? The answer is no, on strictly logical grounds, and this constitutes a cross-modality fallacy. Certainly changes in the contents of advertisements do not demonstrate changes in individuals' personalities. But Dornbusch and Hickman (1959) convincingly argue that advertisements (in this case) are likely to reflect aspects of other-directedness in the targets of the advertisements (individuals). They recognize the need for other tests of this hypothesis, using other types of data.

Perhaps the classic case in sociology of an analysis built on the ecological fallacy is Emile Durkheim's analysis of suicide ([1897] 1966). One factor that Durkheim sees as "protecting" individuals from suicide is social integration. When individuals are married, have children, are members of a church that provides a high degree of social integration (e.g., Catholic rather than Protestant), or live at a time when their countries are in crisis (e.g., a war or electoral crisis), they are seen as more integrated into social, religious, and political society and less likely to commit suicide. Much of the data available to Durkheim did not allow an analysis on the individual level. There was no "suicide registry" with detailed data on the sex, religion, family status, and so forth of those committing suicide. There were, however, census data on the proportion of Catholics, the proportion married, and the average family size in different regions. Other sources could be used to ascertain the rate of suicide for different regions. Using these data, Durkheim showed that Catholic countries had lower suicide rates than Protestant countries, and that, in France and Germany, Catholic cantons exhibited lower suicide rates than Protestant cantons. Further, departments in France with larger average family sizes had lower suicide rates, and the suicide rate was lower during the months of electoral crises in France than during comparable months of the previous or following year. He combined this evidence with other evidence dealing with individuals (e.g., suicide rates for married versus unmarried men), and it was all consistent with his theory of suicide and social integration.

One could dismiss these aggregate level relationships by arguing that perhaps in Protestant countries those of other religions kill themselves at such a high rate that the suicide rates are higher in Protestant countries than in Catholic countries (and similarly in Protestant cantons in France and Germany). Isn't it possible that in departments with relatively small average family sizes there is a tendency for those in large-size families to kill themselves relatively more often? This would create a relationship at the aggregate level (department level) in which smaller average family size is associated with higher rates of suicide. It is possible, because relationships at one level of analysis are not necessarily mirrored at another level of analysis. But Durkheim's results are not easily dismissed.

Strict logic does not justify cross level inferences. But strict logic is not the only rational way to justify inferences. If a series of diverse relationships that are predicted to hold at the individual level are found at the aggregate level, they do not prove that the same relationships would be found at the individual level, but they are not irrelevant. It is incumbent on the critic of a study such as Durkheim's to give a series of alternative explanations explaining why the relationships at the aggregate level should differ from those at the individual level. If the alternative explanations are not very convincing or parsimonious, researchers are likely to find Durkheim's evidence persuasive. To the extent that social scientists are convinced that a set of advertisements is designed to appeal to motivations in their target population, that the target population of the magazine in which the advertisements appear represents the population of interest, and so on, they will find Dornbusch and Hickman's cross level inference persuasive. That a relationship at one level of analysis does not imply a relationship at another level of analysis does not mean that it cannot be used, along with other evidence, to help infer a relationship at another level of analysis.

Persuasion is a matter of degree and is subject to change. Sociologists would want to examine additional studies based on, for instance, other populations, modalities, and periods. These data might strengthen cross level inferences.


Achen, Christopher H., and W. Phillips Shively 1995 Cross-Level Inference. Chicago: University of Chicago Press.

Alker, Hayward R. 1969 "A Typology of Ecological Fallacies." In Mattei Dogan and Stein Rokkan, eds., Quantitative Ecological Analysis in the Social Sciences. Cambridge, Mass.: MIT Press.

Alwin, Duane F., and Luther B. Otto 1977 "High School Context Effects on Aspirations." Sociology of Education 50:259–272.

Bryk, Anthony S., and Stephen W. Raudenbush 1992 Hierarchical Linear Models: Applications and Data Analysis Methods. Newbury Park, Calif.: Sage Publications.

Dornbusch, Sanford M., and Lauren C. Hickman 1959 "Other-Directedness in Consumer-Goods Advertising: A Test of Riesman's Historical Theory." Social Forces 38:99–102.

Duncan, Otis D., Ray P. Cuzzort, and Beverly Duncan 1961 Statistical Geography: Problems in Analyzing Areal Data. Glencoe, Ill.: Free Press.

Duncan, Otis D., and Beverly Davis 1953 "An Alternative to Ecological Correlation." American Sociological Review 18:665–666.

Durkheim, Emile [1897] 1966 Suicide. New York: Free Press.

Freedman, D. A., S. P. Klein, M. Ostland, and M. R. Roberts 1998 "A Solution to the Ecological Inference Problem (Book Review)." Journal of the American Statistical Association 93:1518–1522.

Goodman, Leo A. 1953 "Ecological Regression and Behavior of Individuals." American Sociological Review 18:663–664.

——1959 "Some Alternatives to Ecological Correlation." American Journal of Sociology 64:610–625.

Hannan, Michael T. 1971 Aggregation and Disaggregation in Sociology. Lexington, Mass.: Lexington Books.

——, and Leigh Burstein 1974 "Estimation from Grouped Observations." American Sociological Review 39:374–392.

King, Gary 1997 A Solution to the Ecological Inference Problem. Princeton, N.J.: Princeton University Press.

Lazarsfeld, Paul F., and Herbert Menzel 1969 "On the Relation Between Individual and Collective Properties." In Amitai Etzioni, ed., A Sociological Reader on Complex Organizations. New York: Holt, Rinehart, and Winston.

Lincoln, James R., and Gerald Zeitz 1980 "Organizational Properties from Aggregate Data: Separating Individual and Structural Effects." American Sociological Review 45:391–408.

O'Brien, Robert M. 1998 "Correcting Measures of Relationship Between Aggregate-Level Variables for Both Unreliability and Correlated Errors: An Empirical Example." Social Science Research 27:218–234.

Riesman, David, Nathan Glazer, and Reuel Denney 1950 The Lonely Crowd. New Haven, Conn.: Yale University Press.

Robinson, William S. 1950 "Ecological Correlations and the Behavior of Individuals." American Sociological Review 15:351–357.

Skyrms, Brian 1975 Choice and Chance. Encino, Calif.: Dickenson.

Robert M. O'Brien