## Demographic Methods

Demographic methods are used to provide researchers and policymakers with useful information about the size and structure of human populations and the processes that govern population changes. A population, of course, may range in size from a small number of individuals surveyed locally to a large national population enumerated in periodic censuses to even larger aggregated entities.

We use demographic methods not only in purely demographic applications, but also in a variety of other fields, among them sociology, economics, anthropology, public health, and business. Demographers, like all researchers, must pay careful attention to the quality of the data that enter into their analyses. Some circumstances under which we use these methods are more trying than others. When the data are considered accurate and complete, the methods we use to analyze them are more straightforward than those applied to data of imperfect quality.

## DESCRIPTION OF DATA

First we must develop ways to describe our data in a fashion that allows the most important facts to leap out at us. As one example, let us examine a population's age structure or distribution. Simple descriptive statistics are doubtless helpful in summarizing aspects of population age structure, but demographers often use age "pyramids" to convey to an audience the youthfulness of a population, for example, or even to convey a rough sense of a nation's history.

We might note, for example, that 20 percent of Norway's population in 1997 was under the age of fifteen. In contrast, 39 percent of Mexico's population seven years earlier fell into this category. But it is perhaps more dramatic to create a visual display of these figures in the form of age pyramids.

An age pyramid is typically constructed as a bar graph, with horizontal bars—one representing each sex—emanating in both directions from a central vertical age axis. Age increases as one proceeds up the axis and the unit of the horizontal axis is either the proportion of the total population in each age group or the population size itself.

We see in panel A of figure 1 that Mexico's population is described by a very broad-based pyramid—actually, in two-dimensional space, a triangle, but by convention we refer to it as a pyramid—which reveals a remarkably large proportion of the population not yet having reached adulthood. The median age of this population is about twenty years. Such a distributional shape is common particularly among high fertility populations. In stark contrast we have the pyramid shown in panel B, representing the population of Norway. Rather than triangular, its age distribution is somewhat more rectangular in shape, which is typically seen among countries that have experienced an extended period of low fertility rates. It is easy to see that Norway's median age (about thirty-six years) is considerably higher than that of Mexico.

As mentioned above, not only can we examine the age structure of populations through the use of pyramids, but we can also gain much insight into a nation's history insofar as that history has either directly or indirectly influenced the size of successive birth cohorts. Note, for example, the age pyramid reflecting the population age structure of France on 1 January 1962 (figure 2).

In this figure, several notable events in France's history are apparent. We see (1) the military losses experienced in World War I by male birth cohorts of the mid to late 1890s, (2) the remarkable birth dearth brought about by that war (the cohort aged in their mid-forties or so in 1962), followed by (3) the return of prewar childbearing activity once the war ended (the cohort who in 1962 were in their early forties), and a similar pattern revolving around World War II, during which (4) a substantial decline in births took place (the cohort around age twenty in 1962), succeeded by (5) a dramatic baby boom in the years immediately following (those around age fifteen in 1962).

## COMPARISON OF CRUDE RATES

Much of the work in which social scientists are engaged is comparative in nature. For example, we might seek to contrast the mortality levels of the populations of two countries. Surely, if the two countries in question have accurate vital registration and census data, this task would appear to be trivial. For each country, we would simply divide the total number of deaths (D) in a given year by the total population (P) at the midpoint of that year. Thus we would define the crude death rate (CDR) for country A as:

$$
\mathrm{CDR}^{a} = \frac{D^{a}[t,\,t+1]}{P^{a}[t+0.5]} = \frac{\sum_{x=0}^{\omega} {}_{n}M_{x}^{a}\;{}_{n}P_{x}^{a}}{\sum_{x=0}^{\omega} {}_{n}P_{x}^{a}}
$$

where $t$ denotes the beginning of the calendar year, $\omega$ is the oldest age attained in the population, ${}_{n}M_{x}^{a}$ is the death rate of individuals aged $x$ to $x+n$, and ${}_{n}P_{x}^{a}$ is the number of individuals in that same age group. Both ${}_{n}M_{x}^{a}$ and ${}_{n}P_{x}^{a}$ are centered on the midpoint of the calendar year. The rightmost segment of this equation reminds us that the crude death rate is but the sum of the age-specific death rates weighted by the proportion of the population at each age.

Unfortunately, comparisons of crude death rates across populations can lead to misleading conclusions. This problem is in fact general to any sort of comparison based on crude rates that do not account for the confounding effects of factors that differentiate the two populations.

Let us examine the death rates of two very different countries, Mexico and Norway. The crude death rate (for both sexes combined) in Mexico in 1990, 5.2 per 1000, was approximately half that for Norway, 10.1, seven years later, in 1997. Knowing nothing else about these two countries, we might infer that the health of the Mexican population was considerably superior to that of the Norwegian population. However, once we standardize the crude death rates of the two countries on a common population age structure, we find just the reverse is true.

To accomplish this standardization, instead of applying the above equation to our data, we use the following:

$$
\mathrm{ASCDR}^{a} = \frac{\sum_{x=0}^{\omega} {}_{n}M_{x}^{a}\;{}_{n}P_{x}^{s}}{\sum_{x=0}^{\omega} {}_{n}P_{x}^{s}}
$$

where $\mathrm{ASCDR}^{a}$ is the age-standardized crude death rate of country A, and ${}_{n}P_{x}^{s}$ is the number of individuals in the standard population in that same age group. Any age distribution may be chosen as the standard; however, it is common simply to use the average of the two proportionate age distributions (i.e., each normalized to one in order to account for unequal population sizes).

Table 1 gives the death rates and number of persons for each country by five-year age groups (zero through four, five through nine, and so on through eighty and above). The first fact we glean from the table is that the mortality rates of the Norwegian population are substantially lower than those of their Mexican counterparts at virtually all ages. We see that nearly 40 percent of the Mexican population is concentrated in the three age groups having the lowest death rates (ages five through nineteen). Only half that proportion of the Norwegian population is found in the same age range. In contrast, only 6 percent of the Mexican population is above sixty years of age, an age range with which its highest level of mortality is associated. At the same time, about 20 percent of the Norwegian population is sixty or older. Compared with the Mexican crude death rate, then, the Norwegian rate is disproportionately weighted toward the relatively high age-specific death rates that exist in the older ages.

Applying the standardization method described by the above equation to the average of Norway's and Mexico's proportionate age distributions, we find that the resulting crude death rates are consistent with what we would infer from the two series of age-specific mortality rates. Mexico's age-standardized crude death rate, 8.7 per 1000 population, is one-third greater than that of Norway, 6.6 per 1000.
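The comparison can be sketched in a few lines of Python. This is a minimal illustration, not the original computation: it uses the rounded rates and population counts printed in Table 1, takes the standard to be the average of the two proportionate age distributions, and uses helper names (`proportions`, `weighted_rate`) chosen here for illustration. Because the inputs are rounded, the results may differ slightly from the figures quoted in the text.

```python
# Age-specific death rates (per 1000) and populations (in 1000s) as printed in Table 1,
# for age groups 0-4, 5-9, ..., 75-79, 80+.
mexico_rates = [8.40, 0.61, 0.52, 0.99, 1.50, 1.88, 2.21, 2.88, 3.80,
                5.30, 7.30, 11.32, 15.17, 23.57, 33.27, 54.53, 108.52]
mexico_pop = [10257, 10627, 10452, 9723, 7877, 6444, 5420, 4607, 3519,
              2990, 2408, 1906, 1621, 1191, 832, 594, 780]
norway_rates = [1.06, 0.19, 0.14, 0.48, 0.54, 0.68, 0.77, 0.94, 1.55,
                2.45, 3.88, 6.19, 10.31, 16.55, 28.66, 57.11, 122.48]
norway_pop = [303, 301, 264, 265, 299, 343, 338, 319, 313,
              303, 286, 203, 174, 177, 177, 156, 182]


def proportions(pop):
    """Normalize population counts so that they sum to one."""
    total = sum(pop)
    return [p / total for p in pop]


def weighted_rate(rates, weights):
    """Sum of age-specific rates weighted by an age distribution."""
    return sum(m * w for m, w in zip(rates, weights)) / sum(weights)


mexico_share = proportions(mexico_pop)
norway_share = proportions(norway_pop)
# Standard: average of the two proportionate age distributions.
standard = [(m + n) / 2 for m, n in zip(mexico_share, norway_share)]

crude_mexico = weighted_rate(mexico_rates, mexico_share)
crude_norway = weighted_rate(norway_rates, norway_share)
std_mexico = weighted_rate(mexico_rates, standard)
std_norway = weighted_rate(norway_rates, standard)

print(f"crude:        Mexico {crude_mexico:.1f}, Norway {crude_norway:.1f}")
print(f"standardized: Mexico {std_mexico:.1f}, Norway {std_norway:.1f}")
```

The crude rates rank Mexico below Norway, while the standardized rates reverse the ordering, which is the point of the exercise.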

## LIFE TABLES

The *life table* is a methodological device used by demographers and others to examine the life—either literal or figurative—of a particular duration-dependent phenomenon. The original application of the life table was to the mortality patterns of human populations. Today, the life table technique is applied to such diverse areas as contraceptive efficacy, marital formation and dissolution, and organizational failure. Thus it is a remarkably general tool for examining the time-dependent survivorship in a given state.

To illustrate the use of the life table method, suppose we use as an example the mortality experience of the United States population in 1996. We might wish to derive the average number of years that an individual would live, subject to the series of age-specific death rates attributed to that population. To find the answer, we could construct a *complete life table*, as in table 2. It is complete in the sense that it is highly age-detailed, focusing on single years of age. This is in contrast to the *abridged life table*, which is usually constructed using five-year age groups. The abridged life table shown in table 3 refers also to the total population of the United States in 1996.

Most life tables that we see are called *period* or *current life tables*. They refer to a particular snapshot in time. Although they describe the mortality experience of an actual population, they do not describe the experience of an actual birth cohort, that is, a group of individuals who are born within a (narrowly) specified interval of time. If we wished to portray the mortality history of the birth cohort of 2000, for example, we would have to wait until the last individual of that cohort has died, or beyond the year 2110, before we would be able to calculate all of the values that comprise the life table. In such a life table, called a *generation* or *cohort life table*, we can explicitly obtain the probability of individuals surviving to a given age. As is intuitively clear, however, a generation life table is suitable primarily for historical analyses of cohorts now extinct. Any generation life table that we could calculate would be very much out of date and would in no way approximate the present mortality experience of a population. Thus, we realize the need for the period life table, which treats a population at a given point in time as a *synthetic* or *hypothetical cohort*. The major drawback of the period life table is that it refers to no particular cohort of individuals. In an era of mortality rates declining at all ages, such a life table will underestimate true life expectancy for any cohort.

Table 1. Standardization using Death Rates and Population Distributions from Mexico (1990) and Norway (1997)

| Age group | Mexico: population in 1000s (% distribution) | Mexico: death rate (per 1000) | Standard distribution (%) | Norway: population in 1000s (% distribution) | Norway: death rate (per 1000) |
|---|---|---|---|---|---|
| 0-4 | 10,257 (12.62) | 8.40 | 9.75 | 303 (6.88) | 1.06 |
| 5-9 | 10,627 (13.08) | 0.61 | 9.96 | 301 (6.83) | 0.19 |
| 10-14 | 10,452 (12.86) | 0.52 | 9.43 | 264 (6.00) | 0.14 |
| 15-19 | 9,723 (11.97) | 0.99 | 9.00 | 265 (6.02) | 0.48 |
| 20-24 | 7,877 (9.69) | 1.50 | 8.24 | 299 (6.78) | 0.54 |
| 25-29 | 6,444 (7.93) | 1.88 | 7.86 | 343 (7.78) | 0.68 |
| 30-34 | 5,420 (6.67) | 2.21 | 7.17 | 338 (7.68) | 0.77 |
| 35-39 | 4,607 (5.67) | 2.88 | 6.46 | 319 (7.25) | 0.94 |
| 40-44 | 3,519 (4.33) | 3.80 | 5.72 | 313 (7.11) | 1.55 |
| 45-49 | 2,990 (3.68) | 5.30 | 5.28 | 303 (6.89) | 2.45 |
| 50-54 | 2,408 (2.96) | 7.30 | 4.73 | 286 (6.50) | 3.88 |
| 55-59 | 1,906 (2.35) | 11.32 | 3.48 | 203 (4.61) | 6.19 |
| 60-64 | 1,621 (2.00) | 15.17 | 2.97 | 174 (3.94) | 10.31 |
| 65-69 | 1,191 (1.47) | 23.57 | 2.74 | 177 (4.02) | 16.55 |
| 70-74 | 832 (1.02) | 33.27 | 2.53 | 177 (4.03) | 28.66 |
| 75-79 | 594 (0.73) | 54.53 | 2.13 | 156 (3.53) | 57.11 |
| 80+ | 780 (0.96) | 108.52 | 2.55 | 182 (4.14) | 122.48 |

Source: http://www.ssb.no/www-open/english/yearbook/ and http://www.census.gov/ipc/www/idbprint.html

Table 2. Complete Life Table\* for the United States, 1996

| Exact age $x$ | ${}_{1}q_{x}$ | $l_{x}$ | ${}_{1}d_{x}$ | ${}_{1}L_{x}$ | $T_{x}$ | $e_{x}$ |
|---|---|---|---|---|---|---|
| 0 | .00732 | 100,000 | 732 | 99,370 | 7,611,825 | 76.1 |
| 1 | .00054 | 99,268 | 53 | 99,240 | 7,512,455 | 75.7 |
| 2 | .00040 | 99,215 | 40 | 99,193 | 7,413,215 | 74.7 |
| 3 | .00031 | 99,175 | 31 | 99,158 | 7,314,022 | 73.7 |
| 4 | .00026 | 99,144 | 26 | 99,130 | 7,214,864 | 72.8 |
| 5 | .00023 | 99,118 | 23 | 99,106 | 7,115,734 | 71.8 |
| 6 | .00021 | 99,095 | 21 | 99,084 | 7,016,628 | 70.8 |
| 7 | .00020 | 99,074 | 19 | 99,064 | 6,917,544 | 69.8 |
| 8 | .00018 | 99,055 | 18 | 99,046 | 6,818,480 | 68.8 |
| 9 | .00016 | 99,037 | 15 | 99,029 | 6,719,434 | 67.8 |
| 10 | .00014 | 99,022 | 14 | 99,014 | 6,620,405 | 66.9 |
| ... | ... | ... | ... | ... | ... | ... |
| 80 | .05967 | 49,276 | 2,940 | 47,668 | 411,547 | 8.4 |
| 81 | .06566 | 46,336 | 3,043 | 44,676 | 363,879 | 7.9 |
| 82 | .07250 | 43,293 | 3,139 | 41,586 | 319,203 | 7.4 |
| 83 | .08033 | 40,154 | 3,225 | 38,403 | 277,617 | 6.9 |
| 84 | .08936 | 36,929 | 3,300 | 35,141 | 239,214 | 6.5 |
| 85 | (1.00000) | 33,629 | 33,629 | 204,073 | 204,073 | 6.1 |

Note: \*This is technically an "interpolated life table" and not a complete life table based on single-year data.

Source: This life table is available on the web at http://www.cdc.gov/nchswww/datawh/statab/unpubd/mortabs/lewk2.htm

The most fundamental data that underlie the formation of a period life table are the number of deaths attributed to each age group in the population for a particular calendar year (${}_{n}D_{x}$), where $x$ refers to the exact age at the beginning of the age interval and $n$ is the width of that interval, and the number of individuals living at the midpoint of that year for each of those same age groups (${}_{n}P_{x}$).

To begin the life table's construction, we take the ratio of these two sets of input data, ${}_{n}D_{x}$ and ${}_{n}P_{x}$, to form a series of age-specific death rates, or ${}_{n}M_{x}$:

$$
{}_{n}M_{x} = \frac{{}_{n}D_{x}}{{}_{n}P_{x}}
$$

Table 3. Abridged Life Table for the United States, 1996

| Exact age $x$ | ${}_{n}D_{x}$ | ${}_{n}P_{x}$ (in 1,000) | ${}_{n}q_{x}$ | $l_{x}$ | ${}_{n}d_{x}$ | ${}_{n}L_{x}$ | $T_{x}$ | $e_{x}$ |
|---|---|---|---|---|---|---|---|---|
| 0 | 28,487 | 3,769 | .00732 | 100,000 | 732 | 99,370 | 7,611,825 | 76.1 |
| 1 | 5,948 | 16,516 | .00151 | 99,268 | 150 | 396,721 | 7,512,455 | 75.7 |
| 5 | 3,780 | 19,441 | .00097 | 99,118 | 96 | 495,329 | 7,115,734 | 71.8 |
| 10 | 4,550 | 18,981 | .00118 | 99,022 | 117 | 494,883 | 6,620,405 | 66.9 |
| 15 | 14,663 | 18,662 | .00390 | 98,905 | 386 | 493,650 | 6,125,522 | 61.9 |
| 20 | 17,780 | 17,560 | .00506 | 98,519 | 499 | 491,372 | 5,631,872 | 57.2 |
| 25 | 20,730 | 19,007 | .00544 | 98,020 | 533 | 488,766 | 5,140,500 | 52.4 |
| 30 | 30,417 | 21,361 | .00710 | 97,487 | 692 | 485,746 | 4,651,734 | 47.7 |
| 35 | 42,499 | 22,577 | .00944 | 96,795 | 914 | 481,820 | 4,165,988 | 43.0 |
| 40 | 53,534 | 20,816 | .01283 | 95,881 | 1,230 | 476,549 | 3,684,168 | 38.4 |
| 45 | 67,032 | 18,436 | .01801 | 94,651 | 1,705 | 469,305 | 3,207,619 | 33.9 |
| 50 | 77,297 | 13,934 | .02733 | 92,946 | 2,540 | 458,779 | 2,738,314 | 29.5 |
| 55 | 96,726 | 11,362 | .04177 | 90,406 | 3,776 | 443,132 | 2,279,535 | 25.2 |
| 60 | 136,999 | 9,999 | .06649 | 86,630 | 5,760 | 419,530 | 1,836,403 | 21.2 |
| 65 | 200,045 | 9,892 | .09663 | 80,870 | 7,814 | 385,659 | 1,416,873 | 17.5 |
| 70 | 273,849 | 8,778 | .14556 | 73,056 | 10,634 | 339,620 | 1,031,214 | 14.1 |
| 75 | 321,223 | 6,873 | .21060 | 62,422 | 13,146 | 280,047 | 691,594 | 11.1 |
| 80 | 342,067 | 4,557 | .31754 | 49,276 | 15,647 | 207,474 | 411,547 | 8.4 |
| 85 | 576,541 | 3,762 | 1.00000 | 33,629 | 33,629 | 204,073 | 204,073 | 6.1 |

Source: ${}_{n}D_{x}$ and ${}_{n}P_{x}$ values are obtained from Peters, Kochanek, and Murphy 1998, and from the web site, http://www.cdc.gov/nchswww/datawh/statab/unpubd/mortabs/pop6096.htm, respectively.

For each death rate, we compute the corresponding probability of dying within that age interval, given that one has survived to the beginning of the interval. This value, denoted by ${}_{n}q_{x}$, is computed using the following equation:

$$
{}_{n}q_{x} = \frac{n \cdot {}_{n}M_{x}}{1 + (n - {}_{n}a_{x})\,{}_{n}M_{x}}
$$

where ${}_{n}a_{x}$ is the average number of years lived by those who die within the age interval $x$ to $x+n$. (Except for the first year of life, it is typically assumed that deaths are uniformly distributed within an age interval, implying that ${}_{n}a_{x} = n/2$.) Given the values of $q$ and $a$, we are able to generate the entire life table.

The life table may be thought of as a tracking device, by which a cohort of individuals is followed from the moment of their birth until the last surviving individual dies. Under this interpretation, the various remaining columns are defined in the following manner: $l_{x}$ equals the number of individuals in the life table surviving to exact age $x$. We arbitrarily set the number "born into" the life table, $l_{0}$, which is otherwise known as the *radix*, to some value, most often 100,000. We generate all subsequent $l_{x}$ values by the following equation:

$$
l_{x+n} = l_{x}\,(1 - {}_{n}q_{x})
$$

${}_{n}d_{x}$ equals the number of deaths experienced by the life table cohort within the age interval $x$ to $x+n$. It is the product of the number of individuals alive at exact age $x$ and the conditional probability of dying within the age interval:

$$
{}_{n}d_{x} = l_{x} \cdot {}_{n}q_{x}
$$

The concept of "person-years" is critical to understanding life table construction. Each individual who survives from one birthday to the next contributes one additional person-year to those tallied by the cohort to which that person belongs. In the year in which the individual dies, the decedent contributes some fraction of a person-year to the overall number for that cohort.

${}_{n}L_{x}$ equals the total number of person-years experienced by a cohort in the age interval $x$ to $x+n$. It is the sum of person-years contributed by those who have survived to the end of the interval and those contributed by individuals who die within that interval:

$$
{}_{n}L_{x} = n \cdot l_{x+n} + {}_{n}a_{x} \cdot {}_{n}d_{x}
$$

$T_{x}$ equals the number of person-years lived beyond exact age $x$:

$$
T_{x} = \sum_{a=x}^{\infty} {}_{n}L_{a}
$$

$e_{x}$ equals the expected number of years of life remaining for an individual who has already survived to exact age $x$. It is the total number of person-years experienced by the cohort above that age divided by the number of individuals starting out at that age:

$$
e_{x} = \frac{T_{x}}{l_{x}}
$$

The ${}_{n}L_{x}$ and $T_{x}$ columns are generated from the oldest age to the youngest. If the last age category is, for example, eighty-five and above (it is typically "open-ended" in this way), we must have an initial value for $T_{85}$ in order to begin the process. This value is derived in the following fashion: Since for this oldest age group, $l_{85} = {}_{\infty}d_{85}$ (due to the fact that the number of individuals in a cohort who will die at age eighty-five or beyond is simply the number surviving to age eighty-five) and $T_{85} = {}_{\infty}L_{85}$, we have:

$$
{}_{\infty}M_{85} = \frac{{}_{\infty}d_{85}}{{}_{\infty}L_{85}} = \frac{l_{85}}{T_{85}}
\qquad\text{and therefore}\qquad
T_{85} = \frac{l_{85}}{{}_{\infty}M_{85}}
$$

From the life table, we can obtain mortality information in a variety of ways. In table 2, we see, for example, that the expectation of life at birth, $e_{0}$, is 76.1 years. If an individual in this population survives to age eighty, then he or she might expect to live 8.4 years longer. We might also note that the probability of surviving from birth to one's tenth birthday is $l_{10}/l_{0}$, or 0.99022. Given that one has already lived eighty years, the probability that one survives five additional years is $l_{85}/l_{80}$, or 33,629/49,276 = 0.68246.
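The column-by-column construction can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure used for the published tables: the function name `life_table` is hypothetical, the probabilities of dying are the rounded values printed in the abridged table, deaths are assumed uniform within every interval except the first (where `a0 = 0.14` is an assumed value), and the open-ended interval is closed with an assumed death rate of 0.1648, the ratio of the printed survivors at age 85 to the printed person-years above age 85. The output should therefore land close to, but not exactly on, the published columns.

```python
def life_table(ages, q, open_rate, a0=0.14, radix=100_000.0):
    """Build life table columns from probabilities of dying.

    ages      : exact ages that start each interval; the last interval is open-ended
    q         : probability of dying in each closed interval (len(ages) - 1 values)
    open_rate : death rate in the open-ended interval (assumed), so that L = l / M
    a0        : average years lived in the first interval by those dying in it (assumed)
    Returns a list of (x, l_x, d_x, L_x, T_x, e_x) tuples.
    """
    k = len(ages)
    # Survivors to each exact age: l_{x+n} = l_x * (1 - q).
    l = [radix]
    for qx in q:
        l.append(l[-1] * (1.0 - qx))
    # Deaths in each interval; everyone alive at the last age dies in the open interval.
    d = [l[i] - l[i + 1] for i in range(k - 1)] + [l[-1]]
    # Person-years: survivors contribute n years each, decedents a years each.
    L = []
    for i in range(k - 1):
        n = ages[i + 1] - ages[i]
        a = a0 if i == 0 else n / 2.0
        L.append(n * l[i + 1] + a * d[i])
    L.append(l[-1] / open_rate)
    # Person-years above each age, accumulated from the oldest age downward.
    T = [0.0] * k
    running = 0.0
    for i in reversed(range(k)):
        running += L[i]
        T[i] = running
    e = [T[i] / l[i] for i in range(k)]
    return list(zip(ages, l, d, L, T, e))


ages = [0, 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
q = [.00732, .00151, .00097, .00118, .00390, .00506, .00544, .00710, .00944,
     .01283, .01801, .02733, .04177, .06649, .09663, .14556, .21060, .31754]

table = life_table(ages, q, open_rate=0.1648)
for x, lx, dx, Lx, Tx, ex in table:
    print(f"{x:>3} {lx:>9.0f} {dx:>8.0f} {Lx:>9.0f} {Tx:>10.0f} {ex:>5.1f}")
```

Working backward for the person-years above each age mirrors the order described in the text: the open-ended interval supplies the starting value and each younger interval adds its own person-years.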

## POPULATION PROJECTION

The life table, in addition, is often used to project either total population size or the size of specific age groups. In so doing, we must invoke a different interpretation of the ${}_{n}L_{x}$'s and the $T_{x}$'s in the life table. We treat them as representing the age distribution of a *stationary population*, that is, a population having long been subject to zero growth. Thus, ${}_{5}L_{20}$, for example, represents the number of twenty- to twenty-four-year-olds in the life table "population," into which $l_{0}$, or 100,000, individuals are born each year. (One will note by summing the ${}_{n}d_{x}$ column that 100,000 die every year, thus giving rise to stationarity of the life table population.)

If we were to assume that the United States is a *closed population*—that is, a population whose net migration is zero—and, furthermore, that the mortality levels obtaining in 1996 were to remain constant for the following ten years, then we would be able to project the size of any U.S. cohort up to ten years into the future. Thus, if we wished to know the number of fifty- to fifty-four-year-olds in 2006, we would take advantage of the following relation that is assumed to hold approximately:

$$
{}_{n}P_{x+t}^{\tau+t} = {}_{n}P_{x}^{\tau} \cdot \frac{{}_{n}L_{x+t}}{{}_{n}L_{x}}
$$

where $\tau$ is the base year of the projection (e.g., 1996) and $t$ is the number of years one is projecting the population forward. This equation implies that the number of fifty- to fifty-four-year-olds in 2006, ${}_{5}P_{50}^{2006}$, is simply the number of forty- to forty-four-year-olds ten years earlier, ${}_{5}P_{40}^{1996}$, multiplied by the proportion of forty- to forty-four-year-olds in the life table surviving ten years, ${}_{5}L_{50}/{}_{5}L_{40}$.
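As a worked illustration, the projection of fifty- to fifty-four-year-olds can be computed from the abridged life table values. The sketch below assumes a closed population and constant 1996 mortality, as in the text; the function name `project_cohort` is hypothetical.

```python
def project_cohort(pop_base, person_years_from, person_years_to):
    """Survive an age group forward using a ratio of life table person-years."""
    return pop_base * (person_years_to / person_years_from)


pop_40_44_in_1996 = 20_816   # thousands, from the abridged life table (nPx column)
L_40 = 476_549               # person-years lived between ages 40 and 45
L_50 = 458_779               # person-years lived between ages 50 and 55

pop_50_54_in_2006 = project_cohort(pop_40_44_in_1996, L_40, L_50)
print(round(pop_50_54_in_2006))  # roughly 20,040 (thousands)
```

The survival ratio here is about 0.963, so roughly 96 percent of the forty- to forty-four-year-olds are projected to be alive ten years later.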

In practice, it is appropriate to use the above relation in population projection only if the width of the age interval under consideration, $n$, is sufficiently narrow. If the age interval is very broad (for example, in the extreme case in which we are attempting to project the number of people aged ten and above in 2006 from the number aged zero and above, i.e., the entire population, in 1996), we cannot be assured that the life table age distribution within that interval resembles closely enough the age distribution of the actual population. In other words, if the actual population's age distribution within a broad age interval is significantly different from that within the corresponding interval of the life table population, then by using this projection device we are implicitly weighting the component parts of the broad interval improperly with respect to survival probabilities.

Parenthetically, if we desired to determine the size of any component of the population under *t* years old—in this particular example, ten years old—we would have to draw upon fertility as well as mortality information, because at time τ these individuals had not yet been born.

## HAZARDS MODELS

Suppose we were to examine the correlates of marital dissolution. In a life table analysis, the break-up of the marriage (as measured, e.g., by separation or divorce) would serve as the analogue to death, which is the means of exit in the standard life table analysis.

In the study of many duration-dependent phenomena, it is clear that several factors may affect whether an individual exits from a life table. Certainly, it is well established that a large number of socioeconomic variables simultaneously impinge on the marital dissolution process. In many populations, having given birth premaritally, having cohabited premaritally, having married at a young age, and having had little formal education, among a whole host of other factors, have been found to be strongly associated with marital instability. In such studies, in which one attempts to disentangle the intricately related influences of several variables on survivorship in a given state, we invoke a hazards model approach. Such an approach may be thought of as a multivariate statistical extension of the simple life table analysis presented above (for theoretical underpinnings, see, e.g., Cox and Oakes 1984 and Allison 1984; for applications to marital stability, see, e.g., Menken, Trussell, Stempel, and Babakol 1981 and Bennett, Blanc, and Bloom 1988).

In the marital dissolution example, we would assume that there is a hazard, or risk, of dissolution at each marital duration, $d$, and we allow this duration-specific risk to depend on individual characteristics (such as age at marriage, education, etc.). In the *proportional hazards model*, a set of individual characteristics represented by a vector of covariates shifts the hazard by the same proportional amount at all durations. Thus, for an individual $i$ at duration $d$, with an observed set of characteristics represented by a vector of covariates, $\mathbf{Z}_{i}$, the hazard function, $\mu_{i}(d)$, is given by:

$$
\mu_{i}(d) = \exp[\lambda(d)]\,\exp[\mathbf{Z}_{i}\boldsymbol{\beta}]
$$

where $\boldsymbol{\beta}$ is a vector of parameters and $\lambda(d)$ is the underlying duration pattern of risk. In this model, then, the underlying risk of dissolution for an individual $i$ with characteristics $\mathbf{Z}_{i}$ is multiplied by a factor equal to $\exp[\mathbf{Z}_{i}\boldsymbol{\beta}]$.

We may also implement a more general set of models to test for departures from some of the restrictive assumptions built into the proportional hazards framework. More specifically, we allow for time-varying covariates (for instance, in this example, the occurrence of a first marital birth) and also allow the effects of individual characteristics to vary with duration of first marriage. This model may be written as:

$$
\mu_{i}(d) = \exp[\lambda(d)]\,\exp[\mathbf{Z}_{i}(d)\,\boldsymbol{\beta}(d)]
$$

where $\lambda(d)$ is defined as in the proportional hazards model, $\mathbf{Z}_{i}(d)$ is the vector of covariates, some of which may be time-varying, and $\boldsymbol{\beta}(d)$ represents a vector of parameters, some of which may give rise to nonproportional effects. The model parameters can be estimated using the method of maximum likelihood. The estimation procedure assumes that the hazard, $\mu_{i}(d)$, is constant within duration intervals. The interval width chosen by the analyst, of course, should be supported on both substantive and statistical grounds.
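To make the proportionality idea concrete, the following Python sketch evaluates a hazard that is constant within duration intervals and scaled by the exponential of a covariate index. Everything numeric here is hypothetical (the baseline risks, interval widths, covariates, and coefficients are invented for illustration and are not estimates from any study), and the function names are likewise assumptions.

```python
import math


def hazard(baseline, covariates, coefficients):
    """Underlying risk at one duration, scaled by exp of the covariate index."""
    index = sum(z * b for z, b in zip(covariates, coefficients))
    return baseline * math.exp(index)


def survival_curve(baselines, widths, covariates, coefficients):
    """Probability of remaining in the state at the end of each duration interval,
    assuming the hazard is constant within each interval."""
    cumulative = 0.0
    curve = []
    for base, width in zip(baselines, widths):
        cumulative += hazard(base, covariates, coefficients) * width
        curve.append(math.exp(-cumulative))
    return curve


# Hypothetical annual baseline risks for marital durations 0-2, 2-5, and 5-10 years.
baselines = [0.02, 0.03, 0.025]
widths = [2, 3, 5]
# Hypothetical covariates: (married young: 1/0, years of schooling) and coefficients.
coefficients = [0.4, -0.05]
person_a = [1, 12]
person_b = [0, 12]

ratios = [hazard(b, person_a, coefficients) / hazard(b, person_b, coefficients)
          for b in baselines]
curve_a = survival_curve(baselines, widths, person_a, coefficients)
curve_b = survival_curve(baselines, widths, person_b, coefficients)
print(ratios)
print(curve_a, curve_b)
```

The hazard ratio between the two hypothetical individuals is the same in every interval, which is what "proportional" means; letting the coefficients differ by interval would produce the nonproportional effects described above.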

## INDIRECT DEMOGRAPHIC ESTIMATION

Unfortunately, many countries around the world have poor or nonexistent data pertaining to a wide array of demographic variables. In the industrialized nations, we typically have access to data from rigorous registration systems that record mortality, marriage, fertility, and other demographic processes. However, when analyzing the demographic situation of less developed nations, we are often confronted with a paucity of data on these fundamental processes. When such data are in fact collected, they are often inadequate enough to be seriously misleading. For example, in some countries we have learned that as few as half of all actual deaths are recorded. If we mistakenly take the registered number to be the actual number, then we will substantially overestimate life expectancy in these populations. In essence, we will incorrectly infer that people are dying at a slower rate than is truly the case.

**The Stable Population Model.** Much demographic estimation has relied on the notion of stability. A *stable population* is defined as one that is established by a long history of unchanging fertility and mortality patterns. This criterion gives rise to a fixed proportionate age distribution, constant birth and death rates, and a constant rate of population growth (see, e.g., Coale 1972). The basic stable population equation is:

$$
c(a) = b\,e^{-ra}\,p(a)
$$

where $c(a)$ is the proportion of the population at exact age $a$, $b$ is the crude birth rate, $r$ is the rate of population growth, and $p(a)$ is the proportion of the population surviving to exact age $a$. Various mathematical relationships have been shown to obtain among the demographic variables in a stable population. This becomes clear when we multiply both sides of the equation by the total population size. Thus, we have:

$$
N(a) = B\,e^{-ra}\,p(a)
$$

where $N(a)$ is the number of individuals in the population at exact age $a$ and $B$ is the current annual number of births. We can see that the number of people aged $a$ this year is simply the product of two quantities: the number of births entering the population $a$ years ago (namely, the current number of births times a growth rate factor, which discounts the births according to the constant population growth rate, $r$, which also applies to the growth of the number of births over time) and the proportion of a birth cohort that survives to be aged $a$ today. Note that the constancy over time of the mortality schedule, $p(a)$, and the growth rate, $r$, are crucial to the validity of this interpretation.
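A discrete version of the stable age distribution can be sketched in Python. The sketch assumes that the person-years column of the abridged 1996 United States life table stands in for survivorship within each age group, that each group can be represented by its midpoint, and that age 90 is a reasonable representative age for the open-ended group; the function name and these simplifications are illustrative assumptions.

```python
import math

ages = [0, 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
# Person-years lived in each age group (nLx) from the abridged life table.
person_years = [99370, 396721, 495329, 494883, 493650, 491372, 488766, 485746,
                481820, 476549, 469305, 458779, 443132, 419530, 385659, 339620,
                280047, 207474, 204073]
RADIX = 100_000


def stable_age_distribution(r, open_mid=90.0):
    """Return (birth rate, age-group shares) for growth rate r.

    Each group's share is proportional to exp(-r * midpoint) * nLx / radix,
    and the birth rate is the constant that makes the shares sum to one.
    """
    mids = [(ages[i] + ages[i + 1]) / 2 for i in range(len(ages) - 1)] + [open_mid]
    weights = [math.exp(-r * m) * L / RADIX for m, L in zip(mids, person_years)]
    birth_rate = 1.0 / sum(weights)
    shares = [birth_rate * w for w in weights]
    return birth_rate, shares


b_stationary, shares_stationary = stable_age_distribution(0.0)
b_growing, shares_growing = stable_age_distribution(0.02)
print(f"r = 0.00: birth rate {b_stationary:.4f}, share under 15 {sum(shares_stationary[:4]):.3f}")
print(f"r = 0.02: birth rate {b_growing:.4f}, share under 15 {sum(shares_growing[:4]):.3f}")
```

With zero growth the birth rate reduces to the reciprocal of life expectancy at birth, and a positive growth rate tilts the distribution toward the young, which is the pattern seen in broad-based age pyramids.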

When we assume a population is stable, we are imposing structure upon the demographic relationships existing therein. In a country where data are inadequate, indirect methods allow us—by drawing upon the known structure implied by stability—to piece together sometimes inaccurate information and ultimately derive sensible estimates of the population parameters. The essential strategy in indirect demographic estimation is to infer a value or set of values for a variable whose elements are either unobserved or inaccurate from the relationship among the remaining variables in the above equation (or an equation deriving from the one above). We find that these techniques are robust with respect to moderate departures from stability, as in the case of quasi-stable populations, in which only fertility has been constant and mortality has been gradually changing.

**The Nonstable Population Model.** Throughout much of the time span during which indirect estimation has evolved, there have been many countries where populations approximated stability. In recent decades, however, many countries have experienced rapidly declining mortality or declining or fluctuating fertility and, thus, have undergone a radical departure from stability. Consequently, previously successful indirect methods, grounded in stable population theory, are, with greater frequency, ill-suited to the task for which they were devised. As is often the case, necessity is the mother of invention and so demographers have sought to adapt their methodology to the changing world.

In the early 1980s, a methodology was developed that can be applied to populations that are far from stable (see, e.g., Bennett and Horiuchi 1981; and Preston and Coale 1982). Indeed, it is no longer necessary to invoke the assumption of stability, if we rely upon the following equation:

$$
c(a) = b\,\exp\left[-\int_{0}^{a} r(x)\,dx\right] p(a)
$$

where $r(x)$ is the growth rate of the population at exact age $x$. This equation holds true for any closed population, and, indeed, can be modified to accommodate populations open to migration.

The implied relationships among the age distribution of living persons and deaths, and rates of growth of different age groups, provide the basis for a wide range of indirect demographic methods that allow us to infer accurate estimates of basic demographic parameters that ultimately can be used to better inform policy on a variety of issues. Two examples are as follows.

First, suppose we have the age distribution for a country at each of two points in time, in addition to the age distribution of deaths occurring during the intervening years. We may then estimate the completeness of death registration in that population using the following equation (Bennett and Horiuchi 1981):

$$
\hat{N}(a) = \int_{a}^{\infty} D(x)\,\exp\left[\int_{a}^{x} r(u)\,du\right] dx
$$

where $\hat{N}(a)$ is the estimated number of people at exact age $a$, $D(x)$ is the number of deaths at exact age $x$, and $r(u)$ is the rate of growth of the number of persons at exact age $u$ between the two time points. By taking the ratio of the estimated number of persons to the enumerated population, we have an estimate of the completeness of death registration in the population relative to the completeness of the enumerated population. This relative completeness (in contrast to an "absolute" estimate of completeness) is all that is needed to determine the true, unobserved age-specific death rates, which in turn allows one to construct an unbiased life table.
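The logic of this estimate can be checked numerically on synthetic data. The Python sketch below is not the published procedure: it builds a hypothetical population with a constant growth rate and a constant death rate at every age (so the true numbers are known in closed form), approximates the integral of deaths inflated by cumulated age-specific growth with a midpoint rule, and truncates the integral at age 400. All names and parameter values are assumptions chosen for illustration.

```python
import math


def estimate_population_at_age(a, deaths, growth, upper=400.0, step=0.01):
    """Midpoint-rule approximation of the number of persons at exact age a,
    computed as the integral over x >= a of deaths(x) * exp(integral of growth from a to x)."""
    n_steps = int(round((upper - a) / step))
    total = 0.0
    cum_growth = 0.0  # integral of growth(u) from a to the left edge of the current slice
    for k in range(n_steps):
        mid = a + (k + 0.5) * step
        r_mid = growth(mid)
        growth_to_mid = cum_growth + r_mid * step / 2.0
        total += deaths(mid) * math.exp(growth_to_mid) * step
        cum_growth += r_mid * step
    return total


# Hypothetical population: growth rate R and death rate M constant at all ages.
R, M, BIRTHS = 0.02, 0.05, 1000.0


def true_population(x):
    return BIRTHS * math.exp(-(R + M) * x)


def registered_deaths(x, completeness=1.0):
    return completeness * M * true_population(x)


def growth(u):
    return R


full = estimate_population_at_age(30.0, registered_deaths, growth)
partial = estimate_population_at_age(30.0, lambda x: registered_deaths(x, 0.8), growth)
ratio_full = full / true_population(30.0)
ratio_partial = partial / true_population(30.0)
print(ratio_full, ratio_partial)
```

When every death is registered the estimate reproduces the true population, so the ratio is close to one; when only 80 percent of deaths are registered the ratio falls to about 0.8, which is how the ratio serves as a measure of relative completeness.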

A second example of the utility of the nonstable population framework is shown by the use of the following equation:

$$
N(x) = N(a) \cdot {}_{x-a}p_{a} \cdot \exp\left[-\int_{a}^{x} r(u)\,du\right]
$$

where $N(x)$ and $N(a)$ are the numbers of people at exact ages $x$ and $a$, respectively, ${}_{x-a}p_{a}$ is the probability of surviving from age $a$ to age $x$ according to period mortality rates, and $r(u)$ is the age-specific growth rate defined above. By using variants of this equation, we can generate reliable population age distributions (e.g., in situations in which censuses are of poor quality) from a trustworthy life table (Bennett and Garson 1983).

## MORTALITY MODELING

The field of demography has a long tradition of developing models that are based upon empirical regularities. Typically in demographic modeling, as in all kinds of modeling, we try to adhere to the principle of parsimony—that is, we want to be as efficient as possible with regard to the detail, and therefore the number of parameters, in a model.

Mortality schedules from around the world reveal that death rates follow a common pattern: relatively high rates of infant mortality, rates that decline through early childhood until they bottom out in the age range of five to fifteen or so, rates that then increase slowly through the young and middle adult years, and finally rates that rise more rapidly during the older adult ages beyond the forties or fifties. Various mortality models exploit this regular pattern in the data. Countries differ with respect to the overall level of mortality, as reflected in the expectation of life at birth, and the precise relationship that exists among the different age components of the mortality curve.

Coale and Demeny (1983) examined 192 mortality schedules from different times and regions of the world and found that they could be categorized into four "families" of life tables. Although overall mortality levels might differ, within each family the relationships among the various age components of mortality were shown to be similar. For each family, Coale and Demeny constructed a "model life table" for females that was associated with each of twenty-five expectations of life at birth from twenty through eighty years. A comparable set of tables was developed for males. In essence, a researcher can match bits of information that are known to be accurate in a population with the corresponding values in the model life tables, and ultimately derive a detailed life table for the population under study. In less developed countries, model life tables are often used to estimate basic mortality parameters, such as *e*₀ or the crude death rate, from other mortality indicators that may be more easily observable.

Other mortality models have been developed, the most notable being that by Brass (1971). Brass noted that one mortality schedule could be related to another by means of a linear transformation of the logits of their respective survivorship probabilities (i.e., the vector of *l*ₓ values, given a radix of one). Thus, one may generate a life table by applying the logit system to a "standard" or "reference" life table, given an appropriate pair of parameters that reflect (1) the overall level of mortality in the population under study, and (2) the relationship between child and adult mortality.
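The logit system can be sketched in a few lines. This illustration assumes the usual convention in which the logit is 0.5 · ln((1 − *l*ₓ)/*l*ₓ) and the two parameters, here called `alpha` (level) and `beta` (child-adult relationship), enter as an intercept and a slope. The function names and the "standard" values are invented for the example.

```python
import math


def brass_logit(l_x):
    """Logit of a survivorship value: 0.5 * ln((1 - l_x) / l_x)."""
    return 0.5 * math.log((1.0 - l_x) / l_x)


def inverse_brass_logit(y):
    """Recover a survivorship value from its logit."""
    return 1.0 / (1.0 + math.exp(2.0 * y))


def relational_life_table(standard_lx, alpha, beta):
    """Apply logit(l_x) = alpha + beta * logit(standard l_x) at each age.

    standard_lx must exclude age 0, where l_0 = 1 and the logit is undefined.
    """
    return [inverse_brass_logit(alpha + beta * brass_logit(l)) for l in standard_lx]


# Invented survivorship values at a few ages above zero, for illustration only.
standard = [0.90, 0.85, 0.80, 0.60, 0.30]
print(relational_life_table(standard, alpha=0.3, beta=1.1))
```

Setting `alpha` to 0 and `beta` to 1 reproduces the standard. Under this sign convention a larger `alpha` lowers survivorship at every age.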

## MARRIAGE, FERTILITY, AND MIGRATION MODELS

Coale (1971) observed that age distributions of first marriages are structurally similar in different populations. These distributions tend to be smooth, unimodal, and skewed to the right, and to have a density close to zero below age fifteen and above age fifty. He also noted that the differences in age-at-marriage distributions across female populations are largely accounted for by differences in their means, standard deviations, and cumulative values at the older ages, for example, at age fifty. As a basis for the application of these observations, Coale constructed a standard schedule of age at first marriage using data from Sweden, covering the period 1865 through 1869. The model that is applied to marriage data is represented by the following equation:

$$g(a) = \frac{E}{\sigma}\,1.2813\,\exp\left\{-1.145\left(\frac{a-\mu}{\sigma}+0.805\right)-\exp\left[-1.896\left(\frac{a-\mu}{\sigma}+0.805\right)\right]\right\}$$

where *g*(*a*) is the proportion marrying at age *a* in the observed population and *μ*, *σ*, and *E* are, respectively, the mean and the standard deviation of age at first marriage (for those who ever marry), and the proportion ever marrying.
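As a sketch, the schedule can be evaluated numerically. The code below assumes the standardized three-parameter form g(a) = (E/σ) · 1.2813 · exp{−1.145(z + 0.805) − exp[−1.896(z + 0.805)]} with z = (a − μ)/σ. The constants are those commonly quoted for this parameterization and are an assumption here, as are the function name and the example values.

```python
import math


def first_marriage_density(a, mu, sigma, e):
    """Proportion marrying at age a under the assumed standardized schedule.

    mu    -- mean age at first marriage among those who ever marry
    sigma -- standard deviation of age at first marriage
    e     -- proportion ever marrying
    """
    z = (a - mu) / sigma + 0.805
    return (e / sigma) * 1.2813 * math.exp(-1.145 * z - math.exp(-1.896 * z))


# Hypothetical parameter values, for illustration only.
mu, sigma, e = 24.0, 5.0, 0.9
step = 0.05
ages = [i * step for i in range(2001)]  # ages 0 through 100
cumulative = sum(first_marriage_density(a, mu, sigma, e) for a in ages) * step
print(round(cumulative, 3))  # should be close to e, the proportion ever marrying
```

Because the density has a closed form, it can be evaluated at any age for any choice of the three parameters, including for cohorts only partway through their marriage experience.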

The model can be extended to allow for covariate effects by stipulating a functional relationship between the parameters of the model distribution and a set of covariates. This may be specified as follows:

$$\mu_i = \mathbf{X}_i'\boldsymbol{\alpha}, \qquad \sigma_i = \mathbf{Y}_i'\boldsymbol{\beta}, \qquad E_i = \mathbf{Z}_i'\boldsymbol{\gamma}$$

where $\mathbf{X}_i$, $\mathbf{Y}_i$, and $\mathbf{Z}_i$ are the vector values of characteristics of an individual that determine, respectively, $\mu_i$, $\sigma_i$, and $E_i$, and $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$, and $\boldsymbol{\gamma}$ are the associated parameter vectors to be estimated.

Because the model is parametric, it can be applied to data referring to cohorts who have yet to complete their marriage experience. In this fashion, the model can be used for purposes of projection (see, e.g., Bloom and Bennett 1990). The model has also been found to replicate well the first birth experience of cohorts (see, e.g., Bloom 1982).

Coale and Trussell (1974), recognizing the empirical regularities that exist among age profiles of fertility across time and space and extending the work of Louis Henry, developed a set of model fertility schedules. Their model is based in part on a reference distribution of age-specific marital fertility rates that describes the pattern of fertility in a *natural fertility population*, that is, one that exhibits no sign of controlling the extent of childbearing activity. When fitted to an observed age pattern of fertility, the model's two parameters describe the overall level of fertility in the population and the degree to which fertility within marriage is controlled by some means of contraception. Perhaps the greatest use of this model has been in comparative analyses, which are facilitated by the two-parameter summary of any age pattern of fertility in question.
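A minimal sketch of a two-parameter schedule of this kind, assuming the multiplicative form r(a) = M · n(a) · exp(m · v(a)), where M (here `level`) scales the natural fertility pattern n(a) and m (here `control`) scales a typical pattern of departure v(a). The symbol names, the function name, and the reference values for ages 20-24 through 45-49 are assumptions. The values approximate those often quoted and should be checked against the published schedules before any real use.

```python
import math

AGE_GROUPS = ["20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]

# Illustrative reference values (assumed, not taken from the text).
NATURAL_FERTILITY = [0.460, 0.431, 0.395, 0.322, 0.167, 0.024]
CONTROL_PATTERN = [0.000, -0.279, -0.667, -1.042, -1.414, -1.671]


def model_marital_fertility(level, control, n=NATURAL_FERTILITY, v=CONTROL_PATTERN):
    """Age-specific marital fertility rates r(a) = level * n(a) * exp(control * v(a))."""
    return [level * n_a * math.exp(control * v_a) for n_a, v_a in zip(n, v)]


# Hypothetical parameter values: a controlled versus an uncontrolled population.
for label, m in (("no control", 0.0), ("strong control", 1.5)):
    rates = model_marital_fertility(level=0.9, control=m)
    print(label, [round(r, 3) for r in rates])
```

With `control` equal to zero the schedule is proportional to the natural fertility pattern. As `control` increases, rates at older ages fall progressively further below it.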

Although the application of indirect demographic estimation methods to migration analysis is not as mature as that to other demographic processes, strategies similar to those invoked by fertility and mortality researchers have been applied to the development of model migration schedules. Rogers and Castro (1981) found that similar age patterns of migration obtained among many different populations. They summarized these regularities in a basic eleven-parameter model and, using Brass and Coale logic, explored ways in which their model can be applied satisfactorily to data of imperfect quality.
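One way to picture an eleven-parameter schedule is as the sum of a declining childhood curve, a labor-force peak, a retirement peak, and a constant. The sketch below assumes that commonly cited functional form. The parameter names and every numerical value are hypothetical, chosen only to produce a plausible shape.

```python
import math


def model_migration_rate(x, p):
    """Migration rate at age x as the sum of four components (assumed form).

    p holds eleven parameters: a1, alpha1 (childhood); a2, alpha2, mu2,
    lambda2 (labor force); a3, alpha3, mu3, lambda3 (retirement); c (constant).
    """
    childhood = p["a1"] * math.exp(-p["alpha1"] * x)
    labor = p["a2"] * math.exp(
        -p["alpha2"] * (x - p["mu2"]) - math.exp(-p["lambda2"] * (x - p["mu2"]))
    )
    retirement = p["a3"] * math.exp(
        -p["alpha3"] * (x - p["mu3"]) - math.exp(-p["lambda3"] * (x - p["mu3"]))
    )
    return childhood + labor + retirement + p["c"]


# Hypothetical parameter values, for illustration only.
PARAMS = {
    "a1": 0.02, "alpha1": 0.10,
    "a2": 0.06, "alpha2": 0.10, "mu2": 20.0, "lambda2": 0.40,
    "a3": 0.001, "alpha3": 0.50, "mu3": 75.0, "lambda3": 0.20,
    "c": 0.003,
}

schedule = [model_migration_rate(age, PARAMS) for age in range(0, 91, 5)]
print([round(rate, 4) for rate in schedule])
```

Fitting such a schedule to observed rates reduces a full age profile to a handful of parameters, which is what allows comparison across populations and application to imperfect data.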

The methods described above comprise only a small component of the methodological tools available to demographers and to social scientists, in general. Some of these methods are more readily applicable than others to fields outside of demography. It is clear, for example, how we may take advantage of the concept of standardization in a variety of disciplines. So, too, may we apply life table analysis and nonstable population analysis to problems outside the demographic domain. Any analogue to birth and death processes can be investigated productively using these central methods. Even the fundamental concept underlying the above mortality, fertility, marriage, and migration models—that is, exploiting the power to be found in empirical regularities—can be applied fruitfully to other research endeavors.

## REFERENCES

Allison, Paul D. 1984 *Event History Analysis*. Beverly Hills, Calif.: Sage Publications.

Bennett, Neil G., Ann K. Blanc, and David E. Bloom 1988 "Commitment and the Modern Union: Assessing the Link between Premarital Cohabitation and Subsequent Marital Stability." *American Sociological Review* 53:127–138.

Bennett, Neil G., and Lea Keil Garson 1983 "The Centenarian Question and Old-Age Mortality in the Soviet Union, 1959–1970." *Demography* 20:587–606.

Bennett, Neil G., and Shiro Horiuchi 1981 "Estimating the Completeness of Death Registration in a Closed Population." *Population Index* 47:207–221.

Bloom, David E. 1982 "What's Happening to the Age at First Birth in the United States? A Study of Recent Cohorts." *Demography* 19:351–370.

Bloom, David E., and Neil G. Bennett 1990 "Modeling American Marriage Patterns." *Journal of the American Statistical Association* 85 (December):1009–1017.

Coale, Ansley J. 1971 "Age Patterns of Marriage." *Population Studies* 25:193–214.

Coale, Ansley J. 1972 *The Growth and Structure of Human Populations*. Princeton, N.J.: Princeton University Press.

Coale, Ansley J., and Paul Demeny 1983 *Regional Model Life Tables and Stable Populations*, 2nd ed. New York: Academic Press.

Coale, Ansley J., and James Trussell 1974 "Model Fertility Schedules: Variations in the Age Structure of Childbearing in Human Populations." *Population Index* 40:185–206.

Cox, D. R., and D. Oakes 1984 *Analysis of Survival Data*. London: Chapman and Hall.

Menken, Jane, James Trussell, Debra Stempel, and Ozer Babakol 1981 "Proportional Hazards Life Table Models: An Illustrative Analysis of Sociodemographic Influences on Marriage Dissolution in the United States." *Demography* 18:181–200.

Peters, Kimberley D., Kenneth D. Kochanek, and Sherry L. Murphy 1998 "Deaths: Final Data for 1996." *National Vital Statistics Reports*, vol. 47, no. 9. Hyattsville, Md.: National Center for Health Statistics.

Preston, Samuel H., and Ansley J. Coale 1982 "Age Structure, Growth, Attrition, and Accession: A New Synthesis." *Population Index* 48:217–259.

Rogers, Andrei, and Luis J. Castro 1981 "Model Migration Schedules." (Research Report 81–30) Laxenburg, Austria: International Institute for Applied Systems Analysis.

Neil G. Bennett