Stochastic Population Theory


Stochastic theory deals with random influences on populations and on the vital events experienced by their members. It builds on the deterministic mathematical theory of renewal processes and stable populations. Concentrating on structural and predictive models, it is distinct from statistical demography, which also deals with randomness but in the context of data analysis and inference under uncertainty. This entry treats macrodemographic processes of population growth and structure first, and microdemographic processes of individual experience second. Basic background for all these subjects is found in the classic textbooks by the demographer Nathan Keyfitz published in 1968 and 1985.

Random Rates and Random Draws

The population theorist Joel Cohen, in a 1987 encyclopedia article, drew a useful distinction between two sources of randomness, which he called environmental and demographic and which are also known as random rates and random draws. Random rate models assume that the schedules of fertility, mortality, and migration that govern population change are not fixed but themselves fluctuate in response to partly haphazard exogenous influences from climate, economic and political factors, resources, or disease. Models with random draws take the population-level schedules as fixed and concentrate on the chance outcomes for individuals, like drawing cards from a shuffled deck. The life table is a model for random draws. Its l_x column (with a radix of one) gives the probability that a randomly selected member of the population will "draw" an age at death older than x from the lottery of fate.
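The reading of the life table as a lottery can be made concrete with a minimal sketch. The function names and the three-age schedule of death probabilities used in the check are hypothetical, chosen only for illustration. The first function builds the survivorship column from age-specific death probabilities; the second draws one age at death from the same schedule.

```python
import random

def survivorship(qx):
    """Return l_x with a radix of one: the chance of surviving to exact age x.

    qx[i] is the (hypothetical) probability of dying between ages i and i + 1
    for someone alive at age i.
    """
    lx = [1.0]
    for q in qx:
        lx.append(lx[-1] * (1.0 - q))
    return lx

def draw_age_at_death(qx, rng):
    """Draw one completed age at death by stepping through the schedule."""
    for age, q in enumerate(qx):
        if rng.random() < q:
            return age
    # Survived every listed age; treat the end of the schedule as open-ended.
    return len(qx)
```

Over many draws, the share of individuals whose completed age at death is x or more should approach the corresponding entry of the survivorship column.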

Random rate models for population growth and age structure have been proved to share some of the best properties of deterministic models. In particular, population age pyramids tend to forget their past, in the sense that the distribution of the population by age tends over time to become independent of the initial age distribution. This property is called ergodicity. Following the work of Z. M. Sykes, which was expanded upon by Young Kim, powerful theorems were proved by Hervé LeBras in 1971 and 1974 and by Cohen in 1976 and 1977. For example, under certain reasonable conditions, means, variances, and other moments of the proportions in age groups become independent of the initial age distribution and the number of births per year comes to fit a lognormal distribution. Much of the general theory of population dynamics in variable environments can be brought to bear; Shripad Tuljapurkar's 1990 book Population Dynamics in Variable Environments is a good source.
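Ergodicity under random rates can be illustrated with a small simulation. This is a sketch under stated assumptions, not any author's published model: three age groups, fixed survival proportions, and fertility that is scaled up or down each year by a random "good year" or "bad year" factor, all with hypothetical values. Two populations with very different starting age distributions are exposed to the same sequence of random rates, and the gap between their age distributions is measured.

```python
import random

def project(pop, fert, surv):
    """Advance an age-grouped population one step (a Leslie-matrix projection)."""
    births = sum(f * n for f, n in zip(fert, pop))
    return [births] + [s * n for s, n in zip(surv, pop[:-1])]

def proportions(pop):
    total = sum(pop)
    return [n / total for n in pop]

def max_gap_after(steps, seed=1):
    """Largest difference in age-group proportions between two populations
    that start differently but experience the same random rates."""
    rng = random.Random(seed)
    young_heavy = [1000.0, 10.0, 10.0]
    old_heavy = [10.0, 10.0, 1000.0]
    surv = [0.9, 0.8]  # hypothetical survival proportions
    for _ in range(steps):
        scale = rng.choice([0.7, 1.3])  # hypothetical bad and good years
        fert = [0.2 * scale, 0.9 * scale, 0.6 * scale]
        young_heavy = project(young_heavy, fert, surv)
        old_heavy = project(old_heavy, fert, surv)
    return max(abs(a - b)
               for a, b in zip(proportions(young_heavy), proportions(old_heavy)))
```

Calling `max_gap_after(0)` returns the initial gap, while a call with a large number of steps should return a value close to zero: the two age pyramids keep fluctuating, but they fluctuate together, having forgotten where they started.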

Time series models from economic demography, like ARIMA (Autoregressive Integrated Moving Average) models, are examples of random rate models. The economic demographer Ronald Lee and his colleagues, from the 1970s onward, studied short-term fluctuations in births and deaths in European preindustrial populations. They discovered systematic patterns of lagged responses to prices and previous vital rates, shedding light on historical population regulation.

In the 1990s random rate models were applied to the practical problem of putting measures of uncertainty analogous to confidence intervals around population projections. In 1992 Lee and Larry Carter introduced a stochastic model for forecasting mortality from historical trends and fluctuations in the logarithms of age-specific death rates. For many developed countries, the index of the level of overall mortality turns out to be well modeled by a random walk with a constant country-specific drift. However, variability from age to age around the overall level remains poorly understood. Harking back to work of Keyfitz and Michael Stoto, Nico Keilman, Wolfgang Lutz and other demographers and statisticians have modeled historical patterns of errors in earlier forecasts and used them to generate uncertainty bounds for new forecasts, an approach surveyed in the National Research Council's 2000 volume "Beyond Six Billion," edited by John Bongaarts and Randy Bulatao.
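How a random walk with drift yields forecast bounds can be sketched as follows. The code covers only the mortality-index step, not the full Lee and Carter procedure, and the function names are hypothetical. It also ignores uncertainty in the estimated drift itself, so the resulting bounds are somewhat too narrow.

```python
import math

def fit_drift(index):
    """Estimate the drift and innovation standard deviation of a random walk
    from the first differences of an observed index series."""
    diffs = [b - a for a, b in zip(index, index[1:])]
    drift = sum(diffs) / len(diffs)
    var = sum((d - drift) ** 2 for d in diffs) / (len(diffs) - 1)
    return drift, math.sqrt(var)

def forecast_interval(last, drift, sigma, horizon, z=1.96):
    """Point forecast and approximate 95 percent bounds `horizon` steps ahead.

    The center moves linearly with the drift, while the half-width grows
    with the square root of the horizon.
    """
    center = last + drift * horizon
    half_width = z * sigma * math.sqrt(horizon)
    return center - half_width, center, center + half_width
```

The square-root growth of the bounds is the signature of accumulated independent shocks: quadrupling the forecast horizon doubles the width of the interval.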

Models for random draws, given fixed vital schedules, underlie much of demography. Branching processes were invented by I. J. Bienaymé in 1845 and rediscovered by Francis Galton and H. W. Watson around 1873. The number of progeny (males or females but not both) in each family in a population is assumed to be drawn independently from a given family-size distribution, and lines of descent form a random tree through succeeding generations. If the mean number of progeny is less than or equal to one, the probability of extinction is one. The randomness rules out eternally stationary populations.
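The extinction result can be checked numerically. In the standard theory, the probability that a line of descent eventually dies out is the smallest non-negative solution of q = f(q), where f is the probability generating function of the family-size distribution, and it can be approached by iterating f from zero. The sketch below assumes a family-size distribution given as a list of probabilities; the function name and the distributions used to try it out are hypothetical.

```python
def extinction_probability(offspring_probs, iterations=10_000):
    """Approximate the eventual extinction probability of a branching process.

    offspring_probs[k] is the probability of having exactly k progeny.
    After n iterations q equals the probability of extinction by generation n,
    which increases toward the eventual extinction probability.
    """
    q = 0.0
    for _ in range(iterations):
        q = sum(p * q ** k for k, p in enumerate(offspring_probs))
    return q
```

With a mean family size above one, for example probabilities 0.2, 0.3, and 0.5 for zero, one, and two progeny, the iteration settles at a value below one (0.4 in that case). With a mean of exactly one the iteration creeps toward one only slowly, reflecting the long survival times of critical lines.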

The general model for random population dynamics in use by demographers, with age-dependent branching structure, was given its full mathematical specification by the statistician David Kendall in 1949. With random draws, the assumed statistical independence from unit to unit makes the standard deviations in demographic observables (like the sizes of age groups or counts of births and deaths) tend to vary like the square root of population size, but with a constant of proportionality that can be predicted from the models. With random rates (random from time to time but uniform across members of a population), the standard deviations tend to vary like the population size itself, and the constant of proportionality is a free parameter. Kenneth Wachter has studied the relative strengths of the two kinds of randomness in historical populations. Mixed models with partial independence from place to place and group to group are now on the horizon, drawing on measurements of geographical heterogeneity and covariation featured, for instance, by LeBras.
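The contrast between the two scalings can be worked out for a simple case. Assume, purely for illustration, that an annual count of events in a population of size n is binomial given a rate shared by all members, and that the shared rate itself varies from year to year with a given mean and standard deviation. The law of total variance then splits the variance of the count into a part proportional to n and a part proportional to n squared. The function name and parameters are hypothetical.

```python
import math

def count_sd_components(n, mean_rate, rate_sd):
    """Split the standard deviation of an event count into two components.

    Assumes count | rate ~ Binomial(n, rate), with the rate random across years.
    Total variance = n * (m * (1 - m) - v) + n**2 * v, where m is the mean
    rate and v is the variance of the rate.
    """
    v = rate_sd ** 2
    draws_sd = math.sqrt(n * (mean_rate * (1.0 - mean_rate) - v))
    rates_sd = n * rate_sd
    return draws_sd, rates_sd
```

Quadrupling the population doubles the random-draws component but quadruples the random-rates component, so in large populations even modest year-to-year variation in rates comes to dominate.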

The rise of genomics has created new interest in stochastic models such as branching processes, which generate genealogical trees for individuals or genes in populations. With branching processes, total population size varies endogenously from generation to generation. Geneticists tend to prefer the models of Sewall Wright and R. A. Fisher in which total population size is constrained to be constant or to vary in an exogenously specified fashion. Samuel Karlin and Howard Taylor give full accounts in their 1981 textbook. In 1982 John Kingman developed a general theory of coalescence. Coalescent processes work backwards in time, starting, for instance, with living women and tracing their mothers, their mothers' mothers, and their maternal ancestors in each prior generation until all the lines coalesce in a single most recent common ancestress. A central result of this subject is a formula equating the mean number of generations back to coalescence with twice the (constant) "effective" population size. Data on differences in DNA sequences in present-day populations can be combined with these stochastic models to yield estimates, still controversial, of population sizes over hundreds of thousands of years or more. This work promises new opportunities for understanding the balance between random fluctuations and the dynamics of long-term population control.
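The coalescence formula can be explored with a minimal sketch that assumes a Wright-Fisher population of constant size in which each maternal line picks its mother uniformly at random from the previous generation. Under the coalescent approximation, the mean time back to the common ancestress of a sample of n lines is 2N(1 - 1/n) generations, which approaches twice the population size N as the sample grows. The function names are hypothetical.

```python
import random

def expected_tmrca(sample_size, pop_size):
    """Coalescent approximation to the mean generations back to the most
    recent common ancestress of `sample_size` maternal lines."""
    return 2.0 * pop_size * (1.0 - 1.0 / sample_size)

def simulate_tmrca(sample_size, pop_size, rng):
    """Trace maternal lines backward until they merge into a single line."""
    lineages = sample_size
    generations = 0
    while lineages > 1:
        # Each line picks a mother at random; lines sharing a mother merge.
        mothers = {rng.randrange(pop_size) for _ in range(lineages)}
        lineages = len(mothers)
        generations += 1
    return generations
```

Averaging `simulate_tmrca` over many replicates should give a value close to `expected_tmrca` when the population is large relative to the sample. Individual replicates vary widely, which is one reason estimates of ancient population sizes from such models remain uncertain.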

Microdemographic Processes

At the microdemographic level, elements of chance impinge on most life-course transitions for individuals, on social determinants and motivations, and on the basic biology of conception, childbirth, survival, and death. The attention of demographers has focused particularly on the sources and consequences of heterogeneity from person to person in probabilities of vital events.

Drawing on statistical renewal theory, the demographers Mindel Sheps and Jane Menken, in their classic 1973 study Mathematical Models of Conception and Birth, modeled a woman's interval between births as a sum of independent random waiting times with their own parametric probability distributions. An important feature is heterogeneity from woman to woman in fecundability, that is, in the probability of conception given full exposure to the risk of conception. James Wood and Maxine Weinstein applied refined models to the analysis of reproductive life history data. Inferring the strength of components of heterogeneity from observational data is difficult, and birth intervals provide one of the prime examples for non-parametric methods for estimating unobserved heterogeneity developed by James Heckman and Burton Singer in 1984.
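A stripped-down version of such a birth-interval model can be sketched as follows. It is an illustration under simplifying assumptions, not the Sheps and Menken specification: the interval is a fixed nonsusceptible period, plus a geometric waiting time to conception at monthly fecundability p, plus a fixed gestation, with no pregnancy loss. Heterogeneity is represented by drawing each woman's fecundability from a beta distribution. All names and numerical values are hypothetical.

```python
import random

def months_to_conception(p, rng):
    """Geometric waiting time: months of exposure up to and including the
    month of conception, at monthly fecundability p (assumed positive)."""
    months = 1
    while rng.random() >= p:
        months += 1
    return months

def birth_interval(p, rng, nonsusceptible=6, gestation=9):
    """One birth interval in months as a sum of waiting times."""
    return nonsusceptible + months_to_conception(p, rng) + gestation

def mean_interval(n_women, a, b, rng):
    """Mean interval when fecundability varies across women as Beta(a, b)."""
    total = 0
    for _ in range(n_women):
        p = rng.betavariate(a, b)
        total += birth_interval(p, rng)
    return total / n_women
```

With a beta distribution of mean 0.2 (for instance a = 3, b = 12), the average wait to conception is longer than the five months a homogeneous population with the same mean fecundability would experience, because women with low fecundability contribute disproportionately long waits.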

In microdemography, stochastic models are required when variances are important along with mean values. For instance, the proportion of older people without living children may be more important than the mean number of living children per older person across the population. For such purposes, estimates of mean numbers of kin from stable population theory, developed in 1974 by sociologists Leo Goodman, Keyfitz, and Thomas Pullum, need to be extended, generally through the use of demographic microsimulation. In microsimulation, a list of imaginary individuals is kept in the computer, and, time interval after time interval, the individuals are assigned events of marriage, childbirth, migration, death, and other transitions by comparing computer-generated pseudo-random numbers with user-specified schedules of demographic rates. An example is the SOCSIM program (short for "Social Structure Simulation") of Eugene Hammel and Kenneth Wachter. SOCSIM was first applied to estimate demographic constraints on preindustrial English households in collaboration with Peter Laslett, and later applied, as in Wachter's 1997 study, to forecasts of the kin of future seniors in the United States, England, and elsewhere.
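The logic of microsimulation can be conveyed by a toy example. It is in no way a representation of SOCSIM. The sketch follows each imaginary woman year by year from age 20 to age 70, assumes she herself survives, and uses flat, hypothetical annual probabilities of childbirth and of child death. Events are assigned by comparing pseudo-random numbers with those rates.

```python
import random

def living_children_at_70(rng, birth_prob=0.12, child_death_prob=0.01):
    """Follow one woman from age 20 to 70; return her number of living children."""
    child_ages = []  # ages of children who are still alive
    for age in range(20, 70):
        # Each living child survives the year unless the draw falls below the rate.
        child_ages = [a + 1 for a in child_ages if rng.random() >= child_death_prob]
        # A birth can occur in each year of the assumed reproductive span.
        if age < 40 and rng.random() < birth_prob:
            child_ages.append(0)
    return len(child_ages)

def summarize(n_women, rng):
    """Mean number of living children and share of women with none."""
    counts = [living_children_at_70(rng) for _ in range(n_women)]
    mean = sum(counts) / n_women
    childless = sum(1 for c in counts if c == 0) / n_women
    return mean, childless
```

Even though the mean number of living children in such a run is well above one, a substantial minority of simulated women reach age 70 with no living child, a quantity that stable population formulas for mean numbers of kin do not supply.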

Longevity and Frailty

Given that the life table is, in a sense, the oldest stochastic model in demography, it is not surprising that stochastic models for mortality by age, and specifically for heterogeneity in mortality, are prominent. From the 1980s onward, the Duke demographer Kenneth Manton and his colleagues developed multivariate stochastic models for health and survival transitions. These models are macrodemographic models inasmuch as they are driven by transition rates for population aggregates, but they are designed to make efficient use of individual-level data from longitudinal studies. They have become a prime tool for disentangling the roles of interacting covariates, including behaviors like smoking or physical conditions like blood pressure, in risks of death. Singer has pressed stochastic modeling into service for studies of effects of whole sequences of life-course experiences on health.

James Vaupel, Anatoli Yashin, Manton, Eric Stallard, and their colleagues developed, between 1979 and 1985, a model for hazard curves based on a concept of heterogeneous frailty. The hazard curve is a mathematically convenient representation of age-specific rates of mortality in terms of the downward slope of the logarithm of the proportion of a cohort surviving to a given age. In the frailty model, the shape of the hazard curve as a function of age is the same for everyone, but the level varies from person to person by a factor, fixed throughout each person's life, called the person's frailty. Frailties are often assumed to follow a Gamma probability distribution, while the common shape of the hazard curves at older ages is often taken to be an exponential function in line with the model of Benjamin Gompertz introduced in 1825. As a cohort ages, people with higher frailties die more quickly, leaving a set of survivors selected for lower frailties. The 2000 article "Mortality Modeling: A Review" by Yashin, Ivan Iachine and A. Begun is a good introduction.
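The selection effect in the frailty model has a closed form when frailty follows a gamma distribution with mean one and variance s2 and the baseline hazard is the Gompertz curve a·exp(bx). In that case the hazard observed for the cohort as a whole is the baseline hazard divided by 1 + s2·H(x), where H(x) is the cumulative baseline hazard. The sketch below evaluates these expressions; the function names and any parameter values used with them are hypothetical.

```python
import math

def baseline_hazard(x, a, b):
    """Gompertz hazard at age x for a person with frailty one."""
    return a * math.exp(b * x)

def cumulative_hazard(x, a, b):
    """Integral of the Gompertz baseline hazard from age 0 to age x."""
    return (a / b) * (math.exp(b * x) - 1.0)

def cohort_hazard(x, a, b, frailty_var):
    """Hazard observed among survivors at age x under gamma frailty
    with mean one and variance `frailty_var`."""
    return baseline_hazard(x, a, b) / (1.0 + frailty_var * cumulative_hazard(x, a, b))

def mean_frailty_of_survivors(x, a, b, frailty_var):
    """Average frailty among those still alive at age x."""
    return 1.0 / (1.0 + frailty_var * cumulative_hazard(x, a, b))
```

Because the mean frailty of survivors falls with age, the cohort hazard rises more slowly than any individual's hazard and levels off toward b divided by the frailty variance at extreme ages, even though every individual hazard continues to climb exponentially.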

The frailty model of Vaupel and his colleagues figures prominently in the biodemography of longevity, where human hazard curves are compared with hazard curves in other species including fruit flies, nematode worms, and yeast. Such heterogeneity may be a significant contributor to observed tapering in hazard curves at extreme ages found across species. Alternative stochastic models include models based on statistical reliability theory for complex engineered systems proposed by Leonid Gavrilov and Natalia Gavrilova, and Markov models for the evolution of hazard curves through step-by-step transitions suggested by genetic theory.

The study of fertility transitions, in the aspect in which it emphasizes social interaction and the diffusion of attitudes, information and innovations, relies extensively on a broader class of stochastic models. In 2001 Hans-Peter Kohler drew on random path-dependent models advanced in the 1980s by Brian Arthur and other economists to explain patterns of contraceptive choice. The amplifying effects of peer-group influences are a significant theme in accounts of very low fertility in developed societies.

Stochastic theory provides a unifying framework which ties together the many substantive areas of demography as a whole and links them with active research fronts in the other biological and social sciences.
