Simulation Models


The term "simulation" is used to refer to a wide range of quantitative analytic strategies in the population sciences. Defining the term is thus a necessary task, but it also turns out to be a conceptually useful exercise. In a broad sense, demographers are nearly always engaged in simulating something. From aggregate population projections and forecasts to behavioral models for family and household formation, demographers use mathematical models to represent (i.e., simulate) the population processes and outcomes found in the real world. Using this general definition, simulation rapidly becomes indistinguishable from quantitative analysis, and ultimately applied mathematics. At the other end of the scale, the definition could be restricted to an explicit representation of the population dynamics of a stochastic process (e.g., a birth, or a job loss) that operates at the level of the individual, is not analytically tractable, and thus requires numerical methods for cumulating up to population-level outcomes over time. This definition would limit the term to the kinds of models that have become known as "stochastic microsimulation" and "agent-based modeling," and it would exclude all but a handful of current demographic methods and applications.

In the middle lies a definition that borrows slightly from each: the basic concept of projection from the most general version, and the absence of analytical tractability from the more restricted version. Under this definition, simulation is distinguished by a focus on dynamic modeling, and the need to explicitly calculate each step in the entire path of events to get from a starting to an ending state of the population. It is a projection of the path, rather than a solution for an end state. This definition excludes the standard life table methods for stationary and stable population projection that form the core of most formal demography, as these can be solved analytically for their equilibrium outcomes. It also excludes the statistical analysis of survey data that forms the core of most social demography, as this is focused on estimating the underlying rates and parameters of these processes rather than projecting their population-level outcomes over time. In effect, this definition of simulation excludes both the traditional macro methods of demographic analysis as well as the traditional micro methods.

That does not sound very promising at first glance, but what is left over is actually quite important: the middle ground that links the micro to the macro. Simulation techniques enable the analyst to specify interesting models of individual behavior and to investigate how these patterns interact and aggregate into population-level outcomes over time. Simulation is the social scientist's equivalent of a laboratory. It borrows from the micro level both the attention to individual level processes and the parameters estimated for these processes, without losing sight of the aggregate dynamic outcomes. And it borrows from the macro level the focus on population dynamics, without the constraint that the underlying model be analytically solvable. As a result, simulation allows a much more nuanced, theory-driven approach to investigating some of the most intriguing questions in the population sciences.

States and Rates

All forms of simulation have two basic components in the model: states–a set of classes that define the individuals in the population, and rates–a set of rules that define the dynamics of moving from one state to the next. The states can be indexed by attributes that divide the population into homogeneous classes, like race, years of experience with an employer, or vaccination status. But they can also be indexed at the level of the individual if maximal heterogeneity is desired. In that case, each individual has a unique index. The states can be based on attributes that do not change (sex), that change deterministically (age), or that change probabilistically (first birth) and allow return (marital status).

The rates are dynamic rules that describe the conditions under which events occur, and the states at risk of these events. In the simplest form, these rules can be specified as homogeneous fixed rates, such as a single aggregate fertility, mortality, or unemployment rate. But because most fixed rate systems are analytically tractable, they are typically not analyzed using simulation. Simulation is used instead to analyze settings in which the rates vary in more or less complex ways. They can vary over time, using any parametric or non-parametric specification. They can vary by the state indices described above, much like component projection life tables. And they can vary endogenously as a function of the state of the system, as in models for population growth that build in the carrying capacity of the environment. Rates can be specified in either deterministic or stochastic form.
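The kinds of rate rules described above can be sketched as small functions. This is only an illustration: the function names, the numerical values, the age groups, and the logistic form chosen for the endogenous rate are all assumptions made for the example, not part of any published model.

```python
import math

def fixed_rate(t, state):
    """Homogeneous fixed rate: identical for everyone at all times."""
    return 0.02

def time_varying_rate(t, state):
    """Rate that changes over time (here a parametric exponential decline)."""
    return 0.02 * math.exp(-0.1 * t)

def state_indexed_rate(t, state):
    """Rate that varies by a state index (hypothetical age groups)."""
    by_age_group = {"young": 0.001, "middle": 0.005, "old": 0.04}
    return by_age_group[state["age_group"]]

def endogenous_rate(t, state):
    """Per-capita growth rate that falls as the population nears an assumed carrying capacity."""
    r_max, capacity = 0.03, 10_000
    return r_max * (1 - state["population"] / capacity)

def project(population, rate_fn, steps):
    """Step the population forward, recomputing the rate at every step."""
    path = [population]
    for t in range(steps):
        population += rate_fn(t, {"population": population}) * population
        path.append(population)
    return path

fixed_path = project(100.0, fixed_rate, 500)         # unbounded geometric growth
limited_path = project(100.0, endogenous_rate, 500)  # levels off near the capacity
print(round(fixed_path[-1]), round(limited_path[-1]))
```

The fixed-rate path has a closed-form solution and would not normally need simulation. The endogenous case is the kind of rule where stepping through the path becomes the natural approach once the model is made more elaborate.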

Macro- and Micro-simulation

The range of models that emerge from these possible variations in states and rates can be grouped into two broad categories: macro-simulation models and micro-simulation models. (Evert van Imhoff and Wendy Post (1998) provide a good discussion of the similarities and differences.) Macro-simulation models divide the population into a limited number of states, with deterministic transition rates between states. Such models typically translate into a set of ordinary differential equations (often nonlinear) that can be solved numerically through iterative updating. Micro-simulation models operate with the states indexed at the individual level, and events specified as a stochastic process. These translate into an algorithm that can be implemented as a computer-generated Monte Carlo simulation.

A macrosimulation model. To get a feel for the difference, consider a model for the spread of an infectious sexually transmitted disease. A macro-simulation model might look something like Figure 1.

In Figure 1, the three population states are S, the number of susceptible individuals; I, the number of infected individuals; and D, the cumulative number of deaths from infection. The two rates that govern transitions between the states are R_i, the rate of infection (morbidity), and R_d, the death rate due to infection (mortality). The infection rate is a function of the number of susceptible and infected individuals, and the "force of infection," denoted by β. A simple specification for the force of infection is given by the following:

$$\beta = \frac{c\,\tau}{S + I}$$

which represents the average contact rate c, and the probability of transmission given contact τ, per person (S + I). The death rate is a function of the number of persons infected and the death rate μ. The system of differential equations that describes this process is:

$$\frac{dS}{dt} = -R_i = -\beta S I, \qquad \frac{dI}{dt} = R_i - R_d = \beta S I - \mu I, \qquad \frac{dD}{dt} = R_d = \mu I$$

If the force of infection, β, is fixed, this system is analytically tractable, and one can solve for the usual quantities such as the reproduction rate, doubling time, and equilibrium prevalence. Norman Bailey's classic text (1975) and Roy Anderson's more recent volume (1982) are good references for these methods. But a slightly more realistic model would also include vital dynamics, at which point obtaining solutions for S(t), I(t), and D(t) will require numerical methods.

FIGURE 1

To implement this model, one needs to specify a set of input parameters for the rates, and initial conditions, such as the starting number in each state. The equations can then be iteratively solved. For each iteration, the value of the state variables will be updated sequentially by the amounts defined by the system of equations above. Because the rates are deterministically specified, the values predicted for S(t), I(t), and D(t) have no stochastic variability. If the simulation is run repeatedly with the same set of inputs, the results will be identical. Variation is only obtained by varying the inputs (e.g., the components of the rates), and typically a researcher will vary some of the input parameters in order to conduct sensitivity analyses. With complex models that have many inputs, the dependence of the outcomes on the direct and interactive effects of the inputs is often of interest, and becomes an analytic task in its own right. One approach is to use a systematic scheme such as Latin hypercube sampling to generate data on both inputs and outputs; analyzing these data can in turn use more traditional statistical methods to provide numerical summaries of the sensitivity.
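A minimal sketch of this iterative updating and of a Latin hypercube design is given below. It assumes the infection flow is βSI with β = cτ/(S + I) and the death flow is μI, where `tau` and `mu` are names assumed here for the transmission probability and the death rate. The simple Euler step, the parameter ranges, and the helper names `simulate` and `latin_hypercube` are likewise illustrative choices rather than a prescribed method.

```python
import random

def simulate(c, tau, mu, S0=999.0, I0=1.0, dt=0.05, t_end=100.0):
    """Euler updating of dS/dt = -beta*S*I, dI/dt = beta*S*I - mu*I, dD/dt = mu*I."""
    S, I, D = S0, I0, 0.0
    path = [(0.0, S, I, D)]
    for k in range(1, int(round(t_end / dt)) + 1):
        beta = c * tau / (S + I) if S + I > 0 else 0.0  # force of infection
        infections = beta * S * I * dt
        deaths = mu * I * dt
        S, I, D = S - infections, I + infections - deaths, D + deaths
        path.append((k * dt, S, I, D))
    return path

def latin_hypercube(ranges, n, rng):
    """Split each parameter range into n equal strata and sample each stratum once."""
    columns = {}
    for name, (low, high) in ranges.items():
        strata = list(range(n))
        rng.shuffle(strata)  # pair strata at random across parameters
        width = (high - low) / n
        columns[name] = [low + (s + rng.random()) * width for s in strata]
    return [{name: columns[name][i] for name in ranges} for i in range(n)]

# Deterministic: the same inputs always give the same path.
baseline = simulate(c=2.0, tau=0.1, mu=0.05)

# Sensitivity data: one row of inputs and one output (cumulative deaths) per design point.
rng = random.Random(0)
ranges = {"c": (1.0, 3.0), "tau": (0.05, 0.2), "mu": (0.02, 0.1)}
design = latin_hypercube(ranges, 20, rng)
outputs = [simulate(**point)[-1][3] for point in design]
for point, deaths in zip(design[:3], outputs[:3]):
    print({k: round(v, 3) for k, v in point.items()}, round(deaths, 1))
```

The paired `design` and `outputs` lists form an ordinary data set, which could then be summarized with regression or correlation methods.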

It is a relatively straightforward matter to modify models like this to obtain more realistic representations of the process, as there is no constraint that the system of equations remains analytically tractable. In the example above, one could add to the number of states–either breaking out additional states to represent stages of infection (e.g., the primary, latent, and secondary stages of syphilis; the variably infectious periods of HIV; or the acquired immunity of measles), or to represent population subgroups that may partner preferentially (e.g., age groups). The rates can also be modified, adding policy-relevant components like treatment or prophylaxis. Using sensitivity analyses on these rates, one can estimate the potential impact of alternative intervention strategies.
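As an illustration of adding a policy-relevant component, the sketch below extends the three-state model with a hypothetical treated state T. Several things are assumptions made only for this example: the treatment rate `theta`, the choice that treated persons no longer transmit or die from infection, their continued presence in the contact pool, and all parameter values.

```python
def deaths_with_treatment(theta, c=2.0, tau=0.1, mu=0.05,
                          S0=999.0, I0=1.0, dt=0.05, t_end=100.0):
    """Cumulative deaths when infected persons enter treatment at rate theta."""
    S, I, T, D = S0, I0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        beta = c * tau / (S + I + T)  # treated persons still count as contacts
        infections = beta * S * I * dt
        treated = theta * I * dt
        deaths = mu * I * dt
        S -= infections
        I += infections - treated - deaths
        T += treated
        D += deaths
    return D

# Compare alternative intervention intensities against the no-treatment case.
scenarios = {theta: deaths_with_treatment(theta) for theta in (0.0, 0.05, 0.10, 0.20)}
for theta, deaths in scenarios.items():
    print(f"theta={theta:.2f}  cumulative deaths={deaths:.1f}")
```

Varying `theta` while holding the other inputs fixed is the simplest form of the sensitivity analysis described above; a fuller analysis would vary several inputs together.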

A microsimulation model. A micro-simulation model of this process, by contrast, consists of a repeated set of computer-generated experiments for the partnership and disease processes in which each person, partnership, and disease transmission is explicitly represented.

The procedure consists of the following steps:

1. Create a sample of (for example) 1,000 susceptible persons.
2. Randomly infect one person.
3. Randomly choose C pairs of persons without replacement to be the starting couples (which implies serial monogamy).
4. Randomly choose one pair of persons.
5. If these persons are both single: toss a coin with probability ρ to form a partnership. If they are a couple: toss a coin with probability σ that the partnership is dissolved; if it is a discordant couple (one S, one I), toss a coin with probability τ that transmission occurs.
6. For each infected person, toss a coin with probability μ that that person dies.
7. Return to step 4.
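The procedure above might be sketched in Python as follows. The names `rho`, `sigma`, `tau`, and `mu` are assumed here for the four coin-toss probabilities (formation, dissolution, transmission, death), and the default values are arbitrary. The sketch also assumes that the loop repeats from the pair-selection step, that the dissolution and transmission tosses are both made in the same iteration, and that a death ends any partnership the person was in.

```python
import random

def run_microsim(n=1000, n_couples=200, rho=0.5, sigma=0.05, tau=0.2,
                 mu=0.0001, steps=20000, seed=None):
    rng = random.Random(seed)
    status = ["S"] * n                          # everyone starts susceptible
    status[rng.randrange(n)] = "I"              # randomly infect one person
    partner = [None] * n
    chosen = rng.sample(range(n), 2 * n_couples)  # starting couples, without replacement
    for a, b in zip(chosen[0::2], chosen[1::2]):
        partner[a], partner[b] = b, a
    alive = list(range(n))
    infected = {i for i in range(n) if status[i] == "I"}

    for _ in range(steps):
        if len(alive) < 2:
            break
        a, b = rng.sample(alive, 2)             # randomly choose one pair of persons
        if partner[a] is None and partner[b] is None:
            if rng.random() < rho:              # both single: maybe form a partnership
                partner[a], partner[b] = b, a
        elif partner[a] == b:
            # existing couple: transmission if discordant, then possible dissolution
            if {status[a], status[b]} == {"S", "I"} and rng.random() < tau:
                status[a] = status[b] = "I"
                infected.update((a, b))
            if rng.random() < sigma:
                partner[a] = partner[b] = None
        for i in list(infected):                # each infected person may die
            if rng.random() < mu:
                status[i] = "D"
                infected.discard(i)
                alive.remove(i)
                if partner[i] is not None:
                    partner[partner[i]] = None
                    partner[i] = None
    return {s: status.count(s) for s in "SID"}

print(run_microsim(n=200, n_couples=40, steps=5000, seed=1))
```

Because pairs are drawn from the whole population, most iterations select two people who are not a couple, so each iteration corresponds to a very short interval of time. Calling the function with different seeds gives different outcomes, while a fixed seed reproduces a run exactly.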

Given the probabilistic nature of the events, repeated runs of this micro-simulation model will result in different outcomes. In this particular case, the micro-simulation will lead, on average, to the same solutions for S(t), I(t), and D(t) as the macro-simulation above, as long as the partnership dynamics are set to be consistent with the dynamics implied by the contact rate c in the macro model.

Differences between Micro- and Macro-simulation

For equivalent models of the underlying process, then, the difference between the micro- and macro-simulation is not in the outcomes they project, but in the flexibility they allow to represent the details of the process, and the different kinds of information they provide.

One of these differences is in the treatment of uncertainty. The variation in outcomes produced by repeated runs of the micro-simulation model provides an estimate of the uncertainty inherent in the dynamic process. In some settings this variation can be substantial, and the ability to quantify it very important. While principled methods for estimating the uncertainty associated with deterministic macro-simulation projections have been developed by Adrian Raftery and his colleagues (1995), they require additional effort to implement. For stochastic micro-simulation, the variability is part of the output, and can be analyzed using standard techniques. Another difference is the size of the population that can be modeled–a difference that is reflected in the labels macro and micro.
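To illustrate how run-to-run variability can be summarized, the sketch below repeats a stochastic simulation many times and reports standard summary statistics. For brevity it uses a simplified stand-in for the individual-level process (a discrete-time chain of coin tosses for infection and death, without explicit partnerships), so the function name, the time step, and the parameter values are all assumptions for the example.

```python
import math
import random
import statistics

def stochastic_run(rng, n=100, c=2.0, tau=0.1, mu=0.05, dt=0.5, steps=200):
    """One stochastic epidemic; returns cumulative deaths at the end of the run."""
    S, I, D = n - 1, 1, 0
    p_die = 1 - math.exp(-mu * dt)  # per-step death probability for an infected person
    for _ in range(steps):
        if I == 0:
            break
        # per-step infection probability for a susceptible person
        p_inf = 1 - math.exp(-c * tau * I / (S + I) * dt)
        new_inf = sum(rng.random() < p_inf for _ in range(S))
        new_dead = sum(rng.random() < p_die for _ in range(I))
        S, I, D = S - new_inf, I + new_inf - new_dead, D + new_dead
    return D

rng = random.Random(42)
deaths = sorted(stochastic_run(rng) for _ in range(200))

mean = statistics.mean(deaths)
spread = statistics.stdev(deaths)
low = deaths[int(0.025 * len(deaths))]   # empirical 2.5th percentile
high = deaths[int(0.975 * len(deaths))]  # empirical 97.5th percentile
print(f"mean={mean:.1f}  sd={spread:.1f}  95% interval=({low}, {high})")
```

The interval here comes directly from the simulated outcomes, with no extra machinery. Runs in which the first infected person dies before transmitting can make the distribution of outcomes strongly skewed or bimodal, which a single deterministic projection would not reveal.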

Macro models, dealing with aggregate subgroups, can be used to simulate arbitrarily large populations. Their limitations are driven by the number of subgroups rather than the number of persons within each subgroup. Micro models, because they represent each individual, are limited in the size of population that can be simulated. But this limitation is also the source of their flexibility. Micro-simulation models make it possible to investigate more complicated and detailed dynamics. In the disease transmission example used above, it is possible simply to relax the monogamy constraint in the micro-simulation to allow for much more complicated partnership network patterns. This requires a small change in the dynamic rule, removing the restriction against partnership formation for persons already in a partnership. The result will be not just the emergence of "two-stars" (a person with two partners), but larger configurations like three-stars, triangles, 4-cycles, and long paths. All of this will come from the relaxation of the constraint, not from explicit parameterization of each form. By contrast, the macro-simulation model would require that each network configuration (that is, each infection composition category) be explicitly broken out and represented as a state, and all of the transitions between the states would need to be specified. Both the model, and the data requirements, quickly become overwhelming.
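The effect of relaxing the monogamy constraint can be seen in a small sketch of partnership formation alone. The `allow_concurrency` flag, the formation probability `rho`, and the omission of dissolution and disease are simplifying assumptions made for the example.

```python
import math
import random
from collections import Counter

def form_partnerships(n, attempts, rho, allow_concurrency, seed):
    """Repeatedly pick a random pair and toss a coin to form a partnership."""
    rng = random.Random(seed)
    partners = [set() for _ in range(n)]
    for _ in range(attempts):
        a, b = rng.sample(range(n), 2)
        if b in partners[a]:
            continue  # already a couple
        both_single = not partners[a] and not partners[b]
        # The only change in the rule: drop the requirement that both be single.
        if (allow_concurrency or both_single) and rng.random() < rho:
            partners[a].add(b)
            partners[b].add(a)
    return partners

def degree_distribution(partners):
    """Number of persons with 0, 1, 2, ... current partners."""
    return Counter(len(p) for p in partners)

def two_stars(partners):
    """Count configurations in which one person is linked to two partners."""
    return sum(math.comb(len(p), 2) for p in partners)

monogamous = form_partnerships(100, 2000, 0.5, allow_concurrency=False, seed=3)
concurrent = form_partnerships(100, 2000, 0.5, allow_concurrency=True, seed=3)
print(sorted(degree_distribution(monogamous).items()), two_stars(monogamous))
print(sorted(degree_distribution(concurrent).items()), two_stars(concurrent))
```

No network configuration is parameterized explicitly. Two-stars and larger structures appear in the second run only because the single-status check was removed.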

Computational and Documentation Issues

Advances in computing power and software development have put the simulation toolkit within the reach of most researchers. For those interested in macro-simulation, there are a number of useful packages, like STELLA and Madonna, that allow the programming for macro-simulations to be done using a flexible, intuitive graphical interface, and require only a standard desktop computer. These packages can easily be self-taught, and, like popular statistical packages, they make it possible (for better or worse) to ignore almost all of the mathematical subtleties needed for the solution of the equation system. They allow the researcher to focus on modeling the states and rate functions, where demographic expertise is most important. This also makes them well suited for teaching basic introductory courses on modeling population dynamics. (An example is the text by Hannon and Ruth, included in the Bibliography.)

For those interested in micro-simulation, it is still typically necessary to be able to program in a language like Fortran, C++, or Java. But there are several packages that are useful for demographers, including the following:

Socsim <http://www.demog.berkeley.edu/~wachter/socstory.html>
Lipro <http://www.nidi.nl/research/prj70101.html>
Urbansim <http://www.urbansim.org/papers/Urbansim_Reference_Guide-09.pdf>
Swarm <http://www.swarm.org/index.html> and
Sugarscape <http://www.brook.edu/dybdocroot/sugarscape/>.

These programs require considerably more sophistication to use than their macro-simulation counterparts, and often use more computing power, but the payoff is much more control over the process being modeled. With the growing interest in these kinds of models in all of the sciences, it is likely that the software technology for micro-simulation modeling will evolve rapidly in coming years. For now, the choice of which model to use should be guided by the nature of the process being modeled, the level of detail needed, and the technical resources available.

One of the challenges posed by simulation models is how to describe them in published analyses. In contrast to standard statistical methods, the programs built to run both macro- and micro-simulations are purpose built, and every analysis is different–both in terms of the state space, and in the specification of the transition probabilities. There is no generally accepted standard for documenting these programs, and there is typically not enough space in published articles to describe the complete set of assumptions and algorithms. One proposed solution is to set up websites associated with journals to publish the programs behind the articles. This approach has been explored by journals like Nature, but in general, the question of how to validate and replicate simulation-based research through the standard publication mechanism remains an unsettled issue.

Role of Simulation in Demography

Simulation has been used in many areas of demography, for nearly as long as the computer has been available as a research tool. Examples include population projection, from the classic text by Mindel Sheps and Jane Menken (1973) to more recent volumes (for example, by Wolfgang Lutz, James Vaupel, et al., 1999), and the projection of kinship resources under changing fertility (or mortality) regimes by Jane Menken (1985) and Ken Wachter (1997).

To take one instance, this kind of work is likely to play an increasingly important role in understanding the demographic impact of AIDS in populations experiencing a generalized epidemic. The study of the population dynamics of HIV transmission and the demographic consequences of AIDS has stimulated a substantial simulation-based literature, especially in the analysis of transmission networks. Examples include the work of Roy Anderson and Robert May (1988), John Bongaarts (1989), Alberto Palloni (1996), and Martina Morris (1997).

The concept of evolution is rooted in dynamic models, so simulation methods are also found in all areas of the population sciences that deal with evolution, from the population genetics of Hartl and Clark (1989) and the evolutionary biology of Simon Levin (1994) to the models of cultural evolution developed by Eugene Hammel and his colleagues (1979) and Luigi Cavalli-Sforza and Marcus Feldman (1981), and Herb Gintis's (2000) work on evolutionary game theory.

Simulation models have also played a major role in the study of the population-environment system. Examples include the work by Jay Forrester (1969) and Paul Waddell (2002) in urban ecology, and Elinor Ostrom's (1990) work on "governing the commons." In many of these areas, the models of evolutionary social dynamics, freed from the constraints of mathematical convenience, are providing insights into social systems that challenge the findings of general equilibrium models, and hold great promise for theory development in the social sciences. For the first time, the models enable researchers to represent the full range of dynamics induced by social interaction at the individual, institutional, and environmental levels. The research agenda opened by these methods is both daunting and exhilarating.

It would therefore be a mistake to file simulation methods under "advanced mathematical demography" and assume they are the province of a few select wizards and marginal to the field. Demography has always been fundamentally tied to its methods. Formal demography was for many years what distinguished a demographer from other social scientists. But, constrained by the requirements of mathematically tractable solutions, the development of formal demography became increasingly technical and difficult after the 1970s. As a result, the field grew rapidly where the data and the methods were more accessible, and the theory less constrained by the math: in the micro-level analyses that are the hallmark of contemporary social demography. Micro-level research has deepened the roots of demography in its constituent social science disciplines, and given it a stronger base in theories of human behavior. Without the macro-level connection, however, the population is missing from population science. That connection is what simulation has to offer. If there is going to be a vital demography in the future, the simulation methods described here will likely be at its core, linking the micro to the macro. With these tools, which facilitate theoretically richer models, demographers will again be able to explore the frontiers of research on population dynamics.

Bibliography

Anderson, Roy M. 1982. The Population Dynamics of Infectious Diseases. London: Chapman and Hall.

Anderson, Roy M., Robert M. May, et al. 1988. "Possible Demographic Consequences of AIDS in Developing Countries." Nature 332: 228–234.

Bailey, Norman T. J. 1975. The Mathematical Theory of Infectious Diseases. New York: Hafner Press.

Bongaarts, John. 1989. "A Model of the Spread of HIV Infection and the Demographic Impact of AIDS." Statistics in Medicine 8: 103–120.

Cavalli-Sforza, Luigi L., and Marcus W. Feldman. 1981. Cultural Transmission and Evolution: A Quantitative Approach. Princeton, NJ: Princeton University Press.

Forrester, Jay Wright. 1969. Urban Dynamics. Cambridge, MA: M.I.T. Press.

Gintis, Herbert. 2000. Game Theory Evolving: A Problem-Centered Introduction to Modeling Strategic Behavior. Princeton, NJ: Princeton University Press.

Hammel, Eugene A., Chad K. McDaniel, et al. 1979. "Demographic Consequences of Incest Tabus: A Microsimulation Analysis." Science 205: 972–977.

Hannon, Bruce, and Matthias Ruth. 1997. Modeling Dynamic Biological Systems. New York: Springer.

Hartl, Daniel L., and Andrew G. Clark. 1989. Principles of Population Genetics. Sunderland, MA: Sinauer.

Levin, Simon A., ed. 1994. Frontiers in Mathematical Biology. New York: Springer-Verlag.

Lutz, Wolfgang, James W. Vaupel and Dennis A. Ahlburg, eds. 1999. Frontiers of Population Forecasting, Supplement to Vol. 24 of Population and Development Review. New York: Population Council.

Menken, Jane A. 1985. "Age and Fertility: How Late Can You Wait?" Demography 22: 469–483.

Morris, Martina. 1997. "Sexual Networks and HIV." AIDS 11: S209–S216.

Ostrom, Elinor. 1990. Governing the Commons: The Evolution of Institutions for Collective Action. Cambridge, Eng.: Cambridge University Press.

Palloni, Alberto. 1996. "Demography of HIV/AIDS." Population Index 62: 601–652.

Raftery, Adrian E., Geof H. Givens, et al. 1995. "Inference from a Deterministic Population Dynamics Model for Bowhead Whales (with Discussion)." Journal of the American Statistical Association 90: 402–430.

Sheps, Mindel C., and Jane A. Menken. 1973. Mathematical Models of Conception and Birth. Chicago: University of Chicago Press.

van Imhoff, Evert, and Wendy Post. 1998. "Microsimulation Methods for Population Projection." Population: An English Selection 10: 97–138.

Wachter, Kenneth W. 1997. "Kinship Resources for the Elderly." Philosophical Transactions of the Royal Society of London Series B-Biological Sciences 352: 1811–1817.

Waddell, Paul. 2002. "UrbanSim: Modeling Urban Development for Land Use, Transportation and Environmental Planning." Journal of the American Planning Association 68(3): 297–314.

Martina Morris

Encyclopedia of Population