Integrated Public Use Microdata Series
Integrated Public Use Microdata Series (IPUMS) refers to three distinct collections of individual-level data. IPUMS-USA is a coherent individual-level national database describing the characteristics of 100 million Americans in fifteen decennial census years spanning the period from 1850 through 2000. IPUMS-International includes data from 47 censuses of 13 countries, and new data are being added regularly. IPUMS-CPS provides annual data from the March demographic supplement of the Current Population Survey for the period from 1962 to 2006.
IPUMS contains information about both individuals and households in a hierarchical structure, so researchers can construct new variables based on information from multiple household members. Because it is microdata as opposed to summary aggregate data, IPUMS allows researchers to create tabulations tailored to particular research questions and to carry out individual-level multivariate analyses. Among key research areas that can be studied using the IPUMS are economic development, poverty and inequality, industrial and occupational structure, household and family composition, the household economy, female labor force participation, employment patterns, population growth, urbanization, internal migration, immigration, nuptiality, fertility, and education.
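The idea of building new variables from several household members, and then tabulating them, can be illustrated with a minimal sketch. It uses only the Python standard library, and every field name and record in it (`household_id`, `person_num`, `age`, `employed`) is invented for illustration; none is taken from the IPUMS documentation.

```python
from collections import defaultdict

# Hypothetical person records; field names and values are invented for illustration.
persons = [
    {"household_id": 1, "person_num": 1, "age": 34, "employed": True},
    {"household_id": 1, "person_num": 2, "age": 33, "employed": False},
    {"household_id": 1, "person_num": 3, "age": 5, "employed": False},
    {"household_id": 2, "person_num": 1, "age": 71, "employed": False},
    {"household_id": 3, "person_num": 1, "age": 45, "employed": True},
    {"household_id": 3, "person_num": 2, "age": 16, "employed": True},
]


def group_by_household(records):
    """Collect person records under their household identifier."""
    households = defaultdict(list)
    for person in records:
        households[person["household_id"]].append(person)
    return dict(households)


def add_household_variables(records):
    """Return copies of the records with two constructed household-level variables."""
    households = group_by_household(records)
    enriched = []
    for person in records:
        members = households[person["household_id"]]
        enriched.append({
            **person,
            # Constructed from all members of the person's household.
            "n_children_in_hh": sum(1 for m in members if m["age"] < 18),
            "n_earners_in_hh": sum(1 for m in members if m["employed"]),
        })
    return enriched


def tabulate(records, key):
    """Count person records by the value of one variable."""
    counts = defaultdict(int)
    for record in records:
        counts[record[key]] += 1
    return dict(counts)


enriched = add_household_variables(persons)
table = tabulate(enriched, "n_earners_in_hh")
print(table)
```

Because each person record carries its household identifier, any household-level summary can be attached back to individuals and then used in a custom tabulation or as a covariate in an individual-level model.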
The database includes comprehensive hypertext documentation—the equivalent of over 10,000 pages of text—including detailed analyses of the comparability of every variable across every census year. Both the database and the documentation are distributed through an on-line data access system at http://ipums.org, which provides powerful extraction and search capabilities to allow easy access to both metadata and microdata. The National Science Foundation and the National Institutes of Health fund the project, so all data and documentation are available to researchers without cost.
IPUMS-USA is the oldest and best known of the three data series. It combines nationally representative probability samples produced by the Census Bureau for the period since 1940 with new high-precision historical samples produced at the University of Minnesota and elsewhere. By putting all the samples in a common format, imposing consistent variable coding, and carefully documenting changes in variables over time, IPUMS is designed to facilitate the use of the census samples as a time series.
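What imposing consistent variable coding involves can be sketched as a crosswalk from year-specific codes to a single common scheme. The example below is only an illustration of the general technique: the years were chosen arbitrarily, and every code, label, and name in it (`CROSSWALKS`, `HARMONIZED_LABELS`, `harmonize`) is invented rather than drawn from IPUMS.

```python
# Hypothetical crosswalks: census year -> {original code: harmonized code}.
# All codes are invented for illustration; they are not actual IPUMS codes.
CROSSWALKS = {
    1880: {"S": 1, "M": 2, "W": 3, "D": 4},
    1990: {"0": 1, "1": 2, "2": 2, "3": 3, "4": 4},
}

# One set of labels shared by every year once the codes are harmonized.
HARMONIZED_LABELS = {
    1: "never married",
    2: "married",
    3: "widowed",
    4: "divorced",
    9: "unknown",
}

UNKNOWN = 9


def harmonize(year, original_code):
    """Translate a year-specific code into the common scheme (9 if unmapped)."""
    return CROSSWALKS.get(year, {}).get(original_code, UNKNOWN)


records = [(1880, "M"), (1880, "W"), (1990, "2"), (1990, "0"), (1990, "7")]
harmonized = [
    (year, HARMONIZED_LABELS[harmonize(year, code)]) for year, code in records
]
for row in harmonized:
    print(row)
```

Once every sample shares one coding scheme, records from different census years can be pooled and compared directly, which is what makes time-series use of the samples practical.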
The sizes of IPUMS-USA samples are shown in Table 1. The only census year missing from the series is 1890, because the records of that census were destroyed by fire. From 1970 onward, available samples include at least six percent of the population, whereas samples for census years prior to 1970 cover approximately one percent of the population. Because the population grew rapidly in the nineteenth and twentieth centuries, the older 1 percent samples are considerably smaller than recent ones.
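The practical meaning of sample density can be shown with a short arithmetic sketch. It assumes a simple self-weighting sample in which every record has the same chance of selection, an assumption made here for illustration only; actual samples may carry record-level weights. The function names and the sample count of 2,500 are hypothetical.

```python
def expansion_factor(sample_percent):
    """Number of persons each sampled record represents in a self-weighting sample."""
    if not 0 < sample_percent <= 100:
        raise ValueError("sample_percent must be in the interval (0, 100]")
    return 100 / sample_percent


def estimate_population_count(sample_count, sample_percent):
    """Scale a count observed in the sample up to the full population."""
    if not 0 < sample_percent <= 100:
        raise ValueError("sample_percent must be in the interval (0, 100]")
    return sample_count * 100 / sample_percent


# A subgroup with 2,500 records in a 1 percent sample...
estimate_one_percent = estimate_population_count(2500, 1)
# ...would be expected to yield about 15,000 records at 6 percent density.
expected_records_six_percent = estimate_one_percent * 6 / 100

print(expansion_factor(1), estimate_one_percent, expected_records_six_percent)
```

The comparison shows why denser samples matter for small subgroups: the same population yields several times as many records to analyze.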
The large samples available for recent census years have proven valuable for study of small population subgroups, ranging from same-sex couples to the grandchildren of immigrants. In many instances, the large samples also permit the use of innovative methods; to take just one example, these files have allowed demographers to carry out multilevel contextual analyses by making it feasible to assess the characteristics of small geographic areas. Accordingly, projects are underway to create larger samples for the earlier census years. As shown in Table 1, higher-density samples are in preparation for the 1880, 1900, 1930, and 1960 census years.
IPUMS-USA is designed to encourage analyses that incorporate multiple census years for the study of change over time. The census has always contained certain core questions that are generally comparable over the entire time span of the database. Other questions have come and gone. Table 1 describes many of the subject areas covered by the census since 1850 (many of these topics correspond to multiple variables in the database). In general, the IPUMS samples include all the census questions available in each year, but for the period from 1940 onward, some detail is suppressed in order to preserve respondent confidentiality. In particular, geographic detail is far superior in the pre-1940 samples. In fact, the samples for the period prior to 1940 include the names and addresses of the respondents. On the other hand, topics such as income, educational attainment, and migration have only been covered by the census for the last six decades of the twentieth century. All IPUMS samples also include a common set of constructed variables to allow easy data manipulation.
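|Availability of select IPUMS-USA subject areas, 1850–2000|
|Note: X = available in that census year; P = available in a future data release.|
|Subject area||1850||1860||1870||1880||1900||1910||1920||1930||1940||1950||1960||1970||1980||1990||2000|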
|County group/microdata area||—||—||—||—||—||—||—||—||—||—||P||X||X||X||X|
|State economic area||X||X||X||X||X||X||X||X||X||X||P||—||—||—||—|
|Size of place||X||X||X||X||X||X||X||X||X||X||—||—||X||X||X|
|Ownership of dwelling||—||—||—||—||X||X||X||X||—||—||X||X||X||X||X|
|Value of house or property||—||—||—||—||—||—||—||X||X||X||X||X||X||X||X|
|Total family income||—||—||—||—||—||—||—||—||—||X||X||X||X||X||X|
|Relationship to household head||—||—||—||X||X||X||X||X||X||X||X||X||X||X||X|
|Age at first marriage||—||—||—||—||—||—||—||X||—||X||X||X||X||—||—|
|Duration of marriage||—||—||—||—||X||X||—||—||X||—||—||—||—||—||—|
|Children ever born||—||—||—||—||X||X||—||—||X||X||X||X||X||X||—|
|Years in the United States||—||—||—||—||X||X||X||X||X||—||—||X||X||X||X|
|Class of worker||—||—||—||—||—||X||X||X||X||X||X||X||X||X||X|
|Weeks worked last year||—||—||—||—||—||—||—||—||X||X||X||X||X||X||X|
|Total personal income||—||—||—||—||—||—||—||—||—||X||X||X||X||X||X|
|Wage and salary income||—||—||—||—||—||—||—||—||X||X||X||X||X||X||X|
|Value of personal or real estate||X||X||X||—||—||—||—||—||—||—||—||—||—||—||—|
|Surname similarity code||X||X||X||X||X||X||X||X||X||X||—||—||—||—||—|
Most important among these is a set of family interrelationship variables that have proven broadly useful in the construction of consistent family composition and own-child fertility measures.
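An own-child measure of this kind can be sketched as follows. The example assumes that each person record carries a pointer to the position of his or her mother within the household, with 0 meaning that no mother is present. The field name `mother_pos`, the other field names, and the sample household are all hypothetical and are used only to illustrate the general technique.

```python
def own_children_under_5(household):
    """Map each person number to the count of own children under age 5.

    A child is attributed to a woman only through the (hypothetical)
    `mother_pos` pointer, which links children to co-resident mothers.
    """
    counts = {person["person_num"]: 0 for person in household}
    for person in household:
        mother = person["mother_pos"]
        # A pointer of 0 matches no person number, so it is skipped here.
        if mother in counts and person["age"] < 5:
            counts[mother] += 1
    return counts


# Hypothetical household: two adults and three children of person 1.
household = [
    {"person_num": 1, "age": 30, "sex": "F", "mother_pos": 0},
    {"person_num": 2, "age": 32, "sex": "M", "mother_pos": 0},
    {"person_num": 3, "age": 4, "sex": "F", "mother_pos": 1},
    {"person_num": 4, "age": 1, "sex": "M", "mother_pos": 1},
    {"person_num": 5, "age": 8, "sex": "M", "mother_pos": 1},
]

counts = own_children_under_5(household)
print(counts)
```

Because the pointer is defined the same way in every sample, the same function could be applied unchanged across census years, which is the sense in which such measures are consistent.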
In 2010 the scope of the U.S. census will be sharply reduced. The Census Bureau plans to eliminate the detailed long-form census questionnaire, and the census will include only a few basic inquiries such as age, sex, race, and relationship of each person to the householder. Information about such topics as income, education, housing, migration, and disability will instead be provided by the American Community Survey (ACS). With virtually the same questions as the Census 2000 long-form questionnaire, the ACS provides data on three million households each year. The ACS has released an annual public use microdata file since 2000, and IPUMS has incorporated these samples. For most purposes, ACS samples are closely comparable to those from censuses, so the shift from census to surveys introduces only minor discontinuities to the data series.
|Samples released|
|Brazil||1970, 1980, 1991, 2000|
|Chile||1960, 1970, 1982, 1992, 2002|
|Colombia||1964, 1973, 1985, 1993|
|Costa Rica||1963, 1973, 1984, 2000|
|Ecuador||1962, 1974, 1982, 1990, 2001|
|France||1962, 1968, 1975, 1982, 1990|
|Mexico||1960, 1970, 1990, 2000|
|South Africa||1996, 2001|
|United States||1960, 1970, 1980, 1990, 2000|
|Venezuela||1971, 1981, 1990|
|Samples in progress|
|Argentina||1970, 1980, 1991, 2001|
|Austria||1971, 1981, 1991, 2001|
|Bolivia||1976, 1992, 2001|
|Canada||1971, 1981, 1991|
|Czech Rep.||1991, 2001|
|Dominican R.||1960, 1970, 1981|
|Fiji||1966, 1986, 1996|
|Greece||1971, 1981, 1991, 2001|
|Honduras||1961, 1974, 1988|
|Hungary||1970, 1980, 1990, 2001|
|Indonesia||1971, 1980, 1990, 1995|
|Israel||1961, 1972, 1983, 1995|
Researchers who require annual data for the preceding four decades can turn to IPUMS-CPS, which provides a coherent version of the widely used U.S. population survey produced by the Bureau of Labor Statistics. The CPS includes virtually all the subject areas covered by the decennial census and the ACS, but provides much greater detail in certain areas, such as health insurance coverage.
The IPUMS-International project is extending the IPUMS paradigm to approximately fifty countries around the world. Large quantities of machine-readable microdata survive from census enumerations since the 1950s, but few of them are available to researchers and most are at risk of becoming unreadable. The first goals of the IPUMS-International project are to preserve machine-readable census microdata files wherever possible and to obtain permission to disseminate anonymized samples of the data to researchers. Then, just as in the original IPUMS project, researchers convert the data into a consistent format, supply comprehensive documentation, and make microdata and documentation available through a Web-based data dissemination system. Table 2 summarizes current and planned IPUMS-International data releases. The project began releasing data in 2003, and by June 2006, with information on 143 million persons, it already exceeded the scale of IPUMS-USA. The project has concluded agreements to preserve and disseminate data from some 200 censuses in 71 countries; negotiations with most of the other major countries of the world are underway. Information on the IPUMS-International release schedule is available at http://ipums.org.