Integrated Public Use Microdata Series



Integrated Public Use Microdata Series (IPUMS) refers to three distinct collections of individual-level data. IPUMS-USA is a coherent individual-level national database describing the characteristics of 100 million Americans in fifteen decennial census years spanning the period from 1850 through 2000. IPUMS-International includes data from 47 censuses of 13 countries, and new data are being added regularly. IPUMS-CPS provides annual data from the March demographic supplement of the Current Population Survey for the period from 1962 to 2006.

IPUMS contains information about both individuals and households in a hierarchical structure, so researchers can construct new variables based on information from multiple household members. Because it is microdata as opposed to summary aggregate data, IPUMS allows researchers to create tabulations tailored to particular research questions and to carry out individual-level multivariate analyses. Among key research areas that can be studied using the IPUMS are economic development, poverty and inequality, industrial and occupational structure, household and family composition, the household economy, female labor force participation, employment patterns, population growth, urbanization, internal migration, immigration, nuptiality, fertility, and education.
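The two ideas in this paragraph, building a person-level variable from other household members and producing a custom tabulation, can be sketched in a few lines. This is only an illustration: the record layout, the field names (`serial`, `weight`, `persons`, `young_child_in_hh`) and the data values are hypothetical and do not reproduce the actual IPUMS file format.

```python
from collections import defaultdict

# Hypothetical hierarchical extract: each household record carries its
# own person records. Field names and values are invented for illustration.
households = [
    {"serial": 1, "weight": 100.0, "persons": [
        {"age": 34, "sex": "F", "employed": True},
        {"age": 36, "sex": "M", "employed": True},
        {"age": 3, "sex": "F", "employed": False},
    ]},
    {"serial": 2, "weight": 150.0, "persons": [
        {"age": 52, "sex": "F", "employed": False},
        {"age": 19, "sex": "M", "employed": True},
    ]},
]

def add_young_child_flag(households, max_age=4):
    """Attach to every person a flag derived from all household members."""
    for hh in households:
        has_young = any(p["age"] <= max_age for p in hh["persons"])
        for p in hh["persons"]:
            p["young_child_in_hh"] = has_young
    return households

def weighted_tabulation(households):
    """Weighted count of women aged 16+ by (young child present, employed)."""
    table = defaultdict(float)
    for hh in households:
        for p in hh["persons"]:
            if p["sex"] == "F" and p["age"] >= 16:
                table[(p["young_child_in_hh"], p["employed"])] += hh["weight"]
    return dict(table)

add_young_child_flag(households)
female_lfp_table = weighted_tabulation(households)
```

A published summary table could not be re-cut this way. With microdata the analyst chooses the universe (here, women aged 16 and over) and the cross-classifying variables after the fact.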

The database includes comprehensive hypertext documentation (the equivalent of over 10,000 pages of text), including detailed analyses of the comparability of every variable across every census year. Both the database and the documentation are distributed through an on-line data access system, which provides powerful extraction and search capabilities to allow easy access to both metadata and microdata. The National Science Foundation and the National Institutes of Health fund the project, so all data and documentation are available to researchers without cost.

IPUMS-USA is the oldest and best known of the three data series. It combines nationally representative probability samples produced by the Census Bureau for the period since 1940 with new high-precision historical samples produced at the University of Minnesota and elsewhere. By putting all the samples in a common format, imposing consistent variable coding, and carefully documenting changes in variables over time, IPUMS is designed to facilitate the use of the census samples as a time series.
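As a rough illustration of what consistent variable coding involves, the sketch below maps year-specific codes onto one shared set of categories. The codes, the categories, and the names `CROSSWALK` and `harmonize` are hypothetical and are not taken from the IPUMS documentation.

```python
# Hypothetical year-specific marital status codes mapped to one common scheme.
# Real census codebooks differ; these values are invented for illustration.
CROSSWALK = {
    1880: {"M": "married", "S": "single", "W": "widowed", "D": "divorced"},
    1990: {1: "married", 2: "widowed", 3: "divorced", 4: "separated", 5: "single"},
}

def harmonize(year, raw_code):
    """Return the common category, or 'unknown' when no mapping is defined."""
    return CROSSWALK.get(year, {}).get(raw_code, "unknown")

records = [(1880, "M"), (1990, 1), (1990, 4), (1880, "X")]
harmonized = [harmonize(year, code) for year, code in records]
```

In this toy example the category `separated` exists in only one year. Gaps of that kind are what comparability documentation has to flag for anyone using the samples as a time series.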

The sizes of IPUMS-USA samples are shown in Table 1. The only census year missing from the series is 1890, whose records were destroyed by fire. From 1970 onward, available samples include at least 6 percent of the population, and samples for census years prior to 1970 cover approximately 1 percent of the population. Because the population grew rapidly in the nineteenth and twentieth centuries, the older 1 percent samples are considerably smaller than recent ones.
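The effect of population growth on sample size is simple arithmetic, sketched below. The population totals are the published resident counts for 1850 and 2000 and are included only for illustration. Actual IPUMS sample sizes depend on each sample's design and will not match these figures exactly. The function name `expected_sample_size` is hypothetical.

```python
# Published U.S. census population totals, used here only for illustration.
POPULATION = {1850: 23_191_876, 2000: 281_421_906}

def expected_sample_size(population, sampling_fraction):
    """Approximate number of person records in a sample of the given density."""
    return round(population * sampling_fraction)

one_percent_1850 = expected_sample_size(POPULATION[1850], 0.01)
one_percent_2000 = expected_sample_size(POPULATION[2000], 0.01)

# How many times larger a 1 percent sample is in 2000 than in 1850.
ratio = one_percent_2000 / one_percent_1850
```

Under these assumptions a 1 percent sample holds roughly 230,000 person records for 1850 and roughly 2.8 million for 2000, a difference of about a factor of twelve at the same sampling density.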

The large samples available for recent census years have proven valuable for study of small population subgroups, ranging from same-sex couples to the grandchildren of immigrants. In many instances, the large samples also permit the use of innovative methods; to take just one example, these files have allowed demographers to carry out multilevel contextual analyses by making it feasible to assess the characteristics of small geographic areas. Accordingly, projects are underway to create larger samples for the earlier census years. As shown in Table 1, higher-density samples are in preparation for the 1880, 1900, 1930, and 1960 census years.

IPUMS-USA is designed to encourage analyses that incorporate multiple census years for the study of change over time. The census has always contained certain core questions that are generally comparable over the entire time span of the database. Other questions have come and gone. Table 2 describes many of the subject areas covered by the census since 1850 (many of these topics correspond to multiple variables in the database). In general, the IPUMS samples include all the census questions available in each year, but for the period from 1940 onward, some detail is suppressed in order to preserve respondent confidentiality. In particular, geographic detail is far superior in the pre-1940 samples. In fact, the samples for the period prior to 1940 include the names and addresses of the respondents. On the other hand, topics such as income, educational attainment, and migration have only been covered by the census for the last six decades of the twentieth century. All IPUMS samples also include a common set of constructed variables to allow easy data manipulation.

Table 2
Availability of select IPUMS-USA subject areas, 1850-2000
Note: X = available in that census year; P = available in a future data release.
Household record               
County group/microdata area PXXXX
State economic areaXXXXXXXXXXP
Metropolitan areaXXXXXXXXXXXXXX
Urban/rural statusXXXXXXXXXXXX
Ownership of dwellingXXXXXXXXX
Mortgage statusXXXXXX
Value of house or propertyXXXXXXXX
Monthly rentXXXXXXXX
Total family incomeXXXXXX
Person record               
Relationship to household headXXXXXXXXXXXX
Marital statusXXXXXXXXXXXX
Age at first marriageXXXXX
Duration of marriageXXX
Times marriedXXXXXX
Children ever bornXXXXXXXX
Parents' birthplacesXXXXXXXXX
Ancestry            XXX
Years in the United StatesXXXXXXXXX
Mother tongueXXXXXX
Language spokenXXXXXXX
School attendanceXXXXXXXXXXXXXXX
Educational attainmentXXXXXXX
Employment statusXXXXXXXXX
Class of workerXXXXXXXXXX
Weeks worked last yearXXXXXXX
Weeks unemployedXXXXX
Total personal incomeXXXXXX
Wage and salary incomeXXXXXXX
Value of personal or real estateXXX
Migration statusXXXXXXX
Veteran statusXXXXXXXXX
Surname similarity codeXXXXXXXXXX

Most important among these is a set of family interrelationship variables that have proven broadly useful in the construction of consistent family composition and own-child fertility measures.
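One common way to encode family interrelationships is a pointer from each person to the household position of a co-resident relative. The sketch below assumes that approach and derives a simple own-child count from it. The field names `pos` and `mother_pos`, the convention that 0 means no co-resident mother, and the data are all hypothetical. The source does not describe the actual IPUMS variables in this detail.

```python
# Hypothetical person records for one household. "pos" is the person's
# position within the household; "mother_pos" holds the position of the
# person's mother when she lives in the same household (0 = not present).
household = [
    {"pos": 1, "age": 31, "sex": "F", "mother_pos": 0},
    {"pos": 2, "age": 33, "sex": "M", "mother_pos": 0},
    {"pos": 3, "age": 4, "sex": "M", "mother_pos": 1},
    {"pos": 4, "age": 1, "sex": "F", "mother_pos": 1},
    {"pos": 5, "age": 58, "sex": "F", "mother_pos": 0},
]

def own_children_under(household, age_limit=5):
    """Count co-resident own children below age_limit for each woman."""
    counts = {p["pos"]: 0 for p in household if p["sex"] == "F"}
    for p in household:
        mother = p["mother_pos"]
        if mother in counts and p["age"] < age_limit:
            counts[mother] += 1
    return counts

own_children = own_children_under(household)
```

Because the link is stored on the child's record, the same pointer supports family composition measures as well as fertility measures, provided it is assigned by the same rules in every census year.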

In 2010 the scope of the U.S. census will be sharply reduced. The Census Bureau plans to eliminate the detailed long-form census questionnaire, and the census will include only a few basic inquiries such as age, sex, race, and relationship of each person to the householder. Information about such topics as income, education, housing, migration, or disability will instead be provided by the American Community Survey (ACS). With virtually the same questions as the Census 2000 long form questionnaire, the ACS provides data on three million households each year. The ACS has released an annual public use microdata file since 2000, and IPUMS has incorporated these samples. For most purposes, ACS samples are closely comparable to those from censuses, so the shift from census to surveys introduces only minor discontinuities to the data series.

Table 3
IPUMS-International samples
Available 6/2006
Brazil: 1970, 1980, 1991, 2000
Chile: 1960, 1970, 1982, 1992, 2002
Colombia: 1964, 1973, 1985, 1993
Costa Rica: 1963, 1973, 1984, 2000
Ecuador: 1962, 1974, 1982, 1990, 2001
France: 1962, 1968, 1975, 1982, 1990
Kenya: 1989, 1999
Mexico: 1960, 1970, 1990, 2000
South Africa: 1996, 2001
United States: 1960, 1970, 1980, 1990, 2000
Vietnam: 1989, 1999
Venezuela: 1971, 1981, 1990
Samples in progress
Argentina: 1970, 1980, 1991, 2001
Austria: 1971, 1981, 1991, 2001
Bolivia: 1976, 1992, 2001
Canada: 1971, 1981, 1991
Czech Republic: 1991, 2001
Dominican Republic: 1960, 1970, 1981
El Salvador: 1992
Egypt: 1986, 1996
Fiji: 1966, 1986, 1996
Greece: 1971, 1981, 1991, 2001
Guatemala: 1973, 1981
Honduras: 1961, 1974, 1988
Hungary: 1970, 1980, 1990, 2001
Indonesia: 1971, 1980, 1990, 1995
Israel: 1961, 1972, 1983, 1995
Italy: 1981, 1991

Researchers who require annual data for the preceding four decades can turn to IPUMS-CPS, which provides a coherent version of the widely used U.S. population survey produced by the Bureau of Labor Statistics. The CPS includes virtually all the subject areas covered by the decennial census and the ACS, but provides much greater detail in certain areas, such as health insurance coverage.

The IPUMS-International project is extending the IPUMS paradigm to approximately fifty countries around the world. Large quantities of machine-readable microdata survive from census enumerations since the 1950s, but few of them are available to researchers and most are at risk of becoming unreadable. The first goals of the IPUMS-International project are to preserve machine-readable census microdata files wherever possible and to obtain permission to disseminate anonymized samples of the data to researchers. Then, just as in the original IPUMS project, researchers convert the data into a consistent format, supply comprehensive documentation, and make microdata and documentation available through a Web-based data dissemination system. Table 3 summarizes current and planned IPUMS-International data releases. The project began releasing data in 2003, and with information on 143 million persons, by June 2006 it already exceeded the scale of IPUMS-USA. The project has concluded agreements to preserve and disseminate data from some 200 censuses in 71 countries; negotiations with most of the other major countries of the world are underway. Information on the IPUMS-International release schedule is available on-line from the project.



Steven Ruggles
