Demographic databases are systematic listings or files of statistical information on the characteristics of the members of a population, typically at the level of individuals, families, or households. A database of the population of the United States has been compiled every ten years since the first census was conducted in 1790. The census initially collected only basic demographic information such as age, sex, and race, but it has grown to include much fuller profiles of people, including education, employment and occupation, marital status, and income, along with characteristics of both the household and the housing unit.

Early Uses of Demographic Databases

Demographic databases, first as paper reports and then, with the advent of the computer, as electronic files, have become widely used in both the public and private sectors. In the case of the decennial U.S. census, before 1970 very little demographic information in the form of cross-tabulations was available in any format other than paper volumes. Demographic and social statistics for geographic areas ranging from states to cities to neighborhoods had to be sought in hundreds of separate volumes. Any attempt to aggregate this information required transforming it into a machine-readable form. Maps associated with the data were supplied separately; no geographic information systems were available.

Since 1970 the Census Bureau has increased the amount of information available in machine-readable form and at the same time has reduced the amount available on paper. For example, about 450,000 pages were published from the 1990 census; the total from the 2000 census is expected to be about 100,000 pages.

Problems and Opportunities Associated with Electronic Files

The shift to electronic files poses some potential problems. One is the risk that future computers will be unable to read files produced in the recent past. Another is that federal statistical agencies in the United States, to save money and speed delivery, have used the Internet both as an archive and as a means of data retrieval, and there is no assurance that the data will remain easily accessible to future users.

Because of computers, data users are no longer limited to data aggregated into geographic areas. Databases of information on individuals, known as public use microdata samples, are also widely available in machine-readable form. (They are carefully screened to ensure that information about specific individuals cannot be determined.) For the U.S. census, these samples, drawn from the full set of census returns, enable researchers to produce cross-tabulations not found in the summary files released by the Census Bureau. Another enhancement to basic demographic databases is the ability to map aggregate information using geographic line files that computer mapping software can read.
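To illustrate why microdata permit tabulations that fixed summary files do not, the following minimal sketch builds a weighted two-way cross-tabulation from person-level records. The records, the field names (`age_group`, `education`, `weight`), and the helper `cross_tab` are all hypothetical; actual microdata files define their own variables and sampling weights in their documentation.

```python
from collections import defaultdict

# Hypothetical person-level microdata records. Field names and values are
# invented for illustration; each record carries an assumed sampling weight
# giving the number of people it represents.
records = [
    {"age_group": "25-44", "education": "bachelor", "weight": 120},
    {"age_group": "25-44", "education": "high_school", "weight": 95},
    {"age_group": "45-64", "education": "bachelor", "weight": 80},
    {"age_group": "45-64", "education": "high_school", "weight": 110},
    {"age_group": "25-44", "education": "bachelor", "weight": 100},
]


def cross_tab(rows, row_var, col_var, weight_var="weight"):
    """Return {(row_value, col_value): weighted count} for any two variables."""
    table = defaultdict(int)
    for r in rows:
        # Each record adds its weight to the cell defined by its two values.
        table[(r[row_var], r[col_var])] += r[weight_var]
    return dict(table)


table = cross_tab(records, "age_group", "education")
for (age, edu), count in sorted(table.items()):
    print(f"{age:>6} {edu:<12} {count}")
```

Because the researcher chooses `row_var` and `col_var`, any pair of variables on the file can be tabulated, rather than only the combinations an agency chose to publish.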

The Internet and Electronic Linkage

The Internet, along with high-speed computers, has enabled researchers to move beyond the analysis of a single database. Linking demographic information from disparate sources is now readily attainable. For example, health records from hospitals can be linked with social security information and further linked to educational attainment data. Even when personal identifiers have been removed from the records, this ability to link files raises the possibility that individuals could be identified and their demographic and socioeconomic information disclosed. This potential for disclosure is an issue for both governmental statistical agencies and the private sector.
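How linkage can undo de-identification may be shown with a minimal sketch of exact-match linkage on shared attributes (here ZIP code, birth year, and sex). Both files, their fields, and the `link` helper are hypothetical, and real linkage work often uses probabilistic rather than exact matching; the point is only that a record whose attribute combination is unique in an identified file can be re-identified.

```python
from collections import defaultdict

# Hypothetical de-identified health file: names have been removed.
health = [
    {"zip": "02138", "birth_year": 1950, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1962, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_year": 1962, "sex": "M", "diagnosis": "flu"},
]

# Hypothetical identified file, such as a directory-style list with names.
directory = [
    {"name": "A. Smith", "zip": "02138", "birth_year": 1950, "sex": "F"},
    {"name": "B. Jones", "zip": "02139", "birth_year": 1962, "sex": "M"},
    {"name": "C. Brown", "zip": "02139", "birth_year": 1962, "sex": "M"},
]

# Attributes present in both files (assumed for this example).
KEY = ("zip", "birth_year", "sex")


def key_of(record):
    return tuple(record[k] for k in KEY)


def link(deidentified, identified):
    """Exact-match linkage on shared attributes.

    Returns (name, record) pairs only where exactly one identified person
    shares the record's key, i.e. where the 'anonymous' record is
    re-identified.
    """
    index = defaultdict(list)
    for person in identified:
        index[key_of(person)].append(person["name"])
    matches = []
    for rec in deidentified:
        candidates = index.get(key_of(rec), [])
        if len(candidates) == 1:
            matches.append((candidates[0], rec))
    return matches


for name, rec in link(health, directory):
    print(name, "->", rec["diagnosis"])
```

The two records from ZIP 02139 each match two people and so stay ambiguous, while the 02138 record is unique and is linked to a name. Making such combinations less unique, for example by coarsening geography or age before release, is one common way to reduce this risk.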

The Private Sector

Demographic databases are important tools in the private sector. Deciding where to build a plant or open a store, or gauging whether a sufficient market exists for a proposed product, requires demographic data; where public sources fall short, private firms create their own databases. The first major private sector database, the "Survey of Buying Power," was published in 1929. It consisted of demographic and socioeconomic data for all counties and cities in the United States. Like the census, the survey was eventually produced in machine-readable form.

Private sector producers of demographic databases have moved beyond basic demographic and geographic tabulations. Using statistical techniques such as factor analysis and cluster analysis, which combine demographic variables into groups sharing similar lifestyle characteristics, they have generated specialized databases for geographies such as postal codes and company trading areas. Information from warranty cards submitted by purchasers of products, subscription lists, telephone directories, and other sources is linked to create databases for direct marketing. Although these linked files are nongovernmental, questions of privacy and confidentiality still apply to them.

A Public Good

In the United States demographic databases compiled by the federal statistical agencies are considered a public good. The agencies' data are not protected by copyright, and the agencies may charge for their data only enough to cover the cost of dissemination. The cost of collection is assumed to have already been paid by taxpayers, since the agencies are supported through the federal budget. The United States is nearly alone in following this policy. In most other countries national statistical offices are expected to pay their own way by charging users for their data. At the same time these offices are pressured to make their information available electronically, on the Internet or through other means, so that as many people as possible have access to it.

Databases in the Future

The integration of demographic databases through advanced computing may eventually give researchers the capability of one-stop shopping, with databases linked not only nationally but internationally. The United Nations and other international organizations already produce and publish some databases of international demographic statistics, both in print and on the Internet. In the future researchers may be able to retrieve data on characteristics such as race, education, and income from a single international file, with comprehensive metadata documenting the different definitions used by each country.

