Data Sources and Collection Methods


Health data are the facts that, when assembled and analyzed, yield the information required by health care planners, providers, and users in order to maintain effective and efficient public health services. Potential sources of information about health are numerous and diverse, but in practice four main sources are used: medical records, certificates of vital and other health-related events, responses in surveys, and facts obtained in the course of conducting research. An interesting fifth source, unobtrusive data, is also considered here.


Even the simplest medical records contain something in each of the following categories:

  1. Personal identifying data: name, age (birth date), sex, and so on.
  2. Socio-demographic data: sex, age, occupation, place of residence.
  3. Clinical data: medical history, investigations, diagnoses, treatment regimens.
  4. Administrative data: referrals, sites of care.
  5. Economic data: insurance coverage, method of payment.
  6. Behavioral data: adherence to the recommended regimen (or otherwise).

In modern clinics and hospitals, and in many public health departments, data in each of these categories can be found in the records of individuals who have received services there, but not all the data are in the same file. Administrative and economic data are usually kept in separate files from clinical data; the files are linked by personal identifying information. Behavioral information, such as the fact that an individual did not obtain prescribed medication or failed to keep appointments, can be extracted by linking facts in a clinical record with the records of medications dispensed and/or appointments kept. Records in hospitals and clinics are mostly computer-processed and stored, so it is technically feasible to extract and analyze the relevant information, for instance, occupation, diagnosis, and method of payment for the service that was provided, or behavioral information. Such analyses are often conducted for routine or research purposes, subject to ethical constraints that protect the privacy and preserve the confidentiality of individuals.
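The linkage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`patient_id`, `drug`), the records themselves, and the helper `unfilled_prescriptions` are all hypothetical, and real record systems use far richer identifiers and stricter privacy safeguards.

```python
# Hypothetical records; all field names and values are invented for illustration.
prescribed = [
    {"patient_id": "P001", "drug": "metformin"},
    {"patient_id": "P002", "drug": "lisinopril"},
    {"patient_id": "P003", "drug": "atorvastatin"},
]
dispensed = [
    {"patient_id": "P001", "drug": "metformin"},
    {"patient_id": "P003", "drug": "atorvastatin"},
]


def unfilled_prescriptions(prescribed, dispensed):
    """Return prescriptions that have no matching dispensing record."""
    # Build a lookup of (patient, drug) pairs that were actually dispensed.
    filled = {(d["patient_id"], d["drug"]) for d in dispensed}
    # A prescription absent from that lookup suggests non-adherence.
    return [p for p in prescribed if (p["patient_id"], p["drug"]) not in filled]


missing = unfilled_prescriptions(prescribed, dispensed)
print(missing)
```

The same pattern, a shared identifier used to match entries across two files, would apply to linking clinical records with appointment logs.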


Vital records (certifications of births and deaths) are similarly computer-stored and can be analyzed in many ways. Collection of data for birth and death certificates relies on the fact that recording of both births and deaths is a legal obligation, and individuals have powerful reasons, including financial incentives such as collection of insurance benefits, for completing all the formal procedures for certification of these vital events. The paper records that individuals require for various purposes are collected and collated in regional and national offices, such as the U.S. National Center for Health Statistics, and published in monthly bulletins and annual reports. Birth certificates record details such as full name, birthdate, names and ages of parents, birthplace, and birthweight. These items of information can be used to construct a unique sequence of numbers and letters that identifies each individual with a high degree of precision. Death certificates contain a great deal of valuable information: name at birth as well as at death, age, sex, place of birth as well as death, and cause of death. The personal identifying information can be used to link the death certificate to other health records. The reliability of death certificate data varies according to the cause and place of death. Deaths in hospitals have usually been preceded by a sufficient opportunity for investigations to yield a reliable diagnosis. Deaths at home may be associated with illnesses that have not been investigated, so the certificate may rest only on patchy and incomplete old medical records or on the family doctor's working diagnosis, which may be no more than an educated guess. Deaths in other places, such as on the street or at work, are usually investigated by a coroner or medical examiner, so the information is reasonably reliable.
Other vital records, for example, those of marriages, divorces, and dissolutions of marriage, have less direct utility for health purposes but do shed some light on aspects of social health.
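As a rough illustration of how birth certificate items might be combined into an identifying sequence of letters and numbers, the sketch below builds a simple linkage key. The scheme itself (four letters of the surname, the initial of the given name, the birthdate, and three letters of the birthplace) is an assumption made for this example, as are the function names `normalize` and `linkage_key`; it is not the method used by any particular vital statistics office.

```python
import unicodedata
from datetime import date


def normalize(text):
    """Uppercase the text and drop accents, spaces, and punctuation."""
    decomposed = unicodedata.normalize("NFKD", text)
    letters = [ch for ch in decomposed if ch.isascii() and ch.isalpha()]
    return "".join(letters).upper()


def linkage_key(surname, given_name, birthdate, birthplace):
    """Combine identifying items into one key (hypothetical scheme)."""
    return "-".join([
        normalize(surname)[:4].ljust(4, "X"),     # pad short surnames
        normalize(given_name)[:1] or "X",         # initial of given name
        birthdate.strftime("%Y%m%d"),             # birthdate as YYYYMMDD
        normalize(birthplace)[:3].ljust(3, "X"),  # pad short place names
    ])


key = linkage_key("O'Brien", "José", date(1950, 3, 7), "Ottawa")
print(key)  # OBRI-J-19500307-OTT
```

Normalizing spelling before building the key means that variants such as "OBRIEN" and "O'Brien" produce the same key, which matters when a death certificate is later matched against other health records. A key this short would still produce occasional collisions in a large population, so it should be read as a sketch of the idea rather than a workable identifier.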


Unlike births and deaths, which are recorded for the whole population, health surveys cover only a sample of the people; but if the sample is statistically representative, the findings can be generalized with some confidence. Survey data may be collected by asking questions either in a face-to-face interview or over the telephone, or by giving the respondents a written questionnaire and collecting their answers. The survey data are collated, checked, edited for consistency, processed, and analyzed, generally by means of a packaged computer program. A very wide variety of data can be collected this way, covering details such as past medical events, personal habits, family history, occupation, income, social status, family and other support networks, and so on. In the U.S. National Health and Nutrition Surveys, physical examinations, such as blood pressure measurement, and laboratory tests, such as blood chemistry and counts, are carried out on a subsample.
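To make "generalized with some confidence" concrete, the following sketch computes an approximate 95 percent confidence interval for a proportion estimated from a survey. It assumes a simple random sample and uses the normal approximation; the figures (230 smokers among 1,000 respondents) are invented for illustration, and real surveys with stratified or clustered designs need more elaborate methods.

```python
import math


def proportion_confidence_interval(successes, sample_size, z=1.96):
    """Normal-approximation interval for a proportion (z=1.96 gives ~95%)."""
    p = successes / sample_size
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    margin = z * standard_error
    return p - margin, p + margin


# Hypothetical survey result: 230 of 1,000 respondents report smoking.
low, high = proportion_confidence_interval(230, 1000)
print(f"Estimated prevalence 23.0%, 95% CI {low:.1%} to {high:.1%}")
```

With these invented figures the interval runs from roughly 20 to 26 percent, which shows how a sample of modest size can still support a usefully narrow statement about the whole population.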

Records of medical examinations on school children, military recruits, or applicants for employment in many industries are potentially another useful source of data, but these records tend to be scattered over many different sites and it is logistically difficult to collect and collate them centrally.


The data collected in health research are so diverse and complex in depth, range, and scope that they cannot be considered in detail here. Research in fields as diverse as biochemistry, psychology, genetics, and sports physiology has usefully illuminated aspects of population health, but the problems of central collection and collation and of making valid generalizations reduce the usefulness of most data from health-related research for the purpose of delineating aspects of national health.


Unobtrusive and indirect methods can be a rich source of information from which it is sometimes possible to make important inferences about the health of the population or samples thereof. Economic statistics such as sales of tobacco and alcohol reveal national consumption patterns; counting cigarette butts in school playgrounds under controlled conditions is an unobtrusive way to get a very rough measure of cigarette consumption by school children. Calls to the police to settle domestic disturbances provide a rough measure of the prevalence of family violence. Police reports and insurance claims arising from traffic crashes reveal much about aspects of risk-taking behavior, for example, the dangerous practice of using cell phones while driving. These are among many examples of unobtrusive data sources, offered merely to illustrate the potential value of this approach.

John M. Last

(see also: Birth Certificates; Certification of Causes of Death; National Health Surveys; Record Linkage; Registries; Vital Statistics)

