Surveys

Updated May 21, 2018

SURVEYS

Most surveys have several common characteristics (Fowler). Their purpose is to generate information that statistically summarizes issues of interest in the study population. This information is collected by asking people (respondents) questions, either in person or over the telephone. In most cases, a sampling strategy is used so that only a selected fraction of the population is actually interviewed. Those interviews are highly structured and standardized, such that each respondent is asked the same questions in the same way and is provided a predetermined set of response categories.

Explaining survey mechanics is beyond the scope of this entry. Instead, we suggest two survey methods books geared toward nonmethodologists. Aday approaches designing good surveys by building on a reporter’s stock questions: who do you want to study, what do you want to know about them, where will the data be collected, when do you want to do the field work, why is this information needed, and how will the questions be asked. Fowler pragmatically focuses on enhancing the quality of collected data by identifying the best practices for question design, interviewing procedures and skills, and achieving high response rates. Two user-friendly electronic resources are also recommended. One is a methodology reference tool (Trochim), and the other is a statistics reference tool (Statsoft).

Using good survey design and best practices helps to minimize survey error. Survey errors are deviations of the observed findings from their ‘‘true’’ values (Groves), and come in two categories. Sampling errors result from the fact that when a sample is drawn, there is a chance that it may not be representative of the population from which it is taken. If probability-sampling methods are used, sampling errors can be calculated, and confidence intervals can be established. Confidence intervals are often expressed using statements like ‘‘these results have a margin of error of +/- 5 percent.’’ The best way to reduce sampling error in a probability sample usually involves increasing sample size. When nonprobability sampling methods are used, such as convenience samples of people approached on street corners or in shopping malls, it is not possible to determine the accuracy of the findings, or to know what broader population the sample represents.
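
The relationship between sample size and margin of error can be sketched numerically. The example below is a minimal illustration, not a method prescribed by the sources cited here. It assumes a simple random sample, a 95 percent confidence level (z = 1.96), and the normal approximation for a proportion; the function names are hypothetical.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of an approximate confidence interval for a proportion.

    Assumes a simple random sample and the normal approximation;
    z = 1.96 corresponds to a 95 percent confidence level.
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

def sample_size_for_margin(margin, p_hat=0.5, z=1.96):
    """Smallest sample size whose margin of error does not exceed `margin`.

    p_hat = 0.5 is the most conservative (widest-interval) assumption.
    """
    return math.ceil(z ** 2 * p_hat * (1.0 - p_hat) / margin ** 2)

# Quadrupling the sample size halves the margin of error.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(0.5, n), 4))

# Respondents needed for a margin of error of +/- 5 percentage points.
print(sample_size_for_margin(0.05))
```

Because the margin shrinks only with the square root of the sample size, each further gain in precision requires a disproportionately larger sample. Complex designs (clustering, stratification, weighting) would need design-specific formulas that this sketch does not cover.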

The second category of survey errors involves nonsampling errors. Three sources contribute to this problem: the interviewer, the questions, and the respondent (Aday). Good interviewers can increase response rates (the percent of people who participate), minimize the number of questions that are not answered (missing data), and increase the consistency of the measurement process (reliability). Well-designed and crafted questions are easy for interviewers to ask and for respondents to answer. Such questions are brief, use simple and familiar words that do not have multiple meanings, avoid technical jargon, use the active voice and good grammar, and do not involve compound sentences or double negatives.

When using survey methods with older adults, one wonders whether there will be more nonsampling errors than usual. Older adults are more likely to experience health and cognitive problems than younger adults, and these may prevent older adults from participating or diminish the quality of the information that they provide (Herzog and Rodgers). Although more research is needed, the literature has generally not identified disproportionately larger nonsampling errors among older adults. For the oldest old and the least healthy, however, there is some evidence that both willingness to participate and the quality of the information provided may be compromised (Herzog and Rodgers). Therefore, when designing surveys for those subgroups, special emphasis on minimizing respondent burden and providing greater flexibility is warranted.

Cross-sectional versus longitudinal surveys

An important distinction among surveys involves cross-sectional versus longitudinal studies. Cross-sectional surveys are exemplified by public opinion polls conducted during election campaigns. These polls typically use random digit dialing telephone surveys of a sample of the voting age population, and their purpose is to gauge voting preferences. The findings can be used to see whether voting preferences vary across age groups. For example, approval ratings can be examined within age decades, and one might find that the older the age group, the more likely that conservative candidates were preferred over liberal ones.

Such age-group comparisons represent inter-individual (or between individual) differences associated with age at a single point in time. They cannot be interpreted as intra-individual (i.e., aging or within individual) effects, such that as people grow older they become more likely to support conservative candidates. That would reflect the life course fallacy, in which cross-sectional age differences are attributed solely to the aging process (Riley). In fact, age-group comparisons involve both aging and cohort effects. Cohort effects reflect the fact that older individuals have not simply aged more than younger individuals; they also went through their formative years, as well as other life course stages, during different historical periods. Consider the case where the number of years of formal education is compared across age groups. The results likely will show that each successively older age group has achieved less education. Surely this does not reflect the aging process, because that would mean we lose years of education as we age. Rather, such results reflect cohort succession, or the process by which educational aspirations and opportunities have steadily increased with each new generation.

Longitudinal studies are necessary to avoid the life course fallacy. In longitudinal studies the same sample is followed over time. Typically this involves interviewing the same respondents every year or so. Using these data one can examine intra-individual effects. Longitudinal data would likely show that the cross-sectional age-group differences in educational attainment reflected cohort succession rather than the aging process. That is, we would see that as birth cohorts age, their educational attainment levels remain largely unchanged. The drawback to longitudinal studies lies in their opportunity and tracking costs. These include identifying and obtaining baseline data on an appropriate birth cohort, and then continuing to track those individuals over time.
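
The education example can be made concrete with a toy model. The sketch below assumes, purely for illustration, that completed schooling is fixed by birth cohort and rises by one year per decade of birth; the numbers and function names are invented and do not come from any survey.

```python
def education_years(birth_year):
    """Hypothetical rule: schooling depends only on birth cohort, never on age."""
    return 8 + (birth_year - 1900) / 10

def cross_section(survey_year, ages):
    """One survey year, several age groups (inter-individual comparison)."""
    return {age: education_years(survey_year - age) for age in ages}

def follow_cohort(birth_year, survey_years):
    """One birth cohort observed repeatedly (intra-individual comparison)."""
    return {year: education_years(birth_year) for year in survey_years}

# Cross-sectional view in 2000: older age groups report less schooling.
print(cross_section(2000, [30, 50, 70]))

# Longitudinal view of the 1930 cohort: schooling does not fall with age.
print(follow_cohort(1930, [1980, 1990, 2000]))
```

Read alone, the cross-sectional output invites the life course fallacy, because education appears to decline with age. The longitudinal output for a single cohort shows no change, so the age-group gradient in this toy model is entirely a product of cohort succession.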

Even with longitudinal data there may be another problem. If the sample of persons is restricted to the members of a single birth cohort, then the results would be subject to the cohort centrism fallacy. The cohort centrism fallacy is that just because we observe changes of a certain type in one birth cohort over time does not mean that similar changes will occur in other birth cohorts (Riley). No two birth cohorts experience the same life course stages during the same historical periods. Thus, some birth cohorts have their lives shaped by a remarkably unique set of experiences, such as ‘‘the greatest generation’’ (Brokaw).

The best way to avoid both the life course and cohort centrism fallacies is to have comparable longitudinal data on several successive cohorts. This can be done two ways (Campbell). One involves designing longitudinal studies to include samples from several birth cohorts, and to follow those cohorts over a prolonged period. A more pragmatic approach involves using available data from several different birth-cohort-specific longitudinal studies that are now available courtesy of the Inter-university Consortium for Political and Social Research (ICPSR; for a complete listing see their website at www.icpsr.umich.edu).

Limitations of survey research and problems with interpretations

Surveys obtain information by asking people questions. Those questions are designed to measure some topic of interest. We want those measurements to be as reliable and valid as possible, in order to have confidence in the findings and in our ability to generalize beyond the current sample and setting (i.e., external validity). Reliability refers to the extent to which questions evoke reproducible or consistent answers from the respondent (i.e., random measurement error is minimized). Validity refers to the extent to which the questions are actually getting at what we want them to measure (i.e., nonrandom measurement error is minimized). The relationship between reliability and validity can be intuitively seen using the metaphor of a target containing a series of concentric rings extending from the ‘‘bull’s-eye’’ (Trochim). A reliable and valid measure would look like a tightly clustered group of shots all in the bull’s-eye; a reliable but invalid measure would look like a tightly clustered group of shots at the target periphery; a valid but unreliable measure would look like a scattering of shots all over the target; and an unreliable and invalid measure would look like a scattering of shots across only one side of the target.
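
The target metaphor can be restated in terms of the two kinds of measurement error. The simulation below is a rough sketch under stated assumptions: each observed score is modeled as the true value plus a constant bias (nonrandom error, which affects validity) plus normally distributed noise (random error, which affects reliability). The parameter values and function names are hypothetical.

```python
import random
import statistics

def simulate_measure(true_value, bias, noise_sd, n=2000, seed=0):
    """Observed score = true value + systematic bias + random noise."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]

def summarize(scores, true_value):
    """Return (average miss, spread) for a set of observed scores."""
    systematic_error = statistics.mean(scores) - true_value
    spread = statistics.stdev(scores)
    return systematic_error, spread

TRUE_VALUE = 50.0
cases = {
    "reliable and valid": (0.0, 1.0),
    "reliable but invalid": (10.0, 1.0),
    "valid but unreliable": (0.0, 10.0),
    "unreliable and invalid": (10.0, 10.0),
}
for label, (bias, noise_sd) in cases.items():
    miss, spread = summarize(simulate_measure(TRUE_VALUE, bias, noise_sd), TRUE_VALUE)
    print(f"{label:24s} average miss = {miss:6.2f}   spread = {spread:5.2f}")
```

A small spread corresponds to tightly clustered shots, and an average miss near zero corresponds to shots centered on the bull's-eye. The two properties vary independently in this model, which is why a measure can be reliable without being valid.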

At the root of these measurement issues is how the survey questions are asked. Careful crafting of survey questions is essential, and even slight variations in wording can produce rather different results. Consider one of the most commonly studied issues in aging: activities of daily living (ADLs). ADLs refer to the basic tasks of everyday life such as eating, dressing, bathing, and toileting. ADL questions are presented in a staged fashion, asking first whether the respondent has any difficulty in performing the task alone and without the use of aids. If any difficulty is reported, the respondent is then asked how much difficulty he or she experiences, whether any help is provided by another person or by an assisting device, how much help is received or how often the assisting device is used, and who that person is or what that device is.

Surprisingly, prevalence estimates of the number of older adults who have ADL difficulties vary by as much as 60 percent from one national study to another. In addition to variations in sampling design, Wiener, Hanley, Clark, and Van Nostrand report that differences in the prevalence estimates result from the selection of which specific ADLs the respondents are asked about, how long the respondent had to have the ADL difficulty before it counts, how much difficulty the respondent had to have, and whether the respondent had to receive help to perform the ADL. Using results from a single study in which different versions of ADL questions were asked of the same respondents, Rodgers and Miller (1997) have shown that the prevalence rate can range from a low of 6 percent to a high of 28 percent. With those same data, Freedman has found that the prevalence of one or more ADL difficulties varies from 17 percent to nearly 30 percent depending on whether the approach reflects residual difficulty (i.e., even with help or the use of an assisting device) or underlying difficulty (i.e., without help or using an assisting device).

A related concern is the correspondence between self-reported ADL abilities and actual performance levels. Although there are obvious drawbacks to direct observation of ADLs (including privacy), performance-based assessments of lower and upper body physical abilities can be conducted in personal interviews. Examples for the upper body include assessing grip strength using hand-held dynamometers, the ability to hold a one-gallon water jug at arm’s length, and the ability to pick up and replace pegs in a pegboard, while examples for the lower body include measured and timed walks, standing balance tests, and repeated chair stands. Simonsick and colleagues have shown that carefully crafted questions eliciting self-reports of lower- and upper-body physical abilities are generally consistent with performance-based assessments on the same respondents.

Even when reliable and valid questions are asked, there can still be serious problems due to missing data. Missing data comes in three varieties: people who refuse to participate (the issue of response rates), questions that are left unanswered (the issue of item missing values), and (in longitudinal studies) respondents who are lost to follow-up (the issue of attrition). The problem is that missing data results in (1) biased findings if the people for whom data is missing are systematically different, (2) inefficient statistical estimates due to the loss of information, and (3) increased analytic complexity because most statistical procedures require that each case has complete data (Little and Schenker). Methods to deal with missing data include naive approaches like unconditional mean imputation (i.e., substituting the overall sample mean), and sophisticated methods like expectation-maximization algorithms or multiple imputation procedures. The utility of these methods depends on whether the data is missing completely at random, or if it reflects a nonignorable pattern. The latter requires use of the more sophisticated approaches.
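
The weakness of unconditional mean imputation is easy to see on a small invented data set. In the sketch below, missing answers are represented by None, and the helper name is hypothetical; it illustrates the naive approach only and is not an implementation of expectation-maximization or multiple imputation.

```python
import statistics

def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

# Invented item responses; two respondents left the question unanswered.
responses = [2.0, 4.0, None, 8.0, None, 6.0]
observed = [v for v in responses if v is not None]
imputed = mean_impute(responses)

print(imputed)
print("variance of observed answers:", round(statistics.variance(observed), 3))
print("variance after imputation:   ", round(statistics.variance(imputed), 3))
```

Every imputed case sits exactly at the mean, so the completed data show less variability than the observed answers do. Standard errors computed from such data tend to be too small, and the approach does nothing to correct bias when the nonrespondents differ systematically from the respondents.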

The most important limitation of surveys has to do with internal validity, or the establishment of causal relationships between an independent variable (the cause, denoted by X) and a dependent variable (the effect, denoted by Y). There are three fundamental criteria for demonstrating that X is a probabilistic cause of Y (Suppes): (1) the probability of Y given that X has occurred must be greater than the probability of Y in the absence of X; (2) X must precede Y in time; and, (3) the probability of X must be greater than zero. Implicit in the first criterion is the presence of a comparison group. Several threats to internal validity exist that constitute rival hypotheses for the explanation that X causes Y (Campbell and Stanley). When well designed and administered, the classic two-group experimental design eliminates these because the assignment to either the experimental or control group is randomly determined and both groups are measured before and after the experimental group is exposed to X. Therefore, the potential threats to internal validity are equivalent for both the experimental and control groups, leaving the difference between the before and after comparisons due solely to the experimental group’s exposure to X. Thus, experimental designs meet the criteria for probabilistic causation.
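
The three criteria can be checked mechanically against counts from a two-group comparison. The sketch below is illustrative only: the function name and the counts are hypothetical, and the time-order criterion is passed in as a flag because it has to be established by design or by argument rather than read off a table of counts.

```python
def check_probabilistic_cause(x_and_y, x_total, notx_and_y, notx_total, x_precedes_y):
    """Evaluate the three criteria for X as a probabilistic cause of Y.

    x_and_y, x_total       : number with outcome Y, and group size, among those with X
    notx_and_y, notx_total : the same counts for the comparison group without X
    x_precedes_y           : supplied by the analyst; counts cannot show time order
    """
    if notx_total == 0:
        raise ValueError("a comparison group without X is required")
    p_x = x_total / (x_total + notx_total)
    x_occurs = p_x > 0                                   # criterion 3: P(X) > 0
    p_y_given_notx = notx_and_y / notx_total
    # criterion 1: P(Y | X) > P(Y | not X); undefined, so False, if X never occurs
    raises_probability = x_occurs and (x_and_y / x_total) > p_y_given_notx
    return {
        "raises_probability": raises_probability,
        "x_precedes_y": bool(x_precedes_y),              # criterion 2
        "x_occurs": x_occurs,
    }

# Invented counts: 30 of 100 with X show Y, versus 10 of 100 without X.
print(check_probabilistic_cause(30, 100, 10, 100, x_precedes_y=True))
```

Meeting all three criteria is necessary but does not by itself dispose of rival hypotheses. The comparison is only as informative as the equivalence of the two groups being compared.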

In survey research, however, this is not the case because assignment to the experimental versus control group has not been randomized and the time sequence has not been manipulated. Therefore, the survey researcher must make the case that the causes are antecedent to the consequences, and that the groups being compared were otherwise equivalent. The former is often only addressable by logic, and the latter is only addressable by matching the groups being compared on known risk factors, or by statistically adjusting for known risk factors. In contrast, well-performed randomization creates equivalence on everything, whether it is known or not. That is why survey-based research traditionally includes numerous covariates in an attempt to resolve the problem of potential confounders. Basically, survey researchers must rule out all competing explanations of the observed relationship between X and Y in order to suggest (but not demonstrate) that a causal relationship exists.
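
Statistical adjustment for a known risk factor can be illustrated by stratification. The sketch below uses invented counts in which an older and a younger stratum differ both in exposure to X and in baseline risk of Y. The helper names are hypothetical, and weighting strata by their size is only one of several possible adjustment schemes.

```python
# Each stratum: (events among exposed, exposed n, events among unexposed, unexposed n)
STRATA = {
    "older":   (40, 80, 10, 20),   # mostly exposed, high baseline risk
    "younger": (2, 20, 8, 80),     # mostly unexposed, low baseline risk
}

def crude_risk_difference(strata):
    """Risk difference that ignores the stratifying factor."""
    ex_events = sum(s[0] for s in strata.values())
    ex_n = sum(s[1] for s in strata.values())
    un_events = sum(s[2] for s in strata.values())
    un_n = sum(s[3] for s in strata.values())
    return ex_events / ex_n - un_events / un_n

def adjusted_risk_difference(strata):
    """Average of within-stratum risk differences, weighted by stratum size."""
    weighted_sum = 0.0
    total_n = 0
    for ex_events, ex_n, un_events, un_n in strata.values():
        difference = ex_events / ex_n - un_events / un_n
        stratum_n = ex_n + un_n
        weighted_sum += stratum_n * difference
        total_n += stratum_n
    return weighted_sum / total_n

print("crude difference:   ", round(crude_risk_difference(STRATA), 2))
print("adjusted difference:", round(adjusted_risk_difference(STRATA), 2))
```

In these invented data the crude comparison suggests that X raises the risk of Y, yet within each age stratum the exposed and unexposed groups have identical risk. The apparent effect is produced entirely by the confounder. Adjustment of this kind works only for factors that were measured, which is the contrast with randomization drawn above.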

Given the limitations of surveys that have been mentioned in this entry, one might ask why surveys are conducted at all. There are several important reasons. Surveys gather data about relationships between people, places, and things as they exist in real-world settings. Those relationships cannot all be examined in laboratory experiments. Moreover, surveys allow the collection of data about what people think and feel, and facilitate the collection of information in great breadth and depth. Surveys are also very cost-efficient. Finally, surveys are an excellent precursor for planning and designing experimental studies. Thus, despite their limitations, surveys are and will continue to be a major source of high-quality information with which to explore the aging process.

Major recent surveys

We now briefly turn to recent, major surveys with which analyses of the aging process are conducted. Due to space constraints, we can neither identify all nor describe in detail any of these surveys. Therefore, we have simply selected nine recent and widely used large-scale surveys that are publicly available from or through the ICPSR. These surveys include the General Social Survey (GSS), the third National Health and Nutrition Examination Survey (NHANES III), the Longitudinal Studies on Aging (LSOA I and II), the Australian (Adelaide) Longitudinal Study of Aging, the Established Populations for the Epidemiologic Study of the Elderly (EPESE), the Hispanic EPESE, the National Long-Term Care Survey (NLTCS), and the now combined Health and Retirement Study (HRS) and Survey on Assets and Health Dynamics of the Oldest-Old (AHEAD). Table 1 provides a thumbnail sketch of each of these surveys in terms of their eligibility rules, birth cohorts, observation windows, interview frequency, sample size, major topical foci, and the availability of linked administrative records. Further details on these and nearly eight thousand other surveys can be found at the ICPSR Internet site.

Frederic D. Wolinsky; Douglas K. Miller

See also Cohort Change; Panel Studies; Qualitative Research.

BIBLIOGRAPHY

Aday, L. A. Designing and Conducting Health Surveys: A Comprehensive Guide, 2d ed. San Francisco: Jossey-Bass, 1996.

Brokaw, T. The Greatest Generation. New York: Random House, 1998.

Campbell, D. T., and Stanley, J. C. Experimental and Quasi-Experimental Designs for Research. Chicago: Rand McNally, 1963.

Campbell, R. T. ‘‘A Data-Based Revolution in the Social Sciences.’’ ICPSR Bulletin 14, no. 3 (1994): 1–4.

Fowler, F. J., Jr. Survey Research Methods, 2d ed. Newbury Park, Calif.: Sage, 1993.

Freedman, V. A. ‘‘Implications of Asking ‘Ambiguous’ Difficulty Questions: An Analysis of the Second Wave of the Assets and Health Dynamics of the Oldest Old Study.’’ Journal of Gerontology: Social Sciences 55B (2000): S288–S297.

Groves, R. M. Survey Errors and Survey Costs. New York: John Wiley, 1989.

Herzog, A. R., and Rodgers, W. L. ‘‘The Use of Survey Methods in Research on Older Americans.’’ In The Epidemiologic Study of the Elderly. Edited by R. B. Wallace and R. F. Woolson. New York: Oxford University Press, 1992. Pages 60–90.

Little, R. J. A., and Schenker, N. ‘‘Missing Data.’’ In Handbook of Statistical Modeling for the Social and Behavioral Sciences. Edited by G. Arminger, C. C. Clogg, and M. E. Sobel. New York: Plenum Press, 1995. Pages 39–75.

Riley, M. W. ‘‘A Theoretical Basis for Research on Health.’’ In Population Health Research. Edited by K. Dean. London, England: Sage Publications, 1993. Pages 37–53.

Rodgers, W., and Miller, B. ‘‘A Comparative Analysis of ADL Questions in Surveys of Older People.’’ Journals of Gerontology 52B (1997 Special Issue): 21–36.

Simonsick, E. M.; Kasper, J. D.; Guralnik, J. M.; Bandeen-Roche, K.; Ferrucci, L.; Hirsch, R.; Leveille, S.; Rantanen, T.; and Fried, L. P. ‘‘Severity of Upper and Lower Extremity Functional Limitation: Scale Development and Validation with Self-Report and Performance-Based Measures of Physical Function.’’ Journal of Gerontology: Social Sciences 56B (2001): S10–S19.

Statsoft, Inc. The Electronic Statistics Textbook. Tulsa, Okla.: Statsoft, 1999. Available on the World Wide Web at www.statsoft.com

Suppes, P. Models and Methods in the Philosophy of Science: Selected Essays. Dordrecht, Netherlands: Kluwer Academic Publishers, 1993.

Trochim, W. M. The Research Methods Knowledge Base, 1st ed. 1999. Available on the World Wide Web at http://trochim.human.cornell.edu

Wiener, J. M.; Hanley, R. J.; Clark, R.; and Van Nostrand, J. F. ‘‘Measuring the Activities of Daily Living: Comparisons Across National Surveys.’’ Journals of Gerontology 45 (1990): S229–S237.

Encyclopedia of Aging Wolinsky, Frederic D.; Miller, Douglas K.

Surveys

Updated Jun 11, 2018

SURVEYS

The word "survey" derives, by way of Old French, from the Latin super (over) and videre (to see), and it eventually came to mean a general or comprehensive view of anything. Studies that involve the systematic collection of data about populations are usually called surveys. This is especially true when they are concerned with large or widely dispersed groups of people. When they deal with only a fraction of a total population, a fraction representative of the total, they are called sample surveys. The term "sample survey" should ideally be used only if the part of the population studied is selected by accepted statistical methods.

Surveys can be classified broadly into two types—descriptive and analytical. In a descriptive survey the objective is simply to obtain certain information about large groups. In an analytical survey, comparisons are made between different subgroups of the population in order to discover whether differences exist among them that may enable researchers to form or verify hypotheses about the forces at work in the population.

Surveys differ in terms of purpose, subject matter, coverage, and source of information. In the field of epidemiology, surveys have been used to study the history of the health of populations, diagnose community health, study the working of health services, complete the clinical history of chronic diseases, search for the causes of health and disease, contribute to the formation of health care policy, and evaluate the effects of different approaches to the organization of health services. More recently, health-survey data have been identified as a key resource for the development of health indicators, such as alcohol consumption and the prevalence of smoking, in the twenty-first century. The Health for All initiative of the World Health Organization is a policy that can be translated into three operational goals: increase in life expectancy and sustainable life; improved equity in health between and within countries; and access for all to sustainable health systems. Efforts have been made to promote standards for international comparability of such health indicators.

Wayne Millar

(see also: National Health Surveys; Sampling; Survey Research Methods)