Evaluation Research

Ours is an age of social-action programs, where large organization and huge expenditures go into the attempted solution of every conceivable social problem. Such programs include both private and public ventures and small-scale and large-scale projects, ranging in scope from local to national and international efforts at social change. Whenever men spend time, money, and effort to help solve social problems, someone usually questions the effectiveness of their actions. Sponsors, critics, the public, even the actors themselves, seek signs that their program is successful. Much of the assessment of action programs is irregular and, often by necessity, based upon personal judgments of supporters or critics, impressions, anecdotes, testimonials, and miscellaneous information available for the evaluation. In recent years, however, there has been a striking change in attitudes toward evaluation activities and the type and quality of evidence that is acceptable for determining the relative success or failure of social-action programs.

Two trends stand out in the modern attitude toward evaluation. First, evaluation has come to be expected as a regular accompaniment to rational social-action programs. Second, there has been a movement toward demanding more systematic, rigorous, and objective evidence of success. The application of social science techniques to the appraisal of social-action programs has come to be called evaluation research.

Examples of the applications of evaluation research are available from a wide variety of fields. One of the earliest attempts at building evaluation research into an action program was in the field of community action to prevent juvenile delinquency. The 1937 Cambridge-Somerville Youth Study provided for an experimental and a control group of boys, with the former to receive special attention and advice from counselors and other community agencies. The plan called for a ten-year period of work with the experimental group followed by an evaluation that would compare the record of their delinquent conduct during that decade with the record of the control group. The results of the evaluation (see Powers & Witmer 1951) showed no significant differences in conduct favorable to the program. A subsequent long-term evaluation of the same program failed to find new evidence of less criminal activity by persons in the experimental group but added a variety of new theoretical analyses to the evaluation (McCord et al. 1959).

Several evaluations of programs in citizenship training for young persons have built upon one another, thus providing continuity in the field. Riecken (1952) conducted an evaluation of summer work camps sponsored by the American Friends Service Committee to determine their impact on the values, attitudes, and opinions of the participants. His work was useful in specifying those areas in which the program was successful or unsuccessful as well as pointing up the importance of measuring unsought by-products of action programs. Subsequently, Hyman, Wright, and Hopkins carried out a series of evaluations of another youth program, the Encampment for Citizenship (1962). Their research design was complex, including a comparison of campers’ values, attitudes, opinions, and behavior before and after a six-week program of training; follow-up surveys six weeks and four years after the group left the program; three independent replications of the original study on new groups of campers in later years; and a sample survey of alumni of the program. These various studies demonstrated the effectiveness of the program in influencing campers’ social attitudes and conduct; they also examined the dynamics of attitudinal change.

Evaluations have been made in such varied fields as intergroup relations, induced technological change, mass communications, adult education, international exchange of persons for training or good will, mental health, and public health. Additional examples of applications of evaluation research, along with discussions of evaluation techniques, are presented by Klineberg and others in a special issue of the International Social Science Bulletin (1955) and in Hyman and Wright (1966).

Defining characteristics

A scientific approach to the assessment of a program’s achievements is the hallmark of modern evaluation research. In this respect evaluation research resembles other kinds of social research in its concern for objectivity, reliability, and validity in the collection, analysis, and interpretation of data. But it can be distinguished as a special form of social research by its purpose and the conditions under which the research must be conducted. Both of these factors affect such components of the research process as study design and its translation into practice, allocation of research time and other resources, and the value or worth to be put upon the empirical findings.

The primary purpose of evaluation research is “to provide objective, systematic, and comprehensive evidence on the degree to which the program achieves its intended objectives plus the degree to which it produces other unanticipated consequences, which when recognized would also be regarded as relevant to the agency” (Hyman et al. 1962, pp. 5–6). Evaluation research thus differs in its emphasis from such other major types of social research as exploratory studies, which seek to formulate new problems and hypotheses, or explanatory research, which places emphasis on the testing of theoretically significant hypotheses, or descriptive social research, which documents the existence of certain social conditions at a given moment or over time (Selltiz et al. 1959). Since the burden is on the evaluator to provide firm evidence on the effects of the program under study, he favors a study design that will tend toward maximizing such evidence and his confidence in conclusions drawn from it. Although good evaluation research often seeks explanations of a program’s success or failure, the first concern is to obtain basic evidence on effectiveness, and therefore most research resources are allocated to this goal.

The conditions under which evaluation research is conducted also give it a character distinct from other forms of social research. Evaluation research is applied social research, and it differs from other modes of scholarly research in bringing together an outside investigator to guarantee objectivity and a client in need of his services. From the initial formulation of the problem to the final interpretation of findings, the evaluator is duty-bound to keep in mind the very practical problem of assessing the program under study. As a consequence he often has less freedom to select or reject certain independent, dependent, and intervening variables than he would have in studies designed to answer his own theoretically formulated questions, such as might be posed in basic social research. The concepts employed and their translation into measurable variables must be selected imaginatively but within the general framework set by the nature of the program being evaluated and its objectives (a point which will be discussed later). Another feature of evaluation research is that the investigator seldom has freedom to manipulate the program and its components, i.e., the independent variable, as he might in laboratory or field experiments. Usually he wants to evaluate an ongoing or proposed program of social action in its natural setting and is not at liberty, because of practical and theoretical considerations, to change it for research purposes. The nature of the program being evaluated and the time at which his services are called upon also set conditions that affect, among other things, the feasibility of using an experimental design involving before-and-after measurements, the possibility of obtaining control groups, the kinds of research instruments that can be used, and the need to provide for measures of long-term as well as immediate effects.

Two developments have stimulated an interest in methodological aspects of evaluation among a variety of social scientists, especially sociologists and psychologists. One is the recent tendency to call upon social science for the evaluation of action programs that are local, national, and international in scope, a trend which probably will increase in future years. The other is the fact that the application of scientific research procedures to problems of evaluation is complicated by the purposes and conditions of evaluation research. Methodological and technical problems in evaluation research are discussed, to mention but a few examples, in the writings of Riecken (1952), Klineberg (1955), Hyman et al. (1962), and Hayes (1959).

While it is apparent that the specific translation of social-science techniques into forms suitable for a particular evaluation study involves research decisions based upon the special nature of the program under examination, there are nonetheless certain broad methodological questions common to most evaluation research. Furthermore, certain principles of evaluation research can be extracted from the rapidly growing experience of social scientists in applying their perspectives and methods to the evaluation of social-action programs. Such principles have obvious importance in highlighting and clarifying the methodological features of evaluation research and in providing practical, if limited, guidelines for conducting or appraising such research. The balance of this article will discuss certain, but by no means all, of these compelling methodological problems.

Methodological steps and principles

The process of evaluation has been codified into five major phases, each involving particular methodological problems and guiding principles (see Hyman et al. 1962). They are (1) the conceptualization and measurement of the objectives of the program and other unanticipated relevant outcomes; (2) formulation of a research design and the criteria for proof of effectiveness of the program, including consideration of control groups or alternatives to them; (3) the development and application of research procedures, including provisions for the estimation or reduction of errors in measurement; (4) problems of index construction and the proper evaluation of effectiveness; and (5) procedures for understanding and explaining the findings on effectiveness or ineffectiveness. Such a division of the process of evaluation is artificial, of course, in the sense that in practice the phases overlap and it is necessary for the researcher to give more or less constant consideration to all five steps. Nevertheless it provides a useful framework for examining and understanding the essential components of evaluation research.

Conceptualization. Each social-action program must be evaluated in terms of its particular goals. Therefore, evaluation research must begin with their identification and move toward their specification in terms of concepts that, in turn, can be translated into measurable indicators. All this may sound simple, perhaps routine, compared with the less structured situation facing social researchers engaged in formulating research problems for theoretical, explanatory, descriptive, or other kinds of basic research. But the apparent simplicity is deceptive, and in practice this phase of evaluation research repeatedly has proven to be both critical and difficult for social researchers working in such varied areas as mental health (U.S. Dept. of Health, Education & Welfare 1955), juvenile delinquency (Witmer & Tufts 1954), adult education (Evaluation Techniques 1955), and youth programs for citizenship training (Riecken 1952; Hyman et al. 1962), among others. As an example, Witmer and Tufts raise such questions about the meaning of the concept “delinquency prevention” as: What is to be prevented? Who is to be deterred? Are we talking only about “official” delinquency? Does prevention mean stopping misbehavior before it occurs? Does it mean reducing the frequency of misbehavior? Or does it mean reducing its severity?

Basic concepts and goals are often elusive, vague, unequal in importance to the program, and sometimes difficult to translate into operational terms. What is meant, for example, by such a goal as preparing young persons for “responsible citizenship”? In addition, the evaluator needs to consider possible effects of the program which were unanticipated by the action agency, finding clues from the records of past reactions to the program if it has been in operation prior to the evaluation, studies of similar programs, the social-science literature, and other sources. As an example, Carlson (1952) found that a mass-information campaign against venereal disease failed to increase public knowledge about these diseases; nevertheless, the campaign had the unanticipated effect of improving the morale of public health workers in the area, who in turn did a more effective job of combating the diseases. The anticipation of both planned and unplanned effects requires considerable time, effort, and imagination by the researcher prior to collecting evidence for the evaluation itself.

Research design. The formulation of a research design for evaluation usually involves an attempt to approximate the ideal conditions of a controlled experiment, which measures the changes produced by a program by making comparisons of the dependent variables before and after the program and evaluating them against similar measurements on a control group that is not involved in the program. If the control group is initially similar to the group exposed to the social-action program, a condition achieved through judicious selection, matching, and randomization, then the researcher can use the changes in the control group as a criterion against which to estimate the degree to which changes in the experimental group were probably caused by the program under study. To illustrate, suppose that two equivalent groups of adults are selected for a study on the effects of a training film intended to impart certain information to the audience. The level of relevant information is measured in each group prior to the showing of the film; then one group sees the film while the other does not; finally, after some interval, information is again measured. Changes in the amount of information held by the experimental group cannot simply be attributed to the film; they may also reflect the influence of such factors in the situation as exposure to other sources of information in the interim period, unreliability of the measuring instruments, maturation, and other factors extraneous to the program itself. But the control group presumably also experienced such nonprogrammatic factors, and therefore the researcher can subtract the amount of change in information demonstrated by it from the changes shown by the experimental group, thereby determining how much of the gross change in the latter group is due to the exclusive influence of the program.
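The subtraction described in the film example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the information scores are hypothetical and are not taken from any study cited here, and the two groups are assumed to have been equivalent at the outset.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of scores."""
    return sum(values) / len(values)


def net_program_effect(exp_before, exp_after, ctl_before, ctl_after):
    """Gross change in the experimental group minus change in the control group.

    The control group's change stands in for nonprogrammatic influences
    (other information sources, unreliable measurement, maturation).
    """
    gross_change = mean(exp_after) - mean(exp_before)
    control_change = mean(ctl_after) - mean(ctl_before)
    return gross_change - control_change


# Hypothetical information-test scores for four persons in each group.
exp_before = [4, 5, 3, 6]   # experimental group, before the film
exp_after = [7, 8, 6, 9]    # experimental group, after the film
ctl_before = [4, 5, 4, 5]   # control group, first measurement
ctl_after = [5, 6, 5, 6]    # control group, second measurement

effect = net_program_effect(exp_before, exp_after, ctl_before, ctl_after)
print(effect)
```

With these invented scores the experimental group gains 3.0 points on the average and the control group 1.0, so 2.0 points of the gross change would be attributed to the film. A real evaluation would also have to ask whether a difference of that size could have arisen by chance.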

So it is in the ideal case, such as might be achieved under laboratory conditions. In practice, however, evaluation research seldom permits such ideal conditions. A variety of practical problems requires alterations in the ideal design. As examples, suitable control groups cannot always be found, especially for social-action programs involving efforts at large-scale social change but also for smaller programs designed to influence volunteer participants; also ethical, administrative, or other considerations usually prevent the random assignment of certain persons to a control group that will be denied the treatment offered by the action programs.

In the face of such obstacles, certain methodologists have taken the position that a slavish insistence on the ideal control-group experimental research design is unwise and dysfunctional in evaluation research. Rather, they advocate the ingenious use of practical and reasonable alternatives to the classic design (see Hyman et al. 1962; and Campbell & Stanley 1963). Under certain conditions, for example, it is possible to estimate the amount of change that could have been caused by extraneous events, instability of measurements, and natural growth of participants in a program by examining the amount of change that occurred among participants in programs similar to the one being evaluated. Using such comparative studies as “quasi-control” groups permits an estimate of the relative effectiveness of the program under study, i.e., how much effect it has had over and above that achieved by another program and assorted extraneous factors, even though it is impossible to isolate the specific amount of change caused by the extraneous factors. Another procedure for estimating the influence of nonprogrammatic factors is to study the amount of change which occurs among a sample of the population under study during a period of time prior to the introduction of the action program, using certain of the ultimate participants as a kind of control upon themselves, so to speak. Replications of the evaluation study, when possible, also provide safeguards against attributing too much or too little effect to the program under study. Admittedly, all such practical alternatives to the controlled experimental design have serious limitations and must be used with judgment; the classic experimental design remains preferable whenever possible and serves as an ideal even when impractical. Nevertheless, such expedients have proven useful to evaluators and have permitted relatively rigorous evaluations to be conducted under conditions less perfect than those found in the laboratory.
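The procedure of using participants as a control upon themselves can be sketched as follows. The sketch rests on an assumption that is not stated in the sources cited: that change from extraneous factors would have continued during the program at the same rate per week as in the period before it. The function name, the scores, and the intervals are hypothetical.

```python
def self_controlled_effect(score_t0, score_t1, score_t2,
                           prior_weeks, program_weeks):
    """Estimate program effect when participants serve as their own control.

    score_t0: mean score at the start of the pre-program period
    score_t1: mean score when the program begins
    score_t2: mean score when the program ends
    Assumes (hypothetically) that nonprogrammatic change accrues at a
    constant rate, so the pre-program change is scaled to the program's length.
    """
    if prior_weeks <= 0 or program_weeks <= 0:
        raise ValueError("intervals must be positive")
    prior_change = score_t1 - score_t0           # change with no program
    expected_drift = prior_change * (program_weeks / prior_weeks)
    gross_change = score_t2 - score_t1           # change during the program
    return gross_change - expected_drift


# Hypothetical mean attitude scores measured six weeks before the program,
# at its start, and at its end six weeks later.
estimate = self_controlled_effect(5.0, 5.5, 8.0, prior_weeks=6, program_weeks=6)
print(estimate)
```

Such an estimate is only as good as the constant-rate assumption; it cannot detect extraneous events that happen to coincide with the program itself, which is one reason the classic design remains preferable.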

Error control. Evaluation studies, like all social research, involve difficult problems in the selection of specific research procedures and the provision for estimating and reducing various sources of error, such as sampling bias, bias due to non-response, measurement errors arising in the questions asked or in recording of answers, deliberate deception, and interviewer bias. The practices employed to control such errors in evaluation research are similar to those used in other forms of social research, and no major innovations have been introduced.

Estimating effectiveness. To consider the fourth stage in evaluation, a distinction needs to be made between demonstrating the effects of an action program and estimating its effectiveness. Effectiveness refers to the extent to which the program achieves its goals, but the question of just how much effectiveness constitutes success and justifies the efforts of the program is unanswerable by scientific research. It remains a matter for judgment on the part of the program’s sponsors, administrators, critics, or others, and the benefits, of course, must somehow be balanced against the costs involved. The problem is complicated further by the fact that most action programs have multiple goals, each of which may be achieved with varying degrees of success over time and among different subgroups of participants in the program. To date there is no general calculus for appraising the over-all net worth of a program.

Even if the evaluation limits itself to determining the success of a program in terms of each specific goal, however, it is necessary to introduce some indexes of effectiveness which add together the discrete effects within each of the program’s goal areas. Technical problems of index and scale construction have been given considerable attention by methodologists concerned with various types of social research (see Lazarsfeld & Rosenberg 1955). But as yet there is no theory of index construction specifically appropriate to evaluation research. Steps have been taken in this direction, however, and the utility of several types of indexes has been tentatively explored (see Hyman et al. 1962). One type of difficulty, for example, arises from the fact that the amount of change that an action program produces may vary from subgroup to subgroup and from topic to topic, depending upon how close to perfection each group was before the program began. Thus, an information program can influence relatively fewer persons among a subgroup in which, say, 60 per cent of the people are already informed about the topic than among another target group in which only 30 per cent are initially informed. An “effectiveness index” has been successfully employed to help solve the problem of weighting effectiveness in the light of such restricted ceilings for change (see Hovland et al. 1949; and Hyman et al. 1962). This index, which expresses actual change as a proportion of the maximum change that is possible given the initial position of a group on the variable under study, has proven to be especially useful in evaluating the relative effectiveness of different programs and the relative effectiveness of any particular program for different subgroups or on different variables.
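The effectiveness index described above, actual change as a proportion of the maximum change possible, can be illustrated with a short Python sketch. The initial percentages follow the example in the text (60 per cent and 30 per cent already informed); the percentages after the program are hypothetical figures chosen for illustration, and the function name is an assumption rather than a notation taken from the sources cited.

```python
def effectiveness_index(before_pct, after_pct, ceiling_pct=100.0):
    """Actual change divided by the maximum change still possible.

    before_pct: per cent informed before the program
    after_pct:  per cent informed after the program
    ceiling_pct: highest attainable per cent (100 unless otherwise known)
    """
    if before_pct >= ceiling_pct:
        raise ValueError("no room for change: group is already at the ceiling")
    return (after_pct - before_pct) / (ceiling_pct - before_pct)


# Subgroup A: 60 per cent informed at the start, 80 per cent afterward (hypothetical).
index_a = effectiveness_index(60, 80)
# Subgroup B: 30 per cent informed at the start, 65 per cent afterward (hypothetical).
index_b = effectiveness_index(30, 65)
print(index_a, index_b)
```

In this invented case the raw gains differ (20 points against 35), yet each subgroup has closed half of the gap that remained open to it, so the index is 0.5 for both. It is in this sense that the index permits comparisons among groups that began at different levels.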

Understanding effectiveness. In its final stage, evaluation research goes beyond the demonstration of a program’s effects to seek information that will help to account for its successes and failures. The reasons for such additional inquiry may be either practical or theoretical.

Sponsors of successful programs may want to duplicate their action program at another time or under other circumstances, or the successful program may be considered as a model for action by others. Such emulation can be misguided and even dangerous without information about which aspects of the program were most important in bringing about the results, for which participants in the program, and under what conditions. Often it is neither possible nor necessary, however, to detect and measure the impact of each component of a social-action program. In this respect, as in others noted above, evaluation research differs from explanatory survey research, where specific stimuli are isolated, and from experimental designs, where isolated stimuli are introduced into the situation being studied. In evaluation research the independent variable, i.e., the program under study, is usually a complex set of activities no one of which can be separated from the others without changing the nature of the program itself. Hence, explanations of effectiveness are often given in terms of the contributions made by certain gross features of the program, for example, the total impact of didactic components versus social participation in a successful educational institution.

Gross as such comparisons must be, they nevertheless provide opportunities for testing specific hypotheses about social and individual change, thereby contributing to the refinement and growth of social science theories. It is important to remember, however, that such gains are of secondary concern to evaluation research, which has as its primary goal the objective measurement of the effectiveness of the program.

Certain forms of research design promise to yield valuable results both for the primary task of evaluation and its complementary goal of enlarging social knowledge. Among the most promising designs are those that allow for comparative evaluations of different social-action programs, replication of evaluations of the same program, and longitudinal studies of the long-range impact of programs. Comparative studies not only demonstrate the differential effectiveness of various forms of programs having similar aims but also provide a continuity in research which permits testing theories of change under a variety of circumstances. Replicative evaluations add to the confidence in the findings from the initial study and give further opportunity for exploring possible causes of change. Longitudinal evaluations permit the detection of effects that require a relatively long time to occur and allow an examination of the stability or loss of certain programmatic effects over time and under various natural conditions outside of the program’s immediate control.

Viewed in this larger perspective, then, evaluation research deserves full recognition as a social science activity which will continue to expand. It provides excellent and ready-made opportunities to examine individuals, groups, and societies in the grip of major and minor forces for change. Its applications contribute not only to a science of social planning and a more rationally planned society but also to the perfection of social and psychological theories of change.

Charles R. Wright

[See also Experimental design; Survey analysis.]


CAMPBELL, DONALD T.; and STANLEY, JULIAN C. 1963 Experimental and Quasi-experimental Designs for Research on Teaching. Pages 171–246 in Nathaniel L. Gage (editor), Handbook of Research on Teaching. Chicago: Rand McNally.

CARLSON, ROBERT O. 1952 The Influence of the Community and the Primary Group on the Reactions of Southern Negroes to Syphilis. Ph.D. dissertation, Columbia Univ.

Evaluation Techniques. 1955 International Social Science Bulletin 7: 343–458.

HAYES, SAMUEL P. 1959 Measuring the Results of Development Projects: A Manual for the Use of Field Workers. Paris: UNESCO.

HOVLAND, CARL I.; LUMSDAINE, ARTHUR A.; and SHEFFIELD, FREDERICK D. 1949 Experiments on Mass Communication. Studies in Social Psychology in World War II, Vol. 3. Princeton Univ. Press.

HYMAN, HERBERT H.; and WRIGHT, CHARLES R. 1966 Evaluating Social Action Programs. Unpublished manuscript.

HYMAN, HERBERT H.; WRIGHT, CHARLES R.; and HOPKINS, TERENCE K. 1962 Applications of Methods of Evaluation: Four Studies of the Encampment for Citizenship. Berkeley: Univ. of California Press.

KLINEBERG, OTTO 1955 Introduction: The Problem of Evaluation. International Social Science Bulletin 7: 346–352.

LAZARSFELD, PAUL F.; and ROSENBERG, MORRIS (editors) 1955 The Language of Social Research: A Reader in the Methodology of Social Research. Glencoe, Ill.: Free Press.

McCORD, WILLIAM; McCORD, JOAN; and ZOLA, IRVING K. 1959 Origins of Crime: A New Evaluation of the Cambridge–Somerville Youth Study. New York: Columbia Univ. Press.

POWERS, EDWIN; and WITMER, HELEN L. 1951 An Experiment in the Prevention of Delinquency. New York: Columbia Univ. Press; Oxford Univ. Press.

RIECKEN, HENRY W. 1952 The Volunteer Work Camp: A Psychological Evaluation. Reading, Mass.: Addison-Wesley.

SELLTIZ, CLAIRE et al. (1959) 1962 Research Methods in Social Relations. New York: Holt.

U.S. DEPT. OF HEALTH, EDUCATION & WELFARE, NATIONAL INSTITUTES OF HEALTH 1955 Evaluation in Mental Health: Review of Problem of Evaluating Mental Health Activities. Washington: Government Printing Office.

WITMER, HELEN L.; and TUFTS, EDITH 1954 The Effectiveness of Delinquency Prevention Programs. Washington: Government Printing Office.

"Evaluation Research." International Encyclopedia of the Social Sciences.

evaluation research

evaluation research A type of policy research devoted to assessing the consequences, intended and unintended, of a new policy programme or of an existing set of policies and practices, including measurement of the extent to which stated goals and objectives are being met, and measurement of displacement and substitution effects. Evaluation research became a self-identified field during the 1960s, and has common roots with the War on Poverty. (It was obviously useful to know the impact of the various social programmes to combat discrimination, deprivation, and other perceived social ills.) Because evaluation research tends to be applied, interdisciplinary, and methodologically opportunistic, practitioners tend to publish via specialized outlets such as the journals Evaluation Review (formerly Evaluation Quarterly) and New Directions in Program Evaluation.

"evaluation research." A Dictionary of Sociology.