A social experiment randomly assigns human subjects to two groups in order to examine the effects of social policies. One group, called the “treatment group,” is offered or required to participate in a new program, while the second group, the “control group,” receives the existing program. The two groups are monitored over time to measure differences in their behavior. For example, a social experiment might compare a program that gives unemployed individuals a financial reward for finding a job with one that does not, or it might compare students in schools that receive a new curriculum with students in schools that do not. Because the randomization procedure guarantees that the two groups are otherwise similar, the measured differences in their behavior can be causally attributed to the new program. These behavioral differences are sometimes called the “impacts” of the program. Commonly measured outcomes in social experiments include earnings, employment, receipt of transfer payments, health, educational attainment, and child development. Sample sizes in social experiments have ranged from under 100 to well over 10,000.
Some social experiments have more than one treatment group. In such cases, each treatment group is assigned to a different program. The various treatment groups may be compared to each other to determine the differential impacts of two of the tested programs, or they may be compared to the control group to determine the impact of the program relative to the status quo. The human subjects may be chosen randomly from the general population or, more commonly, may be chosen randomly from a target population, such as the disadvantaged.
Social experiments have been used extensively since the late 1960s. According to Greenberg and Shroder (2005), almost 300 social experiments have been conducted since then. Social experiments are very much like medical laboratory experiments in which the treatment group is given a new drug or procedure, while the control group is given a placebo or the standard treatment. Laboratory experiments have also been used extensively in economics since the 1970s (Smith 1994), but they differ from social experiments in that they are used mainly to test various aspects of economic theory, such as the existence of equilibrium or the efficiency of market transactions, rather than the effects of a social program. Also, economics laboratory experiments usually do not have a control group; instead, cash-motivated members of a treatment group are given the opportunity to engage in market transactions in a controlled environmental setting to determine whether they behave in a manner consistent with the predictions of economic theory. Some laboratory experiments in economics have, however, been used to test public policy alternatives.
Much of the foundation of the modern approach to social experimentation can be traced back to the work of the famous statistician Ronald Fisher in the 1920s. Fisher refined the notion of random assignment and pointed out that no two groups could ever be identical. He noted that allocation of subjects to treatment and control groups by pure chance (by the flip of a coin or from a table of random numbers, for example) ensures that differences in the average behavior of the two groups can be safely attributed to the treatment. As a result, the direction of causality can be determined using basic statistical calculations. Fisher also recognized that randomization provides a means of determining the statistical properties of differences in outcomes between the groups.
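Fisher’s logic can be sketched in a short simulation (all numbers and variable names here are hypothetical, not drawn from any actual experiment): assign subjects by coin flip, compare mean outcomes, and then use the randomization distribution itself to judge whether the observed difference could have arisen by chance.

```python
import random
import statistics

random.seed(42)

n = 40
# Hypothetical potential outcomes (say, annual earnings in $1,000s);
# the simulated program effect is +3 for every subject.
outcomes_if_control = [random.gauss(20, 5) for _ in range(n)]
outcomes_if_treated = [y + 3 for y in outcomes_if_control]

# Pure-chance assignment: a fair coin flip for each subject.
assignment = [random.random() < 0.5 for _ in range(n)]
observed = [t if a else c for a, t, c in
            zip(assignment, outcomes_if_treated, outcomes_if_control)]

def diff_in_means(outcomes, treated_flags):
    t = [y for y, a in zip(outcomes, treated_flags) if a]
    c = [y for y, a in zip(outcomes, treated_flags) if not a]
    return statistics.mean(t) - statistics.mean(c)

# The experimental "impact" estimate: treatment mean minus control mean.
impact = diff_in_means(observed, assignment)

# Fisher's randomization test: reshuffle the group labels many times to
# see how often chance alone yields a difference as large as the observed one.
labels = assignment[:]
null_diffs = []
for _ in range(2000):
    random.shuffle(labels)
    null_diffs.append(diff_in_means(observed, labels))
p_value = sum(abs(d) >= abs(impact) for d in null_diffs) / len(null_diffs)
```

Because the labels are reshuffled rather than redrawn, the test uses exactly the statistical properties that randomization itself guarantees, which is the insight attributed to Fisher in the text above.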
The first major social experiment was the New Jersey Income Maintenance Experiment, which was initiated in the United States in 1968. Although a few smaller social experiments preceded the New Jersey Experiment (such as the Perry Preschool Project in 1962), they were much smaller in scope and much less sophisticated. The New Jersey Experiment tested the idea of a negative income tax (NIT), first proposed by the economists Milton Friedman and James Tobin in the 1960s. The New Jersey Experiment was the first of five NIT experiments conducted in North America (four in the United States and one in Canada) that had very sophisticated designs and many treatment groups. Problems evaluating certain aspects of these complex experiments led to much simpler experimental designs in ensuing years.
From the 1970s to the present, social experiments have been conducted in numerous social policy areas, including child health and nutrition, crime and juvenile delinquency, early child development, education, electricity pricing, health services, housing assistance, job training, and welfare-to-work programs. Notable experiments include the Rand Health Insurance Experiment, which tested different health insurance copayment plans; the Moving to Opportunity Experiments, which tested programs enabling poor families to move out of public housing; four unemployment insurance experiments that tested the effects of various financial incentives to induce unemployed individuals to return to work; and a number of welfare-to-work experiments that tested ways of helping welfare recipients find jobs.
Although widely acknowledged as the ideal way to determine the causal effects of proposed social policies, social experiments have several important limitations. First, and perhaps most important, social experiments require that a control group be denied the policy change given to the treatment group. Because control groups in social experiments are typically disadvantaged, denial of program services may be viewed as constituting an ethical breach, limiting social experiments to settings where resources prevent all eligible individuals from being served. In addition, treatments that make participants worse off are viewed as unethical and politically infeasible.
Second, although well-designed experiments have a high degree of internal validity (inferences are valid for the tested sample), they may lack external validity (inferences may not generalize to other settings). One common criticism of experiments is that because of their limited size, they do not generate the macroeconomic (“community”) effects that a fully operational program would generate. For example, a fully operational job training program may affect the wages and employment of nonparticipants and may affect social norms and attitudes, whereas a limited-size experiment would not. Additionally, there is no way of knowing for sure whether a successful experiment in one location would be successful in another, especially because social experiments are typically conducted in places that are chosen not randomly, but for their capability and willingness to participate in an experiment.
Third, social experiments take time to design and evaluate, usually several years. Policymakers may not want to wait the required time to find out if a particular program works.
Finally, in practice, it has often proven difficult to implement random assignment. For one reason or another, individuals may not be willing to participate in a research study, and in cases where collaboration between researchers and government agencies is required, some agencies may be unwilling to cooperate. As a result, the treatment and control groups that are tested may turn out to be unrepresentative of the target population.
Because of the various limitations of social experiments, other means of evaluating the effects of social policies have been developed. These are generally termed “nonexperimental” or “quasi-experimental” methods. Nonexperimental methods monitor the behavior of persons subjected to a new policy (the treatment group) and select a “comparison group” to serve the role of a control group. But because randomization is not used to select the two groups, it is never known for sure whether the comparison group is identical to the treatment group in ways other than receipt of the treatment. Many researchers match treatment group members to persons in the nonparticipating population to make the groups as similar as possible. The matches are usually done using demographic and economic characteristics such as age, education, race, place of residence, employment and earnings history, and so on. One popular matching technique is propensity score matching, which uses a weighted average of the observed economic and demographic characteristics of the nonparticipating population to create a comparison group.
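The matching idea can be illustrated with a minimal sketch of propensity score matching, under the simplifying assumptions of just two covariates (age and education), a hand-rolled logistic model, and nearest-neighbor matching; the data and coefficients are invented for illustration, and real evaluations use far richer covariates and diagnostic checks.

```python
import math
import random

random.seed(0)

# Hypothetical data (ages 20-60, education 8-16 years): participation in a
# program is simulated to be more likely for younger, less-educated people.
people = []
for _ in range(200):
    age = random.uniform(20, 60)
    educ = random.uniform(8, 16)
    logit = 2.0 - 0.05 * age - 0.05 * educ
    participated = random.random() < 1 / (1 + math.exp(-logit))
    people.append((age, educ, participated))

def features(age, educ):
    # Rescale covariates so plain gradient ascent converges reliably.
    return (1.0, age / 60.0, educ / 16.0)

def fit_logit(data, steps=3000, lr=1.0):
    """Estimate P(participate | covariates) with a logistic model."""
    b = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for age, educ, d in data:
            x = features(age, educ)
            p = 1 / (1 + math.exp(-sum(bi * xi for bi, xi in zip(b, x))))
            err = (1.0 if d else 0.0) - p
            for j in range(3):
                grad[j] += err * x[j]
        b = [bi + lr * gi / len(data) for bi, gi in zip(b, grad)]
    return b

coef = fit_logit(people)

def pscore(age, educ):
    # The propensity score: estimated probability of participating.
    z = sum(bi * xi for bi, xi in zip(coef, features(age, educ)))
    return 1 / (1 + math.exp(-z))

treated = [(a, e) for a, e, d in people if d]
pool = [(a, e) for a, e, d in people if not d]

# Nearest-neighbor matching: pair each participant with the nonparticipant
# whose estimated propensity score is closest.
comparison_group = [min(pool, key=lambda x: abs(pscore(*x) - pscore(a, e)))
                    for a, e in treated]
```

The resulting comparison group mimics the treatment group on observed characteristics, but, as the text notes, nothing guarantees the two groups are comparable on unobserved characteristics.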
A particularly attractive nonexperimental method is the “natural experiment.” Natural experiments often are used to test the effects of social policies already in place. The natural experiment takes advantage of the way a new policy has been implemented so that the comparison group is almost a true control group. For example, military conscription (being draft eligible) during the Vietnam War was done by a national lottery that selected individuals for military service solely according to their date of birth. Thus, theoretically the group selected for military service should be identical to those not chosen, because the only difference is date of birth. Researchers wanting to test the effects of military conscription on individuals’ future behavior could compare outcomes (for example, educational attainment or earnings) of those conscripted with those not conscripted and safely attribute the “impacts” to conscription (Angrist 1990). Because not all conscripted individuals actually serve in the military and because some non-conscripted individuals volunteer for military service, it is also possible to estimate the impact of actual military service on future behavior by adjusting the impacts of conscription for differences in the proportion serving in the military in the treatment and comparison groups. However, the validity of this procedure rests crucially on the comparability of the military service veterans in the two samples.
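The adjustment described above amounts to the Wald instrumental-variables estimator: the impact of draft eligibility (the “intent-to-treat” effect) divided by the difference between the two groups in the fraction actually serving. A short sketch with made-up numbers (these are illustrative, not Angrist’s estimates):

```python
# Hypothetical numbers for illustration only.
# Draft-eligible (treatment) group:
mean_earnings_eligible = 14_200.0   # average annual earnings
share_served_eligible = 0.35        # fraction who actually served

# Non-eligible (comparison) group:
mean_earnings_ineligible = 15_000.0
share_served_ineligible = 0.10      # some volunteered anyway

# Impact of conscription itself (the intent-to-treat effect): -800.
itt = mean_earnings_eligible - mean_earnings_ineligible

# Wald estimator: rescale by the difference in service rates (0.25) to get
# the implied impact of actual military service, roughly -3,200.
effect_of_service = itt / (share_served_eligible - share_served_ineligible)
```

As the text cautions, this rescaling is valid only if the veterans in the two groups are comparable, since it attributes the entire eligibility impact to the extra fraction who served.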
Social experiments have changed in character since the late 1960s. Many early social experiments such as the NIT experiments, the Unemployment Insurance Experiments, and the Rand Health Insurance Experiment tested a “response surface” in which subjects were given “quantifiable” treatments of varying tax or subsidy rates. In contrast, most of the more recent social experiments are “black box,” meaning that a package of treatments is given to the treatment group, and it is not possible to separately identify the causal effects of each component of the package.
Black-box experiments have been criticized because they tend to have much less generalizability than response-surface experiments. Hence, many researchers have called for a return to nonexperimental evaluation as the preferred method of analyzing the effects of social policies. However, those favoring experimental methods have countered that social experimentation should remain the bedrock of social policy evaluation because the advantages are still great relative to nonexperimental methods (Burtless 1995). In an attempt to “get inside the black box,” those sympathetic with the social experiment as an evaluation tool have proposed ways of combining experimental and nonexperimental evaluation methods to identify causal effects of social policies (Bloom 2005). Nonexperimental methods are necessary because of a selection bias that arises when members of the treatment group who receive certain components of the treatment are not a random subset of the entire treatment group. In the future, social policy evaluation may make greater use of both evaluation methodologies—using experiments when feasible and combining them with nonexperimental methods when experiments cannot answer all the relevant policy questions.
SEE ALSO Negative Income Tax
Angrist, Joshua D. 1990. Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records. American Economic Review 80 (3): 313–336.
Bloom, Howard S., ed. 2005. Learning More from Social Experiments. New York: Russell Sage Foundation.
Burtless, Gary. 1995. The Case for Randomized Field Trials in Economic and Policy Research. Journal of Economic Perspectives 9 (2): 63–84.
Greenberg, David, and Mark Shroder. 2005. The Digest of Social Experiments. 3rd ed. Washington, DC: Urban Institute Press.
Greenberg, David, Donna Linksz, and Marvin Mandell. 2003. Social Experimentation and Public Policymaking. Washington, DC: Urban Institute Press.
Smith, Vernon. 1994. Economics in the Laboratory. Journal of Economic Perspectives 8 (1): 113–131.
Philip K. Robins