Bootstrap Method

The bootstrap method, introduced by Bradley Efron (1979, 1982), is a technique for statistical inference. In the typical estimation scenario, one draws a sample $Y = [y_1, \ldots, y_n]$ from an unknown distribution $F$ and then computes an estimate $\hat{\theta} = g(Y)$ of some parameter or quantity $\theta$ that is of interest. If the distribution of $(\hat{\theta} - \theta)$ were known, it would be straightforward to find values $c_1$, $c_2$ such that

$$\Pr(c_1 \le \hat{\theta} - \theta \le c_2) = 1 - \alpha \tag{1}$$

for some small value of $\alpha$, say 0.1 or 0.05. Rearranging terms inside the parentheses results in a confidence interval, $\hat{\theta} - c_2 \le \theta \le \hat{\theta} - c_1$, for the quantity $\theta$. In most situations, however, the distribution of $(\hat{\theta} - \theta)$ is unknown. Sometimes the finite-sample distribution of $(\hat{\theta} - \theta)$ can be approximated by its limiting distribution, but this approximation does not always work well.
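The rearrangement mentioned above can be spelled out in two steps. Writing $\hat{\theta}$ for the estimate and $\theta$ for the quantity of interest, subtract $\hat{\theta}$ from each part of the inequality, then multiply through by $-1$, which reverses the direction of the inequalities:

```latex
\begin{align*}
c_1 \le \hat{\theta} - \theta \le c_2
  &\iff c_1 - \hat{\theta} \le -\theta \le c_2 - \hat{\theta} \\
  &\iff \hat{\theta} - c_2 \le \theta \le \hat{\theta} - c_1 .
\end{align*}
```

Because the two events are identical, they have the same probability, $1 - \alpha$. Note that the upper cutoff $c_2$ determines the lower endpoint of the interval and the lower cutoff $c_1$ determines the upper endpoint.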

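The bootstrap method is based on the idea of replicating the estimation scenario described above by drawing $B$ samples $Y_b^*$ from an estimate $\hat{F}$ of $F$ and computing estimates $\hat{\theta}_b^* = g(Y_b^*)$, $b = 1, \ldots, B$. Given the $B$ values $\hat{\theta}_b^*$, it is straightforward to find values $c_1^*$, $c_2^*$ such that

$$\Pr(c_1^* \le \hat{\theta}_b^* - \hat{\theta} \le c_2^*) \approx 1 - \alpha \tag{2}$$

by finding appropriate percentiles of the differences $(\hat{\theta}_b^* - \hat{\theta})$. Substituting the values $c_1^*$, $c_2^*$ for $c_1$, $c_2$ in (1) yields

$$\Pr(c_1^* \le \hat{\theta} - \theta \le c_2^*) \approx 1 - \alpha \tag{3}$$

Rearranging terms inside the parentheses in (3) yields an estimated confidence interval

$$\hat{\theta} - c_2^* \le \theta \le \hat{\theta} - c_1^*$$

Here, confidence intervals have been estimated based on the differences $(\hat{\theta}_b^* - \hat{\theta})$, but other approaches may also be used (see Efron and Tibshirani [1993] for examples). The approximation in (2) is due to the fact that $B$ must be finite, while the approximation in (3) is due to the fact that the original estimate $\hat{\theta}$ is based on a finite sample of size $n$; the first approximation improves as $B \to \infty$, and the second approximation improves as $n \to \infty$.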
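The procedure just described can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes the estimate of F is the empirical distribution of the data (so each bootstrap sample is drawn with replacement from the original sample), and it takes the cutoffs as simple order statistics of the sorted differences. The function name `bootstrap_ci`, its parameters, and the synthetic exponential data in the usage lines are all hypothetical choices made for the example.

```python
import random
import statistics


def bootstrap_ci(sample, estimator, alpha=0.05, n_boot=2000, seed=0):
    """Interval built from percentiles of the differences (theta*_b - theta_hat)."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = estimator(sample)

    # Draw B = n_boot samples of size n with replacement from the data
    # (assumption: the empirical distribution serves as the estimate of F)
    # and record how far each replicate estimate falls from theta_hat.
    diffs = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)
        diffs.append(estimator(resample) - theta_hat)
    diffs.sort()

    # c1*, c2*: roughly the alpha/2 and 1 - alpha/2 empirical percentiles
    # of the differences, taken as order statistics of the sorted list.
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    c1_star, c2_star = diffs[lo_idx], diffs[hi_idx]

    # Rearranged interval: theta_hat - c2* <= theta <= theta_hat - c1*.
    return theta_hat - c2_star, theta_hat - c1_star


if __name__ == "__main__":
    data_rng = random.Random(42)
    y = [data_rng.expovariate(1.0) for _ in range(100)]  # synthetic, skewed data
    lower, upper = bootstrap_ci(y, statistics.fmean)
    print(f"estimate = {statistics.fmean(y):.3f}, interval = [{lower:.3f}, {upper:.3f}]")
```

Increasing `n_boot` reduces the simulation error in the cutoffs, whereas only a larger original sample reduces the error that comes from using the bootstrap differences in place of the true sampling distribution.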

Variants of the bootstrap method also differ according to how the bootstrap samples $Y^*$ are constructed. In the naive bootstrap, $Y^*$ is constructed by drawing from the empirical distribution of the original data $Y$. This approach “works” in the sense of yielding estimated confidence intervals with correct coverages whenever the limiting distribution of $\hat{\theta}$ is normal, as proved by Enno Mammen (1992). However, the naive bootstrap often fails in other situations (see Bickel and Freedman [1981] for examples). In cases where data are bounded, constructing bootstrap samples by drawing from a smooth estimate of $F$, rather than the empirical distribution function, often yields confidence intervals with correct coverages; this approach is called a smooth bootstrap. The smooth bootstrap has been used with data envelopment analysis (DEA) estimators to estimate confidence intervals for measures of technical efficiency (see Simar and Wilson [1998, 2000] for details).
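The difference between the two ways of constructing a bootstrap sample can be illustrated with a short Python sketch. It is a generic illustration under stated assumptions, not the procedure of Simar and Wilson: the smooth estimate of F is taken to be a Gaussian kernel density estimate, the default bandwidth is a common rule of thumb, and a known lower bound is handled by reflecting draws that cross it. The function names and the `bandwidth` and `lower_bound` parameters are hypothetical.

```python
import random
import statistics


def naive_bootstrap_sample(data, rng):
    """Draw n points with replacement from the empirical distribution."""
    return rng.choices(data, k=len(data))


def smooth_bootstrap_sample(data, rng, bandwidth=None, lower_bound=None):
    """Draw n points from a Gaussian kernel density estimate of the data.

    Sampling from a kernel estimate amounts to resampling a data point
    and adding kernel noise scaled by the bandwidth.
    """
    n = len(data)
    if bandwidth is None:
        # Rule-of-thumb bandwidth (an assumption made for this sketch).
        bandwidth = 1.06 * statistics.stdev(data) * n ** (-1 / 5)
    draws = []
    for y in rng.choices(data, k=n):
        value = y + bandwidth * rng.gauss(0.0, 1.0)
        if lower_bound is not None and value < lower_bound:
            # Reflect draws that cross the boundary back into the support.
            value = 2 * lower_bound - value
        draws.append(value)
    return draws


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.expovariate(1.0) for _ in range(50)]  # synthetic data bounded below by 0
    naive = naive_bootstrap_sample(data, rng)
    smooth = smooth_bootstrap_sample(data, rng, lower_bound=0.0)
    print("distinct values in naive sample:", len(set(naive)))
    print("smallest value in smooth sample:", min(smooth))
```

A naive sample can only repeat values already present in the data, while a smooth sample contains new values spread around them. Either function could stand in for the resampling step of a bootstrap loop.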

The bootstrap method has been increasingly used as computers have become faster and cheaper. It is particularly useful in situations where limiting distributions involve unknown parameters that may be difficult to estimate, which is often the case with nonparametric estimators such as DEA. The method is also useful in situations where limiting normal distributions provide poor approximations to finite-sample distributions of estimators. Because bootstrap samples are drawn from either the empirical distribution or a smooth estimate of the distribution of the original data, the bootstrap method incorporates information about higher moments (e.g., skewness and kurtosis) that is ignored when limiting normal distributions are used to approximate finite-sample distributions. This sometimes leads to bootstrap confidence interval estimates with better coverage properties than more conventional confidence interval estimates.

SEE ALSO Data Envelopment Analysis; Econometric Decomposition; Frequency Distributions

BIBLIOGRAPHY

Bickel, Peter J., and David A. Freedman. 1981. Some Asymptotic Theory for the Bootstrap. Annals of Statistics 9: 1196–1217.

Efron, Bradley. 1979. Bootstrap Methods: Another Look at the Jackknife. Annals of Statistics 7: 1–26.

Efron, Bradley. 1982. The Jackknife, the Bootstrap, and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics.

Efron, Bradley, and Robert J. Tibshirani. 1993. An Introduction to the Bootstrap. London: Chapman and Hall.

Mammen, Enno. 1992. When Does Bootstrap Work? Asymptotic Results and Simulations. Berlin and New York: Springer-Verlag.

Simar, Léopold, and Paul W. Wilson. 1998. Sensitivity Analysis of Efficiency Scores: How to Bootstrap in Nonparametric Frontier Models. Management Science 44: 49–61.

Simar, Léopold, and Paul W. Wilson. 2000. A General Methodology for Bootstrapping in Non-parametric Frontier Models. Journal of Applied Statistics 27: 779–802.

Paul W. Wilson

International Encyclopedia of the Social Sciences