Sequential Analysis



Sequential analysis is the branch of statistics concerned with investigations in which the decision whether or not to stop at any stage depends on the observations previously made. The motivation for most sequential investigations is that when the ends achieved are measured against the costs incurred (including the cost of making observations), sequential designs are typically more efficient than nonsequential designs; some disadvantages of the sequential approach are discussed later.

The term “sequential” is occasionally extended to cover also investigations in which various aspects of the design may be changed according to the observations made. For example, preliminary experience in an experiment may suggest changes in the treatments being compared; in a social survey a small pilot survey may lead to modifications in the design of the main investigation. In this article attention will be restricted mainly to the usual situation in which termination of a single investigation is the point at issue.

In a sequential investigation observations must be examined either one by one as they are collected or at certain stages during collection. A sequential procedure might be desirable for various reasons. The investigator might wish to have an up-to-date record at any stage, either for general information or because the appropriate sample size depends on quantities that he can estimate only from the data themselves. Alternatively, he may have no intrinsic interest in the intermediate results but may be able to achieve economy in sample size by taking them into account. Three examples will illustrate these points:

(1) An investigator may wish to estimate to within 10 per cent the mean weekly expenditure on tobacco per household. In order to determine the sample size he would need an estimate of the variability of the expenditure from household to household, and this might be obtainable only from the survey itself.

(2) A physician wishing to compare the effects of two drugs in the treatment of some disease may wish to stop the investigation if at some stage a convincing difference can already be demonstrated using the available data.

(3) A manufacturer carrying out inspection of batches of some product may be able to pass most of his batches with little inspection but may carry out further inspection of batches of doubtful quality. A given degree of discrimination between good and bad batches could be achieved in various ways, but a sequential scheme will often be more economical than one in which a sample of constant size is taken from each batch [See Quality control, statistical, for further discussion of such applications].

The most appropriate design and method of analysis of a sequential investigation depend on the purpose of the investigation. The statistical formulation of that purpose may take one of a number of forms, usually either estimation of some quantity to a given degree of precision or testing a hypothesis with given size and given power against a given alternative hypothesis. Economy in number of observations is typically important for sequential design. Some details of particular methods are given in later sections.

Sometimes a sequential investigation, although desirable, may not be practicable. To make effective sequential use of observations they must become available without too great a delay. It would not be possible, for example, to do a sequential analysis of the effect of some social or medical policy if this effect could not be assessed until five years had elapsed. In other situations it may be possible to scrutinize the results as they are obtained, but only at very great cost. An example might be a social survey in which data could be collected rather quickly but in which a full analysis would be long and costly.

History. An important precursor of the modern theory of sequential analysis was the work done in 1929 by Dodge and Romig (1929–1941) on double sampling schemes. Their problem was to specify sampling inspection schemes that discriminated between batches of good and bad quality. The first stage of sampling would always be used, but the second stage would be used only if the results of the first were equivocal; furthermore, the size of the second sample and the acceptance criteria might depend on the first stage results. Bartky (1943) generalized this idea in his “multiple sampling,” which allowed many stages, and his procedure was very closely related to a particular case of the general theory of sequential analysis that Wald was developing simultaneously.

This theory, developed for the testing of military equipment during World War II, is summarized in Wald’s book Sequential Analysis (1947). It represents a powerful exploitation of a single concept, the “sequential probability ratio test,” which has provided the basis of most subsequent work. Related work proceeding simultaneously in Great Britain is summarized by Barnard (1946). Whereas Wald’s theory provided the specification of a sampling scheme satisfying given requirements, Barnard’s work was devoted to the converse problem of examining the properties of a given sequential scheme. Barnard drew attention to the close analogy between sequential inspection schemes and games of chance. Indeed, some of the solutions to gaming problems provided by seventeenth-century and eighteenth-century mathematicians are directly applicable to modern sequential schemes.

Postwar theoretical development, stimulated primarily by Wald’s work, has perhaps outrun practical applications. Many recent workers have apparently felt that the standard sequential theory does not provide answers to the right questions, and a number of new lines of approach have been attempted.

Sequential estimation. Suppose that in a large population a proportion, p, of individuals show some characteristic (or are “marked”) and that in a random sample of size n the number of marked individuals is X. Then the proportion of marked individuals in the sample is X/n. By standard binomial distribution theory, the standard error of X/n is $\sqrt{p(1-p)/n}$. The standard error expressed as a proportion of the true value is therefore $\sqrt{(1-p)/(np)}$, and when p is small this will be approximately $(np)^{-1/2}$. Now np is the mean value of X = n(X/n). Intuitively, therefore, one could achieve an approximately constant proportional standard error by choosing a fixed value of X. That is, sampling would be continued until a predetermined number of marked individuals had been found. This is called a “stopping rule.” The sample size, n, would be a random variable. If p happened to be very small, n would tend to be very large; an increase in p would tend to cause a decrease in n. This procedure is called “inverse sampling,” and its properties were first examined by Haldane (1945).

At first sight it seems natural to estimate p by X/n. This estimator is slightly biased under inverse sampling, and some statisticians would use the modified estimator (X − 1)/(n − 1), which is unbiased. Others feel that X/n is preferable despite the bias. In most practical work the difference is negligible.
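The behaviour of the two estimators under inverse sampling can be checked by simulation. The following is a minimal sketch and not part of the original treatment; the function names, the value p = 0.1, and the stopping count of 5 marked individuals are illustrative assumptions.

```python
import random


def inverse_sample(p, k, rng):
    """Draw individuals one at a time until k marked ones are found.

    Returns the total number of individuals examined (the random sample size n).
    """
    n = 0
    found = 0
    while found < k:
        n += 1
        if rng.random() < p:  # this individual is "marked"
            found += 1
    return n


def compare_estimators(p, k, trials=10000, seed=0):
    """Average X/n and (X - 1)/(n - 1) over repeated inverse samples (X = k)."""
    if k < 2:
        raise ValueError("k must be at least 2 so that n - 1 is never zero")
    rng = random.Random(seed)
    sum_naive = 0.0
    sum_modified = 0.0
    for _ in range(trials):
        n = inverse_sample(p, k, rng)
        sum_naive += k / n
        sum_modified += (k - 1) / (n - 1)
    return sum_naive / trials, sum_modified / trials


if __name__ == "__main__":
    naive, modified = compare_estimators(p=0.1, k=5)
    print("mean of X/n:            ", round(naive, 4))
    print("mean of (X - 1)/(n - 1):", round(modified, 4))
```

In such a run the average of X/n should sit somewhat above p, while the average of the modified estimator should lie close to p; the gap shrinks as the stopping count is increased.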

Inverse sampling is one of the simplest methods of sequential estimation. One could define different stopping rules and for any particular stopping rule examine the way in which the precision of X/n varied with p; conversely, one could specify this relationship and ask what stopping rule would satisfy the requirement.

If a random variable is normally distributed with mean μ and variance σ², a natural requirement might be to estimate μ with a confidence interval of not greater than a given length at a certain probability level. With samples of fixed size, the length of the confidence interval depends on σ, which is typically unknown. The usual Student t procedure provides intervals of random and unbounded length. Stein (1945) describes a two-sample procedure (with preassigned confidence-interval length) in which the first sample provides an estimate of σ. This estimate then determines the size of the second sample; occasionally a second sample is not needed. This scheme leads naturally to a general approach to sequential estimation. Suppose that, as in inverse sampling, one proceeds in a fully sequential manner, taking one observation at a time, and stops when the desired level of precision is reached. This precision may be determined by customary standard error formulas. Anscombe (1953) showed that in large samples this procedure will indeed yield estimates of the required level of precision. Thus, suppose an investigator wished to estimate the mean number of persons per household in a certain area, with a standard error of 0.1 person, and he had little initial evidence about the variance of household size. He could sample the households until the standard error of the mean, given by the usual formula $s/\sqrt{n}$ (where s is the sample standard deviation and n the number of households sampled so far), fell as low as 0.1. A practical difficulty might be that of ensuring that the sampling was random.
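The household example can be sketched in code. This is only an illustration under stated assumptions: the function name, the uniform distribution of household sizes from 1 to 6, and the minimum of 10 observations before the stopping rule is first checked are all hypothetical choices, none of them taken from the article.

```python
import math
import random


def sample_until_precise(draw, target_se, min_n=10, max_n=100000):
    """Take observations one at a time until s / sqrt(n) <= target_se.

    draw      -- function returning one new observation per call
    target_se -- required standard error of the mean
    min_n     -- assumed safeguard: do not test the rule on very small samples
    Returns (n, mean, standard_error).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean (Welford update)
    while n < max_n:
        x = draw()
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_n:
            variance = m2 / (n - 1)            # sample variance s^2
            std_error = math.sqrt(variance / n)  # s / sqrt(n)
            if std_error <= target_se:
                return n, mean, std_error
    raise RuntimeError("precision not reached within max_n observations")


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical population: household sizes 1..6, equally likely.
    n, mean, se = sample_until_precise(lambda: rng.randint(1, 6), target_se=0.1)
    print("households sampled:", n)
    print("estimated mean size:", round(mean, 3), "standard error:", round(se, 4))
```

The sample size at which the rule stops is itself random; it depends on the variability actually observed, which is the point of the sequential approach.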

Sequential hypothesis testing. Suppose that one wishes to test a specific hypothesis, $H_0$, in such a way that if $H_0$ is indeed true it will usually be accepted and that if an alternative hypothesis, $H_1$, is true $H_0$ will usually be rejected. In the most elementary case $H_0$ and $H_1$ are simple hypotheses; that is, each specifies completely the probability distribution of the generic random variable, X. Suppose $f_0(x)$ and $f_1(x)$ are the probabilities (or probability densities) that X takes the value x when $H_0$ and $H_1$ are true, respectively.

Sequential probability ratio test (SPRT). Wald proposed the following method of sequential hypothesis testing. Independent observations, $X_i$, are taken sequentially and result in values, $x_i$. Define two positive constants, $A_0$ and $A_1$, with $A_0 < A_1$. At the nth stage, calculate

$$f_{0n} = f_0(x_1) f_0(x_2) \cdots f_0(x_n), \qquad f_{1n} = f_1(x_1) f_1(x_2) \cdots f_1(x_n).$$

If $f_{1n}/f_{0n} \le A_0$, accept $H_0$; if $f_{1n}/f_{0n} \ge A_1$, reject $H_0$; otherwise take the next observation and repeat the procedure.
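A minimal sketch of this procedure follows, working with the logarithm of the likelihood ratio so that the product over observations becomes a running sum. The function names are illustrative, and the lower and upper bounds on the ratio are passed in as `a0 < a1`; the normal-mean example with known variance is an assumed test case, not one worked in the text.

```python
import math


def sprt(observations, log_lr, a0, a1):
    """Sequential probability ratio test for two simple hypotheses.

    observations -- iterable of data values, consumed one at a time
    log_lr       -- function giving log(f1(x) / f0(x)) for one observation
    a0, a1       -- lower and upper bounds on the likelihood ratio (a0 < a1)
    Returns (decision, number_of_observations_used).
    """
    log_a0 = math.log(a0)
    log_a1 = math.log(a1)
    total = 0.0  # log of f1n / f0n
    n = 0
    for x in observations:
        n += 1
        total += log_lr(x)
        if total <= log_a0:
            return "accept H0", n
        if total >= log_a1:
            return "reject H0", n
    return "undecided", n  # data ran out before a bound was reached


def normal_log_lr(mu0, mu1, sigma):
    """Log likelihood ratio for one normal observation with known sigma."""
    def log_lr(x):
        return (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma ** 2
    return log_lr


if __name__ == "__main__":
    log_lr = normal_log_lr(mu0=0.0, mu1=1.0, sigma=1.0)
    print(sprt([0.3, 1.4, 0.9, 1.8, 1.1, 2.0], log_lr, a0=0.05 / 0.95, a1=0.95 / 0.05))
```

Returning "undecided" when the data are exhausted is a convenience of the sketch; in a genuinely sequential investigation one would simply take a further observation.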

Wald called this procedure the “sequential probability ratio test” (SPRT). The ratio $f_{1n}/f_{0n}$, normally called the “likelihood ratio,” plays an important part in the Neyman-Pearson theory of hypothesis testing, a fact that probably largely explains Wald’s motivation. The likelihood ratio also occurs naturally in Bayesian inference [See Bayesian inference].

Let α be the probability of rejecting $H_0$ when it is true, and let β be the probability of accepting $H_0$ when $H_1$ is true. It can be shown that $A_0 \ge \beta/(1-\alpha)$ and $A_1 \le (1-\beta)/\alpha$ and that the SPRT with $A_0 = \beta/(1-\alpha)$ and $A_1 = (1-\beta)/\alpha$ will usually have error probabilities α′ and β′ rather close to α and β. (The inequalities arise because sampling usually stops when the bounds $A_0$ and $A_1$ are slightly exceeded rather than equaled.)
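The origin of these inequalities can be sketched briefly. This is the standard argument, given here on the assumption that the test terminates with probability one; it is an outline rather than a full proof.

```latex
% On every sample that leads to rejection of H_0 at some stage n,
% f_{1n} \ge A_1 f_{0n}. Summing (or integrating) over all such samples:
\begin{align*}
1 - \beta &= \Pr(\text{reject } H_0 \mid H_1)
  \;\ge\; A_1 \Pr(\text{reject } H_0 \mid H_0) = A_1\,\alpha
  &&\Longrightarrow\quad A_1 \le \frac{1-\beta}{\alpha}, \\
% On every sample that leads to acceptance of H_0, f_{1n} \le A_0 f_{0n}:
\beta &= \Pr(\text{accept } H_0 \mid H_1)
  \;\le\; A_0 \Pr(\text{accept } H_0 \mid H_0) = A_0\,(1-\alpha)
  &&\Longrightarrow\quad A_0 \ge \frac{\beta}{1-\alpha}.
\end{align*}
% Applying the same inequalities to the error probabilities \alpha', \beta'
% actually achieved with A_0 = \beta/(1-\alpha), A_1 = (1-\beta)/\alpha gives
\[
\alpha' \le \frac{\alpha}{1-\beta}, \qquad
\beta' \le \frac{\beta}{1-\alpha}, \qquad
\alpha' + \beta' \le \alpha + \beta .
\]
```

The last line shows why the approximate bounds are safe in practice: the two achieved error probabilities cannot both exceed their nominal values.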

The number of observations, n, required before a decision is reached is a random variable. Wald gave approximate formulas for $E_0(n)$ and $E_1(n)$, the mean number of observations when $H_0$ or $H_1$ is true. Wald conjectured, and Wald and Wolfowitz (1948) proved, that no other test (sequential or not) having error probabilities equal to α′ and β′ can have lower values for either $E_0(n)$ or $E_1(n)$ than those of the SPRT.
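For reference, Wald's approximations take the following form when the overshoot of the boundaries is neglected. The symbol z for the log likelihood ratio of a single observation is notation introduced here for the sketch.

```latex
\begin{align*}
z &= \log\frac{f_1(X)}{f_0(X)}, \\[4pt]
E_0(n) &\approx \frac{(1-\alpha)\log A_0 + \alpha\log A_1}{E_0(z)}, \\[4pt]
E_1(n) &\approx \frac{\beta\log A_0 + (1-\beta)\log A_1}{E_1(z)} .
\end{align*}
% E_0(z) < 0 and E_1(z) > 0, so both expressions are positive.
% Each follows from Wald's identity E(z_1 + \dots + z_n) = E(n)\,E(z),
% treating the cumulative sum as exactly \log A_0 or \log A_1 at stopping.
```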

As an example, suppose that $H_0$ specifies that the proportion, p, of “marked” individuals in a large population is $p_0$ and that $H_1$ states that p is $p_1$, where $p_1 > p_0$. At the nth stage, if x marked individuals have been found, the likelihood ratio is

$$\frac{f_{1n}}{f_{0n}} = \frac{p_1^{x}(1-p_1)^{n-x}}{p_0^{x}(1-p_0)^{n-x}}.$$

Sampling will continue as long as

$$\frac{\beta}{1-\alpha} < \frac{p_1^{x}(1-p_1)^{n-x}}{p_0^{x}(1-p_0)^{n-x}} < \frac{1-\beta}{\alpha},$$

using the appropriate formulas for the bounds. Taking logarithms, this inequality is

$$\log\frac{\beta}{1-\alpha} < x\log\frac{p_1}{p_0} + (n-x)\log\frac{1-p_1}{1-p_0} < \log\frac{1-\beta}{\alpha},$$

or $a_0 + bn < x < a_1 + bn$,

where $a_0$, $a_1$, and b are functions of $p_0$, $p_1$, α, and β. The SPRT can thus be performed as a simple graphical procedure, with coordinate axes for x and n and two parallel boundary lines, $x = a_0 + bn$ and $x = a_1 + bn$ (see Figure 1). The successive values of X are plotted to form a “sample path,” and sampling stops when the sample path crosses either of the boundaries.
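The two boundary lines can be computed directly by dividing the logarithmic inequality through by log[p1(1 − p0)/(p0(1 − p1))], which is positive when p1 > p0. The sketch below does this and follows a sample path against the lines; the function names and the numerical values p0 = 0.1, p1 = 0.3, α = β = 0.05 are illustrative assumptions.

```python
import math


def binomial_sprt_lines(p0, p1, alpha, beta):
    """Intercepts a0, a1 and common slope b of the boundaries x = a + b*n.

    Assumes p1 > p0 and Wald's approximate bounds beta/(1 - alpha) and
    (1 - beta)/alpha on the likelihood ratio.
    """
    d = math.log(p1 * (1 - p0) / (p0 * (1 - p1)))  # positive when p1 > p0
    b = math.log((1 - p0) / (1 - p1)) / d
    a0 = math.log(beta / (1 - alpha)) / d
    a1 = math.log((1 - beta) / alpha) / d
    return a0, a1, b


def follow_path(outcomes, a0, a1, b):
    """Track the count x of marked individuals against the two lines.

    outcomes -- iterable of 0/1 (or False/True) values, 1 meaning "marked"
    Returns (decision, number_of_observations_used).
    """
    x = 0
    n = 0
    for marked in outcomes:
        n += 1
        if marked:
            x += 1
        if x <= a0 + b * n:
            return "accept H0", n
        if x >= a1 + b * n:
            return "reject H0", n
    return "continue sampling", n


if __name__ == "__main__":
    a0, a1, b = binomial_sprt_lines(p0=0.1, p1=0.3, alpha=0.05, beta=0.05)
    print("a0 =", round(a0, 3), "a1 =", round(a1, 3), "b =", round(b, 3))
    print(follow_path([0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1], a0, a1, b))
```

With these values the slope b lies between p0 and p1, an unbroken run of marked individuals leads to rejection of the null hypothesis at the third observation, and an unbroken run of unmarked individuals leads to its acceptance at the twelfth.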

Graphic solutions with parallel lines are also obtained for the test of the mean of a normal distribution with known variance and for the parameter of a Poisson distribution.

The SPRT is clearly a powerful and satisfying procedure in situations where one of two simple hypotheses is true and where the mean number of observations is an appropriate measure of the sampling effort. Unfortunately, some or all of these conditions may not hold. A continuous range of hypotheses must usually be considered; the hypotheses may not be simple; and the variability of the number of observations, as well as its mean, may be important.

Suppose that there is a single parameter, θ, describing the distribution of interest, and further suppose that $H_0$ and $H_1$ specify two particular values of θ: $\theta_0$ and $\theta_1$. For every value of θ, including $\theta_0$ and $\theta_1$, quantities of interest are the probability, $L(\theta)$, of accepting $H_0$ (called the “operating characteristic” or O.C.) and the average number of observations, $E_\theta(n)$ (called the “average sample number,” ASN, rather than the “average sample size,” for obvious reasons). Approximate formulas for both these quantities are found in Wald’s book (1947). The O.C. is an approximately smooth function between 0 and 1, taking the values $L(\theta_0) = 1 - \alpha$, $L(\theta_1) = \beta$ (see Figure 2). The ASN, that is, $E_\theta(n)$, normally has a maximum for a value of θ between $\theta_0$ and $\theta_1$ (see Figure 3). In the binomial problem discussed above, for example, the maximum ASN occurs close to the value p = b, for which, on the average, the sample path tends to move parallel to the boundaries. It is remarkable, though, that in many situations of practical interest this maximum value of the ASN is less than the size of the nonsequential procedure that tests $H_0$ and $H_1$ with the same error probabilities as the SPRT.
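The operating characteristic and the average sample number of a binomial SPRT can be estimated by simulation rather than from Wald's approximate formulas. The sketch below is a Monte Carlo illustration under assumed values (p0 = 0.1, p1 = 0.3, α = β = 0.05); the function name and the safety cap `max_n` are hypothetical choices.

```python
import math
import random


def simulate_sprt(p, p0, p1, alpha, beta, trials, rng, max_n=100000):
    """Estimate L(p), the probability of accepting H0, and the ASN at true p.

    The test is the binomial SPRT of p0 against p1 (p1 > p0) with Wald's
    approximate bounds. Returns (estimated_oc, estimated_asn).
    """
    d = math.log(p1 * (1 - p0) / (p0 * (1 - p1)))
    b = math.log((1 - p0) / (1 - p1)) / d
    a0 = math.log(beta / (1 - alpha)) / d
    a1 = math.log((1 - beta) / alpha) / d
    accepted = 0
    total_n = 0
    for _ in range(trials):
        x = 0
        n = 0
        while n < max_n:  # cap is only a safeguard; the test stops long before
            n += 1
            if rng.random() < p:
                x += 1
            if x <= a0 + b * n:
                accepted += 1
                break
            if x >= a1 + b * n:
                break
        total_n += n
    return accepted / trials, total_n / trials


if __name__ == "__main__":
    rng = random.Random(1)
    for p in (0.05, 0.10, 0.186, 0.30, 0.40):
        oc, asn = simulate_sprt(p, 0.1, 0.3, 0.05, 0.05, 2000, rng)
        print("p =", p, " L(p) ~", round(oc, 3), " ASN ~", round(asn, 1))
```

The estimated O.C. should fall from near 1 to near 0 as p increases, and the estimated ASN should peak for p near the slope b of the boundary lines, in line with the description above.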

Closed procedures. It can be shown in most cases that an SPRT, although defined for indefinitely large n, must stop some time. Individual sample numbers may, however, be very large, and the great variability in sample number from one sample to another may be a serious drawback. Wald suggested that the schemes should be “truncated” by taking the most appropriate decision if a boundary had not been reached after some arbitrary large number of readings. The properties of the SPRT are, however, affected unless the truncation sample size is very high, and a number of authors (Bross 1952; Armitage 1957; Anderson 1960; Schneiderman & Armitage 1962) have recently examined closed procedures (that is, procedures with an upper bound to the number of observations) of a radically different type. In general, it seems possible to find closed procedures which are only slightly less efficient than the corresponding SPRT at $H_0$ and $H_1$ but which are more efficient in intermediate situations. (For an account of the use of closed procedures in trials of medical treatments, see Armitage 1960.)

Frequently the hypotheses to be tested will be composite rather than simple, in that the probability distribution of the observations is not completely specified. For example, in testing the mean of a normal distribution the variance may remain unspecified. Wald’s approach to this problem was not altogether satisfactory, and recent work (for example, following Cox 1952) has tended to develop analogues of the SPRT using sufficient statistics where possible [See Sufficiency]. A sequential t-test of this type was tabulated by Arnold (see U.S. National Bureau of Standards 1951). The usual approximations to the O.C. and ASN do not apply directly in these situations. Cox (1963) has described a large-sample approach based on maximum likelihood estimates of the parameters.

Two-sided tests of hypotheses. In many statistical problems two-sided tests of hypotheses are more appropriate than one-sided tests. In an experiment to compare two treatments, for example, it is customary to specify a null hypothesis that no effective difference exists and to be prepared to reject the hypothesis if differences in either direction are demonstrable. One approach to two-sided sequential tests (used, for example, in the standard sequential t-test) is to allow an alternative composite hypothesis to embrace simple hypotheses on both sides of the null hypothesis. It may be more appropriate to recognize here a three-decision problem, the decisions being to accept the null hypothesis (that is, to assert no demonstrable difference), to reject it in favor of an alternative in one direction, or to reject it in the other direction. A useful device then is to run simultaneously two separate two-decision procedures, one to test $H_0$ against $H_1$ and the other to test $H_0$ against $H_1'$, where $H_1$ and $H_1'$ are alternatives on different sides of $H_0$ (see Sobel & Wald 1949).

Other approaches. Wald’s theory and the sort of developments described above are in the tradition of the Neyman-Pearson theory of hypothesis testing, with its emphasis on risks of accepting or rejecting certain hypotheses when these or other hypotheses are true. This approach is arbitrary in many respects; for example, in the SPRT there is no clear way of choosing values of α and β or of specifying an alternative to a null hypothesis. Much theoretical work is now based on statistical decision theory [See Decision theory; see also Wald 1950]. The aim here is to regard the end product of a statistical analysis as a decision of some sort, to measure the gains or losses that accrue, under various circumstances, when certain decisions are taken, to measure in the same units the cost of making observations, and to choose a rule of procedure that in some sense leads to the highest expectation of gain. Prior probabilities may or may not be attached to various hypotheses. With certain assumptions, the SPRT emerges as an optimal solution for the comparison of two simple hypotheses, but there is no reason to accept it as a general method of sequential analysis. Chernoff (1959) has developed a large-sample theory of the sequential design of experiments for testing composite hypotheses. The aim is to minimize cost in the limiting situation in which costs of wrong decisions far outweigh costs of experimentation. Account is taken of the choice between different types of observation (for example, the use of either of two treatments). A somewhat different approach (for instance, Wetherill 1961) is to stop an investigation as soon as the expected gain achieved by taking a further observation is outweighed by the cost of the observation. The formulation of the problem requires the specification of prior probabilities and its solution involves dynamic programming [See Programming].

The recent interest shown in statistical inference by likelihood, with or without prior probabilities, has revealed a conflict between this approach and the more traditional methodology of statistics, involving significance tests and confidence intervals [See Likelihood]. In the likelihood approach, inferences do not depend on stopping rules. There is, on this view, no need to have a separate theory of estimation for inverse sequential sampling or to require that investigations should follow a clearly defined stopping rule before their results can be rigorously interpreted.

The effect of all this work is at present hard to assess. The attraction of a global view is undeniable. On the other hand, the specification of prior distributions and losses may be prohibitively difficult in most scientific investigations.

A number of workers have studied problems involving progressive changes in experimental conditions. Robbins and Monro (1951) give a method for successive approximation to the value of an independent variable, in a regression equation, corresponding to a specified mean value of the dependent variable. Similar methods for use in stimulus-response experiments are reviewed by Wetherill (1963). In industrial statistics much attention has been given to the problem of estimating, by a sequence of experiments, the set of operating conditions giving optimal response. This work has been stimulated mainly by G. E. P. Box, whose evolutionary operation is a method by which continuous adjustments to operating conditions can be made [See Experimental design, article on Response surfaces].
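The idea behind the Robbins and Monro method can be illustrated with a short sketch. It assumes that the mean response increases with the independent variable and uses step sizes c/n, one common choice; the function name, the linear response used in the demonstration, and the constants are hypothetical and serve only as an illustration.

```python
import random


def robbins_monro(noisy_response, target, x0, steps=2000, c=1.0):
    """Stochastic approximation to the x at which the mean response equals target.

    noisy_response -- function returning one noisy observation of the response at x
    target         -- specified mean value of the dependent variable
    x0             -- starting value of the independent variable
    c              -- scale of the step sizes c/n (assumed tuning constant)
    Assumes the mean response is increasing in x.
    """
    x = x0
    for n in range(1, steps + 1):
        y = noisy_response(x)
        # Move x down if the response was above target, up if below,
        # with steps that shrink as more observations accumulate.
        x -= (c / n) * (y - target)
    return x


if __name__ == "__main__":
    rng = random.Random(1)

    def response(x):
        # Hypothetical regression: mean response 2x + 1, unit normal noise.
        return 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)

    estimate = robbins_monro(response, target=5.0, x0=0.0)
    print("estimated x for mean response 5:", round(estimate, 3))
```

In this assumed example the mean response equals 5 at x = 2, and the estimate should settle close to that value; each new setting of x depends on the observations already made, which is what makes the design sequential.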

P. Armitage

[Directly related are the entries Estimation; Hypothesis testing; Screening and selection.]


Useful bibliographies and surveys of sequential analysis are given in Jackson 1960; Johnson 1961; and Wetherill 1966. Wald 1947 remains an important source book; an elementary exposition of the SPRT, with examples, is contained in Columbia University, Statistical Research Group 1945. Armitage 1960 is concerned with medical applications. Federer 1963 discusses sequential procedures in screening and selection problems.

Anderson, T. W. 1960 A Modification of the Sequential Probability Ratio Test to Reduce the Sample Size. Annals of Mathematical Statistics 31:165–197.

Anscombe, F. J. 1953 Sequential Estimation. Journal of the Royal Statistical Society Series B 15:1–29. → Contains eight pages of discussion.

Armitage, P. 1957 Restricted Sequential Procedures. Biometrika 44:9–26.

Armitage, P. 1960 Sequential Medical Trials. Oxford: Blackwell; Springfield, Ill.: Thomas.

Barnard, G. A. 1946 Sequential Tests in Industrial Statistics. Journal of the Royal Statistical Society Supplement 8:1–26. → Contains four pages of discussion.

Bartky, Walter 1943 Multiple Sampling With Constant Probability. Annals of Mathematical Statistics 14:363–377.

Bross, I. 1952 Sequential Medical Plans. Biometrics 8:188–205.

Chernoff, Herman 1959 Sequential Design of Experiments. Annals of Mathematical Statistics 30:755–770.

Columbia University, Statistical Research Group 1945 Sequential Analysis of Statistical Data: Applications. New York: Columbia Univ. Press.

Cox, D. R. 1952 Sequential Tests for Composite Hypotheses. Cambridge Philosophical Society, Proceedings 48:290–299.

Cox, D. R. 1963 Large Sample Sequential Tests for Composite Hypotheses. Sankhya: The Indian Journal of Statistics 25:5–12.

Dodge, Harold F.; and Romig, Harry G. (1929–1941) 1959 Sampling Inspection Tables: Single and Double Sampling. 2d ed., rev. & enl. New York: Wiley; London: Chapman. → This book is a republication of fundamental papers published by the authors in 1929 and in 1941 in the Bell System Technical Journal.

Federer, Walter T. 1963 Procedures and Designs Useful for Screening Material in Selection and Allocation, With a Bibliography. Biometrics 19:553–587.

Haldane, J. B. S. 1945 On a Method of Estimating Frequencies. Biometrika 33:222–225.

Jackson, J. Edward 1960 Bibliography on Sequential Analysis. Journal of the American Statistical Association 55:561–580.

Johnson, N. L. 1961 Sequential Analysis: A Survey. Journal of the Royal Statistical Society Series A 124:372–411.

Robbins, Herbert; and Monro, Sutton 1951 A Stochastic Approximation Method. Annals of Mathematical Statistics 22:400–407.

Schneiderman, M. A.; and Armitage, P. 1962 A Family of Closed Sequential Procedures. Biometrika 49:41–56.

Sobel, Milton; and Wald, Abraham 1949 A Sequential Decision Procedure for Choosing One of Three Hypotheses Concerning the Unknown Mean of a Normal Distribution. Annals of Mathematical Statistics 20:502–522.

Stein, Charles M. 1945 A Two-sample Test for a Linear Hypothesis Whose Power is Independent of the Variance. Annals of Mathematical Statistics 16:243–258.

U.S. National Bureau of Standards 1951 Tables to Facilitate Sequential t-Tests. Edited by K. J. Arnold. National Bureau of Standards, Applied Mathematics Series, No. 7. Washington: U.S. Department of Commerce.

Wald, Abraham 1947 Sequential Analysis. New York: Wiley.

Wald, Abraham (1950) 1964 Statistical Decision Functions. New York: Wiley.

Wald, Abraham; and Wolfowitz, J. (1948) 1955 Optimum Character of the Sequential Probability Ratio Test. Pages 521–534 in Abraham Wald, Selected Papers in Statistics and Probability. New York: McGraw-Hill. → First published in Volume 19 of the Annals of Mathematical Statistics.

Wetherill, G. B. 1961 Bayesian Sequential Analysis. Biometrika 48:281–292.

Wetherill, G. B. 1963 Sequential Estimation of Quantal Response Curves. Journal of the Royal Statistical Society Series B 25:1–48. → Contains nine pages of discussion.

Wetherill, G. Barrie 1966 Sequential Methods in Statistics. New York: Wiley.