
# Index Numbers

I. Theoretical Aspects (Erik Ruist)

II. Practical Applications (Ethel D. Hoover)

III. Sampling (Philip J. McCarthy)

## I. THEORETICAL ASPECTS

An index number measures the magnitude of a variable relative to a specified value of the variable. For example, suppose that a certain type of apple sold at an average price of 10 cents last year but sells at an average price of 11 cents this year. If last year’s price is chosen as a base and is arbitrarily set equal to 100 (to be thought of as a pure number or as 100 per cent), then the index number identified with this year’s price would be 110 (that is, [11/10] x 100), which indicates that this year’s price is 10 per cent higher than last year’s price. Rather than comparing the price at two different dates, we might wish to compare the current price of this type of apple in New York City with that in Chicago, in which case we would choose the price in one of those two cities as a base and express the price in the other city relative to the base. Thus, index numbers can be used to make comparisons over both time and space.

If index numbers were used only to compare such variables as the price of a single commodity at different dates or places, there would be little need for a special theory of index numbers. However, we might wish to compare, for example, the general price levels of commodities imported by the United States in two different years. The prices of some commodities will have risen, and the prices of others will have fallen. The problem that arises is how to combine the relative changes in the prices of the various commodities into a single number that can meaningfully be interpreted as a measure of the relative change in the general price level of imported commodities. This example illustrates perhaps the major problem dealt with in index number theory, and this article discusses primarily the various solutions that have been proposed.

### Two approaches

The possibility of using an index number as an aggregate measure of the price change of several commodities seems to have been recognized in the eighteenth century, but deliberate theoretical discussions did not begin until the middle of the nineteenth century. Among the formulas suggested then were those of the German economists Étienne Laspeyres and Hermann Paasche, which are still used extensively. The choice of formula was made according to what was considered to be “fair.” A major step forward in the development of criteria by which to judge the various formulas was the set of tests suggested by Irving Fisher (1922). From the 1920s on, however, greater care was taken to place the index in an economic context. Thinking mainly of cost-of-living comparisons, investigators defined the price index as the relative change in income necessary to maintain an unchanged standard of living.

The two lines of thought, the mainly statistical one and the economic one, still exist side by side. The economic approach starts with some economic considerations but does not arrive at a definite result. In contrast to this, the statistical approach starts with some subjectively chosen rules but arrives at formulas that may be used directly in practical work. Fortunately, the index formulas derived in these two ways are usually very similar. There is, however, still need of a more unified theory of index numbers.

### The statistical theory

The pure statistical theory of index numbers is general in the sense that it could be applied to any index without regard to what the index represents. In order to avoid confusion, this discussion will refer mainly to price indexes and will revert to quantity indexes only when necessary for the development of the price index theory.

Consider the relations $p_{i1}/p_{i0}$ ($i = 1, \dots, n$) between the prices of $n$ commodities at two points of time, $t_0$ and $t_1$. These relations, called price ratios or price relatives, could be considered as elements having a certain distribution, the central measure of which is sought. Thus, it is natural to construct a weighted arithmetic, harmonic, or geometric average of the price ratios.

By choosing the weights in different ways it is possible to arrive at many types of indexes. Unweighted arithmetic averages were long used. However, by the first half of the nineteenth century, Arthur Young, Joseph Lowe, and G. Poulett Scrope, all of England, used weighted arithmetic averages of the price ratios. The weight $w_i$ given to the price ratio $p_{i1}/p_{i0}$ was

$$w_i = \frac{p_{i0}\, q_i}{\sum_{j=1}^{n} p_{j0}\, q_j},$$

where $q_i$ is a quantity of commodity $i$ showing its general importance in the list of commodities. If $q_i$ is specified to be the quantity of commodity $i$ traded during a period around $t_0$, so that $q_i = q_{i0}$, the resulting formula is the one suggested in 1864 by Laspeyres, namely,

$$P^L_{01} = \frac{\sum_i p_{i1}\, q_{i0}}{\sum_i p_{i0}\, q_{i0}}.$$

The subscripts on $P^L$ simply indicate that the index measures the price level at date $t_1$ relative to that at the base date $t_0$.

There is a complication here that does not seem to have attracted much attention. In practice, prices are observed at points of time, whereas quantities generally have to be taken as referring to periods of time. However, the denominator in Laspeyres’s formula is usually interpreted as the actual value of transactions during the base period $t_0$. This implies either that prices have been constant within the period or that the $p_i$ are to be interpreted as average prices. To avoid ambiguity it will be assumed in the following discussion that the periods are very short, so that the prices can be regarded as constant during each period.

A formula equivalent to Laspeyres’s but using $q_{i1}$ instead of $q_{i0}$ in the weights naturally suggested itself and seemed equally justifiable. It was advocated in 1874 by Paasche.

Alfred Marshall suggested that instead of using quantities referring to one of the two points of time compared by the index, an average of the corresponding quantities should be used, that is,

$$P^E_{01} = \frac{\sum_i p_{i1}\,(q_{i0} + q_{i1})}{\sum_i p_{i0}\,(q_{i0} + q_{i1})}.$$

Because this index formula was strongly advocated by F. Y. Edgeworth, it is often called Edgeworth’s formula. The use of a geometric average of the quantities associated with the two dates has also been suggested.
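As a rough numerical illustration of the three formulas just described, the following Python sketch computes the Laspeyres, Paasche, and Edgeworth indexes for a small basket. The prices, quantities, and function names are hypothetical and are not taken from any of the works cited.

```python
def value(prices, quantities):
    """Total value of a basket: sum of price times quantity."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p0, p1, q0):
    """Price index weighted by base-period quantities."""
    return value(p1, q0) / value(p0, q0)


def paasche(p0, p1, q1):
    """Price index weighted by current-period quantities."""
    return value(p1, q1) / value(p0, q1)


def edgeworth(p0, p1, q0, q1):
    """Price index weighted by the sum (equivalently, the mean) of both quantity sets."""
    q_sum = [a + b for a, b in zip(q0, q1)]
    return value(p1, q_sum) / value(p0, q_sum)


# Hypothetical data for three commodities at dates t0 and t1.
p0 = [10.0, 4.0, 2.0]
p1 = [11.0, 5.0, 1.5]
q0 = [5.0, 10.0, 20.0]
q1 = [4.0, 8.0, 30.0]

print("Laspeyres:", round(100 * laspeyres(p0, p1, q0), 1))
print("Paasche:  ", round(100 * paasche(p0, p1, q1), 1))
print("Edgeworth:", round(100 * edgeworth(p0, p1, q0, q1), 1))
```

With these made-up figures the three formulas disagree by several index points, which is the kind of divergence that motivated the search for criteria discussed next.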

#### Fisher’s tests

The various formulas gave results that were sometimes widely different, and it was evident that criteria for judging the quality of the formulas were needed. Fisher suggested a set of criteria that have remained the most extensive and widely used. Before describing Fisher’s criteria, let us define $P_{rs}$ as the index number given by formula $P$ expressing the change in the general price level between dates $r$ and $s$ relative to the price level at date $r$; that is, date $r$ is the base.

The time reversal test states that in comparing the prices at two dates, a formula should give the same result regardless of which of the two dates is chosen as the base. For example, if a formula indicates that the price level in 1965 was double that in 1964 when 1964 is taken as the base, then it should indicate that the price level in 1964 was one half that in 1965 when 1965 is taken as the base. Symbolically, the test requires that

$$P_{rs} P_{sr} = 1, \quad \text{for all } r \text{ and } s.$$

The circular test requires that

$$P_{rs} P_{st} P_{tr} = 1, \quad \text{for all } r, s, \text{ and } t;\ r \neq t.$$

For example, if a formula indicates that the price level doubled between 1963 and 1964 and then doubled again between 1964 and 1965, it should indicate that the price level in 1965 was four times that in 1963 when 1965 and 1963 are compared directly. Although this test has great intuitive appeal, Fisher argued that it is not an essential test and even suggested that a formula that satisfies it exactly should generally be rejected.

The factor reversal test presupposes that the weights used in the price index formula are functions of quantities. It states that if the price index $P_{01}$ is multiplied by a corresponding quantity index $Q_{01}$ derived by interchanging $p$ and $q$ in the index formula, the result should equal the value ratio, that is,

$$P_{01} Q_{01} = \frac{\sum_i p_{i1}\, q_{i1}}{\sum_i p_{i0}\, q_{i0}}.$$

Fisher’s tests have been found to be inconsistent with each other. However, Fisher classified all the formulas he tested according to which tests they fulfilled. He found only four formulas that deserved a “superlative” rating, all of which used quantities from both $t_0$ and $t_1$. They included Edgeworth’s formula, noted above, and the corresponding formula with geometric instead of arithmetic means of quantities. Also “superlative” were the arithmetic and geometric means of Laspeyres’s and Paasche’s formulas. Fisher called the geometric mean of these formulas the “ideal” index.
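The time reversal and factor reversal tests can be checked numerically. The sketch below, using hypothetical data and illustrative function names, shows that the geometric mean of the Laspeyres and Paasche indexes (Fisher’s “ideal” index) passes both tests on this data set, whereas the Laspeyres index passes neither.

```python
from math import isclose, sqrt


def value(prices, quantities):
    """Total value of a basket: sum of price times quantity."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p_base, p_cur, q_base, q_cur):
    return value(p_cur, q_base) / value(p_base, q_base)


def paasche(p_base, p_cur, q_base, q_cur):
    return value(p_cur, q_cur) / value(p_base, q_cur)


def fisher_ideal(p_base, p_cur, q_base, q_cur):
    """Geometric mean of the Laspeyres and Paasche indexes."""
    return sqrt(laspeyres(p_base, p_cur, q_base, q_cur)
                * paasche(p_base, p_cur, q_base, q_cur))


def time_reversal_product(index, p0, p1, q0, q1):
    """Forward index times backward index; equals 1 if the test is met."""
    return index(p0, p1, q0, q1) * index(p1, p0, q1, q0)


def factor_reversal_gap(index, p0, p1, q0, q1):
    """Price index times quantity index, minus the value ratio; 0 if the test is met."""
    price_index = index(p0, p1, q0, q1)
    quantity_index = index(q0, q1, p0, p1)  # p and q interchanged
    return price_index * quantity_index - value(p1, q1) / value(p0, q0)


# Hypothetical data for three commodities at dates t0 and t1.
p0, p1 = [10.0, 4.0, 2.0], [11.0, 5.0, 1.5]
q0, q1 = [5.0, 10.0, 20.0], [4.0, 8.0, 30.0]

for name, index in [("Laspeyres", laspeyres), ("Fisher ideal", fisher_ideal)]:
    print(name,
          "time reversal product:", time_reversal_product(index, p0, p1, q0, q1),
          "factor reversal gap:", factor_reversal_gap(index, p0, p1, q0, q1))
```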

#### Chain indexes

The discussion so far has dealt with a comparison of prices between only two points of time. In practice, however, indexes are calculated for many dates at regular intervals, for example, monthly or annually. Thus, the question that arises is how a series of index numbers should be calculated. For convenience, it is customary to use a fixed base, say $t_0$, and calculate successively $P_{01}, P_{02}, \dots, P_{0k}$. This means that if Laspeyres’s formula is used, a comparison of prices at $t_{k-1}$ and $t_k$ is in fact made using quantities that do not refer to either of these dates but refer instead to $t_0$.

Since very often the comparison of the prices of one date with those of the immediately preceding date is more important than the comparison with the base date, it may be useful to compute for every $t_r$ an index $P_{r-1,r}$ and then define

$$P_{0k} = P_{01}\, P_{12} \cdots P_{k-1,k}.$$

This chain index, originally suggested by Marshall in 1887, satisfies the circular test for any $r < s < t$. In spite of Fisher’s doubts, satisfaction of the circular test is a very attractive property, and chain indexes are often used. Any index formula could be applied for the index $P_{r-1,r}$ within the links.
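The construction of a chain index can be sketched as follows. The example assumes Laspeyres links (each link weighted by the quantities of the preceding date) and hypothetical data for three dates; any other link formula could be substituted.

```python
from math import isclose


def value(prices, quantities):
    """Total value of a basket: sum of price times quantity."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres_link(prices, quantities, r):
    """Link index P_{r-1,r}, weighted by quantities at date r-1."""
    return value(prices[r], quantities[r - 1]) / value(prices[r - 1], quantities[r - 1])


def chain_index(prices, quantities, start, end):
    """Product of the successive links from date `start` to date `end`."""
    result = 1.0
    for r in range(start + 1, end + 1):
        result *= laspeyres_link(prices, quantities, r)
    return result


def fixed_base_laspeyres(prices, quantities, k):
    """Direct comparison of date k with date 0, using date-0 quantities."""
    return value(prices[k], quantities[0]) / value(prices[0], quantities[0])


# Hypothetical prices and quantities for three commodities at dates t0, t1, t2.
prices = [[10.0, 4.0, 2.0], [11.0, 5.0, 1.5], [12.0, 5.5, 1.2]]
quantities = [[5.0, 10.0, 20.0], [4.0, 8.0, 30.0], [4.0, 7.0, 40.0]]

print("chain index 0 -> 2:     ", chain_index(prices, quantities, 0, 2))
print("fixed-base index 0 -> 2:", fixed_base_laspeyres(prices, quantities, 2))
```

By construction the chained value from date 0 to date 2 equals the product of the chained values from 0 to 1 and from 1 to 2, whereas the fixed-base Laspeyres index generally gives a different figure for the same pair of dates.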

The development of the chain index may be said to have originated in a desire to find an index with the properties expressed by Fisher’s circular test. It is interesting to note that a somewhat similar result may be obtained by reasoning from the factor reversal test. Following Divisia (1925) and Törnqvist (1937), we start from the criterion

$$P_{0t}\, Q_{0t} = \frac{\sum_i p_{it}\, q_{it}}{\sum_i p_{i0}\, q_{i0}}.$$

Taking logarithms of both sides and differentiating with respect to $t$, we have

$$\frac{d \log P_{0t}}{dt} + \frac{d \log Q_{0t}}{dt} = \frac{\sum_i q_{it}\, dp_{it}/dt}{\sum_i p_{it}\, q_{it}} + \frac{\sum_i p_{it}\, dq_{it}/dt}{\sum_i p_{it}\, q_{it}}.$$

To obtain symmetry between the price index and the quantity index, the terms may be equated pair-wise; that is, the first term on the left-hand side of the equality sign may be set equal to the first term on the right-hand side of the equality sign, and similarly for the second terms.

Integrating the equations, we obtain for $P_{01}$

$$\log P_{01} = \int_{t_0}^{t_1} \sum_i c_i(t)\, \frac{d \log p_{it}}{dt}\, dt, \tag{1}$$

where $c_i(t) = p_{it}\, q_{it} / \sum_j p_{jt}\, q_{jt}$. Thus, the value of $P_{01}$ depends on the development between $t_0$ and $t_1$ of the $c_i(t)$, which are the proportions of the different commodities in the total budget. If an assumption is made about this development, the integral can be solved explicitly. Thus, if the $c_i(t)$ are assumed to be constant over time (that is, if all commodities are assumed to have a price elasticity of demand equal to one), we obtain the geometric index formula

$$P_{01} = \prod_i \left(\frac{p_{i1}}{p_{i0}}\right)^{c_i}.$$

If no assumptions are made about the $c_i(t)$, the mean value theorem of the integral calculus may be applied to (1) to obtain

$$\log P_{01} = \sum_i \bar{c}_i \log \frac{p_{i1}}{p_{i0}}, \tag{2}$$

where $\bar{c}_i$ is a weighted mean of the $c_i(t)$ over the period from $t_0$ to $t_1$. If this period is not too long, $\bar{c}_i$ may be approximated by the share of commodity $i$ in the total expenditure during the period. The definite integral in (1) can be split up into a sum of several integrals, each covering a period short enough to make this approximation satisfactory. The resulting series of price indexes corresponds to a chain index composed of weighted geometric means of price changes.
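A chain of weighted geometric means of this kind might be computed as in the sketch below. It assumes, purely for illustration, that the mean budget share within each link is approximated by the simple average of the shares at the two ends of the link; the text above leaves the exact approximation open. Data and names are hypothetical.

```python
from math import exp, isclose, log


def shares(prices, quantities):
    """Budget shares: each commodity's value as a fraction of total value."""
    total = sum(p * q for p, q in zip(prices, quantities))
    return [p * q / total for p, q in zip(prices, quantities)]


def geometric_link(p_prev, p_cur, q_prev, q_cur):
    """Weighted geometric mean of price ratios for one link.

    Assumption: the mean share over the link is taken as the average
    of the shares observed at its two end dates.
    """
    c_prev = shares(p_prev, q_prev)
    c_cur = shares(p_cur, q_cur)
    log_index = sum(0.5 * (a + b) * log(y / x)
                    for a, b, x, y in zip(c_prev, c_cur, p_prev, p_cur))
    return exp(log_index)


def chained_geometric_index(prices, quantities):
    """Product of the geometric links over all successive dates."""
    index = 1.0
    for r in range(1, len(prices)):
        index *= geometric_link(prices[r - 1], prices[r],
                                quantities[r - 1], quantities[r])
    return index


# Hypothetical prices and quantities for three commodities at three dates.
prices = [[10.0, 4.0, 2.0], [11.0, 5.0, 1.5], [12.0, 5.5, 1.2]]
quantities = [[5.0, 10.0, 20.0], [4.0, 8.0, 30.0], [4.0, 7.0, 40.0]]

print("chained geometric index:", chained_geometric_index(prices, quantities))
```

Because the weights in each link sum to one, a proportional change in all prices is reproduced exactly by the index, whatever the quantities.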

#### Best linear indexes

A further development of the idea that an index does not compare only two points of time has been suggested by Theil (1960). He argues that in the calculation of an index the situation during all observed points of time should influence the result symmetrically.

In terms of the notation used above, Theil starts from an arithmetic mean index with fixed weights, that is,

$$P_{0t} = \sum_i w_i\, \frac{p_{it}}{p_{i0}}.$$

If $a_i p_{i0} / \sum_j a_j p_{j0}$ is substituted for $w_i$, this index may be written

$$P_{0t} = \frac{\sum_i a_i\, p_{it}}{\sum_i a_i\, p_{i0}} = \frac{P_t}{P_0}.$$

Here $P_t$ and $P_0$ may be called absolute indexes to contrast with the relative index $P_{0t}$.

Using the absolute indexes, each individual price could be represented as

$$p_{it} = \alpha_i P_t + v_{it},$$

where $v_{it}$ is a disturbance or error term. If all prices moved proportionately, the $v_{it}$ could be made zero. In general, however, this is not the case. But if the $v_{it}$ cannot be made zero, the parameters $a_i$ and $\alpha_i$ can be determined so that the $v_{it}$ are minimized in some sense. If the parameters are determined in this way, the resulting index is called a best linear index.

If a quantity index is defined in a similar way, symmetric best linear price and quantity indexes with some interesting optimal properties can be derived. By a suitable choice of minimization procedure, the indexes could be made to minimize the sum of squares of “cross-value discrepancies,” that is, they could be made to minimize

$$\sum_r \sum_s \left( \sum_i p_{ir}\, q_{is} - P_r Q_s \right)^2,$$

where $r$ and $s$ take on all observed values of $t$. This is a kind of generalization of the factor reversal test. The factor reversal test could be said to specify that

$$\sum_i p_{it}\, q_{it} - P_t Q_t = 0 \quad \text{for all } t.$$

It has been found that the best linear index tends to be biased in the sense that the differences $\sum_i p_{it}\, q_{it} - P_t Q_t$ are systematically positive. To construct a best linear unbiased index, Kloek and de Wit (1961) introduced the condition

$$\sum_t \left( \sum_i p_{it}\, q_{it} - P_t Q_t \right) = 0.$$

Fulfillment of this condition means that the factor reversal test is satisfied “on the average.” As in the case of the best linear index, the weights are obtained by finding the largest latent root of a certain matrix.

### The economic theory

The economic theory of index numbers is most often discussed in terms of a consumer price index. The object is to measure the changes in the cost of living of a person or of a group of persons who have identical tastes for goods. A utility, U (not necessarily a cardinal number), is associated with every combination of quantities of goods that is under consideration. The person is assumed to be well adapted to the prevailing price situation, so that given his income he chooses the set of quantities that gives him the highest level of utility. For each point of time t (defining a set of prices) there exists for each level of utility U an expenditure μ(t,U) which is the lowest possible expenditure to attain U.

#### The constant utility index

At the point of time $t_0$ the person’s choice of quantities and his expenditure $\mu(t_0, U_0)$ are observed. The equivalent observations are made at $t_1$. If it can be stated that for the person observed $U_0 = U_1$, the price index for this level of utility is

$$P_{01}(U_0) = \frac{\mu(t_1, U_0)}{\mu(t_0, U_0)}.$$

This index, sometimes referred to as the “indifference defined index” or “the constant utility index,” was first discussed by Konüs ([1924] 1939). It is to be noted that the price index is associated with a certain value of $U$, in this case $U_0$. For other values of $U$ the price index may be different.

It is very seldom known whether $U_0$ is greater than, equal to, or less than $U_1$; that is, the value of $\mu(t_1, U_0)$ is generally not known. Thus, to estimate $\mu(t_1, U_0)$ it is necessary to make certain assumptions or approximations. Alternatively, it is possible to find an upper and/or a lower limit to the value of the constant utility index.

Upper and lower limits. It can be shown that a Laspeyres index gives an upper bound to $P_{01}(U_0)$. The Laspeyres index shows the change in expenditure necessary to buy the same quantities of all goods at $t_1$ as those actually bought at $t_0$. However, this corresponding expenditure at $t_1$ may under the new price structure be disposed of differently so as to give a level of utility higher than $U_0$. The utility $U_0$ would then be attainable with a lower expenditure, and the indifference defined index for $U_0$ would be lower than the Laspeyres index.

In a similar way it is possible to show that a Paasche index gives a lower limit to a constant utility index for the utility level $U_1$. Thus,

$$P^L_{01} \ge P_{01}(U_0) \tag{3a}$$

and

$$P^P_{01} \le P_{01}(U_1). \tag{3b}$$

However, since without further assumptions nothing is known about the relation between $P_{01}(U_0)$ and $P_{01}(U_1)$, these rules do not, contrary to what has sometimes been believed, give upper and lower limits for the same index.
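The two bounds can be illustrated numerically under a strong simplifying assumption that is not made in the text: a consumer with Cobb-Douglas preferences, for whom the constant utility index is the same at every utility level and equals the geometric mean of the price ratios weighted by the (constant) budget shares. In that special case the Laspeyres and Paasche indexes do bracket a single index. The shares, prices, and incomes below are hypothetical.

```python
from math import prod


def demand(alpha, prices, income):
    """Cobb-Douglas demands: a fixed share alpha_i of income is spent on good i."""
    return [a * income / p for a, p in zip(alpha, prices)]


def value(prices, quantities):
    """Total value of a basket: sum of price times quantity."""
    return sum(p * q for p, q in zip(prices, quantities))


def constant_utility_index(alpha, p0, p1):
    """Cost-of-living index under the assumed Cobb-Douglas preferences."""
    return prod((b / a) ** w for w, a, b in zip(alpha, p0, p1))


# Hypothetical budget shares (summing to one), prices, and incomes.
alpha = [0.5, 0.3, 0.2]
p0, p1 = [10.0, 4.0, 2.0], [11.0, 5.0, 1.5]
q0 = demand(alpha, p0, income=100.0)
q1 = demand(alpha, p1, income=110.0)

laspeyres = value(p1, q0) / value(p0, q0)
paasche = value(p1, q1) / value(p0, q1)
true_index = constant_utility_index(alpha, p0, p1)

print("Paasche:", paasche)
print("constant utility index:", true_index)
print("Laspeyres:", laspeyres)
```

With preferences that are not homothetic the two constant utility indexes generally differ, and, as noted above, the bracketing no longer applies to a single index.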

Several attempts have been made to arrive at simultaneous upper and lower limits. Thus, Staehle (1935) tried to find a utility level $U'$ such that the set of quantities corresponding to $\mu(t_1, U')$ would cost as much at $t_0$ prices as $\mu(t_0, U_0)$. Since clearly $U' \le U_0$ (the budget was available at $t_0$ but was not chosen), $\mu(t_1, U') \le \mu(t_1, U_0)$ and

$$\frac{\mu(t_1, U')}{\mu(t_0, U_0)} \le \frac{\mu(t_1, U_0)}{\mu(t_0, U_0)} = P_{01}(U_0).$$

Thus, this index constitutes a lower limit to $P_{01}(U_0)$, and since $P^L_{01}$ gives an upper limit, the desired result is obtained. There remains, however, the problem of determining $\mu(t_1, U')$. Under certain circumstances it can be determined from family budget data.

Ulmer (1949) took a quite different approach, which can be described as follows. Let $D_U$ be the difference between $P_{01}(U_0)$ and $P_{01}(U_1)$. It could be positive or negative. The difference between the Laspeyres and Paasche indexes can then be written

$$P_L - P_P = [P_L - P(U_0)] + [P(U_1) - P_P] + [P(U_0) - P(U_1)] = D_L + D_P + D_U,$$

where the subscript “01” on the $P$’s has been dropped for convenience. Using this identity together with the inequalities (3a) and (3b), it can be shown that

$$P_L - D_L - D_P \le P(U_0) \le P_L$$

and that

$$P_P \le P(U_1) \le P_P + D_L + D_P.$$

Hence, both constant utility indexes are given upper and lower bounds.

Although $(D_L + D_P)$ is not directly observable, $(D_L + D_P + D_U) = (P_L - P_P)$ is, and Ulmer argued that it is usually reasonable to suppose that

$$\max_t\, (D_L + D_P) \le \max_t\, (D_L + D_P + D_U)$$

and therefore that $\max_t\, (P_L - P_P)$ would be a conservative estimate of the difference between the upper and lower bounds on $P(U_0)$ or $P(U_1)$.

Point estimates. Several attempts have been made to arrive at a point estimate of a constant utility index by using family budget data. If such data were available for each period for which the index was to be calculated, criteria could be developed to find in each period a family or group of families with a given level of utility. By comparing their expenditures in different periods, an index could be calculated. These methods have not been used in practice.

An approximation to a constant utility index has been found by Theil (1965). Following Theil, it may be shown that

$$\frac{\partial \log \mu(t, U)}{\partial \log p_{it}} = c_i(t),$$

where $c_i(t)$ is, as before, the value share of commodity $i$ in the total budget at time $t$. Using this relation and applying the Taylor expansion to $\log \mu(t, U_0)$ as a function of $\log p_{1t}, \log p_{2t}, \dots, \log p_{nt}$, we obtain (keeping terms up to the second degree) the relation

which gives, for example,

This is very similar to formula (2), which was obtained by purely statistical reasoning.

Thus, the economic and statistical lines of thought point to similar formulas. There is, however, still very little interaction between the two approaches.

Erik Ruist

## BIBLIOGRAPHY

Divisia, François 1925 L’indice monétaire et la théorie de la monnaie. Revue d’économie politique 39:842-861.

Fisher, Irving (1922) 1927 The Making of Index Numbers: A Study of Their Varieties, Tests, and Reliability. 3d ed., rev. Boston: Houghton Mifflin.

Frisch, Ragnar 1936 Annual Survey of General Economic Theory: The Problem of Index Numbers. Econometrica 4:1-38.

International Statistical Institute 1956 Bibliographie sur les nombres indices; Bibliography on Index Numbers. The Hague: Mouton.

Kloek, T.; and De Wit, G. M. 1961 Best Linear and Best Linear Unbiased Index Numbers. Econometrica 29:602-616.

Kloek, T.; and Theil, H. 1965 International Comparisons of Prices and Quantities Consumed. Econometrica 33:535-556.

Konüs, A. A. (1924) 1939 The Problem of the True Index of the Cost of Living. Econometrica 7:10-29. → First published in Russian.

Staehle, H. 1935 A Development of the Economic Theory of Price Index Numbers. Review of Economic Studies 2:163-188.

Theil, H. 1960 Best Linear Index Numbers of Prices and Quantities. Econometrica 28:464-480.

Theil, H. 1965 The Information Approach to Demand Analysis. Econometrica 33:67-87.

Törnqvist, Leo 1937 Finlands Banks konsumtionsprisindex. Nordisk tidskrift for teknisk økonomi 8:79-83.

Ulmer, Melville J. (1949) 1950 The Economic Theory of Cost of Living Index Numbers. New York: Columbia Univ. Press.

## II. PRACTICAL APPLICATIONS

The search for a measure of the effect on the purchasing power of money of the influx of precious metals into Europe after the discovery of America resulted in the first index number of price changes, as far as we know today. In 1764 an Italian nobleman, Giovanni Rinaldo Carli, calculated the ratios of prices for three commodities—grain, wine, and oil—for dates close to 1500 and 1750. A simple average of these three ratios constituted his measure of the price change that had occurred over the 250-year period. This idea of isolating the effect of price changes in the measures of value changes in economic life has been a dominant and continuing theme in the development and use of index numbers.

Measures of changes in prices and changes in quantities have become a familiar and useful part of current economic life. In most countries of the world official agencies now issue regular reports on one or more of the following kinds of index numbers: wholesale prices, retail prices (often called cost-of-living), prices of goods in foreign trade, quantities of goods produced, and quantities of goods in foreign trade. These indexes are frequently supplemented with indexes for domestic trade and for specialized types of goods, such as agricultural commodities and raw materials.

In most cases these official indexes are designed as general-purpose indexes, and their availability leads to their use for many and varied purposes. “General-purpose” indexes are at variance with the first principle advocated by many serious students of the making of index numbers, stated by Wesley C. Mitchell as “defining the purpose for which the final results are to be used” ([1915] 1938, p. 23). Irving Fisher, who examined various methods of computing index numbers, disagreed with this principle and thought that “…from a practical standpoint, it is quite unnecessary to discuss the fanciful arguments for using ‘one formula for one purpose and another for another,’ in view of the great practical fact that all methods (if free of freakishness and bias) agree!” ([1922] 1927, p. 231). Melville Ulmer and others put forth the point of view, now generally accepted in principle by most economists, that the making of index numbers should be tied to economic concepts and that these concepts should be expressed in operational terms (Ulmer 1949, pp. 23-24). But despite the massive amount of discussion and the long history of index number practice, the empirical difficulties of closing the gap between theory and practice have not been overcome in many cases.

This article is concerned with some of the practical aspects of index number making—the general characteristics of indexes, the kinds of data employed, and the problems and difficulties encountered because we do not live in a static economy. It is intended as a nontechnical guide to those index number practices that an index user should review to help determine whether a specific index is the appropriate one for his purpose and is likely to be an adequate approximation for the use he intends (or more realistically, what limitations are likely to result from the use of the only index available). Although the discussion is oriented largely to price indexes, the major problems and procedures also apply to quantity indexes.

### Index designations

Most of the indexes of prices labeled “wholesale” refer to the primary market level—the prices charged by manufacturers or producers to wholesalers and other buyers. In a few countries these indexes relate to the wholesale level of distribution.

Indexes of changes in retail prices are popularly referred to as cost-of-living indexes. However, the indexes in all countries are basically measures of price change (with minor variations), and official titles attempt to indicate this. In France the index is called the general retail price index; in the Federal Republic of Germany, the price index of living; in the United Kingdom, the index of retail prices. Although the name “cost-of-living index” is still retained by some countries, the most common name is “consumer price index.” This name grew out of the controversy over the United States index during World War II, when it was an important factor in wage stabilization. “Consumer price index” was adopted to help clarify interpretation of what the index measured.

The names for other indexes may vary to a minor degree, but in general they are self-explanatory: e.g., index numbers of industrial production; index numbers of the volume of wholesale (or retail) trade; quantity and unit value of commodities in external trade.

### Calculation formulas

In most countries, indexes of wholesale and retail prices are measures of price changes with “fixed” quantity weights as of some earlier period; that is, the calculation framework is that of a Laspeyres formula, (1) below, or some modification of it [see Index Numbers, article on Theoretical Aspects]. The use of this form for approximating changes in living costs is particularly controversial, although the practical difficulties of translating a “welfare” or “utility” concept into practice are generally recognized. The use of a consumer price index with fixed weights for the escalation of wages or for determining wage policy is an attempt to maintain the real purchasing power of the workers’ dollar and might be interpreted as providing the income required to maintain the base-period utility level. However, the base-period utility level can be attained without keeping the kinds and quantities of goods fixed. When there are differential changes in prices among items, consumers are likely to buy a different collection of goods. If they increase their purchases of products whose relative prices have declined, escalation according to a fixed weight index will allow an increase in utility.

Whatever the limitations of a consumer price index on a conceptual basis, the present systems of index numbers do provide a measure of the average changes in prices for the quantities of the earlier year. The corresponding quantity indexes provide a measure of average changes in quantities at fixed prices, those of the base year. The relationships are

$$\text{price index} = \frac{p_t\, q_0}{p_0\, q_0}$$

and

$$\text{quantity index} = \frac{q_t\, p_0}{q_0\, p_0},$$

where $p$ and $q$ denote price and quantity, the subscript 0 denotes the base period, and the subscript $t$ denotes the current period.

These simplified forms refer only to an individual commodity. When they are expressed for aggregations of commodities, the Greek capital sigma (Σ) is used to indicate the sums, and the formulas for the Laspeyres indexes become

$$P^L = \frac{\sum p_t\, q_0}{\sum p_0\, q_0}, \qquad Q^L = \frac{\sum q_t\, p_0}{\sum q_0\, p_0}. \tag{1}$$

The denominator of both expressions is the total value in the base period of the aggregation of goods in the index. The numerator of the price index is the value of the base-period goods at current prices. The numerator of the quantity index is the value of current quantities at base-period prices.

The corresponding forms for Paasche indexes, which use current weights, are

$$P^P = \frac{\sum p_t\, q_t}{\sum p_0\, q_t}, \qquad Q^P = \frac{\sum q_t\, p_t}{\sum q_0\, p_t}.$$

The numerator of these expressions is the total expenditure in the current period. Since it is unusual to have separate quantity data available to use in these formulas for every item in the index, an algebraic equivalent (which is in effect a weighted average of relatives, or price ratios) is generally employed, to permit the use of value data: e.g., the Laspeyres price index becomes

$$P^L = \frac{\sum (p_0\, q_0)\,(p_t / p_0)}{\sum p_0\, q_0}.$$

Since it is also unusual for the values of consumption, production, or other variables to relate exactly to the period selected as a reference base for the index series, many indexes use the following form for a price index with fixed quantity weights:

$$P = \frac{\sum (p_a\, q_a)\,(p_t / p_a)}{\sum (p_a\, q_a)\,(p_0 / p_a)} = \frac{\sum p_t\, q_a}{\sum p_0\, q_a},$$

where the subscript $a$ refers to the date for which the values are available.
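The equivalence between the aggregate form of the Laspeyres price index and the weighted average of price relatives can be verified with a few lines of Python. The figures are hypothetical; the second function needs only base-period values and price relatives, which is the practical advantage described above.

```python
from math import isclose


def laspeyres_from_quantities(p0, pt, q0):
    """Aggregate form: base-period basket valued at current and at base prices."""
    return (sum(p * q for p, q in zip(pt, q0))
            / sum(p * q for p, q in zip(p0, q0)))


def laspeyres_from_values(base_values, relatives):
    """Weighted average of price relatives, with base-period values as weights."""
    return sum(v * r for v, r in zip(base_values, relatives)) / sum(base_values)


# Hypothetical base-period prices and quantities, and current prices.
p0 = [10.0, 4.0, 2.0]
pt = [11.0, 5.0, 1.5]
q0 = [5.0, 10.0, 20.0]

# Data as an index compiler might actually hold them: values and relatives only.
base_values = [p * q for p, q in zip(p0, q0)]   # expenditure on each item
relatives = [b / a for a, b in zip(p0, pt)]     # price ratio for each item

print(100 * laspeyres_from_quantities(p0, pt, q0))
print(100 * laspeyres_from_values(base_values, relatives))
```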

These formal calculation formulas are further modified in practice to accommodate changes in samples of items, sources of reports, incomplete data, and similar unpredictable problems of non-comparability. The usual procedure, called linking, is to calculate an index for each period, with the preceding period as a base, and multiply successive indexes together to obtain an index on a fixed base.

#### The reference-base period

Index numbers have two kinds of base dates: the date to which the consumption or production weights refer (date a in the preceding formula) and the date on which the price-change comparisons are based (date 0 in this formula). The former may be called the weight base, and the latter the reference base. It is customary to set the index equal to 100 at the reference-base date.

The initiation of index numbers and the deep interest in their fluctuations indicate that the needs for such economic measures were engendered by unusual periods of economic activity, such as inflations, depressions, and wars. This is probably the reason why there historically has been so much emphasis on the choice of a “normal” period for a reference base. Such a choice is extremely difficult because few, if any, periods can be said to be normal for all segments of the economy. The emphasis is perhaps justified to the extent that an index series is more meaningful when the weight base and the reference base are identical or not widely separated in time. In practice compromises must be made, and the two base dates seldom correspond exactly.

For the most part indexes are more reliable for short-period comparisons and, theoretically, more reliable for periods close to the weight base. But because of the extensive resources required to keep the weights continuously up to date and because in practice moderate weight changes have less influence on an index than have price changes, a common practice is to change the reference-base period more frequently than the weighting structure.

An index with a fixed reference base makes direct comparisons between the base selected and any one of the succeeding dates. Comparisons with any period other than the reference base require conversion to a new base. Such conversions, essentially, are made every time a percentage change is obtained, as shown in Table 1.

If the conversions are done correctly, percentage changes from one date to another will always be the same for the original series and for the converted series (except for minor rounding differences, as shown in two instances in Table 1).
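The conversion of an index to a new reference base amounts to dividing every value by the value at the new base date and multiplying by 100. The sketch below applies this to the original series of Table 1 (1950 = 100); the function names are illustrative only.

```python
def rebase(series, new_base_year):
    """Convert an index series so that it equals 100 at `new_base_year`."""
    base_value = series[new_base_year]
    return {year: 100.0 * v / base_value for year, v in series.items()}


def pct_changes(series):
    """Per cent change of each date from the preceding date in the series."""
    years = sorted(series)
    return {cur: 100.0 * (series[cur] / series[prev] - 1.0)
            for prev, cur in zip(years, years[1:])}


# Original series from Table 1, reference base 1950 = 100.
original = {1950: 100.0, 1955: 110.0, 1960: 115.0, 1965: 120.0}

for base_year in (1955, 1960):
    converted = rebase(original, base_year)
    print(base_year, {y: round(v, 1) for y, v in converted.items()})

print({y: round(c, 1) for y, c in pct_changes(original).items()})
```

Computed from unrounded values, the percentage changes of the converted series coincide with those of the original series; small discrepancies appear only when the converted indexes are rounded before the changes are taken.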

In about half the countries that maintain retail price indexes the practice has been to select a reference-base period reasonably close to the weight base and to change the reference base when the indexes are revised. This has the advantage of alerting the user to the fact that revisions have been made and of getting him to consider whether such revisions are so important that the indexes before and after the revision cannot be considered comparable. In Hungary the index is computed with the preceding year as base, and in the United Kingdom with January of each year as base. In the United States the present policy is to update the reference base about every ten years and to convert most economic index series to a uniform base to facilitate their use in economic analysis.

Table 1 - Example of the conversion of an index to a new reference base

| Year | Index (original base, 1950 = 100) | Per cent change from preceding date | Index (converted base, 1955 = 100) | Per cent change from preceding date | Index (converted base, 1960 = 100) | Per cent change from preceding date |
|---|---|---|---|---|---|---|
| 1950 | 100.0 | | 90.9 | | 87.0 | |
| 1955 | 110.0 | 10.0 | 100.0 | 10.1 | 95.7 | 10.0 |
| 1960 | 115.0 | 4.5 | 104.5 | 4.5 | 100.0 | 4.5 |
| 1965 | 120.0 | 4.3 | 109.1 | 4.4 | 104.3 | 4.3 |

Uniform base periods for various economic series and for various components of the same index must be used with some discrimination, in order to present facts in proper perspective. Since there is seldom one period that can be considered “normal” for all segments of an economy, conversion of indexes to other periods for supplementary exposition is often required. For example, the change in the base period for United States indexes after World War II gave rise to complaints that the consumer price index gave a distorted picture. A postwar reference base (1947-1949 average) was substituted for a prewar base (1935-1939 average) at a time when the price rise for the commodity sector was leveling off but the rise for services, prices for which had remained fairly stable during the war, was beginning. The postwar base period thus highlighted a major price rise for medical care and other services throughout the 1950s. A fairer picture of the changes for commodities relative to services was afforded by comparisons with prewar prices. The rise from 1939 for services did not equal that for commodities until 1962.

#### Weights

The system of weights for price indexes relates to the level of distribution for which measures are desired.

For wholesale price indexes, censuses and similar surveys are utilized to derive the total value of sales, exclusive of taxes, for all commodities produced or processed by the private sector of the economy and sold in primary markets. In some cases both imports and exports are included. In some countries weights are limited to sales of goods for domestic consumption, thus including imports but not exports. Generally the principal exclusions from the “universe” covered by wholesale price indexes are business services, construction and real estate, sales by government, military production, securities, and goods produced and consumed within the same plant.

For consumer price indexes, weights representing the importance of individual goods and services are usually derived from special surveys of expenditures by the groups in the population for which price changes are measured, e.g., urban wage earners, farmers, low-income families, families of two or more, single consumers, etc. Generally the weights include all taxes directly associated with the purchase or ownership of specific goods and services, such as sales and excise taxes, property taxes, car registration fees, and the like. The principal exclusions are direct taxes, such as income taxes; expenditures for investments; contributions to churches and other organizations; and goods and services received without direct cash outlay, such as gifts received, home-produced foods, fringe benefits paid for by employers, and services supplied by government agencies without payment of a special tax or fee.

Indexes of industrial production generally use weights of the “census value-added” type; that is, the weights are usually proportional to value added at factor cost in different industries, as given by census data, and are derived by reducing values of gross output by costs of raw materials, fuels, containers, industrial services, etc. In a few countries weights represent gross value of production.

#### Price data

The sample of commodities and services priced regularly for the computation of price indexes is usually selected with considerable care, so that the items are well distributed among the major classification groups. The number of individual items priced varies considerably from country to country but in most cases is large enough to provide fairly reliable indexes, for groups and subgroups, of commodities and services.

Prices for the selected items in the wholesale price indexes are obtained from a sample of manufacturers or other producers and refer to the form in which the item enters commercial markets. Thus, as raw materials are processed into semifinished or finished goods, prices at each successive stage of processing may be included if the product is sold in primary markets in that form. Prices are usually reported for a precise specification, a specific class of buyer, and a specific level of distribution, quantity of purchase, and set of delivery terms. Prices are usually net of discounts, allowances, and excise taxes. Practices vary with regard to transportation costs; in the United Kingdom “delivered” prices are used for imported goods purchased by industry and “ex-works” prices for domestically produced goods; in the United States prices exclude delivery costs unless it is the normal custom to quote on a delivered basis.

Price data for consumer price indexes are usually obtained from a sample of stores and service establishments in a sample of communities that provides a good geographical coverage of the country. In a few cases the index is confined to one city or a few large cities. The prices quoted are generally cash prices for goods as offered for sale to the consumer. Usually all sales and other taxes applicable to the purchase of the specific item are added to the quoted price. In the United States and a few other countries, concessions and discounts are deducted.

The care with which data are gathered determines in large measure whether a price index for a particular period is good, bad, or indifferent. The essence of the collection process is to obtain “comparable” prices for successive periods, so that changes in the index refer to price changes only, not to a mixture of price, quality, and marketing changes. Comparability of outlets or producers is obtained by using a matched sample for each two successive time periods.

The description or specification of the individual quality of the item for which data are requested plays a key role. The precision of the descriptions used by various countries differs in degree, but the principle of comparability is generally adhered to. For Ireland’s consumer price index, for example, the requirement is that the item priced conform to a general commodity description and that it be in substantial demand in the area for which prices are being reported. Comparability is achieved by obtaining prices of the identical item in the same store for two successive periods. In most countries, however, detailed descriptions or specifications are developed to identify the quality of the item to be priced. These detailed descriptions define quality in terms of physical features, such as the kind and grade of materials, parts, construction and workmanship, size or capacity, strength, packaging, and similar factors or identification characteristics. The item is described as it enters into transactions in the market, and the assumption is made that the physical makeup determines performance characteristics.

The specifications adopted generally allow some latitude for minor variations in quality, in recognition of the many small differences from firm to firm or store to store. Within limits explicitly stated, all articles are considered comparable, and the price of the one specific item sold in largest volume each period is usually reported. Substitutions from time to time of items that fall outside the stated quality limits require special comparison procedures (as discussed for quality change below).

#### Major problems and limitations

There are many problems in the making of index numbers that materially affect the precision with which they can be applied. The unpredictable difficulties that occur as a regular part of the collection-and-comparison process were referred to briefly above. The major problems and limitations pervading most indexes are discussed in the following paragraphs.

Definition of “universe.” One limitation that has relevance to the uses made of the indexes is the definition of the “universe” to which the indexes relate, that is, which segment of the population, which categories of business, etc. Retail or consumer price indexes for city families may be inappropriate for estimating the change in prices paid by farm families, not only because farm families may purchase their goods in different places but also because food and housing are a less important part of their expenditures than of the expenditures of city families. An index of wholesale prices of commodities, regardless of how good a measure it is, tells nothing about changes in other costs of doing business, such as wages and salaries, costs of printing, advertising, and other business services. Nor can it be assumed that wholesale prices in commercial markets are a good indicator of changes in prices paid by governments, because of special contract arrangements typical of government purchases. In such circumstances, when lack of an appropriate series forces the use of an index that is available, the user must evaluate its limitations for the specific purpose.

Sampling error. Sampling of some kind is a requirement for practically all indexes. Sampling, as opposed to complete coverage, introduces the familiar sampling error, a difficult factor to measure. In the absence of measures of sampling error, we can only judge precision intuitively, by knowing the composition of the sample. Although measurement of sampling error was being attempted in the United States for its consumer price index [see Index Numbers, article on Sampling], actual measures were still lacking. It is probable that understatements and overstatements of price change are quite small for the comprehensive “all-items” indexes in most countries but are larger at the group and subgroup levels.

Quality changes. The problems that occupy the greater part of the time and effort of those who compile indexes are the identification and measurement of the effect of quality changes in the items in the market, the introduction of new items and variations and disappearance of old, changes in the importance of various products and services from one time to another, changes in the types of establishments through which goods flow during the marketing process, and many of the other facets of change in an economy that is not standing still. The measurement instrument must seek to disentangle the changes in prices from the effects on price of all the other changes that take place.

The problems of eliminating the effect of quality changes are particularly difficult, and the extent to which adjustments are made for them varies considerably from country to country. Where they are taken into account, market price valuations for quality changes are generally obtained in one of three ways: (1) by assuming that the price difference is all due to quality change; (2) by estimating the value of the quality difference associated with changes in physical characteristics; or (3) by estimating the value of quality changes through operating characteristics.

When two varieties of an item are selling in volume simultaneously, the assumption that a difference in price between them is entirely due to a quality difference is realistic and reasonable. When two or more varieties do not sell simultaneously, it is uncertain whether a price difference results entirely from price change, entirely from quality change, or from a mixture of both. Thus, for automobiles and other highly complex products, the prices of new models are seldom compared directly with those of the old, since it is a common market practice to introduce price changes simultaneously with model changes. In these cases producers and sellers aid in identifying the changes made in the physical characteristics and provide production costs or estimated market prices, to permit the development of quality adjustment factors. In a few countries the adjustments for quality changes for some of the complex products are applied to one or more operating or use characteristics. For example, for turbines, in the Soviet Union the price or cost per unit of potential power is obtained. In Sweden the estimate of the relative worth of different automobile models is based on results of engineering, road, and other tests.
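Two of the adjustments described above can be illustrated with a minimal sketch. The function names and all prices are hypothetical, and actual agency procedures are considerably more detailed. The first function treats the whole price difference between two simultaneously sold varieties as quality. The second removes an estimated value of the quality difference (for example, one supplied by producers) before comparing a new model with the old one.

```python
def overlap_quality_ratio(old_price, new_price):
    """Quality ratio when both varieties sell in the same period.

    The entire price difference is attributed to quality, so substituting
    the new variety for the old one does not by itself move the index.
    """
    return new_price / old_price


def adjusted_relative(old_price_prev, new_price_now, quality_value):
    """Price relative after removing an estimated quality value.

    quality_value is the estimated market value, in currency units, of the
    change in physical characteristics between the old and new models.
    """
    return (new_price_now - quality_value) / old_price_prev


# Case 1: overlap. Old variety 10.00 and new variety 12.00 in the same period.
quality_ratio = overlap_quality_ratio(10.00, 12.00)
# From then on only the new variety is priced; one period later it costs 12.60.
index_after_substitution = 100.0 * (12.60 / 12.00)

# Case 2: no overlap. Old model 2000 last period, new model 2100 this period,
# of which 60 is judged to be the value of the improvement (assumed figure).
model_change_relative = adjusted_relative(2000.0, 2100.0, 60.0)

print(quality_ratio, index_after_substitution, model_change_relative)
```

In the first case the index records a 5 per cent rise, not the 26 per cent that a direct comparison of 12.60 with 10.00 would show. In the second case the recorded rise is 2 per cent rather than 5 per cent.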

Considerable ingenuity and experimental work have been devoted to the quality problem in index numbers. But the very elusive nature of the quality concept, combined with the difficulties of detecting quality improvement and deterioration and of deriving objective values for these changes, means that considerable judgment and discrimination must be exercised. In some cases it is likely that the mechanics of the index make too large an adjustment for quality changes, and since most of the changes are labeled “improvement,” some downward bias may be introduced. On the other hand, it is also probable that insufficient allowance has been made for quality improvement of other items. In any one monthly, quarterly, or annual interval, the total index will be made up largely of commodities and services that are unchanged in quality, and the effect of incomplete measurement for individual changes up and down is likely to be unimportant. For longer periods of time the influence may be greater, particularly for specific goods, where small quality changes cumulate from year to year and may not be detected. An evaluation of how much the factor of quality change may have influenced the movement of an index must be based on fairly detailed knowledge of both the timing and degree of quality change during the period under consideration and of the way in which they were accounted for in the index calculation.

New products. Closely related to the quality problem is the problem of timing the introduction of new products. Modifications or new varieties of older products are generally put into an index, when they have been in the market long enough to sell in substantial volume, by adjusting for quality change in the manner indicated above. But truly new products—those, such as television, that have no earlier counterpart—are generally introduced into an index only at major revisions, when new weights are available to reflect their impact throughout the weighting system. New products frequently enter the market in small volume at relatively high prices. As production and sales volume increase, price reductions are generally made. Some upward bias in an index can result if new items are introduced after major price reductions have occurred. However, the specific timing for introducing new items and the method of handling volume changes in the weights are still matters of some disagreement between index technicians.

Long-term comparisons and revision. Here the phrase “long-term comparisons” is used to mean comparisons over a period that encompasses a revision of an index. These present special problems. Theoretically, revisions in the conceptual structure, coverage, system of weights and/or operational aspects of index construction result in an index different from the previous one. If, however, an agency always presented a revised index as a new measure, individual users would have to provide some kind of a bridge from one index to the other to obtain a long-term perspective. Consequently, the issuing agencies usually “chain” the different indexes together to form a seemingly continuous series. This practice is common in most countries and provides many advantages. But users must also recognize, through a study of the changes in the makeup of the two or more separate indexes that have been chained together, the limitations involved in such comparisons.
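The chaining operation itself is simple arithmetic, as the following sketch suggests. It assumes that the old and the revised index are both available for one common link period; the function name `chain` and all figures are hypothetical. The revised index is scaled by the ratio of the two indexes in the link period, so the joined series is continuous in level even though its two segments rest on different weights and coverage.

```python
def chain(old_series, new_series, link_period):
    """Join a revised index to its predecessor at a common link period.

    The revised index is multiplied by old/new at the link period, so that
    the chained series continues at the level reached by the old index.
    """
    factor = old_series[link_period] / new_series[link_period]
    chained = dict(old_series)
    for period, value in new_series.items():
        if period > link_period:
            chained[period] = value * factor
    return chained


# Hypothetical figures: old index (1960 = 100), revised index (1962 = 100).
old_index = {1960: 100.0, 1961: 102.0, 1962: 104.0}
revised_index = {1962: 100.0, 1963: 101.5, 1964: 103.0}

chained_index = chain(old_index, revised_index, link_period=1962)
for year in sorted(chained_index):
    print(year, round(chained_index[year], 2))
```

The scaling preserves every percentage change within each segment. It does nothing to reconcile the differences in weights, coverage, or concept between the two indexes, which is the limitation users are asked to keep in mind.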

The retail price index of the United Kingdom and the consumer price index of Sweden are examples of indexes with annual changes in weights. Technically speaking, the index for each year is different from the index for the preceding year because the weights represent a different level of living. Studies of the effect of changes in weights on the index indicate that year-to-year comparisons are so nearly the same with the old and the new weights that for all practical purposes the effect of the weight differences can be ignored for comparisons over two or three years. Over a period of five or ten years the differences may be more significant, since the continuous series includes the net effects of changes in the distribution of living expenditures over the longer period. In the United Kingdom it is felt that short-term comparisons are the most important to index users. In the United States, the importance of short-term comparisons is recognized but it is felt that the available resources should be devoted to maintaining the adequacy of current price data. The net effects of changes in living habits in the United States over approximately ten years are introduced at one time, during a major revision, rather than in smaller increments. Hence, the problems to be considered by users over periods of more than ten years are the same in both countries.

A more serious question would be raised if the conceptual structure of an index were changed. If a consumer price index were revised to measure the constant level of utility defined in economic theory or if weights at wholesale were changed from total value of shipments at each stage of processing to “value added” weights, such changes would have major influences on the index, and “long-term” comparisons would be practically meaningless.

A practical guide to users on the effect of changes made during revisions is usually provided by issuing agencies, in the form of concurrent indexes on both the old and new basis, either through continuation of the old after the new has been issued or (less frequently) through the recalculation retroactively of the preceding index.

Ethel D. Hoover

## BIBLIOGRAPHY

Carli, Giovanni (1760) 1785 Del valore e della proporzione de’ metalli monetati con i generi in Italia prima delle scoperte dell’ Indie, col confronto del valore e della proporzione de’ tempi nostri. Volume 7, pages 1-190 in Giovanni Carli, Delle opere. Milan: Nell’ Imperial Monistero di S. Ambrogio Maggiore.

Definitions and Explanatory Notes. 1964 United Nations, Statistical Office, Monthly Bulletin of Statistics [1963], no. 5 (Supplement). → This supplement is a good general reference for descriptions of the statistical series for the various countries of the world in the Monthly Labor Bulletin. The descriptions are necessarily brief but include the main features, as well as a reference to the various national publications, where greater detail may be found.

Fisher, Irving (1922) 1927 The Making of Index Numbers: A Study of Their Varieties, Tests, and Reliability. 3d ed., rev. Boston: Houghton Mifflin.

Gilbert, Milton 1961a Quality Changes and Index Numbers. Economic Development and Cultural Change 9:287-294.

Gilbert, Milton 1961b The Problem of Quality Changes and Index Numbers. U.S. Bureau of Labor Statistics, Monthly Labor Review 84:992-997.

Gilbert, Milton 1962 Quality Change and Index Numbers: The Reply. U.S. Bureau of Labor Statistics, Monthly Labor Review 85:544-545.

Griliches, Zvi 1962 Quality Change and Index Numbers: A Critique. U.S. Bureau of Labor Statistics, Monthly Labor Review 85:542-544.

Hofsten, Erland Von 1952 Price Indexes and Quality Changes. Stockholm: Forum.

Hoover, Ethel D. 1961 The CPI and Problems of Quality Change. U.S. Bureau of Labor Statistics, Monthly Labor Review 84:1175-1185.

International Labor Office 1962 Computation of Consumer Price Indices: Special Problems. Report No. 4. Geneva: The Office.

Mitchell, Wesley C. (1915) 1938 The Making and Using of Index Numbers. 3d ed. U.S. Bureau of Labor Statistics, Bulletin No. 656. Washington: Government Printing Office. → The preface contains a discussion of the 1915 and 1921 editions.

Mudgett, Bruce D. 1951 Index Numbers. New York: Wiley.

Organization for European Economic Co-operation 1956 Quantity and Price Indexes in National Accounts, by Richard Stone. Paris: The Organization.

Ulmer, Melville J. 1949 The Economic Theory of Cost of Living Index Numbers. New York: Columbia Univ. Press.

United Nations, Economic and Social Council 1965 The Gathering and Compilation of Statistics of Prices. E/CN.3/328. Mimeographed.

U.S. Bureau of the Census 1960 Historical Statistics of the United States, Colonial Times to 1957: A Statistical Abstract Supplement. Washington: Government Printing Office.

U.S. Bureau of the Census 1965 Historical Statistics of the United States, Colonial Times to 1957: Continuation to 1962 and Revisions. Washington: Government Printing Office.

U.S. Bureau of Labor Statistics 1964 Computation of Cost-of-living Indexes in Developing Countries. Bureau of Labor Statistics, Report No. 283. Washington: The Bureau.

U.S. Congress, Joint Economic Committee 1961 Government Price Statistics. 2 parts. Hearings before the Subcommittee on Economic Statistics. Washington: Government Printing Office. → Part 1 contains the report prepared by the Price Statistics Review Committee of the National Bureau of Economic Research, The Price Statistics of the Federal Government: Review, Appraisal, and Recommendations. Part 2 contains the comments of witnesses before the Joint Economic Committee on the contents of the report.

## III. SAMPLING

No matter how one resolves the conceptual and practical problems described in the two accompanying articles, the actual construction of an index number will almost always be based on sampling. The quality of an index will, therefore, depend upon the nature of the sampling process. Although this dependence has long been recognized (see King 1930, for discussion and references to earlier work), the sampling aspects of index number construction have been relatively neglected.

This article emphasizes the following points: (1) most economic index numbers have a complex sampling structure; (2) the sampling precision of an index number can be defined, even though conceptual and practical problems are not fully solved; (3) estimates of sampling error are required both for the analytic use of index numbers and for reasonable allocation of resources in designing the data-gathering procedure on which the index number is based; and (4) most components of sampling error can be estimated only by use of replication.

In order to make the discussion specific, it is framed in terms of Laspeyres indexes of consumer prices, with special reference to the consumer price index of the United States Bureau of Labor Statistics. Nearly all the discussion, however, is readily applicable to other kinds of index numbers.

#### Sampling aspects of a price index

The various points at which sampling must be employed in order to provide data for constructing a consumer price index are easily identifiable.

A Laspeyres price index for an individual consumer can be viewed as a weighted average of price ratios for the commodities and services purchased by the individual, where the weights are proportions of total expenditures in the base period for the different items of the index. The weights may be called base year value weights. An index for a group of consumers, for example, those living in a particular city or geographic area, then involves the following major sampling problems: (1) average base year value weights must be estimated from a sample of consumers; (2) since it is impossible to price all of the goods and services purchased by all of the consuming units in the population, this list of goods and services must be sampled; and (3) an average price for any good or service, in either the base year or at a current point in time, must be estimated from a sample of outlets.
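The weighted-average form just described can be made concrete with a short sketch. The three items, their prices, and their quantities are invented for illustration, and the function names are not taken from any agency's system. The same index is computed in two ways: as a weighted average of price ratios using base year value weights, and as the ratio of the cost of the base year basket at current and at base prices.

```python
def base_year_value_weights(base_prices, base_quantities):
    """Each item's share of total expenditure in the base period."""
    spending = {item: base_prices[item] * base_quantities[item]
                for item in base_prices}
    total = sum(spending.values())
    return {item: amount / total for item, amount in spending.items()}


def laspeyres_from_ratios(base_prices, current_prices, weights):
    """Weighted average of price ratios, base year value weights."""
    return 100.0 * sum(weights[item] * current_prices[item] / base_prices[item]
                       for item in base_prices)


def laspeyres_from_basket(base_prices, current_prices, base_quantities):
    """Cost of the base year basket at current prices over its base cost."""
    current_cost = sum(current_prices[i] * base_quantities[i] for i in base_prices)
    base_cost = sum(base_prices[i] * base_quantities[i] for i in base_prices)
    return 100.0 * current_cost / base_cost


# Hypothetical consumer: prices per unit and quantities bought in the base year.
p0 = {"bread": 1.00, "milk": 2.00, "rent": 50.00}
q0 = {"bread": 100, "milk": 50, "rent": 4}
pt = {"bread": 1.10, "milk": 2.00, "rent": 52.00}

weights = base_year_value_weights(p0, q0)  # bread 0.25, milk 0.25, rent 0.50
index_ratios = laspeyres_from_ratios(p0, pt, weights)
index_basket = laspeyres_from_basket(p0, pt, q0)
print(round(index_ratios, 1), round(index_basket, 1))
```

Each ingredient of the calculation corresponds to one of the sampling problems listed above: the weights come from a sample of consumers, the list of items from a sample of goods and services, and each price from a sample of outlets.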

The foregoing sampling problems relate to the production of an index for a particular city or geographic region. If one wishes to construct an index that relates to a country, then it becomes necessary to select a sample of cities or regions and combine their individual results into an over-all index. Finally, prices must be collected at repeated points in time, and thus temporal sampling is involved.

### Some general views on sampling

Although price indexes are based on highly complex sampling structures, measures of sampling precision are not available for any of the currently prepared indexes. I understand, however, that the Bureau of Labor Statistics is planning to provide such information about the consumer price index starting in early 1967 (see Wilkerson 1964). Three related arguments have been set forth to justify the absence of such reporting.

The Laspeyres index follows the prices of a sample of goods and services through time. Because the universe of goods and services available to the consumer is continually changing (some items change in quality, others disappear, and new items enter the universe), it is necessary to make a variety of adjustments in the sample items and in observed prices. Since there exists no “best” procedure for making these adjustments, the index is subject to a procedural error. It is then argued that the sampling error is probably small in relation to the procedural error and that it is therefore neither necessary nor desirable to attempt to estimate its magnitude.

Because of the complexity of the adjustment procedures, it is frequently stated that it is impossible to define and estimate that portion of the sampling variability of an index that arises from the sampling of commodities. Hence it is impossible to define or estimate the sampling precision of the index itself.

A third argument admits that it might be possible to employ probability sampling for all components of a price index. But the great complexity of the design and data-gathering operations is then stressed, and the conclusion is reached that the attainment of this goal would require the use of more or less unlimited resources. These views have been expressed by Hofsten (1952, p. 42; 1959, p. 403) and Jaffe (1961), among others; and direct quotations from these articles are provided by McCarthy (1961, pp. 205-209).

#### Definition of sampling precision

The argument that it is impossible to discuss sampling precision because of the changing nature of the universe of commodities is clearly basic to a consideration of the other two arguments. Adelman (1958) seems to accept this view and, as a consequence, sets forth a method of index number construction that is more directly in line with modern sampling theory as described by Hansen, Hurwitz, and Madow (1953). She suggests periodic stratified resamplings of the changing commodity universe, together with the use of a chain index in place of the Laspeyres index. The meaningfulness of the Adelman approach to index number construction will not be argued here. Rather, it will be argued that it is quite reasonable to talk about the sampling precision of a Laspeyres index, provided (1) that a very general view of sampling precision, similar to that described by Stephan and McCarthy (1958, pp. 226-229), is adopted and (2) that one does not always expect to measure this precision by the application of standard formulas from the theory of sampling.

Replication to estimate sampling error. Assume the existence of a set of adjustment procedures that are used to follow a sample of goods and services through time, so that sampling variability arises only from the fact that a sample of items is selected at time zero. If one now thinks of drawing an indefinitely large number of independent samples in accordance with the same sampling procedure and of independently following each of these through to time t in accordance with the defined adjustment procedures, the resulting values of the index will define the sampling distribution of the index with respect to the sampling of items. The variance of this distribution is an acceptable measure of sampling precision for the index, and it includes a component for any inherent variability of the adjustment procedure. Furthermore, an estimate of this variance can easily be obtained by actually drawing two or more independent samples of items and independently following them through time, that is, through the use of replicated samples. It should be observed that the use of two independent samples, for example, does not mean that each sample must be as large as the desired over-all sample of commodities. Each sample may be only half as large as the over-all sample, and the published index would be the average of the two resulting indexes. Of course the reliability of the estimate of variance would improve as the number of independent samples increases. It should also be noted that in practice the independence of the samples would be difficult to preserve as time goes on.
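A minimal sketch of the replicated-sample calculation follows. It assumes that the replicate samples are independent and of equal size, that the published index is their simple average, and that the variance of that average is estimated in the usual way as the variance among replicates divided by their number. The function name and the index values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def replicated_estimate(replicate_indexes):
    """Published index and its estimated sampling variance.

    Assumes independent, equal-sized replicate samples of items, each
    followed through time with the same adjustment procedures.
    """
    k = len(replicate_indexes)
    if k < 2:
        raise ValueError("at least two replicates are needed")
    published = mean(replicate_indexes)
    # Sample variance among replicates (divisor k - 1), divided by k,
    # estimates the variance of the average of the k replicate indexes.
    variance_of_published = variance(replicate_indexes) / k
    return published, variance_of_published


# Two half-samples of commodities give indexes of 104.0 and 105.0 at time t.
published, var_hat = replicated_estimate([104.0, 105.0])
print(published, var_hat, sqrt(var_hat))
```

With only two replicates the estimate reduces to one quarter of the squared difference between the two indexes and rests on a single degree of freedom. This is why the reliability of the variance estimate improves as the number of independent samples increases.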

Bias. The measure of sampling precision just defined is obviously taken about the mean of the sampling distribution of the index. If the population value of this index at time t is denoted by R(t)P, where R(t)P would be obtained by applying the adjustment procedures to all commodities, then the difference between the expected value of the index and R(t)P is the bias of the estimate arising from the sampling and estimation procedures. If the selection were based on expert judgment, then such bias might arise because the experts might, consciously or unconsciously, fail to consider for selection items whose price behavior differs from that of the items they do consider. [See Sample Surveys, article on Nonprobability Sampling.]

Imperfection of adjustment procedures. In addition, one usually questions the adjustment procedures and therefore views R as only an approximation to the index that would be obtained through the use of a “perfect” adjustment procedure.

Three components of total error. As a result of the foregoing, the total error in a single estimate can be viewed as the sum of three components. The first component represents the error of variability that arises from the use of sampling (plus possible contributions from variability in applying adjustment procedures); the second component represents the bias arising from the sampling and estimation procedures; and the third component represents the bias that arises through inherent imperfection in the adjustment procedures. Other errors may of course arise from interviewing, in clerical work, or from computations; but these will not be treated in this article. [See Errors, article on Nonsampling Errors.]
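The decomposition can be written out as an identity. The notation here is introduced only for illustration: $\hat{R}$ stands for the index computed from the sample, $E[\hat{R}]$ for its expected value over repeated samples, $R_P$ for the population value obtained by applying the adjustment procedures to all commodities (the time superscript is dropped), and $R^{*}$ for the index that a “perfect” adjustment procedure would yield.

$$
\hat{R} - R^{*} \;=\; \underbrace{\bigl(\hat{R} - E[\hat{R}]\bigr)}_{\text{sampling variability}} \;+\; \underbrace{\bigl(E[\hat{R}] - R_P\bigr)}_{\text{bias of sampling and estimation}} \;+\; \underbrace{\bigl(R_P - R^{*}\bigr)}_{\text{bias of adjustment procedures}}
$$

Only the first term varies from sample to sample, so replication can estimate its variance but can reveal nothing about the other two terms.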

It would appear that at least some of the differences in opinion on sampling for index numbers can be traced to a failure to distinguish carefully among these three components of error, particularly between the first and third components. All writers agree that it is unlikely that anyone will ever be able to devise a “perfect” set of rules for treating quality changes and for introducing new items into the index, but this does not mean that it is impossible or unnecessary to estimate the values of all three components.

#### Importance of sampling error

Next consider the argument that this precision is dominated by the procedural error and can therefore be ignored. Some investigations reported by McCarthy (1961) suggest that the procedural error of current consumer price indexes may indeed dominate the sampling error, although empirical investigations of the over-all effect of procedural error are almost as lacking as those of sampling error. This does not necessarily mean that sampling error can be ignored. It remains important for several reasons, in particular:

(1) If the goal is to estimate the level of the “true” index at various points in time and if resources are fixed, then the most efficient way of improving the accuracy of these estimates would be to divert resources from the maintenance of a relatively large sample of commodities and to use these resources in basic research aimed at reducing the magnitude of the procedural error. It is clear that good estimates of sampling precision and of bounds on the procedural error are required in order to make judgments of this kind.

(2) If the goal is to estimate short-term changes in the level of the “true” index, then it appears likely that sampling error will be more important than procedural error and hence an estimate of sampling error becomes essential.

(3) The construction of a price index involves not only a set of adjustment procedures and the sampling of commodities but also the sampling of localities and the sampling of price reporters within these localities. There must be a balance between these errors and the sampling errors arising from the other parts of the design. Again it is impossible to discuss such a balancing operation unless some attempt is made to measure these components of error.

Reporting estimates of error. Estimates of error for the various components of a price index should also be available in published form to assist those who wish to use the indexes in a critical fashion. When one considers that small monthly changes in important indexes may lead to major policy decisions, and that these indexes are basic tools in much economic analysis, the necessity for having measures of error becomes apparent. Kruskal and Telser (1960) have emphasized the latter point.

### Probability sampling for index numbers

In order to guard against nonmeasurable biases from sampling and estimation, it seems reasonable that some appropriate form of probability sampling should be utilized in the selection of each sample that enters an index design. The selection of a sample of consumers, from which to estimate the base year expenditure weights, and the selection of a sample of cities or regions, from which to obtain current price data, should cause no more difficulty than is encountered in the ordinary large-scale sample survey. The sampling of goods and services does, however, pose an especially difficult problem. Nevertheless, the following are some of the convincing reasons for attempting to use probability sampling methods in the original selection of items: (1) The replicated sample approach can provide an estimate of sampling precision for almost any type of sampling procedure, but it cannot even indicate the existence of bias. The only way to ensure that biases due to sampling and estimation are small or nonexistent is to use appropriate probability sampling methods. (2) A probability model will make clear the manner in which one can obtain two or more independent samples of goods and services. (3) Even the mere attempt to make the sampling of goods and services conform to some appropriate probability model will force one to make definite decisions about problems of definition and estimation that exist no matter how such a sample is chosen but that can too easily be ignored with judgment procedures.

#### Probability sampling of goods and services

Although probability sampling of goods and services has not been the practice in the past, the general form that such procedures would probably take can be indicated. Items of expenditure would be divided into major groups, then into subgroups, sub-subgroups, and so on. Ultimately this subdivision process leads to what may be termed specific items; for example, one item might be mattresses for single beds. These specific items could be grouped into strata, using any available information about substitutability, similarity of price movements, and other related variables. The first sampling operation would then consist of selecting one or more specific items out of each stratum.
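The first sampling operation can be illustrated with a minimal sketch. It assumes, purely for illustration, that specific items are drawn within each stratum with probability proportional to base year expenditure; the text does not prescribe a selection rule. The function name `select_specific_items`, the stratum labels, and the expenditure figures are all hypothetical.

```python
import random


def select_specific_items(strata, n_per_stratum=1, seed=None):
    """Draw specific items from every stratum.

    strata maps a stratum name to a dict {specific item: base year expenditure}.
    Selection probability is proportional to expenditure (an assumption of
    this sketch, not a rule taken from the article).
    """
    rng = random.Random(seed)
    sample = {}
    for stratum, items in strata.items():
        labels = list(items)
        weights = [items[label] for label in labels]
        chosen = []
        # Successive draws without replacement. For a single draw this is
        # exact probability-proportional-to-size selection; for several draws
        # the inclusion probabilities are only approximately proportional.
        for _ in range(min(n_per_stratum, len(labels))):
            pick = rng.choices(range(len(labels)), weights=weights, k=1)[0]
            chosen.append(labels.pop(pick))
            weights.pop(pick)
        sample[stratum] = chosen
    return sample


# Hypothetical strata and expenditure figures, for illustration only.
strata = {
    "bedroom furniture": {
        "single-bed mattresses": 40.0,
        "bed frames": 35.0,
        "dressers": 25.0,
    },
    "floor coverings": {"wool rugs": 60.0, "linoleum": 40.0},
}
sample = select_specific_items(strata, n_per_stratum=1, seed=1)
```

Because every item has a known, nonzero chance of selection, a design of this kind can be described exactly, and running it again with a different seed yields a second independent sample of items.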

Drawing a specific item into the sample usually draws an entire cluster of specified-in-detail items into the sample. One or more specified-in-detail items must be chosen from the cluster defined by each of the selected specific items, and this is the second sampling operation to be considered. For example, with the single-bed mattress item, one might specify the number of coils, gauge of wire, type of cover, padding material, and so on. The chosen specified-in-detail items are the ones on which price quotations are to be obtained. At this second level of sampling, the problems become much more difficult than at the first level. Complete lists of specified-in-detail items will be difficult, if not impossible, to obtain; some specified-in-detail items may not be purchased by the consumer group to which the index is supposed to refer; and expenditure weights may not be available for many of the items. Possibly anything that one can do at this level (for example, using a restricted list of specified-in-detail items instead of a complete list or assuming equal base year expenditure weights when actual weights are unequal) will be only an approximation to what one would like to do; but at least this type of approach can be described accurately, and it should be possible to investigate the effects of some of the approximations that are used. The U.S. Bureau of Labor Statistics has experimented with this approach in connection with a recently completed revision of the U.S. consumer price index, and its experience will be available as a guide to others in the future. Banerjee (1960) has written on this aspect of the sampling problem.

Probability sampling of outlets. It might also be observed that the probability sampling of outlets, from which to obtain current price reports, is a much more troublesome problem than might appear at first sight. Lists of outlets are difficult to obtain; many different commodities will ordinarily be priced in the same store, and this introduces correlation among the price quotations; in addition, the maintenance of a panel of price reporters is complicated by the birth and death of firms.

The production of an index number obviously involves a highly complex network of samples. Even if probability sampling were used for all components, it would be extremely difficult, or even impossible, to apply ordinary variance-estimating procedures on a routine basis.

The difficulties involved in the determination of sampling variability for estimates derived from complex sample surveys are not unique to index number problems. The necessity for obtaining “simple” procedures for the routine estimation of sampling error has long been recognized and has been discussed by many authors under such titles as “interpenetrating samples,” “replicated samples,” “ultimate clusters,” and “random groups” (Deming 1960; Hansen, Hurwitz, & Madow 1953, vol. 1, p. 440; Mahalanobis 1946; Stephan & McCarthy 1958, pp. 226-229). This matter was discussed briefly here in connection with the sampling of goods and services, but a more detailed treatment of the application of the principles of replication to sampling for index numbers has been given by McCarthy (1961).
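The replicated-sample idea can be sketched briefly. The sketch assumes that the whole index design has been carried out independently k times in the same way, so that each replicate yields its own index value; the spread among those values then gives a simple estimate of sampling error. The function name `replicate_summary` and the replicate values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def replicate_summary(replicate_indexes):
    """Combine k independent replicate index values.

    Returns the mean of the replicates and the estimated standard error of
    that mean, computed as sqrt(s^2 / k), where s^2 is the sample variance
    among replicates. Assumes independent, identically designed replicates.
    """
    k = len(replicate_indexes)
    if k < 2:
        raise ValueError("at least two replicates are needed")
    estimate = mean(replicate_indexes)
    standard_error = sqrt(variance(replicate_indexes) / k)
    return estimate, standard_error


# Hypothetical index values from four independent replicates.
replicates = [110.2, 109.6, 110.9, 110.1]
estimate, standard_error = replicate_summary(replicates)
```

The calculation needs no knowledge of the internal structure of each replicate, which is what makes it suitable for routine use. It measures sampling variability only: a bias shared by all replicates leaves the spread among them unchanged and so goes undetected.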

Philip J. McCarthy

## BIBLIOGRAPHY

Adelman, Irma 1958 A New Approach to the Construction of Index Numbers. Review of Economics and Statistics 40:240-249.

Banerjee, K. S. 1960 Calculation of Sampling Errors for Index Numbers. Sankhya 22:119-130.

Hansen, Morris H.; Hurwitz, William N.; and Madow, William G. (1953) 1956 Sample Survey Methods and Theory. Vol. 1. New York: Wiley.

Hofsten, Erland von 1952 Price Indexes and Quality Changes. Stockholm: Bokförlaget Forum.

Hofsten, Erland von 1959 Price Indexes and Sampling. Sankhya 21:401-403.

Jaffe, Sidney A. 1961 The Consumer Price Index: Technical Questions and Practical Answers. Part 2, pages 603-611 in U.S. Congress, Joint Economic Committee, Hearings: Government Price Statistics. 87th Congress, 1st Session. Washington: Government Printing Office.

King, Willford I. 1930 Index Numbers Elucidated. New York: Longmans.

Kruskal, William H.; and Telser, Lester G. 1960 Food Prices and the Bureau of Labor Statistics. Journal of Business 33:258-279.

McCarthy, Philip J. 1961 Sampling Considerations in the Construction of Price Indexes With Particular Reference to the United States Consumer Price Index. Part 1, pages 197-232 in U.S. Congress, Joint Economic Committee, Hearings: Government Price Statistics. 87th Congress, 1st Session. Washington: Government Printing Office.

Mahalanobis, P. C. 1946 Recent Experiments in Statistical Sampling in the Indian Statistical Institute. Journal of the Royal Statistical Society Series A 109:326-378. → Contains eight pages of discussion.

Stephan, Frederick F.; and McCarthy, Philip J. (1958) 1963 Sampling Opinions: An Analysis of Survey Procedure. New York: Wiley.

Wilkerson, Marvin 1964 Measurement of Sampling Error in the Consumer Price Index: First Results. American Statistical Association, Business and Economics Section, Proceedings [1964]: 220-233.