If each object in some collection is characterized by its size, it is possible to rank-order the objects from the largest to the smallest. One may then view the rank as the horizontal axis of a coordinate system on which the objects are arranged, so that the largest is assigned the horizontal coordinate x = 1, the next largest x = 2, etc., assuming no ties. The vertical coordinate, y, can be viewed as the size of each object. For example, if the objects are cities of the United States rank-ordered according to population (as reported in U.S. Bureau of the Census 1964), New York will have coordinates x = 1, y = 7,781,984, Chicago will have coordinates x = 2, y = 3,550,404, San Francisco will have x = 12, y = 740,316, etc. Such data can be represented on a bar graph. If the number of objects is very large, the bar graph can be approximated by a continuous curve through the pairs of coordinates so defined, as in Figure 1. When ties occur the tied observations are given consecutive ranks.
A question investigated at great length by George K. Zipf (1949) concerned the mathematical properties of rank-size curves obtained from collections of many different sorts of objects. In particular, Zipf examined the rank-size curves of cities (by population), biological genera (by number of species), books (by number of pages), and many other collections.
Zipf found that in most instances the rank-size curves were very nearly segments of rectangular hyperbolas, that is, curves whose equations are of the form xy = constant, or, as expressed in logarithmic coordinates, log x + log y = constant. Therefore, when such rank-size curves are plotted on log-log paper they are very nearly straight lines with slopes close to −1. At least, this was the case with the collections that Zipf singled out for attention. The relation rank × size = constant is sometimes referred to as Zipf’s law.
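The log-log check described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names `rank_size` and `loglog_slope` and the artificial collection `ideal_sizes` are hypothetical and do not come from Zipf's data. The sketch ranks a collection by size and fits a least-squares line to log size against log rank.

```python
import math

def rank_size(sizes):
    """Return (rank, size) pairs, largest first; ties get consecutive ranks."""
    ordered = sorted(sizes, reverse=True)
    return [(rank, size) for rank, size in enumerate(ordered, start=1)]

def loglog_slope(pairs):
    """Least-squares slope of log(size) regressed on log(rank)."""
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(size) for _, size in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical collection that obeys rank * size = 1,000,000 exactly.
ideal_sizes = [1_000_000 / r for r in range(1, 201)]
slope = loglog_slope(rank_size(ideal_sizes))
```

For a collection constructed to satisfy the relation exactly, the fitted slope is −1 up to rounding error; for observed collections the slope would only be close to that value, and how close is an empirical matter.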
The principle of least effort. Zipf attempted to derive his law from theoretical considerations, which he summarized in the so-called principle of least effort. The connection between this principle and the rank-size law is by no means clear, and Zipf’s theoretical arguments now have at most only historical interest. However, his work attracted wide attention and spurred investigations more rigorous and theoretically more suggestive than his own.
The following discussion will concern Zipf’s law in its statistical-linguistic context. Let the collection be the words used in some large verbal output, say in a book or in a number of issues of a newspaper. Such a collection is called a “corpus.” Let the size assigned to each word be the number of times the word appears in the corpus. Then in most cases the “largest” (that is, the most frequently occurring) word will be the, the next “largest” will be and, etc. (Thorndike & Lorge 1944). By the nature of the ordering, the curve will be monotonically decreasing (a J curve), at first steeply falling, then gradually flattening out, for by the time the low-frequency words are reached, there will be many having the same (small) number of occurrences. Their ranks, however, will keep increasing, because in a rank-size graph objects having the same size are nevertheless assigned consecutive ranks. Therefore, larger and larger blocks of the bar graph will have the same height, which means that the continuous curve through the bar graph will become increasingly flatter. The rectangular hyperbola, being asymptotic to the horizontal axis, also has this property, which in part accounts for the good agreement between the hyperbola and the rank-size graph.
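The construction of a rank-size table for a corpus can be sketched as follows. The function name `word_rank_frequency` and the toy corpus are hypothetical, and splitting on whitespace is a simplifying assumption; a real study would need a more careful definition of a word. Tied words receive consecutive ranks, here in alphabetical order, which is an arbitrary choice.

```python
from collections import Counter

def word_rank_frequency(text):
    """Return (rank, word, frequency) triples, most frequent word first."""
    words = text.lower().split()
    counts = Counter(words)
    # Sort by descending frequency; break ties alphabetically (arbitrary).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(ordered, start=1)]

# Toy corpus, far too small to show any law; it only shows the bookkeeping.
toy_corpus = "the cat and the dog and the bird saw the fish"
table = word_rank_frequency(toy_corpus)

# Words occurring exactly once form the flat tail of the bar graph.
once_only = [word for _, word, freq in table if freq == 1]
```

Even in this toy example the words used only once outnumber every other frequency class, so the last block of the bar graph has constant height while the ranks keep increasing.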
It is known that the most frequently occurring words in any language are usually the shorter ones. In English, for example, these are articles, prepositions, and conjunctions. If the use of a word represents effort by the speaker, and if the speaker tries to minimize effort, he can be expected to use the shortest words most frequently. But there are comparatively few short words, because there are comparatively few combinations to be formed from few letters (or phonemes). Consequently, the number of different high-ranking words (those with low rank numbers) will be small. By a similar argument, the words well along in the ranking are, on the average, the longer ones, and so their numbers will be large; thus, many will have equal frequencies. It is, in fact, observed that the largest number of different words in a typical corpus of some thousands of words are those used only once.
In developing his argument from the principle of least effort in the context of language statistics, Zipf postulated opposing tendencies on the part of the speaker and on the part of the hearer. From the point of view of the speaker, so his argument goes, the language most economical of effort would be one with very few words, each word having many meanings. From the point of view of the hearer, on the contrary, the ideal language would be one in which each word has a unique meaning, since in that case the labor of matching meaning with context would be saved. A balance is struck between the effort-economizing tendency of the speaker and that of the hearer by a certain distribution of ranges of meaning associated with the distribution of frequencies with which words are used.
Whatever merit Zipf’s principle of least effort has in the context of language statistics, his arguments to the effect that the same principle is responsible for the rank-size distributions of the great variety of collections from cities (ranked by populations) to applicants for marriage licenses (ranked by distances between the homes of the bride and the groom) seem of questionable relevance. It is not clear how the principle of least effort operates in each instance to produce the observed rank-size curve. Similar distributions may be traceable to mathematically isomorphic processes, but Zipf, in his principle of least effort, emphasizes not the possible mathematical genesis of his law but its alleged origins in the nature of human behavior.
That this can be misleading can be seen in a hypothetical case of a scholar who, having noted that the weights of beans, of rabbits, and of people are normally distributed, concludes that the normal distribution is a manifestation of a “life force” (because beans, rabbits, and people are biological objects) and seeks the manifestation of this “force” in all instances where the normal distribution is observed. The normal distribution may arise, and presumably often does, from a certain kind of interplay of chance events (roughly, addition of many nearly independent and not wildly dissimilar random variables). Thus, insofar as the normal distribution arises in many contexts, its genesis may be the common statistical structure of the contexts, not their contents, that is, not whether they involve beans or people, animate or inanimate objects. [See Probability, article on Formal Probability, for a discussion of how the normal distribution arises from the interplay of chance events; this is called the central limit theorem.]
Explanations of Zipf’s law and of rank-size relations in general, like the explanations of the normal distribution discussed above, are to be sought in the statistical structure of the events that might generate these relations instead of in the nature of the objects to which the relations apply.
Consider a frequency density in which the horizontal axis represents nonnegative size while the vertical axis represents the relative frequency with which objects of a given size are encountered in some large collection. Call this frequency density f(x), so that $\int_0^\infty f(x)\,dx = N$, where N is the number of objects in the collection. Consider now $G(x) = \int_x^\infty f(t)\,dt$, which is the number of objects having sizes greater than x. But this is exactly the rank (according to size) of the object whose size is x, assuming that the ranks of objects of equal size have been assigned arbitrarily among them. Therefore the rank-size curve is essentially the integral of the size-frequency curve. Any mathematical theory that applies to the one will apply to the other after the transformation just described has been performed. In particular, Zipf’s law holds (G(x) = K/x) if and only if the frequency density, f(x), is of form $K/x^2$, since $\int_x^\infty (K/t^2)\,dt = K/x$.
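The equivalence between the two forms of the law can be spelled out in a short worked derivation. The sketch assumes, as the text indicates, that G(x) is the number of objects with size greater than x and that f is the corresponding frequency density; the integration variable t is introduced here only for illustration.

```latex
\begin{align*}
G(x) &= \int_x^{\infty} f(t)\,dt
  && \text{(rank of the object of size } x\text{)} \\
G'(x) &= -f(x)
  && \text{(differentiating with respect to the lower limit)} \\
f(x) = \frac{K}{x^{2}} \;&\Longrightarrow\;
  G(x) = \int_x^{\infty} \frac{K}{t^{2}}\,dt
       = \left[-\frac{K}{t}\right]_x^{\infty} = \frac{K}{x} \\
G(x) = \frac{K}{x} \;&\Longrightarrow\;
  f(x) = -G'(x) = \frac{K}{x^{2}}
\end{align*}
```

The two implications together give the "if and only if" statement: a hyperbolic rank-size curve and an inverse-square size-frequency density are the same law seen from two sides.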
Investigators seeking a statistical rationale for Zipf’s rank-size curves studied the associated frequency density curves. Probabilistic models underlying some common size-frequency distributions are well known. For example, if there is a collection of objects characterized by sizes, and the size of each object is considered as the sum of many random (at least approximately independent) variables, none of which has a dispersion dominating the dispersions of others, then the resulting distribution of sizes will be approximately a normal distribution. This probabilistic process adequately accounts for the normal distributions so frequently observed in nature. On the other hand, suppose that each object suffers repeated increments or decrements that are proportional to the sizes of the objects on which they impinge. Then the ultimate equilibrium distribution will be a so-called logarithmic normal one, that is, a frequency distribution of a random variable whose logarithm is normally distributed. Such frequency distributions are also commonly observed [see Distributions, Statistical, article on Special Continuous Distributions].
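The process of proportional increments can be imitated in a small simulation. This is a sketch under assumptions that are not in the text: every object starts at size 1, each increment is a uniformly distributed fraction of the current size between −10 and +10 per cent, and the process is stopped after a fixed number of steps. The names `simulate_proportional_growth` and `skewness` are hypothetical.

```python
import math
import random

def simulate_proportional_growth(n_objects=2000, n_steps=200, spread=0.1, seed=0):
    """Apply repeated random increments proportional to each object's size."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_objects):
        size = 1.0
        for _ in range(n_steps):
            # The increment (or decrement) is a random fraction of the size.
            size += size * rng.uniform(-spread, spread)
        sizes.append(size)
    return sizes

def skewness(values):
    """Third central moment divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

sizes = simulate_proportional_growth()
raw_skew = skewness(sizes)                          # sizes themselves
log_skew = skewness([math.log(s) for s in sizes])   # logarithms of sizes
```

Under these assumptions the sizes are strongly skewed to the right, while their logarithms are nearly symmetric, as would be expected if the logarithm of size is approximately normally distributed. The logarithm of each size is a sum of many independent terms, which is why the argument for the normal distribution carries over.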
The problem of finding a statistical rationale for Zipf’s law, therefore, is that of finding a probabilistic process that would result in an equilibrium distribution identical with the derivative of Zipf’s rank-size curve, that is, one where the frequency would be inversely proportional to the square of the size.
Herbert Simon (1955) proposed such a model for the size-frequency distribution of verbal outputs. The essential assumption of the model is that as the corpus is being created (in speaking or writing) the probability of a particular word being added to the already existing list is proportional to the total number of occurrences of words in that frequency class and, moreover, that there is also a nonnegative probability that a new word will be added to the list. In some variants of Simon’s model, the latter probability is a constant; in others it decreases as the corpus grows in size (to reflect the depletion of the vocabulary of the speaker or writer). The resulting equilibrium frequency distribution coincides with Zipf’s, at least in the high-frequency range.
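A minimal simulation in the spirit of this model may make the assumption concrete. It is a sketch, not Simon's own formulation: it uses the constant-probability variant, the parameter values are hypothetical, and the rule "repeat a word chosen by copying a uniformly selected earlier token" is used as a convenient way of making each word's chance of recurring proportional to its current number of occurrences. The name `simulate_simon` is hypothetical.

```python
import random
from collections import Counter

def simulate_simon(n_tokens=20000, new_word_prob=0.1, seed=0):
    """Grow a corpus token by token; return occurrences per distinct word."""
    rng = random.Random(seed)
    tokens = []        # the corpus so far, stored as integer word ids
    next_word_id = 0
    for _ in range(n_tokens):
        if not tokens or rng.random() < new_word_prob:
            # A new word enters the list (constant probability variant).
            tokens.append(next_word_id)
            next_word_id += 1
        else:
            # Copying a uniformly chosen earlier token selects each word with
            # probability proportional to its current number of occurrences.
            tokens.append(rng.choice(tokens))
    return Counter(tokens)

word_counts = simulate_simon()
# class_sizes[i] is the number of distinct words occurring exactly i times.
class_sizes = Counter(word_counts.values())
```

In runs of this kind the words occurring once form the largest class, followed by those occurring twice, and so on, which is the qualitative shape of the size-frequency distributions discussed here. How closely the simulated distribution matches any observed corpus depends on the assumed parameters.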
Moreover, it appears that similar models are plausible rationales for many other distributions. Essentially it is assumed that the increments impinging on objects are proportional to the sizes of the objects and also that new objects are added to the population according to a certain probability law. The first assumption leads to a logarithmic normal distribution. Combined with the second assumption (the “birth process”) it leads to so-called Yule distributions, which greatly resemble Zipf’s.
If the derivation indicated by Simon is accepted, the principle of least effort becomes entirely superfluous, for clearly it is the probabilistic structure of events rather than their content that explains the frequency density and hence the rank-size distributions that Zipf considered to be prima facie evidence for the principle of least effort.
The principle of least effort was not entirely abandoned by those who sought rationales for Zipf’s law, at least for its manifestation in language statistics. Benoit Mandelbrot (1953), for example, restated the principle of least effort as follows. Assume, as Zipf did, that there is an effort or cost associated with each word. Then if the speaker is to economize effort, clearly he should select the cheapest word and speak only that word. However, discourse of this sort would not convey any information, since if the same word is spoken on all occasions, the hearers know in advance what is going to be said and get no new information from the message. The problem is, according to Mandelbrot, not to minimize effort (or cost, as he calls it) unconditionally but rather to minimize it under the constraint that a certain average of information per word must be conveyed. Equivalently, the problem is to maximize the information per word to be conveyed under the constraint that a certain average cost of a word is fixed.
Here Mandelbrot was able to utilize the precise definition of the amount of information conveyed by a message, as formulated in the mathematical theory of communication (Shannon & Weaver 1949). Having cast the problem into mathematical form, Mandelbrot was able to derive Zipf’s rank-size curve as a consequence of an assumption related to the principle of least effort, namely, the minimization of cost, given the amount of information to be conveyed [see Information Theory].
Actually, Mandelbrot’s derived formula was more general than Zipf’s. While Zipf used rank × size = constant, Mandelbrot obtained from his model the formula
$p_r = P(r + m)^{-B}$,
in which $p_r$, being the frequency of occurrence, represents size, r is rank, and P, m, and B are constants. If B = 1 and m = 0, Mandelbrot’s formula reduces to Zipf’s rank-size law.
As would be expected, the generalized formula fits most rank-size verbal-output curves better than Zipf’s; in addition, it is derived rigorously from plausible assumptions.
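The reduction of Mandelbrot's formula to Zipf's law can be checked numerically. In the sketch below the function name `mandelbrot_frequency` and all constant values are hypothetical, chosen only for illustration; the formula itself is the one given above.

```python
def mandelbrot_frequency(r, P, m, B):
    """Frequency assigned to rank r by the formula p_r = P * (r + m) ** (-B)."""
    return P * (r + m) ** (-B)

# Hypothetical constants, not estimated from any corpus.
P = 0.1

# With B = 1 and m = 0, rank * frequency should equal the constant P.
zipf_products = [r * mandelbrot_frequency(r, P, m=0, B=1) for r in range(1, 11)]

# With other constants the curve is still decreasing but no longer hyperbolic.
general = [mandelbrot_frequency(r, P, m=2.5, B=1.2) for r in range(1, 11)]
general_products = [r * p for r, p in zip(range(1, 11), general)]
```

The extra constants m and B give the curve two more degrees of freedom, which is one reason a better fit is to be expected regardless of the merits of the underlying model.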
Zipf himself suggested a generalization of his rank-size law, namely,
$\text{rank}^{q} \times \text{size} = \text{constant}$.
With q as an extra free parameter, clearly more of the observed rank-size curves could be fitted than without it. Of greater importance to a rank-size theory is a rationale for introducing the exponent q. However, Zipf’s arguments on this score are as vague as those related to the originally postulated law.
Frank A. Haight (1966) pointed out the dependence of the observed size-frequency relationship on the way the data are grouped. Suppose, for example, the size-frequency relationship of cities is examined. Clearly, in order to obtain several cities in each population class the populations must be rounded off—say, to the nearest thousand or ten thousand. Haight has shown that if Z is the number of digits rounded off in the grouping and if Zipf’s generalized rank-size law, given above, holds for cities, then the size-frequency distribution will be given by
where [x] is the integral part of x. Here $p_n(Z)$ is the fraction of cities with population near n (rounded off by Z digits).
As Z becomes large, this distribution tends to the zeta distribution, namely,
$p_n = (2n - 1)^{-1/q} - (2n + 1)^{-1/q}$.
The zeta distribution gives a fairly good fit to the populations of the world metropolitan areas rounded off to the nearest million. (There are 141 such areas with populations close to one million, 46 with populations close to two million, etc.) The number of accredited colleges and universities in the United States with student populations rounded off to the nearest thousand is also well fitted by the zeta distribution.
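That the zeta distribution as written is a proper probability distribution follows from the telescoping form of its terms, and this can be verified numerically. The function names and the value of q below are hypothetical; the formula is the one stated above, with n taking the values 1, 2, 3, and so on.

```python
def zeta_probability(n, q):
    """p_n = (2n - 1) ** (-1/q) - (2n + 1) ** (-1/q), for n = 1, 2, ..."""
    a = 1.0 / q
    return (2 * n - 1) ** (-a) - (2 * n + 1) ** (-a)

def partial_sum(n_max, q):
    """Sum of p_n for n = 1 .. n_max."""
    return sum(zeta_probability(n, q) for n in range(1, n_max + 1))

q = 2.0        # hypothetical exponent, not fitted to any data
n_max = 1000
total = partial_sum(n_max, q)

# Successive terms cancel, leaving 1 - (2 * n_max + 1) ** (-1/q).
closed_form = 1.0 - (2 * n_max + 1) ** (-1.0 / q)
```

Since the remainder term tends to zero as the number of terms grows, the probabilities sum to 1 for any positive q, though the convergence is slow when q is large.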
Robert H. MacArthur (1957) treated the problem of the relative abundance of species in a natural organic population by means of a model isomorphic to a random distribution of n − 1 points on a line segment. The distances between the points then represent the “sizes of biological niches” available to the several species and therefore the abundance of the species. Thus the size of the rth rarest species among n species turns out to be proportional to $\sum_{i=1}^{r} \frac{1}{n - i + 1}$.
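The expected abundances under this random-division model can be tabulated with a short sketch. It assumes the form in which the result is usually stated for a segment of unit length broken at n − 1 random points: the expected share of the rth rarest species is (1/n) times the sum of 1/(n − i + 1) for i from 1 to r. The function name `broken_stick_expected_sizes` is hypothetical.

```python
from fractions import Fraction

def broken_stick_expected_sizes(n):
    """Expected share of the r-th rarest of n species, for r = 1 .. n."""
    sizes = []
    running = Fraction(0)
    for r in range(1, n + 1):
        # Add the r-th term 1 / (n - r + 1) of the sum.
        running += Fraction(1, n - r + 1)
        sizes.append(running / n)
    return sizes

# Expected shares for a hypothetical community of five species.
shares = broken_stick_expected_sizes(5)
```

With two species the expected shares are 1/4 and 3/4, the familiar result for the shorter and longer pieces of a stick broken at a random point, and for any n the shares add up to the whole segment.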
It appears, then, that the original rank-size law proposed by Zipf is only one of many equally plausible rank-size laws. Clearly, if objects can be arranged according to size, beginning with the largest, some monotonically decreasing curve will describe the data. The fact that many of these curves are fairly well approximated by hyperbolas proves nothing, since an infinitely large number of curves resemble hyperbolas sufficiently closely to be identified as hyperbolas. No theoretical conclusion can be drawn from the fact that many J curves look alike. Theoretical conclusions can be drawn only if a rationale can be proposed that implies that the curves must belong to a certain class. The content of the rationales becomes, then, the content-bound theory. Specifically, the constants contained in the proposed mathematical model can receive a content interpretation.
For example, in Mandelbrot’s model 1/B is interpreted as the “temperature” of the verbal output. Taken out of context the “temperature of a language sample” seems like an absurd notion. But the term is understandable in the meaning of the exact mathematical analogue of temperature, as the concept is derived in statistical thermodynamics. In this way, formal structural connections are established between widely different phenomena, which on a priori grounds would hardly have been suspected to be in any way related.
Such discoveries are quite common. A strict mathematical analogy was found between the distribution of the number of bombs falling on districts of equal areas in London during the World War II bombing and the distribution of the numbers of particles emitted per unit time by a radioactive substance. The unifying principle is to be found neither by examining bombs nor by examining radioactive substances but rather by inquiring into the probabilistic structure of the events in question.
Having noted that the rank-size relation is simply another way of viewing the size-frequency relation, it can be seen that all the studies of the latter are relevant to the former. Lewis F. Richardson (1960) gathered extensive data on the incidence of “deadly quarrels,” that is, wars, riots, and other encounters resulting in fatalities [see the biography of Richardson]. Designating the size of a deadly quarrel by the logarithm of the number of dead, he studied the associated size-frequency relation, seeking to derive a law of “organization for aggression.” He believed he had found evidence for such a law in the circumstance that the size-frequency relation governing Manchurian bandit raids was very similar to the one governing Chicago gangs in the prohibition era. Here one might also interpose the objection that the similarity may have nothing to do with aggression as such, being simply a reflection of the probabilities governing the formation, growth, and dissolution of human groups. Curiously, the distribution of sizes of casual groups, such as people gathered around swimming pools (Coleman & James 1961), turned out to be different from that of gangs. Moreover, the former type is derived from a stochastic process in which single individuals can join or leave a group, while the latter derives from a process in which no individual can leave unless the whole group disintegrates (Horvath & Foster 1963). These results are intriguing, because the difficulty with which an individual may leave a gang is well known, and so Richardson’s conjecture may have been not without foundation.
It appears, therefore, that the search for the stochastic processes underlying observed rank-size or size-frequency relations can result in important theoretical contributions.
Coleman, James S.; and James, John 1961 The Equilibrium Size Distribution of Freely-forming Groups. Sociometry 24:36-45.
Haight, Frank A. 1966 Some Statistical Problems in Connection With Word Association Data. Journal of Mathematical Psychology 3:217-233.
Horvath, William J.; and Foster, Caxton C. 1963 Stochastic Models of War Alliances. General Systems 8:77-81.
MacArthur, Robert H. 1957 On the Relative Abundance of Bird Species. National Academy of Sciences, Proceedings 43:293-295.
Mandelbrot, Benoît 1953 An Informational Theory of the Statistical Structure of Language. Pages 486-502 in Willis Jackson (editor), Communication Theory. New York: Academic Press; London: Butterworth. → Contains three pages of discussion of Mandelbrot’s article.
Pielou, E. C.; and Arnason, A. Neil 1966 Correction to One of MacArthur’s Species-abundance Formulas. Science 151:592 only. → Refers to MacArthur 1957.
Rapoport, Anatol 1957 Comment: The Stochastic and the “Teleological” Rationales of Certain Distributions and the So-called Principle of Least Effort. Behavioral Science 2:147-161.
Richardson, Lewis F. 1960 Statistics of Deadly Quarrels. Pittsburgh: Boxwood.
Shannon, Claude E.; and Weaver, Warren (1948–1949) 1959 The Mathematical Theory of Communication. Urbana: Univ. of Illinois.
Simon, Herbert A. 1955 On a Class of Skew Distribution Functions. Biometrika 42:426-439. → Reprinted in Simon’s Models of Man.
Thorndike, Edward L.; and Lorge, Irving 1944 The Teacher’s Word Book of 30,000 Words. New York: Columbia Univ., Teachers College.
U.S. Bureau of the Census 1964 Population and Land Area of Urbanized Areas: 1960 and 1950. Table 22 in U.S. Bureau of the Census, Census of Population: 1960. Volume 1: Characteristics of the Population. Part 1. Washington: Government Printing Office.
Zipf, George K. 1949 Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Reading, Mass.: Addison-Wesley.