Markov Chains
Markov Chains
A Markov chain is a chance process having the special property that one can predict its future just as accurately from a knowledge of the present state of affairs as from a knowledge of the present together with the entire past history.
The theory of social mobility illustrates the idea. Considering only eldest sons, note the status of successive generations of a particular male line in a society divided into three classes: upper, middle, and lower. If we assume the movement of a family among the three social classes is a chance process—governed by probabilistic laws rather than deterministic ones—several possibilities present themselves. An independent process represents a perfectly mobile society: the probability that a man is in a particular class depends in no way on the class of his father. At the other extreme is a society in which the probability that a man is in a particular class depends on the class of his father, that of his grandfather, that of his greatgrandfather, etc.
A Markov chain is a process of intermediate complexity: the probability that a man is in a given class may depend on the class of his father, but it does not further depend on the classes of his earlier antecedents. For instance, while upperclass and middleclass fathers may have different probabilities of producing sons of a given class, an upperclass father whose own father was also upper class must have the same probability of producing a son of a given class as does an upperclass father whose father was, say, lower class. (This example will be used to illustrate each concept introduced below.)
No chance process encountered in applications is truly independent—in particular, no society is perfectly mobile. While in the same way no natural process exactly satisfies the Markov chain condition, many of them come close enough to make a Markov chain model useful.
Formal definitions For an exact formulation of the Markov chain concept, introduced in 1907 by the Russian mathematician A. A. Markov, imagine a system (family, society, person, organism) that passes with each unit of time (minute, hour, generation) from one to another of the s states E_{1} E_{2},. .. E_{3} (Upper class, E_{1} middle class, E_{2} and lower class, E_{3}, are the states in the social mobility example.) Assume that a chance process governs the evolution of the system; the chance process is the collection of probability laws describing the way in which the system changes with time. The system is that which undergoes change; a particular analysis may involve many systems of the same kind (many families, or many societies, etc.), all obeying the same process or set of probability laws.
Since the system passes through various states in sequence, time moves in jumps, rather than continuously; hence the integers 1, 2, 3,. .. provide a natural time index. Denote by P(E_{k}ǀE_{i}, E_{j}) the conditional probability that at time n + 2 the system is in state E_{k}, given that at times n and n + 1 it was in states E_{i} and E_{j} in that order; and similarly for longer or shorter conditioning sequences of states. We make the usual assumption that the conditional probabilities just defined do not depend on n (that is, the conditional probabilities do not change as time passes). The process is a Markov chain if P(E_{k}ǀE_{i}, E_{j}) = P(E_{k}ǀE_{j}), P(E_{l}ǀE_{i},E_{j},E_{k}) = P(E_{l},E_{k}), P(E_{m}ǀE_{i},E_{j},E_{k},E_{l}) = P(E_{m}ǀE_{l}), etc.
The transition probabilitie p_{ij} = P(E_{j}ǀE_{i}) determine the fundamental properties of the Markov chain; they form an s by s transition matrixP = (p_{ij}) the basic datum of the process. (A matrix such as P that has only nonnegative entries and has rows summing to 1 is called a stochastic matrix.)
An independent process can be considered a special sort of Markov chain for which p_{ij} = P_{j} (that is, the p_{ij} do not depend on i): the rows of P are all the same in this case. The secondorder transition probabilities provide a second example of information contained in P . If the system is presently in state E_{i}, then the conditional probability that it will pass to E_{j} and then to E_{k} in the next two steps is P(E_{j},ǀE_{i})P(E_{k}ǀE_{i},E_{j}) = P_{ij} P_{jk} hence the conditional probability that the system will occupy E_{k} two time units later is
(Summing over the index j accounts for all the possible intermediate states.) But this secondorder transition probability is just the (j, k) th entry in p ^{2} (the matrix P times itself in the sense of matrix multiplication).
Table 1  

E_{1}  E_{2}  E_{3}  
E_{1} (upper class)  .448  .484  .068 
E_{2} (middle class)  .054  .699  .247 
E_{3} (lower class)  .011  .503  .486 
Social mobility in England and Wales is approximately described by a Markov chain with transition matrix shown in Table 1. (These numbers must, of course, be arrived at empirically. The problem of estimation of transition probabilities is discussed later.) For example, a middleclass man has chance .054 of producing a son (recall that only eldest sons are considered here) who enters the upper class. Now the chance that a man in the upper class has a middleclass grandson is =(.448 x .484) + (.484 x .699) + (.068 x .503)= .589, while the chance that a man in the middle class has a middleclass grandson is = .639.
Notice that these last two probabilities are different, which raises a point often misunderstood. The Markov chain definition prescribes that an upperclass man with upperclass antecedents must have the same chance of producing a middleclass son as an upperclass man with lowerclass antecedents has. But influence—stochastic influence, so to speak—of a man on his grandson, which does exist if is entirely consistent with the definition, which requires only that the grandfather exert this influence exclusively through the intermediate generation (the father).
To describe natural phenomena by Markov chains requires idealization; the social mobility example carried through here makes this obvious. Since Markov chains allow for dependence, however, they can with satisfactory accuracy account for the evolution of diverse social, psychological, biological, and physical systems for which an independent chance process would make too crude a model. On the other hand, Markov chains have simple enough structure to be mathematically tractable.
As another example of an approximate Markov chain, consider sociological panel studies. A potential voter is asked his party preference each month for six months preceding an election; his answer places him in state E_{l} (Republican), E_{2} (Democrat), or E_{3} (undecided). The voter’s progress among these states (approximately) obeys the Markov rule if, in predicting his August preference on the basis of his previous ones, one can without (essential) loss disregard them all except the most recent one, namely that for July (and similarly for the other predictions). In learning theory, Markov models have proved fruitful for describing organisms that change state from trial to trial in response to reinforcement in a learning process. (See Bibliography for applications in such areas as industrial inspection, industrial mobility, sickness and accident statistics, and economics.)
Mathematical analysis The transition matrix P determines many of the characteristics of a Markov chain. We have seen that the (i, j) th entry of P ^{2} is the conditional probability, given that the system is in E_{i}, that it will be in E_{j} two steps later. In the same way, the nth order transition probability the conditional probability that the system will occupy E_{j} after n steps if it is now in E_{i} is the (i,j) th entry in P ^{n}. But P alone does not determine the absolute probability that at time n the system is in state E_{i}. For this we need P and the initial probabilities .
The probability that the system occupies states E_{i} and E_{j} respectively at times n and n +1 is P_{ij} adding over the states possible at time n, we conclude that or, if a ^{(n)} denotes the row vector . More generally, the m th order transition probabilities link the absolute probabilities for time n + m to those for time n: or a ^{(n)}p ^{m} = a ^{(n+m)}. Thus the absolute probabilities are completely determined, via the relation a ^{(n)} = a ^{(1)}P ^{n1}, by the transition probabilities p_{ij} and the initial probabilities In other words, P and a ^{(1)} completely specify the probability laws of the Markov chain.
Chains with all p_{ij} > 0. The most important mathematical results about Markov chains concern the stability of the and the behavior of for large n. Although some of the p_{ij} may be zero, suppose they are all strictly greater than zero. (Later this restriction will be lifted and other chains considered.) One expects that for large n should not depend much on i—the effect of the initial state should wear off. This is indeed true: there exist positive numbers p_{1},. . ., p_{8} such that is close to P_{j} for large . It follows that approaches as n becomes large, so that the absolute probabilities stabilize near the p_{i}, no matter what the initial probabilities are.
The numbers p_{i} have several important mathematical properties, which in turn have probability interpretations. Clearly they sum to one: Σ_{i}p_{i} = 1. Moreover, if n is large, is near Σ_{i}P_{i}p_{ij} while is near p_{j} Since , we have Σ_{i}p_{i}p_{ij} = p_{j}, or, with p = (p_{1},. .. , p_{8},), p P = p . Therefore the p_{i} solve the system
of s + 1 equations in s unknowns. Since it turns out that this system has but one solution, the limits p_{i} can be found by solving it, and this is the method used in practice. The vector p for the social mobility example is (.067, .624, .309). Whatever a man’s class, there is probability near .309 that a descendant a great many generations later is in the lower class.
The quantities p_{j} connect up with the absolute probabilities . If a ^{(n)} = p , then a ^{(n+1)} = a ^{(n)}P = pP = p , and similarly a ^{(m)} = p for all m beyond n. In other words, if the system has at a given time probability exactly p_{j} of being in state E_{j} (for each j) then the same holds ever after. For this reason, the P_{j} are called a set of stationary probabilities;since the solution to (1) is unique, they form the only such set.
Thus a system evolving according to the laws of the Markov chain has longrange probability approximately P_{j} of being in E_{ij}; and if the probability of being in E_{j} at a particular time is exactly p_{j}, this relationship is preserved in the future. The law of large numbers gives a complementary way of interpreting these facts. Imagine a large number of systems evolving independently of one another, each according to the laws of a common Markov chain. No matter what the initial states of the many systems may be, after a long time the numbers of systems in the various states will become approximately proportional to the p_{j} And once this state of affairs is achieved, it will persist, except for fluctuations that are proportionately small if the number of systems involved is large. In the social mobility example, the proportions in the three classes of a large number of family lines will eventually be nearly 067:624:309.
A system setting out from state E_{i} is certain to return again to E_{i} after one or more steps; the expected value of the number of steps until first return turns out to be 1/p_{i}, a result that agrees qualitatively with intuition: the lower the probability of a state the longer the average time between successive passages through it. There exist also formulas for the expected number of steps to reach E_{j} for a system starting in E_{i}. In the example from mobility theory, the mean time to pass from the lower class to the upper class is 26.5 generations, while the mean for the return trip is but 5.6 generations (which should be a lesson to us all).
Other Markov chains. In the analysis sketched above, it was assumed that the p_{ij} were all strictly greater than zero. More generally, the same results hold even if some of the p_{ij} equal zero, provided the Markov chain has the property that the states all communicate (for each i and j there is an n for which > 0) and provided also that there is no periodicity (no integer m exceeding 1 exists such that a passage from E_{i} back to E_{i} is possible in n steps only if n is divisible by m). Such a chain is called ergodic. The results can also be reformulated to cover nonergodic chains.
The theory extends in still other useful ways. (a) A chance process is a Markov chain of second order if the probability distribution of future states depends only on the two most recently visited states: P(E_{l}ǀE_{i},E_{j},E_{k}) =P(E_{i},E_{j},E_{k}), P(E_{m}ǀE_{i},E_{j},E_{k},E_{l}) = P(E_{m}ǀE_{k},E_{l}), etc. These processes contain ordinary Markov chains as a special case and make possible a more precise description of some phenomena. The properties derived above carry over to chains of second (and higher) order, (b) If, for instance, the system is a population and the state is its size, there are then infinitely many possible states. The theory generalizes to cover such examples, but the mathematics becomes more difficult; for example (1) becomes a system of infinitely many equations in infinitely many unknowns, (c) Continuous time, rather than time that goes in jumps, is appropriate for the description of some systems—for example, populations that change state (size) because of births and deaths that can occur at any instant of time. Such processes in principle admit an approximate description within the discretetime theory: one observes the system periodically (every year or minute or microsecond) and ignores the state occupied at other time points. However, since such an analysis is often unnatural, an extensive theory of continuoustime processes has grown up.
Statistical analysis One may believe a given process to be a Markov chain—or approximately a Markov chain—because of his knowledge of the underlying mechanism. Or he may hypothesize that it is a Markov chain, hoping to check this assumption against actual data. Under the Markov assumption, he may want to draw from actual data conclusions about the transition probabilities P_{ij} which are the basic parameters of the model. In other words, statistical problems of estimation and hypothesis testing arise for Markov chains just as they do for independent processes [seeEstimation; Hypothesis Testing].
Suppose one observes n systems (governed by a common Markov chain), of which n_{i} start in state E_{i}, and follows each of them through one transition. If n_{ij}n is the number that step from E_{i} to E_{j} so that Σ_{j}n_{ij} = n_{i} then the loglikelihood function is Σ_{l.j}n_{ij} log p_{ij} ; maximizing this function with the s constraints Σ_{j}p_{ij} = 1 gives the natural ratios n_{ij}/n_{i} as the maximum likelihood estimators of the P_{ij} The numerical matrix given for the social mobility example was obtained in this way from a sample of some 3,500 fatherson pairs in England and Wales. If the n_{i}, are large, the estimators are approximately normally distributed about their mean values p_{ij}. By seeing how far the n_{ij}/n_{i}, differ from putative transition probabilities P_{ij}, one can test the null hypothesis that these pi; are the true parameter values. The statistic appropriate for this problem is Σ_{i.j}(n_{ij} – n_{i}p_{ij})^{2}/n_{i}p_{ij}; it has approximately a chisquare distribution with s (s – 1) degrees of freedom if the n_{i} are large.
Other tests are possible if the systems under observation are traced for more than one step. Suppose n_{ijk} is the number of the systems that start in E_{i}, step to E_{j}, and then step to E_{k}. If the process is a Markov chain, then (a) the chance that the second step carries the system from E_{j} to E_{k} does not depend on the initial state E_{i} and (b) the transition probabilities for the second step are the same as those for the first.
Let a dot indicate an index that has been summed out; for example, n_{.ij} = Σ_{k}n_{kij} is the number of systems that are in states E_{i} and E_{j} at times 2 and 3, respectively (with the state at time 1 completely unspecified). Assuming (a), one can test (b) by comparing the estimators ni}. /n,.. of the transition probabilities for the first step with the estimators n_{ij.}/n_{i}. . of the transition probabilities for the second step. And one can test (a) itself by comparing the ratios n_{ijk}/n_{.i.} (the estimated secondstep transition probabilities) with the ratios n_{ijk}/n_{.jk} (the estimated secondstep transition probabilities, allowing for possible further influence of the initial state); if (a) holds,n_{ijk}/n_{ij.} should, independently of i, be near n_{.jk}/n_{.j.}; thus one can check on the Markov chain assumption itself.
This statistical analysis, based on following a large number of independently evolving systems through a small number of steps, really falls under the classical chisquare and maximum likelihood theory. There is also a statistical theory for the opposite case, following a small number of systems (perhaps just one) through many steps. Although the estimates and tests for this case have forms similar to those for the first case, the derivation of their asymptotic properties is more involved.
The statistical analysis of Markov chains may be extended in various directions. For example, a mode of analysis in which times of stay in the states are considered, together with transition probabilities, is discussed by Weiss and Zelen (1965).
Patrick Billingsley
[Other relevant material may be found in Panelstudies; Social Mobility.]
BIBLIOGRAPHY
Anderson, T. W.; and Goodman, Leo A. 1957 Statistical Inference About Markov Chains. Annals of Mathematical Statistics 28:89110. → A detailed treatment of the problem of many short samples.
Billingsley, Patrick 1961 Statistical Methods in Markov Chains. Annals of Mathematical Statistics 32: 1240. → A review paper covering the problem of one long sample from a Markov chain. Contains a large bibliography.
Blumen, Isadore; Kogan, Marvin; and Mccarthy, Philip 1955 The Industrial Mobility of Labor as a Probability Process. Cornell Studies in Industrial and Labor Relations, Vol. 6. Ithaca, N.Y.: Cornell Univ. Press. → A detailed empirical study of mobility problems.
Feller, William 1950–1966 An Introduction to Probability Theory and Its Applications. 2 vols. New York: Wiley. → A classic text covering continuoustime processes as well as chains with infinitely many states. Excellent examples. The second edition of the first volume was published in 1957.
Glass, D. V.; and Hall, J. R. 1954 Social Mobility in Great Britain: A Study in Intergeneration Changes in Status. In D. V. Glass (editor), Social Mobility in Britain. London: Routledge. → This is the source of the original data for the social mobility example.
Goodman, Leo A. 1961 Statistical Methods for the Mover–Stayer Model. Journal of the American Statistical Association 56:841–868.
Goodman, Leo A. 1962 Statistical Methods for Analyzing Processes of Change. American Journal of Sociology 68:5778. → This and the preceding paper treat modifications of the Markov model for mobility problems.
Jaffe, Joseph; Cassotta, Louis; and Feldstein, Stanley 1964 Markovian Model of Time Patterns of Speech. Science 144:884–886.
Kemeny, John G.; and Snell, J. Laurie 1960 Finite Markov Chains. Princeton, N.J.: Van Nostrand. → Contains both theory and examples. This is the source of the figures for the social mobility example.
Prais, S. J. 1955 Measuring Social Mobility. Journal of the Royal Statistical Society Series A 118:5666. → This is the source of the analysis of the social mobility example.
Weiss, George H.; and Zelen, Marvin 1965 A SemiMarkov Model for Clinical Trials. Journal of Applied Probability 2:269–285.
Cite this article
Pick a style below, and copy the text for your bibliography.

MLA

Chicago

APA
"Markov Chains." International Encyclopedia of the Social Sciences. . Encyclopedia.com. 18 Jan. 2019 <https://www.encyclopedia.com>.
"Markov Chains." International Encyclopedia of the Social Sciences. . Encyclopedia.com. (January 18, 2019). https://www.encyclopedia.com/socialsciences/appliedandsocialsciencesmagazines/markovchains
"Markov Chains." International Encyclopedia of the Social Sciences. . Retrieved January 18, 2019 from Encyclopedia.com: https://www.encyclopedia.com/socialsciences/appliedandsocialsciencesmagazines/markovchains
Citation styles
Encyclopedia.com gives you the ability to cite reference entries and articles according to common styles from the Modern Language Association (MLA), The Chicago Manual of Style, and the American Psychological Association (APA).
Within the “Cite this article” tool, pick a style to see how all available information looks when formatted according to that style. Then, copy and paste the text into your bibliography or works cited list.
Because each style has its own formatting nuances that evolve over time and not all information is available for every reference entry or article, Encyclopedia.com cannot guarantee each citation it generates. Therefore, it’s best to use Encyclopedia.com citations as a starting point before checking the style against your school or publication’s requirements and the mostrecent information available at these sites:
Modern Language Association
The Chicago Manual of Style
http://www.chicagomanualofstyle.org/tools_citationguide.html
American Psychological Association
Notes:
 Most online reference entries and articles do not have page numbers. Therefore, that information is unavailable for most Encyclopedia.com content. However, the date of retrieval is often important. Refer to each style’s convention regarding the best way to format page numbers and retrieval dates.
 In addition to the MLA, Chicago, and APA styles, your school, university, publication, or institution may have its own requirements for citations. Therefore, be sure to refer to those guidelines when editing your bibliography or works cited list.