Time Series Analysis
Longitudinal data are used commonly in sociology, and over the years sociologists have imported a wide variety of statistical procedures from other disciplines to analyze such data. Examples include survival analysis (Cox and Oakes 1984), dynamic modeling (Harvey 1990), and techniques for pooled cross-sectional and time series data (Hsiao 1986). Typically, these procedures are used to represent the causal mechanisms by which one or more outcomes are produced; a stochastic model is provided that is presumed to extract the essential means by which changes in some variables bring about changes in others (Berk 1988).
The techniques called time series analysis have somewhat different intellectual roots. Rather than try to represent explicit causal mechanisms, the goal in classical time series analysis is "simply" to describe some longitudinal stochastic processes in summary form. That description may be used to inform existing theory or inductively extract new theoretical notions, but classical time series analysis does not begin with a fully articulated causal model.
However, more recent developments in time series analysis and in the analysis of longitudinal data more generally have produced a growing convergence in which the descriptive power of time series analysis has been incorporated into causal modeling and the capacity to represent certain kinds of causal mechanisms has been introduced into time series analysis (see, for example, Harvey 1990). It may be fair to say that differences between time series analysis and the causal modeling of longitudinal data are now matters of degree.
CLASSICAL TIME SERIES ANALYSIS
Classical time series analysis was developed to describe variability over time for a single unit of observation (Box and Jenkins 1976, chaps. 3 and 4). The single unit could be a person, a household, a city, a business, a market, or another entity. A popular example in sociology is the crime rate over time in a particular jurisdiction (e.g., Loftin and McDowall 1982; Chamlin 1988; Kessler and Duncan 1996). Other examples include longitudinal data on public opinion, unemployment rates, and infant mortality.
Formal Foundations. The mathematical foundations of classical time series analysis are found in difference equations. An equation "relating the values of a function y and one or more of its differences Δy, Δ²y . . . for each x-value of some set of numbers S (for which each of these functions is defined) is called a difference equation over the set S" (Goldberg 1958, p. 50), where Δy_t = y_t − y_{t−1}, Δ²y_t = Δ(y_t − y_{t−1}) = y_t − 2y_{t−1} + y_{t−2}, and so on. The x-values specify the numbers for which the relationship holds (i.e., the domain); the relationship may be true for only some values of x. In practice, the x-values are taken to be a set of successive integers that in effect indicate when a measure is taken. Then, requiring that all difference operations Δ be taken with an interval equal to 1 (Goldberg 1958, p. 52), one gets results of the following kind (with t replacing x): Δ²y_t − ky_{t−2} = 2k + 7, which can be rewritten as y_t − 2y_{t−1} + (1 − k)y_{t−2} = 2k + 7.
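The difference operations just defined can be sketched in a few lines of Python (using numpy; the series values below are hypothetical, chosen only to illustrate the arithmetic):

```python
import numpy as np

# A short series y_t; the values are hypothetical, for illustration only.
y = np.array([3.0, 5.0, 4.0, 6.0, 9.0])

# First difference: delta y_t = y_t - y_{t-1}
d1 = np.diff(y, n=1)

# Second difference: delta^2 y_t = y_t - 2*y_{t-1} + y_{t-2}
d2 = np.diff(y, n=2)
```

Note that each application of the difference operator shortens the series by one observation, which is why a twice-differenced series has two fewer values than the original.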
Difference equations are deterministic. In practice, the social world is taken to be stochastic. Therefore, to use difference equations in time series analysis, a disturbance term is added, much as is done in conventional regression models.
ARIMA Models. Getting from stochastic difference equations to time series analysis requires that an observed time series be conceptualized as a product of an underlying substantive process. In particular, an observed time series is conceptualized as a "realization" of an underlying process that is assumed to be reasonably well described by an unknown stochastic difference equation (Chatfield 1996, pp. 27–28). In other words, the realization is treated as if it were a simple random sample from the distribution of all possible realizations the underlying process might produce. This is a weighty substantive assumption that cannot be made casually or as a matter of convenience. For example, if the time series is the number of lynchings by year in a southern state between 1880 and 1930, how much sense does it make to talk about observed data as a representative realization of an underlying historical process that could have produced a very large number of such realizations? Many time series are alternatively conceptualized as a population; what one sees is all there is (e.g., Freedman and Lane 1983). Then the relevance of time series analysis becomes unclear, although many of the descriptive tools can be salvaged.
If one can live with the underlying world assumed, the statistical tools time series analysis provides can be used to make inferences about which stochastic difference equation is most consistent with the data and what the values of the coefficients are likely to be. This is, of course, not much different from what is done in conventional regression analysis.
For the tools to work properly, however, one must at least assume "weak stationarity." Drawing from Gottman's didactic discussion (1981, pp. 60– 66), imagine that a very large number of realizations were actually observed and then displayed in a large two-way table with one time period in each column and one realization in each row. Weak stationarity requires that if one computed the mean for each time period (i.e., for each column), those means would be effectively the same (and identical asymptotically). Similarly, if one computed the variance for each time period (i.e., by column), those variances would be effectively the same (and identical asymptotically). That is, the process is characterized in part by a finite mean and variance that do not change over time.
For the tools to work properly, however, one must at least assume "weak stationarity." Drawing from Gottman's didactic discussion (1981, pp. 60–66), imagine that a very large number of realizations were actually observed and then displayed in a large two-way table with one time period in each column and one realization in each row. Weak stationarity requires that if one computed the mean for each time period (i.e., for each column), those means would be effectively the same (and identical asymptotically). Similarly, if one computed the variance for each time period (i.e., by column), those variances would be effectively the same (and identical asymptotically). That is, the process is characterized in part by a finite mean and variance that do not change over time.
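Gottman's table of realizations can be mimicked by simulation. The sketch below (Python with numpy; the AR(1) coefficient of 0.6 is hypothetical) generates many realizations of a stationary process as rows and checks that the column means and variances are effectively constant across time periods:

```python
import numpy as np

rng = np.random.default_rng(1)
R, T, phi = 5000, 50, 0.6               # R realizations (rows), T periods (columns)
sigma2 = 1.0 / (1.0 - phi ** 2)         # stationary variance of this AR(1) process

y = np.zeros((R, T))
# Start each realization in the stationary state so column 0 matches the rest
y[:, 0] = rng.normal(scale=np.sqrt(sigma2), size=R)
for t in range(1, T):
    y[:, t] = phi * y[:, t - 1] + rng.normal(size=R)

col_means = y.mean(axis=0)   # one mean per time period; all near zero
col_vars = y.var(axis=0)     # one variance per period; all near 1/(1 - phi^2)
```

Had the realizations been started at zero instead of drawn from the stationary distribution, the early columns would show smaller variances, a simple illustration of a nonstationary start.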
Many statistical models that are consistent with weak stationarity have been used to analyze time series data. Probably the most widely applied (and the model on which this article will focus) is associated with the work of Box and Jenkins (1976). Their most basic ARIMA (autoregressive-integrated moving-average) model has three parts: (1) an autoregressive component, (2) a moving average component, and (3) a differencing component.
Consider first the autoregressive component, with y_t as the variable of interest. An autoregressive component of order p can be written as y_t − φ_1y_{t−1} − ··· − φ_py_{t−p}.
Alternatively, the autoregressive component of order p (AR[p]) can be written in the form φ(B)y_t, where B is the backward shift operator—that is, By_t = y_{t−1}, B²y_t = y_{t−2}, and so on—and φ(B) = 1 − φ_1B − ··· − φ_pB^p. For example, an autoregressive component of order 2 is y_t − φ_1y_{t−1} − φ_2y_{t−2}.
A moving-average component of order q, in contrast, can be written as ε_t − θ_1ε_{t−1} − ··· − θ_qε_{t−q}. The variable ε_t is taken to be "white noise," sometimes called the "innovations process," which is much like the disturbance term in regression models. It is assumed that ε_t is not correlated with itself and has a mean (expected value) of zero and a constant variance. It sometimes is assumed to be Gaussian as well.
The moving-average component of order q (MA[q]) also can be written in the form θ(B)ε_t, where B is the backward shift operator and θ(B) = 1 − θ_1B − ··· − θ_qB^q. For example, a moving-average model of order 2 is ε_t − θ_1ε_{t−1} − θ_2ε_{t−2}.
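To make the AR and MA forms concrete, here is a short simulation sketch in Python (numpy only). The AR(2) coefficients are the values reported later in this article's illustration; the MA(1) coefficient is hypothetical, included only for contrast:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
eps = rng.normal(size=n)             # white noise ("innovations process")

# AR(2): y_t = phi_1 * y_{t-1} + phi_2 * y_{t-2} + eps_t
phi1, phi2 = 0.33, -0.35             # a stationary pair (complex roots -> cycles)
y_ar = np.zeros(n)
for t in range(2, n):
    y_ar[t] = phi1 * y_ar[t - 1] + phi2 * y_ar[t - 2] + eps[t]

# MA(1): y_t = eps_t - theta_1 * eps_{t-1}
theta1 = 0.5                         # hypothetical coefficient
y_ma = eps.copy()
y_ma[1:] -= theta1 * eps[:-1]
```

Because φ_1² + 4φ_2 is negative for this pair of AR coefficients, the characteristic roots are complex, which is what produces the cyclical behavior discussed below.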
Finally, the differencing component can be written as Δ^d y_t, where d is the number of differences taken (the degree of differencing). Differencing (see "Formal Foundations," above) is a method to remove nonstationarity in a time series mean so that weak stationarity is achieved. It is common to see ARIMA models written in the general form φ(B)Δ^d y_t = θ(B)ε_t.
A seasonal set of components also can be included. The set is structured in exactly the same way but uses a seasonal time reference. That is, instead of time intervals of one time period, seasonal models use time intervals such as quarters. The seasonal component usually is included multiplicatively (Box and Jenkins 1976, chap. 9; Granger and Newbold 1986, pp. 101–114; Chatfield 1996, pp. 60–61), but a discussion here is precluded by space limitations.
For many sets of longitudinal data, nonstationarity is not merely a nuisance to be removed but a finding to be highlighted. The fact that time series analysis requires stationarity does not mean that nonstationary processes are sociologically uninteresting, and it will be shown shortly that time series procedures can be combined with techniques such as multiple regression when nonstationarity is an important part of the story.
ARIMA Models in Practice. In practice, one rarely knows which ARIMA model is appropriate for the data. That is, one does not know what orders the autoregressive and moving-average components should be or what degree of differencing is required to achieve stationarity. The values of the coefficients for these models typically are unknown as well. At least three diagnostic procedures are commonly used: time series plots, the autocorrelation function, and the partial autocorrelation function.
A time series plot is simply a graph of the variable to be analyzed arrayed over time. It is always important to study time series plots carefully to get an initial sense of the data: time trends, cyclical patterns, dramatic irregularities, and outliers.
The autocorrelation function and the partial autocorrelation function of the time series are used to help specify which ARIMA model should be applied to the data (Chatfield 1996, chap. 4). The rules of thumb typically employed will be summarized after a brief illustration.
Figure 1 shows a time series plot of the simulated unemployment rate for a small city. The vertical axis is the unemployment rate, and the horizontal axis is time in quarters. There appear to be rather dramatic cycles in the data, but on closer inspection, they do not fit neatly into any simple story. For example, the cycles are not two or four periods in length (which would correspond to six-month or twelve-month cycles).
Figure 2 shows a plot of the autocorrelation function (ACF) of the simulated data with horizontal bands for the 95 percent confidence interval. Basically, the autocorrelation function produces a series of serial Pearson correlations for the given time series at different lags: 0, 1, 2, 3, and so on (Box and Jenkins 1976, pp. 23–36). If the series is stationary with respect to the mean, the autocorrelations should decline rapidly. If they do not, one may difference the series one or more times until the autocorrelations do decline rapidly.
For some kinds of mean nonstationarity, differencing will not solve the problem (e.g., if the nonstationarity has an exponential form). It is also important to note that mean nonstationarity may be seen in the data as differences in level for different parts of the time series, differences in slope for different parts of the data, or even some other pattern.
In Figure 2, the autocorrelation at lag 0 is 1.0, as it should be (a series correlated with itself). Beyond that, there are three spikes outside the 95 percent confidence interval, at lags 1, 2, and 3. The correlations decline gradually but rather rapidly, so one may reasonably conclude that the series is already mean stationary. The gradual decline also usually is taken as a sign that autoregressive processes are operating, perhaps in combination with moving-average processes and perhaps not. There also seems to be a cyclical pattern, which is consistent with the patterns in Figure 1 and usually is taken as a sign that the autoregressive process has an order greater than 1.
Figure 3 shows the partial autocorrelation function. The partial autocorrelation is similar to the usual partial correlation, except that what is being held constant is the values of the time series at lags shorter than the lag of interest. For example, the partial autocorrelation at a lag of 4 holds constant the time series values at lags of 1, 2, and 3.
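Sample versions of both functions are easy to sketch in Python (numpy only). The ACF is a series of serial correlations; a common way to obtain the PACF, used here, is to solve the Yule-Walker equations at each successive order and keep the last coefficient. The AR(1) series used for illustration is synthetic, with a hypothetical coefficient of 0.6:

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag."""
    x = np.asarray(x, float) - np.mean(x)
    n, denom = len(x), np.dot(x, x)
    return np.array([np.dot(x[:n - k], x[k:]) / denom for k in range(max_lag + 1)])

def pacf(x, max_lag):
    """Sample partial autocorrelations at lags 1..max_lag via Yule-Walker."""
    r = acf(x, max_lag)
    out = []
    for k in range(1, max_lag + 1):
        # Toeplitz matrix of autocorrelations at lags 0..k-1
        R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
        phi = np.linalg.solve(R, r[1:k + 1])
        out.append(phi[-1])      # last coefficient = partial autocorrelation at lag k
    return np.array(out)

# Illustration: a simulated AR(1) series with phi = 0.6 (hypothetical)
rng = np.random.default_rng(2)
e = rng.normal(size=3000)
y = np.zeros(3000)
for t in range(1, 3000):
    y[t] = 0.6 * y[t - 1] + e[t]

a = acf(y, 5)    # declines gradually (roughly 1, 0.6, 0.36, ...)
p = pacf(y, 5)   # large at lag 1, near zero afterward
```

The pattern matches the rules of thumb below: for an AR(1) process the ACF decays gradually while the PACF cuts off abruptly after lag 1. The usual 95 percent confidence band for either function is approximately ±1.96/√n.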
From Figure 3, it is clear that there are large spikes at lags of 1 and 2. This usually is taken to mean that the p for the autoregressive component is equal to 2; that is, an AR[2] component is needed. In addition, the abrupt decline (rather than a rapid but gradual decline) after a lag of 2 (in this case) usually is interpreted as a sign that there is no moving-average component.
The parameters for an AR[2] model were estimated using maximum likelihood procedures. The first AR parameter estimate was 0.33, and the second was -0.35. Both had t-values well in excess of conventional levels. These results are consistent with the cyclical patterns seen in Figure 1; a positive value for the first AR parameter combined with a negative value for the second produces the apparent cyclical patterns.
How well does the model fit? Figures 4 and 5 show, respectively, the autocorrelation function and the partial autocorrelation function for the residuals of the original time series (much like residuals in conventional regression analysis). There are no spikes outside the 95 percent confidence interval, indicating that the residuals are probably white noise. That is, the temporal dependence in the data has been removed. One therefore can conclude that the data are consistent with an underlying autoregressive process of order 2, with coefficients of 0.33 and -0.35. The relevance of this information will be addressed shortly.
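The fit-and-check step can be sketched with a conditional least-squares stand-in for the maximum-likelihood estimation described above (Python, numpy only). The series is simulated from the AR(2) coefficients reported in the text, so the data here are synthetic:

```python
import numpy as np

# Simulate an AR(2) realization with coefficients 0.33 and -0.35
rng = np.random.default_rng(3)
n = 4000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.33 * y[t - 1] - 0.35 * y[t - 2] + eps[t]

# Conditional least squares: regress y_t on y_{t-1} and y_{t-2}.
# A simple approximation to the full maximum-likelihood fit.
X = np.column_stack([y[1:-1], y[:-2]])      # columns: y_{t-1}, y_{t-2}
phi_hat, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
resid = y[2:] - X @ phi_hat

# If the model is adequate, the residuals should be white noise;
# here the lag-1 residual autocorrelation should be near zero.
r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
```

In practice one would examine the full ACF and PACF of the residuals, as in Figures 4 and 5, rather than a single lag.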
To summarize, the diagnostics have suggested that this ARIMA model need not include any differences or a moving-average component but should include an autoregressive component of order 2. More generally, the following diagnostic rules of thumb usually are employed, often in the order shown.
- If the autocorrelation function does not decline rather rapidly, difference the series one or more times (perhaps up to three) until it does.
- If either before or after differencing the autocorrelation function declines very abruptly, a moving-average component probably is needed. The lag of the last large spike outside the confidence interval provides a good guess for the value of q. If the autocorrelation function declines rapidly but gradually, an autoregressive component probably is needed.
- If the partial autocorrelation function declines very abruptly, an autoregressive component probably is needed. The lag of the last large spike outside the confidence interval provides a good guess for the value of p. If the partial autocorrelation function declines rapidly but gradually, a moving-average component probably is needed.
- Estimate the model's coefficients and compute the residuals of the model. Use the rules above to examine the residuals. If there are no systematic patterns in the residuals, conclude that the model is consistent with the data. If there are systematic patterns in the residuals, respecify the model and try again. Repeat until the residuals are consistent with a white noise process (i.e., no temporal dependence).
Several additional diagnostic procedures are available, but because of space limitations, they cannot be discussed here. For an elementary discussion, see Gottman (1981), and for a more advanced discussion, see Granger and Newbold (1986).
It should be clear that the diagnostic process depends heavily on a number of judgment calls about which researchers could well disagree. Fortunately, such disagreements rarely matter. First, the disagreements may revolve around differences between models without any substantive import; there may be, for instance, no substantive consequences from reporting a low-order moving-average model compared with a low-order autoregressive model. Second, ARIMA models often are used primarily to remove "nuisance" patterns in time series data (discussed below), in which case the particular model used is unimportant; it is the result that matters. Finally and more technically, if certain assumptions are met, it is often possible to represent a low-order moving-average model as a high-order autoregressive model and a low-order autoregressive model as a high-order moving-average model. Then model specification depends solely on the criterion of parsimony: models with a smaller number of parameters are preferred to models with a larger number. However, this is an aesthetic yardstick that may have nothing to do with the substantive story of interest.
USES OF ARIMA MODELS IN SOCIOLOGY
It should be clear that ARIMA models are not especially rich from a substantive point of view. They are essentially univariate descriptive devices that do not lend themselves readily to sociological problems. However, ARIMA models rarely are used merely as descriptive devices (see, however, Gottman 1981). In other social science disciplines, especially economics, ARIMA models often are used for forecasting (Granger and Newbold 1986). Klepinger and Weis (1985) provide a rare sociological example.
More relevant for sociology is the fact that ARIMA models sometimes are used to remove "nuisance" temporal dependence that may be obstructing the proper study of "important" temporal dependence. In the simplest case, ARIMA models can be appended to regression models to adjust for serially correlated residuals (Judge et al. 1985, chap. 8). In other words, the regression model captures the nonstationary substantive story of interest, and the time series model is used to "mop up." Probably more interesting is the extension of ARIMA models to include one or more binary explanatory variables or one or more additional time series. Nonstationarity is now built into the time series model rather than differenced away.
Intervention Analysis. When the goal is to explore how a time series changes after the occurrence of a discrete event, the research design is called an interrupted time series (Cook and Campbell 1979). The relevant statistical procedures are called "intervention analysis" (Box and Tiao 1975). Basically, one adds a discrete "transfer function" to the ARIMA model to capture how the discrete event (or events) affects the time series. Transfer functions take the general form shown in equation (1): (1 − δ_1B − ··· − δ_rB^r)y_t = (ω_0 − ω_1B − ··· − ω_sB^s)x_{t−b}. (1)
If both sides of equation (1) are divided by the left-hand side polynomial, the ratio of the two polynomials in B on the right-hand side is called a transfer function. In the form shown in equation (1), r is the order of the polynomial for the "dependent variable" (y_t), s is the order of the polynomial for the discrete "independent variable" (x_t), and b is the lag between when the independent variable "switches" from 0 to 1 and when its impact is observed. For example, if r equals 1, s equals 0, and b equals 0, the transfer function becomes ω_0/(1 − δ_1B). Transfer functions can represent a large number of effects, depending on the orders of the two polynomials and on whether the discrete event is coded as an impulse or a step. (In the impulse form, the independent variable is coded over time as 0,0, . . . ,0,1,0,0, . . . ,0. In the step form, it is coded as 0,0, . . . ,1,1, . . . ,1. The zeros represent the absence of the intervention, and the ones represent its presence; in the impulse form the variable switches back to 0 as soon as the intervention is turned off, while in the step form it remains at 1.) A selection of effects represented by transfer functions is shown in Figure 6.
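The kind of effect produced by the r = 1, s = 0, b = 0 transfer function just described can be sketched by simulating its response to a step intervention (Python; the parameter values are hypothetical):

```python
import numpy as np

omega0, delta1 = 2.0, 0.5        # hypothetical transfer-function parameters
T = 20
x = np.zeros(T)
x[5:] = 1.0                      # step intervention: switched on at t = 5

# Effect of the transfer function omega0 / (1 - delta1*B) applied to x_t,
# written recursively as e_t = delta1 * e_{t-1} + omega0 * x_t
e = np.zeros(T)
for t in range(1, T):
    e[t] = delta1 * e[t - 1] + omega0 * x[t]

# The effect rises gradually toward the asymptote omega0 / (1 - delta1) = 4.0
```

This is the "gradual, permanent" response shape; an impulse coding of x with the same parameters would instead produce a spike that decays geometrically back to zero.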
In practice, one may proceed by using the time series data before the intervention to determine the model specification for the ARIMA component, much as was discussed above. The specification for the transfer function in the discrete case is more ad hoc. Theory certainly helps, but one approach is to regress the time series on the binary intervention variable at a moderate number of lags (e.g., simultaneously for lags of 0 periods to 10 periods). The regression coefficients associated with each of the lagged values of the intervention will roughly trace out the shape of the time path of the response. From this, a very small number of plausible transfer functions can be selected for testing.
In a sociological example, Loftin et al. (1983) estimated the impact of Michigan's Felony Firearm Statute on violent crime. The law imposed a two-year mandatory add-on sentence for defendants convicted of possession of a firearm during the commission of a felony. Several different crime time series (e.g., the number of homicides per month) were explored under the hypothesis that the crime rates for offenses involving guns would drop after the law was implemented. ARIMA models were employed, coupled with a variety of transfer functions. Overall, the intervention apparently had no impact.
Multiple Time Series. ARIMA models also may be extended to more than one time series (Chatfield 1996, chap. 10). Just as the goal for the univariate case was to find a model that transformed the single time series into white noise, the goal for the multivariate case is to find a model that will transform a vector of time series into a white noise vector. In effect, each series is regressed simultaneously not only on lagged functions of itself and the disturbance term but on functions of all other time series and their disturbance terms. In practice, this sometimes reduces to building transfer function models that include a single response time series and several input time series, much as in multiple regression. For example, Berk et al. (1980) explored how water consumption varied over time with the marginal price of water, weather, and a number of water conservation programs.
The mathematical generalization from the univariate case is rather straightforward. The generalization of model specification techniques and estimation procedures is not. Moreover, multivariate time series models have not made significant inroads into sociological work and therefore are beyond the scope of this chapter. Interested readers should consult Chatfield (1996) for an introduction or Granger and Newbold (1986) for a more advanced treatment.
Time series analysis is an active enterprise in economics, statistics, and operations research. Examples of technical developments and applications can be found routinely in a large number of journals (e.g., Journal of the American Statistical Association, Journal of Business and Economic Statistics, Journal of Forecasting). However, time series analysis has not been especially visible in sociology. Part of the explanation is the relative scarcity of true time series for sociological variables collected over a sufficiently long period. Another part is that time series analysis is unabashedly inductive, often making little use of substantive theory; time series analysis may look to some a lot like "mindless empiricism." However, in many sociological fields, true time series data are becoming increasingly available. Under the banner of "data analysis" and "exploratory research," induction is becoming more legitimate. Time series analysis may well have a future in sociology.
Berk R. A. 1988 "Causal Inference for Sociological Data." In N. Smelser, ed., The Handbook of Sociology. Newbury Park, Calif.: Sage.
——, T. F. Cooley, C. J. LaCivita, S. Parker, K. Sredl, and M. Brewer 1980 "Reducing Consumption in Periods of Acute Scarcity: The Case of Water." Social Science Research 9:99–120.
Box, G. E. P., and G. M. Jenkins 1976 Time Series Analysis: Forecasting and Control, rev. ed. San Francisco: Holden-Day.
——, and G. C. Tiao 1975 "Intervention Analysis with Applications to Economic and Environmental Problems." Journal of the American Statistical Association 70:70–79.
Chamlin, Mitchel B. 1988 "Crime and Arrests: An Autoregressive Integrated Moving Average (ARIMA) Approach." Journal of Quantitative Criminology 4(3):247–258.
Chatfield, C. 1996 The Analysis of Time Series: An Introduction, 5th ed. New York: Chapman and Hall.
Cook, T. D., and D. T. Campbell 1979 Quasi-Experimentation. Chicago: Rand McNally.
Cox, D. R., and D. Oakes 1984 Analysis of Survival Data. London: Chapman and Hall.
Freedman, D. A., and David Lane 1983 "Significance Testing in a Nonstochastic Setting." In P. J. Bickel, K. A. Doksum, and J. L. Hodges, Jr., eds., A Festschrift for Erich L. Lehman. Belmont, Calif.: Wadsworth International Group.
Goldberg, Samuel 1958 Introduction to Difference Equations. New York: Wiley.
Gottman, J. M. 1981 Time-Series Analysis. Cambridge, UK: Cambridge University Press.
Granger, C. W. J., and P. Newbold 1986 Forecasting Economic Time Series. Orlando, Fla.: Academic Press.
Harvey, A. C. 1990 The Econometric Analysis of Time Series, 2nd ed. London: Harvester Wheatsheaf.
Hsiao, Cheng 1986 Analysis of Panel Data. Cambridge, UK: Cambridge University Press.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lutkepohl, and T.-C. Lee 1985 The Theory and Practice of Econometrics. New York: Wiley.
Kessler, D. A., and S. Duncan 1996 "The Impact of Community Policing in Four Houston Neighborhoods." Evaluation Review 20(6):627–669.
Klepinger, D. H., and J. G. Weis 1985 "Projecting Crime Rates: An Age, Period, and Cohort Model Using ARIMA Techniques." Journal of Quantitative Criminology 1:387–416.
Loftin, C., M. Heumann, and D. McDowall 1983 "Mandatory Sentencing and Firearms Violence: Evaluating an Alternative to Gun Control." Law and Society Review 17(2):287–318.
——, and D. McDowall 1982 "The Police, Crime, and Economic Theory." American Sociological Review 47:393–401.
Richard A. Berk