sequence analysis

sequence analysis A series of questions about how social processes are ordered, either temporally or spatially, together with the techniques for answering these.

Many areas of sociology are concerned with events or actions in their temporal context—or with what we might call sequence problems. The literatures on careers and the life-course are obvious examples. Sequence analysis seeks to determine the patterning of events (types of job shifts or whatever) in an ordered list or chain. Since there is no assumption of real time (as opposed to symbolic time), it is possible also to examine the successive parts of a ritual, or the order of steps in a manufacturing process (where the ‘time’ involved is in some sense artificial), as well as the sequencing of real-time events such as the changes of status involved in a work history or criminal career. Events in any sequence can be unique or can repeat and may have varying degrees of interdependence. Whole sequences may themselves be interrelated. Sequence can be investigated as an independent or dependent variable; for example, we may wish to know which sequence of job experiences best predicts unemployment, or which prior variables explain sequential steps in an occupational career. Some sequence analysis is interested merely in determining patterns in a series of events as an end in itself—as, for example, in the case of research into the ordering of steps in a dance.

This is a newly developing area in which sociologists are taking their lead mainly from the other social sciences. There is a long tradition of sequence analysis in psychology, in such areas as learning, cognition, and theories of developmental stages. Economists have studied the sequences involved in (among other things) consumption behaviour and the emergence of innovations. Linguists often explore the steps involved in constructing meaningful text. Political science includes sequential studies of (for example) the process of federal budgetary decision-making.

In sociology, a simple conception of sequential analysis occurs in the linear stage theories of modernization, development, rationalization, revolution, and so forth, associated with the names of Karl Marx, Robert Michels, Robert Redfield, and others. More sophisticated are the various career theories, such as those to be found in the literature on work histories, since these allow for more contingency and accident than do stage theories. The most developed forms of sequence analysis permit all sequences to be interdependent in a complex network. Andrew Abbott, one of the leading proponents of sequence analysis in sociology, refers to these as ‘interactional field theories’, and claims that they are rooted in the ‘contextualist paradigm’ developed by the Chicago School between the First World War and the 1930s. Examples would include Harrison White's network analysis of the vacancy chain system in labour markets (Chains of Opportunity, 1970) and Abbott's own study of the careers of German musicians during the seventeenth and eighteenth centuries (‘Measuring Resemblance in Sequence Data’, American Journal of Sociology, 1990).

Techniques for coding sequences, together with the associated computer software for analysing these, tend to be borrowed and adapted from existing applications in biology, cognitive psychology, and related fields. There are many such programs available, and since developments in this area are being funded mainly by biotechnology money, advances tend to be rapid. One such method, much used and refined by Abbott himself, is that of so-called optimal matching or optimal alignment. This computes a distance between any pair of sequences, based on the minimum number of replacements and insertions that would be required to transform one sequence into another. (The technique is borrowed from biology, where it has been used to investigate the resemblance of DNA molecules, and to construct trees of descent among them.)
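The distance just described can be illustrated with a short Python sketch. It is not the procedure of any particular program: it assumes every operation costs one unit, treats a deletion as the mirror image of an insertion, and uses invented career labels and a hypothetical function name purely for illustration.

```python
def optimal_matching_distance(a, b):
    """Minimum number of insertions, deletions and replacements turning a into b.

    Assumption: every operation costs 1 (real programs let the researcher
    choose the costs).
    """
    # prev[j] = distance between the items of a seen so far (minus one row)
    # and the first j items of b
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning i items into nothing takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # replace x by y (free if equal)
            ))
        prev = curr
    return prev[-1]


# Hypothetical work histories, one state per observation period.
career_a = ["school", "apprentice", "employed", "employed"]
career_b = ["school", "employed", "employed", "unemployed"]
print(optimal_matching_distance(career_a, career_b))  # 2
```

The smaller the result, the more closely the two sequences resemble one another; identical sequences have a distance of zero.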

For example, one may code the rhetorical pattern in a decision-making process, such that a typical data sequence might be: 1Z2, 1Z8, 20Z4, 2Z6. The letter Z is here used simply to separate the number of times an element is observed (which precedes the letter) from the identity of the element (which follows it). This sequence might therefore mean something like: 1 unit of ‘summarizing the position reached during an earlier round of decision-making’; followed by 1 unit of ‘formulating a new proposal not earlier alluded to’; followed by 20 units of ‘debate of a proposal placed before the meeting’; followed by 2 units of ‘suggesting amendments to a proposition newly tabled’. Having collected and coded the relevant data in this fashion we can then see that (for example) it takes two changes to turn the sequence 1Z2, 1Z9, 2Z15, 20Z43, 5Z21, 5Z23 into the sequence 2Z2, 1Z9, 1Z10, 1Z15, 20Z43, 5Z21, 5Z23. This can be observed by arraying the sequences so that their similarities are aligned as follows:

1Z2   1Z9   2Z15           20Z43   5Z21   5Z23
2Z2   1Z9   1Z10   1Z15    20Z43   5Z21   5Z23


A unit of Z2 has been added and a unit of Z15 has been turned into a unit of Z10.
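The worked example can be checked with a small Python sketch. It assumes the coded strings are comma-separated tokens of the form countZelement, expands them into single units, and then recovers one minimal set of changes. The function names `expand` and `edit_script` are hypothetical, and unit costs for every operation are an assumption.

```python
def expand(coded):
    """Expand run-length tokens such as '20Z43' into a list of unit elements."""
    units = []
    for token in coded.split(","):
        count, element = token.strip().split("Z")
        units.extend(["Z" + element] * int(count))
    return units


def edit_script(a, b):
    """Return one minimal list of operations turning a into b (unit costs)."""
    rows, cols = len(a) + 1, len(b) + 1
    # dist[i][j] = minimum number of operations turning a[:i] into b[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + 1,
                dist[i][j - 1] + 1,
                dist[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
            )
    # Walk back from the bottom-right corner to recover the operations.
    ops, i, j = [], len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                ops.append(("replace", a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            ops.append(("insert", None, b[j - 1]))
            j -= 1
        else:
            ops.append(("delete", a[i - 1], None))
            i -= 1
    return ops[::-1]


first = expand("1Z2, 1Z9, 2Z15, 20Z43, 5Z21, 5Z23")
second = expand("2Z2, 1Z9, 1Z10, 1Z15, 20Z43, 5Z21, 5Z23")
for operation in edit_script(first, second):
    print(operation)
```

Under these assumptions the script reports two operations, an inserted unit of Z2 and a unit of Z15 replaced by Z10, which matches the count given in the text.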

By a series of such iterations, the computer program assesses the distance (and therefore the similarities) between successive pairs of sequences, by calculating the number and types of insertions and replacements necessary for transforming sequences so that strings become fully or partially aligned. Different programs identify different sequence regularities, depending upon the interests of the researcher, using a variety of techniques (swaps, insertions, replacements, transposition) and alternative ways of assigning ‘costs’ to these operations. In this way regularities are identified in the ordering of events in sequences. The researcher may then use these to construct a causal narrative in which the sequences feature as explanandum or explanans.
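How the choice of ‘costs’ shapes the resulting distances can be seen in the following Python sketch, which compares every pair in a small set of sequences. The cost values, the career labels, and the names `weighted_distance` and `distance_matrix` are all illustrative assumptions rather than the settings of any actual program; swaps and transpositions are left out.

```python
from itertools import combinations


def weighted_distance(a, b, indel=1.0, substitute=lambda x, y: 2.0):
    """Optimal matching distance with researcher-chosen operation costs.

    indel: cost of one insertion or deletion (assumed equal).
    substitute: function giving the cost of replacing x by y (assumed symmetric).
    """
    prev = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel]
        for j, y in enumerate(b, start=1):
            swap = 0.0 if x == y else substitute(x, y)
            curr.append(min(prev[j] + indel, curr[j - 1] + indel, prev[j - 1] + swap))
        prev = curr
    return prev[-1]


def distance_matrix(sequences, **costs):
    """Distances between all pairs of named sequences, as a nested dict."""
    names = list(sequences)
    matrix = {name: {name: 0.0} for name in names}
    for p, q in combinations(names, 2):
        d = weighted_distance(sequences[p], sequences[q], **costs)
        matrix[p][q] = matrix[q][p] = d
    return matrix


# Hypothetical work histories.
careers = {
    "A": ["clerk", "clerk", "manager"],
    "B": ["clerk", "manager", "manager"],
    "C": ["clerk", "unemployed", "clerk"],
}
expensive = distance_matrix(careers)  # replacement costs 2
cheap = distance_matrix(careers, substitute=lambda x, y: 1.0)  # replacement costs 1
print(expensive["B"]["C"], cheap["B"]["C"])
```

Halving the replacement cost halves the distance between B and C in this example, which shows why the assignment of costs is a substantive decision and not merely a technical one.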

In an entertainingly polemical article (‘Of Time and Space: The Contemporary Relevance of the Chicago School’, Social Forces, 1997), Abbott has argued that many sociologists today have ‘given up writing about the real world, hiding in stylized worlds of survey variables, historical forces, and theoretical abstractions’. He is particularly critical of the decontextualized ‘variables paradigm’ represented by most versions of contemporary causal modelling. In these approaches, the sociological variables of interest (class, bureaucracy, race, or whatever) are abstracted from their contexts in social (including geographical) space and social time, and can be connected only by inventing a series of just-so stories which assume that if people typically behave in such and such a way, then the variables will be related to a particular extent. (And, as Abbott observes, the variance explained is usually small and shows every sign of diminishing over the years.) By contrast, the questions and techniques associated with sequence analysis locate social facts in their real settings, and study actual patterns rather than decontextualized variables. By using these methods ‘we can look directly at social action by particular actors in particular social times and places’. In this way, according to Abbott, sequence analysis represents a return to sociology's disciplinary roots in the study of social process and social interaction.

Critics argue that decisions about the coding of data into sequences, about the definition and identification of discrete sequences, and about the choice of alignment procedures and protocols all seem disturbingly arbitrary.