Content analysis is used in the social sciences as one means of studying communication—its nature, its underlying meanings, its dynamic processes, and the people who are engaged in talking, writing, or conveying meaning to one another. Although not a research method sui generis, content analysis is roughly distinguishable from other methods by two characteristics. First, its data—in contrast to ethnographic reports, for example, or census enumerations—are the verbal or other symbols which make up the content of communications (letters, books, sermons, conversations, television programs, therapeutic sessions, paintings, and the like). Second, its procedures differ in emphasis from those of the historian or literary critic: they aim to be exact and repeatable, to minimize any vagueness or bias resulting from the judgments of a single investigator. Thus, each content analysis employs an explicit, organized plan for assembling the data, classifying or quantifying them to measure the concepts under study, examining their patterns and interrelationships, and interpreting the findings.

Within these broad limits the techniques of content analysis are diverse, and the objectives range from mapping propaganda campaigns, for example, to explaining international conflict and integration; from abstracting the ideas and beliefs expressed in folklore or movies of a given period to tracing the epochal alternations in societal values over many centuries; from charting the interaction between patient and therapist to assessing the psychological states of great men in the past.

No general theory of communication is yet in common use among the several social sciences to guide these varied analyses. Implicit in each investigation is a special conceptual model, or set of ideas and assumptions, about the nature of the particular communication process under study. To test this conceptual model or to add new ideas to it, the researcher uses the concrete data of communication. In the empirical phase of the research he is led by his model to select particular communications and to search for order among them by adapting certain conventional procedures of sampling, measurement, and analysis. In the interpretative phase, when he compares his findings with his initial conceptions in order to understand their broader significance, he encounters certain special problems and possibilities.

The use of content analysis in the social sciences today—its methods and its problems of interpretation—has been affected both by related developments in other fields and by historical demands for certain practical applications. Early in the twentieth century students of journalism began to count the newspaper linage devoted to foreign affairs or to sports, comparing one newspaper with another, for example, and later comparing newspaper content with the content of other media. In literary criticism such devices as type of rhyme or ratio of adjectives to verbs were tabulated, as a means of differentiating the styles of writers or of literary periods or to settle disputes about authorship or the chronology of an author's work. Meanwhile, educators were constructing formulas for the readability of printed materials, utilizing proportions of easy and hard words, length of sentences, and the like.

During the 1930s certain applications of such techniques began to be made in the social sciences. Sorokin's monumental study of social and cultural changes in western Europe over the entire course of history rests in part upon an analysis of works of art, music, literature, and philosophy according to their central meanings (1937–1941). Lasswell developed a scheme for categorizing the content of patients' responses in psychiatric interviews as pro-self, anti-self, pro-other, or anti-other, and for counting the frequency with which such categories occurred (1938). Lasswell also, with a number of associates, pioneered the application of content analysis to the study of public opinion and propaganda, an effort immensely stimulated by the demands of the United States government during World War II. Mass communication was conceptualized, within a political framework, as “who says what to whom, how, with what effect,” and large-scale analyses were made, for example, of the frequency with which key symbols (democracy, communism, England, Hitler) were given indulgent, deprecatory, or neutral presentation (e.g., Lasswell & Leites 1949). These wartime efforts encouraged content analyses in other areas—focused on the intentions of particular communicators, the kinds of material brought to the attention of particular audiences, or the cultural values underlying the communicator's assessment of what the audience wants.

When Berelson (1952) made his critical survey of the applications of content analysis methods, he found several books and articles reporting the use of various techniques, e.g., techniques of sampling the content of newspapers by successively selecting specific newspapers, issues, and relevant content within each issue; techniques of categorizing and counting key words, themes, or whole documents; techniques of increasing the reliability of classifying and counting. To guide the application of such techniques, however, Berelson found only one conceptual model in widespread use: the Lasswellian model of the purposive one-way communication intended to influence a mass audience on controversial public issues (Berelson 1952, p. 57 and passim). Thus, broader scientific utilization of the available techniques waited upon a growing interdisciplinary understanding of the many-faceted communication process and upon the closer fitting of techniques to theory.

Important developments in the application of content analysis to social science models may be illustrated by a few examples from the profuse literature of the 1950s and 1960s (see also Work Conference on Content Analysis 1959).

Interaction process

Bales and his associates have developed one of several procedures for analyzing the content of communication observed in small groups (e.g., Bales 1952). Observers sitting behind a one-way screen categorize each of the remarks and gestures (acts) directed by each group member to other members as the group attempts to solve an assigned problem. Bales's standard set of 12 categories (shows solidarity, shows tension release, agrees, gives suggestions, etc.) indexes certain sociological properties of the interaction of a group: positive or negative direction, instrumental or expressive character, and the focus on such system problems as control, tension management, or integration. Thus categorized, data from many groups are used (with the aid of statistical devices and mathematical models) to describe the group process—the patterning of content, phasing over time, group structure. From these descriptions inferences are drawn about the underlying nature of this process. For example, the findings may show that typically a group leader emerges who both initiates and receives more communications than any other member; or that the process of problem solving goes through phases emphasizing, first, orientation; then, evaluation; and, finally, control [see Interaction, article on interaction process analysis].

Studies of therapy. Such analyses of the content of interaction, made at the time of observation or, later, through recordings or transcripts, are also applied to the therapeutic process in social work, counseling, and psychiatry (e.g., the review by Auld & Murray 1955). Category systems based on psychological theories of behavior combined with principles of client-centered therapy are used to trace the interview process, the client-therapist relationship, changes in predominant content over time, or differences between types of treatments. Standard measures employed in content analysis of psychotherapy include Bales's categories (1952) and the Discomfort-Relief Quotient (D.R.Q.), developed by John Dollard and O. H. Mowrer (see, e.g., Auld & Murray 1955, pp. 379–380) to show the ratio between the client's discomfort responses (reflecting tension, unhappiness, pain) and his relief responses (reflecting satisfaction, comfort, enjoyment). In Japan (Shiso … 1959) content analysis has been applied to the exchanges of letters published in life-counseling columns of newspapers and magazines. [See Mental disorders, treatment of, article on client-centered counseling.]
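
Once discomfort and relief units have been coded, the quotient itself is simple arithmetic. As the measure is usually stated, it is the number of discomfort units divided by the sum of discomfort and relief units, so that it runs from 0 (all relief) to 1 (all discomfort). The sketch below assumes that form; the function name and the tallies are hypothetical.

```python
def discomfort_relief_quotient(discomfort: int, relief: int) -> float:
    """Return D / (D + R): 0.0 means only relief units, 1.0 only discomfort units."""
    total = discomfort + relief
    if total == 0:
        raise ValueError("no discomfort or relief units were coded")
    return discomfort / total

# Hypothetical tallies from an interview early and late in treatment.
early = discomfort_relief_quotient(discomfort=30, relief=10)
late = discomfort_relief_quotient(discomfort=12, relief=28)
print(f"early D.R.Q. = {early:.2f}, late D.R.Q. = {late:.2f}")
```

A falling quotient across sessions would be read as declining tension, but only under the assumption that the coded units are comparable from one session to the next.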

Psychological state of the communicator

Analysis of an individual's communications as an index of his underlying motives is exemplified in a study of suicide letters by Osgood and Walker (1959). Within the framework of stimulus-response theory these researchers formulated a number of predictions about the structure and content of suicide letters in contrast to ordinary letters and to simulated suicide notes: the letters of bona fide suicides are characterized, for example, by greater stereotypy; more evidence of conflict; and more constructions of the demand, command, and request type that express needs of the speaker and require some reaction from another person to satisfy these needs.

To test these predictions, the researchers use 16 different measures—some already standard and some specially designed—for the analysis and comparison of the letters. As measures of the stereotypy of each letter, for instance, they divide the number of different words by the total words, count repetitions of phrases, or take the ratio of nouns and verbs to the number of adjectives and adverbs. To measure conflict, they determine the degree to which assertions are qualified, the number of syntactical constructions expressing ambivalence (such as “but,” “if,” “however”), or the extent to which both positive and negative assertions are combined in the same letter. They also employ Osgood's evaluative assertion analysis, a standard procedure that isolates from their context all terms by which the communicator evaluates an object, rates these terms as favorable or unfavorable, and then combines these ratings to index the communicator's over-all attitude toward the object (see, e.g., Work Conference on Content Analysis 1959, pp. 41–54). Comparisons of such measures for suicide letters and ordinary letters support the researchers' hypotheses in most instances, although the results for suicide versus simulated suicide notes are less clear.
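
The counting side of a few of these measures can be sketched in Python. The sketch assumes simple word tokens, treats the three connectives quoted above as the complete list of ambivalence markers, and reduces evaluative assertion analysis to a mean of coder ratings on a -3 to +3 scale. All names are hypothetical, and these simplifications are not Osgood and Walker's actual procedures.

```python
import re

# Assumed marker list: only the three connectives quoted in the text.
AMBIVALENCE_MARKERS = {"but", "if", "however"}

def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; apostrophes stay inside words."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(tokens: list[str]) -> float:
    """Different words divided by total words; lower values mean more repetition."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def ambivalence_count(tokens: list[str]) -> int:
    """How many tokens are ambivalence markers."""
    return sum(token in AMBIVALENCE_MARKERS for token in tokens)

def attitude_index(ratings: list[int]) -> float:
    """Mean of coder ratings (-3 unfavorable .. +3 favorable) of the evaluative
    terms applied to one object; a simplification of the procedure in the text."""
    return sum(ratings) / len(ratings) if ratings else 0.0

letter = "I must go. I must go, but if you forgive me, I must rest."
tokens = tokenize(letter)
print(round(type_token_ratio(tokens), 2), ambivalence_count(tokens))
print(attitude_index([3, -1, 1]))
```

Each function yields one score per letter; the comparison between groups of letters is a separate, later step.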

Historical personalities. In similar fashion, content analysis seems potentially useful to the historian or biographer who is seeking to understand the personalities of great men through their writings and speeches. For solving questions about authorship of documents, rigorous procedures have been developed (as in the study of the disputed Federalist papers: Mosteller & Wallace 1964). For examining motives, attitudes, and psychological states of historical persons, however, content analysis of their communications has been less widely used, despite suggestive studies of Goebbels's diary or the autobiography of Richard Wright (see Garraty in Work Conference on Content Analysis 1959, pp. 171–187).

Culture and society

In their study of popular religion, Schneider and Dornbusch (1958) illustrate the use of content analysis to reflect, not the psychological states of single persons, but the values of an entire society. These researchers selected 46 representative works of American inspirational literature, published over an 80-year period, choosing best sellers to assure that the books were read. They classified each, paragraph by paragraph, according to its main themes (“religion brings physical health,” “happiness can be expected by most men”) and then ascertained what proportion of the total paragraphs was devoted to each theme. In their report they described the changes in these themes within the wider context of historical trends in American religion and culture and used a sociological model to interpret the functions of popular religion in society.
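
The paragraph-level tabulation can be sketched as follows, assuming each paragraph has already been coded with a single main theme. The theme labels and codes are invented; the hypothetical `mean_share` function shows one way per-book proportions might be averaged over a sample of books.

```python
from collections import Counter

def theme_proportions(paragraph_codes: list[str]) -> dict[str, float]:
    """Share of one book's paragraphs coded with each main theme."""
    if not paragraph_codes:
        return {}
    counts = Counter(paragraph_codes)
    total = sum(counts.values())
    return {theme: n / total for theme, n in counts.items()}

def mean_share(books: list[list[str]], theme: str) -> float:
    """Mean, over a sample of books, of the share of paragraphs given to `theme`."""
    return sum(theme_proportions(book).get(theme, 0.0) for book in books) / len(books)

# Hypothetical codes: one main theme per paragraph.
book_a = ["health", "happiness", "health", "other", "health", "happiness", "other", "other"]
book_b = ["health", "other"]
print(theme_proportions(book_a))
print(mean_share([book_a, book_b], "health"))
```

Averaging per-book shares gives each book equal weight regardless of length; pooling all paragraphs first would instead let long books dominate.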

A considerable tradition of such analyses rests on the assumption that cultural values which have been institutionalized in certain segments of the society are represented in the communications of individuals from these segments. Some analyses stress the social determinants of the ideas or values expressed in folklore and sermons, for example; some emphasize the cultural determinants of such expressions. Thus, public communications of American business leaders are taken to reflect their business creed; Brazil's riddles and myths, to reflect the didactic aspects of its religion; Japan's popular songs, to reflect the loneliness and helplessness of its postwar era (Shiso … 1959, p. 122).

Content analyses are conducted by selecting and adapting certain empirical procedures used in social science generally: typically, the methods of using available data (although new materials for special objectives may occasionally be acquired by questioning or observing) and the methods of measurement combined with sampling and statistical analysis.

Use of available data

Most commonly, the content analyst chooses from the vast store of communications already available in libraries, clinics, archives, records, and family attics. Thus, he must know how to utilize the benefits of available data, while avoiding their pitfalls.

Advantages. Several advantages accrue to the student of communication who decides to use materials that already exist rather than to elicit new ones. (1) Time, labor, and expense can often be saved when the researcher can go directly to the heart of his analysis, bypassing preliminary field work, experimentation, or commissioning of documents. (2) When massive data are required, beyond the scope of a single new study, existing content materials frequently afford wide ranges of potentially relevant variables and of refinement in the measurement of each variable. (3) Most important, the available data afford the only means of studying certain kinds of communication problems. Past events cannot be observed directly by the researcher, nor can events beyond the recollection of respondents living today be reached through questioning. Thus, the analysis of historical situations or of long-term trends—the important study of social change—depends upon the prior existence of relevant materials. Similarly, study of cross-cultural communications from remote places (e.g., of world-wide tastes in movies or folklore or of similarities and differences in attention to major political symbols in different countries) may require materials that cannot be elicited by the researcher directly. Communication contents in technical fields that are beyond the competence of the researcher may have been originally assembled in usable form by an expert such as a psychiatrist, a social worker, or an ethnographer. Sometimes, as in letters or diaries, existing materials may provide deep insights into intimate feelings or personal relationships; and sometimes, as in Sorokin's analysis, they may widen the investigator's focus to include macroscopic social or cultural systems.

Pitfalls. Against such impressive assets must be set certain basic problems to be overcome in the utilization of data not originally assembled for the present purposes. (1) The materials are often incomplete. The content analyst must attempt to discover any absences of letters from a file of correspondence or of speeches from a set, which may mean that the data lack representativeness. (2) The data may lack reliability or validity. An isolated record of a historical event, for example, cannot be checked through comparison of different accounts or through direct observation or questioning by the researcher. Clues to validity can often be obtained, however, by comparing two sets of data believed to reflect the same concept, as does Sorokin (1937–1941) when he shows the parallelism between trends in scientific discoveries and the trends in empirical thought derived from content analysis. (3) Data from differing sociotemporal contexts may not be directly comparable, as sources of information may themselves change over time or from one country to another, or the same categories may take on different meanings. This difficulty requires careful documentation and the search for linguistic equivalences. (4) Finally, the data that come to the researcher in a form he does not fully understand may not fit his definitions of the concepts under scrutiny. Unlike the researcher who handles data he himself has collected, he is often unfamiliar with the circumstances under which the communications originally took place. Yet the content of a diary may depend upon whether it was written for public or private consumption, and the answer to an open question may be affected by interviewer bias. Here the important caveat is to attempt to reconstruct the process by which the data were produced, spelling out and, insofar as possible, offsetting any limitations and biases and recasting the data in a form suitable for the new problem.

Although the researcher may on occasion have to reject given data because he cannot adequately assess their limitations or find suitable means of compensating for them, the great variety of available data which may in some sense be classified as communications constitutes a highly valuable resource for the further application of content analysis.

Use of measurement

The content analyst makes use of his data to measure his concepts, rather than to describe them in discursive language. His data consist of certain concrete communications of certain concrete individuals (the cases). His conceptual model contains corresponding definitions of particular types of orientations, actions, or characteristics (the properties) of particular types of persons or collectivities. What he does, in effect, is to treat the sense data (the written or spoken words, the gestures or pictures which he observes) as manifestations or indicants of these properties (the ideas which he holds in his mind). Measurement is defined here, then, as the classification of cases (persons, groups) in terms of a given property, according to some rules for selecting and combining appropriate communications data as indicants.

Composite measures. The measurement rules followed in content analysis vary in detail with the study; but in general they are characterized by a two-stage procedure that results in a composite, rather than a simple, measure. The researcher does not simply classify (code) each case as a whole. Rather, he breaks down the total communication into a set of constituent units (e.g., words, assertions, articles, books); he first codes each of these content units separately, and then he recombines the coded units to provide the composite measure. Bales, for example, in classifying his cases (groups) according to various dimensions of interaction, might well have observed an entire small-group session and then assigned over-all ratings (simple measures) to indicate the extent to which, for example, solidarity was expressed or tension-management activities had occurred. Instead, he broke down the property (interaction) into small content units (acts), categorized the behavior act by act, and then counted the number of acts in each category. This composite measure gives a group profile—a distribution of the total number of acts among code categories—by which groups are classified according to the extent to which members show agreement, engage in tension-management activities, and so on.
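
The second stage of Bales's procedure, turning act-by-act codes into a group profile, can be sketched in a few lines. Only four of the 12 categories are listed, taken from the examples in the text; the session codes are hypothetical, and the sketch rejects any act coded outside the listed categories.

```python
from collections import Counter

# Four of Bales's 12 categories, as named in the text; the rest are omitted.
CATEGORIES = ("shows solidarity", "shows tension release", "agrees", "gives suggestion")

def group_profile(coded_acts: list[str]) -> dict[str, float]:
    """Distribution of a session's acts across code categories (shares sum to 1)."""
    if not coded_acts:
        raise ValueError("no acts were coded")
    unknown = set(coded_acts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"acts coded outside the instrument: {sorted(unknown)}")
    counts = Counter(coded_acts)
    return {c: counts[c] / len(coded_acts) for c in CATEGORIES}

# Hypothetical act-by-act codes from one observed session.
session = ["gives suggestion", "agrees", "agrees", "shows tension release",
           "gives suggestion", "agrees", "shows solidarity", "agrees"]
print(group_profile(session))
```

Profiles expressed as shares rather than raw counts let sessions of different lengths be compared directly.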

Coding. At the first stage the coding process involves a measuring instrument for assigning to each content unit certain code designations that indicate how much of (or which attributes of) the property it possesses. This instrument consists of (1) a code, or set of code designations. Made up of numerals, symbols, or names of categories, the code lists all the categories marked off on each dimension of each property. (Properties are conceived of as having one or more main dimensions, or aspects; for multidimensional measures, the measures of single dimensions—whether simple or composite—must ultimately be combined to reflect the property as a whole.) The instrument further contains (2) coding instructions, which, on the one hand, define each dimension and its categories in terms of the conceptual model and, on the other hand, specify the kinds of data to be taken as indicants under each category. The coding instrument for a particular study is sometimes taken from an existing body of theory (such as Riesman's “inner-direction” versus “other-direction”); it may be a standard code developed by other researchers (such as the D.R.Q. or the verb-adjective ratio); or it may be developed from the empirical data of the study.
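
A coding instrument of this kind can be represented as a small data structure: a code designation and definition for each category, plus instructions stating which words count as indicants. The sketch below is hypothetical throughout. Its "instructions" are bare keyword lists and it uses a first-match rule for mixed units, whereas real instructions are usually far richer and must say how such units are handled.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Category:
    code: str                   # code designation: numeral, symbol, or name
    definition: str             # meaning in terms of the conceptual model
    indicants: frozenset[str]   # words to be taken as indicants of the category

@dataclass(frozen=True)
class Dimension:
    name: str
    categories: tuple[Category, ...]
    residual: str = "0"         # code for units showing no indicant at all

    def code_unit(self, unit: str) -> str:
        """Return the code of the first category with an indicant in the unit.

        Taking the first match is a simplifying tie rule, assumed here.
        """
        words = set(re.findall(r"[a-z']+", unit.lower()))
        for category in self.categories:
            if words & category.indicants:
                return category.code
        return self.residual

# Hypothetical instrument for one dimension of attitude toward an object.
direction = Dimension(
    name="direction of evaluation",
    categories=(
        Category("+", "favorable", frozenset({"good", "fair", "brave"})),
        Category("-", "unfavorable", frozenset({"bad", "cruel", "weak"})),
    ),
)

for unit in ["He was brave and fair.", "A cruel decision.", "The meeting ended."]:
    print(direction.code_unit(unit), unit)
```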

Combining. At the second stage—combining the content units to refer to the communication as a whole—the content analyst may simply count the number of units in each category (e.g., to show the number of favorable and unfavorable assertions or to arrive at the mean percentage of paragraphs devoted to dogma in a sample of religious books). Such frequency counts of similar units have the effect of weighting the category to show how predominant or pervasive that category is within the communication as a whole. Sometimes the units are given equal weight (e.g., Bales 1952); or different weights may be assigned, e.g., for different degrees of attitude intensity, as assessed by judges, or for differing degrees of impact upon an audience; Sorokin (1937–1941) weighted the influence of great thinkers according to the number of special monographs devoted to each.
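
Equal and unequal weighting can be expressed as a single operation: sum the weights of the units in each category, with a plain count as the special case where every weight is 1. In this sketch the intensity weights stand in for judges' ratings and are invented; weights such as Sorokin's monograph counts would be applied in the same way.

```python
def weighted_totals(units: list[tuple[str, float]]) -> dict[str, float]:
    """Sum the weights of the coded units falling in each category.

    With every weight equal to 1.0 this is an ordinary frequency count.
    """
    totals: dict[str, float] = {}
    for category, weight in units:
        totals[category] = totals.get(category, 0.0) + weight
    return totals

# Hypothetical assertions, each coded for direction, with an intensity
# weight (1 = mild, 3 = strong) as a panel of judges might assign it.
assertions = [("favorable", 3.0), ("unfavorable", 1.0),
              ("favorable", 1.0), ("unfavorable", 1.0)]

equal = weighted_totals([(cat, 1.0) for cat, _ in assertions])
by_intensity = weighted_totals(assertions)
print(equal)         # {'favorable': 2.0, 'unfavorable': 2.0}
print(by_intensity)  # {'favorable': 4.0, 'unfavorable': 2.0}
```

The same four assertions are balanced under equal weighting but lean favorable once intensity is taken into account, which is why the choice of weights needs theoretical justification.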

Alternatively, the content analyst may adapt various available procedures to uncover the empirical patterning among different types of units, to show how specific acts, attitudes, or characteristics may fit together within a single communication process. For instance, factor analysis may be used to isolate broad dimensions (as of mental health attitudes expressed in mass media or of sensationalism in the handling of news), or Guttman scaling may be used to uncover cumulative patterns (as of the various duties and functions assigned in state laws to boards of education). Such procedures have the virtue of containing built-in tests of the correspondence between the researcher's conception of the property and the rules that he follows in making his measurements. These tests obviate the necessity of relying entirely upon the investigator's arbitrary judgment for the combining and weighting of indicants.
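
The built-in test in Guttman scaling is usually summarized by the coefficient of reproducibility, 1 minus the number of errors divided by the number of cells. A value of about 0.90 or higher is conventionally taken to indicate a scalable set of items. The sketch counts an error as any cell that departs from the ideal pattern implied by a case's total score, which is one of several error-counting conventions. The data are hypothetical.

```python
def reproducibility(matrix: list[list[int]]) -> float:
    """Guttman coefficient of reproducibility for a 0/1 case-by-item matrix.

    Items are ordered from most to least often present; each case's total
    score then implies an ideal pattern (1 for its `score` most common items,
    0 elsewhere), and every cell departing from that pattern counts as an error.
    """
    n_cases, n_items = len(matrix), len(matrix[0])
    order = sorted(range(n_items), key=lambda j: -sum(row[j] for row in matrix))
    errors = 0
    for row in matrix:
        score = sum(row)
        ideal = [0] * n_items
        for rank, item in enumerate(order):
            ideal[item] = 1 if rank < score else 0
        errors += sum(1 for got, want in zip(row, ideal) if got != want)
    return 1 - errors / (n_cases * n_items)

# Hypothetical data: rows are state laws, columns are duties assigned to boards.
perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
flawed = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0]]
print(reproducibility(perfect), round(reproducibility(flawed), 3))  # 1.0 0.833
```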

Such patterning is further disclosed, of course, when the investigator, having completed his measurement of single properties, proceeds to examine the positive or negative correlations between properties. References to the devil and to writing may be found to go together in certain folk tales; or a psychotherapy patient may tend to dissociate his thoughts of mother from his thoughts of homosexuality (e.g., Osgood's contingency analysis in Work Conference on Content Analysis 1959, pp. 54–78). Some content analysts apply statistical tests to estimate the likelihood that such correlations are due entirely to chance, although there are often problems of appropriateness of the particular tests (as when the several communications of selected individuals may not meet the assumptions of statistical independence).
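
The core comparison in a contingency analysis of this kind sets the observed proportion of units in which two categories co-occur against the proportion expected if they occurred independently. An observed value well above expectation suggests association, and one well below suggests dissociation. This minimal sketch assumes each unit (for example, a passage of a transcript) has already been coded for which categories are present. It applies no significance test, for the reasons noted above, and the data are invented.

```python
from itertools import combinations

def contingencies(units: list[set[str]]) -> dict[tuple[str, str], tuple[float, float]]:
    """For each pair of categories, return (observed, expected) proportions of
    units in which both occur; `expected` assumes the two are independent."""
    if not units:
        return {}
    n = len(units)
    cats = sorted(set().union(*units))
    p = {c: sum(c in u for u in units) / n for c in cats}
    return {
        (a, b): (sum(a in u and b in u for u in units) / n, p[a] * p[b])
        for a, b in combinations(cats, 2)
    }

# Hypothetical passages from interview transcripts, coded for three categories.
units = [{"mother"}, {"mother"}, {"homosexuality"}, {"homosexuality"},
         {"mother", "father"}, {"father"}, set(), set()]
for pair, (observed, expected) in contingencies(units).items():
    print(pair, f"observed={observed:.3f} expected={expected:.3f}")
```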

Utility of composite measurement. The characteristic two-stage measurement procedure of first coding and then combining content units often yields greater precision than simple over-all measurement, and intercoder reliability is reportedly high. Coding rules can be defined more specifically and coding decisions made more easily for a small content unit than for an entire communication (see Schneider & Dornbusch 1958, pp. 165–169, for a comparison of global ratings of entire books with paragraph-by-paragraph ratings of the same books). Although the detailed procedure is typically more time consuming and laborious, computer programs will without doubt be increasingly used to expedite a number of these operations (e.g., the General Inquirer system for content analysis, Stone et al. 1962). Mathematically, the composite measure is quantitative, a characteristic which often facilitates its use in relation to other measures. Even though each act or unit of meaning may be coded on a nominal scale (described in words as favorable or unfavorable, showing or not showing solidarity, etc.) at the first stage of the procedure, at the second stage, when all these codes are combined, the resultant measure consists of numbers or proportions (of remarks that are favorable or of group activities that show solidarity). Such numerical data are used to classify the individuals or groups along scales that are at least at the ordinal level.
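
The text does not name a reliability index. Two common choices, swapped in here for illustration, are simple percent agreement between two coders and Cohen's kappa, which corrects that agreement for the agreement expected by chance from each coder's category frequencies. The codes below are hypothetical.

```python
from collections import Counter

def percent_agreement(a: list[str], b: list[str]) -> float:
    """Share of units to which two coders assigned the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("coders must code the same, non-empty set of units")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(a, b)
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes from two coders for the same six assertions.
coder_1 = ["fav", "fav", "unf", "unf", "fav", "neu"]
coder_2 = ["fav", "unf", "unf", "unf", "fav", "neu"]
print(round(percent_agreement(coder_1, coder_2), 3), round(cohens_kappa(coder_1, coder_2), 3))
```

Here the coders agree on five units of six, but kappa is lower than that raw figure because some agreement would occur by chance alone.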

Nevertheless, the quantitative, precise appearance of many composite measures used in content analysis may be deceptive. Without a clear correspondence between measurement operations and the communication process, serious problems of interpretation arise.

Sampling. Just as content analysis requires measurement, it also requires rigorous procedures for sampling. The procedures generally employed by social scientists refer to the selection of both the concrete cases to be studied (the communicating persons or groups) and the communications to be used as indicants. Some content analyses deal with only a single case (e.g., Wilson as a single historical figure or western Europe as a single society). When many cases are studied, so as to separate common properties from those peculiar to exceptional cases, samples are often chosen by standard probability procedures that aim to represent the conceptual universe through the sample selected. A second important aim is to select a sample of cases that will facilitate the analysis, as when Osgood and Walker chose samples of ordinary persons and of suicides for comparative analysis.

Similar sampling procedures are applied to the determination of which communications will be examined, since it is by no means always necessary to analyze all the writings of a given man, all the meetings of a given group, or all the propaganda of a given country. Selections are often made by stratifying or classifying the major items, such as books, prayers, pictographs, records of single meetings, paragraphs, and then taking a probability sample from each stratum.
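
Stratified selection of communications can be sketched with the standard library. The sketch assumes equal allocation (the same number of items drawn from every stratum) and simple random sampling within strata; proportional allocation would draw more items from larger strata. The editorial universe and the strata are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, Optional, TypeVar

T = TypeVar("T")

def stratified_sample(items: list[T], stratum_of: Callable[[T], Hashable],
                      per_stratum: int, seed: Optional[int] = None) -> dict[Hashable, list[T]]:
    """Group items into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)
    strata: dict[Hashable, list[T]] = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    return {key: rng.sample(members, min(per_stratum, len(members)))
            for key, members in strata.items()}

# Hypothetical universe: 20 editorials a year from three sample years.
editorials = [(year, number) for year in (1950, 1955, 1960) for number in range(20)]
sample = stratified_sample(editorials, stratum_of=lambda e: e[0], per_stratum=5, seed=1)
for year, chosen in sample.items():
    print(year, sorted(number for _, number in chosen))
```

Passing a fixed `seed` makes the draw repeatable, which fits the aim of exact and repeatable procedures stated at the outset.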


Just as each piece of content analysis uses certain empirical methods to arrive at its findings, it also employs procedures for interpreting the scientific and theoretical significance of the findings by comparing them with the conceptual model. The methods for arriving at such interpretations have been less clearly codified than the empirical methods—many of which have been taken over wholesale from other applications—and there is much discussion and some confusion about the kinds of inferences which are appropriate or valid. Nevertheless, the accumulating body of interpretations is now beginning to explain the relationships between communicators, recipients, and the patterned content of the communications themselves. These interpretations shed light on historical changes and dynamic processes of communicative interaction. They often go behind the meanings of the language to the underlying social structure of the group or the psychological state of the individual. Content analysis may show, for example, that—quite apart from the content of communication—in a task group a leader tends to emerge who initiates and receives about half the communication. Or such nonlexical aspects of speech as stuttering or hesitation may reveal the anxiety of a patient in an interview.


The content analyst whose main objective is exploratory makes his interpretations by working primarily from data to model, adding new ideas to his theories after he has completed the empirical phase of the research. Here the special character of the composite measurement procedure can be a notable asset. The careful handling of details and the search for patterning among them often serves to clarify the concepts with which the inquiry began, and to uncover latent relationships and processes not immediately apparent to an investigator. Thus, the sociologist can reveal the balance between instrumental communications, through which a group may pursue its goals but which place strains upon its members, and expressive communications, which ease such strains and tend to re-establish the equilibrium (Bales 1952). Or the historian, through composite measures based on subject matter and key words in the clauses of the Grand Remonstrance, enacted by the British Parliament in the seventeenth century, can expose its character as a propaganda vehicle rather than a constitutionally important document (Knight 1960). In such exploration the methods of interpretation, though rarely explicit, require creative effort—a jump from evidence to ideas, a sensitivity to potential linkages between empirical clues and existing theory and knowledge.

Inadequate use of theory

The fruitless character of content analysis without careful reference to adequate theory is, unfortunately, all too often overlooked. Complex techniques of measurement and analysis may be applied blindly, without questioning their theoretical relevance. Content may be arbitrarily broken down into units that distort messages by wrenching them from their setting. One-to-one inferences may be drawn, from content descriptions to states of the communicator or his social system, without recognition or assessment of the isomorphism implied (as discussed in George 1959). Little consideration may be paid to the meaning of a particular frequency count, as this might refer to the intensity of an individual's attitude or to the consensus with which several individuals hold the same attitude or to the calculated impact of repetition upon an audience. Yet such oversights in connecting techniques with theory can yield meaningless—even misleading—results.

Hypothesis testing. Errors of sheer empiricism are less likely when the researcher, instead of exploring, can use content analysis to test hypotheses. This is often feasible when the conceptual model is highly enough developed to suggest an interpretation in advance of the content analysis (e.g., Osgood & Walker 1959). In hypothesis testing the researcher cannot avoid an explication of the presumed relationship between theory and operations. Here he uses logical or mathematical reasoning to specify what the expected findings would be if the assumptions of the model were in accord with the facts. Again, of course, any evidence derived from testing the model can only be as good as the model itself; the importance of the evidence is bounded by the imagination and the theoretical grasp with which the research begins.

Supporting analyses. Just as the model of communication includes not only the message unit but also underlying attitudes, behavior patterns, values, and social structures, so content analysis alone cannot provide a full understanding of communication. However ideally executed and interpreted and however widely replicated, the approach must often be supplemented by other approaches, which focus more widely upon the several aspects of the communication process. Thus, precise estimates of the intended meaning of a communication depend on knowledge of the situational and behavioral context (George 1959); the content of therapy cannot be assessed without an outside measure of the recovery of the patient; the interpretation of a suicide letter requires the identification of the writer as actually suicidal or not; the presumed appeals of propaganda or advertising must be checked against the responses of the audience. The very ability of the language to communicate must often be tested—for instance, by Taylor's “cloze” procedure, in which sample recipients are given a message in mutilated form and asked how far they can reconstruct it (see Work Conference on Content Analysis 1959, pp. 78–88).
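
The mechanics of a cloze test can be sketched as two steps: mutilate the message and keep an answer key, then score the recipients' restorations. Deleting every fifth word and accepting only the exact deleted word (ignoring case and surrounding punctuation) are common conventions assumed for this sketch. The function names and the sample message are hypothetical.

```python
import string

def _norm(word: str) -> str:
    """Lower-case a word and strip surrounding punctuation for scoring."""
    return word.strip(string.punctuation).lower()

def make_cloze(text: str, n: int = 5) -> tuple[list[str], dict[int, str]]:
    """Blank out every n-th word; return the mutilated text and the answer key."""
    words = text.split()
    answers = {i: w for i, w in enumerate(words) if (i + 1) % n == 0}
    mutilated = ["_____" if i in answers else w for i, w in enumerate(words)]
    return mutilated, answers

def cloze_score(answers: dict[int, str], guesses: dict[int, str]) -> float:
    """Share of blanks a recipient restored with exactly the deleted word."""
    if not answers:
        return 0.0
    hits = sum(_norm(guesses.get(i, "")) == _norm(w) for i, w in answers.items())
    return hits / len(answers)

message = "the quick brown fox jumps over the lazy dog today"
mutilated, key = make_cloze(message)
print(" ".join(mutilated))
print(cloze_score(key, {4: "Jumps", 9: "now"}))  # 0.5
```

Averaged over a sample of recipients, such scores index how readily the message can be reconstructed, which is the sense in which the procedure tests the language's ability to communicate.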

A full understanding of communication will rest ultimately, of course, upon accumulation of ideas and facts from many related studies. Among these, the findings of content analysis can make a special contribution because of their objectivity. The content analysis of letters by Osgood and Walker, for example, is more open to evaluation and replication by other scholars than the less systematic handling of Polish peasant correspondence by Thomas and Znaniecki; the content analysis by Schneider and Dornbusch is more open than Max Weber's insightful construction of ideal types from the writings of a Benjamin Franklin or a Jonathan Edwards.

Matilda White Riley and Clarice S. Stoll

[Other relevant material may be found in Factor analysis and Scaling.]


content analysis
Content analysis reduces freely occurring text to a much smaller summary or representation of its meaning. Bernard Berelson (Content Analysis in Communication Research, 1952) defines it as ‘a research technique for the objective, systematic and quantitative description of the manifest content of communication’, though this is an overly narrow description. The technique was largely developed in the 1940s for propaganda and communication studies (‘Who says what to whom and with what effect?’, as Harold Lasswell puts it in his essay on ‘Describing the Contents of Communication’, in Lasswell et al. (eds.), Propaganda, Communication and Public Opinion, 1946), and has increasingly made use of linguistics and information science.

In its simplest form, content analysis consists of word counts (for example to create a concordance, establish profiles of topics, or indicate authorship style), but grammatical and semantic improvements have increasingly been sought. These include attempts to ‘lemmatize’, or count variants and inflections under a root word (such that ‘am’, ‘are’, ‘is’, ‘will’, ‘was’, ‘were’, and ‘been’ are seen as variants of ‘be’), and to ‘disambiguate’, or distinguish between different meanings of a word spelt the same (such as ‘a bit of a hole’, ‘a 16-bit machine’, ‘he bit it off’). More ambitiously, content analysis seeks to identify general semantic concepts (such as ‘achievement’ or ‘religion’), stylistic characteristics (including understatement or overstatement), and themes (for example ‘religion as a conservative force’), and this normally requires complex interaction of human knowledge and fast, efficient computing power, typified by a system such as the Harvard General Inquirer. Content analysis has concerns and techniques in common with artificial intelligence although it has to be able to cope with more general and open-ended materials. See also CODING.
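
The simplest of these steps, word counting after lemmatization followed by a count of dictionary-defined concept categories, can be sketched as below. The lemma table is a toy that maps inflected forms of ‘be’ to their root. The two-category dictionary is hypothetical and only in the spirit of dictionary-based systems; it is not the General Inquirer's actual dictionary, and no disambiguation is attempted.

```python
import re
from collections import Counter

# Toy lemma table: inflected forms of "be" mapped to the root word.
LEMMAS = {form: "be" for form in ("am", "are", "is", "was", "were", "been", "being")}

# Hypothetical concept dictionary; real systems use far larger, curated lists.
CATEGORIES = {
    "achievement": {"win", "succeed", "goal", "effort"},
    "religion": {"god", "faith", "prayer", "church"},
}

def lemma_counts(text: str) -> Counter:
    """Count words after mapping each known inflection to its lemma."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(LEMMAS.get(w, w) for w in words)

def category_counts(counts: Counter) -> dict[str, int]:
    """Total occurrences of the words listed under each concept category."""
    return {cat: sum(counts[w] for w in words) for cat, words in CATEGORIES.items()}

text = "Faith is a goal. Prayer was their effort, and they were glad to win."
counts = lemma_counts(text)
print(counts["be"], category_counts(counts))
```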

