Information Storage and Retrieval

I. THE FIELD
Joseph Becker

The major determinants behind current information storage and retrieval efforts are the great volume of data pouring from our printing presses and our inability to locate much of it after it has appeared.

Responsibility for storage and retrieval of printed information has traditionally rested with the librarian. Early libraries concentrated on arranging books in some prescribed order on shelves. As the number of books increased, a complex organization became necessary in order to make the contents of a library collection more readily accessible. To provide such organization, librarians developed subject-classification schemes, the card catalogue, and other tools. These bibliographic devices now constitute the basic structure for control of library collections and are the fundamental finding aids that researchers employ.

Although conventional library tools today make location of a particular title among miles and miles of shelving a routine and simple task, they are not designed to provide more than a rough-cut approach to the subjects covered. For the users of general libraries this may be all that is needed, but when the same subject-classification techniques are applied to highly specialized collections of nonbook, technically detailed data, the imprecision of such methods of content retrieval becomes apparent. Because all knowledge and language are dynamic, constantly changing processes, any subject classification becomes obsolete almost from the moment of its creation. Furthermore, as one moves into increasingly specialized areas of knowledge, research becomes more complex. As new ideas generate new facts and new terminology, the task of organizing them and establishing their proper relationship to one another becomes ever more difficult.

An important distinction has been made between systems that locate documents and systems that produce information. Yehoshua Bar-Hillel has emphasized the difference between “literature searching” and “information retrieval,” pointing out that the problems of storing and retrieving documents should be considered apart from problems concerning information. Literature searching, he contends, involves determining which documents or books are relevant to a chosen topic. Information retrieval is the act of obtaining answers to questions about a selected subject (1957).

Emphasis has thus been placed on finding new ways and means of codifying or indexing data so that they will lend themselves to correlation at time of searching. The trend has been to achieve greater depth of content analysis. Not only have new analytical methods been devised but investigations have also been made of the feasibility of employing electronic machines for analysis, storage, and retrieval of information. Because of the great mass of data involved, new storage and handling techniques may have to be invented; these techniques must be more advanced than those customarily used for manually shelving books and filing documents. The emergence of information storage and retrieval as a new field reflects an awareness among librarians and others that the selection and manipulation of fragments of information, rather than of entire documents, will require unconventional tools.

A critical need for more advanced information systems has evolved because of the steady growth of publishing and the complex ways in which information has come in recent years to pervade decision-making processes in business, science, and government. References to the effects of expanded publishing were made by Fremont Rider (1944) and Vannevar Bush (1945). Shortly thereafter, the implications of the “information explosion” in science and technology were discussed at the first international conference on the subject, held in London by the Royal Society (1948). At that time, it was already clear that the publishing rate in science and technology was increasing exponentially and that specialization in individual sciences and the development of interdisciplinary research were generating multiple uses for the same information. Although interest in information storage and retrieval thus received its start in the world of science, it soon spread to other areas, particularly business, industry, and government and, notably, to institutions like the U.S. Library of Congress (King 1963).

Another factor responsible for the independent development of the field of information storage and retrieval has been the impact of technology. Research and development in the computer sciences, the photographic industry, and signal communication promise to provide powerful new methods and techniques for information storage and retrieval. Modern data-processing equipment has already been successfully applied to the numerical areas of scientific computation and business operations. The prospect of being able to use computers to solve nonnumerical problems that involve natural language has been a major impetus encouraging the evolution of advanced information storage and retrieval techniques. The appearance in 1948 of Shannon’s theoretical foundation for a general theory of information stimulated researchers to investigate the possibility of applying the principles of mathematics to the problems of information communication, by means of computers (Shannon & Weaver 1949).

The field of information storage and retrieval involves librarians, documentalists, mathematicians, system designers, linguists, equipment manufacturers, operations researchers, and computer programmers, among others. All are concerned with methods of expediting the prompt retrieval of information in such diverse areas as libraries, business and industry, military command and control, and scientific research. Because the field is interdisciplinary, considerable confusion regarding the boundaries of the effort has existed. A comprehensive bibliography covering the broad spectrum of interest appeared in 1958 (Bourne 1958-1962), and an introductory textbook on the subject was published in 1963 (Becker & Hayes 1963).

Classification of subjects in documents

Several specialists have devoted themselves to research into the problems of information organization. Among them is Mortimer Taube, who is identified with the concept of coordinate indexing, which provides a method of coordinating index terms as combinations rather than permutations. Taube called his index terms uniterms, and a coordinate index consists of a set of uniterm cards on which appear the identification numbers of the documents relevant to each uniterm. Searching is accomplished by selecting those uniterm cards pertinent to a request and correlating their document numbers. Matching numbers represent those documents for which the uniterms are simultaneously relevant (Taube et al. 1953-1957).
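The correlation step of a coordinate-index search can be illustrated with a minimal sketch. It assumes that each uniterm card is modeled as a set of document numbers; the terms, the document numbers, and the function name are hypothetical and are not drawn from Taube's work.

```python
# Hypothetical uniterm cards: each term maps to the document numbers posted on its card.
uniterm_cards = {
    "microfilm": {101, 104, 118, 230},
    "storage": {101, 118, 207, 230, 315},
    "retrieval": {104, 118, 230, 402},
}


def coordinate_search(cards, terms):
    """Return the document numbers that appear on every requested uniterm card."""
    # A term with no card contributes an empty set, so the search finds nothing.
    postings = [cards.get(term, set()) for term in terms]
    if not postings:
        return []
    # Matching numbers are the intersection of the selected cards.
    return sorted(set.intersection(*postings))


print(coordinate_search(uniterm_cards, ["microfilm", "storage", "retrieval"]))
```

Because the search is a set intersection, the order in which the cards are selected does not affect the result, which is the sense in which the terms are coordinated as combinations rather than permutations.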

Calvin Mooers, one of the earliest proponents of coordinate indexing, proposed a concept of storing in one fixed place the codes for the subjects in a document, one code being superimposed on another. This technique is particularly applicable where coding space is at a premium, such as on edge-notched cards. Mooers also conducted extensive research into the mathematical structure of coding (Mooers 1951).
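The principle of superimposing one code on another in a single fixed field can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of Mooers's scheme: the field width, the number of marks per subject, and the use of a hash function to derive each subject's pattern are all choices made here for the example.

```python
import hashlib

FIELD_WIDTH = 40        # assumed number of notch positions in the coding field
MARKS_PER_SUBJECT = 4   # assumed number of positions marked for each subject


def subject_code(subject):
    """Derive a fixed pattern of MARKS_PER_SUBJECT positions, returned as a bitmask."""
    mask = 0
    counter = 0
    while bin(mask).count("1") < MARKS_PER_SUBJECT:
        digest = hashlib.sha256(f"{subject}:{counter}".encode()).digest()
        position = int.from_bytes(digest[:4], "big") % FIELD_WIDTH
        mask |= 1 << position
        counter += 1
    return mask


def encode_document(subjects):
    """Superimpose (bitwise OR) the codes of all subjects into one field."""
    field = 0
    for subject in subjects:
        field |= subject_code(subject)
    return field


def may_contain(field, subjects):
    """True when every mark of the query subjects is present in the field."""
    query = encode_document(subjects)
    return (field & query) == query


card = encode_document(["coding", "edge-notched cards", "indexing"])
print(may_contain(card, ["coding", "indexing"]))
```

A document coded with a subject always passes the test for that subject. The converse does not hold: because the codes share one field, a card can occasionally pass for a subject it was never coded with, and such false selections become more frequent as more subjects are superimposed.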

James W. Perry and Allen Kent have advanced the idea of the so-called telegraphic abstract, in which a phrase represents the logical unit of thought in a document, subphrases represent the individual words and concepts, and role-indicators describe the role that a particular word plays in the phrase. By using this method it is possible to describe a document in an artificial language or system that has more meaning than the sum of separately assigned subjects (Perry et al. 1956). Faceted classification, still another technique for organizing concepts expressed in documents, has been examined by S. R. Ranganathan and Brian C. Vickery (see Vickery 1958).

Computer analysis of natural language

A number of experiments have been conducted, and corresponding computer programs have been written, on the possibility of using computers to perform quasi-intellectual functions. Within the past few years, increasing emphasis has been placed on machine analysis of the syntax and semantics of natural language. This has led to the development of computer programs for such functions as language data-processing, machine translation, automatic indexing, automatic abstracting, concordance building, and text condensation. Still other computer programs have been written for the preparation of permuted title indexes as well as conventional printed indexes (Edmundson & Wyllys 1961). Several researchers have produced computer programs that embody sophisticated mathematical principles for searching natural language. Research work has explored ways of extracting meaning from a text by means of word association, syntactical analysis, and even contextual analysis. M. E. Maron and J. L. Kuhns have applied the calculus of probability to automatic indexing in an attempt to establish a theory of relevance (Maron & Kuhns 1960).
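Of the programs mentioned above, the permuted title index is the simplest to illustrate. The sketch below assumes a small list of common words to be skipped and marks the rotation point with a slash; both conventions, like the sample titles, are chosen here for the example and are not taken from any particular program.

```python
# Assumed list of common words that do not receive index entries.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def permuted_title_index(titles):
    """Build one entry per significant title word, rotated so that word leads."""
    entries = []
    for doc_id, title in titles.items():
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            rotated = " ".join(words[i:])
            if i:
                # The slash marks where the original title began.
                rotated += " / " + " ".join(words[:i])
            entries.append((word.lower(), rotated, doc_id))
    # Sorting by keyword gives the alphabetical arrangement of a printed index.
    return sorted(entries)


titles = {1: "Storage of Documents", 2: "Retrieval in Libraries"}
for keyword, line, doc_id in permuted_title_index(titles):
    print(f"{line}  [{doc_id}]")
```

Each title is thus entered once under every significant word it contains, so a searcher who remembers any one of those words can find it without the title having been assigned a subject heading.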

Converting text to machine readable form

The ability to convert original data automatically from the printed page to an input form usable by machines is fundamental if electronic computers are to be employed in work involving information. Until this becomes possible the use of computers cannot be considered economical. In the absence of automatic conversion equipment it is necessary either to type or keypunch the data over again. These processes are expensive, slow, and unreliable. For these reasons, efforts are continuing to produce character-recognition machines (Symposium …1962). These are devices engineered to scan automatically the letters, words, and sentences of a text and to convert them directly into discrete digital representations.

The goal is to “read” rapidly large quantities of printed information, so that further processing of the data can be performed by a computer. Optical scanning and magnetic-ink reading are the two most common character recognition techniques in use. Thus far, only alphanumeric data in a prescribed type font are readable by machine. Research in auditory recognition is also under way to determine whether a machine can automatically discriminate phonetic sounds and, in so doing, produce a satisfactory digital code for input to a computer.

Compact storage of source material

Microfilm is, at present, the most effective means of storing original documents and of thereby controlling their volume. An impressive array of different cameras and a multiplicity of microfilm media are available commercially. Roll film, aperture cards, film in cartridges, microfiche, sheet film, and microcards are but a few of the examples of common microforms at present in use in information installations.

New dry processes have been introduced to overcome the disadvantages of wet chemical development, which is normally associated with the silver-halide film process. Diazo, for example, is a film which is exposed with ultraviolet light and developed in gaseous ammonia. Kalvar, another film, is exposed with ultraviolet light but is developed with heat at a temperature equivalent to that of a warm iron. The latest dry process is photochromics, which claims data-compression ratios of up to 400:1 with practically no loss of resolution. Photochromic film is exposed with ultraviolet light and can be erased, if necessary, with white light.

Printed material that is compressed into a microform calls for auxiliary equipment—inspection viewers, service viewers, and printing equipment for individual page copying. Equipment available on the market makes it possible to view any microform and to obtain a copy of an entire page or part of a page in seconds. Devices that fall into this category provide push-button copying, frame by frame, using manual, semiautomatic, or fully automatic auxiliary means.

Ever since Vannevar Bush proposed a Memex machine in 1945, much equipment has been designed to combine the dense-storage capability of film with the searching speed of electronics. The Rapid Selector, the first such device to be built, recorded frames of abstracts and corresponding digital codes on a 2,000-foot reel of 35-mm. film.

Following the Rapid Selector, other equipment appeared, such as Minicard, Media, Flip, Filesearch, Lodestar, Verac, and Walnut. Originally, each of these devices was designed to satisfy the needs of highly specialized information-system customers, but all of them represent technical progress toward combined use of electronic and photographic media for many purposes.

Minicard and Media are systems which store digital and graphic information on chips of film. Flip, Filesearch, and Lodestar, on the other hand, require the stored information to be contained in sequence on reels of film. Verac and Walnut are slightly different. The former uses a store of glass plates on which a matrix of images is recorded; the latter, strips of Diazo film for recording. Both, like all the others, have electronic-searching capabilities.

None of the techniques mentioned above was designed for book storage; in adapting them for this purpose, greatest attention has been focused on recording articles in technical journals or in special multipage reports. Since all these techniques are basically photographic, storage is not limited to the printed text alone, for all forms of graphic material can be stored by use of the same treatment (Becker & Hayes 1963, pp. 193-218).

Communication of information

No discussion of the technologies pertinent to the field of information storage and retrieval would be complete without consideration of the role of communication.

In the early 1950s, RCA conducted a demonstration of Ultrafax at the Library of Congress. A film copy of Gone With the Wind was sent over communication lines to a receiving point in a distant city. This facsimile transmission heralded the use of communication facilities for the transfer of visual data from one point to another. Video recording and transmission provide still another medium for sending graphic information over great distances.

Retrieval at a distance of digital and graphic information presupposes the availability of an interconnected communications network. On this assumption, research has been conducted to explore the relationship between man and machine in order to define more clearly the division of tasks between them. This in turn has led to further research on on-line systems, which establish direct communication between the man at an input-output console and the computer. The Massachusetts Institute of Technology has led the research effort to place at a user’s fingertips the communications equipment needed to interrogate a large store of information under the control of a computer while numerous other users are simultaneously using it (Kessler 1965).

From the above review of machine research-and-development activities pertinent to the field of information storage and retrieval several desiderata emerge: to find automatic means of converting printed data to machine language; to achieve more compact storage of source material; to enhance intellectual access to information; and to display or provide information rapidly in a form suitable for individual use. Microphotography, the evolution of unconventional subject-classification systems, the application of computers, and the use of communications techniques, among other things, represent various stages in the historical development of information storage and retrieval.

Joseph Becker

Among the more important information services available to scientists have been abstracting services and data archives. In response to the developing information crisis in the social sciences, they are now being adapted to the special needs of these disciplines. Abstracting services prepare and distribute succinct synopses and summaries of the growing volume of publications and research activities, whereas archives acquire, process, and make available existing social science data.

Neither of these services has been fully able to accomplish the purposes for which it was established. This relative failure is partly due to the inadequate facilities available for solving an overwhelming challenge. It is also explained in part by the lack of concern with and knowledge about information problems, needs, and solutions in the social sciences. The dimensions of the information crisis are only now being isolated, and the tools for its resolution are still in an early stage of development. This article will outline the scope of the information crisis in the social sciences, the range of responses designed to resolve the crisis, the contributions and limitations of abstracting and archive services, and the relation of the operations and concerns of these services to broader theoretical developments in the social sciences.

The information crisis

The contemporary information crisis in the social sciences has several salient features that create special difficulties for the social science information services. First, with the increase in the size and heterogeneity of the social science community, opposing demands are often placed on information services. For example, some scholars value brief, descriptive reviews of a wide range of published articles, whereas others prefer longer, evaluative analyses of a limited range of current publications. Second, with the growth of social science research and writing, it is becoming extremely difficult to locate and report on developments and materials relevant to the social sciences. The expansion of the social sciences around the world, together with the extensive amount of classified research conducted by governments and other groups, further adds to this difficulty. Third, research in the social sciences is perhaps more erratic in quality than that in other sciences. As a consequence, a great volume of irrelevant and harmful materials clogs the social science communication channels, consuming valuable resources and detracting from the cumulative development of the field.

Considerable attention has been given to the consequences of the information crisis, both upon individual scientists and upon science viewed as a system (Menzel 1964). Individual scholars find it increasingly difficult, if not impossible, to keep abreast of new concepts, methods, theories, and findings, even in their narrow field of specialization, let alone in adjacent, relevant fields. This difficulty does not refer solely to the scholar’s lack of time to read all the relevant literature or to his inability to remember what is relevant in what he reads. It also refers to the absence of information facilities that are capable of locating, organizing, and retrieving the relevant material.

This maldistribution of information manifests itself in costly duplication of research efforts and in a failure to build upon prior scientific work. In an exploratory study of bibliographic needs of economists, psychologists, and anthropologists, approximately 60 per cent of each discipline reported that “they sometimes or often have failed to learn in time about some relevant prior work that would have made a difference in their research or teaching” (Appel & Gurr 1964). Clearly, the expansion of information is creating a major crisis for the future development of science. Unless new methods are perfected to help locate and preserve available information, it will be extremely difficult to maintain the cumulative effect of research. Abstracting services and data archives are only two of many methods that have been developed to retrieve, process, and distribute information in a more rational and effective manner.

Responses to the information crisis

Efforts to cope with the information challenge can be described from three different perspectives: first, overarching, multifaceted programs that focus on the entire information system; second, specific devices such as abstracting services and archives developed to meet particular needs; and third, issues and problems related to the organization and performance of specific kinds of information services.

National information programs

An increasing number of countries are formulating national information policies involving a wide range of specific programs designed to rationalize the nation’s scientific information system. There are a number of motivations behind these developments, not the least of which relate to national security interests. For example, the United States government, especially the military sector, is a major source of research and development funds for many scientific disciplines. In order to plan and manage their programs, these sponsors need information as to who is engaged in what problems, how great the scope of the effort is, who is supporting the work, what additional work is being planned, what the time schedule is, etc. However, “resources information” is only one element in planning scientific programs. In addition, the scientists engaged in the work also need information programs.

A major milestone in the American government’s development of a national information policy occurred with the passage of the National Defense Education Act of 1958, an act empowering the National Science Foundation to establish a science information center that would support and encourage work related to indexing, abstracting, translating, and the development of mechanized systems for retrieving, storing, and disseminating information. A number of government and private groups have made their own studies of the information crisis, and these contributed to the creation in 1958 of the National Federation of Science Abstracting and Indexing Services, a group fostering greater cooperation and coordination between its members; these efforts also contributed, partly through the encouragement of the National Science Foundation, to the creation in 1964 of the National Council of Social Science Data Archives, a cooperative venture encouraging the coordination and development of the activities of its members (Mitchell 1964). These various developments have also helped to make individual scholarly associations more conscious of their own information problems. For example, the American Psychological Association has conducted a series of studies on a wide range of information problems experienced in that field.

These American programs have major implications for social science communication in other countries, because the American services are increasing their coverage of information produced in other countries, because the American scientific output looms so large in the over-all international perspective, and because the information services in many countries are relatively weak or do not exist.

However, many countries have especially strong information programs in the physical, natural, and medical sciences, as well as in technology. Most noteworthy is the All-Union Institute of Scientific and Technical Information (Vsesoyuznyi Institut Nauchnoi i Tekhnicheskoi Informatsii, usually abbreviated VINITI), originally created by the Soviet Union’s Academy of Sciences in 1952; a somewhat comparable group exists in Poland; both Uruguay and Argentina, as well as the state of São Paulo in Brazil, have established research councils that perform various information functions; and through the encouragement of the American government, research and information services have been established by several other countries, including Turkey and Thailand.

International information programs have also been initiated. Of special significance to the social sciences are the activities of the United Nations Educational, Scientific and Cultural Organization (UNESCO), the International Committee for Social Sciences Documentation, and the International Social Science Council. These groups encourage coordinated, international efforts in standardizing information procedures and in exchanging information; they also produce various abstracts, indexes, and bibliographies.

Range of specific information services

The varied nature of the information crisis suggests the need for a wide range of information services. Furthermore, no single service is fully capable of meeting all the needs for which its genre of services is designed. For example, an abstracting service will probably not be able to facilitate casual “browsing” and also provide the detailed information that some scholars may need. However, recent developments in mechanized information-storage-and-retrieval techniques are likely to increase the variety and quality of services that any information facility will be able to offer.

There are several ways to categorize information services: by the techniques they use; by the way they are organized and financed; by their subject-matter specialization. For present purposes they will be arbitrarily classified according to the quality and (for lack of a better word) completeness of the information they provide. At one extreme are various services that simply help locate references to larger bodies of information, which may or may not have the relevant information a scholar requests. Midway are services that help locate information and also provide a capsule description of the information; some services also evaluate the materials. At the other extreme are services that locate, evaluate, and provide the information in a form the user requests.

Bibliographies, indexes, directories, and library card catalogues are examples of services that typically present limited information; they help the user locate materials that might be relevant to his interests. These services differ in the completeness and organization of their files. Some attempt to refer to all publications relevant to a particular topic (such as the annual bibliographical issue of the Journal of Asian Studies); others refer only to published books, thereby excluding the large volume of periodical literature. Some organize their information by broad subject topics; others alphabetize by authors, topics, or key words. Relatively few give any detailed information about the contents of an article, although library catalogue cards typically provide some information about chapter or section headings. These simple locator devices are perhaps most helpful to users interested in very broad topics. Primary consideration in evaluating such services would include the completeness of listings, the general categories used in organizing the files, and the degree to which the lists of titles accurately indicate the content of the materials.

Information services that provide more than locator references but less than a complete copy of the entire information, fully evaluated, can be classified according to whether they emphasize descriptions or evaluations. Annotated bibliographies, abstracts, and clearing houses tend to emphasize the descriptive dimensions. Abstracts will be discussed in greater detail later in this article. Annotated bibliographies often present less information than abstracts do, and several of the more useful ones have tended to focus on fairly specific topics, such as juvenile delinquency or the relationships between education and national development (e.g., Stanford Research Institute’s Human Resources and Economic Growth). Several clearing houses have been established to assist scholars. These facilities—such as the Science Information Exchange of the Smithsonian Institution—provide, on request, copies of abstracts reporting on research related to the kinds of information desired. The primary concern is helping the user to contact sources and people who might be able to provide him with more complete answers to his questions.

Other information services go beyond mere description; they attempt to place current publications in some larger context, and they attempt to evaluate in various ways the quality and contribution of current scholarly output. Annual reviews, encyclopedias, guides to literature, textbooks, and state-of-the-art reviews (such as Sociology Today, edited by Robert K. Merton, Leonard Broom, and Leonard S. Cottrell, Jr., and Anthropology Today, edited by A. L. Kroeber) represent attempts to place current research in some larger perspectives; the present encyclopedia, for example, is a major effort to take stock of social science developments over the last three decades at the conceptual, methodological, and theoretical level.

Although these evaluation services perform invaluable functions, they have well-known limitations. For example, they typically review only books, not articles. (Some abstracts, such as those in The American Behavioral Scientist, cover both. Some publications, such as Current Sociology, combine an evaluative state-of-the-art review with an appended annotated bibliography.) Book reviews do not cover as many books as abstracts cover; they do not give equal treatment to all parts of the book; they are subject to the biases of the reviewer; and reasons of space tend to limit the review to a relatively few, cursory comments. Some social science publications attempt to emphasize quality and thoroughness of their reviews by inviting rather extensive analytical reviews of selected publications; “regular-length” reviews receive smaller space, and “book-notice” reviews are given only a few paragraphs.

Information services of the final category to be mentioned are those that not only evaluate materials but also provide all the information the user might need. In some instances this evaluation results in discarding poor-quality and outmoded materials; in other instances the service provides all available information but also evaluates its strengths, weaknesses, possibilities, and limitations. Specialized libraries, data archives, and information centers are major examples of these kinds of services, although none is a perfect example. These services are becoming increasingly important.

Abstracting services

Most of the general issues relevant to the activities of any information service have already been mentioned. Four of these will serve as focuses for the following brief discussion of abstracting services: coverage, currency, quality of the abstract, and organization of the abstracting services’ files or cataloguing systems. In addition, several evolving trends and likely developments will be discussed. Technical and administrative aspects of abstract operations, however, will not be considered.


The rapid expansion in the volume of social science research, in combination with the spread of the social sciences into more and more countries, continually increases the burden of the overtaxed abstracting services. Several comparative statistics suggest that none of the social science abstracting services is likely to reach the scope of operations currently found in some of the physical and natural sciences. In the social sciences, Sociological Abstracts in 1964 published 3,114 abstracts from about 115 journals, published in approximately twenty countries and in eleven languages. American and other English-language journals are by far the most important sources abstracted. Psychological Abstracts in 1964 published 10,500 abstracts from about 450 journals, published in approximately a dozen languages and in two dozen countries. On the other hand, Chemical Abstracts published approximately 165,000 abstracts in 1962. It abstracted materials from some 8,000 journals, in more than fifty languages, from approximately 85 countries. In 1960, the Russian abstracting service VINITI was processing approximately 15,000 periodicals, published in 65 different countries. One of the largest American abstracting services in this same year had a staff of 500 and an annual budget of $5 million.

By these standards the total output of all social science abstracting services is relatively small. The country coverage of most services is rather limited and, usually, only includes several of the major producers. Coverage is least in eastern Europe and in some of the developing countries—areas where the social sciences themselves are least developed, where there are relatively few social scientists, and where the language problem is an obstacle to abstracting services.

Since some of the new periodicals being established in the developing countries (e.g., America latina) append summaries in one or more Western languages, geographical coverage in the social sciences may become less of a problem in the future. Problems of geographical coverage are also being handled by the establishment of regional social science documentation centers. For example, UNESCO has established such a center in Rio de Janeiro, Brazil, and one in New Delhi, India. (The latter, however, is scheduled to terminate its existence before 1970.) The creation and strengthening of abstracting services in individual developing countries will also serve to facilitate efforts of scholars to locate relevant materials from other countries.

Even the best social science abstracting services tend to focus almost exclusively on major periodicals. Their coverage is limited, leaving out the growing number of fugitive documents, government reports, reprints, master’s theses, doctoral dissertations, and conference papers. This situation is likely to improve as abstracting services coordinate their efforts more closely and publish abstracts prepared by other services and as there are technological advances in computer-produced abstracts.

If these developments rapidly increase the volume of published abstracts, other technical innovations, now in the developmental stage, will probably be adopted. For example, new kinds of abstracting publications—such as science newspapers—will be produced, and new mechanical document-switching centers, similar to the Defense Documentation Center of the American government, will be created to send out abstracts or documents on request.


Time is another dimension of coverage. That is, coverage decreases with an increase in the interval between the appearance of an article and the appearance of its abstract. Since the financial and human resources of abstracting services rarely keep pace with the growth of scientific literature, the currency problem is likely to become more serious. It is already a serious problem in some fields. For example, a study of the information situation among psychologists discovered that articles in psychological journals were based on work initiated, on the average, between 30 and 36 months prior to publication. It takes approximately another 15 months before the article is abstracted in Psychological Abstracts. Between the initiation of a research project and final abstracting of the article, the project directors report on their work at professional meetings, various drafts or preprints are distributed to a select audience, and about nine months pass between the submission of an article to a journal and its eventual publication. Similar delays occur in other fields. For example, articles submitted to some Russian scientific journals in 1952 were not published until 1955. In today’s rapidly developing scientific world such time lags can be very costly. A number of scholars may be working on problems already solved and spending funds that could be better used to advance, rather than to replicate, scientific findings.

Two developments are likely to help reduce the time-lag problem. First, there are a number of information services that report on research in progress. In the United States, clearing houses such as the State Department’s Office of External Research, together with the Smithsonian Institution’s Science Information Exchange, perform this function. Annual reports of foundations, as well as directories of current research, also provide information on research in progress. The second development—automated, or computer-produced, abstracts—has yet to be perfected, although various groups have been working on this technique. Since many abstracting services rely on authors of articles to provide their own abstracts, and since this procedure helps to retard the publication of abstracts, any program that reduces the abstract’s reliance on authors would help to solve the currency problem.


Social science abstracting services face peculiar problems in insuring the completeness, accuracy, and general quality of their abstract entries. For one thing, it is generally recognized that a relatively high proportion of social science writings have serious defects: data are questionable, samples are limited, methods are inadequate, findings are trite, and conclusions are not well supported. Such literature may not be worth abstracting; furthermore, to treat it seriously is likely to be a disservice to the future development of a scientific discipline. In short, the quality problem refers in large part to the general quality of social science work, and the eventual solution to this problem will depend in large part on the greater selectivity of journals and publishers, as well as on the general maturing of the social sciences.

The quality problem is one reason evaluative reviews and state-of-the-art summaries are so valuable to some scholars. However, abstracting services have to consider the variety of needs scholars have, and therefore they cannot afford to be too selective. Some users may value “browsability” more than depth and critical judgment, exhaustiveness more than selectivity, hard facts more than syntheses, and methods more than concepts. These contradictory demands on information services have contributed to a growth in research concerning the kinds of information needs scientists have. (It is generally recognized that the needs of science as a system may not coincide with the felt needs of individual scientists; the individual may limit his concern to his own narrow specialization, whereas the development of science may depend on the blending of developments from two or more specialties.)

Quite aside from the quality of research publications and the kinds of abstracts needed, there is the separate issue of the quality and accuracy of the abstract itself. Since a high proportion of the abstracts are prepared by the authors of the abstracted article, the problem of quality is related to the general problem of the quality of social science work. It is very likely that experiments will be conducted to compare the usefulness of abstracts prepared by authors with that of abstracts prepared by professional abstractors and, eventually, by computers.

Organization of the files

A major purpose of abstracts is to help scholars locate materials they might want to investigate in greater detail. To serve this purpose, the abstract must briefly and accurately cover the relevant portions of the larger manuscript. Furthermore, the abstract journal must be indexed so that the user can locate all the relevant materials quickly, with the assurance that he is not missing anything because of the searching procedure forced on him by the abstract’s index or filing system. Also, the abstract’s filing system must avoid providing the user with too much information: that is, it should ideally provide him only with materials directly pertinent to his initiating question. (If he is only browsing, this may be an irrelevant consideration.) The organization of an abstract’s filing or cataloguing system is an important determinant of how much the abstract can contribute to the advancement of a science.

Implied in the last statement is an assumption that abstracts are used only for research related to the advancement of science; but, of course, they are used for teaching and other purposes as well. Even researchers will differ in the demands they make of an abstract, so a catalogue system suitable to one group of scholars may be useless and annoying to another. Some abstract users are interested primarily in very general topics, such as legislative systems; others are seeking information about research methods; others wish to see how a single concept has been used in various contexts; still others will be primarily interested in specific variables or in empirical relationships between specified kinds of variables. Not only are the initiating concerns different, but so are the complexities of the request. For example, a cataloguing system will be organized one way if most of its requests are for general topics and another way if most of its requests refer to information on multivariate relations between such items as liquidity preference, religiousness, and political involvement.
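The two demands placed on an index in this section, that the user miss nothing relevant and that he not be handed too much, can be made concrete with a small worked calculation. The following is a minimal sketch, not a method described in this article: the labels "recall" and "precision" are the terms conventionally used in information retrieval for these two proportions, and the abstract identifiers and the function name are hypothetical.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for one search of an abstract index.

    recall    = share of the relevant abstracts that the search found
    precision = share of the found abstracts that are relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # An empty set is treated as a perfect score to avoid dividing by zero.
    recall = len(hits) / len(relevant) if relevant else 1.0
    precision = len(hits) / len(retrieved) if retrieved else 1.0
    return recall, precision


# Hypothetical search: the index returns four abstracts, while three
# abstracts in the whole file actually bear on the user's question.
recall, precision = recall_and_precision(
    retrieved=["A1", "A2", "A3", "A4"],
    relevant=["A2", "A4", "A7"],
)
```

In this invented case the user finds two of the three relevant abstracts (recall of two-thirds) and half of what he is given is pertinent (precision of one-half). A cataloguing system built around gross topics will tend to trade one of these proportions against the other.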

Social science abstracting journals tend to be organized around traditional subject-matter topics rather than around variables, concepts, and findings. Even the topics tend to be gross in content. For example, in 1964 Sociological Abstracts organized its materials into approximately fifty categories, including “public opinion,” “political sociology,” and “sociology of the family.” On the other hand, Psychological Abstracts has more than three times this number of categories, and it publishes a very detailed and extensive supplementary subject index.

Most of the shortcomings mentioned above can be attributed to a lack of resources available to abstracting services and to the precomputer creation of their organizing principles. With the maturing of the social sciences, the recognition of new methods of organizing materials, and the pressure from an accumulating mass of literature, a new view of abstracting problems and services is slowly appearing. Studies of information needs will help determine the most appropriate organization of a catalogue; the development of natural language systems for storing, organizing, and retrieving information may change the perspective on cataloguing systems (Gardin 1965; Scheuch & Stone 1964); and the success of computerized information services, such as the Defense Documentation Center, may alter the concept of what services an information center can provide.

Emergent trends

Reference has already been made to several developments that have the possibility of profoundly improving the information services that to this time abstracts have provided. The extent and speed of these improvements will depend on financial considerations, the development of a national information policy, the future course of the social sciences, and the progress other scientific fields make in solving their information problems. Many of these developments cannot be clearly seen, because information specialists are only now discovering the potentialities provided by modern computer technology. Computer-produced translations can expand the amount of materials that can be considered for abstracting; computer-produced abstracts can increase the number of abstracts that can be prepared, and they can also reduce the lag between publication and abstract; by means of “interest profiles”—a list of words indicating a person’s interests—computers can facilitate a more aggressive dissemination of information; and by means of computer-based systems for information storage and retrieval, abstracts can be more than document-switching services; they can become invaluable research tools.

The last point deserves special mention, since it may involve a radical departure in traditional conceptions of abstracting services. At the present time abstracts are considered devices to alert scholars to materials they might wish to explore more fully. However, exploratory work in developing systems for information storage and retrieval in social science data archives indicates that such systems can be used to retrieve and organize specific kinds of information, such as “findings” on the relationship between family structure and educational achievement. Therefore, rather than merely providing references to existing information, abstracting services could develop into facilities that actually create new information. That is, they can serve as sources of data.
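The “interest profiles” mentioned above can be illustrated with a short sketch. It assumes the simplest possible scheme: a profile is a list of words, and an abstract is sent to a person when its text shares at least a chosen number of words with that profile. The function names, the sample profiles, and the abstract texts are all hypothetical, and no existing dissemination service is claimed to work this way.

```python
import re


def tokenize(text):
    """Lowercase a text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def route_abstracts(profiles, abstracts, min_hits=1):
    """Map each person to the ids of abstracts matching the profile.

    An abstract matches when it shares at least `min_hits` words
    with the person's interest profile.
    """
    routed = {}
    for person, words in profiles.items():
        wanted = {word.lower() for word in words}
        routed[person] = [
            abstract_id
            for abstract_id, text in abstracts.items()
            if len(wanted & tokenize(text)) >= min_hits
        ]
    return routed


# Hypothetical profiles and abstracts, for illustration only.
profiles = {
    "scholar_a": ["voting", "elections"],
    "scholar_b": ["family", "kinship"],
}
abstracts = {
    "A1": "Social class and voting in Great Britain.",
    "A2": "Family structure and educational achievement.",
    "A3": "Daily newspaper circulation and urbanization.",
}
notices = route_abstracts(profiles, abstracts)
```

Here `notices` sends abstract A1 to the first scholar and A2 to the second, while A3 reaches no one. Raising `min_hits` makes the service more selective, at the cost of missing abstracts that mention only one of a person's stated interests.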

Information banks will be of primary interest to those who are seeking results of studies rather than to those who are concerned more with the methods by which the information was collected and analyzed. In the physical sciences and in engineering, specialized “mission-oriented” groups have been created to service requests for data, results, methods, and analytical procedures. In the United States, groups such as the Defense Metals Information Center and the Thermophysical Properties Research Center are prepared to winnow out irrelevant and poorly performed research; they provide state-of-the-art summaries, information on the latest findings and techniques, and references to who is doing what in the field. These and other similar groups provide specific, evaluated answers to questions, not just information on whom to see or what to read in order to answer the question.

There are indications that these same kinds of information centers will develop within the social sciences. For example, in 1964 Michigan State University began to create a diffusion documents center, a facility designed to provide bibliographic references, existing data, and other information related to the adoption of various new farming and other techniques and practices. Also, the Special Operations Research Office of the American University has created CINFAC, a service that responds to requests for information, materials, and analyses of the human factors involved in insurgency and counterinsurgency situations in specific geographical areas. There are also a variety of local and national centers that have a heavy applied or policy orientation, being concerned with intergroup relations, community planning, and various economic matters.

Data archives

Whereas abstracting services report on published literature and research in progress, social science data archives acquire, store, process, and distribute basic social science data produced by various research and administrative groups. These data, primarily materials that are in a form for machine processing, together with their accompanying study designs, code books, research reports, etc., are used by researchers for purposes of secondary analysis and by teachers for purposes of training. (“Secondary analysis” refers to the use of materials for purposes unrelated to those for which they were originally collected.)

Such secondary materials have played a key role in the development of the social sciences, although the contribution differs in accordance with the attention different disciplines give to quantitative social research and with the standards of evidence and inference upheld in the various fields. At one extreme, economics and demography have been heavily quantitative in orientation. While there are certainly major exceptions, in large part the materials used in these fields result from the normal bookkeeping operations of various government and private administrative operations. At the other extreme, anthropology has been largely a descriptive discipline, concerned primarily with “qualitative” materials, or, more properly, information collected by observation of one sort or another. Sociology and, more recently, political science fall between these two poles. In the past, pathbreakers such as Quetelet, Durkheim, and Sorokin based some of their most significant research on existing published statistical data, whereas more recently, beginning in large part with the American Soldier volumes, there has been an increasing research interest in working with the punched cards produced by research projects that have terminated their activities.

With the advent of modern data-processing equipment, social scientists are able to utilize new techniques on new bodies of data. This in turn has contributed to new concepts, theories, and methodologies and to demands for still more data. The change in research orientation can be seen in the history of the concern with existing materials, especially with public-opinion data. Recognizing the significance of these materials very early, Public Opinion Quarterly began in July 1938 to publish the poll results released by the American Institute of Public Opinion; the American and overseas coverage of releases was increased until 1951, when this regular feature was discontinued. However, it was reinstituted some ten years later. In the meantime, the International Journal of Opinion and Attitude Research, published from 1947 to 1951, ran a major feature called “World Opinion.” Between 1943 and 1948 the National Opinion Research Center published eleven issues of Opinion News, which included releases from polling groups in the United States, as well as other countries. In 1951 Cantril and Strunk compiled their book Public Opinion 1935-1946, which included opinion-poll materials from sixteen countries. In 1960 Cole and Nakanishi edited Japanese Polls With Sociopolitical Significance, 1947-1957, and comparable volumes of German, Swedish, and Italian materials have been published. In 1965 the Steinmetz Institute of Amsterdam assumed the editorial responsibilities for Polls, an international journal reporting the research results obtained by about seventy organizations from more than twenty countries.

With the further development of survey methodology, research interests, and data-processing equipment, scholars became increasingly aware of the limitations of these published volumes of research findings. Only a small portion of all the findings and materials was reported; and the materials that did appear in print were at most simple marginal distributions, although two-variable tabulations were sometimes given. In no sense were these materials being adequately exploited. Since large quantities of the basic materials were being destroyed, it appeared that these invaluable research resources would be lost forever.
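The difference between a marginal distribution and a two-variable tabulation, and the reason published marginals cannot replace the original records, can be shown with a small sketch. The six respondent records below are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical respondent records: (education, vote intention).
respondents = [
    ("primary", "party_x"),
    ("primary", "party_y"),
    ("secondary", "party_x"),
    ("secondary", "party_x"),
    ("university", "party_y"),
    ("university", "party_y"),
]


def marginal(records, position):
    """Count the answers to a single question (a marginal distribution)."""
    return Counter(record[position] for record in records)


def crosstab(records):
    """Count each combination of answers (a two-variable tabulation)."""
    return Counter(records)


education_marginal = marginal(respondents, 0)
vote_marginal = marginal(respondents, 1)
table = crosstab(respondents)
```

The vote marginal shows only an even split between the two parties. The two-variable table shows that the split follows education, a relationship that can be computed from the respondent-level records but not from the two marginals alone.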

Although public-minded commercial research agencies, both in America and in Germany, had expressed an early interest in having their materials preserved and made available to scholars, it was not until 1955 that steps were taken to provide an over-all solution to the problem of archiving these data. In 1957 York Lucci and Stein Rokkan reported on their two-year investigation of archive prospects in the United States and Europe. Among other suggestions, the authors proposed the creation in the United States of a central national archive containing survey materials collected from around the world. It was felt that the level of research practices and sophistication among Europeans required that the creation of European archives be accompanied by training programs in the use of survey materials, as well as methodological studies of problems involved in the use of such data. Prior to 1957 individual university research centers maintained archive operations for their own materials, making them available primarily to graduate students for their dissertations. After 1957 a number of specialized archives were created. Although a study in the United States in 1963 discovered that there were, on the average, three or four archives per state (Ferguson & Lazarsfeld 1964), most of these were repositories of materials that were not made readily available to the academic community. But by 1965 approximately fifteen university archives were in existence or in the process of being created in the United States. Other archives were being created in Norway, Finland, the Netherlands, France, England, Germany, and Argentina. This proliferation in turn led to the creation in 1964 of the Council of Social Science Data Archives, an American group, and to the beginnings of an international coordinating network.

Most of these archives were created in response to the research needs of local faculties. Because of this, the archives differ considerably in their scope of concerns, the services they offer, and the data they collect. Some archives have focused primarily on one kind of data: survey materials or aggregative statistics. (The Yale Political Data Program is concerned with national aggregative statistics; an archive at Indiana University is concerned with the more qualitative aspects of nation-states.) Some are concerned with particular regions of the world. (The International Data Library and Reference Service of the Survey Research Center of the University of California at Berkeley specializes in materials from the developing nations; Steinmetz Stichting, a University of Amsterdam archive, focuses on materials from the Netherlands; and the Zentralarchiv für Empirische Sozialforschung at the University of Cologne is devoted primarily to German materials.) Others are concerned with materials pertaining to limited substantive research interests. (The Inter-university Consortium for Political Research, an American group with offices at the University of Michigan, focuses primarily on politically relevant American materials.) Still others are primarily concerned with materials provided by particular kinds of data suppliers. (The Roper Public Opinion Research Center has relied primarily on commercial polling agencies for the materials it distributes.)

In the course of their development, archives have found it necessary to prepare their materials for machine processing, sometimes to evaluate them, and increasingly, to store information that will help to locate the materials which users request.

Archives in the beginning were concerned almost exclusively with sample survey data, since these included the attitudinal materials of central concern to many of the social sciences. Such materials are being produced at an extremely rapid rate. Information presented at the Second Conference on Data Archives in the Social Sciences indicated that more than two thousand surveys, representing approximately two million interviews, were conducted in Britain from 1963 to 1964. Between four and five thousand surveys were conducted in continental Europe in 1963. An estimated fifteen hundred of these studies could be made available in one form or another to scholars for purposes of secondary analysis. The American production of survey materials is even greater; and since survey research methods are being adopted in almost all the developing countries, a truly vast sea of materials is potentially available for research purposes. Add to these materials all the court decisions, police department records, manpower-and-income data, social security records, educational statistics, and the quantities of materials produced by the “knowledge industry” around the world, and it seems that scholars have a wealth of data to work with if these data can be made readily and inexpensively available.

In addition to survey-research materials, other kinds of materials have also been collected in the archives. For example, the aggregative or ecological materials organized by the Yale Political Data Program include averages, events, rates, percentages, measures of dispersion or variance, traditions, developmental patterns, and the like. Information is collected on daily newspaper circulation, cinema attendance, extent of urbanization, military expenditures, election results, immigration, the distribution of agricultural land, and many more variables. The quantitative data, which typically refer to a defined geographical or administrative unit, are reported in raw numbers, as well as in rank orders.

Along somewhat different lines, ethnographic materials from a large sample of world societies have been collected and organized by the Human Relations Area Files into an elaborate classification scheme (Yale University 1938). Although these materials, unlike the others that have been mentioned, are not in a machine-manipulative form, there have been pilot projects designed to facilitate their use by means of computers. In the early 1960s attention was also given to machine manipulation of other kinds of quantitative data, including Rorschach tests and materials on the relationship between culture and personality.

Two major arguments—one referring to economic efficiency and the other to theoretical significance—have been used to justify the creation of these various data services. From the economic perspective it has been noted that collecting primary data entails considerable costs; large field staffs are needed; competences that many scholars do not have are required; and a full-time commitment on the part of the scholar is necessary. The cost, personnel, competence, and time obstacles prevent scholars from pursuing their research interests. Balanced against these obstacles is the existence of vast quantities of already collected materials on a wide variety of topics from many different countries. These materials, which can be obtained for only a fraction of their original costs, have typically never been analyzed or reported in depth.

For example, the International Data Library and Reference Service of the Survey Research Center of the University of California at Berkeley describes its data as referring to political attitudes and behavior, attitudes toward foreign nations and international relations, patterns of stratification and mobility, family structure and family planning, personal and public ethics, religious beliefs and practices, standards of living, material needs and economic outlook, and many other topics of interest to social scientists. Its materials have been used in studies of political attitudes and behavior in Latin America, North America, Europe, and Asia; of anti-Semitism in Europe and America; of religion and politics in the United States and France; of social class and voting in Great Britain, Australia, Canada, and the United States; of student politics in Asia, Africa, and Latin America, etc. In many instances the same questions are asked in the same country at different periods, permitting trend analysis, or in several different countries, permitting cross-cultural, international comparisons.

The major theoretical significance of the files lies in the increasing use of existing materials for purposes of secondary analysis and for student training. This has not occurred without methodological criticisms and research difficulties. However, the very availability of such data has encouraged scholars to develop and systematize the logic underlying secondary analysis. Some of the university-based archives, especially those concerned with data collected outside the United States and data that are to be used for purposes of international comparative analysis, have taken an active concern with evaluating the materials they acquire and with specifying the substantive and methodological limits within which their data can be used.

In part the evaluation issue arises because archives use different standards in deciding what materials to acquire. Some archives accept whatever their suppliers offer them; others feel that quality is more important than sheer inclusiveness. In fact, some scholars argue that inclusiveness of coverage, in the sense referred to in the discussion of abstracting services, can be detrimental to the social sciences, for it is necessary to exclude materials that are trivial, insignificant, and of poor quality. Poor data may be worse than no data at all, and good data are preferable to poor data.

Until such time as archives establish standards for determining what is substantively relevant and methodologically adequate, archives will contain large volumes of relatively unimportant, low-use data. Some groups have argued that archives would probably perform a greater service than they do now if they concentrated more of their resources on obtaining, evaluating, and servicing a smaller body of materials.

Evaluation and methodological issues raised with regard to survey materials collected by social science data archives have equal relevance to the collection and use of primary materials. Therefore, the research and methodological efforts encouraged by the development of archives benefit a wide range of social science research concerns. By the very fact of raising these issues, a general distinction can be drawn between the two topics discussed in this review—abstracting services and social science data archives. On the one hand, abstracting services facilitate the research and teaching efforts of scholars by providing them with descriptive information on what scholars have done and are doing. On the other hand, while sharing many of the same organizational problems encountered by abstracting services, social science data archives provide more than information on what has been done: they actually provide the basic raw materials used by other researchers. Because they are concerned with data rather than with information only, they must often concern themselves with an entirely different range of issues, namely, how these materials can be used, what their limitations are, and what contributions they might make to the development of a social science based on firm empirical foundations.

Robert E. Mitchell


The library has been from its beginnings, as they are known to us, a social instrument, the constantly revised invention of men working together in an organized society. The clay tablets of Ashurbanipal’s royal library at Nineveh, the papyrus rolls at Alexandria, the parchment and vellum codices at Pergamum, were all brought together, organized, and preserved because these societies needed recorded information for the maintenance of the state, the preservation and communication of religious belief, the transaction of commerce, the education of youth, the bequeathing of the culture to subsequent generations.

It is undoubtedly true that a society stagnates unless it makes constant provision for the injection and absorption of new knowledge. A society is a duality of action and thought, bound together by a communication system that itself is a duality of mechanism and message—that which is transmitted, as well as the manner of its transmission. In a given society or culture in which language is the medium and the graphic record one of the instrumentalities, libraries of every kind constitute a network within the total communication system, a subsystem whose effectiveness depends upon the librarian’s understanding of the nature of knowledge and its importance to both the individual and society. The library can be socially effective only if its operations derive from and are harmonized with an understanding of the ways in which knowledge is generated and flows through the communication channels of a constantly evolving social and intellectual organization; and it is this changing social structure that in large measure determines how knowledge is translated into action.

The librarian’s professional resources must include an understanding of the processes of intellectual differentiation and the interrelationships of knowledge within a complex social organization. He must recognize not only that intellectual forces shape social structures but also that cultures and their symbol systems shape thought; for example, such concepts as freedom and democracy are both culturally and linguistically delimited. Since libraries are agencies for the diffusion of cultural products, the theory and practice of librarianship must be founded upon what I call social epistemology: the study of social knowledge, the means whereby society as a whole achieves a perceptive relation to its total environment, the totality of the stimuli that act upon a society, nation, or culture, with specific reference to the production, flow, integration, and consumption of all forms of communicated thought through the entire social fabric. Social epistemology is of particular importance to the librarian, because he stands at the point where recorded knowledge and social action meet and his concern is with what Kenneth Boulding has called the transcript (whether written or not) of the culture and the impact of that transcript upon (again to borrow Boulding’s term) the image, that which man believes to be true, which largely determines and directs his individual and group behavior.

Every society or culture produces a transcript of its collective thought, a record in more or less permanent form that can be passed from person to person and generation to generation and can thus, at least in a limited way, transcend both space and time. In primitive, nonliterate societies, this transcript usually takes the form of verbally communicated ritual, ceremony, myth, legend, song, even law. The transmission of the store of common knowledge, information, and belief becomes one of the principal concerns of the group, exerts upon it a cohesive force, affects and may even dominate the thinking and actions of individuals, and can become a powerful brake upon innovation and change. The invention of written communication marked the beginning of a substantial degree of dissociation of communicator from receptor. It expanded the potential audience for the communicated message by transcending the bounds of human memory, made possible a virtually unlimited accumulation of knowledge by a society or a culture, and gave rise to the need for a social agency to preserve the written record. As the art of writing increased the intellectual resources of the individual by bringing to him the thought and experiences of people whom he had never seen, so the accumulation and organization of written records in libraries expanded the intellectual resources of societies. Thus, the epistemological pattern became increasingly linear and cumulative as the communication system, of which libraries constitute a subsystem, grew in carrying capacity and absorptive (or storage) power. Not only could individuals build upon the experience of other individuals, but societies, also, could profit from the experience of other societies. These capabilities were immeasurably extended by the invention of printing with movable type, and among the consequences of this invention was a social revolution that eventually made possible the concept of libraries for all the people.

The structure and communication of knowledge form an open system, which changes as the functions and needs of the individual and society shift. But the library is more than a link in the communication chain; as an operational system it is part of the total knowledge process—or of the knowledge situation at any given point in time. The knowledge process itself is a unity of subject, vehicle, and object. The subject is the self, the perceiver in the act of awareness (the library user); the vehicle is the instrumentality or mechanism through which the subject approaches the object (the library’s bibliographical apparatus); and the object is the ultimate goal or referent, knowledge itself (obtained from the library’s store).

There is no direct relationship between any collectivity and its cultural manifestations. Since the objective of the librarian should be to maximize the social utility of graphic records, his bibliographic and information systems must be structured to conform as closely as possible to the patterns of man’s use of those records and the transmission of knowledge within society. His procedures and techniques, which for centuries have been derived from ad hoc assumptions, modified by trial and error, about man’s need for and use of recorded knowledge, must be examined within the context of the cognitive and communication processes in society. The librarian dare not assume that his tools and methods for the control of his collections reflect permanent or even relatively permanent relationships between user and printed word. He must be prepared constantly to revise the configuration of the classification schemes, subject-heading lists, indexes, and other devices at his command for the efficient management of his bibliographic store, to reconcile them with the changing demands of his society.

The communication process in contemporary society has become increasingly complicated by the growth of specialization. The present structure of education, as well as society’s system of economic rewards, tends to direct students of the highest intellectual promise into graduate study and professional schools. Such “educational hybridism” (as Whitehead has characterized it) has resulted in unprecedented progress in certain areas but has also tended to break down generalized communication and understanding among specialists and with the public. Only recently has the academic world begun to recognize the validity of an interdisciplinary approach to education and the social importance of interdisciplinary communication. Even more recently the librarian has increased his appreciation of the fact that he can play a significant social role as a mediator between specialists if he will prepare himself to be a comprehensivist, or highly educated generalist, and develop for librarianship a discipline, analogous to Buckminster Fuller’s “comprehensive-design science,” based on a general-systems theory applied to the organization and use of bibliographic materials.

Like any other social institution, the library, through the centuries, has responded to social needs, and alterations and modifications in its morphology have taken place under the impact of social change. Librarians in ancient Alexandria were scholars who worked in seclusion over the manuscripts in their custody. The monastic libraries of the Middle Ages were presided over by recluses who devoted their lives to the production and preservation of religious writings. With the dawn of the age of science and the coming of the Enlightenment, the library became the focal point for man’s inquiry into the physical and social phenomena of his environment. The rise of universal elementary education in the United States during the first half of the nineteenth century prepared the way for the public library, which Horace Mann saw as “the crowning glory of our public schools.” The industrial and technological revolution in society has been reflected, during the present century, in the growth of “special” libraries to serve a wide variety of managerial and research needs.

Between 1930 and 1945, two influences threatened to change completely the intellectual, social, and professional orientation of American librarianship. The first was the coming of the great depression. Not only were library budgets sharply curtailed, but the use of libraries—particularly public libraries—assumed new patterns. People turned to the free libraries for nonrecreational and cultural materials to improve their educational qualifications and skills, in the hope of achieving economic security. It was in direct response to such demands that the American Library Association began publishing its “Reading With a Purpose” series. It was no accident that at this time Alvin Johnson described the public library as “the people’s university.”

The second major influence changing the character of American librarianship was World War II, during which—for the first time—information was discovered to be an important strategic weapon. In feeding the insatiable appetite for military intelligence of a nation engaged in a struggle for survival, librarians found themselves very much in the midst of the world of action and were recognized by that world as providing services that were significantly more than a cultural adornment. Never before had librarians been called upon to help fight a war in a capacity other than that of citizen soldier, and the demands that were made of them demonstrated to the profession and the public alike the crucial importance to the national economy of the ready availability of recorded knowledge.

Concurrently with this demand for rapid access to precise and accurate information came the technological revolution of automation. An enormous increase in research activity was forcing upon business, industry, and government a new appreciation for the value of information in almost every area of human activity. But existing channels for the dissemination of recorded knowledge were no longer adequate to the burden that was being placed upon them, and the use of computers, with their ability to manipulate large masses of data at high rates of speed, seemed to promise an escape from the growing morass of print. In due course the very capabilities of the machines had a direct effect upon both professional and lay thought concerning the library’s role in society, and the dramatic flight of the first Sputnik gave a new impetus to the librarian’s participation in the affairs of science. Scientists, as well as humanists, found a place in librarianship, and to the technical jargon of the profession was added a whole new vocabulary derived from electronics, communications, systems engineering, and information theory; e.g., noise, malfunction, programming, on-line, mathematical model, lattice, PERT. Furthermore, the librarians’ ranks are being invaded by engineers, data processors, and systems designers, who have brought into the field a new terminology for established library concepts; for instance, reference work has become information retrieval or information transfer; subject headings, descriptors; collections of library materials, the store; the library, an information center; and the librarian himself, an information specialist.

To this invasion of their once comfortable domain, and to the threat of technological unemployment, librarians are reacting as did the palace scribes, the Luddites, the locomotive firemen, and any number of other challenged groups. Some are ignoring the invaders as being either inconsequential or irrelevant to the librarian’s responsibilities and sphere of activity. Others are combative and argue vigorously that the new technology is not appropriate to library operations. Some deny the existence of the crisis, while others seek a common bond of understanding, in the hope of enriching librarianship with whatever values the innovations possess. Whatever the ultimate resolution of this conflict, it should provide an interesting case study of the dynamics of a profession confronted by the necessity for drastic change.

But distressing as the present disorientation of the profession is to its members, the ultimate values may be great. Because of the emerging “science of information,” librarianship is, for the first time in its long history, being compelled to formulate self-consciously its role in society, to examine critically its intellectual foundations, and to view itself holistically, as an integrated system that serves man, both as an individual and as a member of society, throughout his life. Despite the obvious relationship of librarianship to its coeval culture, the library has been recognized as a sociological entity only within the last half century. The rise of the public library in the United States coincided with important new developments in sociological theory, and the beginnings of a search for status encouraged all lines of inquiry that might help to establish the librarian’s claim to being “professional.”

During the 1920s a group of distinguished social scientists at the University of Chicago created an environment for the study of all aspects of the science of society. Perhaps as a result, when the Graduate Library School was established there its faculty included a number of scholars trained in disciplines other than librarianship, who brought to bear upon library theory the new tools and techniques of sociological research. Douglas Waples began to explore the social effects of reading and the impact of the public library upon mass social behavior. Louis Round Wilson, dean of the school during the 1930s, focused his attention upon what he called “the geography of reading,” by which he meant the effects of sociogeographical factors upon the library as a social instrumentality; his work was strongly influenced by studies in cultural regionalism and the work of Howard Odum, Lloyd V. Ballard, and President Hoover’s Committee on Recent Social Trends. Carleton B. Joeckel (1935) prepared his classic study of the relation of the public library to local and state government. Pierce Butler (1933) wrote the prolegomena to an incipient library science, which sought to harmonize the humanistic and scientific foundations of librarianship, and encouraged his students to pursue inquiries into intellectual history (which he called the history of scholarship) and its relation to the library. From this one school, in a period of scarcely a decade, came a small band of unusually articulate disciples, who gradually assumed positions of professional influence and carried the social philosophy of their teachers to the rest of the world.

Western society is so heavily dependent upon the printed word that in a very real sense it can be characterized as a paper culture; projecting past and present library demands into an unpredictable future, librarians are directing their educated judgment toward increasing their professional capacity for efficient service. The small, the large, and the highly specialized libraries are all being affected by the changing information requirements of society, and the institutional pattern of library service is being reshaped. Plans are being laid for the establishment of information networks that will link the resources of many libraries in a variety of fields, and the time is certainly not far distant when the unique resources of any library in the country will be available on immediate call and at minimal cost to any individual who may have a need for them. There is a trend toward the creation of larger units of service, through formal cooperation between political entities too small to support independent libraries. Throughout the library profession there is developing a steadily increasing concern with the improvement of both the economic and the program efficiency of libraries, as analyzed and measured by the methods of social science. In a number of urban centers public libraries are turning to city planning, community analysis, and other techniques of municipal government and public administration for guidance in the allocation and utilization of their resources, especially with reference to the structuring of branch systems and other facilities for the extension of library service to meet changing economic, social, and population patterns of city and suburban life. 
The public librarian is improving his skill in working effectively with other educational and social agencies in his service area, and for the first time he is being called upon to participate in large-scale community programs for nonreaders, the functionally illiterate, the under-educated, the culturally deprived. In recent decades, and especially within the past few years, the public library has broadened and strengthened its role in the thinking and decision making of the community. In no way do these “auxiliary” functions diminish the library’s independence, initiative, or social prestige.

Programs for the professional education of the librarian have reflected changes in educational philosophy as well as in the theory of librarianship. Originally, apprentice training of the most elementary kind was followed by formal public-library training classes, which slowly gave way at the turn of the century to undergraduate programs in a few colleges and technical schools. C. C. Williamson’s influential report on library education (1923), prepared at the request of the Carnegie Corporation of New York, encouraged the development of graduate programs, and the Graduate Library School, established in 1926 at the University of Chicago, offered the first doctoral degree. Today all of the library schools accredited by the American Library Association award master’s degrees, and some half dozen have programs leading to the doctorate; several require study in a cognate subject area for the doctorate. A sound general education with a good undergraduate major in a subject field is essential to the librarian, and he should pursue his subject specialty as far as his resources permit. No longer must he come to his profession by way of the humanities; today his province is the whole spectrum of human knowledge.

Jesse H. Shera


The development of social scientific thought is often based on information available in libraries. Marx did not do field research; he relied upon the factory reports available to all users of the British Museum. Frederick J. Turner’s thesis presented in The Frontier in American History rested on the published reports of United States censuses of population. The Brandeis Brief, the celebrated sociological argument made in the U.S. Supreme Court in 1908, was written chiefly from European government reports located in the Astor Library. The 1965 report by Daniel Moynihan, The Negro Family, published by the U.S. Department of Labor, exploits population data from the United States census. Like these famous studies, much journeyman social science rests on facts drawn from reference books. “Even the craggiest, most stonily factual reference book, when a little mellowed by time, becomes a quarry from which some perceptive scholar can extract handsome building materials, as John Stuart Mill did from the venerable Annual Register, and James Ford Rhodes from the Tribune Almanac” (Nevins 1958, pp. 7-8).

Social science materials may be presented in many forms, depending on the editorial purpose they subserve and the type of publication. Encyclopedias, dictionaries, and atlases, general and specialized, represent three classic types of reference books of fundamental importance. Reference materials proliferate and necessitate reference books galore: periodicals lead to periodical indexes and to abstracts; court reports to citators and to digests; statutes to legal codes; decades of census publications to a single-volume abstract; books to card catalogues, national library catalogues, and bibliographies; bibliographies to bibliographies on bibliographies! The establishment of archives has led to a need for guides to them and to their contents. The development of libraries has brought inter-library loans and the publication of reference books like library directories and union lists of serials. The publication of manuscripts on microfilm and reproduction through methods like xerography have done away with distinctions like that between primary and secondary sources. The electronic computer, working directly from data on magnetic tape, permits researchers to omit printed statistical reference books such as census reports. Taking account of these magnificent changes in the storage and communication of information, this article will stress the major traits of reference materials and reference books encountered by social scientists.

The importance of reference works to social science has scarcely been matched by scholarly inquiry into this subject. Critical reviews and notes are rarely found in professional journals. Perhaps more notice is given reference works in the book reviews and quarterlies of general circulation. The attention won by reference works has simply not yet grown to be a subject in itself in any of the disciplines. As a result, the appraisals of reference books are fugitive pieces, fragments lacking theme or tradition. This article draws on these fragments to summarize the traits shown by reference materials and books in the social sciences.

When a social scientist assesses the nature and quality of reference works, he inescapably sees them as subjective parts of the discipline that they appear to serve with objectivity. The data of reference volumes are collected by editors with social predilections and published under auspices, whether commercial or governmental, with political preferences. Taken with the foibles men are prone to, these tendencies contribute to the production and dissemination of reference works containing errors of fact and slanting interpretations that are seldom signaled.

The librarians have generally played a neutral role in regard to reference materials. Speaking of the bibliographic and indexing services, a leading librarian has declared that “except in a few exceptional instances, the library and the library world exercise practically no control over the conditions which affect the compilation and publication of this apparatus” (Clapp 1964, p. 83).

Dependence on librarians who are not themselves social scientists has wrought rather arid definitions of reference books, like the following:

From the point of view of use, books may be divided into two groups: those which are meant to be read through for either information or enjoyment, and those which are meant to be consulted or referred to for some definite piece of information. Books of this second class are called reference books, and are usually comprehensive in scope, condensed in treatment and arranged on some special plan to facilitate the ready and accurate finding of information. This special arrangement may be alphabetic, as in the case of most dictionaries or encyclopedias; chronological, as in historical outlines and similar compends; tabular, as in the case of statistical abstracts; regional, as in atlases; classified or systematic, as in the case of some bibliographies, technical handbooks, etc. (Mudge as quoted in Winchell 1951, p. xvi)

Libraries themselves have changed so drastically since the 1930s, in making whole collections readily available to users, that earlier distinctions between types of books have been blurred. Social scientists, like today’s scholars in the humanities and the physical sciences, seldom take the necessary classifications in a library collection as boundaries to their work. In the diverting and instructive book The Modern Researcher, Barzun and Graff (1957, p. 74) group reference books into nine types: encyclopedias, biographical dictionaries, indexes to periodicals, dictionaries of quotations and concordances, atlases, chronologies, language dictionaries, handbooks and source books, and bibliographies. Jack Alden Clarke’s useful short guide, Research Materials in the Social Sciences (1959, p. 3), includes only titles of interest to students of two or more of the social sciences. An impressive review article by Kister (1966), in covering a grand variety of sources, praises Sources of Information in the Social Sciences (White et al. 1964) as “indispensable.”

Most commentators see three great divisions among the materials in a modern library: (1) reference materials which are the undigested records of institutions and individuals; (2) reference books which are the collated and organized summaries of knowledge ordinarily presented as objective summaries; and (3) interpretive books, general or monographic, clearly standing for man’s effort to describe, explain, and interpret social phenomena. In the present article, we are especially interested in the first two kinds of publications.

Reference materials

Government publications

Reference materials include government serials like Hansard’s Parliamentary Debates and the newspapers, journals, and reports issued commercially. These materials standing alone are often largely unindexed, and so reference books have been created either to digest this raw data or to provide a key to unlock the information contained there. The need for keys to the vast storehouse of accumulating reference materials has resulted in ever more reference books. The voluminous judicial and administrative reports of rulings in the United States have led to the creation of commercially published digests and topical reporters to pinpoint the important for lawyers and students of law. These reference books are almost entirely products of the twentieth century (Price & Bitner 1953).

The publication and dissemination of reference materials by governments—local, regional, national, and international—is a remarkable development in the whole conception of government, which has, at the same time, amply fed the thirst for knowledge about society. In England there was a long battle over the right of government to keep its proceedings secret, and, at first, reporting of debates of Parliament was controlled. Both initiative and courage were needed in the development of a commercial system of publishing the proceedings of Parliament. This pattern has been followed in most democratic nations: legislative reports, the orders of administrative agencies, and the decisions of courts were kept so private by government that private printers, and later large commercial publishers, grasped the opportunity to sell the available information to the public. In democratic countries a claim of the public’s right to know coincided with the development of bureaucracies that were willing and able to offer the same information through government printing facilities. Thus, in the United States the Government Printing Office was formed in 1862. It took over the publication of reports of Supreme Court cases from private hands in 1872 and also the debates of the national legislature with the initiation of the Congressional Record in the same year (Schmeckebier & Eastin [1936] 1961, p. 124). In England, the publication of Hansard’s Parliamentary Debates was assumed by Her Majesty’s Stationery Office in 1909 (see Wilding & Laundy 1958, p. 258).

Government publications have mushroomed. The Monthly Catalog of United States Government Publications lists some 25,000 items annually, compared with about 17,000 commercially published titles in the United States each year. Practically the whole list of government titles falls into the category of reference materials, a smaller number are reference books and periodicals, and just a few are interpretive monographs.

Government reference materials are sold at cost plus 50 per cent, but with only direct-mail advertising and displays in antiseptic government bookstores or offices. The designation of established libraries as depositories for government publications has made these materials easily available to a wide audience (in the United States there are some 550 depository libraries, only 125 of which are full depositories; see Murphey 1958, pp. 184-188). This idea has been adopted in most countries and by the United Nations as well.

The chief distortion in government publications arises from their being political documents. Remarks made in floor debate may be altered after utterance but before publication in the Congressional Record (Mantel 1959). Years pass before diplomatic papers of most nations are published, and to this time lag may be added the possibility of deletions and distortions in the text. Government publications often serve the regime, the ideology, the men in command. Social scientists in a specialized field ordinarily possess considerable awareness of existing distortions, but the fact that such distortions exist is hardly made obvious by these publications. Nor are governmental publications, in contrast with similar materials published commercially, regularly subjected to critical review. This may be explained in part by the fact that government publications are shunted to one side as reference materials and hence are not reviewable, or by the fact that these publications are never advertised in the reviewing media or in scholarly journals.

Historical editing

Whitehead once cautioned against taking “the official documents of an epoch at their full value” by omitting reflection on “the emotional atmosphere which activated its people and the general ideas under whose sway they lived” (Whitehead as quoted in Cappon 1966, p. 56). Since World War II this challenge has largely been met, in an era of “comprehensive editing,” by the historical editor who is “a knowledgeable scholar concerned with the meaning of the sources at his command” (Cappon 1966, p. 75). Historical editing in the United States has existed only since the 1890s, but in this time standards have been broadened and raised. Current editorial projects include the publication of the papers of Jefferson, Calhoun, Franklin, Clay, Adams, Hamilton, and Madison, begun between 1944 and 1956 and expected to total 289 volumes when complete. Professional editing today rests on this rationale: “The historical editor of source materials is a historian whose responsibility consists, first, in transmitting authentic and accurate texts of all extant documents within a rational frame of reference, with due respect for archival principles, and, second, in making these texts more intelligible” (ibid., p. 57). Archivists editing current public papers of the presidents of the United States follow similar canons (Reid 1962, p. 438).

Contemporary historical editing, because of its scale and scope (not because of the internal editorial standards used), has been condemned as the documentary, objective, professional, organized, or official style in the academic study of the American past (Marx 1961, p. 48). The whole field of historical editing is “a remarkable program and, as Emerson would say, a sign of the times—affluent, conservative, and nationalistic times” (ibid.). Some critics of the field argue that so much energy, foundation support, and praise for documentation is making historical editing an end in itself rather than leading scholars to use “these splendid volumes” to create “a richer, and by that I mean a more imaginatively relevant, historical literature” (ibid., p. 51). Although it is true that intellectual resources may thus be misapplied, on the other hand, the condemnation of a concern for accuracy as a tragic expression of pop culture is simply fearful exaggeration (see Macdonald 1952).

Social scientists often enough perform both the feat of documentation and that of interpretation. For example, at the University of Wisconsin after the turn of the century, John R. Commons and his associates first collected and edited, in ten volumes, A Documentary History of American Industrial Society, published in 1910 and 1911, and then wrote a four-volume History of Labour in the United States, published from 1918 to 1935. Other monographs spun off from this body of work, including Perlman’s A Theory of the Labor Movement, which appeared in 1928. It took 25 years to complete this entire body of work.

Newspapers, magazines, and journals

As reference materials, newspapers, magazines, and journals constitute a major resource for social scientists. Beginners approach these sources with such innocence that some critical word is appropriate. The newspaper, as a record of unfolding events, is rife with defects. One critic, summarizing Liebling (1961), has questioned, in terms of general semantics, the reliability of the modern newspaper: “Newspapers like to think they print ‘all’ the ‘news’ in an ‘objective’ way. Actually, of course, they merely abstract a few events out of the current scene and make news stories describing these happenings. The decision as to which events to abstract is the heart of the criticism of newspapers” (Wanderer 1963, p. 491). The abundant criticism of newspapers and magazines of general circulation has been absorbed by the intellectual community, and students are alerted to handle these materials with care.

This is known to be as true of great newspapers of record like The Times of London and the New York Times as of the tabloid press (on the latter, see Friedrich 1959, p. 467). The New York Times has become a frequent target of criticism in recent years. The charges include inaccurate reporting of the collectivization of agriculture in the Soviet Union in the 1930s (Muggeridge 1961, p. 87), failure to perceive abiding changes in French politics in the 1950s (Kempton 1961, p. 91), and inadequate coverage of European economic and political events in the 1930s (Lichtheim 1965). A sweeping criticism of American newspapers in general and the New York Times in particular stressed the amateur quality of its news gathering and news reporting methods—“the American press as an institution is comparable to the medical schools of fifty years ago” (Kristol 1967, p. 52). Attacks like these and the many defenses of the New York Times as being an extraordinary journalistic achievement (see Manchester 1959) would fill a book.


The Index to the Times (London) and the New York Times Index are significant reference books which are exclusive keys to the relevant reference materials of the respective newspaper files. They are perhaps the contemporary reference works most heavily used by students of social, economic, and political developments. Typically, though, newspapers and their indexes have rarely been scrutinized by social scientists for accuracy, completeness, and quality of interpretation. Most needed is a call for verification of editorial generalization on many social science subjects. Still, the newspapers of record with indexes remain important combinations of reference materials and reference works.

If indexes to newspapers reflect limitations in the contents they key, the available periodical indexes and abstracts in libraries have strengths and weaknesses of their own. There is no master index to the vast periodical literature in, and of relevance to, social science. Most disciplines have, since World War II, developed separate indexes and abstracts through their respective national and international associations. However, the number of periodicals published in the world far outreaches the number to be found in general indexes to this material. Thus, a leading commercial publisher of periodical indexes has succinctly shown “that there is a ‘vanishing point’ beyond which it is neither practical or feasible to have published indexing at today’s high cost, and that there are literally thousands of specialized periodicals which are so sparsely held by libraries that the few holding libraries could not support published indexing of them” (Haycraft 1962, p. 129). Haycraft reported that in 1960 the New York Public Library maintained subscriptions to 25,568 periodicals. The Wilson periodical indexes covered a total of about 1,250 periodicals; other indexes and abstracts covered at most about 3,000 periodicals. He concluded that “of 25,568 periodicals received, more than 22,000 are not indexed in any published index and probably never will be” (ibid., p. 129). As crucially important reference books, then, the periodical indexes are blunt instruments for identifying a vast range of periodical articles. Within a discipline the articles of the leading national and regional association journals and the more specialized reviews are ordinarily well indexed. But in the larger world, where the disciplines intersect with each other and with periodicals of other sorts, the nature of existing reference materials is highly unsatisfactory.
This is the conclusion reached by one expert: “The variations from one bibliographic service to another—in scope, coverage, arrangement, periodicity, format, etc.—are so great that they create a confusing welter rather than a perspicuous guide to published information” (Clapp 1964, p. 84).

Reference books

Reference books are, in a sense, codifications of the larger world of knowledge—especially that contained in reference materials in libraries. It is extraordinarily difficult to condense knowledge and organize and publish it in the seemingly objective forms presented by reference books.

A touchstone is needed to judge the achievements of reference books. In addition to certain specialized requirements, a number of scholarly standards come into play. Editorial whimsicality should be low; explicit assumptions, a clearly stated range of coverage, and a strong awareness of measures of inclusion and exclusion are needed. A fairly selected, apt title should be sought, although it is difficult to achieve. Classification and indexing in a reference book must be intrinsically wise and connected with traditional practices in a given field of knowledge, if these two goals can be reconciled. The information should be consistent in its completeness. If it is not, the work should be representative of a larger range of facts. Consistent usage, especially in serials, and the continuity of data presented in tabular form are also valued.


The encyclopedia is the reference book as code of knowledge, par excellence. Standing above the touchstone of scholarly standards named here is one giant obstacle to the development of an entirely satisfactory encyclopedia. This is the editorial difficulty of resolving the inevitable conflicts of values concerning a range of topics. Questions where nationalism, ideology, race, and religion arise have always posed almost insuperable difficulties for encyclopedia makers.

For example, early editions of the Encyclopaedia Britannica, expressing the intellectual traditions of Edinburgh, drew strong criticism for their independent analysis of various religions, particularly Roman Catholicism and Christian Science (Einbinder 1964, p. 66). Recent editions have offered the imperfect solution of simply publishing clerical apologetics, statements on religious matters by officials of various churches (ibid., pp. 189-193).

The problem of race is just as difficult as that of religion. Reviewing the five-volume Dictionary of American History, Nevins wrote (1941, p. 4): “Perhaps the most unsatisfactory of the general articles are the two on Race Elements in America and the Race Problem. In the former the term ‘race’ is used in a sense that ethnologists would not approve, and in the latter there is too much pessimism about the subject.” The single-volume American Negro Reference Book, published in 1965, is one corrective, and the establishment of a separate branch of the New York Public Library for works on Negro history (the Schomburg Collection) is another. Single reference books may be limited, but a collection of them can overcome individual deficiencies.

Encyclopedias have often promoted nationalism or other ideological positions. Indeed, the Enciclopedia italiana di scienze, lettere ed arti, consisting of 36 volumes published from 1929 to 1939, aimed to provide “an inventory of Italian knowledge,” and it did so in accord with the views of the Fascist regime of the time. The decree of the Soviet Union’s Council of Ministers in 1949, which established the second edition of the Bol’shaia sovetskaia entsiklopediia, declared that it “should elucidate broadly the world-historical victories of Socialism in our country…. With exhaustive completeness it must show the superiority of Socialist culture over the culture of the capitalist world” (as quoted in Benton 1958, p. 553). More recently, many of the new nations of the world have initiated encyclopedias with explicit national biases. But all of this is in the grand tradition, and the wonder is that encyclopedias have ever developed which have disowned such biases. One that was a great success in this regard was the Encyclopaedia of the Social Sciences, published from 1930 to 1935, whose pages, according to Sidney Hook’s review (1935), “showed the fruits of the best type of international intellectual cooperation.” Hook praised the absence of a “synthetic, positive social philosophy” and the presence of contributors “of every school of thought—conservatives, liberals, radicals of every hue and shade.” He concluded that “the emphasis upon the interrelations between the various disciplines, honored in the observance as well as in the program, the treatment of the social implications of material drawn from the arts and sciences make of these fifteen volumes a kind of universal encyclopedia of knowledge.” This is all the more an achievement when it is realized that similar, though less ambitious, projects, such as the Cyclopedia of American Government published in 1914, were accorded strong contemporaneous condemnation for slipshod editing, weak conceptualization, and erroneous information.

A contemporary test of whether severely diverse approaches can be held within the covers of a single reference book is seen in the dispute over editorial policy for an international encyclopedia of comparative law being undertaken by the International Association of Legal Science. The issue is “whether topics in the law of Marxian socialist countries can be integrated with the law of other legal systems under appropriate subject headings” or whether “such topics must be treated separately from the law of non-Marxist countries by placing them in a separate volume devoted solely to their various East European and Asian forms” (Hazard 1965, p. 278). The Marxist view makes universalism the aim; but if unification cannot be achieved, then comparison has no purpose. In contrast, Hazard argues (1965, p. 286) that the aim of the editorial committee is to foster “peaceful co-existence between differing social and economic systems” and stresses the educational value of topical comparison for students of law in all societies. He then cites the law of property and contracts as proof that his approach is feasible (ibid., pp. 287-302).

Multivolume encyclopedias running through many editions have developed “continuous revision” as the most suitable method of change. If ten per cent of the articles are revised annually, the whole would change completely in ten years. But Einbinder (1964) has shown that in the case of the Encyclopaedia Britannica, several hundred articles have been reprinted intact from editions fifty or more years old; also, articles are disturbed by cuts when they are alphabetically proximate to freshly developed subjects. In the Soviet Union, when Beria fell from power the publishers of the Bol’shaia sovetskaia entsiklopediia (“Great Soviet Encyclopedia”) removed the entry for Beria and made available to its 250,000 subscribers in the Soviet Union a special replacement section containing expanded entries on the eighteenth-century courtier F. W. Bergholz, on the Bering Sea, and on Bishop Berkeley (Benton 1958, p. 567).
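The arithmetic behind “continuous revision” depends on which articles each year's ten per cent falls upon. The sketch below is an illustration only, under two assumed revision schedules that do not come from the sources cited here: a strict rotation, in which no article is revised twice before every article has been revised once, and an independent random draw each year. The function names are hypothetical.

```python
def untouched_after_rotation(rate: float, years: int) -> float:
    """Fraction of articles never revised under a strict rotation.

    Assumption (not from the source): each year's batch consists only of
    articles that have not yet been revised.
    """
    return max(0.0, 1.0 - rate * years)


def untouched_after_random(rate: float, years: int) -> float:
    """Expected fraction never revised when each year's batch is drawn
    independently at random (also an assumed schedule)."""
    return (1.0 - rate) ** years


if __name__ == "__main__":
    for label, fn in [("rotation", untouched_after_rotation),
                      ("random draw", untouched_after_random)]:
        share = fn(0.10, 10)
        print(f"{label}: {share:.1%} of articles untouched after 10 years")
```

Only the rotation schedule yields a complete turnover in ten years. Under the random-draw assumption roughly a third of the articles would still be unrevised, which is at least compatible with the survival of very old articles that Einbinder reports.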

Biographical reference books

Personal vanity, weak editorial hands, and poorly developed concepts of impartial analysis have been factors in the production of inadequate biographical reference books. In the first edition of Who’s Who in American Education, which was published in 1928, “thousands of obscure public school teachers of all grades, and of all ages from the early twenties up, are listed, while hundreds of distinguished educators are not” (Vance 1961, p. 323). But this has been shown to be only one of hundreds of “who’s who” projects, exclusive of the carefully prepared Marquis volumes and some others, which “simply capitalize upon human vanity and gullibility. Purchase becomes the price for being listed. And the work of the editor-promoter is relatively easy. All he needs is names—several thousands of them. The less important the people invited to be listed, the more readily they will pay” (ibid., pp. 326-327). Contributors to Appleton’s Cyclopedia of American Biography, published from 1887 to 1889, could suggest names for inclusion and submit articles which were published untouched by the editors, and were paid space rates. These considerations seem to account for the presence in the Cyclopedia of some fifty articles on nonexistent botanists or completely trumped-up accounts of real scientists from South America (Schindler 1936, p. 687). Entirely different points are made in a critique of a ten-year supplement to the Dictionary of National Biography (1941-1950). The review acknowledges the merits of the DNB as an Oxford-edited book of the British Establishment but complains that the editors’ values led to serious exclusions of matter: “I examined the lives of three known homosexuals, and found the fact mentioned in none; of three persons who died insane, and found the fact omitted in two and only hinted at in the third; of two persons who died by their own hands, and found the fact omitted in one, but squarely faced in the other (Lord David Cecil’s model account of Virginia Woolf)” (Corke 1959, p. 77).

The multivolume, national biographical reference set has nonetheless been a great achievement, and its vitality is testified to in the announcement, in 1966, that the Dictionary of National Biography, first published from 1890 to 1910, is to be entirely revised and rewritten at Oxford during the next decade. The Dictionary of American Biography, published from 1929 to 1934, has already been shown to be dated by its almost complete neglect of Negroes. And the first volume of the projected 20 volumes of the Dictionary of Canadian Biography appeared in 1966, arranged chronologically rather than alphabetically. By covering a specific period of history, each volume stands on its own as published and affords a balanced view of individuals included. This dictionary also departs from tradition by including introductory essays to set the historical stage for the biographies. The slow-paced publication of the distinguished French national biographical dictionary, Dictionnaire de biographie française (begun in 1933 but with the volumes up to the letter D completed only in 1966), suggests the chronological to be superior to the alphabetical arrangement, at least during the long years of publication.


It remains to note that controversy and criticism have surrounded two other types of reference books: the monolingual dictionary and the geographical atlas. Webster’s Third New International Dictionary, published in 1961, was achieved after some 25 years of compiling a word list, recording definitions in usage, and noting pronunciations under an editorial mandate to apply the science of structural linguistics to lexicography. Philip Gove, the editor, has said that the new dictionary was deeply affected by the work of Leonard Bloomfield, who wrote the charter of contemporary descriptive linguistics. According to Gove, “the fundamental step in setting down postulates for descriptive linguistics is observing precisely what happens when native speakers speak” (Sledd & Ebbitt 1962, p. 66). Webster’s Third New International Dictionary was built empirically, from research into the spoken American language of today. The editorial view was that a dictionary should have no traffic with “artificial notions of correctness and superiority. It must be descriptive and not prescriptive” (Gove in Sledd & Ebbitt 1962, p. 74). In the reception of Webster’s Third this point was central to most critics, who believed that linguistics was a pseudo science that would turn over the language to the mobocracy, that standards of correctness had to be imposed by educated authority, and that the dictionary should be and, in any event, inevitably was, prescriptive (these criticisms are collected in Sledd & Ebbitt 1962).

Considering that a critical storm was begun by a dictionary rooted in modern linguistics and executed by an expert staff at enormous expense, it is clear that the many glossaries prepared by social scientists for student and popular use deserve the kind of scrutiny that they rarely receive. One handbook, A Dictionary of Politics, published in 1957, treated words like Bolshevism, capitalism, and collectivism so arbitrarily and carelessly that one reviewer declared the book “a fraud and a brazen manipulation of facts” (Schlamm 1957). Many specialized dictionaries, such as the Dictionary of the Social Sciences, published in 1964 under UNESCO auspices, are useful, though they are written by authorities rather than built empirically from usage in context.


The atlas, as perhaps the most technically complex reference book, presents the greatest challenge and the best chance for things to go wrong. In trenchant essays, Richard Edes Harrison has named scope, design, and execution as the major criteria in the evaluation of atlases. In regard to scope, or breadth of coverage, he notes that most American “world atlases” shortchange the rest of the world. States and countries may each gain a page regardless of size; Rhode Island and Texas, Switzerland and the Soviet Union, are treated alike. “Such enormous disparities in scale can only lead to erroneous and confused conceptions of geography” (Harrison 1961a, p. 6). Harrison believes that maps are central and require papers, inks, and procedures that should make extraneous statistics, gazetteers, and photographs unwelcome. Design, the second element, “deals with the title page and all other type matter, with the maps and their myriad detail, with page layouts, the treatment of borders, scales, map titles, keys, selection of places, categorization of towns, the generalization of geographical detail, indication of topography, etc.” (ibid.). The criterion of execution “deals with the beauty and accuracy of the drawing, engraving and printing.” Harrison’s best advice is “that a good atlas is always explicit about its methods and content.” On these grounds Harrison condemned the giant Life Pictorial Atlas of the World, published in 1961, as a pretentious, careless, and superficial book.

Technological developments

Typography and format in reference books, often questioned by uneasy but helpless critics, have attracted some attention because of McLuhan’s interpretations of communication media (1962) and through the development of alternative methods of information storage and retrieval. Harrison (1961b, p. 40), in his criticism of the Life Pictorial Atlas, complained of captions set in neat blocks of sans-serif type: “It is a pity that such artificial and immaterial considerations are allowed to take precedence over the business of communicating with the reader.” Carl P. Rollins had earlier (1929) condemned both the Encyclopaedia Britannica and the Dictionary of American Biography as being “dull and insipid typographically.” McLuhan (1964, chapter 2) views the very form of communication as inseparable from content, his watchword being that “the medium is the message.” He argues that “the ‘message’ of any medium or technology is the change of scale or pace or pattern that it introduces into human affairs” (McLuhan 1964, p. 8).

Since Gutenberg, Western intellectual assumptions have been forged by the typographic principles of uniformity, continuity, and lineality (McLuhan 1962). This repeatability, and the distribution of the end product, has made the conveyance of information through reference materials and reference books possible. Now, in the mid-twentieth century, a host of intertwined technological developments makes possible further and fuller collection of information and facilitates its duplication, distribution, and use. Clapp (1964) has shown that even with high-ratio photoreproduction, full cataloguing, complete data processing, and related information storage and retrieval, no single library, however well endowed, can hope to come close to embracing all the world’s sources. Thus, to overcome this problem, a carefully developed system of cooperation among libraries is essential. Clapp’s premise is that “the general research library of the future will increasingly be required to make available to its users the informational records of mankind” (1964, p. 53).

Social statistics

Changing technologies will not by themselves overcome the many obstacles to amassing coherent social statistics. The periodic government census of population and of other subjects such as housing, agriculture, and business has become a standard program of most central governments. In contrast, the vital and health records of most countries are decentralized, and the development of national record keeping and publication has been slow. Spiegelman (1963) has explained that problems of definition arise at every stage—for example, definitions of date of birth, nature of illness, cause of divorce, cause of death—so that the development of reliable vital and health statistics is an awesome task. Criminal statistics present similar difficulties; and while the records of England and Wales are outstanding, the Uniform Crime Reports of the U.S. Federal Bureau of Investigation have come under frequent criticism since their beginning in 1929 (see Pittman & Handy 1962). Another area where more accurate statistics are needed is that of religious affiliation. In the United States it is difficult to assess the claims of churches to membership growth because of the problems of defining religious affiliation and the impossibility of making the subject a part of the regular U.S. census of population. The result is that each church takes its own count; since there is a lack of uniform rules, reliability is poor (see Lipset 1959). Automobile accidents are another subject where central, uniform statistics are lacking. One critic has blamed this state of affairs on the pressure of car manufacturers to suppress facts about traffic safety (Nader 1965, p. 284).

These weaknesses in the web of information, these disparagements of reference sources, and these criticisms of reference books should not override the very considerable achievements that are everywhere in evidence in the research library. The collection of economic data and their use in constructing economic indicators have definitely contributed to the wise and timely application of government power to national economies. Documentation of the official acts of government is increasingly complete and current. The list of standard reference books which are indispensable itself fills a book (see Winchell 1951). References like the Union List of Serials, Hamer’s Guide to Archives and Manuscripts in the United States, published in 1961, and the Research Centers Directory help to bring the resources of all research collections to the desk of any library. If constructive criticism of reference books prepared by and used by social scientists can be further developed, the quality of information at our disposal can be vastly improved.

Clement E. Vose

[See also the guide to related articles under government statistics.]


The reference materials and books described in this article are, for the most part, not included in the bibliography. The works listed below are primarily guides to and criticisms of them.

During the 1950s scholars from various fields of specialization began the process of defining the new scientific development called the behavioral sciences. Implicit in their writings was a conception of the behavioral sciences as “a multidisciplinary pursuit of knowledge” about the roots and manifestations of behavior “in man and animals, in individuals, groups, and cultures, and in all conditions, normal, exceptional, and pathological” (Editorial 1960, p. 701). The literature that has resulted from the effort to unify this evolving knowledge creates unusual bibliographic problems which call for new solutions.

This literature is international in scope; in the world’s bibliographic and library systems, however, it is not presented as literature of the behavioral sciences. Research in a special subject area may be as diversified as the behavioral sciences as a whole, as shown in the literature on psychopharmacology, where topics range from chemistry to creativity, from anthropology to addiction, from medicine to mysticism, memory, and “control of the mind,” and involve legal and social issues. Yet the customary bibliographic categories cannot reflect this significant variety of aspects in the study of human behavior. Moreover, progress in the behavioral sciences has strengthened the conviction of many that ethical considerations must be an integral part of a science of human behavior. Literature in which scientific data relate to ethical concerns raises entirely new bibliographic issues.

Current interpretations of the concept “behavioral sciences” vary (Bry & Afflerbach 1965, p. v). According to some definitions, the behavioral sciences are a part of the social sciences and include various new fields such as game theory and value inquiry (Handy & Kurtz 1964). If the literature of the behavioral sciences were confined to the literature of the social sciences, even to that of the “behavioral social sciences”—cultural anthropology, sociology, and social psychology—important bibliographic problems would arise (see Foskett 1963).

Bibliographic issues typical of the behavioral sciences begin when the literature of disciplines outside the social sciences must be included, particularly that of psychology and psychiatry. Psychology appears by tradition under “philosophy,” but the psychological literature is increasingly scattered through other fields, especially the social sciences, education, and physiology. Traditionally, psychiatric literature has been organized under “medicine.” Partly as a result of social conditions of the past, the bibliographic and library resources for medicine are often separated physically from those of the other fields. In a reversal of earlier trends, the biomedical, behavioral, and environmental health sciences are now being brought together in programs that develop new scientific and social perspectives (Pearsall 1963). MacKenzie and Bloomquist, who interpret the behavioral sciences as “a synthesis of disciplines in the biological and social sciences and in the humanities” (1964, p. 220), have demonstrated the impact of the behavioral sciences upon bibliographic issues in such programs.

A special bibliographic system is needed to reflect the content and to show the progress of the behavioral sciences. It can become generally effective only if it is designed to supplement, not replace, the basic systems of internationally organized bibliography. The methods of identifying and selecting this new literature, however, would have to be derived from developments in the behavioral sciences.

Dilemmas of bibliographic identification. Bibliographic and library systems based on the nineteenth-century disciplinary organization of the sciences started with the assumption that the content of a scientific publication would be confined to its discipline—for example, that journals in the field of geology would deal with geology alone. When psychology first became a discipline at the end of the nineteenth century, this rule no longer applied. The scholars who compiled the annual psychological bibliographies in France, Germany, and the United States had to search through journals in philosophy, physics, medicine, anthropology, and other fields in order to draw together the literature of psychology (Bayne & Bry 1954). The same difficulty developed in another form for psychoanalysis. Freud’s prepsychoanalytic publications had appeared in the medical and neurological literature, but his psychoanalytic writings from 1900 on did not fit the existing bibliographic structure. He was thus led to believe that his early psychoanalytic books and papers had been deliberately ignored, when they had actually gained unusual attention in journals of a wide variety of disciplines such as psychology, criminal anthropology, and sexology (Bry & Rifkin 1962).

In principle, the same problems arise again in the literature of the behavioral sciences, although now they are far more complicated. This literature becomes an organic whole only through the unity of the scientific purpose of the behavioral sciences. There is no single disciplinary structure that could hold it together. In the behavioral sciences there are, however, many elements that belong to a structure. Human and animal behavior can only be studied in a concrete situation or in a setting such as a kindergarten, a medical school, or a space laboratory. And the behavior studied is not that of a man or animal in the abstract; it is behavior of birds, artists, or people living in a particular society. When specialists participate in interdisciplinary projects, their functional relationships do not change the basic pattern in which their disciplines are separately organized; although psychiatrists join sociologists, for example, in studies of mental health in a community, psychiatry and sociology remain separate disciplines, while social psychiatry is developing as a specialized field. The behavioral sciences thus superimpose a unitary function—the study of human and animal behavior—upon a pluralistic structure. The resulting literature appears under a vast variety of auspices, often those provided by the disciplines or settings involved in a given project. The actual publications are distributed according to their immediate function, and they may be found wherever they appear to be most useful—in educational, medical, general, or other libraries. A special bibliographic system is needed to identify the publications that serve the unity of function of the behavioral sciences, and this system should itself be functionally organized.

An “anthropotropic” organizing principle. The encyclopedic systems of knowledge that most strongly influenced the organization of libraries and general bibliographies in the late nineteenth century were anthropocentric, in the sense that they saw man in the center of the world he explores. An organizing principle for the literature of the behavioral sciences, however, should be “anthropotropic” (Bry & Afflerbach 1965) in order to reflect the “turning toward man” in man’s search for knowledge about himself. The idea is not new. During the romantic period around 1800, the “anthropological sciences,” “human sciences,” or “sciences of man” included the study of man’s body and mind and of man as a whole, as an individual and in social relationships. A hundred years later, a similar view was expressed by the French philosopher Edmond Goblot in the form of an interdisciplinary concept, “bio–psycho–sociologie.” During the twentieth century, the term “human behavior” began to be used to distinguish the scientific approach from the philosophical approach to knowledge about man. Around 1950, the phrase “behavioral sciences” was introduced in the United States as a unifying term for the study of human and animal behavior in the psychological, biological, and social sciences.

The search for a clearly defined, internationally acceptable term for this scientific development continues. In England and France, “human sciences” and “sciences humaines” seem to be preferred. In German publications, “Verhalten” corresponds to “behavior” in the several meanings of this word in the American technical literature. At present, Verhaltensforschung in German usage refers chiefly to studies of animal behavior; its literal translation, “behavioral research,” refers in current American usage to studies of human and animal behavior. It is essential to identify international contributions to the behavioral sciences on the basis of their actual scientific content, regardless of terminology.

In terms of the anthropocentric schemes for organizing the literature, “man as a whole” has been dismembered and does not appear as a uniform object of knowledge at all. National bibliographies in whatever country—Canada, Peru, Russia, or India—impose upon the literature of the behavioral sciences schemes which use the very divisions that the behavioral sciences must transcend. Therefore, an organizing principle that can reflect the anthropotropic orientation has been proposed for the literature of the behavioral sciences (Bry & Afflerbach 1965). In applying such a principle, it appears useful to place the literature of the psychological sciences—psychology, psychiatry, and psychoanalysis—in the center of a bibliographic scheme, including fields already merged with the psychological sciences, such as psychopharmacology or social psychiatry. Publications from other fields—for example, genetics, economics, or religion—would take an “orbital” position, depending on the extent to which they convey knowledge to, or draw knowledge from, the psychological sciences. As new relationships among various fields develop, such a scheme could form a basis for the increasingly necessary cross references. It would reflect the intellectual integration that is being achieved by the scholars from formerly separate fields of specialization. So organized, a functional and dynamic bibliographic system for the behavioral sciences could remain open to new developments, and it could also be superimposed upon the existing bibliographic structure without destroying it.

Indirect methods of selection. If the literature of the behavioral sciences is to be confined to significant publications that are distinctly relevant to the behavioral sciences (Mental Health … 1963), the question of methods of selection arises. Traditionally, scholarly bibliographies apply a direct method of selection, which is based on the bibliographers’ own judgments. The monthly The American Behavioral Scientist publishes an annotated guide to recent publications in the social and behavioral sciences (The American … 1965), which uses the direct method: the compilers of this bibliography make their own selection from journals and books, including many publications that have appeared outside the United States.

As the behavioral sciences develop in new directions and the literature is published in an increasing number of languages, it becomes necessary to design indirect methods of selection which utilize the competent judgments already made by behavioral scientists as part of their scholarly activities. One attempt to develop such a method was made by the Psychoanalytic Collections Conference of New York City, 1950-1956, a cooperative project of librarians who undertook a bibliographic pilot study of publications “which seemed to bear … upon human behavior and human relations.” After identifying monographic series in the psychological sciences published on an international scale since the late nineteenth century, this group stated the principle of indirect methods of selection in terms of that particular study: when a series is edited by an authority on the subject, the editor’s selection of the monographs assures their relevance and significance to the purpose of the series (Bayne & Bry 1954). The editor’s choice is especially important in the monographic series of newly developing fields, as illustrated by the early psychoanalytic monographic series under the editorship of Freud (Bry et al. 1953). A collection of serially published monographs identifies the pertinent topics, the editors responsible for the series, and the authors who have been invited to contribute the monographs. Their names, in turn, provide a key to other pertinent books by the same writers.

Book reviews have also been used as research material for developing an indirect method of selection (Mental Health …). The selection of books reviewed in scientific journals involves several stages. Publishers, authors, and journal editors participate in varying degrees in the selection of books sent to the journals for review. The editors then select the books that are actually to be reviewed and the specialists who are to review them (Kinney, Franck, & Bry 1955). The citations of book reviews in journals relating to the behavioral sciences have been cumulated in a pilot study by the Mental Health Book Review Index, a project based on that of the Psychoanalytic Collections Conference but broadened through the cooperation of behavioral scientists and of librarians in leading libraries in many parts of the United States. The cumulation reveals different patterns in the alignment of reviews. Certain books may be reviewed in many journals from different disciplines over a period of years. In some instances, the editors of the reviewing journals themselves as well as other authorities in the field review the same book. This process of collective evaluation selects significant books relevant to the behavioral sciences and can thus be used for an indirect method of selection based on competent judgments already made (Editorial 1961, p. vi; 1962). But the strengths and weaknesses of the whole process of book evaluation across national and disciplinary borders remain to be studied by social science and general systems research (Bry & Afflerbach 1964).
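The cumulation of review citations described above can be pictured as a simple grouping and counting step. The following is a minimal sketch, not the procedure actually used by the Mental Health Book Review Index: the records, the journal and discipline labels, and the numerical thresholds are all invented for illustration, since the source states no numerical rule.

```python
from collections import defaultdict

# Hypothetical review citations: (book, reviewing journal, journal's discipline).
# Every record here is invented for illustration only.
REVIEWS = [
    ("Book A", "Journal P", "psychology"),
    ("Book A", "Journal S", "sociology"),
    ("Book A", "Journal M", "medicine"),
    ("Book B", "Journal P", "psychology"),
    ("Book B", "Journal Q", "psychology"),
    ("Book C", "Journal S", "sociology"),
]


def cumulate(reviews):
    """Group citations by book, collecting distinct journals and disciplines."""
    journals = defaultdict(set)
    disciplines = defaultdict(set)
    for book, journal, discipline in reviews:
        journals[book].add(journal)
        disciplines[book].add(discipline)
    return journals, disciplines


def select(reviews, min_journals=2, min_disciplines=2):
    """Return books reviewed in enough journals across enough disciplines.

    The thresholds are assumptions; they stand in for the editors' and
    reviewers' collective judgment that the text describes.
    """
    journals, disciplines = cumulate(reviews)
    return sorted(
        book for book in journals
        if len(journals[book]) >= min_journals
        and len(disciplines[book]) >= min_disciplines
    )


if __name__ == "__main__":
    print(select(REVIEWS))
```

The point of the design is that no bibliographer's own judgment enters the selection: the function only counts decisions already made by journal editors and reviewers.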

Various indirect methods of selection would have to be combined in order to overcome the limitations of any single one. An index of selected bibliographies of subjects important to the behavioral sciences would be particularly useful. It should include not only the bibliographies that are separately published but also those that appear as journal articles or parts of books. Contributions made by behavioral scientists to a special bibliographic system need not themselves be of a bibliographic nature. The proceedings of pertinent conferences and symposia, for example, may serve as guide-posts indicating the direction the literature of the behavioral sciences may take in the future. Festschriften (Mental Health …1963, p. 242) may also forecast later developments, when they project a scholar’s influence and interests into the history of ideas. The position the behavioral sciences occupy in the total advance of knowledge might be traced in broad lines through endowed lectures in which leading scientists and scholars interpret progress in their respective fields. The basic themes of many long-established lectureships have religious or humanistic implications, and behavioral scientists invited to deliver such lectures appear to use this medium to clarify also the social and ethical implications of their scientific work. If these and other developments that can contribute to a bibliographic system for the behavioral sciences could be brought into focus, a system that can integrate the relevant data would offer a basis for a valid selection of the significant literature.

“Sociobibliography.” The behavioral sciences and the field of bibliography have other problems in common, which could be studied by a new sub-discipline, perhaps to be called “sociobibliography,” that combines the approaches of social research with research in bibliography. A few examples may suggest the type of problems to which sociobibliographic research could be applied.

An analysis of data about twenty thousand book reviews published during the years 1955 to 1961 in 150 journals—in the English language but international in origin—relating to the behavioral sciences showed that about one-third of the reviews were concentrated in a relatively small number of older journals, many of which represented the disciplinary tradition, while about two-thirds of the reviews were scattered through some 130 journals, many of which were recent and reflected newer scientific perspectives. The established journals, which contained only a portion of the reviews, are available in many more libraries and are more widely recorded and indexed than the new ones (Editorial 1961, pp. iii-v). Sociobibliographic research could develop methods that would either extend the advantages of international bibliographic and library cooperation to the more recent literature of the behavioral sciences or compensate for the lack of these advantages through other means.

There is a need for studying the literature and bibliography of the behavioral sciences as part of the intellectual, social, and cultural history of the past hundred years (Bayne & Bry 1954; Journal of the History …). Historical bibliographic data may throw light on cultural differences that influenced progress in fields relating to the behavioral sciences. For example, by the 1880s changes in psychiatric theory had led, in France and various other countries, to a substantial literature on hysteria in men. In spite of Germany’s important position in psychiatry, scarcely any articles on this subject had then been published in that country. It appears from pertinent nineteenth-century sources that the national self-image had inhibited the diagnosis and study of male hysteria in Germany at that time (Bry & Rifkin 1962). Sociobibliographic research could lead to an understanding of cultural differences that influence contemporary developments in the behavioral sciences. [See Knowledge, Sociology of.]

Another contemporary problem belongs in the context of culture change. International bibliographic codes, such as subject headings and classification numbers, may continue to reflect the attitudes and social conditions on which they were based, but which have been or are being changed. There is an urgent task for sociobibliography to analyze obsolete social and behavioral implications of bibliographic categories and symbols before the programming of computers for library use proceeds beyond trial stages (Bry & Afflerbach 1964). The presentation of the literature of the social and behavioral sciences should be consistent with the content that this literature is intended to communicate. The cultural implications of bibliographic categories are particularly significant in subject areas where attitudes and social change are an integral part of the subject under discussion—for example, in the literature on population problems, race relations, new nations, or international understanding.

Values and visibility. In a pioneer study, Albert and Kluckhohn (1959) offered a retrospective inventory of values that were the subject of discussion in the literature they had surveyed. Although there is as yet no science of values (Handy & Kurtz 1964, pp. 131-136), behavioral scientists often identify, debate, and clarify the issues involving values—philosophical, cultural, social, psychological—that arise in the course of their work. Positions taken on such issues are being further clarified in the course of evaluating the literature of the behavioral sciences; so are the values embodied in theories and in the concepts that organize the facts.

Since values implicit in the work of behavioral scientists and in various schools of thought enter into the evaluation, the entire process of evaluation is becoming increasingly visible in the literature itself. In the pertinent journals, the trend has been to give as much information about book reviewers as about other contributors, to establish prizes for papers and monographs, and to identify those who bestow honors, tributes, or awards as fully as those who receive them. The evaluating process that leads to scholarly recognition should also be visible in the bibliographic record. The current standard bibliographic presentation, however, includes little information that would indicate the scholarly significance of important and influential works.

A bibliographic system for the behavioral sciences could begin to make values visible by providing a continuous record of the topics in the pertinent literature that are explicitly concerned with values. Furthermore, it could aim at identifying the values that are in the process of being clarified in the course of assessing the literature. A selected bibliography for the behavioral sciences, derived from the evaluation made by behavioral scientists, would concentrate on the literature marked as significant. It would then provide all salient data, especially those relating to values. Not only the leading value positions but also the processes of evaluation that govern their impact could then be continuously reassessed (Bry 1962).

As long as research in any single discipline, designed to observe, explain, and predict human behavior, could be used for this basic scientific purpose only, the literature fitted into the existing bibliographic systems; if it did not fit, new disciplines created their own visible bibliographic records, as did psychology and psychoanalysis at the beginning of the twentieth century. There has been no such direct correspondence between literature and bibliographic presentation in the multi-disciplinary behavioral sciences. Since knowledge potentially evolving from the behavioral sciences might be used to modify human behavior in fundamental ways or help in the solution of the urgent problems of mankind, the need for a widely visible record of the representative works and values in the literature of the behavioral sciences is no longer only a bibliographic but also a social issue.

Ilse Bry

[See also Behavioral sciences; Knowledge, Sociology of.]


Information Retrieval

Information retrieval, commonly referred to as IR, is the process by which a collection of information is represented, stored, and searched in order to extract items that match the specific parameters of a user's request, or query, for information. Though information retrieval can be a manual process, as in using an index to find certain information within a book, the term is usually applied when the collection of information is in electronic form, and the process of matching query and document is carried out by computer. The collection usually consists of text documents (either bibliographic information such as title, citation, and abstract, or the complete text of documents such as journal articles, magazines, newspapers, or encyclopedias). Collections of multimedia documents such as images, video clips, music, and sound are also becoming common, and information retrieval methods are being developed to search these types of collections as well.

The information retrieval process begins with an information need: someone (referred to as the user) requires certain information to answer a question or carry out a task. To retrieve the information, the user develops a query, which is the expression of the information need in concrete terms ("I need information on whitewater rafting in the Grand Canyon").

The query is then translated into the specific search strategy best suited to the document collection and search engine to be searched (for example, "whitewater ADJ rafting AND grand ADJ canyon" where ADJ means "adjacent" and AND means "and"). The search engine matches the terms of the search query against terms in documents in the collection, and it retrieves the items that match the user's request, based on the matching criteria used by that search engine. The retrieved documents can be viewed by the user, who decides whether they are relevant; that is, whether they meet the original information need.

Information retrieval is a complex process because there is no infallible way to provide a direct connection between a user's query for information and documents that contain the desired information. Information retrieval is based on a match between the words used to formulate the query and the words used to express concepts or ideas in a document. A search may fail because the user does not correctly guess the words that a useful document would contain, so important material is missed. Or, the user's search terms may appear in retrieved documents that pertain to a subject other than the one intended by the user, so material is retrieved which is not useful. Research in information retrieval has aimed at developing systems which minimize these two types of failures.

History of Information Retrieval

Almost as soon as computers were developed, information scientists suggested that the new machines had the potential to perform text processing as well as arithmetical operations. By representing text as ASCII characters, queries formulated as character strings could be matched against the character strings in documents. The first computer-based IR systems, which appeared in the 1950s, were based on punched cards. These were followed in the 1960s by systems based on storage of the database on magnetic tape.

These first systems were hampered by the limited processing power of early computers, and by the limited capacity and high cost of storage. They operated offline, in a batch processing mode. It was not until the 1970s that IR systems made it possible for users to submit their queries and obtain an immediate response, allowing them to view the results and modify their queries as needed. The development of magnetic disk storage and improvements in telecommunications networks at this time made it possible to provide access to IR systems nationwide.

At first very little textual information was available in electronic form, though printed indexing and abstracting services for manual searching had been available for many years. Over time, however, a significant back file of a number of databases was created, making it realistic to do a retrospective search for literature on a given topic.

One of the best-known commercial information systems is DIALOG, which currently has hundreds of databases containing many types of information: newspapers, encyclopedias, statistical profiles, directories, and full-text and bibliographic databases in the sciences, humanities, and business. Another well-known commercial system is LEXIS-NEXIS, which is widely used for its full-text collection in business and particularly law, since it provides computer searching of statutes and case law.

Much early work in information retrieval was conducted at U.S. government institutions such as the National Aeronautics and Space Administration (NASA) and the National Library of Medicine (NLM), and included the forerunners of today's systems. Versions of the DIALOG system were first operated by NASA and the Atomic Energy Commission; it later became a commercial system. The MEDLINE system operated by NLM today originated in an experimental system for searching their medical database, MEDLARS.

Boolean Information Retrieval

For many years, the standard method of retrieval from commercially available databases was Boolean retrieval. In Boolean retrieval, queries are constructed by combining search terms with the Boolean operators AND, OR, and NOT. The system returns those documents which exactly match the search terms and the logical constraints.

In addition to the basic AND, OR, and NOT operators, most operational Boolean systems offer proximity operators so that searchers can specify that terms must be adjacent or within a fixed distance of one another. This allows the specification of a phrase as a search term, for example "grand ADJ canyon," meaning "grand" must be adjacent to "canyon" in retrieved documents. Many other functions are commonly available, such as the ability to search specific parts of a document, to search many databases simultaneously, or to remove duplicates. However, the basic functionality in commercial systems remains the standard Boolean search.
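The Boolean and proximity operators described above can be illustrated with a small sketch. It assumes a toy three-document collection and a positional inverted index; the names `DOCS`, `build_index`, `term`, and `adj` are hypothetical and are not drawn from any operational system, whose query syntax and indexing will differ.

```python
from collections import defaultdict

# Hypothetical toy collection; ids and texts are invented for illustration.
DOCS = {
    1: "whitewater rafting in the grand canyon",
    2: "a grand tour of canyon country by raft",
    3: "whitewater kayaking on the colorado river",
}

def build_index(docs):
    """Map each term to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def term(index, t):
    """Set of documents that contain the term at least once."""
    return set(index.get(t, {}))

def adj(index, left, right):
    """Documents in which `right` occurs immediately after `left`."""
    hits = set()
    for doc_id in term(index, left) & term(index, right):
        right_positions = set(index[right][doc_id])
        if any(p + 1 in right_positions for p in index[left][doc_id]):
            hits.add(doc_id)
    return hits

INDEX = build_index(DOCS)

# "whitewater ADJ rafting AND grand ADJ canyon": AND is set intersection.
both_phrases = adj(INDEX, "whitewater", "rafting") & adj(INDEX, "grand", "canyon")
# "whitewater OR canyon": OR is set union.
either = term(INDEX, "whitewater") | term(INDEX, "canyon")
# "whitewater NOT rafting": NOT is set difference.
without = term(INDEX, "whitewater") - term(INDEX, "rafting")
```

Document 2 contains both "grand" and "canyon" but not next to each other, so it satisfies "grand AND canyon" yet fails "grand ADJ canyon". The result of every query is an unordered set, which is the exact-match behavior described in the text.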

Problems with Boolean Retrieval

Boolean searching has been criticized because it requires searchers to understand and apply basic Boolean logic in constructing their search strategies, rather than posing their queries in natural language. Another criticism is that Boolean searching requires that terms in the retrieved document exactly match the query terms, so potentially useful information may be missed because a document does not contain the specific term the searcher thought to use. A Boolean search essentially divides a database into two parts: documents that match and those that do not match the query. The number of documents retrieved may be zero, if the query was very specific, or it could be tens of thousands if very common terms were used. All documents retrieved are treated equally so the system cannot make recommendations about the order in which they should be viewed. Because of its complexity, Boolean searching has often been carried out by information professionals such as librarians who act as research intermediaries for their patrons.

Boolean retrieval has also been criticized on the basis of performance. The standard measures of performance for IR systems are precision and recall. Precision is a measure of the ability of a system to retrieve only relevant documents (those which match the subject of the user's query). Recall is a measure of the ability of the system to retrieve all the relevant documents in the system. Using these measures, the performance of Boolean systems has been criticized as inadequate, leading to the continuing search for other ways to retrieve information electronically.
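As a rough illustration of the two measures, the sketch below computes them for a single query, assuming that the set of relevant documents is already known (in practice it must be judged by people). The function name `precision_recall` and the document ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments: four documents retrieved, five relevant overall,
# two documents in common.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5, 6, 7])
```

Here half of what was retrieved is relevant (precision 2/4), while only two of the five relevant documents were found (recall 2/5).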

Alternatives to Boolean Retrieval

Since the 1960s and 1970s, IR researchers have explored ways to improve the performance of information retrieval systems. Gerard Salton (1927-1995), a professor at Cornell University, was a key figure in this research. For more than thirty years, he and his students worked on the Smart system, a research environment that allowed them to explore the impact of varying parameters in the retrieval system. Using measures such as precision and recall, he and other researchers found that performance improvements can be made by implementing systems with features such as term weighting, ranked output based on the calculation of query-document similarity, and relevance feedback.

In these systems, documents are represented by the terms they contain. The list of terms is often referred to as a document vector and is used to position the document in N-dimensional space (where N is the number of unique terms in the entire collection of documents). This approach to IR is referred to as the "vector space model."

For each term, a weight representing the importance of the term in the document is calculated from term-frequency statistics. A common method is to calculate the tf × idf value (term frequency × inverse document frequency). In this model the weight of a term in a document is proportional to the frequency of occurrence of the term in the document, and inversely proportional to the frequency with which the term occurs in the entire document collection. In other words, a good index term is one that occurs frequently in a particular document but infrequently in the database as a whole.
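A minimal sketch of this weighting follows. It assumes one common variant, in which the inverse document frequency is log(N / df), with N the number of documents and df the number of documents containing the term; other variants exist, and the text does not specify one. The collection `DOCS` and the function `tf_idf` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy collection, invented for illustration.
DOCS = {
    "d1": "canyon rafting canyon guide",
    "d2": "river rafting guide",
    "d3": "city travel guide",
}

def tf_idf(docs):
    """Return {doc_id: {term: weight}} with weight = tf * log(N / df)."""
    tokenized = {d: text.split() for d, text in docs.items()}
    n_docs = len(tokenized)
    # df[t] = number of documents in which term t appears at least once.
    df = Counter()
    for terms in tokenized.values():
        df.update(set(terms))
    weights = {}
    for d, terms in tokenized.items():
        tf = Counter(terms)  # raw count of each term in this document
        weights[d] = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    return weights

WEIGHTS = tf_idf(DOCS)
```

In document d1, "canyon" occurs twice and appears in no other document, so it receives the highest weight. "guide" appears in every document, so its weight is zero: it does nothing to distinguish one document from another.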

The query is also considered as a vector in N-dimensional space, and the distance between a document and a query is an indication of the similarity, or degree of match, between them. This distance is quantified by using a distance measure, commonly a similarity function such as the cosine measure. The results are sorted by similarity value and displayed in order, best match first.
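The ranking step can be sketched as follows. To keep the example short it assumes raw term counts as the vector weights rather than tf × idf values; the names `cosine`, `rank`, and `DOCS` are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, docs):
    """Return (similarity, doc_id) pairs sorted best match first."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(text.lower().split())), d)
              for d, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical toy collection, invented for illustration.
DOCS = {
    "d1": "whitewater rafting in the grand canyon",
    "d2": "grand hotel reviews",
    "d3": "stock market report",
}
RANKING = rank("grand canyon rafting", DOCS)
```

Unlike a Boolean search, every document receives a score, so a partial match such as d2 is still returned, only lower in the list than d1.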

The relevance feedback feature allows the user to examine documents and make some judgments about their relevance. This information is used to recalculate the weights and rerank the documents, improving the usefulness of the document display.
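One classic way to recalculate the query weights is the Rocchio formulation, which the text does not name and which is used here only as an illustration. The sketch assumes sparse vectors stored as dictionaries; the function name `rocchio` and the parameter values for `alpha`, `beta`, and `gamma` are assumptions, not settings taken from any particular system.

```python
from collections import defaultdict

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant documents and away from nonrelevant ones.

    All vectors are {term: weight} dicts. The parameter defaults are
    illustrative assumptions.
    """
    new_query = defaultdict(float)
    for t, w in query.items():
        new_query[t] += alpha * w
    for doc in relevant:
        for t, w in doc.items():
            new_query[t] += beta * w / len(relevant)
    for doc in nonrelevant:
        for t, w in doc.items():
            new_query[t] -= gamma * w / len(nonrelevant)
    # Terms whose weight is not positive are dropped from the new query.
    return {t: w for t, w in new_query.items() if w > 0}

# Hypothetical judgments from a user who has viewed two documents.
query = {"canyon": 1.0, "rafting": 1.0}
relevant = [{"canyon": 1.0, "rafting": 1.0, "whitewater": 1.0}]
nonrelevant = [{"canyon": 1.0, "hotel": 1.0}]
updated = rocchio(query, relevant, nonrelevant)
```

The updated query gains the term "whitewater", which the user never typed, because it appeared in a document judged relevant. Rerunning the ranking with `updated` would then favor documents that share that vocabulary.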

These systems allow the user to state an information need in natural language, rather than constructing a formal query as required by Boolean systems. The ranked output also imposes an order on the documents retrieved, so that the first documents to be viewed are most likely to be relevant. The search is modified automatically based on the user's feedback to the system.

More recently, information retrieval systems have been developed to search the World Wide Web. These search engines use software programs called crawlers that locate pages on the web which are indexed on a centralized server. The index is used to answer queries submitted to the web search engine. The matching algorithms used to match queries with web pages are based on the Boolean or vector space model.
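The crawl-then-index arrangement can be sketched without any network access by standing in a small in-memory link graph for the web. Everything here is hypothetical: the `WEB` dictionary, the example URLs, and the function `crawl_and_index`. A real crawler must also fetch pages over the network, parse markup, and respect site policies, none of which is shown.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example/": ("grand canyon rafting trips", ["a.example/gear", "b.example/"]),
    "a.example/gear": ("rafting gear checklist", ["a.example/"]),
    "b.example/": ("canyon hiking maps", []),
    "c.example/": ("unlinked page about rafting", []),
}

def crawl_and_index(seed):
    """Follow links breadth-first from `seed`, building term -> set of URLs."""
    index = defaultdict(set)
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        for word in text.split():
            index[word].add(url)
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return index

INDEX = crawl_and_index("a.example/")
rafting_pages = INDEX["rafting"]
```

The page at c.example/ mentions rafting but is never reached, because no crawled page links to it. A query answered from the index therefore reflects only the part of the web the crawler has visited.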

Individual search engines vary in terms of the information on the web page that they index, the factors used in assigning term weights, and the ranking algorithm used. Some search engines index information extracted from hyperlinks as well as from the text itself. Because information on the search engine is usually proprietary, details of the algorithms are not readily available. Comparisons of retrieval performance are also difficult because the systems index different parts of the web and because they undergo constant change. Recall is impossible to measure because the potential number of pages relevant to a query is so large.

The Future of Information Retrieval

Researchers continue to improve the performance of information retrieval systems. An ongoing series of experiments called TREC (Text REtrieval Conference) is conducted annually by the National Institute of Standards and Technology to encourage research in information retrieval and its use in real-world systems.

One long-term goal is to develop systems that do more than simply identify useful documents. By considering a database as a knowledge base rather than simply a collection of documents, it may be possible to design retrieval systems that can interpret documents and use the knowledge they contain to answer questions. This will require developments in artificial intelligence (AI), natural language processing, expert systems, and related fields. Research so far has concentrated primarily on relatively narrow subject areas, but the goal is to create systems that can understand and respond to questions in broad subject areas.

see also Boolean Algebra; E-commerce; Search Engines; World Wide Web.

Edie M. Rasmussen


information storage and retrieval

information storage and retrieval, the systematic process of collecting and cataloging data so that they can be located and displayed on request. Computers and data processing techniques have made possible the high-speed, selective retrieval of large amounts of information for government, commercial, and academic purposes. There are several basic types of information-storage-and-retrieval systems. Document-retrieval systems store entire documents, which are usually retrieved by title or by key words associated with the document. In some systems, the text of documents is stored as data. This permits full text searching, enabling retrieval on the basis of any words in the document. In others, a digitized image of the document is stored, usually on a write-once optical disc. Database systems store the information as a series of discrete records that are, in turn, divided into discrete fields (e.g., name, address, and phone number); records can be searched and retrieved on the basis of the content of the fields (e.g., all people who have a particular telephone area code). The data are stored within the computer, either in main storage or auxiliary storage, for ready access. Reference-retrieval systems store references to documents rather than the documents themselves. Such systems, in response to a search request, provide the titles of relevant documents and frequently their physical locations. Such systems are efficient when large amounts of different types of printed data must be stored. They have proven extremely effective in libraries, where material is constantly changing.
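The record-and-field organization of a database system can be sketched as follows, using the entry's own example of retrieving people by telephone area code. The `Person` record, the sample data, and the assumed phone format "AAA-EEE-NNNN" are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Person:
    """One record, divided into discrete fields."""
    name: str
    address: str
    phone: str  # assumed format: "AAA-EEE-NNNN", area code first

# Hypothetical records, invented for illustration.
RECORDS = [
    Person("Ada Example", "1 First St", "212-555-0101"),
    Person("Ben Sample", "2 Second Ave", "415-555-0102"),
    Person("Cy Placeholder", "3 Third Rd", "212-555-0103"),
]

def by_area_code(records, area_code):
    """Retrieve the records whose phone field begins with the given area code."""
    return [r for r in records if r.phone.split("-")[0] == area_code]

matches = by_area_code(RECORDS, "212")
```

The search examines one field of each record rather than scanning free text, which is the distinction the entry draws between database systems and document-retrieval systems.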

information storage and retrieval

information storage and retrieval (ISR) The linked activities of storing and retrieving information, and the strategies and techniques for doing so. The activities are linked because the means of retrieving information are dependent on the means by which it was stored. The storage strategy must be designed for the most efficient retrieval, consistent with the characteristics of the information and the time and cost that can be tolerated.

information retrieval

information retrieval Strictly, the activity of retrieving previously stored information. The term is sometimes used to mean information storage and retrieval (as in information retrieval application).

