Computer Science: Information Science and the Rise of the Internet

Introduction

First emerging as an academic discipline in the 1960s, information science is the field of knowledge concerned with the storage, organization, search, and retrieval of information stored in all kinds of documents, including computer records. Because of the information explosion of the twentieth century, which produced millions of printed books, newspapers, and articles as well as enormous volumes of data on the Internet, it would be impossible to access most human knowledge without the tools devised by information science.

Historical Background and Scientific Foundations

The Ancient Period

Written language first developed in the Middle East about 3000 BC. By 2500 BC, Sumerian scribes were collecting large numbers of documents in central locations. The first documents listing other documents appear at about this time, showing the first effort to use an information technology system to manage the information itself. The first library in the world was probably assembled by the Assyrian king Ashurbanipal (685–627 BC), who authorized his chief scribe to collect every tablet in the kingdom, by force if necessary, and bring it to the capital city to be added to his collection. (Documents were literally “tablets” at this time, small rectangles of solid clay with marks pressed into their surfaces by sharpened reeds.)

The first Chinese library was assembled about 1400 BC; the Egyptian Pharaoh Ramses II built one in Thebes about 1225 BC; and in India manuscript collections appeared about 1000 BC. The destruction of libraries was also routine. Ashurbanipal's library was destroyed when the Assyrian empire collapsed. In 213 BC the new emperor of China, Shi Huangdi, ordered every book in China destroyed so that he could replace them with books composed according to his own ideas.

The greatest library of ancient times, founded about 300 BC, was the Great Library at Alexandria, a city in Egypt named after the Greek conqueror Alexander the Great (356–323 BC). The Greek ruler of Egypt at that time, Ptolemy I (367–283 BC), built a royal library and offered special incentives, including money and free housing, to attract scholars from surrounding countries.

By 47 BC the collection contained about 700,000 written works, vastly more than any other collection in the world. This library also featured the world's first system of classification. Rather than works being mixed together in an indiscriminate jumble, they were segregated into different rooms based on their subject matter. The scholar Callimachus (305–240 BC) created a 120-volume catalog listing the contents of the entire library as it existed in his day. Tragically, the Great Library at Alexandria was destroyed. The exact date of the disaster and its cause are not certain; the library was probably burned, partly or entirely, on more than one occasion.

Throughout the Middle Ages, European written culture was kept alive mostly by the efforts of Christian monks in scattered monasteries, memorizing and copying the scriptures onto parchment. Even the largest collections of documents at this time did not number above the hundreds, and there was little need for special systems to manage information.

The Era of Print

The great turn toward the modern information era came about 1450, when German craftsman Johannes Gutenberg (1400–1468) invented the printing press and movable type. In this system, individual letters are mass-produced, then arranged together in a block or plate to form a mirror image of a page of text. The plate is then wetted with ink and pressed against a blank sheet of paper. Gutenberg's method allowed for the rapid creation of new plates, making it possible to manufacture books far more cheaply and rapidly than ever before. Printing presses quickly swept across Europe, transforming its culture, and they remained the world's primary information technology for roughly the next 500 years.

Over the next few centuries, printed books became steadily cheaper and more numerous. As knowledge expanded and book collections swelled, the dream of mechanized organization of the world's knowledge first surfaced. In 1532, Giulio Camillo (1480–1544), a Venetian, designed a curious device called the Theater of Memory. This consisted of a small room in which the user could sit, surrounded by an array of small windows covered by shutters. The shutters could be opened by a geared mechanism to reveal words and images. Although Camillo could not mechanize the processing of information, only display it, he seems to have anticipated a form of the “windows” principle by which most modern users interact with computers.

Camillo's Theater of Memory was never completed and no similar devices were built. However, it is the first recorded attempt to mechanize access to all knowledge, one that would eventually be fulfilled in some measure by the Internet.

Long before the Internet and computer technology emerged, however, the need to master large amounts of printed information was a continuous problem. By 1668 the Scientific Revolution had generated so many journals and papers that English bishop John Wilkins (1614–1672) proposed a radical solution: a “Universal Language” that would place most of the world's knowledge into 40 categories, each with a unique four-letter name.

Wilkins argued that natural language was inadequate to handle the quantity and diversity of all known facts, an anticipation of modern computer languages that handle the quadrillions of bits of information available today. At first Wilkins's plan was received enthusiastically by scholars, just as 300 years later ARPANET, the prototype of the Internet, would be built for researchers. However, Wilkins's scheme was too arbitrary and rigid to fulfill its promise, and it was soon forgotten.

The information problem continued to grow. From the invention of the printing press to 1500, Europe manufactured about 20 million books; in the following century, it manufactured ten times as many. Demand grew for a compact, affordable way to access knowledge in many fields at once.

This demand was met by the invention of the encyclopedia. The first modern encyclopedia was edited by Denis Diderot (1713–1784), whose Encyclopédie, ou dictionnaire raisonné des sciences, des arts et des métiers (Encyclopedia, or a systematic dictionary of the sciences, the arts, and the professions) appeared in installments from 1751 to 1772. This was a revolutionary technology and recognized as such: The Pope and the kings of England and France all condemned it for giving the masses easy access to technical knowledge. The French government ordered publication of the work to stop and literally imprisoned 6,000 volumes in the Bastille. Even in the modern world repressive governments, such as those of China, Iran, and Cuba, try to control access to the World Wide Web.

Throughout the eighteenth and nineteenth centuries, the organization of most libraries remained crude. As at the Great Library of Alexandria, libraries' contents were listed in specialized books called catalogs. At the beginning of the twentieth century, however, a new library technology appeared—the card catalog, a cabinet filled with small, alphabetically ordered drawers holding paper cards, one card for each book in the library. Other new schemes of library organization were also proposed. The most successful, at least in the United States (where it is still used in thousands of libraries) was that proposed by Melvil Dewey (1851–1931). Today, like physical card catalogs, Dewey's and other cataloging systems have been transferred to computers.

Mechanical Dreams, Library Science, and Rise of the Internet

The idea of combining text, photographs, microfilm, television, and other mechanical aids to produce a sort of twentieth-century Theater of Memory was the dream of Paul Otlet (1868–1944), who established the professional field of document tracking and organizing that gave rise to information science in the late 1940s. Years before electronic computers were invented, Otlet promoted the idea of a universal knowledge network that could be viewed on screen via what he called an “electric telescope.” Sitting at a special circular desk packed with apparatus, the user would have access to essentially all the recorded knowledge of the world, a “universal book.”

Although Otlet's vision of mechanized universal access was impractical, his insights into the relationships between texts were not. In the 1930s he invented the term “links” to name references in one text to other texts and described the resulting set of relationships between books, articles, and other information sources as a “web.” He also saw the need not only for elaborate information-handling machinery, but also for some type of search and retrieval system, functions supplied today by search engines.

There was little interest in Otlet's ideas, however, especially as World War II (1939–1945) loomed. After the war, American engineer Vannevar Bush (1890–1974) proposed a system he called the Memex, widely hailed as the immediate conceptual forebear of the Internet. In a popular 1945 Atlantic Monthly article, “As We May Think,” Bush proposed a workstation-type desk packed with apparatuses. Information would be stored on reels of microfilm; images and text would be projected upward onto desktop viewing screens. A built-in camera would provide the photographic equivalent of a scanner, allowing the user to add texts and images to the Memex's microfilm memory. Crucially, and recalling Otlet's notion of “links” (though there is no evidence that Bush knew of Otlet's work), Bush proposed that the Memex would allow its user to make notes linking one document to another to form “associative trails,” anticipating the hyperlinks that tie the World Wide Web together.

Scholars have noted that, unlike the Internet, Memex's goal was to contain all information relevant to a given researcher, not to link to a network of outside sources. It was to be, in effect, a miniaturized private library plus “associative trails,” not a terminal or node in a larger network. Its closest parallel is perhaps an MP3 player or iPod, not a personal computer linked to the Internet. Although Memex was never built, Bush's idea generated excitement, talk, and fresh awareness of the possibility of mechanizing access to information—just as the digital electronic computer was in the process of being invented.

A decade later, in the mid-1950s, in an apparently unrelated development, American library science student Eugene Garfield (1925–) devised a method of citation ranking, a way to measure the influence of individual articles in scientific journals. The more a paper is cited, the more important it is deemed to be, and the higher a ranking it is given. Soon, Garfield's method became a standard fixture of scientific research, helping scientists cope with the overwhelming and ever-growing number of publications in their field by telling them which were the most important. The Science Citation Index, a privately produced database of citation rankings, is purchased by many university libraries for use by researchers. The significance of citation ranking is that its basic method was adopted half a century later as the core method of the search engine Google, which ranks Web pages rather than scientific papers and does so automatically (see sidebar). Once again, a technique developed for scholars broke ground that would eventually be used by many millions of users.

The electronic digital computer matured and became a commercial product (though still a large, rare, and expensive one) in the 1950s and 1960s. Most universities and large businesses acquired computers. Individual computers could now be linked to each other electronically through the telephone system or other channels. In 1962, American computer scientist J.C.R. Licklider (1915–1990) and his colleagues described a vision of a network of computers that would allow users to exchange information with each other. Hired by the U.S. government's Advanced Research Projects Agency in 1963, Licklider set about orchestrating the creation of such a network, the Advanced Research Projects Agency Network (ARPANET). ARPANET would link universities and other institutions where researchers sponsored by ARPA worked. In 1969, the first elements of the system became functional: four computers at four American universities. By 1973 there were 40 nodes, and by 1981 there were 213, including international links, mostly to Europe. In the 1970s and early 1980s other, independent networks were built, but during the 1980s these were merged or interconnected with each other. In the late 1980s the first Internet service provider companies came into being, selling access to the growing Internet.

But there is more to today's Internet than a system of computers exchanging data: What has made the Internet useful to hundreds of millions of people is the World Wide Web. Early ARPANET and other computers featured a screen and a keyboard. Users typing at the keyboard saw letters appear on the screen, which might be entered as commands or messages. Messages from other network users or computers would also appear as lines of text. There were no windows, just the single screen with a growing stack of lines of text that would eventually scroll up out of view beyond the top of the screen. Users had to know specific computer commands to use the system. In the 1970s through the 1990s, several ways of organizing information on the screen and interacting with the computer were invented that would revolutionize this clunky standard interface. Collectively, these made possible the World Wide Web. The Web should not be confused with the Internet. The Internet is a communications network; the Web is a collection of software applications running on the computers connected to that network. In practice, the Web is a mass of several tens of billions of Web pages structured by hypertext markup language (HTML) and linked to each other by hyperlinks. Web pages are visually rich, and users interact with them either by typing in text or by clicking on words or images using a mouse to control a pointer.
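As a rough illustration of how hyperlinks tie HTML pages together, the following Python sketch pulls the link targets out of a small page using the standard library's `html.parser` module. The sample page, its `example.org` addresses, and the names `LinkCollector` and `extract_links` are invented for this example and are not taken from any real site or browser.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the target address of every <a href="..."> hyperlink."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the hyperlink targets found in html_text, in page order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """
<html>
  <body>
    <h1>Memex</h1>
    <p>Proposed by <a href="http://example.org/bush">Vannevar Bush</a>
    in <a href="http://example.org/as-we-may-think">As We May Think</a>.</p>
  </body>
</html>
"""

LINKS_FOUND = extract_links(SAMPLE_PAGE)
```

Each address collected this way points to another page, which may in turn link onward; followed repeatedly, such links form the web of pages described above.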

IN CONTEXT: DID AL GORE INVENT THE INTERNET?

During the presidential campaign season of 1999–2000, strange headlines appeared asserting that candidate Al Gore (1948–), a Democrat and Vice President of the United States from 1993 to 2001, had claimed in a TV interview to be the “inventor of the Internet.” The story appeared in dozens of opinion columns and editorials. Many pundits described Gore as “delusional.” The Associated Press ran a story headlined, “Republicans pounce on Gore's claim that he created the Internet” (March 11, 1999).

Gore must have been crazy, the commentators said: The Internet is not any one person's invention. If anyone could lay claim to the title of the Internet's inventor it might have been J.C.R. Licklider (1915–1990), who in the 1960s created the military-funded network of university computers called ARPANET, which eventually evolved into the Internet. Or perhaps the credit might be shared with Tim Berners-Lee, who in 1991 invented hypertext markup language (HTML), the system of coding that makes all Web pages possible.

The accusation against Gore was, however, fundamentally flawed. What he said in the TV interview was not that he had “invented” the Internet or was “the father” of the Internet, as many outlets wrongly reported, but this: “During my service in the United States Congress, I took the initiative in creating the Internet.”

Was this claim correct? As a matter of Internet history, it is: In the 1980s, Gore was the leader of bipartisan Congressional efforts to construct the physical backbone of what would become the Internet in the 1990s. He coined the term “information superhighway” and proposed legislation to fund the construction of high-speed, transcontinental data links. Representative Newt Gingrich, a Republican and otherwise a political enemy of Gore's, worked with him in the 1980s to lay the groundwork for the Internet. In 2000 Gingrich said, “in all fairness, Gore is the person who, in the Congress, most systematically worked to make sure that we got to the Internet” (C-SPAN, September 1, 2000).

Reports of Gore's delusions of grandeur were incorrect. Al Gore did not invent the Internet—and Al Gore never claimed to have invented the Internet.

Gore lost the 2000 presidential election to George W. Bush, but later won the 2007 Nobel Peace Prize, shared with the United Nations Intergovernmental Panel on Climate Change, for his groundbreaking work bringing attention to the problem of global climate change, most notably with the film An Inconvenient Truth.

These ways of presenting information and interacting with computers may seem obvious today, but were not always so. The mouse was the 1963 invention of American inventor Douglas Engelbart (1925–), a fan of Vannevar Bush's 1945 “As We May Think” essay. The term “hypertext” was coined by Ted Nelson (1937–) in the 1960s, and a working hypertext system was first demonstrated publicly by Engelbart in 1968. The windows-type screen environment was developed by Xerox researchers in the mid-1970s. In 1990, English physicist Tim Berners-Lee fused graphical user interfaces, HTML, a program he wrote himself to display HTML documents on-screen (i.e., a Web browser), and the Internet to produce the beginnings of the World Wide Web. Indeed, there was a brief time when the World Wide Web included only a single computer, Berners-Lee's own.

IN CONTEXT: SEARCH ENGINES

Search engines such as Ask.com, Google, and Yahoo! Search are the spark of life in the Internet. Without such tools, looking for a specific piece of information on the Internet would be like trying to take a sip of water from an open fire hydrant.

A user of a search engine inputs a search term or phrase, and the search engine displays a number of Web pages chosen from the several tens of billions of Web pages that exist. The first step in enabling the engine to do this is to “crawl” the Web, that is, to automatically visit as many Web pages as possible—all of them, ideally—and index their contents. Indexing means making a list of words, each accompanied by a list of pointers to places where the word occurs. In a book, the pointer points to a page number; on the Internet, it points to a Web page address.
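The indexing step described above can be sketched in a few lines of Python. This is a minimal illustration, not how any real search engine is implemented: the three pages and their `example.org` addresses are made up, and the names `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the sorted list of page addresses where it occurs."""
    index = defaultdict(set)
    for address, text in pages.items():
        # Lowercase the text and split it into runs of letters and digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(address)
    return {word: sorted(addresses) for word, addresses in index.items()}


def search(index, query):
    """Return the addresses of pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    results = set(index.get(words[0], []))
    for word in words[1:]:
        results &= set(index.get(word, []))
    return sorted(results)


# Hypothetical "crawled" pages used only for illustration.
PAGES = {
    "http://example.org/a": "The Great Library at Alexandria",
    "http://example.org/b": "A library card catalog",
    "http://example.org/c": "Alexandria is a city in Egypt",
}

INDEX = build_index(PAGES)
```

Here `INDEX["library"]` holds the addresses of the two pages that mention a library, much as a book index lists page numbers, and `search(INDEX, "library alexandria")` narrows the result to the one page containing both words.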

But simple indexing is not enough. Pages must also be ranked, that is, ordered from the ones that the user will probably find most interesting to those they will probably find least interesting.

Search engine companies use many secret methods for ranking Web pages, as well as other tricks. The basic method used by Google is known, since the mathematical formula was made public in 1999. For every page on the Web, Google calculates a number that says how important that page is; this number is called a page rank. When you perform a search, pages with high ranks are more likely to appear on your screen. The Google formula calculates a rank for a Web page (call it page Z) by looking at the pages that link to Z. For each page linking to Z, a number is calculated by dividing that page's own page rank by the number of its outgoing links. Z's rank is then given as the sum of all these fractions (modified by an arbitrary number called the damping factor).

In this system, the page rank of every page depends on the page rank of many other pages, which in turn depend on the page ranks of other pages, and so on. In practice, powerful computers owned by search-engine companies are continually re-calculating page rankings for the entire Web as it grows.
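The calculation described in this sidebar can be sketched as a short Python loop that recomputes every page's rank from the ranks of the pages linking to it, repeating until the numbers settle. This is a simplified illustration under stated assumptions, not Google's production method: the damping factor of 0.85, the fixed number of iterations, the normalization that makes the ranks sum to 1, the treatment of pages with no outgoing links, and the three-page link graph are all choices made for this example.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Estimate page ranks for a small link graph.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}  # start with equal ranks

    for _ in range(iterations):
        # Every page receives a small base amount set by the damping factor.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # Assumption: a page with no outgoing links shares its rank
            # equally among all pages.
            targets = links.get(page) or pages
            share = ranks[page] / len(targets)  # rank divided by outgoing links
            for target in targets:
                new_ranks[target] += damping * share
        ranks = new_ranks
    return ranks


# Hypothetical link graph: A and B both link to Z, B also links to A,
# and Z links back to A. Nothing links to B.
LINKS = {
    "A": ["Z"],
    "B": ["Z", "A"],
    "Z": ["A"],
}

RANKS = page_rank(LINKS)
```

In this toy graph, page B ends up with the lowest rank because no page links to it, while A and Z, which receive links from other pages, rank higher.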

Modern Cultural Connections

The invention of the World Wide Web in 1990 was possible because the Internet already existed. The creation of the Internet depended on artificial languages developed for computers and elaborate methods of organizing information inside computer memories. These techniques, in turn, can in part be traced to methods developed by the fields of documentation and information science in the early twentieth century. Without those techniques, which allowed scientists and engineers to trace what was important to them in the millions of pages of printed technical matter already being produced, it would have been difficult to invent modern computers in the first place.

Since 1990, Berners-Lee's desktop Web has indeed become worldwide, transforming every aspect of life in the industrialized countries from courtship to business. Techniques borrowed directly from library science are at the heart of the search engines (Google, Yahoo! Search, etc.) that make the Web functional in practice, and are therefore a lynchpin of the Internet-driven aspect of the modern global economy. All technologies have a potentially dark side, however, and the Internet and Web are no exception: the same information-management techniques of listing, searching, and ranking that can be used to connect users with information that is relevant to them can be run by corporations and governments in reverse, tracing Internet usage backward to individual people, who may then be discriminated against, spied upon, arrested, or even killed. Legitimate law enforcement, such as tracking child-pornography traffickers and terrorists, makes use of essentially the same set of tools.

Despite the extreme popularity of the Internet and Web, it is difficult to point to any overall improvement in society that can be attributed directly to them. In the industrialized countries, science literacy has increased only slightly, if at all, since the Internet exploded into daily life. Worldwide, military dictatorships remain in control despite millions of Internet users in their countries. In China, which has more Internet users than any other country except the United States (over 50 million), all Internet traffic enters and leaves the country through a cluster of supercomputers owned and operated by the government. This allows government agents to scan the content of all Web pages, blogs, e-mails, and other international Internet exchanges, looking for, and punishing the users of, content the government disapproves of for political reasons. Since the late 1990s, at least several dozen people have been jailed in China for using the Internet for forbidden political purposes; several have been tortured, according to the human rights organization Amnesty International. In 2006, the Internet search-engine company Google, submitting to Chinese government demands, agreed to block access by users inside China to international sites banned by the Chinese government.

In the United States, a minority of eligible voters still participates in national elections, and the amount of time spent daily reading books (the only form in which most people can still have in-depth, prolonged encounters with fiction or nonfiction works) is declining. Simply making enormous volumes of searchable data available to people has not, by and large, made them freer, happier, or better educated. Furthermore, the Internet itself, according to some experts, is beset by deep organizational flaws, such as the one-way nature of hyperlinks. Flawed or shallow information, hoaxes, spam, propaganda, pornography, and advertising make up a large fraction, perhaps a majority, of what is available on the Internet. Old problems have migrated to the new medium, and the new medium has given rise to new problems.

Nevertheless, it can be argued that the Internet has had a net positive effect in at least some areas. For scholars and scientists, its original users, it is an essential tool for sharing large quantities of complex data rapidly. Some individuals have been empowered by the Internet to become more politically active, generally educated, or economically independent.

Bibliography

Books

Albarran, Alan B., and David H. Goff, eds. Understanding the Web: Social, Political, and Economic Dimensions of the Internet. Ames, IA: Iowa State University Press, 2000.

Arms, William Y. Digital Libraries. Cambridge, MA: The M.I.T. Press, 2000.

Ceruzzi, Paul E. A History of Modern Computing. 2nd ed. Cambridge, MA: The M.I.T. Press, 2003.

Katz, James E., and Ronald E. Rice. Social Consequences of Internet Use. Cambridge, MA: The M.I.T. Press, 2002.

Lilley, Dorothy B., and Ronald W. Trice. A History of Information Science, 1945–1985. San Diego, CA: Academic Press, Inc., 1989.

Wright, Alex. Glut: Mastering Information Through the Ages. Washington, DC: Joseph Henry Press, 2007.

Periodicals

Dalbello, Marija. “Is There a Text in This Library? History of the Book and Digital Continuity.” Journal of Education for Library and Information Science 43 (2002): 11–19.

Veith, Richard H. “Memex at 60: Internet or iPod?” Journal of the American Society for Information Science and Technology 57 (2006): 1233–1242.

Larry Gilman

Scientific Thought: In Context