Information Theory

The concepts and measures of the statistical theory of selective information (information theory) have become so thoroughly enmeshed with the whole of behavioral science that delineation of the exact contribution of the theory is nearly impossible. The very verbal descriptive fabric of the behavioral sciences has become thoroughly interlaced with informational concepts: individuals or groups are described as “information sources” or “receivers”; skilled performance is described as “information processing”; memory is described as “information storage”; nerves are described as “communication channels”; the patterning of neural impulses is described as “information coding”; the brain is described as “an informational computer,” etc. Indeed, the molecule, the cell, the organ, the individual, the group, the organization, and the society have all been examined from the point of view of a general systems theory which focuses upon the information-processing, rather than upon the energetic, characteristics of each system (J. G. Miller 1955). Perhaps the closest analogue to the impact of information theory upon psychology is the impact that behaviorism had upon psychology, with the subsequent redefinition of psychology as the science of behavior. In both cases questions of definition have replaced questions of possible relevance. [See Systems Analysis, article on General Systems Theory.]

Information theory is a formal mathematical theory, a branch of the theory of probability. As such, the theory is self-contained; it does not require verification by experiment (Frick 1959). Yet, formal theories often have profound influence as conceptual models and as models for experiment. The theory is indelibly flavored by the context of electrical communications and control in which it was developed. Cherry ([1957] 1961, pp. 30-65) has charted the development of the theory within the field of communications. The genesis of the modern theory of statistical communications is due primarily to Hartley (1928). Building upon earlier work by Nyquist and Küpfmüller, Hartley showed that in order to transmit a given amount of information a communication channel must undergo an exchange between its duration and its bandwidth, or frequency range. With a narrower frequency range, the communication channel must be available for a longer duration to transmit a given amount of information. Information was identified with an arbitrary selection of symbols from a set of defined symbols. The measure of information was defined in terms of the logarithm of the number of equally likely symbols available for communication. The essence of the idea is that information is measured in terms of what could have been communicated under a defined set of circumstances rather than in terms of what actually is communicated at a particular moment. The definition is sufficiently broad to provide a general framework for the specification of a wide class of communication systems. Following Hartley, numerous distinguished contributions were made throughout the world. These included the contribution by R. A. Fisher in characterizing the efficiency and sufficiency of statistics and that of D. Gabor, who introduced the concept of the logon as the elementary unit of information. It was the contributions of Shannon (1948) and of Wiener (1948), however, which provided the intellectual synthesis that marked the birth of modern information theory. [See Cybernetics and the biography of Wiener; see also Frick 1959.]

Shannon provides a scheme for a general communication system.

It consists of essentially five parts: 1. An information source which produces a message or a sequence of messages to be communicated to the receiving terminal. … 2. A transmitter which operates on the message in some way to produce a signal suitable for transmission over the channel. … 3. The channel is merely the medium used to transmit the signal from transmitter to receiver…. During transmission, or at one of the terminals, the signal may be perturbed by noise. … 4. The receiver ordinarily performs the inverse operation of that done by the transmitter, reconstructing the message from the signal. … 5. The destination is the person (or thing) for whom the message is intended. ([1948] 1962, pp. 4-6)

This description, while initially directed toward electrical communication systems, is sufficiently general to use in the consideration of a wide class of information systems.

The measure of information. The essential idea of the Shannon-Wiener mathematical theory of communication is that communication is a statistical process which can be described only in probabilistic terms. When it is possible to predict completely each message out of a source of possible messages, by definition no information will be conveyed by the message. When any one message is as probable as any other possible message, maximum information will be conveyed by the message. From this point of view, the information of any message is associated with the reduction in the range of possible selections by the receiver of any message, i.e., with the reduction of the receiver’s uncertainty. Uncertainty, choice, information, surprise value, and range of selections, therefore, all become intimately related concepts. The meaning, reasonableness, and personal importance of the message, however, are not considered within this approach to communications. The concern of the theory is to provide a measure of the amount of information for any selection or choice from defined sources of information.

The measure of the amount of information associated with a given selection can be arbitrarily defined as the logarithm of the number of equally likely alternatives. The measure can also be rigorously derived by initially defining a set of conditions which it must satisfy. Shannon employed the latter procedure, and the interested reader is referred to the original article (1948) for the statement of the conditions. Luce (1960) has also listed a set of necessary conditions which lead to the same result. The conditions are (a) independence of irrelevant alternatives: the amount of information transmitted by the selection of any item from a defined set of items shall be a real number which depends only upon the probability of that item, p(i), and not upon the probability distribution of the other items; (b) continuity: the measure of information shall be a continuous (in the mathematical sense) function of p(i); (c) additivity: if two independent selections, i and j, with probabilities p(i) and p(j), are made, the amount of information transmitted in the joint selection of (i, j) shall be a simple sum of the information transmitted by each of the selections; and (d) scale: one unit of information is associated with a binary selection between two equally likely alternatives; the unit is called the bit. The only measure which satisfies all of these conditions for any symbol i is the negative logarithm (to the base 2) of the probability of i, p(i). And, over the ensemble of possible items, the average information of the ensemble is the average weighted logarithmic measure, H, or

$$H = -\sum_{i=1}^{n} p(i) \log_2 p(i).$$

The H measure has a number of important properties. First, H ≥ 0; it is 0 if, and only if, the probability of a single item i equals 1, while the probability of each of the remaining n − 1 items is equal to 0; otherwise H is greater than 0. Information is associated with any ensemble of items whenever there is uncertainty about which item will be presented. Second, H is maximum when all of the items are equally probable. If there are n possible items, the uncertainty associated with the set of items is maximum when p(i) = 1/n. Third, H is maximum if all items occur independently of each other. If the occurrence of one item is related to the occurrence of another, the average information is reduced by the extent of that relatedness. This property is extremely important for the behavioral sciences, since the information measure provides a measure of the degree of relatedness between items in a set of possible items. The ratio of the uncertainty of a source to the maximum possible uncertainty with the same set of symbols is a measure of the relative transmitting efficiency of the source; Shannon has termed this the relative entropy of the source. And 1 minus the relative entropy has been defined as the redundancy of the source. Fourth, it is possible to encode a long sequence of items so that, on the average, H binary units per item are required to specify the sequence, even though more than H binary units per item are required to specify a short sequence. This property, called the encoding property, has recently become extremely important for electrical communications but has not been much exploited by the behavioral sciences.
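The first two properties, together with the definitions of relative entropy and redundancy, can be checked numerically. The following is a minimal sketch rather than anything taken from the sources cited; the function names and the three probability distributions are illustrative assumptions.

```python
import math

def entropy_bits(probs):
    """Average information H = -sum p(i) * log2 p(i), in bits per item."""
    # Items with probability 0 contribute nothing to the sum.
    return sum(-p * math.log2(p) for p in probs if p > 0)

def relative_entropy(probs):
    """Ratio of H to its maximum, log2(n), for the same n items (n >= 2)."""
    return entropy_bits(probs) / math.log2(len(probs))

def redundancy(probs):
    """1 minus the relative entropy of the source."""
    return 1.0 - relative_entropy(probs)

# Hypothetical four-item sources, chosen only for illustration.
certain = [1.0, 0.0, 0.0, 0.0]        # one item always occurs
uniform = [0.25, 0.25, 0.25, 0.25]    # all items equally probable
skewed = [0.5, 0.25, 0.125, 0.125]    # unequal probabilities

h_certain = entropy_bits(certain)     # 0 bits: no uncertainty
h_uniform = entropy_bits(uniform)     # 2 bits: the maximum for n = 4
h_skewed = entropy_bits(skewed)       # 1.75 bits
r_skewed = redundancy(skewed)         # 0.125
```

The skewed source falls between the two extremes: its uncertainty is below the two-bit maximum, and the shortfall, expressed as a proportion of the maximum, is its redundancy.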

History. Although Hartley’s development of the theory provided both the essential definition of information and a measure for its description, it went little noticed in the behavioral sciences. The historian of science will probably note that the behavioral sciences were not ready for the idea, nor, for that matter, was communications engineering fully ready. Shannon’s development, on the other hand, was enthusiastically grasped very early by a handful of psychologists, primarily those associated with the Psycho-Acoustics Laboratory at Harvard University.

George A. Miller, in a personal communication to the author of this article in January 1964, has described the intellectual ferment associated with the early developments. Noteworthy is Miller’s comment that “had the group not been actively interested in other related ideas from the communication engineers, the occurrence of Shannon’s paper probably would have gone unnoticed by the group.” The initial enthusiasm was stirred by the realization that Shannon had provided a tool for describing discrete events that at the same time was compatible with the continuous Fourier analysis of communication systems, with which the group was already acquainted.

Dahling has provided a valuable bibliographic survey of the early spread of concepts of the theory of selective information through a number of fields, ranging from telecommunication to journalism. Information theory provides an interesting case study for the diffusion of ideas in modern science, because of its great impact and its relative recency. From his analysis Dahling concluded that “the idea was drawn from a flurry of current related activity and, as the idea developed, it gained impetus and speed of adoption from the same surrounding activity that gave rise to it” and that “the adoption of the theory was speeded by a clearly apparent need for such a theory” (1962, p. 121). Moreover, “because the idea dealt with matters of common interest, it was able to spread more rapidly between disciplines” (p. 126). The idea “spread to other disciplines in proportion to its congeniality with their methods” and “to its analogic and suggestive value” (p. 132). Experimental psychologists working in communication problems and trained in the mathematics of communication engineering became logical carriers of the theory to the behavioral sciences. The introduction of information concepts to psychology was made by several routes. A number of excellent summaries are available that trace this development within experimental psychology: Attneave (1959), Garner (1962), Luce (1960), G. A. Miller (1956), and Quastler (1955). Here we shall briefly summarize a few of the salient avenues of entry into the field of experimental psychology, although parallel developments can doubtlessly be cited in any of the behavioral sciences. Thus, the balance of this review is a highly selective examination of the role of information theory in the social sciences. It is not a general review.

Organization of behavior sequences. The information measure was introduced to psychology in a now classic paper by Miller and Frick (1949). Their primary concern was the description of sequences of discrete responses. Their aim was twofold: the development of a stochastic model of behavioral sequences and the development of a quantitative measure of the organization of the sequences. The Markov model, employed by Shannon, served as their descriptive model for the generation of response sequences; the information measure served as their measure of the degree of organization of the sequences [see Markov Chains].

For illustrative purposes response sequences of rats and of young girls, provided earlier in a multiple-choice experiment by Hamilton, were analyzed. An index of response stereotypy was identified as 1.0 minus the ratio of the obtained uncertainty to the maximum possible uncertainty. Thus, the measure of the stereotypy of response sequences is formally identical with the measure of relative redundancy of communication sources. For example, two responses, left and right, are defined as the class of responses available for observation of a rat in a given experimental situation. If the sequence of the rat’s responses were perfectly predictable (e.g., were all left responses or a left-right alternation sequence), there would be 0 uncertainty in specifying successive responses. Thus, an index of response stereotypy of 1 would be obtained. Conversely, if the rat responded left and right equally often and if the sequence of responses was unpredictable, there would be maximum uncertainty in specifying successive responses. An index of response stereotypy of 0 would be obtained. In Hamilton’s data identical indexes of response stereotypy were obtained for both girls and rats when the distributions of single-response choices were examined and when the distributions of pairs of successive choices were examined. The responses of girls became differentiated from those of the rats only when sequences of three successive choices were analyzed. In pointing out the importance of the higher-order statistical structure of response sequences and in providing an objective measure of its patterning, Miller and Frick laid the groundwork for the mathematical modeling of complex response sequences. [See Response Sets.]
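The left-right example can be sketched in code. This is an illustrative reading of the index, not Miller and Frick's own computation: the function names are hypothetical, and the sequential case is assumed to be the uncertainty of a response given the responses that preceded it, estimated from the observed sequence.

```python
import math
from collections import Counter

def entropy_bits(counts):
    """Uncertainty, in bits, of the distribution given by a Counter of tallies."""
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def stereotypy(sequence, n_alternatives, order=0):
    """1 minus (obtained uncertainty / maximum uncertainty) per response.

    order=0 uses the distribution of single responses; order=k uses the
    uncertainty of a response given the k responses that preceded it
    (an assumed reading of the higher-order analysis).
    """
    positions = range(len(sequence) - order)
    blocks = Counter(tuple(sequence[i:i + order + 1]) for i in positions)
    obtained = entropy_bits(blocks)
    if order > 0:
        contexts = Counter(tuple(sequence[i:i + order]) for i in positions)
        obtained -= entropy_bits(contexts)   # conditional uncertainty
    return 1.0 - obtained / math.log2(n_alternatives)

# Hypothetical response records, not Hamilton's data.
alternating = "LRLRLRLR"
all_left = "LLLLLLLL"

single_index = stereotypy(alternating, 2, order=0)   # left and right equally often
pair_index = stereotypy(alternating, 2, order=1)     # each response fixed by the last
left_index = stereotypy(all_left, 2, order=0)        # only one response ever made
```

For the alternating record the single-response distribution shows no stereotypy at all, while the analysis of successive pairs shows complete stereotypy. The contrast is the point of examining higher-order structure.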

Language. The statistical analysis of language represents a special application of the analysis of response sequences. Indeed, interest in cryptographic secrecy systems profoundly shaped the direction of Shannon’s development of information theory. The English alphabet is potentially capable of producing many more different messages than it actually does. In practice certain letters are employed more frequently than others, e.g., e relative to z; and certain sequences occur more frequently than others, e.g., th relative to ht. Shannon (1951) measured the relative redundancy of English and obtained a lower bound of about 50 per cent and an upper bound of about 75 per cent redundancy relative to a random distribution of letters. A related observation is that English text may be nearly completely reconstructed when as much as 50 per cent of the text has been deleted (Shannon 1951; Chapanis 1954). Furthermore, in most communications environments the range of possible communications is strongly restricted by situational factors. In tower-pilot communications at air force bases (e.g., Fritz & Grier 1955; Frick & Sumby 1952), it was demonstrated that the over-all redundancy may approach 95 per cent, again relative to a random distribution of letters. As a result of Shannon’s work and, especially, its popularization by Miller, nonlinguists became willing to tackle the intricacies of language as a complex series of response sequences, amenable to measurement and quantitative specification. [See Linguistics.]

A related development of information concepts in psychology was the demonstration of the important role of informational factors in the perception of speech (G. A. Miller 1951). For example, the intelligibility of words heard against a noise background is a critical function of the size of the test vocabulary (Miller et al. 1951), i.e., a critical function of stimulus information. A given word might be perceived nearly perfectly when it is embedded within a small vocabulary of possible words but might be perceived correctly only rarely when it is embedded within a large vocabulary of possible words. This result is reasonable if information is associated with what could have been presented, rather than with what actually is presented.

A number of different investigators have found the Miller, Heise, and Lichten data to be a rich source for testing theories of choice behavior. For example, Garner (1962) has demonstrated that these data are consistent with the assumption that under a given signal-to-noise ratio different vocabularies may transmit the same information. Stated differently, a large vocabulary coupled with a high error rate may yield nearly the same amount of information as that transmitted by a small vocabulary and a low error rate. [See Perception, article on speech perception.]

Identification experiments. Another way that information concepts have been introduced to psychology is by the quantitative description, in informational terms, of the identification experiment (Garner & Hake 1951). In the identification experiment, one of n possible stimuli is presented to the subject, whose task is to identify which one of the n stimuli was presented. For example, the instruments of a symphony orchestra are defined as the class of objects for study, and one of the instruments is sounded at random. The listener is instructed to indicate which instrument of the defined set was sounded. When the stimuli are well ordered and associated with a common unit of measurement—weight, length, duration, frequency, etc.—identification performance may readily be described in terms of conventional statistical measures, e.g., the average error. When the stimuli are not well ordered, as in the case of the symphonic instruments or a series of photographs depicting various emotional moods, identification performance cannot readily be described in terms of such conventional statistical measures. The transmitted-information measure is ideally suited to be an appropriate nonmetric measure of relatedness between stimuli and responses. In addition, a vexing methodological problem is associated with the identification experiment for ordered stimuli. The identification experiment attempts to answer a straightforward question: how many stimuli can be correctly identified? The answer to the question, furnished by a given body of data, depends upon what criterion for errors is adopted. If a small average error is permitted, the same body of data will admit a larger number of distinguishable stimuli than if a large average error is permitted.

A resolution to this problem is suggested by Hake and Garner (1951), who demonstrated that while the proportion of errors is greater in locating a point at one of 50 possible positions than at one of ten positions, the amount of transmitted information for ten possible positions is about equal to that for 50 possible positions. In turn, the amount of transmitted information specifies an equivalent number of perfectly identified categories.
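The relation between error rate, transmitted information, and the equivalent number of perfectly identified categories can be illustrated with a small computation. The sketch below assumes the usual estimate of transmitted information from a stimulus-response tally, T = H(stimulus) + H(response) − H(stimulus, response); the function names and both tallies are hypothetical and are not the Hake and Garner data.

```python
import math

def entropy_bits(weights):
    """Uncertainty, in bits, of the distribution proportional to the weights."""
    total = sum(weights)
    return sum(-(w / total) * math.log2(w / total) for w in weights if w > 0)

def transmitted_information(confusion):
    """T = H(stimulus) + H(response) - H(stimulus, response), in bits.

    confusion[s][r] is the number of trials on which stimulus s
    drew response r.
    """
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    cells = [count for row in confusion for count in row]
    return (entropy_bits(row_totals) + entropy_bits(col_totals)
            - entropy_bits(cells))

def equivalent_categories(confusion):
    """Number of perfectly identified categories carrying the same T."""
    return 2 ** transmitted_information(confusion)

# Hypothetical tally 1: four stimuli, every one identified correctly.
perfect_4 = [[10 if r == s else 0 for r in range(4)] for s in range(4)]

# Hypothetical tally 2: eight stimuli, each confused half the time
# with its neighbor in the same pair (0 with 1, 2 with 3, and so on).
paired_8 = [[5 if r // 2 == s // 2 else 0 for r in range(8)] for s in range(8)]

t_perfect = transmitted_information(perfect_4)   # 2 bits, no errors
t_paired = transmitted_information(paired_8)     # 2 bits, 50 per cent errors
```

Both tallies transmit two bits, equivalent to four perfectly identified categories, although the second has twice as many stimuli and an error rate of one half.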

A concentrated flurry of experimental activity demonstrated limited transmitted-informational capabilities with a wide variety of stimulus variables. Although the categorical measure of information was better matched to nonmetric variables, most of the initial studies took place with well-defined stimulus variables upon continuous scales, e.g., length of line, direction, sound frequency, etc. The only apparent advantage of the information measure to these studies was that a single measure of performance could be employed across a wide set of variables. The historian will probably judge that many experimental psychologists had previously steered clear of variables with weak metric properties and, as a result, were unable to appreciate immediately the full potential of the informational technique for nonmetric variables. In any event, the number of identifiable stimulus components associated with any single stimulus variable was found to be disappointingly small, from less than four to about ten stimuli. However, experimenters quickly discovered that a large number of identifiable stimulus components could be achieved by employing a large number of different stimulus variables, as long as the number of components per variable was kept reasonably small. (This story is told in G. A. Miller 1956, by means of the engaging title “The Magical Number Seven, Plus or Minus Two”; and in Garner 1962.)

Response speed and skilled tasks. An area of active experimental interest is the relation between the speed of response and the informational characteristics of skilled tasks. Hick (1952) sparked interest in this area with his demonstration that reaction time was linearly related to the logarithm of the number of possible alternatives available to the subject. Further, he suggested that a measure of the rate of information transmitted, in terms of the reciprocal of the slope of the empirical function relating reaction time to stimulus information, might be achieved from a discrete-trials reaction-time experiment. This transformation provides an estimate of the rate of information transmission in humans as about five to seven bits per second (Bricker 1955). More recent findings, however, have shown that with highly overlearned tasks, such as the reading of numerals, there is little change in reaction time as a function of the information of the task (Mowbray & Rhoades 1959; Leonard 1959). In this circumstance, identification of the rate of information transmission with the reciprocal of the slope of the reaction-time functions would lead to the unreasonable conclusion that there is an infinitely high rate of information transmission. The rate of information transmitted by the human receiver has been measured directly, in a variety of tasks, by a number of investigators. (This work is summarized in Quastler 1955, pp. 305-359.) In highly overlearned tasks there is an upper limit of about 35 bits per second, which is jointly a function of the highest effective speed, the highest effective range of alternatives, and the highest effective transmission rate (ibid., p. 345). For most tasks, man’s information-transmission rate is far lower than 35 bits per second. Electronic channels of communication, by contrast, have capabilities of millions or billions of bits per second. Clearly, man’s forte is not the rate at which he can process information. When one examines certain structural features of information processing, however, the disparity between man and machine is narrowed. The largest and most elaborate of computers cannot yet perform many pattern-recognition tasks routinely performed by children. However, the rapid development of sophisticated computer programs may radically alter this situation. As Garner suggests, we shall need to devote more emphasis to the structural, as distinguished from the metric, characteristics of information if we are to understand human information processing. [See Learning, article on Acquisition of skill; Reaction time.]
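The slope-reciprocal estimate of transmission rate described above can be sketched as a straight-line fit of reaction time against stimulus information. The reaction times below are invented for illustration, chosen so that the estimate falls in the five-to-seven-bit range mentioned in the text; they are not Hick's or Bricker's data, and the function name is hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

# Hypothetical mean reaction times (seconds) for 2, 4, and 8 alternatives.
alternatives = [2, 4, 8]
reaction_times = [0.35, 0.50, 0.65]

# Stimulus information in bits, assuming equally likely alternatives.
bits = [math.log2(n) for n in alternatives]

intercept, slope = fit_line(bits, reaction_times)   # seconds, seconds per bit
rate_bits_per_second = 1.0 / slope                  # reciprocal of the slope
```

With these figures the slope is 0.15 second per bit, so the estimated rate is between six and seven bits per second. As the slope approaches zero, as in the overlearned tasks noted above, the same calculation grows without bound, which is why the estimate becomes unreasonable there.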

Structure of information. The structural examination of information is based upon a multivariate extension of Shannon’s analysis by McGill (1954; 1955) and by Garner and McGill (1956). This work is summarized by Garner (1962). Garner has demonstrated the power of a multivariate information analysis for dissecting the information-relevant features of complex information sources. In this development, formulas associated with multiple correlation and with the analysis of variance find their direct counterparts within multivariate information analysis. Multivariate information analysis thus achieves the status of a general statistical tool for categorical materials, regardless of the appropriateness of the specific conceptualization of behavior in the terms of source, channel, noise, receiver, destination, etc. Furthermore, the efficiency of experimental design may be evaluated from the point of view of multivariate informational analysis. [See Multivariate analysis; see also McGill 1955.]

Garner’s approach to a structural analysis of an information source rests on the distinction between internal structure, the relations between variables composing a set of stimuli, and external structure, the relations between stimuli and responses. This distinction is perhaps clarified by referring to Figure 1. A total ensemble of 16 possible stimulus patterns results from the binary coding of four variables: figure shape (circle or triangle), dot location (above or below), gap location (right or left), and number of internal lines (one or two). Thus, the 16 possible patterns have a potential information transmission of four bits per pattern. Let us now arbitrarily select a subset of eight of the possible 16 patterns. Such a subset has a potential information transmission of only three bits per pattern. According to Garner, the one bit lost in terms of external structure can appear in the form of internal structure. In the eight patterns of subset A of Figure 1, internal structure is represented by a simple contingency between figure shape and gap location (right gap with circles; left gap with triangles). In the eight patterns of subset J, internal structure is represented by a four-variable interaction among the variables. In these terms, subsets A and J represent the same amount of external structure and the same amount of internal structure but differ in the form of their internal structure. As a result of the differences in form of internal structure, the identification, from the 16 possible patterns, of the eight patterns of subset A is substantially superior to the identification of the eight patterns of subset J. The free recall of subsets with simple internal structure is also superior to that of subsets with complex internal structure (Whitman & Garner 1962). In the opinion of the author of this article, an extension of this method of structural analysis might reasonably be expected to provide a tool for the experimental assault upon qualitative differences in information. [For further discussion of structural analysis, see Systems analysis, article on Psychological systems.]
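The bookkeeping behind this example can be sketched numerically. In the sketch below the four variables are coded 0 or 1, subset A is built from the shape-gap contingency described above, and subset J is represented by an assumed stand-in, a parity rule across all four variables, chosen because it leaves every pair of variables unrelated; the actual patterns of Figure 1 may differ. Function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

def entropy_bits(items):
    """Uncertainty, in bits, of the empirical distribution of the items."""
    counts = Counter(items)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def total_constraint(patterns):
    """Sum of the single-variable uncertainties minus the joint uncertainty."""
    n_vars = len(patterns[0])
    marginals = sum(entropy_bits([p[v] for p in patterns])
                    for v in range(n_vars))
    return marginals - entropy_bits(patterns)

def pair_contingency(patterns, a, b):
    """Contingent uncertainty, in bits, between variables a and b."""
    h_a = entropy_bits([p[a] for p in patterns])
    h_b = entropy_bits([p[b] for p in patterns])
    h_ab = entropy_bits([(p[a], p[b]) for p in patterns])
    return h_a + h_b - h_ab

# Coding: (shape, dot location, gap location, number of lines), each 0 or 1.
all_patterns = list(product([0, 1], repeat=4))               # 16 patterns
subset_a = [p for p in all_patterns if p[0] == p[2]]         # shape fixes gap
subset_j = [p for p in all_patterns if sum(p) % 2 == 0]      # assumed parity rule

bits_all = entropy_bits(all_patterns)          # 4 bits per pattern
bits_a = entropy_bits(subset_a)                # 3 bits per pattern
internal_a = total_constraint(subset_a)        # 1 bit of internal structure
internal_j = total_constraint(subset_j)        # 1 bit of internal structure
shape_gap_a = pair_contingency(subset_a, 0, 2) # the whole bit, in one pair
shape_gap_j = pair_contingency(subset_j, 0, 2) # 0 bits in this pair
```

Both subsets carry one bit of internal structure, but in subset A the whole bit lies in a single pairwise contingency, whereas under the parity rule no pair of variables shows any contingency and the bit appears only when all four variables are considered together.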

The close relationship between information theory and psychology can be best summarized by the concluding remarks of the 1954 Conference on Information Theory in Psychology, organized by Henry Quastler. Although more than a decade has intervened, the remarks are nonetheless appropriate today.

There is something frustrating and elusive about information theory. At first glance, it seems to be the answer to one’s problems, whatever these problems may be. At second glance it turns out that it doesn’t work out as smoothly or as easily as anticipated. Such disappointments, together with some instances of undoubtedly ill-advised use, have caused a certain amount of irritation. So nowadays one is not safe in using information theory without loudly proclaiming that he knows what he is doing and that he is quite aware that this method is not going to alleviate all worries. Even then, he is bound to get his quota of stern warnings against unfounded assumptions he has allegedly made.

It seems that these warnings have reached a point of diminishing returns. Most of us who still use information theory are quite aware of the fact that this method is difficult, full of pitfalls, and definitely limited. We are hopeful, of course—nobody would work in a difficult field without expecting results—but always ready for a sober evaluation of the domain of our method.

It has become very clear that information theory is one thing, information measures another. The two are historically linked, but can very well be disassociated. Information theory is defined by concepts and problems. It deals in a very particular way with amounts of variation, and with operations which have effect on such amounts. Information theory needs some measure of variation, but it doesn’t have to be H; neither is the applicability of H and related measures restricted to information theory. (Quastler 1955, pp. 2-3)

Although a biophysicist by training, Quastler was acutely sensitive to psychological problems, as witnessed by the perspective of the quotation cited above. His death was a serious setback to the further definition of the role of information theory within psychology.

The historian of psychology will undoubtedly note the evangelistic endeavors in the early 1950s to remake psychology in the image of information theory. He will also note the flickering of that evangelical spirit as the concepts became more and more absorbed into the fabric of psychology. It is this author’s guess that future historians will note that the development of information theory within psychology followed Garner’s lead in highlighting the structural, rather than the metric, features of information measurement.

Irwin Pollack

[Other relevant material may be found in Cybernetics; Mathematics; Models, Mathematical; Probability; Simulation; and in the biography of Wiener.]


Attneave, Fred 1959 Applications of Information Theory to Psychology: A Summary of Basic Concepts, Methods, and Results. New York: Holt.

Bricker, Peter D. 1955 The Identification of Redundant Stimulus Patterns. Journal of Experimental Psychology 49:73-81.

Broadbent, Donald E. 1958 Perception and Communication. Oxford: Pergamon.

Bross, Irwin D. J. 1966 Algebra and Illusion. Science 152:1330 only. → An interesting comment on the “fruitfulness” of applying formal models to science.

California, University of, Los Angeles, Western Data Processing Center 1961 Contributions to Scientific Research in Management: The Proceedings of a Scientific Program. Los Angeles: The University. → See especially the article by Jacob Marschak, “Remarks on the Economics of Information.”

Chapanis, Alphonse 1954 The Reconstruction of Abbreviated Printed Messages. Journal of Experimental Psychology 48:496-510.

Cherry, Colin (1957) 1961 On Human Communication: A Review, a Survey, and a Criticism. New York: Wiley. → A paperback edition was published in 1963.

Dahling, Randall L. 1962 Shannon’s Information Theory: The Spread of an Idea. Pages 119-139 in Stanford University, Institute for Communication Research, Studies of Innovation and of Communication to the Public, by Elihu Katz et al. Stanford, Calif.: The Institute.

Frick, F. C. 1959 Information Theory. Volume 2, pages 611-636 in Sigmund Koch (editor), Psychology: A Study of a Science. New York: McGraw-Hill.

Frick, F. C.; and Sumby, W. H. 1952 Control Tower Language. Journal of the Acoustical Society of America 24:595-596.

Fritz, L.; and Grier, George W. Jr. 1955 Pragmatic Communication: A Study of Information Flow in Air Traffic Control. Pages 232-243 in Henry Quastler (editor), Information Theory in Psychology. Glencoe, Ill.: Free Press.

Garner, Wendell R. 1962 Uncertainty and Structure as Psychological Concepts. New York: Wiley.

Garner, Wendell R.; and Hake, Harold W. 1951 The Amount of Information in Absolute Judgments. Psychological Review 58:446-459.

Garner, Wendell R.; and McGill, William J. 1956 The Relation Between Information and Variance Analyses. Psychometrika 21:219-228.

Gilbert, E. N. 1966 Information Theory After 18 Years. Science 152:320-326. → An overview from the point of view of the mathematical statistician.

Hake, Harold W.; and Garner, Wendell R. 1951 The Effect of Presenting Various Numbers of Discrete Steps on Scale Reading Accuracy. Journal of Experimental Psychology 42:358-366.

Hartley, R. V. L. 1928 Transmission of Information. Bell System Technical Journal 7:535-563.

Hick, W. E. 1952 On the Rate of Gain of Information. Quarterly Journal of Experimental Psychology 4:11-26.

Kullback, Solomon 1959 Information Theory and Statistics. New York: Wiley. → Considers a development of statistical theory along information lines.

Leonard, J. Alfred 1959 Tactual Choice Reactions: I. Quarterly Journal of Experimental Psychology 11:76-83.

Luce, R. Duncan 1960 The Theory of Selective Information and Some of Its Behavioral Applications. Pages 1-119 in R. Duncan Luce (editor), Developments in Mathematical Psychology. Glencoe, Ill.: Free Press.

McGill, William J. 1954 Multivariate Information Transmission. Psychometrika 19:97-116.

McGill, William J. 1955 Isomorphism in Statistical Analysis. Pages 56-62 in Henry Quastler (editor), Information Theory in Psychology. Glencoe, Ill.: Free Press.

Miller, George A. 1951 Language and Communication. New York: McGraw-Hill. → A paperback edition was published in 1963.

Miller, George A. 1953 What Is Information Measurement? American Psychologist 8:3-11.

Miller, George A. 1956 The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. Psychological Review 63:81-97.

Miller, George A.; and Frick, Frederick C. 1949 Statistical Behavioristics and Sequences of Responses. Psychological Review 56:311-324.

Miller, George A.; Heise, George A.; and Lichten, William 1951 The Intelligibility of Speech as a Function of the Context of the Test Materials. Journal of Experimental Psychology 41:329-335.

Miller, James G. 1955 Toward a General Theory for the Behavioral Sciences. American Psychologist 10:513-531.

Mowbray, G. H.; and Rhoades, M. V. 1959 On the Reduction of Choice Reaction Times With Practice. Quarterly Journal of Experimental Psychology 11:16-23.

Quastler, Henry (editor) 1955 Information Theory in Psychology: Problems and Methods. Glencoe, Ill.: Free Press.

Shannon, Claude E. (1948) 1962 The Mathematical Theory of Communication. Pages 3-91 in Claude E. Shannon and Warren Weaver, The Mathematical Theory of Communication. Urbana: Univ. of Illinois Press. → First published in Volume 27 of the Bell System Technical Journal.

Shannon, Claude E. 1951 Prediction and Entropy of Printed English. Bell System Technical Journal 30:50-64.

Whitman, James R.; and Garner, Wendell R. 1962 Free Recall Learning of Visual Figures as a Function of Form of Internal Structure. Journal of Experimental Psychology 64:558-564.

Whitman, James R.; and Garner, Wendell R. 1963 Concept Learning as a Function of Form of Internal Structure. Journal of Verbal Learning and Verbal Behavior 2:195-202.

Wiener, Norbert (1948) 1962 Cybernetics: Or, Control and Communication in the Animal and the Machine. 2d ed. Cambridge, Mass.: M.I.T. Press.

"Information Theory." International Encyclopedia of the Social Sciences. 25 Apr. 2017.

Information Theory

"Information" is a term used universally in fields associated with computing technology. It is often loosely applied when no other term seems to be readily at hand; examples of this are terms such as "information technology," "information systems," and "information retrieval." It surprises most people when they discover that the term "information" actually has a very real meaning in an engineering context. It does not mean the same thing as "knowledge" or "data," but is instead intertwined with elements of communication systems theory.

When computing systems are connected together, it is necessary to consider how they might exchange data and work cooperatively. This introduces the notion that messages can be formulated by computing machines and be dispatched to other machines that receive them and then deal with their contents. The mathematical study of these transmission and reception operations is what is known as "information theory."

A communication channel is a connective structure of some sort that supports the exchange of messages. Examples are wired interconnections such as Ethernet links or fiber optic cables, and wireless communications such as microwave links. These are all paths over which digital information can be transmitted.

Noise and Errors

Information theory has to do with how messages are sent via communication channels. When this field was first being studied, the common consensus was that it would be impossible to get digital machines to make exchanges in a way that was guaranteed to be error-free. This is because all the components used to construct computing machines are imperfect; they tend to distort the electrical signals they process as a side effect of their operation.

The components add extra electrical signals called "noise." In this instance, the term "noise" does not necessarily refer to something that can be heard. Instead, "noise" is used to describe the corruption of electrical signals, which makes them harder for devices in the computer system to understand correctly. This signal corruption might appear as extra voltage levels in the signal, or some signals may be completely missing.

Because communication channels inherently contain noise, exchanged messages are always at risk of being damaged in one way or another. When a particular message is dispatched from one machine to another, there is a chance that it might be distorted by imperfections in the channel and therefore not correctly interpreted by the recipient. Channel noise cannot be entirely eliminated. For this reason, early information theorists believed that messages transmitted digitally could never be guaranteed to arrive at their destinations exactly as the senders had sent them.

Information Defined

This pessimistic outlook changed in 1948 with the publication of Claude Shannon's seminal study of information theory. He showed that even in the presence of noise (which it had been agreed was unavoidable), transmission errors could be made as rare as desired, provided that information is sent at a rate below the capacity of the channel. This effectively heralded a new field of computing science and engineering: that of information theory. "Information" was granted a precise definition. It was related to the inverse of the probability of the content of a message. For example, if a person was told in a message that "tomorrow, the sky will be blue," that person would conclude that there was not much in that message that he or she had not already expected. In other words, there is not much information in that message: it essentially reaffirms an expectation, because the probability of the outcome is high. Conversely, if one were told in a message that "tomorrow, the sky will be green," then he or she would be greatly surprised. There is more information in this second message purely by virtue of the fact that the probability of this event is so much lower. The information pertaining to a particular event is proportional to the logarithm of the inverse of the probability of the event actually taking place.

Information ∝ log (1/p), where p is the probability of an event within the message.
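The relation log (1/p) can be tried out numerically. The following is only a minimal sketch: the function name and the two probabilities are hypothetical values chosen for illustration, and base 2 is assumed so that the result is expressed in bits.

```python
import math

def self_information(p, base=2.0):
    """Return log(1/p) for an event of probability p (bits when base is 2)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    return math.log(1.0 / p, base)

# Hypothetical probabilities, assumed purely for illustration.
p_blue_sky = 0.9      # an expected outcome
p_green_sky = 0.001   # a very surprising outcome

print(self_information(p_blue_sky))   # a small number of bits
print(self_information(p_green_sky))  # a much larger number of bits
```

The expected message carries a fraction of a bit, while the surprising one carries roughly ten bits, which matches the qualitative argument about the blue and green sky.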

Shannon's work led to a new field of engineering. Quantities such as the capacity of a channel to transmit information could be evaluated. This provided telecommunications specialists with a way of knowing just how many messages could be simultaneously transmitted over a channel without loss.


In addition to this, ways of representing, or encoding, information during transmission from one place to another were explored; some approaches were better than others. Encoding simply means that some pieces of information that are normally represented by particular symbols are converted to another collection of symbols that might better suit their reliable transfer. For example, text messages are often represented by collections of alphabetic characters when created and read, but they are then converted into another form, such as ASCII codes, for transmission over a communication channel. At the receiving end, the codes are converted back into text again.
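The conversion of text into ASCII codes and back again can be illustrated with a short sketch. The function names are hypothetical; only Python's built-in string encoding is used, and the example assumes the text contains nothing but ASCII characters.

```python
def encode_ascii(text):
    """Convert text to a list of ASCII code numbers (one per character)."""
    return list(text.encode("ascii"))

def decode_ascii(codes):
    """Convert a list of ASCII code numbers back into text."""
    return bytes(codes).decode("ascii")

codes = encode_ascii("Hi")
print(codes)                # [72, 105]
print(decode_ascii(codes))  # Hi
```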

The advantage these conversions offer is that some ways of representing information are more robust to the effects of noise in information channels than others, and perhaps more efficient, as well. So, the extra expense involved in carrying out these encoding and decoding operations is offset by the reliability they offer.
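One very simple encoding that trades efficiency for robustness is a repetition code, in which each bit is sent several times and the receiver takes a majority vote. This particular code is not named in the text above; it is assumed here only as an illustration, and the function names are hypothetical.

```python
def encode_repetition(bits, n=3):
    """Repeat every bit n times (n is assumed to be odd)."""
    return [b for b in bits for _ in range(n)]

def decode_repetition(received, n=3):
    """Recover each bit by majority vote over its block of n copies."""
    decoded = []
    for i in range(0, len(received), n):
        block = received[i:i + n]
        decoded.append(1 if sum(block) > n // 2 else 0)
    return decoded

message = [0, 1, 0, 0, 1, 1, 0]
sent = encode_repetition(message)

# Simulate channel noise by flipping a single transmitted bit.
corrupted = list(sent)
corrupted[4] ^= 1

print(decode_repetition(corrupted) == message)  # True
```

The price of this robustness is that three times as many bits are transmitted, which is the kind of extra expense the paragraph above refers to.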

Information theory has become a mature field of engineering and computer science. It has enhanced the reliability of computer-based networks at all levels, from small local area networks (LANs) to the Internet, and it has done so in a way that is unobtrusive, so that users are unaware of its presence. In addition to this, information theory has also assisted in the development of techniques for encoding digital information and sending this over analog communication channels that were not designed for handling computer-based transmissions, such as the public telephone networks. It is important to remember that these contributions of information theory to modern computing began with the ability to define information mathematically, and the work Claude Shannon did to understand communication channels and encoding schemes.

see also Cybernetics; Networks; Shannon, Claude E.

Stephen Murray


Lathi, Bhagwandas P. Modern Digital and Analog Communication Systems, 2nd ed. Orlando, FL: Holt, Rinehart and Winston, 1989.

Proakis, John G. Digital Communications, 3rd ed. New York: McGraw-Hill, 1995.

Shanmugam, K. Sam. Digital and Analog Communication Systems. New York: John Wiley & Sons, 1985.

Sklar, Bernard. Digital Communications, Fundamentals and Applications. Englewood Cliffs, NJ: Prentice Hall, 1988.

"Information Theory." Computer Sciences. 25 Apr. 2017.

Information Theory


Information theory posits that information is simply data that reduces the level of uncertainty on a given subject or problem, and can be quantified by the extent to which uncertainty is diminished. More important for the practical uses of information theory, however, is that it fits the concept of information and communication into mathematical theory. All content, no matter what its form (music, text, or video), can be reduced to a simple string of ones and zeros, thereby allowing tremendous flexibility in the mode of interpretation of that information. The application of information theory has had a tremendous impact on telecommunications and information technology and, by implication, the Internet, since it deals expressly with information-carrying capacities.


Information theory is the product of the renowned scientist Claude Shannon, widely acknowledged as one of the most innovative thinkers of his day. Born in 1916 in Petoskey, Michigan, Shannon grew up in an era when telecommunications were primarily limited to the telegraph and the telephone. From an early age, Shannon displayed an affinity for electronic equipment and radios and a penchant for devising his own inventions, much in the spirit of his hero and distant relative Thomas Edison.

Shannon attended the University of Michigan and later the Massachusetts Institute of Technology (MIT), studying electrical engineering and mathematics, in which he excelled. After college, he went to work for Bell Telephone Laboratories, where he worked on cryptographic systems using early computers. In 1948, on the strength of his work and research, Shannon published his "A Mathematical Theory of Communication," a breakthrough paper that for the first time demonstrated that all information exchanges could be expressed digitally in terms of ones and zeros, based on mathematical reductions.

Shannon redefined the traditional concept of entropy to mean, in the realm of information theory, the amount of uncertainty in a given system. Information was simply anything that reduced the level of uncertainty, and hence the degree of entropy. To measure the amount of information, Shannon devised a mathematical theory in which capacities could be expressed in terms of bits per second. In fact, many historians of science insist Shannon was the first to employ the term "bit," which is shorthand for "binary digit." All information, Shannon claimed, could ultimately be understood as a string of bits, and could therefore be stored and transmitted as such. Shannon also developed theories on the practical transmission of digital information. He surmised that when information is sent over "noisy" or compromised channels, simply adding redundant bits to the message can smooth out and correct the corruption in the information.


As the foundation upon which modern telecommunications systems, technologies, and theories are built, information theory was of central importance to the Internet era; it was ultimately responsible for most of the revolutionary breakthroughs in digital communication and information storage. Compact discs and digital television, not to mention the Internet, are everyday items that owe their existence to information theory. Information theory holds that all channels of information transmission and storage can also be expressed and analyzed in terms of bits, thereby providing the link that allowed for perfecting physical methods of information transmission, including how to send highly encoded Internet signals over simple telephone wires.

In the Internet world, information theory proved tremendously important not only for the basics of Internet telecommunications but also for cryptography, another field in which Shannon worked. Cryptography, in the contemporary sense, refers to protecting electronic information from compromise by applying mathematical algorithms consisting of a series of bits that scrambles the information and later decodes it when necessary. Cryptography was a key component of the development of e-commerce, since it lay at the heart of privacy and transaction protection.

Shannon's theory proved one of the great intellectual breakthroughs of the 20th century, as it gave scientists a new way to consider information and provided the basic framework within which all digital communications technology would take shape. In addition to its role as the bedrock of modern telecommunications, information theory also washed over fields as disparate as biology, ecology, medicine, mathematics, psychology, linguistics, and even investment theory.


"Claude Shannon." The Times (London), March 12, 2001.

Golomb, Solomon W. "Retrospective: Claude E. Shannon (1916-2001)." Science, April 20, 2001.

Pierce, John Robinson. An Introduction to Information Theory: Symbols, Signals and Noise. 2nd ed. Mineola, NY: Dover Publications, 1980.

SEE ALSO: Cryptography; Encryption

"Information Theory." Gale Encyclopedia of E-Commerce. 25 Apr. 2017.

information theory

information theory or communication theory, mathematical theory formulated principally by the American scientist Claude E. Shannon to explain aspects and problems of information and communication. The theory is not specific in all respects; for example, it proves the existence of optimum coding schemes without showing how to find them. Nevertheless, it succeeds remarkably in outlining the engineering requirements of communication systems and the limitations of such systems.

In information theory, the term information is used in a special sense; it is a measure of the freedom of choice with which a message is selected from the set of all possible messages. Information is thus distinct from meaning, since it is entirely possible for a string of nonsense words and a meaningful sentence to be equivalent with respect to information content.

Measurement of Information Content

Numerically, information is measured in bits (short for binary digit; see binary system). One bit is equivalent to a choice between two equally likely alternatives. For example, if we know that a coin is to be tossed but are unable to see it as it falls, a message telling whether the coin came up heads or tails gives us one bit of information. When there are several equally likely choices, the number of bits is equal to the logarithm of the number of choices taken to the base two. For example, if a message specifies one of sixteen equally likely choices, it is said to contain four bits of information. When the various choices are not equally probable, the situation is more complex.
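The coin and sixteen-choice examples can be checked with a few lines of code. For the case of unequal probabilities, the sketch below assumes the standard Shannon entropy formula (the sum of p times log2(1/p) over all choices), which is not written out in the text above; the function names are hypothetical.

```python
import math

def bits_for_equally_likely(n_choices):
    """Bits conveyed by selecting one of n equally likely choices."""
    return math.log2(n_choices)

def entropy_bits(probabilities):
    """Average bits per choice when the choices have the given probabilities."""
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

print(bits_for_equally_likely(2))    # 1.0 (a fair coin)
print(bits_for_equally_likely(16))   # 4.0
print(entropy_bits([0.5, 0.5]))      # 1.0 (agrees with the coin example)
print(entropy_bits([0.9, 0.1]))      # about 0.469 (a heavily biased coin)
```

A biased coin conveys less than one bit per toss on average, because its outcome is partly predictable.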

Interestingly, the mathematical expression for information content closely resembles the expression for entropy in thermodynamics. The greater the freedom of choice in selecting a message, the greater the uncertainty about which message will be sent, and hence the greater both the information and the entropy. Since the information content is, in general, associated with a source that generates messages, it is often called the entropy of the source. Often, because of constraints such as grammar, a source does not use its full range of choice. A source that uses just 70% of its freedom of choice would be said to have a relative entropy of 0.7. The redundancy of such a source is defined as 100% minus the relative entropy, or, in this case, 30%. The redundancy of English is estimated to be about 50%; i.e., about half of the elements used in writing or speaking are freely chosen, and the rest are required by the structure of the language.
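The arithmetic of relative entropy and redundancy, in the sense used above (the ratio of a source's actual entropy to the maximum it could have), can be sketched as follows. The function name and the example figures (1.4 bits per symbol over an alphabet of four symbols) are assumptions chosen so that the result reproduces the 70% and 30% of the text.

```python
import math

def relative_entropy_and_redundancy(actual_entropy_bits, n_symbols):
    """Return (relative entropy, redundancy) for a source.

    The maximum entropy is log2(n_symbols), reached when all symbols
    are equally likely and freely chosen.
    """
    max_entropy = math.log2(n_symbols)
    relative = actual_entropy_bits / max_entropy
    return relative, 1.0 - relative

# Hypothetical source: 4 symbols (maximum 2 bits), actual entropy 1.4 bits.
relative, redundancy = relative_entropy_and_redundancy(1.4, 4)
print(round(relative, 3))    # 0.7
print(round(redundancy, 3))  # 0.3
```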

Analysis of the Transfer of Messages through Channels

A message proceeds along a channel from the source to the receiver; information theory defines for any given channel a limiting capacity or rate at which it can carry information, expressed in bits per second. In general, it is necessary to process, or encode, information from a source before transmitting it through a given channel. For example, a human voice must be encoded before it can be transmitted by telephone. An important theorem of information theory states that if a source with a given entropy feeds information to a channel with a given capacity, and if the source entropy is less than the channel capacity, a code exists for which the frequency of errors may be reduced as low as desired. If the channel capacity is less than the source entropy, no such code exists.

The theory further shows that noise, or random disturbance of the channel, creates uncertainty as to the correspondence between the received signal and the transmitted signal. The average uncertainty in the message when the signal is known is called the equivocation. It is shown that the net effect of noise is to reduce the information capacity of the channel. However, redundancy in a message, as distinguished from redundancy in a source, makes it more likely that the message can be reconstructed at the receiver without error. For example, if something is already known as a certainty, then all messages about it give no information and are 100% redundant, and the information is thus immune to any disturbances of the channel. Using various mathematical means, Shannon was able to define channel capacity for continuous signals, such as music and speech.
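The effect of noise on capacity can be made concrete for one simple, idealized channel. The sketch below assumes a binary symmetric channel, in which each transmitted bit is flipped independently with a fixed probability, and equally likely input bits; neither assumption comes from the text, and the function names are hypothetical. Under these assumptions the equivocation equals the binary entropy of the flip probability, and the capacity is one bit per transmitted symbol minus that amount.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a two-outcome event with probabilities p and 1 - p."""
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bsc_equivocation(flip_probability):
    """Average uncertainty about the sent bit once the received bit is known."""
    return binary_entropy(flip_probability)

def bsc_capacity(flip_probability):
    """Capacity in bits per transmitted symbol of the assumed channel."""
    return 1.0 - binary_entropy(flip_probability)

for flip in (0.0, 0.1, 0.5):
    print(flip, round(bsc_equivocation(flip), 3), round(bsc_capacity(flip), 3))
```

With no noise the capacity is a full bit per symbol; when half of the bits are flipped, the received signal says nothing about the transmitted one and the capacity falls to zero.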


See C. E. Shannon and W. Weaver, The Mathematical Theory of Communication (1949); M. Mansuripur, Introduction to Information Theory (1987); J. Gleick, The Information: A History, a Theory, a Flood (2011).

"information theory." The Columbia Encyclopedia, 6th ed. 25 Apr. 2017.

Information Theory

The version of information theory formulated by mathematician and engineer Claude Shannon (1916-2001) addresses the processes involved in the transmission of digitized data down a communication channel. Once a set of data has been encoded into binary strings, these strings are converted into electronic pulses, each of equal length, typically with 0 represented by zero volts and 1 by +5 volts. Thus, a string such as 0100110 would be transmitted as seven pulses of 0, +5, 0, 0, +5, +5, and 0 volts.

The lengths of pulses must be fixed in order to distinguish between 1 and 11. In practice, this picture represents an idealized state. Electronic pulses are not perfectly discrete, and neither are the lengths of pulses absolutely precise. The electronic circuits that generate these signals are based upon analogue processes that do not operate perfectly, and each pulse will consist of millions of electrons emitted and controlled by transistors and other components that only operate within certain tolerances. As a result, in addition to the information sent intentionally down a channel, it is necessary to cater for the presence of error in the signal; such error is called noise.
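The pulse representation, and the way noise disturbs it, can be imitated in a short simulation. This is only a sketch under stated assumptions: the noise is modeled as a Gaussian disturbance added to each pulse, the receiver decides between 0 and 1 at a threshold of 2.5 volts, and all names and the noise level are hypothetical.

```python
import random

HIGH_VOLTS = 5.0   # nominal level for a 1, as in the text
LOW_VOLTS = 0.0    # nominal level for a 0
THRESHOLD = 2.5    # assumed decision level, halfway between the two

def to_pulses(bits):
    """Map a string of 0s and 1s to one nominal voltage per pulse."""
    return [HIGH_VOLTS if b == "1" else LOW_VOLTS for b in bits]

def add_noise(pulses, sigma, rng):
    """Disturb every pulse by Gaussian noise of standard deviation sigma."""
    return [v + rng.gauss(0.0, sigma) for v in pulses]

def from_pulses(pulses):
    """Decide each bit by comparing the received voltage with the threshold."""
    return "".join("1" if v > THRESHOLD else "0" for v in pulses)

rng = random.Random(0)
sent = to_pulses("0100110")
received = add_noise(sent, sigma=0.3, rng=rng)

print(sent)                   # [0.0, 5.0, 0.0, 0.0, 5.0, 5.0, 0.0]
print(from_pulses(received))  # the string as the receiver reads it
```

Raising sigma toward the threshold makes wrongly decoded bits increasingly likely, which is why error checking is needed in practice.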

This example illustrates the dangers inherent in the differences between the way one represents a process in a conceptual system and the underlying physical processes that deliver it. To conceive of computers as if they operate with perfectly clear 0 and 1 circuits is to overlook the elaborate and extensive error-checking necessary to ensure that data are not transmitted incorrectly, which is expensive both in time and cost.

In 1948, Shannon published what came to be the defining paper of communication theory. In this paper he investigated how noise imposes a fundamental limit on the rate at which data can be transmitted down a channel. Early in his paper he wrote:

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. (p. 379)

The irrelevance of meaning to communication is precisely the point that encoding and the transmission of information are not intrinsically connected. Shannon realized that if one wishes to transmit the binary sequence 0100110 down a channel, it is irrelevant what it means, not least because different encodings can make it mean almost anything. What matters is that what one intends to transmit, as a binary string, should arrive "exactly or approximately" at the other end as that same binary string. The assumption is that the encoding process that produces the binary string and the decoding process that regenerates the original message are known both to the transmitter and the receiver. Communication theory addresses the problems of ensuring that what is received is what was transmitted, to a good approximation.
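That the same binary string can mean different things under different encodings is easy to demonstrate. The three interpretations below (an unsigned number, an ASCII character, and a row of on/off switches) are arbitrary choices made for illustration.

```python
bits = "0100110"

as_number = int(bits, 2)               # read as an unsigned binary number
as_character = chr(as_number)          # read as a 7-bit ASCII code
as_switches = [b == "1" for b in bits] # read as seven on/off settings

print(as_number)     # 38
print(as_character)  # &
print(as_switches)   # [False, True, False, False, True, True, False]
```

The channel carries the same seven bits in every case; which meaning applies depends only on the decoding that transmitter and receiver have agreed upon.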

See also Information; Information Technology


Shannon, Claude E. "A Mathematical Theory of Communication." The Bell System Technical Journal 27 (1948): 379-423, 623-656.

John C. Puddefoot

"Information Theory." Encyclopedia of Science and Religion. 25 Apr. 2017.

information theory

information theory The study of information by mathematical methods. Informally, information can be considered as the extent to which a message conveys what was previously unknown, and so is new or surprising. Mathematically, the rate at which information is conveyed from a source is identified with the entropy of the source (per second or per symbol). Although information theory is sometimes restricted to the entropy formulation of sources and channels, it may include coding theory, in which case the term is used synonymously with communication theory.

"information theory." A Dictionary of Computing. 25 Apr. 2017.

information theory

information theory Mathematical study of the laws governing communication channels. It is primarily concerned with the measurement of information and the methods of coding, transmitting, storing and processing this information.

"information theory." World Encyclopedia. 25 Apr. 2017.