Information Theory
The concepts and measures of the statistical theory of selective information (information theory) have become so thoroughly enmeshed with the whole of behavioral science that delineation of the exact contribution of the theory is nearly impossible. The very verbal descriptive fabric of the behavioral sciences has become thoroughly interlaced with informational concepts: individuals or groups are described as “information sources” or “receivers”; skilled performance is described as “information processing”; memory is described as “information storage”; nerves are described as “communication channels”; the patterning of neural impulses is described as “information coding”; the brain is described as “an informational computer,” etc. Indeed, the molecule, the cell, the organ, the individual, the group, the organization, and the society have all been examined from the point of view of a general systems theory which focuses upon the information-processing, rather than upon the energetic, characteristics of each system (J. G. Miller 1955). Perhaps the closest analogue to the impact of information theory upon psychology is the impact that behaviorism had upon psychology, with the subsequent redefinition of psychology as the science of behavior. In both cases questions of definition have replaced questions of possible relevance. [See Systems Analysis, article on General Systems Theory.]

Information theory is a formal mathematical theory, a branch of the theory of probability. As such, the theory is self-contained; it does not require verification by experiment (Frick 1959). Yet, formal theories often have profound influence as conceptual models and as models for experiment. The theory is indelibly flavored by the context of electrical communications and control in which it was developed. Cherry ([1957] 1961, pp. 30-65) has charted the development of the theory within the field of communications.
The genesis of the modern theory of statistical communications is due primarily to Hartley (1928). Building upon earlier work by Nyquist and Küpfmüller, Hartley showed that in order to transmit a given amount of information a communication channel must undergo an exchange between its duration and its bandwidth, or frequency range. With a narrower frequency range, the communication channel must be available for a longer duration to transmit a given amount of information. Information was identified with an arbitrary selection of symbols from a set of defined symbols. The measure of information was defined in terms of the logarithm of the number of equally likely symbols available for communication. The essence of the idea is that information is measured in terms of what could have been communicated under a defined set of circumstances rather than in terms of what actually is communicated at a particular moment. The definition is sufficiently broad to provide a general framework for the specification of a wide class of communication systems. Following Hartley, numerous distinguished contributions were made throughout the world. These included the contribution by R. A. Fisher in characterizing the efficiency and sufficiency of statistics and that of D. Gabor, who introduced the concept of the logon as the elementary unit of information. It was the contributions of Shannon (1948) and of Wiener (1948), however, which provided the intellectual synthesis that marked the birth of modern information theory. [See Cybernetics and the biography of Wiener; see also Frick 1959.]

Shannon provides a scheme for a general communication system.
This description, while initially directed toward electrical communication systems, is sufficiently general to use in the consideration of a wide class of information systems.

The measure of information. The essential idea of the Shannon-Wiener mathematical theory of communication is that communication is a statistical process which can be described only in probabilistic terms. When it is possible to predict completely each message out of a source of possible messages, by definition no information will be conveyed by the message. When any one message is as probable as any other possible message, maximum information will be conveyed by the message. From this point of view, the information of any message is associated with the reduction in the range of possible selections by the receiver of any message, i.e., with the reduction of the receiver’s uncertainty. Uncertainty, choice, information, surprise value, and range of selections, therefore, all become intimately related concepts. The meaning, reasonableness, and personal importance of the message, however, are not considered within this approach to communications. The concern of the theory is to provide a measure of the amount of information for any selection or choice from defined sources of information. The measure of the amount of information associated with a given selection can be arbitrarily defined as the logarithm of the number of equally likely alternatives. The measure can also be rigorously derived by initially defining a set of conditions which it must satisfy. Shannon employed the latter procedure, and the interested reader is referred to the original article (1948) for the statement of the conditions. Luce (1960) has also listed a set of necessary conditions which lead to the same result.
The conditions are (a) independence of irrelevant alternatives: the amount of information transmitted by the selection of any item from a defined set of items shall be a real number which depends only upon the probability of that item, p(i), and not upon the probability distribution of the other items; (b) continuity: the measure of information shall be a continuous (in the mathematical sense) function of p(i); (c) additivity: if two independent selections, i and j, with probabilities p(i) and p(j), are made, the amount of information transmitted in the joint selection of (i, j) shall be a simple sum of the information transmitted by each of the selections; and (d) scale: one unit of information is associated with a binary selection between two equally likely alternatives; the unit is called the bit. The only measure which satisfies all of these conditions for any symbol i is the negative logarithm (to the base 2) of the probability of i, p(i). And, over the ensemble of possible items, the average information of the ensemble is the average weighted logarithmic measure, H, or

$$H = -\sum_{i=1}^{n} p(i) \log_2 p(i).$$

The H measure has a number of important properties. First, H ≥ 0; it is 0 if, and only if, the probability of a single item equals 1, while the probability of each of the remaining (n − 1) items is equal to 0; otherwise H is greater than 0. Information is associated with any ensemble of items whenever there is uncertainty about which item will be presented. Second, H is maximum when all of the items are equally probable. If there are n possible items, the uncertainty associated with the set of items is maximum when p(i) = 1/n. Third, H is maximum if all items occur independently of each other. If the occurrence of one item is related to the occurrence of another, the average information is reduced by the extent of that relatedness. This property is extremely important for the behavioral sciences, since the information measure provides a measure of the degree of relatedness between items in a set of possible items.
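The first three properties of H can be checked numerically. The following is a minimal sketch, assuming H is the negative sum of p(i) times the base-2 logarithm of p(i) as described above; the function name `entropy_bits` and the example distributions are illustrative choices, not part of the original article.

```python
import math

def entropy_bits(probs):
    """Average information H, in bits, of a discrete probability distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Items with p(i) = 0 contribute nothing to the sum.
    return sum(-p * math.log2(p) for p in probs if p > 0)

# First property: H is 0 when one item is certain.
certain = entropy_bits([1.0, 0.0, 0.0, 0.0])

# Second property: H is maximum (log2 n) when all n items are equally likely.
uniform = entropy_bits([0.25] * 4)          # 2.0 bits
skewed = entropy_bits([0.7, 0.1, 0.1, 0.1])  # less than 2 bits

# Third property: relatedness between items reduces the joint uncertainty.
# Joint distribution over two binary items, listed as (0,0), (0,1), (1,0), (1,1).
independent_pair = entropy_bits([0.25, 0.25, 0.25, 0.25])  # 1 bit + 1 bit
dependent_pair = entropy_bits([0.5, 0.0, 0.0, 0.5])        # second item copies the first

print(certain, uniform, skewed, independent_pair, dependent_pair)
```

In the dependent case the second item adds no information beyond the first, so the joint uncertainty falls from two bits to one.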
The ratio of the uncertainty of a source to the maximum possible uncertainty with the same set of symbols is a measure of the relative transmitting efficiency of the source; Shannon has termed this the relative entropy of the source. And 1 minus the relative entropy has been defined as the redundancy of the source. Fourth, it is possible to encode a long sequence of items so that, on the average, H binary units per item are required to specify the sequence, even though more than H binary units per item are required to specify a short sequence. This property, called the encoding property, has recently become extremely important for electrical communications but has not been much exploited by the behavioral sciences.

History. Although Hartley’s development of the theory provided both the essential definition of information and a measure for its description, it went little noticed in the behavioral sciences. The historian of science will probably note that the behavioral sciences were not ready for the idea, nor, for that matter, was communications engineering fully ready. Shannon’s development, on the other hand, was enthusiastically grasped very early by a handful of psychologists, primarily those associated with the Psycho-Acoustics Laboratory at Harvard University. George A. Miller, in a personal communication to the author of this article in January 1964, has described the intellectual ferment associated with the early developments. Noteworthy is Miller’s comment that “had the group not been actively interested in other related ideas from the communication engineers, the occurrence of Shannon’s paper probably would have gone unnoticed by the group.” The initial enthusiasm was stirred by the realization that Shannon had provided a tool for describing discrete events that at the same time was compatible with the continuous Fourier analysis of communication systems, with which the group was already acquainted.
Dahling has provided a valuable bibliographic survey of the early spread of concepts of the theory of selective information through a number of fields, ranging from telecommunication to journalism. Information theory provides an interesting case study for the diffusion of ideas in modern science, because of its great impact and its relative recency. From his analysis Dahling concluded that “the idea was drawn from a flurry of current related activity and, as the idea developed, it gained impetus and speed of adoption from the same surrounding activity that gave rise to it” and that “the adoption of the theory was speeded by a clearly apparent need for such a theory” (1962, p. 121). Moreover, “because the idea dealt with matters of common interest, it was able to spread more rapidly between disciplines” (p. 126). The idea “spread to other disciplines in proportion to its congeniality with their methods” and “to its analogic and suggestive value” (p. 132). Experimental psychologists working in communication problems and trained in the mathematics of communication engineering became logical carriers of the theory to the behavioral sciences. The introduction of information concepts to psychology was made by several routes. A number of excellent summaries are available that trace this development within experimental psychology: Attneave (1959), Garner (1962), Luce (1960), G. A. Miller (1956), and Quastler (1955). Here we shall briefly summarize a few of the salient avenues of entry into the field of experimental psychology, although parallel developments can doubtlessly be cited in any of the behavioral sciences. Thus, the balance of this review is a highly selective examination of the role of information theory in the social sciences. It is not a general review.

Organization of behavior sequences. The information measure was introduced to psychology in a now classic paper by Miller and Frick (1949).
Their primary concern was the description of sequences of discrete responses. Their aim was twofold: the development of a stochastic model of behavioral sequences and the development of a quantitative measure of the organization of the sequences. The Markov model, employed by Shannon, served as their descriptive model for the generation of response sequences; the information measure served as their measure of the degree of organization of the sequences [see Markov Chains]. For illustrative purposes response sequences of rats and of young girls, provided earlier in a multiple-choice experiment by Hamilton, were analyzed. An index of response stereotypy was defined as 1.0 minus the ratio of the obtained uncertainty to the maximum possible uncertainty. Thus, the measure of the stereotypy of response sequences is formally identical with the measure of relative redundancy of communication sources. For example, two responses, left and right, are defined as the class of responses available for observation of a rat in a given experimental situation. If the sequence of the rat’s responses were perfectly predictable (e.g., were all left responses or a left-right alternation sequence), there would be 0 uncertainty in specifying successive responses. Thus, an index of response stereotypy of 1 would be obtained. Conversely, if the rat responded left and right equally often and if the sequence of responses was unpredictable, there would be maximum uncertainty in specifying successive responses. An index of response stereotypy of 0 would be obtained. In Hamilton’s data identical indexes of response stereotypy were obtained for both girls and rats when the distributions of single-response choices were examined and when the distributions of pairs of successive choices were examined. The responses of girls became differentiated from those of the rats only when sequences of three successive choices were analyzed.
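The left-right example can be made concrete with a short sketch. It assumes that the obtained uncertainty is estimated from observed frequencies and that the uncertainty about the next response given the preceding ones is the uncertainty of the longer blocks minus that of their prefixes. This estimator, the function names, and the toy sequences are illustrative assumptions and are not taken from Miller and Frick's paper.

```python
import math
from collections import Counter

def uncertainty_bits(counts):
    """Uncertainty (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def stereotypy_index(sequence, n_alternatives, context=0):
    """1 - (obtained uncertainty / maximum uncertainty).

    With context=0 only the distribution of single responses is used;
    with context=k the uncertainty of a response is conditioned on the
    k responses that precede it.
    """
    n_blocks = len(sequence) - context
    blocks = Counter(tuple(sequence[i:i + context + 1]) for i in range(n_blocks))
    prefixes = Counter(tuple(sequence[i:i + context]) for i in range(n_blocks))
    obtained = uncertainty_bits(blocks.values()) - uncertainty_bits(prefixes.values())
    maximum = math.log2(n_alternatives)
    return 1.0 - obtained / maximum

all_left = stereotypy_index("LLLLLLLL", 2)                       # fully stereotyped
alternating_singles = stereotypy_index("LRLRLRLR", 2)            # looks random as single choices
alternating_pairs = stereotypy_index("LRLRLRLR", 2, context=1)   # fully predictable from the previous choice
balanced_pairs = stereotypy_index("LLRRLLRRL", 2, context=1)     # previous choice tells nothing

print(all_left, alternating_singles, alternating_pairs, balanced_pairs)
```

The alternating sequence shows why higher-order structure matters: its single-choice distribution gives an index of 0, yet once the preceding response is taken into account the index rises to 1.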
In pointing out the importance of the higher-order statistical structure of response sequences and in providing an objective measure of its patterning, Miller and Frick laid the groundwork for the mathematical modeling of complex response sequences. [See Response Sets.]

Language. The statistical analysis of language represents a special application of the analysis of response sequences. Indeed, interest in cryptographic secrecy systems profoundly shaped the direction of Shannon’s development of information theory. The English alphabet is potentially capable of producing many more different messages than it actually does. In practice certain letters are employed more frequently than others, e.g., e relative to z; and certain sequences occur more frequently than others, e.g., th relative to ht. Shannon (1951) measured the relative redundancy of English and obtained a lower bound of about 50 per cent and an upper bound of about 75 per cent redundancy relative to a random distribution of letters. A related observation is that English text may be nearly completely reconstructed when as much as 50 per cent of the text has been deleted (Shannon 1951; Chapanis 1954). Furthermore, in most communications environments the range of possible communications is strongly restricted by situational factors. In tower-pilot communications at air force bases (e.g., Fritz & Grier 1955; Frick & Sumby 1952), it was demonstrated that the over-all redundancy may approach 95 per cent, again relative to a random distribution of letters. As a result of Shannon’s work and, especially, its popularization by Miller, nonlinguists became willing to tackle the intricacies of language as a complex series of response sequences, amenable to measurement and quantitative specification. [See Linguistics.] A related development of information concepts in psychology was the demonstration of the important role of informational factors in the perception of speech (G. A. Miller 1951).
For example, the intelligibility of words heard against a noise background is a critical function of the size of the test vocabulary (Miller et al. 1951), i.e., a critical function of stimulus information. A given word might be perceived nearly perfectly when it is embedded within a small vocabulary of possible words but might be perceived correctly only rarely when it is embedded within a large vocabulary of possible words. This result is reasonable if information is associated with what could have been presented, rather than with what actually is presented. A number of different investigators have found the Miller, Heise, and Lichten data to be a rich source for testing theories of choice behavior. For example, Garner (1962) has demonstrated that these data are consistent with the assumption that under a given signal-to-noise ratio different vocabularies may transmit the same information. Stated differently, a large vocabulary coupled with a high error rate may yield nearly the same amount of information as that transmitted by a small vocabulary and a low error rate. [See Perception, article on Speech Perception.]

Identification experiments. Another way that information concepts have been introduced to psychology is by the quantitative description, in informational terms, of the identification experiment (Garner & Hake 1951). In the identification experiment, one of n possible stimuli is presented to the subject, whose task is to identify which one of the n stimuli was presented. For example, the instruments of a symphony orchestra are defined as the class of objects for study, and one of the instruments is sounded at random. The listener is instructed to indicate which instrument of the defined set was sounded. When the stimuli are well ordered and associated with a common unit of measurement (weight, length, duration, frequency, etc.), identification performance may readily be described in terms of conventional statistical measures, e.g., the average error.
When the stimuli are not well ordered, as in the case of the symphonic instruments or a series of photographs depicting various emotional moods, identification performance cannot readily be described in terms of such conventional statistical measures. The transmitted-information measure is ideally suited to serve as a nonmetric measure of relatedness between stimuli and responses. In addition, a vexing methodological problem is associated with the identification experiment for ordered stimuli. The identification experiment attempts to answer a straightforward question: how many stimuli can be correctly identified? The answer to the question, furnished by a given body of data, depends upon what criterion for errors is adopted. If a small average error is permitted, the same body of data will admit a larger number of distinguishable stimuli than if a large average error is permitted. A resolution to this problem is suggested by Hake and Garner (1951), who demonstrated that while the proportion of errors is greater in locating a point at one of 50 possible positions than at one of ten positions, the amount of transmitted information for ten possible positions is about equal to that for 50 possible positions. In turn, the amount of transmitted information specifies an equivalent number of perfectly identified categories. A concentrated flurry of experimental activity demonstrated limited transmitted-informational capabilities with a wide variety of stimulus variables. Although the categorical measure of information was better matched to nonmetric variables, most of the initial studies took place with well-defined stimulus variables upon continuous scales, e.g., length of line, direction, sound frequency, etc. The only apparent advantage of the information measure to these studies was that a single measure of performance could be employed across a wide set of variables.
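The idea that transmitted information specifies an equivalent number of perfectly identified categories can be illustrated with a small sketch. It assumes the standard definition of transmitted information as the stimulus uncertainty plus the response uncertainty minus the joint uncertainty, a formula the passage above does not spell out; the confusion matrices are invented for illustration and are not data from Hake and Garner.

```python
import math

def _uncertainty(probs):
    """Uncertainty in bits of a list of probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def transmitted_information(confusion):
    """T = H(stimulus) + H(response) - H(stimulus, response), in bits.

    confusion[s][r] is the number of trials on which stimulus s
    received response r.
    """
    total = sum(sum(row) for row in confusion)
    joint = [count / total for row in confusion for count in row]
    stimuli = [sum(row) / total for row in confusion]
    responses = [sum(col) / total for col in zip(*confusion)]
    return _uncertainty(stimuli) + _uncertainty(responses) - _uncertainty(joint)

# Four stimuli, always identified correctly (hypothetical counts).
perfect_four = [[10 if r == s else 0 for r in range(4)] for s in range(4)]

# Eight stimuli, each confused half the time with one neighbor (hypothetical counts).
noisy_eight = [[5 if r // 2 == s // 2 else 0 for r in range(8)] for s in range(8)]

t_four = transmitted_information(perfect_four)
t_eight = transmitted_information(noisy_eight)

# Equivalent number of perfectly identified categories is 2 ** T.
print(t_four, 2 ** t_four)
print(t_eight, 2 ** t_eight)
```

In this toy case the eight-stimulus set has a 50 per cent error rate, yet it transmits the same two bits as the error-free four-stimulus set, which parallels the comparison between ten and 50 positions described above.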
The historian will probably judge that many experimental psychologists had previously steered clear of variables with weak metric properties and, as a result, were unable to appreciate immediately the full potential of the informational technique for nonmetric variables. In any event, the number of identifiable stimulus components associated with any single stimulus variable was found to be disappointingly small, from fewer than four to about ten stimuli. However, experimenters quickly discovered that a large number of identifiable stimulus components could be achieved by employing a large number of different stimulus variables, as long as the number of components per variable was kept reasonably small. (This story is told in G. A. Miller 1956, by means of the engaging title “The Magical Number Seven, Plus or Minus Two”; and in Garner 1962.)

Response speed and skilled tasks. An area of active experimental interest is the relation between the speed of response and the informational characteristics of skilled tasks. Hick (1952) sparked interest in this area with his demonstration that reaction time was linearly related to the logarithm of the number of possible alternatives available to the subject. Further, he suggested that a measure of the rate of information transmitted, in terms of the reciprocal of the slope of the empirical function relating reaction time to stimulus information, might be achieved from a discrete-trials reaction-time experiment. This transformation provides an estimate of the rate of information transmission in humans as about five to seven bits per second (Bricker 1955). More recent findings, however, have shown that with highly overlearned tasks, such as the reading of numerals, there is little change in reaction time as a function of the information of the task (Mowbray & Rhoades 1959; Leonard 1959).
In this circumstance, identification of the rate of information transmission with the reciprocal of the slope of the reaction-time functions would lead to the unreasonable conclusion that there is an infinitely high rate of information transmission. The rate of information transmitted by the human receiver has been measured directly, in a variety of tasks, by a number of investigators. (This work is summarized in Quastler 1955, pp. 305-359.) In highly overlearned tasks there is an upper limit of about 35 bits per second, which is jointly a function of the highest effective speed, the highest effective range of alternatives, and the highest effective transmission rate (ibid., p. 345). For most tasks, man’s information-transmission rate is far lower than 35 bits per second. Electronic channels of communication, by contrast, have capabilities of millions or billions of bits per second. Clearly, man’s forte is not the rate at which he can process information. When one examines certain structural features of information processing, however, the disparity between man and machine is narrowed. The largest and most elaborate of computers cannot yet perform many pattern-recognition tasks routinely performed by children. However, the rapid development of sophisticated computer programs may radically alter this situation. As Garner suggests, we shall need to devote more emphasis to the structural, as distinguished from the metric, characteristics of information if we are to understand human information processing. [See Learning, article on Acquisition of Skill; Reaction Time.]

Structure of information. The structural examination of information is based upon a multivariate extension of Shannon’s analysis by McGill (1954; 1955) and by Garner and McGill (1956). This work is summarized by Garner (1962). Garner has demonstrated the power of a multivariate information analysis for dissecting the information-relevant features of complex information sources.
In this development, formulas associated with multiple correlation and with the analysis of variance find their direct counterparts within multivariate information analysis. Multivariate information analysis thus achieves the status of a general statistical tool for categorical materials, regardless of the appropriateness of the specific conceptualization of behavior in the terms of source, channel, noise, receiver, destination, etc. Furthermore, the efficiency of experimental design may be evaluated from the point of view of multivariate informational analysis. [See Multivariate Analysis; see also McGill 1955.] Garner’s approach to a structural analysis of an information source rests on the distinction between internal structure, the relations between variables composing a set of stimuli, and external structure, the relations between stimuli and responses. This distinction is perhaps clarified by referring to Figure 1. A total ensemble of 16 possible stimulus patterns results from the binary coding of four variables: figure shape (circle or triangle), dot location (above or below), gap location (right or left), and number of internal lines (one or two). Thus, the 16 possible patterns have a potential information transmission of four bits per pattern. Let us now arbitrarily select a subset of eight of the possible 16 patterns. Such a subset has a potential information transmission of only three bits per pattern. According to Garner, the one bit lost in terms of external structure can appear in the form of internal structure. In the eight patterns of subset A of Figure 1, internal structure is represented by a simple contingency between figure shape and gap location (right gap with circles; left gap with triangles). In the eight patterns of subset J, internal structure is represented by a four-variable interaction among the variables.
In these terms, subsets A and J represent the same amount of external structure and the same amount of internal structure but differ in the form of their internal structure. As a result of the differences in form of internal structure, the identification, from the 16 possible patterns, of the eight patterns of subset A is substantially superior to the identification of the eight patterns of subset J. The free recall of subsets with simple internal structure is also superior to that of subsets with complex internal structure (Whitman & Garner 1962). In the opinion of the author of this article, an extension of this method of structural analysis might reasonably be expected to provide a tool for the experimental assault upon qualitative differences in information. [For further discussion of structural analysis, see Systems Analysis, article on Psychological Systems.]

The close relationship between information theory and psychology can be best summarized by the concluding remarks of the 1954 Conference on Information Theory in Psychology, organized by Henry Quastler. Although more than a decade has intervened, the remarks are nonetheless appropriate today.

There is something frustrating and elusive about information theory. At first glance, it seems to be the answer to one’s problems, whatever these problems may be. At second glance it turns out that it doesn’t work out as smoothly or as easily as anticipated. Such disappointments, together with some instances of undoubtedly ill-advised use, have caused a certain amount of irritation. So nowadays one is not safe in using information theory without loudly proclaiming that he knows what he is doing and that he is quite aware that this method is not going to alleviate all worries. Even then, he is bound to get his quota of stern warnings against unfounded assumptions he has allegedly made. It seems that these warnings have reached a point of diminishing returns.
Most of us who still use information theory are quite aware of the fact that this method is difficult, full of pitfalls, and definitely limited. We are hopeful, of course—nobody would work in a difficult field without expecting results—but always ready for a sober evaluation of the domain of our method. It has become very clear that information theory is one thing, information measures another. The two are historically linked, but can very well be disassociated. Information theory is defined by concepts and problems. It deals in a very particular way with amounts of variation, and with operations which have effect on such amounts. Information theory needs some measure of variation—but it doesn’t have to be H; neither is the applicability of H and related measures restricted to information theory. (Quastler 1955, pp. 2-3)

Although a biophysicist by training, Quastler was acutely sensitive to psychological problems, as witnessed by the perspective of the quotation cited above. His death was a serious setback to the further definition of the role of information theory within psychology. The historian of psychology will undoubtedly note the evangelistic endeavors in the early 1950s to remake psychology in the image of information theory. He will also note the flickering of that evangelical spirit as the concepts became more and more absorbed into the fabric of psychology. It is this author’s guess that future historians will note that the development of information theory within psychology followed Garner’s lead in highlighting the structural, rather than the metric, features of information measurement.

Irwin Pollack

[Other relevant material may be found in Cybernetics; Mathematics; Models, Mathematical; Probability; Simulation; and in the biography of Wiener.]

BIBLIOGRAPHY
Attneave, Fred 1959 Applications of Information Theory to Psychology: A Summary of Basic Concepts, Methods, and Results. New York: Holt.
Bricker, Peter D.
1955 The Identification of Redundant Stimulus Patterns. Journal of Experimental Psychology 49:73-81.
Broadbent, Donald E. 1958 Perception and Communication. Oxford: Pergamon.
Bross, Irwin D. J. 1966 Algebra and Illusion. Science 152:1330 only. → An interesting comment on the “fruitfulness” of applying formal models to science.
California, University of, Los Angeles, Western Data Processing Center 1961 Contributions to Scientific Research in Management: The Proceedings of a Scientific Program. Los Angeles: The University. → See especially the article by Jacob Marschak, “Remarks on the Economics of Information.”
Chapanis, Alphonse 1954 The Reconstruction of Abbreviated Printed Messages. Journal of Experimental Psychology 48:496-510.
Cherry, Colin (1957) 1961 On Human Communication: A Review, a Survey, and a Criticism. New York: Wiley. → A paperback edition was published in 1963.
Dahling, Randall L. 1962 Shannon’s Information Theory: The Spread of an Idea. Pages 119-139 in Stanford University, Institute for Communication Research, Studies of Innovation and of Communication to the Public, by Elihu Katz et al. Stanford, Calif.: The Institute.
Frick, F. C. 1959 Information Theory. Volume 2, pages 611-636 in Sigmund Koch (editor), Psychology: A Study of a Science. New York: McGraw-Hill.
Frick, F. C.; and Sumby, W. H. 1952 Control Tower Language. Journal of the Acoustical Society of America 24:595-596.
Fritz, L.; and Grier, George W. Jr. 1955 Pragmatic Communication: A Study of Information Flow in Air Traffic Control. Pages 232-243 in Henry Quastler (editor), Information Theory in Psychology. Glencoe, Ill.: Free Press.
Garner, Wendell R. 1962 Uncertainty and Structure as Psychological Concepts. New York: Wiley.
Garner, Wendell R.; and Hake, Harold W. 1951 The Amount of Information in Absolute Judgments. Psychological Review 58:446-459.
Garner, Wendell R.; and McGill, William J. 1956 The Relation Between Information and Variance Analyses. Psychometrika 21:219-228.
Gilbert, E. N. 1966 Information Theory After 18 Years. Science 152:320-326. → An overview from the point of view of the mathematical statistician.
Hake, Harold W.; and Garner, Wendell R. 1951 The Effect of Presenting Various Numbers of Discrete Steps on Scale Reading Accuracy. Journal of Experimental Psychology 42:358-366.
Hartley, R. V. L. 1928 Transmission of Information. Bell System Technical Journal 7:535-563.
Hick, W. E. 1952 On the Rate of Gain of Information. Quarterly Journal of Experimental Psychology 4:11-26.
Kullback, Solomon 1959 Information Theory and Statistics. New York: Wiley. → Considers a development of statistical theory along information lines.
Leonard, J. Alfred 1959 Tactual Choice Reactions: I. Quarterly Journal of Experimental Psychology 11:76-83.
Luce, R. Duncan 1960 The Theory of Selective Information and Some of Its Behavioral Applications. Pages 1-119 in R. Duncan Luce (editor), Developments in Mathematical Psychology. Glencoe, Ill.: Free Press.
McGill, William J. 1954 Multivariate Information Transmission. Psychometrika 19:97-116.
McGill, William J. 1955 Isomorphism in Statistical Analysis. Pages 56-62 in Henry Quastler (editor), Information Theory in Psychology. Glencoe, Ill.: Free Press.
Miller, George A. 1951 Language and Communication. New York: McGraw-Hill. → A paperback edition was published in 1963.
Miller, George A. 1953 What Is Information Measurement? American Psychologist 8:3-11.
Miller, George A. 1956 The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. Psychological Review 63:81-97.
Miller, George A.; and Frick, Frederick C. 1949 Statistical Behavioristics and Sequences of Responses. Psychological Review 56:311-324.
Miller, George A.; Heise, George A.; and Lichten, William 1951 The Intelligibility of Speech as a Function of the Context of the Test Materials. Journal of Experimental Psychology 41:329-335.
Miller, James G. 1955 Toward a General Theory for the Behavioral Sciences. American Psychologist 10:513-531.
Mowbray, G. H.; and Rhoades, M. V. 1959 On the Reduction of Choice Reaction Times With Practice. Quarterly Journal of Experimental Psychology 11:16-23.
Quastler, Henry (editor) 1955 Information Theory in Psychology: Problems and Methods. Glencoe, Ill.: Free Press.
Shannon, Claude E. (1948) 1962 The Mathematical Theory of Communication. Pages 3-91 in Claude E. Shannon and Warren Weaver, The Mathematical Theory of Communication. Urbana: Univ. of Illinois Press. → First published in Volume 27 of the Bell System Technical Journal.
Shannon, Claude E. 1951 Prediction and Entropy of Printed English. Bell System Technical Journal 30:50-64.
Whitman, James R.; and Garner, Wendell R. 1962 Free Recall Learning of Visual Figures as a Function of Form of Internal Structure. Journal of Experimental Psychology 64:558-564.
Whitman, James R.; and Garner, Wendell R. 1963 Concept Learning as a Function of Form of Internal Structure. Journal of Verbal Learning and Verbal Behavior 2:195-202.
Wiener, Norbert (1948) 1962 Cybernetics: Or, Control and Communication in the Animal and the Machine. 2d ed. Cambridge, Mass.: M.I.T. Press.
Cite this article
"Information Theory." International Encyclopedia of the Social Sciences. 1968. Encyclopedia.com. (June 1, 2012). http://www.encyclopedia.com/doc/1G2-3045000574.html
Information Theory
"Information" is a term used universally in fields associated with computing technology. It is often loosely applied when no other term seems to be readily at hand, as in "information technology," "information systems," and "information retrieval." It surprises most people to discover that the term "information" has a precise meaning in an engineering context. It does not mean the same thing as "knowledge" or "data," but is instead intertwined with elements of communication systems theory.
When computing systems are connected together, it is necessary to consider how they might exchange data and work cooperatively. This introduces the notion that messages can be formulated by computing machines and dispatched to other machines that receive them and then deal with their contents. The issues involved in these transmission and reception operations are the subject of "information theory."
A communication channel is a connective structure of some sort that supports the exchange of messages. Examples are wired interconnections such as Ethernet or fiber optic cables, and wireless connections such as microwave links. These are all paths over which digital information can be transmitted.
Noise and Errors
Information theory has to do with how messages are sent via communication channels. When this field was first being studied, the consensus was that it would be impossible to get digital machines to make exchanges in a way that was guaranteed to be error-free. This is because all the components used to construct computing machines are imperfect; they tend to distort the electrical signals they process as a side effect of their operation. The components add extra electrical signals called "noise." In this instance, the term "noise" does not necessarily refer to something that can be heard. Instead, "noise" describes the corruption of electrical signals, which makes them harder for devices in the computer system to interpret correctly. This corruption might appear as extra voltage levels in the signal, or some signals may be missing completely.
Because communication channels inherently contain noise, exchanged messages are always liable to be damaged in one way or another. When a message is dispatched from one machine to another, there is a chance that it will be distorted by imperfections in the channel and therefore not correctly interpreted by the recipient. Channel noise cannot be entirely eliminated. For this reason, early theorists believed that messages transmitted digitally could not be relied on to arrive at their destinations exactly as the senders had sent them.
Information Defined
This pessimistic outlook changed in 1948 with the publication of Claude Shannon's seminal study of information theory. He proposed that even in the presence of noise (which, it had been agreed, was unavoidable), it was possible to make transmission errors as rare as desired, provided that information was sent no faster than the capacity of the channel allowed. This effectively heralded the era of a new field of computing science and engineering: that of information theory.
"Information" was granted a precise definition. It was related to the inverse of the probability of the content of a message. For example, if a person was told in a message that "tomorrow, the sky will be blue," that person would conclude that there was not much in that message that he or she had not already expected. In other words, there is not much information in that message: it essentially reaffirms an expectation, because the probability of the outcome is high. Conversely, if one were told in a message that "tomorrow, the sky will be green," then he or she would be greatly surprised.
There is more information in this second message purely by virtue of the fact that the probability of this event is so much lower. The information pertaining to a particular event is proportional to the logarithm of the inverse of the probability of the event actually taking place:
Information ∝ log(1/p)
where p is the probability of an event within the message.
Shannon's work led to a new field of engineering. Quantities such as the capacity of a channel to transmit information could be evaluated. This provided telecommunications specialists with a way of knowing just how much information could be transmitted over a channel in a given time without loss.
Encoding
In addition to this, ways of representing, or encoding, information during transmission from one place to another were explored; some approaches were better than others. Encoding simply means that pieces of information normally represented by particular symbols are converted to another collection of symbols that might better suit their reliable transfer. For example, text messages are represented by alphabetic characters when created and read, but they are converted into another form, such as ASCII codes, for transmission over a communication channel. At the receiving end, the codes are converted back into text. The advantage these conversions offer is that some ways of representing information are more robust to the effects of noise in information channels than others, and perhaps more efficient as well. So the extra expense involved in carrying out these encoding and decoding operations is offset by the reliability they offer.
Information theory has become a mature field of engineering and computer science. It has enhanced the reliability of computer-based networks at all levels, from small local area networks (LANs) to the Internet, and it has done so unobtrusively, so that users are unaware of its presence.
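As a rough illustration of why one encoding can be more robust to noise than another, the sketch below converts text to ASCII bits and then sends each bit three times, so that the receiver can take a majority vote. The three-fold repetition code is not named in this article; it is assumed here only because it is the simplest scheme that shows the idea, and all function names are hypothetical.

```python
def text_to_bits(text):
    """Encode text as a flat list of bits, 8 bits per ASCII character."""
    bits = []
    for byte in text.encode("ascii"):
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits


def bits_to_text(bits):
    """Inverse of text_to_bits: regroup bits into bytes and decode as ASCII."""
    data = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    return data.decode("ascii")


def repeat_encode(bits, n=3):
    """Repetition code (an assumed scheme): send every bit n times."""
    return [copy for bit in bits for copy in [bit] * n]


def repeat_decode(coded, n=3):
    """Recover each bit by majority vote over its n received copies."""
    return [1 if 2 * sum(coded[i:i + n]) > n else 0
            for i in range(0, len(coded), n)]


message = "HI"

# With redundancy: one flipped bit is outvoted by its two intact copies.
sent = repeat_encode(text_to_bits(message))
received = list(sent)
received[4] ^= 1  # simulate noise flipping one transmitted bit
decoded = bits_to_text(repeat_decode(received))

# Without redundancy: a single flipped bit changes the text.
plain = text_to_bits(message)
plain[7] ^= 1  # flips the last bit of "H", turning it into "I"
corrupted = bits_to_text(plain)

print(decoded, corrupted)
```

The protected message costs three times as many bits to send, which is the "extra expense" of encoding mentioned above; practical codes aim for the same protection with far less overhead.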
In addition, information theory has assisted in the development of techniques for encoding digital information and sending it over analog communication channels that were not designed for handling computer-based transmissions, such as the public telephone networks. It is important to remember that these contributions of information theory to modern computing began with the ability to define information mathematically, and with the work Claude Shannon did to understand communication channels and encoding schemes.
see also Cybernetics; Networks; Shannon, Claude E.
Stephen Murray
Bibliography
Lathi, Bhagwandas P. Modern Digital and Analog Communication Systems, 2nd ed. Orlando, FL: Holt, Rinehart and Winston, 1989.
Proakis, John G. Digital Communications, 3rd ed. New York: McGraw-Hill, 1995.
Shanmugam, K. Sam. Digital and Analog Communication Systems. New York: John Wiley & Sons, 1985.
Sklar, Bernard. Digital Communications, Fundamentals and Applications. Englewood Cliffs, NJ: Prentice Hall, 1988.
Cite this article
Murray, Stephen. "Information Theory." Computer Sciences. 2002. Encyclopedia.com. (June 1, 2012). http://www.encyclopedia.com/doc/1G2-3401200560.html
information theory
information theory or communication theory, mathematical theory formulated principally by the American scientist Claude E. Shannon to explain aspects and problems of information and communication. The theory is not specific in all respects; for example, it proves the existence of optimum coding schemes without showing how to find them. It nevertheless succeeds remarkably in outlining the engineering requirements of communication systems and the limitations of such systems.
Cite this article
"information theory." The Columbia Encyclopedia, 6th ed. 2011. Encyclopedia.com. (June 1, 2012). http://www.encyclopedia.com/doc/1E1-inform-th.html
Information Theory
The version of information theory formulated by mathematician and engineer Claude Shannon (1916–2001) addresses the processes involved in the transmission of digitized data down a communication channel. Once a set of data has been encoded into binary strings, these strings are converted into electronic pulses, each of equal length, typically with 0 represented by zero volts and 1 by +5 volts. Thus, a string such as 0100110 would be transmitted as seven pulses:
[Figure: the string 0100110 drawn as seven equal-length pulses, each at zero volts or +5 volts.]
It is clear from the example that the lengths of pulses must be fixed in order to distinguish between 1 and 11. In practice, the diagram represents an idealized state. Electronic pulses are not perfectly discrete, and neither are the lengths of pulses absolutely precise. The electronic circuits that generate these signals are based upon analogue processes that do not operate perfectly, and each pulse will consist of millions of electrons emitted and controlled by transistors and other components that only operate within certain tolerances. As a result, in addition to the information sent intentionally down a channel, it is necessary to cater for the presence of error in the signal; such error is called noise.
This example illustrates the dangers inherent in the differences between the way one represents a process in a conceptual system and the underlying physical processes that deliver it. To conceive of computers as if they operate with perfectly clear 0 and 1 circuits is to overlook the elaborate and extensive error-checking necessary to ensure that data are not transmitted incorrectly, which is expensive in both time and cost.
In 1948, Shannon published what came to be the defining paper of communication theory. In this paper he investigated how noise imposes a fundamental limit on the rate at which data can be transmitted down a channel. Early in his paper he wrote:
The irrelevance of meaning to communication is precisely the point that encoding and the transmission of information are not intrinsically connected. Shannon realized that if one wishes to transmit the binary sequence 0100110 down a channel, it is irrelevant what it means, not least because different encodings can make it mean almost anything. What matters is that what one intends to transmit, as a binary string, should arrive "exactly or approximately" at the other end as that same binary string. The assumption is that the encoding process that produces the binary string and the decoding process that regenerates the original message are known both to the transmitter and the receiver. Communication theory addresses the problems of ensuring that what is received is what was transmitted, to a good approximation.
See also Information; Information Technology
Bibliography
Shannon, Claude E. "A Mathematical Theory of Communication." The Bell System Technical Journal 27 (1948): 379–423, 623–656.
John C. Puddefoot
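The transmission scheme described in this entry, in which bits travel as zero-volt and +5-volt pulses that arrive corrupted by noise, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the noise is modelled as Gaussian, the receiver decides at a 2.5-volt threshold, and the function names and noise levels are invented for the illustration rather than taken from the entry.

```python
import random

HIGH_VOLTS = 5.0   # level used for a 1, as in the entry
LOW_VOLTS = 0.0    # level used for a 0
THRESHOLD = 2.5    # assumed decision point, halfway between the two levels


def to_pulses(bits):
    """Map a binary string to one idealized voltage per pulse."""
    return [HIGH_VOLTS if b == "1" else LOW_VOLTS for b in bits]


def add_noise(pulses, sigma, rng):
    """Add Gaussian noise (an assumed noise model) to every pulse."""
    return [v + rng.gauss(0.0, sigma) for v in pulses]


def to_bits(pulses):
    """Receiver: read each pulse as 1 if it is above the threshold."""
    return "".join("1" if v > THRESHOLD else "0" for v in pulses)


def error_rate(bits, sigma, trials, rng):
    """Fraction of bits misread over repeated noisy transmissions."""
    errors = 0
    for _ in range(trials):
        received = to_bits(add_noise(to_pulses(bits), sigma, rng))
        errors += sum(a != b for a, b in zip(bits, received))
    return errors / (trials * len(bits))


rng = random.Random(0)  # fixed seed so the run is repeatable
sent = "0100110"

clean = to_bits(to_pulses(sent))                           # no noise at all
mild = error_rate(sent, sigma=0.5, trials=2000, rng=rng)   # small disturbances
severe = error_rate(sent, sigma=3.0, trials=2000, rng=rng) # noise comparable to the signal

print(clean, mild, severe)
```

With small disturbances the threshold almost always recovers the string that was sent, and the misread fraction climbs as the noise approaches the 5-volt gap between the two levels. Nothing in the receiver depends on what the string means.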
Cite this article
PUDDEFOOT, JOHN C. "Information Theory." Encyclopedia of Science and Religion. 2003. Encyclopedia.com. (June 1, 2012). http://www.encyclopedia.com/doc/1G2-3404200284.html
information theory
information theory The study of information by mathematical methods. Informally, information can be considered as the extent to which a message conveys what was previously unknown, and so is new or surprising. Mathematically, the rate at which information is conveyed from a source is identified with the entropy of the source (per second or per symbol). Although information theory is sometimes restricted to the entropy formulation of sources and channels, it may include coding theory, in which case the term is used synonymously with communication theory.
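The link between surprise and entropy stated in this entry can be made concrete with a short sketch. It assumes the standard measure in which an outcome of probability p conveys log2(1/p) bits, so that the entropy of a source is the average of that quantity over its symbols. The base-2 logarithm, the function names, and the sample inputs are choices made for the illustration, and the estimate from a message treats its symbols as independent.

```python
import math
from collections import Counter


def surprisal(p):
    """Information, in bits, conveyed by an outcome of probability p."""
    return math.log2(1.0 / p)


def entropy(probabilities):
    """Average surprisal of a source, in bits per symbol."""
    return sum(p * surprisal(p) for p in probabilities if p > 0)


def empirical_entropy(message):
    """Estimate entropy per symbol from the symbol frequencies in a message."""
    counts = Counter(message)
    total = len(message)
    return entropy(count / total for count in counts.values())


fair_coin = entropy([0.5, 0.5])                # 1 bit per toss
biased_coin = entropy([0.9, 0.1])              # under 1 bit: outcomes are more predictable
constant = empirical_entropy("aaaaaaaa")       # 0 bits: nothing is ever new
four_symbols = empirical_entropy("abcdabcd")   # 2 bits: four equally likely symbols

print(fair_coin, biased_coin, constant, four_symbols)
```

A source that always emits the same symbol has zero entropy because its output is never surprising, while a source with four equally likely symbols needs two bits per symbol on average.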
Cite this article
JOHN DAINTITH. "information theory." A Dictionary of Computing. 2004. Encyclopedia.com. (June 1, 2012). http://www.encyclopedia.com/doc/1O11-informationtheory.html
information theory
information theory Mathematical study of the laws governing communication channels. It is primarily concerned with the measurement of information and the methods of coding, transmitting, storing and processing this information.
Cite this article
"information theory." World Encyclopedia. 2005. Encyclopedia.com. (June 1, 2012). http://www.encyclopedia.com/doc/1O142-informationtheory.html