Intelligence and Intelligence Testing
The term “intelligence,” like much of the vocabulary of psychology, is drawn from the vocabulary of everyday speech. In a general way, everyone knows what intelligence or intelligent behavior is. We think of behavior as intelligent to the extent that it is efficient and adaptive in handling a situation that the individual faces and to the extent that it meets the demands of the situation, in its novelty, complexity, and abstractness. But psychologists have had little success in reaching a definition in verbal terms that is much more precise and satisfactory than the common-sense understanding of the term held by the layman. Different writers have emphasized different aspects of intelligent behavior—one has emphasized its dependence on ability to learn, another its close relationship to abstract thinking, another its dependence on judgment and reasoning, and yet another its concern with perception and formulation of relationships (“Intelligence …” 1921). These are in large part supplementary rather than contradictory emphases, each sensibly pointing to a different aspect of intelligent behavior. But the attempt to formulate the definition of intelligence has not carried us very far beyond our general lay understanding of the concept.
In light of the difficulties inherent in attempting a precise verbal formulation, it is not surprising that much of the energy of psychologists has been expended in the development of operations for measuring intellectual ability or abilities and in the attempt to clarify the concept inductively from a study of the data resulting from the application of these measurement operations. Test tasks have been developed based upon common-sense notions of the types of performances that call for intelligent behavior. These have included apprehension of relationships among words, numbers, and spatial patterns, reasoning tasks, span of immediate memory, general information about one’s environment, and judgment as to the appropriate action in problematic situations. Various assortments of these tests have been administered to groups of subjects, and from the pattern of relationships among them, the investigators have attempted to infer the underlying structure and nature of intellect. The statistical techniques used have mainly been those of correlational analysis and factor analysis. Correlational analysis traditionally means the examination of partial and multiple correlation coefficients. Factor analysis examines the matrix of correlations among a set of tests with the objective of determining a simpler set of primary variables that could account parsimoniously for the given correlation matrix. [See Factor analysis; Multivariate analysis, articles on correlation.]
Early research was interpreted by Charles Spearman (1927) as indicating that the communality, or common variance, among tests involving a wide variety of cognitive performances could be accounted for by one single and underlying general factor (g) running through all the tests, supplemented by a different specific (s) factor for each test. Intelligence was equated with this general factor that accounted for the correlations among the several tests. The general factor was spoken of by Spearman at times as a kind of general “mental energy,” the specific factors representing the different “engines” through which this energy expressed itself. Spearman also felt that the common theme represented by g could be described as the ability to educe relationships [see Spearman].
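What a single general factor implies for a table of correlations can be shown with a small numerical sketch. The four test names and their g loadings below are hypothetical, chosen only for illustration. The "tetrad difference" check is the classical test of a one-factor pattern and is not described in the text above. Under the model, the correlation between any two tests is the product of their g loadings, and whatever variance g does not account for is left to the specific factor of each test.

```python
from itertools import combinations

# Hypothetical g loadings for four tests; the values are illustrative only.
G_LOADINGS = {
    "vocabulary": 0.8,
    "arithmetic": 0.7,
    "analogies": 0.6,
    "memory_span": 0.5,
}


def implied_correlation(a, b, loadings=G_LOADINGS):
    """Correlation between tests a and b when g is their only common factor."""
    return loadings[a] * loadings[b]


def specific_variance(test, loadings=G_LOADINGS):
    """Share of a standardized test's variance not accounted for by g.

    In the two-factor model this remainder belongs to the specific factor s
    (in practice it also contains error of measurement).
    """
    return 1.0 - loadings[test] ** 2


def tetrad_difference(a, b, c, d, loadings=G_LOADINGS):
    """r_ac * r_bd - r_ad * r_bc, which vanishes under a single common factor."""
    return (
        implied_correlation(a, c, loadings) * implied_correlation(b, d, loadings)
        - implied_correlation(a, d, loadings) * implied_correlation(b, c, loadings)
    )


if __name__ == "__main__":
    for a, b in combinations(G_LOADINGS, 2):
        print(f"r({a}, {b}) = {implied_correlation(a, b):.2f}")
    # Unpacking the dict passes its four test names in insertion order.
    print("tetrad difference:", tetrad_difference(*G_LOADINGS))
```

With observed rather than model-implied correlations, tetrad differences that depart clearly from zero are the kind of evidence that led later workers to postulate additional factors.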
As additional evidence accumulated, it became apparent that Spearman’s original formulation was an oversimplification and that it was necessary to postulate additional factors. Typically, these are “group factors,” involving some but not all of the tests. Some workers in the field of intelligence testing (e.g., Thurstone 1938; Guilford 1959) have attached primary importance to an array of group factors dealing with more limited aspects of cognitive functioning—factors of verbal ability, numerical ability, spatial visualizing, reasoning, etc.—and they have minimized and even undertaken to dispense with the notion of a g or general intellective factor. However, the fact remains that the correlations among different types of cognitive tests are predominantly positive. To represent this fact, most formulations based entirely upon group factors have had to recognize that these factors were themselves not independent but related; thus a general factor was reintroduced in the form of a “second-order factor” expressing the relationship between the group factors themselves. It is this nucleus of relationship between a wide variety of tasks that provides the psychometric basis for a concept of “general intelligence” and the justification for using a single score to express individual differences along such a dimension [see Thurstone].
The Binet tests. Stimulated by a concern for pupils who seemed unable to progress in school and responding to the request of the French Minister of Public Instruction, Binet, with the assistance of Simon, prepared a series of tasks in 1905 to be used for the appraisal of the intellectual abilities of pupils. The original series of tasks was revised, expanded, and organized by age levels in 1908, and a further revision appeared in 1911. Binet’s work provided the basis for a number of modified versions in other languages and countries, perhaps the best known and most widely used of which has been the Stanford-Binet Intelligence Scale developed at Stanford University by Lewis M. Terman. The original edition appeared in 1916 (Terman 1916), and revisions followed in 1937 (Terman & Merrill 1937) and 1960 (Terman & Merrill 1960) [see Binet; Terman].
The Stanford-Binet is an individual intelligence test administered to one examinee at a time by a trained examiner in a face-to-face interview situation. The test is organized by age levels, the most recent editions having items extending from those for age two to those suitable for superior adults. The six subtests for a given age level present varied types of tasks, but in general the tasks are characterized as being quite verbal and quite abstract.
The IQ concept. Test performance on the Binet is expressed in terms of an age scale, a basal age being established at the age level at which the examinee passes all tests; additional months of credit are given for tests passed above the basal level. The resulting “mental age” serves as a unit in terms of which the level of the individual’s performance is expressed. However, it soon became apparent that, in addition to level of performance, it would be desirable to have an index that would relate that level to the performance typical of the individual’s age group. For this purpose the intelligence quotient, or IQ, first suggested by Wilhelm Stern, was universally adopted. The IQ may be expressed as IQ = 100(MA/CA), where MA is the mental age and CA is the chronological age of the examinee.
The IQ had certain properties which made it very appealing as an index of relative performance on a mental test and which thus fixed it firmly in the vocabulary of both the psychological profession and the general public:
(1) By definition and by the process of test development, the average IQ is substantially the same at all age levels, having a value of approximately 100.
(2) The variability of IQ’s around this average also is about the same at all age levels (standard deviation of about 16 for recent revisions of the Stanford-Binet).
(3) As a result of (1) and (2) and of the fairly high consistency in an individual’s performance from year to year, IQ’s of most individuals of school age or above show rather small (and unsystematic) shifts from one testing to another a year or even several years later.
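The arithmetic behind the ratio IQ, 100(MA/CA), with the mental age built up from a basal age plus months of credit, can be sketched as follows. The function names are hypothetical. The credit of two months per test passed above the basal level is an assumption used for illustration; it is consistent with six tests per yearly age level but is not stated in the text.

```python
def mental_age_in_months(basal_age_years, credits_above_basal):
    """Mental age: the basal age plus the months credited above the basal level.

    credits_above_basal holds one entry per test passed above the basal
    level, each entry being the months of credit that test carries.
    """
    return basal_age_years * 12 + sum(credits_above_basal)


def ratio_iq(ma_months, ca_months):
    """Ratio IQ, 100(MA/CA), with both ages expressed in months."""
    if ca_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * ma_months / ca_months


if __name__ == "__main__":
    # Hypothetical examinee aged exactly 8 years (96 months): basal age 8,
    # plus three tests passed at higher levels, assumed to carry 2 months each.
    ma = mental_age_in_months(8, [2, 2, 2])  # 102 months
    print(ratio_iq(ma, 96))                  # 106.25
```

An examinee whose mental age equals his chronological age obtains exactly 100, which is why the average IQ is approximately 100 at every age level.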
The relative stability of IQ values and the apparently random nature of many of the shifts have given rise in some quarters to the practice of speaking and thinking about the IQ as if it were an individual constant, determined unequivocally by the individual’s genetic constitution. This is untrue. Any test performance, as will be elaborated more fully later, is a current achievement resulting from the interaction of genetic constitution with the whole social history of the individual.
In popular speech, the term “IQ test” has become a substitute for “intelligence test,” and what was originally merely the unit of measure in which relative performance was expressed has come to have a type of substantive existence. This development appears unfortunate, because the mensurational soundness of the original IQ ratio, 100(MA/CA), can be questioned. In particular, the variability of the MA/CA ratio has been found to differ significantly from one age to another, so that the same IQ does not have truly comparable meaning at different ages (Pinneau 1961). Furthermore, the basic age unit becomes essentially meaningless after adolescence, and it has been necessary to develop completely artificial age levels for items and to use an arbitrary chronological age base for computing the ratio after the age of about 13 or 14. Test makers have become increasingly conscious of these problems, and consequently, for most recently developed tests, results are expressed as a percentile rank in a group or as a standard score; even when the scores have numerical values similar to IQ’s (i.e., mean of 100 and standard deviation of 15 or 16), they are not based upon age as a unit of measure.
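A minimal sketch of the two alternatives to the ratio IQ mentioned here, the standard score and the percentile rank, is given below. The function names and the small norm group of raw scores are hypothetical. The scale standard deviation of 15 is one of the two values the text mentions, and counting tied scores as half below is a common convention assumed here, not one taken from any particular test manual.

```python
from statistics import mean, pstdev


def deviation_score(raw, norm_scores, scale_mean=100.0, scale_sd=15.0):
    """Standard score of raw relative to a norm group of the examinee's own age.

    The raw score is converted to a z score within the norm group and then
    rescaled to the chosen mean and standard deviation; age is never used
    as a unit of measure.
    """
    group_sd = pstdev(norm_scores)
    if group_sd == 0:
        raise ValueError("norm group shows no variability")
    z = (raw - mean(norm_scores)) / group_sd
    return scale_mean + scale_sd * z


def percentile_rank(raw, norm_scores):
    """Per cent of the norm group scoring below raw, counting ties as half."""
    below = sum(1 for score in norm_scores if score < raw)
    ties = sum(1 for score in norm_scores if score == raw)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)


if __name__ == "__main__":
    norm_group = [40, 45, 50, 55, 60]  # hypothetical raw scores for one age group
    for raw in (45, 50, 60):
        print(raw, round(deviation_score(raw, norm_group), 1),
              percentile_rank(raw, norm_group))
```

Because each score is referred only to the distribution of the examinee's own age group, the mean and variability of the resulting scores are the same at every age by construction.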
The Wechsler tests
A second series of tests very widely used for individual administration in the United States is that prepared by David Wechsler. At first designed (Wechsler 1939) to provide tests with content appropriate for adults rather than children (the original clientele for the Binet tests), the Wechsler series has now been developed to include tests for children of school age. The Wechsler Adult Intelligence Scale (Wechsler 1955) and the Wechsler Intelligence Scale for Children (Wechsler 1949) differ from tests in the Binet tradition in several respects. The total test is made up of distinct subtests, each administered as a unit. In addition to a total test IQ, two separate subtotal IQ’s are reported, one for the verbal subtests and one for the performance subtests. The results are expressed as standard scores without using age as a frame of reference.
The Binet and Wechsler tests were designed to be administered to one examinee at a time in a face-to-face situation by an examiner trained in the techniques of presenting the tasks and evaluating the examinee’s responses. Individually administered tests are widely used in clinical work, for example, with pupils who are experiencing special difficulties in school, delinquents appearing in court, emotionally disturbed clients coming for counseling services, or institutionalized psychotics. For routine use in education, in military personnel selection, and in industry, group tests of intellectual ability have found a larger role. Group tests had their first large-scale use in the United States at the time of World War I (Yerkes 1921), when they were used to help in the screening and classification of military personnel. Since then, many series of group tests have been produced for use in schools, in industry, and in the military establishment. It is estimated that millions of these tests are administered annually (Hawes 1964, pp. 53-55), with the greatest concentration of testing being in the United States. [See Yerkes.]
Group tests of mental ability present, in varying combinations, tasks involving word meanings, verbal relationships, arithmetical reasoning, form classification, spatial relationships, and other abstract symbolic material. They differ from measures of school achievement in being somewhat less directly related to school instruction. However, in many cases the resemblance in content between tests designed to function as intelligence or scholastic aptitude tests and those designed to serve as achievement measures is marked. The statistical overlap between the two categories of tests, especially when the aptitude test is based primarily upon verbal and numerical symbolism, is also substantial. Correlations between the two categories of tests run in the .70s and .80s, and Kelley’s early estimate (1927, chapter 8) that fully 90 per cent of the non-chance variance of each category of test is shared by the other seems as sound now as when it was originally made [see Aptitude testing].
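One way to see how correlations in the .70s and .80s can correspond to a very large share of common non-chance variance is to correct the observed correlation for the unreliability of the two tests and square the result. The sketch below uses that standard correction for attenuation. It is offered as an illustration of the reasoning, not as a reconstruction of Kelley's own computation, and the reliability coefficients in the example are hypothetical.

```python
def shared_nonchance_variance(r_xy, reliability_x, reliability_y):
    """Proportion of reliable (non-chance) variance common to two tests.

    The observed correlation is divided by the square root of the product
    of the two reliabilities (correction for attenuation) and then squared,
    which simplifies to r_xy**2 / (reliability_x * reliability_y).
    Inconsistent inputs can yield a value above 1.
    """
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy ** 2 / (reliability_x * reliability_y)


if __name__ == "__main__":
    # Hypothetical values: observed correlation .80, reliabilities .90 and .85.
    print(round(shared_nonchance_variance(0.80, 0.90, 0.85), 3))
```

The less reliable the two tests, the larger the share of their non-chance variance that a given observed correlation represents.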
Nonverbal and culture-free tests
Individual tests of the Binet type, as well as the bulk of group tests, present tasks that depend heavily upon the use of words and upon grasping relationships among words. Because these tests are obviously inappropriate for those who do not speak the language in which the test is written, because they appear quite closely tied to school achievement, and because they would appear to penalize those whose intellectual talents are most developed in some medium other than the verbal, a number of tests have been developed that do not use verbal symbolism. These tests tend to make use of concrete materials (e.g., blocks, form-boards, paper cutouts) or of pictures and geometric diagrams and tend to call for analysis and discrimination of relationships between these forms or objects. The mental functions measured by the nonverbal tests are somewhat different from those measured by the verbal tests, as shown by the patterning of subtest correlations. The nonverbal tests are somewhat less accurate predictors of academic performance and consequently tend to be used less in educational situations.
One problem that has been of continuing concern to research workers in the field of intelligence measurement is the development of tests that may appropriately be used with individuals of different social classes within a given society and that may appropriately be used cross-culturally in a number of different societies. Since practically all tests have been prepared by individuals who are members of the middle-class European-American culture, there is a feeling that the test content, and even perhaps the intellectual processes called for, may be biased in favor of the cultural content and values of such a group.
A number of efforts have been made to prepare tests, typically nonverbal, that are based upon content that is “culture-free” or at least “culture-fair.” Perhaps the most widely used of these tests is the Progressive Matrices test prepared by Raven (1958). The content of the test’s items contains material that is nonlinguistic and nonrepresentational; in this sense it does not depend upon the culture of any particular group. However, one can hardly contend that the materials are entirely culture-free. The very use of graphic representation, the orientation toward problem-solving in this type of puzzle situation, and the habits of abstraction and classification that are called for with these materials—any or all of these factors may be foreign to certain cultures. (It may be remarked in passing that many nonverbal tasks are most readily solved by verbalizing the relationships—at least for word-minded individuals.) The Progressive Matrices test has been widely used in different countries and cultures. However, those who have tried to use it with primitive groups have had serious questions as to its appropriateness for them.
There have also been attempts to develop tests that are “fair” to different classes in American society. Reacting critically to the content of available intelligence tests, Davis and Eells (1952-1953) prepared the Davis-Eells Games, a test that uses only oral language and presents problem situations designed to be familiar to the lower-class child. The test has been quite extensively studied in the United States since it appeared in 1951. Unfortunately, although the test is somewhat less academically oriented than conventional group intelligence tests and as a result shows lower correlation with school success, it continues to show about the same relationship to socioeconomic indices as do the more conventional tests. There is little indication that underprivileged groups perform better on this test, or on any of the other nonverbal tests that have been developed so far, than they do on the conventional verbal and school-related measures. To the extent that poor test performance is a function of cultural deprivation, this effect appears to be far-reaching and to include almost all test-taking performances rather than merely verbal or school-oriented ones.
Tests for infants and preschoolers
Initially, tests of intellectual ability were developed for school-age children. In the army testing program and, subsequently, in the Wechsler Adult Intelligence Scale, these were extended upward for adults. There have also been efforts to extend objective testing procedures downward to permit the appraisal of intelligence in preschool children and even infants. With infants a “test” in the conventional sense is obviously impossible, but observation of the infant’s responses to a standard set of stimuli can be made. For example, one can observe whether the infant follows with his eyes a point of light that is moved back and forth transversely or whether he grasps a pellet that is placed upon the table before him and by what type of opposition of thumb and fingers or palm and fingers he does this. Data gathered by Gesell (see Yale University 1940) and others indicate that certain patterns of behavior typically appear at certain ages, to be replaced by more mature patterns at later ages. Status with respect to this developmental sequence has been thought to provide in the infant an index of something analogous to intelligence. However, as data have accumulated (e.g., Wittenborn et al. 1956), it has become clear that there is very little relationship between any appraisals of the infant during the first year of life and his status on the intelligence tests at school age. Whether this reflects the changing nature of the tasks through which intelligence manifests itself or the basic instability of growth patterns during infancy and childhood is difficult to determine, but to date it does not appear that, except in cases of extreme deficiency, observations during infancy give any substantial basis for predicting intellectual performances during school age and in adulthood.
Tests developed for the preschool years have fared somewhat better, and as a matter of fact the Stanford-Binet itself extends down to the two-year level. Other tests have been developed for children ranging from one year to five years of age. These tend to involve perceptual and motor tasks somewhat more heavily than do the typical school-age tests, but they also depend somewhat more heavily than do infant tests upon verbal comprehension by the child. Tests at these ages permit somewhat better forecasts of school-age development (Honzik et al. 1948), although the stability is much less than is true for a comparable time period with school children and with adolescents. Again, this is probably due in part to changes in the tests, in part to the negativism and distractibility that make young children difficult to test reliably, and in part to the cumulative impact of changing school, family, and community environments upon intellectual growth as the individual moves out of his family into the wider environment of the school and the community.
Kinds of intelligence
In recent years conventional tests of intelligence have come under some criticism because, it is alleged, they do not appraise the “creativity” of the individual. It is asserted that conventional tests require the individual to select or produce a predetermined “right” answer, so that there is little leeway for individual originality or inventiveness. The examinee is required to reproduce the thought process of the test maker, or at least to come out with the same answer. This is, of course, true. However, if the thought processes of the test maker are sufficiently ingenious, subtle, and various from item to item, it may still call for a good deal of flexibility and ingenuity on the part of the examinee to reproduce them.
Some recent test construction has emphasized the measurement of “divergent thinking” as opposed to the “convergent thinking” that is considered to characterize conventional tests. In measures of divergent thinking the individual receives credit for the number, the variety, and the originality of his productions in response to an intellectual task. One representative task is “List as many different uses as you can for a brick.” Fluency is evidenced by the number of responses given, flexibility by the number of different categories of response (e.g., building material, weight, tool) represented in the list, and originality by the rarity or unusualness of the responses.
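The three scores can be made concrete with a short sketch of how responses to the brick task might be tallied. Everything specific in it is hypothetical: the function name, the category assignments, the frequencies with which each response is supposed to occur in a reference sample, and the 5 per cent cutoff for calling a response rare. Published tests use their own scoring keys.

```python
def score_uses_task(responses, category_of, sample_frequency, rare_below=0.05):
    """Tally fluency, flexibility, and originality for one examinee.

    responses        -- the uses the examinee listed
    category_of      -- maps a response to its category (hypothetical key)
    sample_frequency -- maps a response to the proportion of a reference
                        sample giving it (hypothetical norms)
    rare_below       -- responses less frequent than this count as original
    """
    # Normalize, then drop blanks and repeats while keeping first-seen order.
    cleaned = [r.strip().lower() for r in responses if r.strip()]
    unique = list(dict.fromkeys(cleaned))

    fluency = len(unique)
    # Responses missing from the key are lumped into a single category.
    flexibility = len({category_of.get(r, "uncategorized") for r in unique})
    # Responses missing from the norms are treated as never seen before.
    originality = sum(1 for r in unique if sample_frequency.get(r, 0.0) < rare_below)
    return {"fluency": fluency, "flexibility": flexibility, "originality": originality}


if __name__ == "__main__":
    categories = {
        "build a wall": "building material",
        "doorstop": "weight",
        "paperweight": "weight",
        "grind into pigment": "raw material",
    }
    frequencies = {
        "build a wall": 0.60,
        "doorstop": 0.30,
        "paperweight": 0.20,
        "grind into pigment": 0.01,
    }
    answers = ["Build a wall", "doorstop", "paperweight", "Doorstop ", "grind into pigment"]
    print(score_uses_task(answers, categories, frequencies))
```

Here the repeated "doorstop" is counted once, the two weight uses add to fluency but not to flexibility, and only the pigment response falls below the rarity cutoff.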
There is clearly some overlap between performance on convergent and divergent tests. However, the correlation is fairly low, especially within the abler and better educated groups. Although the overlap among different divergent measures is rather modest, the correlations seem to justify speaking of a divergent thinking factor that has some degree of generality (Thorndike 1963). Studies of the correlates of divergent thinking are still incomplete, but it appears that those students high on divergent thinking make a different and generally less favorable impression upon conventional teachers in conventional schools than do those who excel in convergent thinking, and they present generally somewhat different and more tempestuous personality patterns.
The relationship of divergent thinking, as measured by tests, to creativity in the sense of producing socially valued products remains largely to be explored. [See Creativity, article on psychological aspects.]
Early work on cognitive test development emphasized the single factor of general intellectual ability. However, test theory has increasingly emphasized the multiplicity of factors involved in cognitive performances. For example, basing his work partly on a priori analyses and partly on data from factor analytic studies, Guilford (1957) developed an elaborate three-dimensional “structure of intellect,” in which 72 cells represent different facets of intellectual functioning. There has been a corresponding trend in practical test development toward the development of test batteries composed of a number of tests, each designed to measure a distinct cognitive ability, e.g., verbal, numerical, spatial visualizing, mechanical, inductive reasoning, etc. Although the tests of specific abilities typically show positive intercorrelations, thus providing support for the concept of an underlying general intellectual factor, there is enough that is specific to each test so that the battery can be considered to give a usefully differentiated map of the individual’s cognitive development.
If one is to make any practical interpretation or use of intelligence test results, it is important to know something about their stability and their correlates. There are so many specific facts involved, depending upon the specific test in question, age or educational group referred to, and definition of the correlative variable, that it is hard to present a summary that will be both brief and accurate. However, an attempt will be made to capture the main trends of the evidence.
Stability over time
As indicated earlier, observations of behavior during infancy permit no better than a chance forecast of intelligence as measured in later years. A core of stability develops during the preschool years, and the relationship between successive tests with a constant time interval (e.g., one year, five years) increases as a person progresses through the elementary school years.
By the time of adolescence, the relative standing of the typical individual on measures of intelligence has stabilized, and subsequent changes in his standing in his group arise in large part from random errors of measurement. Relative position is maintained with a good deal of consistency throughout adult life, although patterns of increment or decrement do to some extent reflect amount of schooling and other types of opportunities and advantages.
Age and sex difference
The determination of age trends in intelligence is complicated by the difficulty, both practical and theoretical, of identifying comparable groups at different ages. Within this limitation, data suggest that performance on measures of intelligence increases at a rapid and apparently fairly uniform rate during childhood, slows down during adolescence, reaches a maximum, and subsequently declines. However, the age at which a maximum is reached and the rate of subsequent decline are functions both of the nature of the test task and of the life history of the individual. Tests that depend in large measure upon the accumulation of experience (e.g., vocabulary or general information) continue to show increments in performance through the twenties and perhaps longer, and show a decline only with the approach of senescence. On the other hand, performance on tests that depend upon speed, flexibility, and adaptation to novel and unfamiliar tasks appears to reach a maximum during the teens and declines shortly thereafter. However, the time and rate of decline are functions of educational level and pattern of life experience, the decline being slower for those who continue their schooling and who live and work in situations where traffic in ideas and abstractions is a part of their daily living.
Some sex differences appear with respect to specific types of test tasks. Girls generally have been found to do better on tasks with a substantial verbal component, and boys have been found to do better on quantitative and concrete types of tasks. In general, however, sex differences are of modest size and appear to reflect in considerable part cultural demands and expectations rather than inherent differences.
In Western cultures there has been consistently a relationship between socioeconomic status and average level of intelligence test performance. Thus, in the standardization population for the 1937 edition of the Stanford-Binet (McNemar 1942) the following mean IQ’s were found for children whose fathers fell in the indicated occupational levels: professional, 116; semiprofessional and managerial, 112; clerical, skilled trades, and retail business, 107; semiskilled, minor clerical, and minor business, 105; slightly skilled, 98; day laborers, 96. This trend is found for all types of tests, although there is some evidence, far from unanimous or unequivocal, that differences are less marked for cross-cultural tests such as the Progressive Matrices. The presence of these differences is unquestioned; their source is a matter of some disagreement and debate. The issue of genetic as opposed to cultural causation is considered later in this article.
Many studies attest to an occupational hierarchy of intellectual performance, with the professional groups at the top and rough unskilled laboring groups at the bottom (Stewart 1947). However, the range within any given occupation is typically large, with a substantial per cent of overlapping even between occupations well separated in the occupational scale.
With respect to relationships to success within a given occupation, the results are less clear. The definition of “success” itself is often a problem, and results vary with the type of job. The most promising results have been found for clerical, skilled, and supervisory occupations (Ghiselli & Brown 1948). However, it must be admitted that even for these groups relationships tend to be quite modest. Test results appear to be more predictive of ability to enter and survive in an occupation than they are of degrees of success above the survival level.
Whatever is said about socioeconomic differences may be repeated with respect to Negro-white differences in the United States. The caste differences are of a size comparable to those between middle-class and lower-class groups, and racial differences have been inevitably and inextricably bound up with class differences.
It is practically guaranteed by the nature of their content and the procedures by which test items are selected that intelligence tests will show substantial correlations with academic achievement. When that achievement itself is measured by objective tests the correlation is quite high, reflecting the similarity of processes called for and in some cases actual communality of test content. When achievement is represented by teachers’ appraisals of their pupils, the correlations tend to be lower, reflecting in some measure the diverse considerations that enter into teacher appraisals. The relationships are most marked in the public schools, in which the whole range of intellectual talent is represented, and become progressively less marked as one proceeds up the educational ladder to the more and more screened and intellectually homogeneous groups of college and graduate or professional schools.
Other success criteria
There have been vast numbers of studies relating measured intelligence to all sorts of other indicators of adjustment in or failure to adjust to life. Intelligence has been studied in delinquents, criminals, psychopaths, addicts, and others within the range of social problems and ills. However, for many of these socially deviating groups there is an association between deviancy and low socioeconomic status. Thus, the results for many of the studies have been confounded by the coexistent low socioeconomic status and low educational level. What interpretation to give to the relationships that are found is far from clear.
The existence of individual and especially group differences in measured intellect raises insistently the question of the causation of the differences that are found. To what extent should differences be attributed to hereditary differences transmitted through the genes and to what extent should they be attributed to differences in the physical and social environment subsequent to conception and birth?
The effort to clarify this issue, or to provide evidence in support of either a genetic or an environmentalist point of view, generated a great volume and variety of research, especially in the period just prior to World War II; much of it was suggestive but none of it really definitive. There have been many correlational studies of siblings, fraternal twins, and identical twins (Erlenmeyer-Kimling & Jarvik 1963), and these have shown correlations ranging from about .50 for siblings to about .90 for identical twins. (The correlations between identical twins are substantially as high as those between two testings of the same individual.) Data have been laboriously assembled on identical twins reared in separate homes, and those persons have been found to be less alike than identical twins reared together but more alike than fraternal twins growing up together in the same family.
Studies of foster children have been complicated by the possibility of selective placement, so that correlations between foster siblings and between adopted children and their foster parents have been difficult to interpret clearly. A further complication has been the equivocality of test results for very young children. The average level of performance of foster children, typically placed in average or above-average homes, has been found at least equal to that of the general population, even though these children’s own mothers performed well below average when tested as adults. However, the intelligence of individual foster children appears to bear little relationship to measures of the foster home in which they are placed, while appreciable correlations do appear between the IQ of the foster child and that of his own mother, from whom he has been separated almost from birth (Skodak & Skeels 1949). [See Genetics, article on genetics and behavior.]
A number of patterns of investigation have been developed involving changed environment. Children from orphanages have been retested after a period of residence in a foster home. Negroes from the South have been compared with those migrating to New York City. Both Negro and Puerto Rican children in New York City have been studied in an attempt to relate length of residence in New York to level of test performance. Many of these studies show higher test performance associated with exposure to the presumably more stimulating environment and suggest that a greater increment is associated with early and extended exposure to the improved environment (Jones 1946).
Thus, at the present time there would be little dissent from the proposition that measured intelligence is a function of the environment to which the individual or group has been exposed and that some part of the difference between individuals and between groups is attributable to such environmental differences. It is the attempt to ascertain how much that generates conflict and uncertainty. The attempt appears to be extraordinarily difficult for the following reasons, among others:
(1) It is difficult to identify the relevant and crucial aspects of environmental influence. The important aspects are only crudely represented by gross indicators of socioeconomic status. Consequently, it is almost impossible to state when two environments are “equal,” or to express in quantitative terms the amount of difference between two environments.
(2) The effects of genetic and environmental factors are almost certainly interactive rather than summational. What is a stimulating environment for one genetic constitution may be an overpowering one for another, and the gains one may expect to accrue from a particular type of environmental stimulation are almost certainly a function of the genetic materials to which that environment is applied.
(3) For intelligence test performance, as for any other attribute of the individual, one faces the somewhat paradoxical situation that the more nearly optimum the surroundings are for each individual’s development, the less the differences between individuals can be attributed to environmental factors. Thus any estimate of the per cent of variance attributable to environmental factors is always specific to a given time and place and the range of environmental opportunities (and of genetic constitutions) that characterize that specific population. If the democratic ideal of equal educational and other opportunities for everyone were achieved, environmental differences would recede into the background as a cause of differences among individuals and groups.
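The arithmetic behind point (3) can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses a deliberately simplified additive model (score = 100 + genetic term + environmental term), which point (2) warns is unrealistic, and the function name `environmental_share`, the standard deviations, and the sample size are all hypothetical choices made for illustration, not values from any study cited here.

```python
import random
import statistics


def environmental_share(env_sd, gen_sd=10.0, n=20000, seed=0):
    """Fraction of score variance due to environment in a toy additive model.

    Assumption: genetic and environmental contributions are independent,
    normally distributed, and simply add. Real effects are likely interactive.
    """
    rng = random.Random(seed)
    genetic = [rng.gauss(0.0, gen_sd) for _ in range(n)]
    environment = [rng.gauss(0.0, env_sd) for _ in range(n)]
    scores = [100.0 + g + e for g, e in zip(genetic, environment)]
    # Ratio of environmental variance to total score variance in this sample.
    return statistics.pvariance(environment) / statistics.pvariance(scores)


# Shrink the spread of environments while holding genetic spread fixed.
env_spreads = (10.0, 5.0, 1.0, 0.0)
shares = [environmental_share(sd) for sd in env_spreads]

for sd, share in zip(env_spreads, shares):
    print(f"environment SD = {sd:4.1f}  ->  environmental share = {share:.3f}")
```

In this toy model the theoretical shares are 0.50, 0.20, about 0.01, and 0 for the four spreads, and the simulated values should fall close to these. The same population of genetic constitutions thus yields very different estimates depending only on how unequal the environments are, which is why any such percentage is specific to a given time, place, and population.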
In recent years the interest of psychologists and educators seems to have shifted away from attempts to estimate the relative role of nature and nurture in intellectual development; it now seems to be focused on analyzing more incisively which elements of an environment are crucial in fostering optimum intellectual development. At least for the short haul, this seems to be a more productive enterprise.
Robert L. Thorndike
[Directly related are the entries Achievement testing; Aptitude testing; Intellectual development. Other relevant material may be found in Counseling psychology; Developmental psychology, article on a theory of development; Educational psychology; Psychometrics; and in the biographies of Binet; Terman.]
Binet, Alfred 1911 Nouvelles recherches sur la mesure du niveau intellectuel chez les enfants d’école. Année psychologique 17:145-201.
Binet, Alfred; and Simon, Th. 1908 Le développement de l’intelligence chez les enfants. Année psychologique 14:1-94.
Cattell, Psyche 1940 The Measurement of Intelligence of Infants and Young Children. New York: Psychological Corp.
Davis, Allison; and Eells, Kenneth 1952-1953 Davis-Eells Games: Davis-Eells Test of General Intelligence or Problem-solving Ability. Tarrytown-on-Hudson, N.Y.: World Book.
Erlenmeyer-Kimling, L.; and Jarvik, Lissy F. 1963 Genetics and Intelligence: A Review. Science 142:1477-1479.
Ghiselli, Edwin E.; and Brown, Clarence W. 1948 The Effectiveness of Intelligence Tests in the Selection of Workers. Journal of Applied Psychology 32:575-580.
Guilford, Joy P. 1957 A Revised Structure of Intellect. University of Southern California, Reports From the Psychological Laboratory No. 19.
Guilford, Joy P. 1959 Three Faces of Intellect. American Psychologist 14:469-479.
Hawes, Gene R. 1964 Educational Testing for the Millions. New York: McGraw-Hill.
Honzik, Marjorie P.; Macfarlane, Jean W.; and Allen, Lucille 1948 The Stability of Mental Test Performance Between Two and Eighteen Years. Journal of Experimental Education 17:309-324.
Intelligence and Its Measurement: A Symposium. 1921 Journal of Educational Psychology 12:123-147, 195-216.
Jones, Harold E. (1946) 1954 The Environment and Mental Development. Pages 631-696 in Leonard Carmichael (editor), Manual of Child Psychology. 2d ed. New York: Wiley.
Kelley, Truman L. 1927 Interpretation of Educational Measurements. Yonkers-on-Hudson, N.Y.: World Book.
McNemar, Quinn 1942 The Revision of the Stanford-Binet Scale. Boston: Houghton Mifflin.
Pinneau, Samuel R. 1961 Changes in Intelligence Quotient: Infancy to Maturity. Boston: Houghton Mifflin.
Raven, J. C. 1958 Guide to Using the Coloured Progressive Matrices (1947): Sets A, Ab, B. London: Lewis.
Skodak, Marie; and Skeels, Harold M. 1949 A Final Follow-up Study of One Hundred Adopted Children. Journal of Genetic Psychology 75:85-125.
Spearman, Charles E. 1927 The Abilities of Man: Their Nature and Measurement. London: Macmillan.
Stewart, Naomi 1947 A.G.C.T. Scores of Army Personnel Grouped by Occupations. Occupations 26:5-41.
Terman, Lewis M. 1916 The Measurement of Intelligence. Boston: Houghton Mifflin.
Terman, Lewis M.; and Merrill, Maud A. (1937) 1960 Measuring Intelligence: A Guide to the Administration of the New Revised Stanford–Binet Tests of Intelligence. Boston: Houghton Mifflin.
Terman, Lewis M.; and Merrill, Maud A. 1960 Stanford–Binet Intelligence Scales: Manual for the Third Revision, Form L-M. Boston: Houghton Mifflin.
Thorndike, Robert L. 1963 Some Methodological Issues in the Study of Creativity. Pages 40-54 in Invitational Conference on Testing Problems, 1962, Proceedings. Princeton, N.J.: Educational Testing Service.
Thurstone, Louis L. 1938 Primary Mental Abilities. Psychometric Monographs No. 1.
Wechsler, David 1939 The Measurement of Adult Intelligence. Baltimore: Williams & Wilkins.
Wechsler, David 1949 Wechsler Intelligence Scale for Children. New York: Psychological Corp.
Wechsler, David 1955 Wechsler Adult Intelligence Scale (WAIS). New York: Psychological Corp.
Wittenborn, John R. et al. 1956 A Study of Adoptive Children. II: The Predictive Validity of the Yale Developmental Examination of Infant Behavior. Psychological Monographs 70, no. 2:59-92.
Woodworth, Robert S. 1941 Heredity and Environment: A Critical Survey of Recently Published Material on Twins and Foster Children. New York: Social Science Research Council.
Yale University, Clinic of Child Development 1940 The First Five Years of Life: A Guide to the Study of the Preschool Child. New York: Harper. → Contains “Early Mental Growth” by Arnold Gesell and “The Study of the Individual Child” by Arnold Gesell and Catherine S. Amatruda.
Yerkes, Robert M. (editor) 1921 Psychological Examining in the United States Army. Volume 15 of National Academy of Sciences, Memoirs. Washington: Government Printing Office.
"Intelligence and Intelligence Testing." International Encyclopedia of the Social Sciences. Encyclopedia.com. (February 21, 2018). http://www.encyclopedia.com/social-sciences/applied-and-social-sciences-magazines/intelligence-and-intelligence-testing