Statistical tables are the most common form of documentation used by the quantitative social scientist, and he should cultivate skill in table construction just as the historian learns to evaluate and cite documents or the geographer learns cartography. Table making is an art (as is table reading), and one should never forget that a table is a form of communication—a way to convey information to a reader. The principles of table making involve matters of taste, convention, typography, aesthetics, and honesty, in addition to the principles of quantification.
It is useful to distinguish between raw data tables and analytic tables, although the line between them is somewhat arbitrary. Raw data tables, for example, in census reports, serve a library function: they arrange and explain the figures in such a way as to make it easier for the user to find what he wants. Here principles of accuracy, completeness, and editorial style are important. The reader can find full treatment of these topics in a number of standard sources (see, for example, U.S. Bureau of the Census 1949). It will be sufficient here to stress the importance of showing raw data whenever this is possible. In a research report, basic data can sometimes be presented in appendix tables, and sometimes they can be presented graphically. Sometimes basic data can be deposited with the American Documentation Institute or a similar organization to facilitate public access. As a last, and obviously least satisfactory, resort, a statement that basic data may be obtained from the author should be appended to a research report.
In analytic tables, on the other hand, the data are organized to support some assertion of the author of the research report. Since numbers do not speak for themselves, analytic tables require careful planning and oblige the table maker to steer a course between art and artifice. Without art, he may fail to convey his evidence to the reader, but technique can also be used to deceive. Thus, the most important rule of table making is this: Arrange the table so the reader may both see and test the inferences drawn in the text.
Among analytic tables, the most common in many of the social sciences are percentage tables. The percentage is an extremely useful statistic in that it is familiar and meaningful to even relatively naive readers; it is highly analogous to the slightly more technical statistical concept of probability; and it is close enough to the raw data so that they can (if the denominator upon which the percentage is based is given) be reconstructed by a critical reader. Percentage tables thus meet the fundamental criterion: the reader may easily see and test the inferences of the author. Percentage tables, however, suffer from the drawback that they become cluttered and confusing unless they are well constructed. Occasionally a very large percentage table is required to present data that might better be summarized by two or three descriptive coefficients. Accordingly, it is important to make a considered decision between the use of percentage tables and the use of descriptive coefficients. If percentage tables (particularly large and complicated ones) are used, consideration must be given to the main principles and strategies for constructing them.
This article deals primarily with percentage tables. Most of the principles described, however, are also applicable to other analytic tables, in which the entries may be means, medians, sums of money, descriptive indexes, and so on. In particular, when dealing with these other sorts of tables, as well as when dealing with percentage tables, one should be sensitive to the importance of using clear, meaningful, and consistent units, of including an informative title and indication of the source of the data, and of making some arrangement to indicate the accuracy of the figures.
Most of these principles apply also to the presentation of tables of empirical distributions by absolute or relative frequencies; in these cases, decisions must be made on such matters as the width of class intervals and the location of class marks. [See Statistical analysis, special problems of, article on grouped observations; Statistics, descriptive, article on location and dispersion; see also Wallis & Roberts 1956, chapter 6; Yule & Kendall 1950, chapter 4.]
The simplest percentage table is one that presents the distribution of answers to a single question or observations on a single variable. Table 1 is a typical, although hypothetical, example that serves to illustrate a number of basic principles.
|Table 1 — Attitudes toward job*|
|I like it very much||58%|
|I like it somewhat||40|
|I dislike it||2|
|Number of respondents (N)||= 2,834|
|No answer||= 8|
|Not applicable (housewife or unemployed)||= 314|
|* Hypothetical responses to the question “In general, how do you feel about your job?”|
Very few people read a research report word by word from beginning to end. Many are interested in a particular chapter or sections; others are looking for one or two specific tables to answer some limited questions. Since it is a nuisance for these readers to have to read many pages of text in order to understand a table, each table presented should provide sufficient information to be meaningful by itself. This information should, of course, include the source or sources of the data presented in the table, if they originate elsewhere than in the present research.
Headings and footnotes. A table should usually have both a title (caption) and a number; the latter should consistently be used for reference, and in a lengthy monograph, it is convenient to articulate it with the chapter number (for example, Table 3.2 refers to the second table in chapter 3) or even with the page (for example, Table 277a refers to the first table on page 277).
The main title should be short and concrete; subtitles or footnotes should indicate clearly what the table describes. Table 1, for example, gives the wording of the question that was asked. If the data in a table are based upon an index, it should be described, usually in a footnote. Well-known indexes or scales need only be described by their names—for example, “Scores on the Stanford-Binet Intelligence Test, Form B.”
If the same items appear in a series of tables, the full information need not be given each time, although if an item reappears after a long gap, there should be a reference to the table containing the full explanation.
Some items, like the dichotomy male-female, are virtually unambiguous. Most questions, however, do have variant forms. Age may be reported to the last birthday or the next birthday, and educational attainment may be in terms of degrees acquired, years of schooling completed, and so on.
Percentages. The figures that appear in percentage tables are of two kinds: the percentages themselves and certain absolute numbers indicating frequencies (N’s). It is generally crucial to include both kinds of figures; it is also crucial to avoid confusion between the two. Many of the problems in interpreting percentage tables stem from a failure of the author to make this distinction clear. The following practices should be followed in presenting percentages in tables.
Per cent signs. A per cent sign (%) should be placed after the first percentage in a column of percentages that adds to 100 per cent, as shown in Table 1. This is literally redundant, since the column is labeled “per cent,” but in this situation redundancy serves the purpose of reminding the reader that he is examining percentages.
Totals. If the total of the percentages in an additive column is not exactly 100 per cent because of rounding, it is good practice to point this out in a footnote. (If the rounding produces a deviation of more than 1 per cent from 100, the arithmetic should be rechecked. By the same token, the last percentage in a column should be obtained by calculation rather than by subtraction; otherwise, a valuable check on the calculations will be destroyed.)
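The check described under Totals can be sketched in a few lines of Python. This is only an illustration: the function names and the frequencies are hypothetical (the frequencies were chosen so that they reproduce the percentages of Table 1), and every percentage, including the last, is calculated from its own frequency rather than obtained by subtraction.

```python
def percentage_column(counts, decimals=0):
    """Percentage each count on the column total, rounding every entry independently."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot percentage an empty column")
    return [round(100 * c / total, decimals) for c in counts]


def check_column(percentages, tolerance=1e-9):
    """Classify how far an additive column of percentages departs from 100."""
    deviation = abs(sum(percentages) - 100)
    if deviation <= tolerance:
        return "adds to 100"
    if deviation <= 1 + tolerance:
        return "footnote: rounding"
    return "recheck arithmetic"


# Hypothetical frequencies consistent with Table 1 (N = 2,834).
table_1 = percentage_column([1644, 1133, 57])
print(table_1, check_column(table_1))

# Three equal categories round to 33 each, so the column needs a footnote.
thirds = percentage_column([1, 1, 1])
print(thirds, check_column(thirds))
```

Because the last entry is computed like the others, a column that misses 100 by more than 1 signals an arithmetic error instead of being silently absorbed.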
Multiple responses. If a percentage table presents responses to a question to which multiple responses are permitted (for example, “What brands of coffee did you purchase in the past year?”), the percentages should not be presented as if they were additive. The frequently found notation that “percentages add to more than 100 per cent because of multiple answers” is misleading. Rather, each percentage is better treated as half a dichotomous response (for example, 40 per cent of the sample purchased a particular brand of coffee in the past year, and 60 per cent did not). If there are only a few possible responses, the percentages of respondents falling in the various patterns of response may be of interest.
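A minimal sketch of this treatment follows, assuming that each respondent's answers are recorded as a set of brand names. The brands, the five respondents, and the function name are hypothetical. Each brand is reported as its own dichotomy (purchased, did not purchase) instead of as one share of a column that is expected to add to 100.

```python
def brand_dichotomies(responses):
    """Return {brand: (per cent purchasing, per cent not purchasing)}.

    `responses` holds one set of brand names per respondent; a respondent
    who named no brand is represented by an empty set and still counts
    in the base N.
    """
    n = len(responses)
    brands = sorted(set().union(*responses))
    result = {}
    for brand in brands:
        purchased = 100 * sum(brand in r for r in responses) / n
        result[brand] = (purchased, 100 - purchased)
    return result


# Five hypothetical respondents.
answers = [{"A", "B"}, {"A"}, {"B", "C"}, set(), {"A", "C"}]
for brand, (yes, no) in brand_dichotomies(answers).items():
    print(f"Brand {brand}: {yes:.0f}% purchased, {no:.0f}% did not (N = {len(answers)})")
```

The "purchased" figures here sum to 140, which is unremarkable once each is read as half of a separate dichotomy.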
Decimals. The number of decimals to retain in presenting percentages (should 38 per cent, 37.8 per cent, or 37.8342 per cent be reported?) must be determined after careful thought. The case for the use of several decimals is based upon the following arguments: (1) a reader who wants to recalculate the data himself can be more accurate if more decimals are given, and (2) one’s own check will be more precise. The case against the use of several decimals is that they often give a spurious air of precision (69.231 per cent of 13 cases means simply nine cases), and that they usually add little information, since only in extraordinarily large samples do differences of 1 per cent or less have any meaning. The policy for retaining decimals should, in most cases, remain the same for all of the entries within an individual table, and usually the same within an entire report, except that relatively raw data tables in an appendix are usually more useful if the data are given to more decimals than they are in analytic tables in the body of the report.
Rounding. Rounding should be to the nearest number. If one decimal place is to be retained, and the original calculation is 76.42 per cent, 76.4 per cent should be used in the table; if the original calculation is 76.48 per cent, 76.5 per cent should be reported. If the original calculation is 76.45000 . . . per cent, an arbitrary convention must be established to guide the rounding. The usual convention is to always round to the even possibility. That is, 76.45000 . . . per cent becomes 76.4 per cent, and 76.75000 . . . per cent becomes 76.8 per cent. (See Croxton, Cowden & Klein 1967, for a discussion of rounding procedures.)
In some investigations, it is desirable to distinguish between a true zero (no observations or responses in a category) and a per cent rounded down to zero (for example, if only one decimal place is being retained, 0.04 per cent will be presented as 0.0 per cent). One convention is to use 0.0e if a true (e = exact) zero is intended. In other cases, the difference may not be important, especially if sampling fluctuation could easily change a true zero to some small nonzero per cent.
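The round-to-even convention can be illustrated with Python's standard `decimal` module, whose `ROUND_HALF_EVEN` mode implements it. The function name is hypothetical, and values are passed as strings so that a figure such as 76.45 is treated as an exact decimal; binary floating-point numbers cannot represent it exactly, so rounding floats directly may not follow the convention.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_percentage(value, places=1):
    """Round a percentage to `places` decimals, sending exact ties to the even digit."""
    quantum = Decimal(10) ** -places  # Decimal("0.1") when places == 1
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_EVEN)


for raw in ("76.42", "76.48", "76.45", "76.75"):
    print(raw, "->", round_percentage(raw))
```

The first two values are ordinary cases of rounding to the nearest number; only the last two are ties, and each goes to the even final digit.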
Sampling variability. If the data reported in a table are derived from a probability sample, it is usually very desirable to indicate the extent of sampling variability. Sometimes this is done by adding an indication of standard deviation (for example, 76.5 ±3.2), but this tends to clutter tables and has the danger of creating misunderstandings. (Is 3.2 the estimated standard deviation, the half length of a 95 per cent confidence interval, or what?) A device favored by the U.S. Bureau of the Census is to give approximate confidence interval widths for per cents in various brackets in a footnote or in a small auxiliary table. (In a particular case, for example, such a footnote might indicate that if a per cent lies between 40 and 60, its 95 per cent confidence interval half width is about 5.) If the tables come from a sample that is not based upon probability sampling, the basic problem of sampling fluctuation becomes much more difficult [see Sample surveys, article on nonprobability sampling]. Errors other than sampling ones are often very important but are typically discussed in the text rather than in a table [see Errors, article on nonsampling errors]. Of course, if there exists a bias whose magnitude might seriously affect a table’s interpretation, it is good practice to refer to the relevant text discussion in a footnote to the table.
Frequencies. The absolute frequencies that appear in a percentage table are of three kinds: (1) the total number of individuals or cases in the study, (2) the total number of individuals or cases upon which percentages are based (often called N, or base N), and (3) the cases that are excluded from the percentaging for one reason or another.
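A small auxiliary table of this kind might be generated as follows. The sketch assumes simple random sampling and the usual normal approximation, so it is not a description of the Census Bureau's own procedure; the function name and the sample size of 400 are hypothetical.

```python
import math


def half_width_95(percent, n, z=1.96):
    """Approximate half width, in percentage points, of a 95 per cent
    confidence interval for a percentage based on n cases.

    Assumes simple random sampling and the normal approximation.
    """
    p = percent / 100
    return 100 * z * math.sqrt(p * (1 - p) / n)


# Hypothetical base N of 400; one line per illustrative percentage.
for pct in (10, 25, 50, 75, 90):
    print(f"{pct:>3}%  +/- {half_width_95(pct, 400):.1f}")
```

With 400 cases a percentage near 50 has a half width of about 5 points, which matches the order of magnitude of the footnote example above. Designs with clustering or stratification would call for a different calculation.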
Some important reasons for excluding cases are inapplicability (for example, in Table 1, an unemployed person cannot have an opinion about his present employment), the refusal of the respondent to answer the question (or the failure of the interviewer to ask or to record it), the inability of the interviewer to locate the respondent (the persistent not-at-homes), and the inconsistency of a response. Cases of this type are often thrown together into a single No Answer or Not Applicable category, but for some purposes (for example, in considering the magnitudes of possible biases) it is important to separate the reasons. The following practices are recommended in reporting Totals, N’s, No Answers, and Not Applicables.
Totals. Unless all cases are accounted for in every table in a series, the total number of cases in the study should be reported in every table. This enables the reader to detect (and perhaps the analyst to avoid) the all-too-common situation in which a result that appears to apply to the entire population studied is really based upon a small fraction of the total. It also forces the analyst to maintain a careful accounting of his cases.
N’s or base N’s. The number of cases upon which percentages are based should also appear in every table. This is necessary so that the reader may evaluate the reliability of the percentages and, if he desires, reconstruct the original frequencies from the percentages that are presented.
Discrepancies between Totals and N’s. Any discrepancies between Totals and N’s should be accounted for by specifying the number of No Answer and Not Applicable cases, as at the bottom of Table 1. It is poor practice to obtain No Answer figures by subtracting the N from the Total, since this destroys a practical check upon the calculations.
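The check can be sketched as below, on the assumption that each figure has been counted independently. The base N, No Answer, and Not Applicable figures are those of Table 1; the study total of 3,156 is an assumed value supplied for the illustration, since Table 1 as shown does not report one, and the function name is hypothetical.

```python
def accounting_discrepancy(total, base_n, no_answer, not_applicable):
    """Return the number of cases left unaccounted for (0 when the table balances)."""
    return total - (base_n + no_answer + not_applicable)


# Base N, No Answer, and Not Applicable from Table 1; total is assumed.
gap = accounting_discrepancy(total=3156, base_n=2834, no_answer=8, not_applicable=314)
print("unaccounted cases:", gap)
```

If the No Answer count had instead been obtained by subtraction, the discrepancy would be zero by construction and the check would detect nothing.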
The percentaging of No Answers. Some survey analysts include No Answers as a category to be percentaged. This seems inadvisable, since it introduces a logically separate dimension into the classification. The proportion of No Answers depends more strongly on the research procedure than it does on the population being studied. It should be noted that in opinion research the category No Opinion is not the same as No Answer. Respondents who have neither a positive nor a negative opinion on an issue should usually be included in a category to be percentaged, since, for example, differences over time or between subgroups in the proportion of people with no opinion on an issue constitute substantive findings. [For a discussion of the dangers of a large number of No Opinions and of the effects of No Answers, see Errors, article on nonsampling errors.]
Although few analysts follow the convention, in the analysis of sample surveys, a case can be made for including in the No Answer category respondents who were selected for the sample but never interviewed because of refusals, absences from home, and the like. This practice would provide a fairer picture of the coverage of the data than the standard procedure, which is to describe the completed interviews as the Total and to count as No Answers only instances of skipped questions, illegible answers, refusals to answer a question, and so forth.
The reporting of frequencies. As a general rule it is inadvisable to present the absolute frequencies for the various categories being percentaged unless there is some compelling reason to do so (for example, if the frequencies are the basic data for a statistical test that will be discussed). In general, frequencies should be reported only if they represent 100 per cent (Tables 2 and 3).
|Table 2 — Usually wrong!|
|Table 3 — Right!|
|N = 75|
There are two reasons for preferring Table 3 over Table 2. First, an absolute frequency that is not presented cannot be confused with a percentage (for example, it is not difficult to read 49 per cent instead of 65 per cent for the top line of Table 2, and the confusion is even more likely to arise if the table is more complicated). Second, no additional information is provided by the individual N’s, since the interested reader can calculate the individual N’s if he is given the base N and the percentages.
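The second reason can be made concrete with a short sketch. The base N of 75 and the 65 per cent figure come from the discussion of Tables 2 and 3; the complementary 35 per cent and the function name are assumptions made for the illustration.

```python
def reconstruct_frequencies(base_n, percentages):
    """Recover the individual frequencies from a base N and rounded percentages."""
    return [round(p * base_n / 100) for p in percentages]


# 65 per cent of 75 cases is 48.75, which can only correspond to 49 cases.
frequencies = reconstruct_frequencies(75, [65, 35])
print(frequencies, "sum =", sum(frequencies))
```

The reconstruction is reliable only when the rounding error of the percentage is smaller than the share of a single case, which holds for a base as small as 75 when percentages are given to whole numbers.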
Two-variable tables present a number of choices for arrangement and layout that do not appear in one-variable tables; Table 4 is a typical example. The following practices are suggested for constructing tables of this type.
|Table 4 — Age and cumulative grade-point averages*|
|* Hypothetical data.|
|Grade-point average (grouped)||20 or under||Over 20|
|N = 2,705 + 3,110 = 5,815|
|No answer on:|
Arrangement and layout. There is no consistent opinion on the question of whether percentages should run down the columns (as in Table 4) or across the rows. The former is generally preferable, both because it is consistent with the normal pattern of columns of additive figures and because it places the independent variable (the so-called “causal” variable) at the head of the table and the dependent variable (the variable the analyst is seeking to explain) at the side of the table. (See Zeisel 1957, pp. 24-41, for a discussion of this preference.)
Which way to percentage. The construction of a two-variable table requires a decision as to which way the percentages are to be computed. For example, in constructing a table from the data used in constructing Table 4, should the table present the distribution of grade scores by age (as in Table 4), or the distribution by age of students in the various grade-point groups, or the distribution of the entire sample by the six age by grade-point groups?
A number of considerations enter into the decision. If, for example, the two samples of students (of differing ages) were drawn according to separate sampling plans, that is, if they were considered as separate populations for sampling purposes, then ordinarily one would compute percentages separately for each age group.
Under the assumption that this sampling issue does not arise, the choice of method of presentation depends on which is most useful for the purposes of the analysis. If the analyst is primarily concerned with the question of how grade-point averages differ between younger and older students, then the method used in Table 4 is most appropriate. If, on the other hand, interest is centered on the age distributions of separate grade-point groups (as might be the case, for example, if the age composition of projected remedial classes is of interest), then percentages should be computed the other way.
Thus no general rule can be given, and indeed it is often desirable to compute the percentages both ways. (A detailed discussion of the issue is given in Zeisel 1957, pp. 24-41.) Consistency within a report is often a consideration. For example, in analysis of political party preferences, it might be better always to use “per cent Democratic,” that is, to let party preference always be the dependent variable, rather than to shift back and forth.
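The three ways of percentaging the same cross tabulation can be compared in a brief sketch. The frequencies below are hypothetical (they are not the data of Table 4), and the function names are illustrative; rows stand for grade-point groups and columns for the two age groups.

```python
def column_percentages(table):
    """Percentage each cell on its column total (the layout of Table 4)."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * cell / col_totals[j] for j, cell in enumerate(row)] for row in table]


def row_percentages(table):
    """Percentage each cell on its row total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]


def total_percentages(table):
    """Percentage each cell on the grand total."""
    grand = sum(sum(row) for row in table)
    return [[100 * cell / grand for cell in row] for row in table]


# Hypothetical frequencies: rows are high, medium, low grade-point groups;
# columns are "20 or under" and "Over 20".
counts = [[30, 20], [50, 40], [20, 40]]

print("by age group:  ", column_percentages(counts)[0])  # high group within each age
print("by grade group:", row_percentages(counts)[0])     # ages within the high group
print("of all cases:  ", total_percentages(counts)[0])
```

The same top row reads 30 and 20, 60 and 40, or 15 and 10 per cent depending on the base chosen, which is why the table must make the direction of percentaging unmistakable.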
Ordering and placing categories. If the independent variable has no intrinsic order among its categories, the results will typically be clearer if the entries are arranged according to increasing or decreasing values of the dependent variable. Table 5 demonstrates this point; an alternate plan would be to group the fields of study by category (physical sciences, social sciences, humanities) and then order the fields within categories by the frequency of the percentages reporting the dependent variable. Although alphabetical ordering is sometimes desirable, it is the least efficient method in a table containing as few groups as Table 5.
|Table 5 — Early consideration of graduate study, by field|
|Field of graduate study||Percentage of graduate students reporting having considered graduate study in field before Junior year in college||N*|
|* NA not presented here, because data are part of a larger tabulation in the original.|
|Source: National Opinion Research Center 1962, p. 222.|
Dichotomous dependent variables. When a dependent variable is dichotomous, as in Table 5, a decision has to be made as to whether to present both percentages. The style of Table 5 is generally preferable; the percentage of students who reported that they had not considered graduate study in the field is not presented, on the grounds that it is easily computable by the reader through subtraction, and its inclusion would have cluttered the table with unnecessary information. As a general rule, a table should contain as few numbers as possible without excluding vital information.
Per cent signs. It is important that the use of per cent signs in tables be consistent throughout a report. Conventions differ widely, but it is recommended that if a two-variable table reports only one half a dichotomy, a per cent sign should be placed after each percentage if there are only a few (as in Table 6) or after the first percentage in the column if there are many (as in Table 5). If there are many columns or rows, it may be best to eliminate per cent signs altogether, provided that the caption of the table or the headings on the columns or rows indicate clearly that the numbers in the table are percentages. An important consideration is whether or not actual numbers of cases (or amounts such as dollars) should also appear in the table; if they do, per cent signs serve an important role in preventing confusion. In any event, every total percentage should have a per cent sign.
|Table 6 — Percentage of students expecting professional employment after graduation, by field of study*|
|Field of study||Percentage expecting employment||N|
|* Hypothetical data.|
Reporting N’s. A two-variable table distributed by one variable contains more than a single base N (as in Table 6), and it may contain a large number of N’s (as in Table 5). It is essential that an N be presented for each row or column of additive percentages; in extremely complex tables, these may be provided in footnotes.
A special problem arises if some of the percentages in a table are based upon a large number of cases while one or more are based upon very few. Table 6 illustrates the problem.
The “100 per cent” figure for astrophysics in Table 6 is unreliable because it is based on only one case, and the unwary reader who does not read the N’s upon which percentages are based may mistakenly conclude that a strong difference has been found. One is tempted to protect the reader against this sort of error by omitting categories for which the data are unreliable. If information on one or more categories is excluded from a table, however, the reader is unable to regroup the results and make his own calculations. Furthermore, some readers with a strong interest in a particular category would prefer that any available data be presented, on the grounds that some information is preferable to none at all.
How should this problem be handled? A frequent procedure is to select a value of N below which results are considered unreliable. The value selected is, of course, arbitrary: 10 seems to be the lowest that is ever used, and 20 is much more common. There is one good argument for using at least 20; this is the lowest value at which a single case would make no more than a 5 per cent difference. Beyond selecting a value of 20 or more, three styles of presentation are possible: (1) percentages based upon lower values of N can be reported but placed in brackets (with a footnote explaining their meaning); (2) the N can be reported but the percentage replaced by a dash (again with an explanatory footnote); and (3) procedure (2) can be followed, but in addition to a dash signifying an unreliable percentage, the actual number of cases that would have been the numerator of the percentage can be given adjacent to the N, properly labeled. Each style has advantages and disadvantages; the use of brackets is probably the best compromise. (Note that if the data are from a probability sample, and if confidence intervals are reported for each percentage, the great width of the confidence interval when N is small serves as a warning of this problem.)
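The bracket style, and the arithmetic behind the threshold of 20, can be sketched as follows. The function names and the second example (45 of 90 cases) are hypothetical; the single astrophysics case is the one discussed for Table 6.

```python
def one_case_swing(n):
    """Percentage points by which one case moves a percentage based on n cases."""
    return 100 / n


def format_percentage(numerator, n, min_n=20):
    """Format a percentage, bracketing it when the base N falls below min_n.

    A footnote to the table should explain the meaning of the brackets.
    """
    if n <= 0:
        raise ValueError("base N must be positive")
    text = f"{round(100 * numerator / n)}%"
    return f"[{text}]" if n < min_n else text


print(one_case_swing(20))        # a single case is worth 5 points at N = 20
print(one_case_swing(10))        # and 10 points at N = 10
print(format_percentage(1, 1))   # the astrophysics entry of Table 6
print(format_percentage(45, 90)) # a hypothetical entry on an adequate base
```

The threshold is a parameter here because, as noted above, any particular value is arbitrary.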
Reporting No Answers. In a cross tabulation of two variables there are three types of No Answers to consider: (1) No Answers on the dependent variable, (2) No Answers on the independent variable, and (3) No Answers on both variables. As a check on calculations, it is necessary to count all three types to determine that the totals for each variable are correct. Whether to report the No Answers separately or collectively is a matter of individual taste. Two rules of thumb are (1) a total of 10 per cent or more of No Answers of all types is large enough to raise questions in the mind of a critical reader, and (2) No Answers should often be reported separately if one variable contributes a disproportionate share of the total. Table 4 illustrates how No Answers may be reported separately; it also shows that while the total proportion of No Answers is less than 10 per cent, the dependent variable (grade-point average) is the major contributor.
Many analyses of survey data require the introduction of a second independent variable, usually an “intervening” variable that specifies or elaborates the relationship between the independent and the dependent variable. Table 7 is a typical example.
If Table 7 is read across the rows, it shows the relationship between combat experience and anxiety at any particular educational level; if it is read down the columns, it shows the relationship between educational level and anxiety at any particular level of combat experience. (One could also look at the row and column totals; these are discussed below.) Since a frequent problem in survey research is to ascertain the relative influence of traits (educational achievement) and exposure to experience (combat), tables having the format of Table 7 are frequently necessary. (See the tables in Kendall & Lazarsfeld 1950; Berelson et al. 1954; Sills 1957; Hyman 1955; Stouffer [1935-1960] 1962; Lazarsfeld & Thielens 1958; Davis 1964.)
|Table 7 — Percentage of soldiers with critical scores on the anxiety index, by educational level and combat experience*|
|Educational attainment||Combat experience|
| ||None||Under fire||Actual combat|
|Grade school or less||40%||47%||57%|
|Attended high school||34%||42%||47%|
|High school graduate||20%||29%||36%|
|N = 2,187|
|* Numbers in parentheses are base N’s for the percentages.|
|Source: Adapted from Stouffer et al. 1949, p. 447.|
Dichotomizing the dependent variable. The most important principle underlying the construction of tables with two or more independent variables is that in spite of the fact that these tables are mathematically three-, four-, or five-dimensional, they must somehow be presented on two-dimensional paper. (The classic presentation of this idea is in Zeisel 1957, pp. 67-90.) This can always be done clearly, provided that one of the variables is dependent and that it can be expressed as a dichotomy.
Many variables used in survey research are natural dichotomies, such as voted-did not vote and employed-not employed. Ordered classes, such as low-medium-high, level of education, and military rank can always be dichotomized by combining categories. Items that consist of true qualitative (unordered) categories (such as religious affiliation, field of study, ethnic origin) present difficult problems that are discussed below.
When categories are combined, considerable information is necessarily lost. The analyst who constructs a table is thus placed in a dilemma. If he does not dichotomize the dependent variable, the table may become so complicated that the reader cannot follow it; if he does dichotomize it, the reader may be unable to determine the full relationship between the independent and the dependent variables. Indeed, thoughtless dichotomization may conceal important nonmonotonic relationships, more precisely known as nonisotropic relationships (see Yule & Kendall 1950, pp. 57-59).
There is no clear-cut resolution of the problem of dichotomization, although a few general guidelines can be set down. First, the analyst should inspect the raw data tables; if he is satisfied that the relationship is essentially monotonic, the percentage table should present the dependent variable as a dichotomy. Second, if a more complex form of relationship is found, the data are probably better presented by a graph or by a coefficient of some kind. Third, in cases of doubt, a reference should be made to the raw data tables in the appendix, so that the reader may draw his own conclusions about the wisdom of the dichotomization. The analyst should remember, however, that rather large samples are required to detect complex relationships reliably and that all the little “jiggles and bounces” in the data are not grounds for excitement.
Ordered classes should be dichotomized as closely as possible to the median (the cutting point that splits the sample into two groups of equal size). This rule is useful, but it has exceptions. For example, if a study concerns the political participation of young people, the age variable should probably be dichotomized at age 21 (or whatever is the local legal age for voting), even if it is not the median, because it provides a particularly meaningful cutting point. Cutting points that leave only a handful of cases in one group should be avoided (for example, it would usually be foolish to dichotomize economic status into “millionaires” and “all others”).
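The median rule can be sketched as a search over the possible cutting points of an ordered classification. The class frequencies and the function name are hypothetical, and the sketch covers only the mechanical rule; substantive cutting points, such as the voting age, must still be chosen by the analyst.

```python
def median_cut(counts):
    """Return the index i at which splitting ordered classes into
    counts[:i] and counts[i:] comes closest to two groups of equal size.

    Returns None if there are fewer than two classes. Ties go to the
    lower cutting point.
    """
    total = sum(counts)
    best_index, best_gap = None, None
    running = 0
    for i in range(1, len(counts)):
        running += counts[i - 1]
        gap = abs(running / total - 0.5)
        if best_gap is None or gap < best_gap:
            best_index, best_gap = i, gap
    return best_index


# Hypothetical frequencies for five ordered classes, lowest to highest.
classes = [10, 30, 30, 20, 10]
cut = median_cut(classes)
print("low group:", sum(classes[:cut]), "high group:", sum(classes[cut:]))
```

Here the best available split is 40 against 60 cases; an exact 50-50 division is often impossible with grouped data.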
Arrangement and symbols. In presenting three-variable tables, the data should be arranged so that the most important comparisons are those between adjacent numbers, because such comparisons are easier for the reader to make than are comparisons between nonadjacent numbers (see the first part of Table 8). Thus, in planning a complex percentage table, it is often useful to place the base N’s in parentheses, below and to the right of the percentages. Table 8 presents the preferred and two less preferable ways of presenting base N’s. In the second example in Table 8, the base N (745) intervenes between the two comparisons in the first column; in the third example it intervenes between the two comparisons in the first row; in the first example, neither comparison is obstructed. Furthermore, the reader interested in examining the raw numbers can, if the first example is followed, compute the correlation between the two independent variables by comparing bases that are adjacent to each other in rows and columns. For example, the base N’s given in Table 7 make it possible to compute the raw-data (marginal) association between educational level and combat experience.
|Table 8 — Examples of preferred and nonpreferred methods of displaying base N’s|
In general, one independent variable should be displayed in the columns and the other in the rows (see Table 7). This makes for fewer intervening numbers than an all-column or all-row arrangement.
In three-variable tables, it is recommended that a per cent sign be placed adjacent to every percentage in the table. If no per cent signs are used, confusion with the base N’s can result; if only the top row of percentages has per cent signs, the non-additive nature of the percentages may not be apparent.
The use of separate tables. Table 7 presents two partial relationships (between education and anxiety, controlling for combat experience; and between combat experience and anxiety, controlling for education). It does not, however, present the zero-order (two-variable) relationships between the independent and the dependent variables. It is tempting to include these by adding a Totals column giving anxiety percentages by educational level, regardless of combat experience, and a Totals row giving anxiety percentages by amount of combat experience, regardless of educational attainment.
The resulting table would be compact, but it would have drawbacks. First, it would present two mathematically distinct statistical relationships (zero-order and partial relationships) without making a clear visual distinction between them. Second, the natural way of reporting a research finding is to begin with zero-order relationships and then discuss the partials. If Total columns and rows are included, the reader is, in effect, asked to read them while ignoring the partials in the interior of the table.
For these reasons it is generally preferable to present a series of discrete tables. In the present instance, the following sequence is suggested:
Table a—Education and anxiety; Table b—Combat experience and anxiety; Table c—Education and combat experience; Table d—Table 7. This procedure would be space-consuming, but each table would tell a simple and distinct story. In fact, if this sequence is used, the text almost writes itself:
There is an association between education and anxiety (Table a). But combat experience is also associated with anxiety (Table b), and the less-educated soldiers are a little more likely to have been in actual combat (Table c). However, the educational difference is not a spurious effect of differentials in combat, for when combat experience is controlled, the educational difference remains (Table 7).
In practice, one would provide a fuller description of the tables, explaining the variables and the relationships in more detail, but these four tables provide the basic skeleton for a verbal description of the findings.
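As a rough sketch of how the four tables in this sequence relate to a single set of cell counts, the following Python fragment derives Tables a, b, and c by collapsing the cells of the three-variable table (Table d). The counts are hypothetical and chosen only so that the pattern matches the verbal summary above; they are not the figures of Table 7, and the names `cells`, `pct`, and `zero_order` are assumptions made for illustration.

```python
# Hypothetical counts: cells[(education, combat)] = (number anxious, base N)
cells = {
    ("high", "combat"): (40, 100),
    ("high", "no combat"): (30, 150),
    ("low", "combat"): (90, 150),
    ("low", "no combat"): (40, 100),
}


def pct(positive, base_n):
    """Percentage positive among base_n cases."""
    return 100 * positive / base_n


def zero_order(cells, position):
    """Collapse over the other variable; position 0 = education, 1 = combat."""
    totals = {}
    for key, (positive, base_n) in cells.items():
        level = key[position]
        p, n = totals.get(level, (0, 0))
        totals[level] = (p + positive, n + base_n)
    return {level: pct(p, n) for level, (p, n) in totals.items()}


# Table a: education and anxiety (zero-order)
table_a = zero_order(cells, 0)
# Table b: combat experience and anxiety (zero-order)
table_b = zero_order(cells, 1)
# Table c: education and combat experience (per cent with combat, by education)
table_c = {
    edu: pct(
        cells[(edu, "combat")][1],
        cells[(edu, "combat")][1] + cells[(edu, "no combat")][1],
    )
    for edu in ("high", "low")
}
# Table d: the three-variable table of partial relationships
table_d = {key: pct(*value) for key, value in cells.items()}

for name, tbl in [("a", table_a), ("b", table_b), ("c", table_c), ("d", table_d)]:
    print(name, tbl)
```

With these invented counts the educational difference in anxiety is 24 percentage points in Table a and 20 points within each level of combat experience in Table d, which is the kind of pattern the sample text describes.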
It should be remembered that even highly educated people are often poor readers of tables. Accordingly, important tables should be inserted in the text, not at the end of a report or in an appendix; furthermore, only essential tables should appear at all.
Dependent variables that cannot be dichotomized. There remains the problem of how to present in tabular form dependent variables that are true trichotomies, as well as other qualitative classifications that cannot reasonably be dichotomized. If the style of the dichotomy table (for example, Table 8) is followed, the result would be in the form of Table 9, in which major field of study is the dependent variable.
|Table 9. Three-variable table with a trichotomized dependent variable|
|INDEPENDENT VARIABLE B|
|Natural sciences||%||Natural sciences||%|
|Yes||Social sciences||%||Social sciences||%|
|INDEPENDENT VARIABLE A||(N)||(N)|
|Natural sciences||%||Natural sciences||%|
|No||Social sciences||%||Social sciences||%|
A table format such as that of Table 9 means that column comparisons have a large number of “intervening” figures, and if one is concerned with the effect of variable A as well as of variable B, the table is hard to read. Although more space is consumed, it is often better to break the table down into subtables, one for each category of the dependent variable, particularly if the nature of the difference varies between the categories of the dependent variable (for example, social science majors vary with A, natural science majors vary with B, and humanities majors vary with neither). Table 10 shows the format of a table of this kind.
Finally, a trichotomized dependent variable can be presented in two dimensions, without collapsing, by the use of triangular coordinate graph paper (see Coleman 1961, p. 29; Davis 1964, especially pp. 95-97). This is a very useful procedure, but it must be carefully explained to readers. Other graphical devices have been suggested; for example, see Anderson 1957.
Four-or-more variable tables
If there are four, five, or six independent variables in one table, no new problems arise, but the old ones become intensified. Such tables have more headings, and they usually have smaller case bases and greater problems of dichotomization.
Reading a four-variable table is a difficult task. To assist the reader, a number of subtables should be used (as in Table 10). Also, the analyst must be sure to explain the table and the findings it presents very carefully in the text. The technique of “talking through” a table should be employed, as in this example (referring to Table 11):
Beginning in the upper left-hand cell of the table, observe the entry 58 per cent, which is the percentage employed among fathers under 27 years of age attending private schools. Following across that row, the percentages increase from 58 to 69. A similar trend is found in each row, which means that when family role and type of school attended are held constant, the likelihood of employment increases with age. Turning now to the effect of family role. . . .
Note that the above hypothetical text matter is not an interpretation of the results but, rather, a translation of them from percentages into words. Such a description may justifiably run to several hundred words. (An extreme example is Davis 1964, which consists entirely of the discussion of a single table that cross tabulates nine variables.)
Focusing on one independent variable. One variable among the three or more independent variables is often singled out for attention. Given the fact that variables A, B, and C are all correlated with variable D, the real interest may be in whether C is still correlated with D when A and B are held constant rather than in the independent effects of A and B. Accordingly, if the analyst wants to focus attention on one of the independent variables, he should present it alone in the columns of the table and use the rows for combinations of the other variables, as in Table 11.
Each row in Table 11 shows an age contrast, controlling for family role and type of school. In order to examine other effects, the reader must shift his eyes between the columns and rows.
The percentage-difference table. An extension of the above presentational strategy is the percentage-difference table, which presents the effects of a given variable for various combinations of control variables. This table consists of rows and columns laid out according to the control variables, with the entries consisting of the percentage difference in the dependent variable produced by the test variable. Table 12 is an example.
|Table 12. Percentage differences in full-time employment for the age dichotomy in Table 11*|
|TYPE OF SCHOOL|
|* Percentage employed among those students who are 27 or older minus the percentage employed among those under 27.|
Table 12 demonstrates that age has a positive effect in each of the four control categories. In practice, one would probably not present a percentage-difference table in a situation where there are so few control categories, but when there are many categories, a difference table can make clear what otherwise appears to be an obscure pattern. In particular, a difference table can reveal complex patterns of interactions. Table 13 is a hypothetical example.
Table 13 summarizes the following complex relationship: “Cramming the night before an examination is associated with better final grades only among the students with high IQs who had done fairly well at midterm; the relationship is the same for men and women.” A complicated pattern such as this might well have been lost if a standard five-variable table format had been used.
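The construction of a difference table of this kind can be sketched as follows. The percentages below are invented for illustration and are not the entries of Table 13; the names `grades` and `difference_table` are likewise hypothetical. Each entry of the difference table is the percentage receiving high grades among crammers minus the percentage among moviegoers, for one combination of the control variables.

```python
# Hypothetical percentages receiving high grades.
# Key: (sex, IQ, midterm standing); value: (crammers, moviegoers).
grades = {
    ("men", "high IQ", "good midterm"): (70, 50),
    ("men", "high IQ", "poor midterm"): (40, 40),
    ("men", "low IQ", "good midterm"): (45, 45),
    ("men", "low IQ", "poor midterm"): (20, 20),
    ("women", "high IQ", "good midterm"): (75, 55),
    ("women", "high IQ", "poor midterm"): (42, 42),
    ("women", "low IQ", "good midterm"): (48, 48),
    ("women", "low IQ", "poor midterm"): (22, 22),
}


def difference_table(percentages):
    """Effect of the test variable within each combination of controls."""
    return {
        controls: test - comparison
        for controls, (test, comparison) in percentages.items()
    }


diffs = difference_table(grades)

# Control combinations in which cramming makes any difference at all.
nonzero = sorted(controls for controls, d in diffs.items() if d != 0)

for controls, d in diffs.items():
    print(controls, f"{d:+d}%")
```

In this invented example the only nonzero entries fall in the high-IQ, good-midterm cells, and they are equal for men and women, so the interaction is visible at once in eight numbers rather than sixteen.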
The passive role played by sex in Table 13 raises the question of whether or not data should be presented when there are neither effects nor interactions. In general the answer is No, since these no-effect variables add to the complexity of multi-variable tables without adding to their content. In particular, one should avoid the not uncommon practice of maintaining a variable throughout a report simply because it was introduced at an early stage in the analysis and there is reluctance to take the time to recalculate the data. This will avoid the fallacy of pseudo rigor, a tendency to make research appear more meticulous than it is by controlling for irrelevant variables. Nevertheless, there are situations when it is interesting and important to present negative results in tabular form. By presenting such results, the analyst may answer questions that may be in the reader’s mind and may prevent others from making unnecessary calculations. In the case of Table 13, for example, it might be argued that there is substantial evidence from other research about sex differences in academic achievement and, accordingly, that it is worthwhile to show that this finding holds for both men and women.
|Table 13. Percentage differences between crammers and moviegoers in final course grades*|
|* Hypothetical data showing percentage receiving high grades among those who crammed the night before the examination minus the percentage among those who went to the movies.|
* Numbers in parentheses are base N’s for the percentages.
Two important limitations of percentage-difference tables stem from the inherent properties of percentages. First, these tables are appropriate only if both the dependent variable and the independent variable are dichotomized (the range in percentages of the dependent variable produced by variation of the independent variable over multiple levels is not an acceptable measure). Second, the absolute values of the difference must be interpreted with extreme care. When the two percentages are either very large or very small, a slight difference between them may represent as strong an effect as a larger difference when they both are of medium size. Thus, for example, if one calculates the coefficient of association Q for the data in Table 12, one finds that the Q value associated with the 5 per cent in the lower right-hand corner (Q = .40) is a little higher than for the 15 per cent in the lower left-hand corner (Q = .36). The reason for this difference is that the former is based on two small percentages (9 per cent and 4 per cent), while the latter is based on moderate percentages (35 per cent and 20 per cent). If the size of the effects is important, it is better to use other coefficients of association than the percentage difference. [See Statistics, Descriptive, article on association.]
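The comparison of Q values can be checked with a short calculation. The sketch below assumes that Q is Yule’s coefficient, Q = (ad - bc)/(ad + bc), applied to the fourfold table formed by the two percentages and their complements; since Q is unchanged when a row of the table is multiplied by a constant, the percentages can stand in for the raw frequencies. The function name `yules_q` is an illustrative choice.

```python
def yules_q(pct_first, pct_second):
    """Yule's Q for two groups, given the per cent positive in each.

    The fourfold table has rows (pct, 100 - pct) for each group, and
    Q = (a*d - b*c) / (a*d + b*c).
    """
    a, b = pct_first, 100 - pct_first
    c, d = pct_second, 100 - pct_second
    return (a * d - b * c) / (a * d + b * c)


# The two pairs of percentages quoted in the text for Table 12.
q_small = yules_q(9, 4)     # 5-point difference between small percentages
q_medium = yules_q(35, 20)  # 15-point difference between moderate percentages

print(f"{q_small:.3f} {q_medium:.3f}")
```

Under this assumption the two values come out at roughly .41 and .37, which agree with the figures quoted in the text when truncated to two decimals, and the smaller percentage difference does carry the slightly larger Q.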
Comparisons between independent variables. A different situation obtains when the purpose of a table is to show that each of several independent variables makes a difference in the dependent variable, a situation in survey analysis that is somewhat akin to multiple correlation.
The Rossi stratagem. Given the limitations of geometry, as more comparisons are desired, it becomes increasingly difficult to present all the relevant comparisons adjacent to each other. Peter H. Rossi has developed a way to do this for three independent variables (Table 14). The table has an unorthodox appearance, but it has several advantages over any other form of presentation. First, each vertical comparison involves a sex difference, controlling for the other two variables. Second, each horizontal comparison involves an age difference, controlling for the other two variables. Third, each diagonal comparison (represented by broken lines) involves an educational difference, controlling for the other two variables. Fourth, there are no intervening percentages between any two percentages that are to be compared.
Weaker effects “inside” stronger ones. When there are four or more independent variables, even the stratagem developed by Rossi and illustrated in Table 14 breaks down. In such instances, it is impossible to place percentages that are to be compared adjacent to each other. When the variables differ considerably and consistently in their percentage effects, the weaker effects should be placed “inside” the stronger effects.
An example using hypothetical data will make the idea clear. Consider four dichotomous independent variables, A, B, C, and D, that produce the following consistent percentage differences in dependent variable E:
A = 5% ; B = 5% ; C = 25% ; D = 25%.
It is desired to construct a table showing the simultaneous effects of the four independent variables upon the dependent variable, E. Following the rule of placing the weaker effects “inside” the stronger effects, A and C are paired, as are B and D. The table is then designed so that variable A is nearer the cell entries than variable C, and variable B is “inside” variable D (Table 15, where Ā, for example, means “not A”).
|Table 15 — Percentage of respondents scored positively on variable E, by variables A, B, C, and D*|
|* Hypothetical data.|
|B||25%||30%||50%||55%|
|B||5%||10%||30%||35%|
|B||0%||5%||25%||30%|
The meaning of Table 15 is clear at a glance. The percentages increase steadily up each column and across each row. Therefore, it is apparent that each independent variable adds to the percentage scoring positively on the dependent variable, E. Suppose, however, that the rule for the placing of variables was violated, and the weaker effects were placed “outside” and the stronger effects “inside” (Table 16).
|Table 16. Percentage of respondents scored positively on variable E, by variables A, B, C, and D*|
|* Hypothetical data.|
Tables 15 and 16 present identical information, but the meaning of the data as they are displayed in Table 16 is not at all clear at a glance. The reader must take the time to make his own specific percentage comparisons if he wishes to verify the statements made in the text of the report.
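The contrast between the two arrangements can be reproduced with a small sketch. It assumes, as the hypothetical example implies, that the four effects are strictly additive and that the percentage is zero when all four variables are absent; the function names `pct`, `layout`, and `is_monotone` are illustrative only.

```python
from itertools import product

# Percentage effects stated in the text for the hypothetical example.
EFFECTS = {"A": 5, "B": 5, "C": 25, "D": 25}


def pct(levels):
    """Per cent positive on E, assuming additive effects and a base of 0."""
    return sum(EFFECTS[name] * present for name, present in levels.items())


def layout(outer_col, inner_col, outer_row, inner_row):
    """Cell percentages with rows from top to bottom, columns left to right.

    The 'inner' variable of each pair changes fastest, so it lies
    nearest the cell entries; the top row has both row variables present.
    """
    rows = []
    for r_out, r_in in product((1, 0), repeat=2):
        row = []
        for c_out, c_in in product((0, 1), repeat=2):
            row.append(pct({outer_col: c_out, inner_col: c_in,
                            outer_row: r_out, inner_row: r_in}))
        rows.append(row)
    return rows


def is_monotone(rows):
    """True if percentages rise across every row and up every column."""
    across = all(r[i] < r[i + 1] for r in rows for i in range(len(r) - 1))
    upward = all(rows[i][j] > rows[i + 1][j]
                 for i in range(len(rows) - 1) for j in range(len(rows[0])))
    return across and upward


table_15 = layout("C", "A", "D", "B")  # weaker effects inside stronger ones
table_16 = layout("A", "C", "B", "D")  # rule violated

for name, tbl in (("Table 15", table_15), ("Table 16", table_16)):
    print(name, "monotone" if is_monotone(tbl) else "not monotone")
    for row in tbl:
        print("  " + "  ".join(f"{v:>3}%" for v in row))
```

Both layouts contain the same sixteen percentages. Only the first yields a steady progression along every row and column, which is what allows the reader to take in the pattern without computing individual differences.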
With actual data the percentage effects are seldom as consistent as those in Table 15, and such a perfect progression cannot always be displayed. However, if there are four or more variables whose independent effects are to be shown, it is always worth the time to seek the most lucid arrangement of the variables. Note that even in Table 11 the control variables are displayed so that there is a smooth progression up and down the columns, a style that adds considerably to ease of comprehension.
There is no method of tabular presentation that will make small differences any larger or trivial findings of substantive importance. Nevertheless, the presentation of percentage data in a form that will enable the reader to read them and test them is an important aspect of communicating the findings of survey research.
James A. Davis and Ann M. Jacobs
[See also Graphic Presentation]
American Psychological Association (1952) 1957 Publication Manual of the American Psychological Association. Rev. ed. Washington: The Association. → A style manual giving instructions chiefly for the journals of the American Psychological Association, but containing much generally useful material. Tabular presentation is described on pages 30-40.
Anderson, Edgar 1957 A Semigraphical Method for the Analysis of Complex Problems. National Academy of Sciences Proceedings 43:923-927.
Berelson, Bernard; Lazarsfeld, Paul F.; and McPhee, William N. 1954 Voting: A Study of Opinion Formation in a Presidential Campaign. Univ. of Chicago Press.
Chaundy, Theodor W.; Barrett, P. R.; and Batey, Charles 1954 The Printing of Mathematics: Aids for Authors and Editors and Rules for Compositors and Readers at the University Press. Oxford Univ. Press. → Described by the authors as the successor to G. H. Hardy’s pamphlet, Notes on the Preparation of Mathematical Papers, published in 1932. Deals chiefly with the setting of equations; pages 68-69 treat tables. Good exposition of mechanics of typesetting for statistical and technical authors.
Chicago, University of, Press (1906) 1956 A Manual of Style. 11th ed. Univ. of Chicago Press. → A standard manual of typographic and editorial practice. The chapter on tables, pages 158-172, gives a variety of examples from scholarly disciplines.
Coleman, James S. 1961 The Adolescent Society: The Social Life of the Teenager and Its Impact on Education. New York: Free Press. → See especially pages 29 ff. for examples of trichotomous data laid out on triangular coordinate graph paper.
Croxton, Frederick E.; Cowden, Dudley J.; and Klein, Sidney (1939) 1967 Applied General Statistics. 3d ed. Englewood Cliffs, N.J.: Prentice-Hall. → A standard text. Pages 45-59 of the 1967 edition cover table construction for the presentation of classified statistical data. Earlier editions were by Croxton and Cowden.
Davis, James A. 1964 Great Aspirations: The Graduate School Plans of America’s College Seniors. Chicago: Aldine. → See especially pages 53 ff. for examples of trichotomous data laid out on triangular coordinate graph paper and two-, three-, and four-variable tables of the kind discussed in this article.
Davis, James A. et al. 1961 Great Books and Small Groups. New York: Free Press.
Diexel, Karl 1936 Normung statistischer Tabellen. Institut International de Statistique, Revue 4:232-237.
Hall, Ray O. (1943) 1946 Handbook of Tabular Presentation; How to Design and Edit Statistical Tables: A Style Manual and Case Book. New York: Ronald Press.
Hyman, Herbert H. 1955 Survey Design and Analysis: Principles, Cases, and Procedures. Glencoe, Ill.: Free Press.
Kendall, Patricia L.; and Lazarsfeld, Paul F. 1950 Problems of Survey Analysis. Pages 133-196 in Robert K. Merton and Paul F. Lazarsfeld (editors), Continuities in Social Research: Studies in the Scope and Method of The American Soldier. Glencoe, Ill.: Free Press.
Lazarsfeld, Paul F.; and Thielens, Wagner Jr. 1958 The Academic Mind: Social Scientists in a Time of Crisis. A report of the Bureau of Applied Social Research, Columbia University. Glencoe, Ill.: Free Press.
Myers, John H. 1950 Statistical Presentation. Ames, Iowa: Littlefield.
National Opinion Research Center 1958 Survey of Graduate Students. Unpublished manuscript.
National Opinion Research Center 1962 Stipends and Spouses: The Finances of American Arts and Science Graduate Students, by James A. Davis et al. Univ. of Chicago Press.
Sills, David L. 1957 The Volunteers: Means and Ends in a National Organization. Glencoe, Ill.: Free Press.
Stouffer, Samuel A. (1935-1960) 1962 Social Research to Test Ideas: Selected Writings. New York: Free Press.
U.S. Bureau of Agricultural Economics (1937) 1942 The Preparation of Statistical Tables: A Handbook. Washington: Government Printing Office.
U.S. Bureau of the Budget, Office of Statistical Standards 1963 Statistical Services of the United States Government. Rev. ed. Washington: Government Printing Office. → See especially “Presentation of the Data.”
U.S. Bureau of the Census 1949 Manual of Tabular Presentation: An Outline of Theory and Practice, by Bruce L. Jenkinson. Washington: Government Printing Office. → See the review by Hall in the June 1950 issue of the Journal of the American Statistical Association.
Walker, Helen M.; and Durost, Walter N. 1936 Statistical Tables: Their Structure and Use. New York: Columbia Univ. Press.
Wallis, W. Allen; and Roberts, Harry V. 1956 Statistics: A New Approach. Glencoe, Ill.: Free Press.
Watkins, George P. 1915 Theory of Statistical Tabulation. Journal of the American Statistical Association 14:742-757.
Yule, G. Udny; and Kendall, Maurice G. (1911) 1958 An Introduction to the Theory of Statistics. 14th ed., rev. & enl. London: Griffin. → Kendall has been a joint author since the eleventh edition (1937). The 1958 edition was revised by him.
Zeisel, Hans (1947) 1957 Say It With Figures. 4th ed., rev. New York: Harper. → Designed to initiate the nonstatistical reader into survey analysis. Covers (inter alia) multidimensional tables, indexes, analysis of data by cross-tabulation. Copiously illustrated with tables and charts.
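[Table: hypothetical responses to the question ‘In general, how do you like sociology?’, broken down by sex. Response categories: I like it very much; I like it somewhat; I dislike it.]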
The most basic type of analytic table is the percentage table. The simplest percentage table is the univariate type, which presents the distribution of answers to a single question. A two-way (or two-variable) table shows the relationship between a dependent and an independent variable. For example, in the table above, the hypothetical responses to the question ‘In general, how do you like sociology?’ are broken down by sex. These hypothetical data would illustrate that male students are more positive about sociology than are female students. To be sure that the difference is not simply due to sampling error, it would also be necessary to include the associated significance test. However, no table can show whether or not the difference is of substantive importance, and the scientist must establish in the text why the results matter. See also CONTINGENCY TABLE.