Frequency Distributions

Every day, people are confronted with large amounts of information. Occasionally, the amount of data may be so large that they cannot interpret it in a meaningful way. When this occurs, people often need to organize and summarize the data so that they can uncover patterns or relationships and use the data as a tool for analysis and interpretation.

One way to organize large amounts of data is through frequency distributions. A frequency distribution is a summary of data that shows the frequency or number of times a particular observation occurs. To construct a frequency distribution, it is necessary to arrange the data into categories that “represent value ranges of the variables in question” (Fleming 2000, p. 15).

For example, imagine that you are responsible for creating a range of activities for a group of fifty people, and you want to organize the activities according to age. The following is a list of the ages of the individuals in the group:

4	15	20	18	4
22	36	7	24	11
27	33	13	7	17
18	30	25	9	23
11	16	14	5	31
31	36	22	15	24
13	11	22	9	11
28	21	8	14	18
34	13	17	16	7
23	29	5	16	20

The first step in constructing a frequency distribution is to arrange the data from the smallest to the largest value to determine, as in this case, the age range. Next, the different values need to be listed with the number of times a particular age appears—its frequency.
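The two steps just described, sorting the values and counting how often each one appears, can be sketched in a few lines of Python. This is only an illustration: the function name frequency_table is a hypothetical helper, and the ages are copied from the list above.

```python
from collections import Counter

# Ages of the fifty group members, copied from the list above.
ages = [
    4, 15, 20, 18, 4,
    22, 36, 7, 24, 11,
    27, 33, 13, 7, 17,
    18, 30, 25, 9, 23,
    11, 16, 14, 5, 31,
    31, 36, 22, 15, 24,
    13, 11, 22, 9, 11,
    28, 21, 8, 14, 18,
    34, 13, 17, 16, 7,
    23, 29, 5, 16, 20,
]


def frequency_table(values):
    """Return (value, frequency) pairs ordered from smallest to largest value."""
    counts = Counter(values)       # tally how often each value occurs
    return sorted(counts.items())  # order the pairs by value


table = frequency_table(ages)
print("Value\tFrequency")
for value, frequency in table:
    print(f"{value}\t{frequency}")
```

A quick check on any such table is that the frequencies add up to the number of observations, here fifty.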

Arranging the data in this way makes trends or patterns easier to see. However, even after the data have been organized into a frequency distribution, the result can still be overwhelming. By using what researchers refer to as “interval scores” or “class intervals,” frequency distributions become important tools to manage raw data (Hinkley 1982, p. 26). A class interval groups a range of adjacent values into a single category. There are several steps to establish class intervals. First, it is necessary to determine the number of non-overlapping classes. Classes need to be mutually exclusive, meaning that a value cannot belong to two different classes at the same time. According to convention, there should be between 5 and 15 classes.

The next step is to decide the width of each class. The width, i, will be the same for all classes. It is calculated by subtracting the lowest score (4) from the highest score (36) and dividing the result by the number of class intervals (e.g., 8). For the previous example, (36 − 4) / 8 = 4, and rounding up gives an approximate class width of 5, which allows eight classes to cover every age in the list. The larger the width of each class, the fewer classes there will be.
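The width calculation and the resulting class limits can be sketched as follows. The function names are hypothetical, and the rounded width of 5 and the starting point of 1 are assumptions chosen to match the class intervals used in Table 2; other starting points would be equally valid.

```python
def raw_class_width(lowest, highest, num_classes):
    """Range of the scores divided by the number of class intervals."""
    return (highest - lowest) / num_classes


def class_intervals(start, width, num_classes):
    """Build non-overlapping (lower, upper) score limits for whole-number data."""
    return [
        (start + k * width, start + (k + 1) * width - 1)
        for k in range(num_classes)
    ]


# Unrounded width for the age example: (36 - 4) / 8
print(raw_class_width(4, 36, 8))

# Assumed choices: a rounded width of 5 and a starting point of 1.
intervals = class_intervals(start=1, width=5, num_classes=8)
for lower, upper in intervals:
    print(f"{lower}-{upper}")
```

Because each interval starts one unit above the upper limit of the previous one, no whole-number age can fall into two classes.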

Table 1.
List of values and their frequencies
Value	Frequency
4	2
5	2
7	3
8	1
9	2
11	4
13	3
14	2
15	2
16	3
17	2
18	3
20	2
21	1
22	3
23	2
24	2
25	1
27	1
28	1
29	1
30	1
31	2
33	1
34	1
36	2

Table 2.
Frequency distributions and cumulative frequencies
Class Interval	f	Cumulative f
1-5	4	4
6-10	6	10
11-15	11	21
16-20	10	31
21-25	9	40
26-30	4	44
31-35	4	48
36-40	2	50

In addition, one needs to determine the limits of each class interval. The concept of exact limits of a score can be extended to frequency distributions by distinguishing between the exact limits of a given class interval and its score limits. The limits are determined so that each observation belongs to only one class. According to Hinkley, “the lower class limit identifies the smallest possible data value assigned to the class. The upper class limit identifies the largest possible data value assigned to the class. The exact limits are 0.5 units below and 0.5 units above the score limits of the class interval” (1982, p. 27). Consider the age interval 11–15 in Table 2. The interval 11–15 represents the score limits, whereas the interval 10.5–15.5 represents the exact limits. The midpoint will always be the same regardless of which type of limits is used. The midpoint of the interval is defined as “the point on the scale of measurement that is halfway through the interval” (p. 28). There will be scenarios in which some class intervals are empty. When this occurs, researchers suggest that these intervals be eliminated by combining them with adjacent intervals.
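The relationship between score limits, exact limits, and the midpoint can be illustrated with a short sketch. The helper names exact_limits and midpoint are hypothetical, and the unit parameter assumes that scores are recorded in whole units, as the ages are.

```python
def exact_limits(lower, upper, unit=1.0):
    """Extend the score limits by half a measurement unit on each side."""
    half = unit / 2
    return (lower - half, upper + half)


def midpoint(lower, upper):
    """Point halfway between the lower and upper limit of an interval."""
    return (lower + upper) / 2


score_limits = (11, 15)
limits = exact_limits(*score_limits)

print(limits)                   # (10.5, 15.5)
print(midpoint(*score_limits))  # 13.0
print(midpoint(*limits))        # 13.0
```

Both calls to midpoint return the same value because the exact limits extend the interval by the same amount in each direction.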

In Table 2, the frequency distribution (f) is presented along with the cumulative frequencies (cumulative f). The cumulative frequency is “the total number of scores in the class interval and all intervals below it” (Abrami 2001, p. 63).
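As a sketch of how class frequencies and cumulative frequencies might be computed from the raw ages, the following Python groups each age into one of the eight class intervals of Table 2 and then keeps a running total. The helper class_frequencies is hypothetical, and the starting point of 1 and width of 5 are taken from the intervals in Table 2.

```python
from itertools import accumulate

# Ages of the fifty group members, copied from the list earlier in the entry.
ages = [
    4, 15, 20, 18, 4,
    22, 36, 7, 24, 11,
    27, 33, 13, 7, 17,
    18, 30, 25, 9, 23,
    11, 16, 14, 5, 31,
    31, 36, 22, 15, 24,
    13, 11, 22, 9, 11,
    28, 21, 8, 14, 18,
    34, 13, 17, 16, 7,
    23, 29, 5, 16, 20,
]

START, WIDTH, NUM_CLASSES = 1, 5, 8  # intervals 1-5, 6-10, ..., 36-40


def class_frequencies(values, start, width, num_classes):
    """Count how many values fall into each equal-width class interval."""
    frequencies = [0] * num_classes
    for value in values:
        index = (value - start) // width  # position of the class containing value
        if not 0 <= index < num_classes:
            raise ValueError(f"{value} lies outside the class intervals")
        frequencies[index] += 1
    return frequencies


frequencies = class_frequencies(ages, START, WIDTH, NUM_CLASSES)
cumulative = list(accumulate(frequencies))  # running total of the frequencies

print("Class Interval\tf\tCumulative f")
for k, (f, cf) in enumerate(zip(frequencies, cumulative)):
    lower = START + k * WIDTH
    print(f"{lower}-{lower + WIDTH - 1}\t{f}\t{cf}")
```

The last cumulative frequency should equal the total number of observations, which makes it a convenient check on the whole table.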

In most cases, frequency distributions allow an individual to understand raw data in a more meaningful way. However, it is sometimes necessary to have a visual display of the data beyond numbers to recognize patterns that otherwise would be difficult to identify. Graphs, when clearly presented, are two-dimensional representations of data that can help in this endeavor. Graphs typically consist of vertical and horizontal dimensions known as axes. By tradition, the horizontal axis is called the abscissa, or the x-axis, and represents the independent variable. The vertical axis is called the ordinate, or the y-axis, and represents the dependent variable. Histograms, bar graphs, and frequency polygons are graphs that are used to visually display frequency distributions.

HISTOGRAMS

Histograms are one of the most useful graphical representations of statistical data. A histogram displays the frequency of individual scores, or of scores grouped in class intervals, by the height of its bars. Most histograms represent a single variable. The variable of interest is placed on the x-axis, and the frequency, relative frequency, or percent frequency is placed on the y-axis. In histograms, the bars touch one another to reflect the continuous nature of the variable. This means that the variable is interval-scaled or ratio-scaled (e.g., height or weight). Also, because the class intervals are of equal width, the height of each bar represents its frequency (see Figure 1).
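The quantities that a histogram can place on the y-axis are simple to derive from the class frequencies. The sketch below uses hypothetical class labels and frequencies purely for illustration, and its text rendering draws the bars sideways, so it is only a rough stand-in for a real histogram.

```python
def relative_frequencies(frequencies):
    """Share of all observations that falls into each class."""
    total = sum(frequencies)
    return [f / total for f in frequencies]


def percent_frequencies(frequencies):
    """Relative frequencies expressed as percentages."""
    total = sum(frequencies)
    return [100 * f / total for f in frequencies]


def text_histogram(labels, frequencies, symbol="#"):
    """Return one line per class, with a bar whose length equals the frequency."""
    width = max(len(label) for label in labels)
    return "\n".join(
        f"{label:>{width}} | {symbol * frequency} {frequency}"
        for label, frequency in zip(labels, frequencies)
    )


# Hypothetical equal-width classes and frequencies, for illustration only.
labels = ["1-5", "6-10", "11-15", "16-20"]
frequencies = [2, 5, 8, 5]

print(relative_frequencies(frequencies))
print(percent_frequencies(frequencies))
print(text_histogram(labels, frequencies))
```

Whichever of the three quantities is plotted, the shape of the histogram stays the same; only the scale of the y-axis changes.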

BAR CHARTS

Besides histograms, bar charts are used to visually display frequency distributions. However, in contrast to histograms, the variable under study is measured on a nominal or ordinal scale, which is represented by the spaces between the bars. Figure 2 presents the specific variables (categories and frequencies) that are used on each axis.

FREQUENCY POLYGON

Another way to display information is through frequency polygons, also known as frequency curves or line graphs. In this type of graph, each class interval is represented by a point plotted above its midpoint at a height equal to the interval's frequency. Once these points have been marked, they are connected with straight lines. Frequency polygons are useful when trying to compare two different frequency distributions in one graph or to highlight trends over time.
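The points of a frequency polygon can be derived directly from the class intervals and their frequencies. The following sketch uses hypothetical intervals and frequencies, and the helper names are assumptions made for illustration; it computes the points and the line segments that join them without drawing anything.

```python
def polygon_points(intervals, frequencies):
    """Pair the midpoint of each class interval with its frequency."""
    return [
        ((lower + upper) / 2, frequency)
        for (lower, upper), frequency in zip(intervals, frequencies)
    ]


def polygon_segments(points):
    """Straight-line segments that connect each point to the next one."""
    return list(zip(points, points[1:]))


# Hypothetical class intervals and frequencies, for illustration only.
intervals = [(1, 5), (6, 10), (11, 15)]
frequencies = [2, 5, 3]

points = polygon_points(intervals, frequencies)
print(points)  # [(3.0, 2), (8.0, 5), (13.0, 3)]

for start, end in polygon_segments(points):
    print(f"{start} -> {end}")
```

Plotting two such point lists on the same axes is one way to compare two distributions in a single graph.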

SEE ALSO Methods, Quantitative

BIBLIOGRAPHY

Abrami, Philip C., Paul Cholmsky, and Robert Gordon. 2001. Statistical Analysis for the Social Sciences: An Interactive Approach. Boston: Allyn and Bacon.

Anderson, David R., Dennis J. Sweeney, and Thomas A. Williams. 2000. Descriptive Statistics I: Tabular and Graphical Methods. In Essentials of Statistics for Business and Economics. Cincinnati, OH: South-Western College.

Christensen, Larry, and Charles M. Stoup. 1986. Frequency Distribution and Percentiles. In Introduction to Statistics for the Social and Behavioral Sciences. Monterey, CA: Brooks/Cole Pub. Co.

Fleming, Michael C., and Joseph G. Nellis. 2000. Describing Data: Tables, Charts, and Graphs. In Principles of Applied Statistics, 2nd ed., 13–50. New York: Thomson Learning.

Hays, William L. 1981. Frequency and Probability Distributions. In Statistics, 3rd ed. New York: Holt, Rinehart and Winston.

Hinkley, Wiersma Jurs. 1982. Frequency Statistics. In Basic Behavioral Statistics. Boston: Houghton Mifflin.

Wright, Daniel. 2002. Graphing Variables. In First Steps in Statistics. Thousand Oaks, CA: SAGE.

María Isabel Ayala

International Encyclopedia of the Social Sciences