You may have heard the saying "You can prove anything with statistics," which implies that statistical analysis cannot be trusted and that the conclusions drawn from it are so vague and ambiguous as to be meaningless. Yet the opposite is also true: statistical analysis can be reliable, and its results can be trusted, if the proper conditions are established.
What Is Statistical Analysis?
Statistical analysis uses inductive reasoning and the mathematical principles of probability to assess the reliability of a particular experimental test. Mathematical techniques have been devised that allow the reliability (or fallibility) of an estimate to be determined from the data alone (the sample, whose size is usually denoted N) without reference to the original population. This is important because researchers typically do not have access to information about the whole population, so a sample, a subset of the population, is used instead.
Statistical analysis uses a sample drawn from a larger population to make inferences about the larger population. A population is a well-defined group of individuals or observations of any size having a unique quality or characteristic. Examples of populations include first-grade teachers in Texas, jewelers in New York, nurses at a hospital, high school principals, Democrats, and people who go to dentists. Corn plants in a particular field and automobiles produced by a plant on Monday are also populations. A sample is the group of individuals or items selected from a particular population. A random sample is taken in such a way that every individual in the population has an equal opportunity to be chosen. A random sample is also known as an unbiased sample.
Most mail surveys, mall surveys, political telephone polls, and similar data-gathering techniques do not meet the proper conditions for a random, unbiased sample, so their results cannot be trusted. These are "self-selected" samples because the subjects choose whether to participate in the survey, and the subjects may be picked based on the ease of their availability (for example, whoever answers the phone and agrees to the interview).
Selecting a Random Sample
The most important criterion for trustworthy statistical analysis is correctly choosing a random sample. For example, suppose you have a bucket full of 10,000 marbles and you want to know how many of the marbles are red. You could count all of the marbles, but that would take a long time. So, you stir the marbles thoroughly and, without looking, pull out 100 marbles. Now you count the red marbles in your random sample. There are 10. Thus you could conclude that approximately 10 percent of the original marbles are red. This is a trustworthy conclusion, but it is not likely to be exactly right. You could improve your accuracy by counting a larger sample, say 1,000 marbles. Of course, if you counted all the marbles you would know the exact percentage, but the point is to pick a sample (for example, 100 or 1,000) that is large enough to give you an answer accurate enough for your purposes.
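The marble example can be tried out with a short simulation. This is only a sketch: the bucket contents (exactly 1,000 red marbles out of 10,000), the function name `estimate_red_fraction`, and the seed are assumptions made for illustration, not part of the original example.

```python
import random

def estimate_red_fraction(bucket, sample_size, rng):
    """Draw a simple random sample without replacement and
    return the share of red marbles found in it."""
    sample = rng.sample(bucket, sample_size)
    return sample.count("red") / sample_size

# Hypothetical bucket: 1,000 red marbles among 10,000 (true share 10 percent).
bucket = ["red"] * 1_000 + ["other"] * 9_000
rng = random.Random(42)  # fixed seed so that a run can be repeated

# Larger samples tend to land closer to the true share; counting
# every marble (sample size 10,000) gives the exact answer.
for n in (100, 1_000, 10_000):
    print(n, estimate_red_fraction(bucket, n, rng))
```

Because `rng.sample` gives every marble the same chance of being drawn, it plays the role of stirring the bucket and picking without looking.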
Suppose the 100 marbles you pulled out of the bucket were all red. Would this be proof that all 10,000 marbles in the bucket were red? In science, statistical analysis is used to test a hypothesis. In the example we are testing, the hypothesis would be "all the marbles in the bucket are red."
Statistical inference makes it possible for us to state, given a sample size (100) and a population size (10,000), how often false hypotheses will be accepted and how often true hypotheses are rejected. Statistical analysis cannot conclusively tell us whether a hypothesis is true; only the examination of the entire population can do that. So "statistical proof" is a statement of just how often we will get "the right answer."
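To see why 100 red marbles are not proof, one can compute how often such a sample turns up when the hypothesis is false. The sketch below assumes, purely for illustration, that 100 of the 10,000 marbles are not red; the function name `prob_all_red` is likewise hypothetical.

```python
from math import comb

def prob_all_red(population, non_red, sample_size):
    """Probability that a random sample drawn without replacement
    contains no non-red marble (a hypergeometric probability)."""
    red = population - non_red
    return comb(red, sample_size) / comb(population, sample_size)

# Assumed scenario: the hypothesis "all marbles are red" is false,
# because 100 of the 10,000 marbles are some other color.
p = prob_all_red(10_000, 100, 100)
print(f"Chance of drawing 100 red marbles anyway: {p:.2f}")
```

Under this assumption the all-red sample appears in roughly a third of all draws, so accepting the hypothesis on that evidence would often be a mistake.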
Using Basic Statistical Concepts
Statistics is applicable to all fields of human endeavor, from economics to education, politics to psychology. Procedures worked out for one field are generally applicable to the other fields. Some statistical procedures are used more often in some fields than in others.
Example 1. Suppose the Wearemout Pants Company wants to know the average height of adult American men, an important piece of information for a clothing manufacturer producing pants. The population is all men over the age of 25 who live in the United States. It is logistically impossible to measure the height of every man who lives in the United States, so a random sample of around 1,000 men is chosen. If the sample is correctly chosen, all ethnic groups, geographic regions, and socioeconomic classes will be adequately represented. The individual heights of these 1,000 men are then measured. An average height is calculated by dividing the sum of these individual heights by the total number of subjects (N = 1,000). Imagine that doing so yields an average height of 1.95 meters (m) for this sample of adult males in the United States. If a representative sample was selected, then this figure can be generalized to the larger population.
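The averaging step can be sketched in a few lines of Python. The sketch also reports the median and the mode, the two other common measures of central tendency. The seven heights (in centimeters) are made up for illustration and stand in for the 1,000 measurements.

```python
from statistics import mean, median, mode

# Hypothetical heights in centimeters; not data from the example.
heights_cm = [170, 172, 172, 175, 178, 180, 185]

average = sum(heights_cm) / len(heights_cm)  # sum divided by N
print("arithmetic mean:", average)
print("median:", median(heights_cm))  # middle value of the ordered data
print("mode:", mode(heights_cm))      # most frequently occurring value
```

The hand-computed average agrees with `statistics.mean`, which performs the same division of the sum by the number of subjects.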
The random sample of 1,000 men probably included some very short men and some very tall men. The difference between the shortest and the tallest is known as the "range" of the data. Range is one measure of the "dispersion" of a group of observations. A better measure of dispersion is the "standard deviation." The standard deviation is the square root of the sum of the squares of the differences between each observation and the mean, divided by one less than the number of observations:
$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$
In this equation, $x_i$ is an observed value, $\bar{x}$ is the arithmetic mean, and $N$ is the number of observations.
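The verbal definition of the standard deviation can be checked with a small sketch. The six heights are invented for illustration, and `sample_std` is a hypothetical helper name; Python's standard library function `statistics.stdev` uses the same "one less than the number of observations" divisor.

```python
from math import sqrt
from statistics import stdev

def sample_std(values):
    """Square root of the summed squared deviations from the mean,
    divided by one less than the number of observations."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

# Hypothetical heights in meters.
heights = [1.62, 1.70, 1.75, 1.78, 1.81, 1.90]

data_range = max(heights) - min(heights)  # tallest minus shortest
print("range:", round(data_range, 2))
print("standard deviation:", round(sample_std(heights), 4))
print("library value:", round(stdev(heights), 4))
```

The range depends only on the two extreme observations, whereas the standard deviation uses every value, which is one reason it is usually the better measure of dispersion.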
In our example, if a small height interval is used (1.10 m, 1.11 m, 1.12 m, 1.13 m, and so on) and the number of men in each height interval is plotted as a function of height, a smooth curve can be drawn that has a characteristic shape, known as a "bell" curve or "normal frequency distribution." A normal frequency distribution can be stated mathematically as
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}$$
where μ is the mean of the distribution.
The value of sigma (σ) is a measure of how "wide" the distribution is. Not all samples will have a normal distribution, but many do, and these distributions are of special interest.
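The effect of σ on the width of the curve can be illustrated numerically. The sketch below evaluates the standard normal density formula, with mean `mu` and standard deviation `sigma`, and compares it with `statistics.NormalDist` from the Python standard library. The mean of 1.75 m and the two σ values are assumptions chosen only for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

mu = 1.75  # hypothetical mean height in meters

# A smaller sigma gives a taller, narrower bell; a larger sigma a flatter one.
for sigma in (0.05, 0.10):
    peak = normal_pdf(mu, mu, sigma)
    one_sd_out = normal_pdf(mu + sigma, mu, sigma)
    print(f"sigma={sigma}: peak={peak:.3f}, one sigma from mean={one_sd_out:.3f}")

# Cross-check one value against the standard library implementation.
print(normal_pdf(1.80, mu, 0.07), NormalDist(mu, 0.07).pdf(1.80))
```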
The following figure shows three normal probability distributions. Because there is no skew, the mean, median, and mode are the same. The mean of curve (a) is less than the mean of curve (b), which in turn is less than the mean of (c). Yet the standard deviation, or spread, of (c) is least, whereas that of (a) is greatest. This is just one illustration of how the parameters of distributions can vary.
Example 2. One of the most common uses of statistical analysis is in determining whether a certain treatment is efficacious. For example, medical researchers may want to know if a particular medicine is effective at treating the type of pain resulting from extraction of third molars (known as "wisdom" teeth). Two random samples of approximately equal size would be selected. One group would receive the pain medication while the other group would receive a "placebo," a pill that looks identical but contains only inactive ingredients. The study would need to be a "double-blind" experiment, which is designed so that neither the recipients nor the persons dispensing the pills know which is which. Researchers would learn who had received the active medicine only after all the results were collected.
Example 3. Suppose a student, as part of a science fair project, wishes to determine if a particular chemical compound (Chemical X) can accelerate the growth of tomato plants. In this sort of experiment design, the hypothesis is usually stated as a null hypothesis: "Chemical X has no effect on the growth rate of tomato plants." In this case, the student would reject the null hypothesis if she found a significant difference. It may seem odd, but that is the way most of the statistical tests are set up. In this case, the independent variable is the presence of the chemical and the dependent variable is the height of the plant.
The next step is experiment design. The student decides to use height as the single measure of plant growth. She purchases 100 individual tomato plants of the same variety and randomly assigns them to 2 groups of 50 each. Thus the population is all tomato plants of this particular type and the sample is the 100 plants she has purchased. They are planted in identical containers, using the same kind of potting soil and placed so they will receive the same amount of light and air at the same temperature. In other words, the experimenter tries to "control" all of the variables, except the single variable of interest. One group will be watered with water containing a small amount of the chemical while the other will receive plain water. To make the experiment double-blind, she has another student prepare the watering cans each day, so that she will not know until after the experiment is complete which group was receiving the treatment. After 6 weeks, she plans to measure the height of the plants.
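The random assignment of the 100 plants to two groups of 50 could be done as in the following sketch. The plant numbering, the seed, and the variable names are assumptions for illustration; the text does not say how the student actually randomized.

```python
import random

# Label the 100 purchased plants 1..100 (a hypothetical numbering).
plant_ids = list(range(1, 101))

rng = random.Random(2024)  # fixed seed so the assignment can be reproduced
shuffled = plant_ids[:]
rng.shuffle(shuffled)      # every ordering of the plants is equally likely

treatment = sorted(shuffled[:50])  # to be watered with Chemical X
control = sorted(shuffled[50:])    # to be watered with plain water

print("treatment group:", treatment[:5], "...")
print("control group:  ", control[:5], "...")
```

Shuffling first and then splitting ensures that each plant has the same chance of ending up in either group, so differences between the plants are spread evenly rather than systematically.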
The next step is data collection. The student measures the height of each plant and records the results in data tables. She determines that the control group (which received plain water) had an average (arithmetic mean) height of 1.3 m (meters), while the treatment group had an average height of 1.4 m.
Now the student must determine whether this small difference is significant or whether this much variation would be expected even with no treatment. In other words, what is the probability that 2 groups of 50 tomato plants each, grown under identical conditions, would show a height difference of at least 0.1 m after 6 weeks of growth? If this probability is less than or equal to a certain predetermined value, then the null hypothesis is rejected. Two commonly used values are 0.05 and 0.01. However, these are arbitrary choices, determined mostly by the widespread use of previously calculated tables for each value. Modern computer analysis techniques allow the selection of any value of probability.
The simplest test of significance is to determine how "wide" the distribution of heights is for each group. If there is a wide dispersion in heights (a large standard deviation, say σ = 25), then small differences in mean are not likely to be significant. On the other hand, if the dispersion is narrow (for example, if all the plants in each group were close to the same height, so that σ = 0.1), then the difference would probably be significant.
There are several different tests the student could use. Selecting the right test is often a tricky problem. In this case, the student can reject several tests outright. For example, the chi-square test is suitable for nominal scales (yes or no answers are one example of nominal scales), so it does not work here. The F-test compares variability or dispersion (variances) rather than means, so it too is not suitable for this comparison. Other statistical tests can also be rejected as inappropriate for various reasons.
In this case, since the student is interested in comparing means, the best choice is a t-test. The t-test compares two means using this formula:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$
In this case, the null hypothesis assumes that μ1 − μ2 = 0 (no difference between the groups), so that we can say:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$$
The quantity $s_{\bar{x}_1 - \bar{x}_2}$ is known as the standard error of the mean difference. When the sample sizes are the same, $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_{\bar{x}_1}^2 + s_{\bar{x}_2}^2}$. That is, the standard error of the mean difference is the square root of the sum of the squares of the standard errors of the means for each group. The standard error of the mean for each group is easily calculated from $s_{\bar{x}} = s / \sqrt{N}$. N is the sample size, 50, and the student can calculate the standard deviation s by the formula for standard deviation given above.
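The calculation can be sketched for the tomato experiment as follows. The group means (1.4 m and 1.3 m) and the group size (50) come from the example, but the text gives no standard deviations, so the value of 0.2 m used for both groups is purely an assumption, as are the function names.

```python
from math import sqrt

def standard_error(std_dev, n):
    """Standard error of the mean for one group: s / sqrt(N)."""
    return std_dev / sqrt(n)

def t_statistic(mean1, mean2, sd1, sd2, n):
    """t for two groups of equal size n under the null hypothesis
    that the population means do not differ."""
    se_diff = sqrt(standard_error(sd1, n) ** 2 + standard_error(sd2, n) ** 2)
    return (mean1 - mean2) / se_diff

# Means and group size from the example; standard deviations are ASSUMED.
t = t_statistic(mean1=1.4, mean2=1.3, sd1=0.2, sd2=0.2, n=50)
print(f"t = {t:.2f}")

# The same difference in means with a wider (assumed) spread gives a smaller t.
t_wide = t_statistic(mean1=1.4, mean2=1.3, sd1=0.5, sd2=0.5, n=50)
print(f"t with wider spread = {t_wide:.2f}")
```

With the assumed standard deviation of 0.2 m, t comes out near 2.5. Standard t tables give a two-tailed critical value of roughly 1.98 at the 0.05 level for 98 degrees of freedom, so under that assumption the null hypothesis would be rejected at that level. With the wider assumed spread, t is about 1.0 and it would not be.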
The final experimental step is to determine sensitivity. Generally speaking, the larger the sample, the more sensitive the experiment is. The choice of 50 tomato plants for each group implies a high degree of sensitivity.
Students, teachers, psychologists, economists, politicians, educational researchers, medical researchers, biologists, coaches, doctors, and many others use statistics and statistical analysis every day to help them make decisions. To make trustworthy and valid decisions based on statistical information, it is necessary to: be sure the sample is representative of the population; understand the assumptions of the procedure and use the correct procedure; use the best measurements available; keep clear what is being looked for; and avoid statements of causal relations if they are not justified.
see also Central Tendency, Measures of; Data Collection and Interpretation; Graphs; Mass Media, Mathematics and the.
MEAN, MEDIAN, AND MODE
The average in the clothing example is known as the "arithmetic mean," which is one of the measures of central tendency. The other two measures of central tendency are the "median" and the "mode." The median is the number that falls at the midpoint of an ordered data set, while the mode is the most frequently occurring value.
"Statistical Analysis." Mathematics. . Encyclopedia.com. (May 29, 2017). http://www.encyclopedia.com/education/news-wires-white-papers-and-books/statistical-analysis
Classical Statistical Analysis
Classical statistical analysis seeks to describe the distribution of a measurable property (descriptive statistics) and to determine the reliability of a sample drawn from a population (inferential statistics). Classical statistical analysis is based on repeatedly measuring properties of objects and aims at predicting the frequency with which certain results will occur when the measuring operation is repeated at random or stochastically.
Properties can be measured repeatedly on the same object or only once per object. In the latter case, however, one must measure a number of sufficiently similar objects. Typical examples are tossing a coin or rolling a die repeatedly and counting the occurrences of the possible outcomes, and measuring the chemical composition of the next hundred or thousand pills coming off the production line of a pharmaceutical plant. In the first example the same object (one and the same die) is “measured” several times (with respect to the number it shows); in the second, many distinguishable but similar objects are measured with respect to their composition, which in the case of pills is expected to be more or less identical, so that the repetition is not with the same object but with the next available similar object.
One of the central concepts of classical statistical analysis is the empirical frequency distribution, which yields the absolute or relative frequency of the occurrence of each of the possible results of the repeated measurement of a property of an object or a class of objects when only a finite number of different outcomes is possible (discrete case). If one thinks of an infinitely repeated and arbitrarily precise measurement where every outcome is (or can be) different (as would be the case if the range of the property is the set of real numbers), then the relative frequency of a single outcome would not be very instructive. In this (continuous) case one uses instead the distribution function which, for every numerical value x of the measured property, yields the absolute or relative frequency of the occurrence of all values smaller than x. This function is usually denoted F(x), and its derivative F′(x) = f(x) is called the frequency density function.
If one wants to describe an empirical distribution, the complete function table is seldom instructive. This is why the empirical frequency or distribution functions are often represented by a few parameters that describe the essential features of the distribution. The so-called moments of the distribution represent the distribution completely, and the lower-order moments represent the distribution at least in a satisfactory manner. Moments are defined as follows:
$$m_k = \frac{1}{n} \sum_{i=1}^{n} (x_i - c)^k$$
where $x_i$ is the i-th measured value, k is the order of the moment, n is the number of repetitions or objects measured, and c is a constant that is usually either 0 (moment about the origin) or the arithmetic mean (moment about the mean), the first-order moment about the origin being the arithmetic mean.
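Empirical moments as described here (the average of the k-th powers of the deviations from a constant c) can be computed with a short sketch. The eight die rolls are made up for illustration, and `empirical_moment` is a hypothetical helper name. Note that the second moment about the mean divides by n, not by n − 1, so it corresponds to `statistics.pvariance` rather than to the sample standard deviation formula.

```python
from statistics import pvariance

def empirical_moment(values, k, c=0.0):
    """k-th empirical moment about the constant c:
    the mean of (x - c)**k over all measured values."""
    n = len(values)
    return sum((x - c) ** k for x in values) / n

# Hypothetical outcomes of rolling one die eight times.
rolls = [1, 3, 3, 4, 6, 6, 2, 5]

first_moment = empirical_moment(rolls, 1)                    # arithmetic mean
second_central = empirical_moment(rolls, 2, c=first_moment)  # about the mean

print("first moment about the origin (mean):", first_moment)
print("second moment about the mean:", second_central)
print("statistics.pvariance:", pvariance(rolls))
```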
In the frequentist interpretation of probability, frequency can be seen as the realization of the concept of probability: It is quite intuitive to believe that if the probability of a certain outcome is some number between 0 and 1, then the expected relative frequency of this outcome would be the same number, at least in the long run. From this, one of the concepts of probability is derived, yielding probability density and distribution functions as models for their empirical correlates. These functions are usually also denoted f(x) and F(x), respectively, and their moments are defined much as in the formula for empirical moments, but with a difference that takes into account that there is no finite number n of measurement repetitions:
$$\mu_k = \sum_{x} (x - c)^k f(x)$$
$$\mu_k = \int_{-\infty}^{\infty} (x - c)^k f(x)\, dx$$
where the first equation can be applied to discrete numerical variables (e.g., the results of counting, with f(x) the probability of the value x), while the second equation can be applied to continuous variables. Again, the first-order moment about 0 is the mean, and the other moments are usually calculated about this mean. In many important cases one would be satisfied to know the mean (as an indicator for the central tendency of the distribution) and the second-order moment about the mean, namely the variance (as the most prominent indicator for the variation). For the important case of the normal or Gaussian distribution, these two parameters are sufficient to describe the distribution completely.
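For a discrete variable, the theoretical moments are probability-weighted sums of (x − c)^k over the possible values. The sketch below applies this to a fair six-sided die, an assumed example in which each face has probability 1/6, using exact fractions. The function name `discrete_moment` is hypothetical.

```python
from fractions import Fraction

def discrete_moment(pmf, k, c=0):
    """k-th moment about c of a discrete distribution given as a
    mapping from each possible value to its probability."""
    return sum((x - c) ** k * p for x, p in pmf.items())

# Assumed model: a fair die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

total = discrete_moment(die, 0)                # zero-order moment
die_mean = discrete_moment(die, 1)             # first moment about 0
die_variance = discrete_moment(die, 2, c=die_mean)  # second moment about the mean

print("zero-order moment:", total)
print("mean:", die_mean)
print("variance:", die_variance)
```

The zero-order moment is the sum of all probabilities, which is why it must equal 1 for any valid probability distribution.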
If one models an empirical distribution with a theoretical distribution (any non-negative function for which the zero-order moment evaluates to 1, as this is the probability for the variable to have any arbitrary value within its domain), one can estimate its parameters from the moments of the empirical distribution calculated from the finite number of repeated measurements taken in a sample. This is especially useful where the normal distribution is a satisfactory model of the empirical distribution, as in this case mean and variance allow the calculation of all interesting values of the probability density function f(x) and of the distribution function F(x).
Empirical and theoretical distributions need not be restricted to the case of a single property or variable; they are also defined for the multivariate case. Because empirical moments are calculated from the measurements taken in a sample, these moments are also results of a random process, just like the original measurements. In this respect, the mean, variance, correlation coefficient, or any other statistical parameter calculated from the finite number of objects in a sample is also the outcome of a random experiment (a measurement taken from a randomly selected set of objects instead of exactly one object). Theoretical distributions are also available for these derived measurements, and these models of the empirical moments allow one to estimate the probability with which the respective parameter will fall into a specified interval in the next sample to be taken.
If, for instance, one has a sample of 1,000 interviewees of whom 520 answered they were going to vote for party A in the upcoming election, and 480 announced they were going to vote for party B, then the parameter πA (the proportion of A-voters in the overall population) could be estimated to be 0.52. This estimate would be a stochastic variable that approximately obeys a normal distribution with mean 0.52 and variance 0.52 × 0.48 / 1,000 = 0.0002496 (or standard deviation 0.0158). From this result one can conclude that another sample of another 1,000 interviewees from the same overall population would lead to another estimate whose value would lie within the interval [0.489, 0.551] (that is, 0.52 ± 1.96 × 0.0158) with a probability of 95 percent (the so-called 95 percent confidence interval, which in the case of the normal distribution is centered about the mean with a width of 3.92 standard deviations). Or, to put it in other words, the probability of finding more than 551 A-voters in another sample of 1,000 interviewees from the same population is 0.025. Bayesian statistics, as opposed to classical statistics, would argue from the same numbers that the probability is 0.95 that the population parameter falls within the interval [0.489, 0.551].
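The figures in this polling example can be reproduced with a few lines of Python. The function name `proportion_interval` is hypothetical; the computation itself is the normal approximation described in the text, with the variance of the estimate taken as p(1 − p)/n.

```python
from math import sqrt

def proportion_interval(successes, n, z=1.96):
    """Estimated proportion, its standard deviation under the normal
    approximation, and the interval p +/- z standard deviations."""
    p = successes / n
    sd = sqrt(p * (1 - p) / n)  # square root of the variance p(1-p)/n
    return p, sd, (p - z * sd, p + z * sd)

p_hat, sd, (low, high) = proportion_interval(520, 1000)

print(f"estimate: {p_hat:.2f}")
print(f"variance: {sd ** 2:.7f}")
print(f"standard deviation: {sd:.4f}")
print(f"95 percent interval: [{low:.3f}, {high:.3f}]")
```

The default `z=1.96` corresponds to the 95 percent level used in the example; a different level would need a different multiplier.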
SEE ALSO Bayesian Statistics; Descriptive Statistics; Inference, Bayesian; Inference, Statistical; Sampling; Variables, Random
Klaus G. Troitzsch
"Classical Statistical Analysis." International Encyclopedia of the Social Sciences. . Encyclopedia.com. (May 29, 2017). http://www.encyclopedia.com/social-sciences/applied-and-social-sciences-magazines/classical-statistical-analysis
"statistical analysis." A Dictionary of Computing. . Encyclopedia.com. (May 29, 2017). http://www.encyclopedia.com/computing/dictionaries-thesauruses-pictures-and-press-releases/statistical-analysis