Eighteenth-Century Advances in Statistics and Probability Theory

Overview

Probability theory tells us how likely it is that an event will occur. Statistics tells us, among other things, how likely it is that a particular set of data accurately reflects reality. The two fields are closely linked because statistical results indicate the probability that our data are accurate. Both have had a profound impact on society, influencing such diverse ventures as quantum mechanics, the gambling industry, insurance, and the space shuttle.

Background

Games of chance date to early in human history and include dice, cards, and other diversions that produce random results. If a game is played fairly, all players have an equal chance of winning. These games inspired the first mathematical investigations into probability because mathematicians wondered whether a deeper mathematical truth lay behind random events and, if so, whether patterns in such events could be discerned and predicted.

Jerome Cardan (1501-1576), an Italian mathematician whose most significant contributions were in algebra, performed the first probability study in the sixteenth century. Unfortunately, his work was neglected and, a century later, reinvented by Blaise Pascal (1623-1662), the French mathematician and physicist, whose inspiration came from a dice game. Later mathematicians built on Pascal's work and, by the end of the eighteenth century, statistics had emerged as an independent field, albeit closely allied with probability. With the flowering of science during the Enlightenment, statistics became very much a field devoted to the analysis of data. All possible information was gleaned from the actual data, and much emphasis was placed on patterns that could be discerned from it.

Probability theory, on the other hand, became more closely allied with gaming until the advent of the Industrial Revolution. Then, as the industrialized nations became more mechanized, probability theory began to be used as a predictor, asking what the chance was that a particular piece of equipment, for example, would break down in a particular year. Still later, with the introduction of quantum mechanics, probability theory came into its own because, as it turned out, virtually all events at the level of individual atoms or particles are uncertain. Because of this, scientists could say, for example, that the probability of a particular atom undergoing radioactive decay in a particular period of time was 50%. In a sense, quantum mechanics helped rescue probability theory from the factories and gambling halls.
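The 50% figure can be made concrete with a short calculation. The sketch below assumes the standard exponential decay law, under which an atom has a 50% chance of decaying during one half-life; the function name and the sample time periods are illustrative choices, not part of the original discussion.

```python
def decay_probability(elapsed: float, half_life: float) -> float:
    """Probability that one unstable atom decays within `elapsed` time.

    Assumes the exponential decay law: the chance of surviving each
    successive half-life is 1/2, so survival over `elapsed` time is
    0.5 ** (elapsed / half_life).
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    if elapsed < 0:
        raise ValueError("elapsed must be non-negative")
    return 1.0 - 0.5 ** (elapsed / half_life)


if __name__ == "__main__":
    # Time periods are expressed in units of one half-life (illustrative values).
    for periods in (0.5, 1.0, 2.0, 3.0):
        p = decay_probability(periods, 1.0)
        print(f"after {periods} half-lives: {p:.3f}")
```

The probability is exactly 50% only when the period of time equals one half-life; for shorter or longer periods it falls or rises accordingly.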

Also at about this time, the insurance industry was beginning to emerge. Insurance companies, in effect, bet that they will collect more money from their clients than they will pay out when accidents occur. It became very important for insurance companies to understand the odds (or the probability) that an event would (or would not) take place. This, in turn, produced actuaries, analysts who tell insurance companies how much to charge for their premiums. For example, an insurance company will charge an elderly person more for health insurance because statistics show that the elderly are more likely to become ill and that their illnesses generally cost more to treat. Without probability theory and good, solid statistics, such determinations would be very difficult, if not impossible, to undertake.

Impact

Statistics underlies virtually all scientific research, and also influences most social science research, opinion polling, insurance rates, epidemiology, public health, and a number of other fields. Without statistics, our world would be very different, as would the way we see our world. Consider, for example, the following headlines:

Violent crime rates drop for third straight year

Study shows drinking orange juice cuts cancer risk

Scientists discover new planet outside the solar system

New drug helps fight AIDS

Physicists discover top quark

Polls show public favors tax bill

All of these depend on statistical sampling or the statistical analysis of data. While the relation of statistics to crime rates or public polls may be obvious, the other topics are equally dependent on statistics, because scientists must convince other scientists that their findings are correct and not merely some fluke.

For example, most planets found outside the solar system are detected because their gravitational pull causes their star to wobble very slightly. This wobble is barely noticeable, and can be detected only through a sensitive statistical analysis of data over the course of up to several years. Astronomers must consider random fluctuations due to turbulence in the Earth's atmosphere, the effects of our motion around our sun, effects due to changes in the telescope, and a host of other factors. When all of these have been considered, the astronomers must then show that, statistically speaking, there is more than a 95% chance that what they claim to have seen is an actual effect. This requires still more statistical treatment of the data, ending up with a claim that is likely to be accepted by the scientific community at large.

Similarly, every discovery of new subatomic particles must be shown to be statistically significant, as must discoveries of behavioral patterns, disease patterns, and so forth. Scientists in a wide variety of fields use very similar statistical methods to show that their data are valid and that their theories are likely to be correct interpretations of those data. In fact, a great deal of quantum mechanics, subatomic physics, and the fields of thermodynamics and gas dynamics depend completely on probability theory and the statistical treatment of physical phenomena. Although we cannot predict what an individual atom will do, we can predict quite nicely the behavior of tens of billions of atoms, using well-known laws of statistics and probability.
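The contrast between one unpredictable atom and a predictable crowd of atoms can be illustrated with a small simulation. This is only a sketch: it assumes each atom decays independently with the same probability, and the function names, the 50% decay probability, and the sample sizes are illustrative assumptions.

```python
import math
import random


def simulate_decayed_fraction(n_atoms, p_decay=0.5, seed=0):
    """Simulate n_atoms independent atoms and return the fraction that decayed."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    rng = random.Random(seed)  # seeded so that the run is repeatable
    decayed = sum(1 for _ in range(n_atoms) if rng.random() < p_decay)
    return decayed / n_atoms


def fraction_standard_error(n_atoms, p_decay=0.5):
    """Typical random scatter of the decayed fraction for n_atoms atoms."""
    return math.sqrt(p_decay * (1.0 - p_decay) / n_atoms)


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        observed = simulate_decayed_fraction(n)
        scatter = fraction_standard_error(n)
        print(f"{n:>7} atoms: fraction decayed = {observed:.4f}, "
              f"expected scatter = {scatter:.4f}")
```

The scatter shrinks in proportion to one over the square root of the number of atoms, which is why the behavior of billions of atoms is, for practical purposes, predictable.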

Engineering uses a branch of statistics and probability theory called probabilistic risk assessment (PRA). In this field, a complex system, such as the space shuttle, is examined to determine the probability that a particular component will fail. Then, all the things that failure will affect are assessed and the probability that each of those will happen is analyzed. For each of those things, the assessors again try to figure out everything that can go wrong and, again, probabilities are assigned to each failure mode. This is propagated until there are no more components that can fail, all of the high-probability failure modes have been exhausted, or some other stopping point is reached. Since many end results can have multiple causes (for example, a host of factors can cause an engine to shut down prematurely), the odds of a particular outcome are summed across all of the failure pathways to reach a final answer. This answer will tell the engineers the chance that, say, a main shuttle engine will stop for any reason.
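The summing step can be sketched in a few lines. The pathway names and probabilities below are hypothetical, chosen only for illustration, and the pathways are assumed to be independent. Under that assumption, simple addition is a close approximation when each probability is small; the exact figure is one minus the chance that no pathway occurs.

```python
from math import prod


def failure_probability_sum(pathway_probs):
    """Approximate chance of failure: add up the pathway probabilities.

    This is accurate only when the individual probabilities are small.
    """
    return sum(pathway_probs)


def failure_probability_exact(pathway_probs):
    """Chance that at least one pathway occurs, assuming independence."""
    return 1.0 - prod(1.0 - p for p in pathway_probs)


if __name__ == "__main__":
    # Hypothetical per-flight probabilities for one end result:
    # premature shutdown of an engine.
    pathways = {
        "turbopump bearing failure": 1e-4,
        "fuel valve sticks": 5e-5,
        "sensor triggers false shutdown": 2e-4,
    }
    probs = list(pathways.values())
    print(f"summed estimate: {failure_probability_sum(probs):.6e}")
    print(f"exact (independent pathways): {failure_probability_exact(probs):.6e}")
```

For rare failures the two figures nearly coincide; the simple sum begins to overstate the risk only as the individual probabilities grow large.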

In the case of PRA, many of the failure probabilities are determined by a statistical analysis of various components and systems that is usually compiled during design and testing. A thousand turbopump bearings, for example, may be subjected to the stresses of takeoff to see how many takeoffs they can survive before cracking or splitting. When all of the bearings have failed, a bell curve can be drawn, showing the mean lifespan of such a bearing under that level of stress. This curve, then, becomes input to a PRA of the turbopump, as one of the factors that can lead to pump failure during takeoff. The raw data come from statistics, the final answer from probability theory.
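A bearing test of this kind might be summarized as follows. The lifetimes here are simulated rather than measured, and the assumed true mean of 60 takeoffs, the spread of 8 takeoffs, and all function names are hypothetical values chosen for illustration.

```python
import random
import statistics


def simulate_bearing_test(n_bearings=1000, true_mean=60.0, true_sd=8.0, seed=0):
    """Return simulated lifetimes (in takeoffs) drawn from a bell curve.

    The parameters are made-up stand-ins for real test-stand data.
    """
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(true_mean, true_sd)) for _ in range(n_bearings)]


def summarize_lifetimes(lifetimes):
    """Return the sample mean and sample standard deviation of the lifetimes."""
    return statistics.mean(lifetimes), statistics.stdev(lifetimes)


def fraction_failed_by(lifetimes, n_takeoffs):
    """Fraction of tested bearings that failed at or before n_takeoffs."""
    return sum(1 for t in lifetimes if t <= n_takeoffs) / len(lifetimes)


if __name__ == "__main__":
    lifetimes = simulate_bearing_test()
    mean, sd = summarize_lifetimes(lifetimes)
    print(f"mean lifespan: {mean:.1f} takeoffs, standard deviation: {sd:.1f}")
    # An estimated failure probability that could serve as input to a PRA.
    print(f"fraction failed within 45 takeoffs: {fraction_failed_by(lifetimes, 45):.3f}")
```

The last figure is the kind of number the test hands on to the risk assessment: an estimated probability that a bearing fails within a given number of takeoffs.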

The other field in which probability and statistics are so closely linked is the insurance industry. As mentioned previously, insurance companies are betting that customers will spend more on premiums than the companies will pay out in claims. To attract customers, insurance rates must be low; to stay in business, they cannot be so low that paying off policies costs more than the companies bring in.

This has led to the specialized field of actuarial science. Actuaries perform sophisticated statistical analyses of all sorts of factors, trying to arrive at a reasonable probability that certain events will take place. Once they know the likelihood that, for example, a certain type of car will be stolen, they can set an insurance rate. Say, for example, that 1% of all Corvettes are stolen and not recovered in a given year, and that the average Corvette costs \$20,000. That means the insurance company has to charge 1% of the cost of the average Corvette, or about \$200 each year, simply to cover what it will cost to replace these Corvettes. If the company charges \$150, it will lose money on average; if it charges \$250, it will make a profit. Insurance companies perform similar calculations for everything: the incidence of various diseases at different stages of life, the risk of an airplane crash, or the chance that a satellite will not reach orbit. In these, as in so many other fields, a solid grasp of probability and statistics is essential.
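The Corvette arithmetic can be written out as a short calculation. The sketch considers theft losses only and ignores the insurer's other costs; the function names are illustrative.

```python
def break_even_premium(loss_probability, payout):
    """Annual premium that exactly covers the expected payout per policy."""
    return loss_probability * payout


def expected_profit_per_policy(premium, loss_probability, payout):
    """Average yearly profit (negative for a loss) on one policy."""
    return premium - loss_probability * payout


if __name__ == "__main__":
    theft_rate = 0.01      # 1% of Corvettes stolen and not recovered per year
    average_cost = 20_000  # average replacement cost in dollars
    print("break-even premium:", break_even_premium(theft_rate, average_cost))
    for premium in (150, 250):
        profit = expected_profit_per_policy(premium, theft_rate, average_cost)
        print(f"premium {premium}: expected profit per policy = {profit:+.2f}")
```

These are averages over many policies. In any single year the company either pays nothing or replaces the whole car, which is why insurers rely on large numbers of customers.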

P. ANDREW KARAM