Sports Data

Who was the greatest home run hitter: Babe Ruth or Roger Maris? How does Mark McGwire compare? What professional football team won the most games in one season? Sports fans argue such questions and use data collected about players and teams to try to answer them.

Every sport, from baseball to golf and from tennis to rodeo, keeps statistics about individual and team performances. Coaches use these statistics to help plan strategies, team owners use them to match salaries with performance, and fans use them simply because they enjoy knowing about their favorite sports and players. Most statistics describe either players or teams. Many different statistical measures are used, but the most common are maxima and minima, rankings, averages, ratios, and percentages.

Maximum or Minimum

Perhaps the simplest of all statistics answers a question such as "Who won the most competitions?" or "Who made the fewest errors?" In all professional sports, records are kept of the number of games won and lost by teams, and in sports such as baseball, of the number of errors made by players. A list of win totals quickly shows which team won the most games. Mathematically, one is seeking the largest (or the smallest) number in a list. These numbers are called the maximum (or the minimum) value of the list.
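Finding the team with the most (or fewest) wins is a search for the maximum (or minimum) of a list. A minimal Python sketch follows; the team names and win totals are hypothetical, made up only for illustration.

```python
# Hypothetical win totals, for illustration only (not real standings).
wins = {"Team A": 11, "Team B": 8, "Team C": 13, "Team D": 6}

# max/min with key=wins.get compare teams by their win totals.
most = max(wins, key=wins.get)    # team with the largest number of wins
fewest = min(wins, key=wins.get)  # team with the smallest number of wins

print(most, wins[most])      # Team C 13
print(fewest, wins[fewest])  # Team D 6
```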

Rankings

Another common way to compare players or teams is to rank them. Teams are often ranked by the number of games they win each season. Some sports rank individual players as well. Tennis, for example, ranks players based on the number of tournaments they have completed, the strength of their opponents, and their performance in each, and these rankings determine how players are seeded in a tournament. Wins against higher-ranked opponents affect a player's ranking more than wins against lower-ranked opponents.
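Ranking teams by games won amounts to sorting the win totals from largest to smallest. The sketch below assumes hypothetical win totals and does not attempt to model tennis seeding; tied teams simply keep their input order, because Python's sort is stable.

```python
def rank_by_wins(wins):
    """Return (rank, team, wins) tuples, most wins first.

    Ties are not treated specially: tied teams receive consecutive
    ranks in their original order, since sorted() is stable.
    """
    ordered = sorted(wins.items(), key=lambda item: item[1], reverse=True)
    return [(rank, team, w) for rank, (team, w) in enumerate(ordered, start=1)]


# Hypothetical win totals, for illustration only.
standings = rank_by_wins({"Team A": 11, "Team B": 8, "Team C": 13, "Team D": 6})
for rank, team, w in standings:
    print(rank, team, w)
```

With these numbers, Team C is ranked first with 13 wins and Team D last with 6.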

Averages

Averaging is one of the best-known ways to create a statistic from data, and one of the best-known of all sports statistics is a baseball player's batting average. To average a series of numbers, add them up and divide by how many numbers there are; a batting average is no different. Each official time at bat counts as 1 if the player gets a hit and 0 if he does not. To compute the batting average, add up all of these numbers (that is, find the total number of hits in a season) and divide by the total number of times he was at bat.
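Because each at-bat counts as 1 (a hit) or 0 (no hit), a batting average is simply the ordinary average of those 1s and 0s. A minimal sketch with a hypothetical list of at-bat results:

```python
def batting_average(at_bat_results):
    """Average of per-at-bat outcomes: 1 for a hit, 0 for no hit."""
    return sum(at_bat_results) / len(at_bat_results)


# Hypothetical results: 3 hits in 10 at-bats.
results = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
avg = batting_average(results)

# Baseball writes averages to three decimals without the leading zero.
print(f"{avg:.3f}".lstrip("0"))  # .300
```

Summing the 1s counts the hits and the length of the list counts the at-bats, so this is the same as hits divided by at-bats.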

Averages are common in other sports as well. In football, for example, one can compute a player's average punt return (total return yards / number of punt returns), his average yards rushing (total rushing yards / number of rushing attempts), and his average yards receiving (total receiving yards / number of receptions).

In basketball, players are compared by their scoring average (total number of points scored / number of games played), their rebounding average (total number of rebounds / number of games played), and their steals average (total number of steals / number of games played). In some sports, the scores themselves are averages. In snowboarding, for example, each competitor receives a score from each of five judges, who look at different characteristics of a run. The average of these five numbers determines the snowboarder's score on a scale of 1.0 to 10.0.
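Every average in the football and basketball examples has the same shape: a total divided by a count. The sketch below uses one hypothetical helper, `per_unit_average`, for the per-attempt and per-game averages, and Python's `statistics.mean` for a set of judges' scores. All totals and scores are hypothetical.

```python
from statistics import mean


def per_unit_average(total, count):
    """Total divided by the number of attempts, games, or receptions."""
    if count <= 0:
        raise ValueError("count must be positive")
    return total / count


# Hypothetical season totals, for illustration only.
rushing_avg = per_unit_average(1200, 240)  # rushing yards per attempt -> 5.0
scoring_avg = per_unit_average(2050, 82)   # points per game -> 25.0

# Hypothetical scores from five judges on a 1.0 to 10.0 scale.
judges = [8.5, 9.0, 8.8, 9.2, 8.5]
run_score = mean(judges)  # about 8.8
```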

Averages sometimes need to be adjusted. Suppose, for example, one wants to compare the performance of two pitchers who pitched different numbers of innings. To compute a pitcher's earned run average (ERA), divide the total number of earned runs he allowed by the total number of innings he pitched and then multiply by 9. The result represents the number of earned runs the pitcher would allow, on average, over a complete nine-inning game.
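A short sketch of the earned run average calculation, comparing two hypothetical pitchers with different workloads. One assumption worth flagging: innings are passed as a true number, so one-third of an inning is 1/3, not the box-score shorthand ".1".

```python
def earned_run_average(earned_runs, innings_pitched):
    """Earned runs allowed per nine innings pitched.

    innings_pitched is a true quantity: 180 1/3 innings is 180 + 1/3,
    not the box-score shorthand 180.1.
    """
    return earned_runs * 9 / innings_pitched


# Hypothetical pitchers, for illustration only.
era_a = earned_run_average(60, 180)  # 3.0
era_b = earned_run_average(20, 50)   # 3.6
```

Although the second pitcher allowed fewer earned runs in total, he allowed more per nine innings, which is exactly the comparison the adjustment is meant to make fair.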

Ratios and Percentages

Often, interesting information cannot be obtained from a simple average. A baseball coach, for example, might want his next batter to draw a walk, so he must decide which of his pinch hitters is most likely to get one. In this case the most useful statistic is a percentage called the base on balls percentage. It is computed by dividing the number of times a player was walked by the number of times he came to the plate. This decimal is then written as a percent. Similarly, the stolen base percentage is computed by dividing the number of stolen bases by the number of attempted steals.

Percentages are commonly used in other sports as well. In basketball, for example, a player's field-goal percentage is the ratio of the number of field goals made to the number of field goals attempted; a player's free-throw percentage and three-point field-goal percentage are calculated similarly. In football, one measure of a quarterback's efficiency is the percentage of his passes that are completed (number of passes completed / number of passes attempted).
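All of the percentages in the baseball, basketball, and football examples share one pattern: successes divided by opportunities, multiplied by 100. A minimal sketch with hypothetical season totals; it assumes the base on balls percentage is taken over plate appearances, since a walk does not count as an official at-bat.

```python
def percentage(made, attempted):
    """Successes divided by opportunities, written as a percent."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return 100 * made / attempted


# Hypothetical season totals, for illustration only.
walk_pct = percentage(60, 600)         # walks / plate appearances -> 10.0
steal_pct = percentage(30, 40)         # stolen bases / steal attempts -> 75.0
fg_pct = percentage(450, 1000)         # field goals made / attempted -> 45.0
completion_pct = percentage(300, 480)  # completions / pass attempts -> 62.5
```

Note the order of the arguments: the number made (or achieved) always goes on top, and the number of attempts underneath.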

Weighted Average

The batting average compares the number of hits to the number of times a player was at bat, which allows a comparison of players who have been at bat different numbers of times. In this average, however, singles and home runs count the same, as hits. A player who hits mainly singles could have a higher batting average than a player who mostly belts home runs. To average numbers when some of them are more important than others, a statistician will often prefer to use a weighted average.

A baseball player's slugging average is a weighted average of the results of all of the player's at-bats. To compute a weighted average, each number is assigned a weight. For the slugging average, a home run is assigned a weight of 4, a triple 3, a double 2, a single 1, and an out 0. The slugging average is computed as follows.

Multiply 4 × (number of home runs).

Multiply 3 × (number of triples).

Multiply 2 × (number of doubles).

Multiply 1 × (number of singles).

Multiply 0 × (number of outs).

Add these products and divide by the number of at-bats.

For example, a player who has 80 at-bats and 20 hits would have a batting average of 20 / 80 = .250, whether those hits were home runs or singles. Suppose those 20 hits included 4 home runs, 3 triples, 8 doubles, and 5 singles. To compute the player's slugging average, first compute 4(4) + 3(3) + 2(8) + 1(5) = 46 and then divide by the number of at-bats to get SA = 46 / 80 = .575. Another player who had 15 singles and 5 doubles in 80 at-bats would have the same batting average but a slugging average of only [2(5) + 1(15)] / 80 = .3125.
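The worked example can be reproduced with a short sketch. The function names are illustrative; the weights (4, 3, 2, 1, and 0 for an out) and the two players' totals come from the example above.

```python
def slugging_average(at_bats, singles=0, doubles=0, triples=0, home_runs=0):
    """Weighted total (1 per single, 2 per double, 3 per triple,
    4 per home run, 0 per out) divided by at-bats."""
    weighted_total = 1 * singles + 2 * doubles + 3 * triples + 4 * home_runs
    return weighted_total / at_bats


def batting_average(at_bats, hits):
    """Every hit counts the same, regardless of type."""
    return hits / at_bats


power_hitter = slugging_average(80, singles=5, doubles=8, triples=3, home_runs=4)
contact_hitter = slugging_average(80, singles=15, doubles=5)

print(batting_average(80, 20))  # 0.25 for both players
print(power_hitter)             # 0.575
print(contact_hitter)           # 0.3125
```

Outs need no term in the sum because their weight is 0, but they still count in the denominator through the number of at-bats.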

Threshold Statistics

In many situations no computation is needed at all. Often the interest is simply in whether a player has reached some mark of excellence. Statistics of this kind are called threshold statistics. For example, pitchers who have pitched a perfect game (no opposing batter reaches base) and bowlers who have scored 300 (a perfect score) have reached a threshold of excellence. Other examples of threshold statistics are pitchers who have pitched a no-hitter, pitchers who have reached 3,000 strikeouts in their careers, and batters who have hit "for the cycle" (a single, double, triple, and home run in the same game).
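Threshold statistics reduce to yes-or-no checks rather than arithmetic. A minimal sketch; the hit-type labels and function names are hypothetical choices for illustration, and a game is represented simply as a list of the player's hits.

```python
# Hypothetical labels for the four kinds of hit.
CYCLE = {"single", "double", "triple", "home run"}


def hit_for_cycle(game_hits):
    """True if one game's hits include at least one of each hit type."""
    return CYCLE.issubset(game_hits)


def reached_mark(total, mark):
    """Generic threshold check, e.g. career strikeouts against 3,000."""
    return total >= mark


# Hypothetical games and totals.
cycle_game = ["single", "triple", "double", "home run", "single"]
near_miss = ["single", "double", "home run"]
print(hit_for_cycle(cycle_game))  # True
print(hit_for_cycle(near_miss))   # False
print(reached_mark(3000, 3000))   # True
```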

Algebraic Formula

Many sports enthusiasts and coaches have devised other, sometimes quite complex, measures of individual and team performance. These are often expressed as algebraic formulas that combine more elementary statistics. A simple example from hockey is a player's plus/minus (+/−) score. It is computed from the formula $PM = T^{+} - T^{-}$, where $T^{+}$ stands for the number of goals his team scored while he was on the ice and $T^{-}$ stands for the number of goals scored against his team while he was on the ice. The difference of these numbers is the +/− score, or PM. It gives a measure of a player's effectiveness in both offensive and defensive play.

A similar computation is used in football to determine a team's turnover margin (the Margin column in the table below). It is computed by subtracting the number of times a team lost the ball through fumbles or interceptions from the number of times the team gained the ball through fumbles or interceptions.

                        Gained                          Lost
Team             Games  Fumbles  Interceptions  Total   Fumbles  Interceptions  Total   Margin
Ohio State         12      10         18          28        6         12          18      +10
Penn State         12       7         17          24       11          9          20       +4
Michigan State     11       6         12          18        9         16          25       -7
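Both the hockey +/− score and the football turnover margin are simple differences. The sketch below applies a turnover-margin helper to the figures in the turnover table and a plus/minus helper to hypothetical goal counts; the function names are illustrative and not taken from any statistics library.

```python
def plus_minus(goals_for_on_ice, goals_against_on_ice):
    """Hockey +/-: goals for minus goals against while the player is on the ice."""
    return goals_for_on_ice - goals_against_on_ice


def turnover_margin(gained_fumbles, gained_ints, lost_fumbles, lost_ints):
    """Takeaways (fumbles + interceptions gained) minus giveaways (lost)."""
    return (gained_fumbles + gained_ints) - (lost_fumbles + lost_ints)


# Figures from the turnover table: gained fumbles, gained interceptions,
# lost fumbles, lost interceptions.
table = {
    "Ohio State": (10, 18, 6, 12),
    "Penn State": (7, 17, 11, 9),
    "Michigan State": (6, 12, 9, 16),
}
margins = {team: turnover_margin(*row) for team, row in table.items()}
print(margins)  # {'Ohio State': 10, 'Penn State': 4, 'Michigan State': -7}

# Hypothetical skater: on the ice for 30 goals for and 22 against.
print(plus_minus(30, 22))  # 8
```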

A more complex algebraic computation is a baseball player's adjusted production statistic. It is computed from the formula $APRO = \frac{OBP}{LOBP} + \frac{SA}{LSA} - 1$. That is, a player's on-base percentage (OBP) is compared to the league's on-base percentage (LOBP), and his slugging average (SA) is compared to the league's slugging average (LSA). These two ratios are then added, and 1 is subtracted.
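A minimal sketch of the adjusted production formula. The player and league figures are hypothetical. A useful check on the formula: a player who exactly matches the league in both on-base percentage and slugging average gets 1 + 1 − 1 = 1.

```python
def adjusted_production(obp, league_obp, sa, league_sa):
    """APRO = OBP / LOBP + SA / LSA - 1."""
    return obp / league_obp + sa / league_sa - 1


# Hypothetical player and league figures, for illustration only.
apro = adjusted_production(obp=0.360, league_obp=0.300, sa=0.500, league_sa=0.400)
print(round(apro, 2))  # 1.45

# A league-average player scores exactly 1.
average_player = adjusted_production(0.300, 0.300, 0.400, 0.400)
```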

Estimating Home Run Distance

The distance of a home run can be estimated using mathematics. At Wrigley Field in Chicago, one person is responsible for estimating the distance of each home run. When one is hit, about 10 seconds are needed to determine the distance and relay the information to the scoreboard operator for display. Wrigley Field's dimensions are 355 feet down the left-field line, 353 feet down the right-field line, and 400 feet to dead center. The power alleys are both marked at 368 feet.

When a home run lands inside the park, these dimensions are used to estimate the distance. If the ball lands in the stands, 3 feet are added for each row of bleacher seats. For example, if a ball is hit to the left side of the field, between the foul pole and left-center, and lands four rows into the bleachers, the total distance is estimated as 362 feet (the midpoint of 355 and 368, rounded to the nearest foot) plus 12 feet (4 rows × 3 feet per row), or 374 feet. Of course, the distance is even harder to estimate when a ball leaves the park!
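The estimation rule can be sketched in a few lines. The function name and the choice to round the midpoint half up (which reproduces the 362-foot figure in the example) are assumptions of this sketch, not a description of the method actually used at Wrigley Field.

```python
import math

FEET_PER_ROW = 3  # 3 feet added per row of bleacher seats, as described above


def estimate_distance(marker_a, marker_b, rows_into_bleachers=0):
    """Midpoint of the two nearest wall markers, plus 3 feet per row."""
    midpoint = (marker_a + marker_b) / 2
    wall = math.floor(midpoint + 0.5)  # round half up to the nearest foot
    return wall + FEET_PER_ROW * rows_into_bleachers


# Between the left-field line (355 ft) and the left-center alley (368 ft),
# four rows into the bleachers.
print(estimate_distance(355, 368, 4))  # 374
```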

see also Athletics, Technology in.

Alan Lipp and Denise Prendergast