Data Visualization

"A picture is worth 1,000 words." Data visualization builds on this old adage, which points to the value of displaying quantitative information as images or graphs. The aim is to convey the meaning of the data simply and intuitively. This is achieved by mapping quantities onto visual attributes such as length, area, color, symbols, position, curves, or other visual cues. Common graphs include histograms and bar charts, which display quantity as length; pie charts, which present the parts of a whole; time series, which show how a variable develops over time; and scatterplots, which show the relationship between two variables by plotting each pair of values as a point. These graphs are abstract, but they borrow real-world properties to support intuitive understanding. A pie chart evokes a pie cut into slices, a time series evokes mountains and valleys, and a scatterplot can be discussed in terms such as position, closeness, distance, groups, and outliers or individuals.
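
The simplest of these mappings, quantity to length, can be sketched in a few lines. The following Python function is a minimal illustration, not part of any particular charting tool: the name text_bar_chart and its max_width parameter are hypothetical, and it assumes non-negative values. Each value is scaled linearly so that the largest one fills the full width, which keeps bar lengths proportional to the data.

```python
def text_bar_chart(data: dict[str, float], max_width: int = 40) -> list[str]:
    """Render each value as a row of '#' characters whose length is proportional to the value.

    Hypothetical helper for illustration; assumes all values are non-negative.
    """
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    rows = []
    for label, value in data.items():
        # Linear scaling: the largest value gets max_width characters, zero gets none.
        length = round(value / largest * max_width) if largest > 0 else 0
        rows.append(f"{label:<{label_width}} | {'#' * length} {value}")
    return rows


if __name__ == "__main__":
    # Illustrative, made-up values.
    for row in text_bar_chart({"apples": 12, "pears": 6, "plums": 3}, max_width=12):
        print(row)
    # apples | ############ 12
    # pears  | ###### 6
    # plums  | ### 3
```

Halving a value halves its bar, which is exactly the property a reader relies on when comparing lengths.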

The physical connection is more direct in maps, where the information space is itself a representation of the real world. Maps are an ideal information space for displaying any type of geographically based data, using concrete or abstract symbols. Depths and heights may be converted into color, for example on a scale running from dark blue for the deepest waters, through shades of green, to dark brown for the highest mountains. Other geographical data may be presented as separate layers on the same map: lines may represent borders, arrows may show winds and currents, and symbols may mark the locations of hospitals or churches. More abstract data can be displayed by placing pie and bar charts on the map or by color-coding or shading areas.
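
A color scale like the one described for depths and heights is commonly implemented by interpolating between a few fixed color stops. The sketch below assumes linear interpolation in RGB; the function name elevation_to_rgb and the stop elevations and colors in COLOR_STOPS are hypothetical values chosen only to follow the dark blue, green, dark brown progression, not a cartographic standard.

```python
from bisect import bisect_right

# Hypothetical color stops: (elevation in metres, RGB). Values are illustrative only.
COLOR_STOPS = [
    (-6000, (0, 0, 100)),     # dark blue for deep water
    (0, (120, 200, 120)),     # light green at sea level
    (1500, (30, 120, 30)),    # darker green for hills
    (5000, (90, 50, 20)),     # dark brown for the highest mountains
]


def elevation_to_rgb(elevation, stops=COLOR_STOPS):
    """Linearly interpolate an RGB color between the two stops that bracket the elevation."""
    levels = [level for level, _ in stops]
    # Values beyond either end of the scale are clamped to the end colors.
    if elevation <= levels[0]:
        return stops[0][1]
    if elevation >= levels[-1]:
        return stops[-1][1]
    i = bisect_right(levels, elevation)  # now levels[i - 1] <= elevation < levels[i]
    (lo_level, lo_rgb), (hi_level, hi_rgb) = stops[i - 1], stops[i]
    t = (elevation - lo_level) / (hi_level - lo_level)  # fraction of the way to the next stop
    return tuple(round(a + t * (b - a)) for a, b in zip(lo_rgb, hi_rgb))


if __name__ == "__main__":
    for h in (-8000, -3000, 0, 3250, 9000):
        print(h, elevation_to_rgb(h))
```

More stops, or interpolation in a perceptual color space, give smoother scales; the principle of mapping a number to a position along a color ramp stays the same.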

Although the first geographical maps were drawn on clay tablets more than 5,000 years ago, data maps and other data graphics did not appear until the seventeenth century. With a few exceptions, such as seismograms from earthquake detectors and electrocardiograms from heart monitors, these graphs had to be drawn by hand and handled separately from the text during printing. This changed with the advent of the computer, which made it easy to collect and organize large amounts of data, to create graphs automatically with spreadsheet programs, and to integrate text and graphics in word-processing software. With laser printers, photocopiers, and offset printing, graphics can be reproduced as easily as text. As a result, data visualization has become an important part of many documents, from student papers to scientific journals and newspapers.

Online presentations make dynamic visualizations possible. Here the user may set up filter queries that select the data to be displayed, choose a display method, select the variables to be shown on each axis, and so on. The most advanced systems can also present simulations that show how data sets change over time. In online systems the visualization may also act as an interface to an underlying database, for example by letting the user retrieve an object by clicking on its representation in the display.
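
The interactive features described here, filtering the data, choosing what goes on each axis, and selecting an object by clicking on it, can be sketched as follows. ViewSpec, build_view, and pick are hypothetical names; the sketch keeps records in memory rather than querying a real database, and for simplicity it measures click distance in data units rather than screen pixels.

```python
import math
from dataclasses import dataclass
from typing import Callable

Record = dict  # one data object, e.g. {"id": 1, "price": 10.0, "weight": 2.0}


@dataclass
class ViewSpec:
    """The user's current choices: a filter query and the field shown on each axis."""
    where: Callable[[Record], bool]
    x_field: str
    y_field: str


def build_view(records, spec):
    """Apply the filter, then map each remaining record to an (x, y, record) marker."""
    return [(r[spec.x_field], r[spec.y_field], r) for r in records if spec.where(r)]


def pick(view, click_x, click_y, radius):
    """Return the record whose marker is nearest the click, or None if none is within radius."""
    best, best_dist = None, radius
    for x, y, record in view:
        dist = math.hypot(x - click_x, y - click_y)
        if dist <= best_dist:
            best, best_dist = record, dist
    return best


if __name__ == "__main__":
    products = [
        {"id": 1, "price": 10.0, "weight": 2.0},
        {"id": 2, "price": 50.0, "weight": 1.0},
        {"id": 3, "price": 20.0, "weight": 5.0},
    ]
    # Filter query: products under 30; price on the x axis, weight on the y axis.
    spec = ViewSpec(where=lambda r: r["price"] < 30, x_field="price", y_field="weight")
    view = build_view(products, spec)
    print(pick(view, 11.0, 2.5, radius=2.0))  # the record with id 1
```

Re-running build_view with a new ViewSpec whenever the user changes a setting is what makes the display dynamic; pick turns the display back into a way of retrieving the underlying object.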

Visualizing More than Two Dimensions

Most common visualization techniques are based on one or two variables, but other techniques can be used to visualize more dimensions. For instance, shading may give the display depth, adding a third axis, or the user may be able to rotate a multidimensional display and look at it from different viewpoints. VIBE (Visual Information Browsing Environment), an experimental system, takes a different approach: it positions each data object relative to reference points that the user places in the visualization space. The metaphor behind the system is a bookshelf on which, for example, books on geography are placed to the left and books on history to the right; a book on historical geography then naturally belongs in the middle.
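
Rotation and shading as depth cues can be illustrated with elementary geometry. This minimal sketch, with hypothetical function names, rotates 3-D points about the vertical axis, drops the depth coordinate (an orthographic projection), and assigns each point a brightness so that points assumed to be nearer the viewer (larger z) are drawn brighter. Real systems typically use perspective projection and a graphics library instead.

```python
import math


def rotate_y(points, angle_deg):
    """Rotate (x, y, z) points about the vertical y axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


def project(points):
    """Orthographic projection onto the screen: keep x and y, drop the depth z."""
    return [(x, y) for x, y, _ in points]


def depth_shade(points, near=1.0, far=0.3):
    """Brightness per point: larger z (assumed nearer the viewer) gives a brighter point."""
    zs = [z for _, _, z in points]
    lo, hi = min(zs), max(zs)
    span = (hi - lo) or 1.0  # avoid division by zero when all points share one depth
    return [far + (near - far) * (z - lo) / span for z in zs]


if __name__ == "__main__":
    cloud = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    for angle in (0, 45, 90):  # three viewpoints on the same data
        turned = rotate_y(cloud, angle)
        print(angle, project(turned), depth_shade(turned))
```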

A VIBE display space is shown in the accompanying figure. It is defined by three POIs (Points Of Interest), shown as circles in the diagram. Data objects are represented as rectangles and are positioned according to the score they have on each POI. For example, an object that scores on A only is positioned on top of A; an object that scores on A and B but zero on C is positioned between A and B, closest to the POI on which it scores higher; and an object that scores similarly on all three POIs is positioned in the middle of the space. The coordinates of each object are calculated as the average of the POI coordinates, weighted by the object's POI scores. See Figure 1 above.
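
The weighted average can be written as position = (s_A*A + s_B*B + s_C*C) / (s_A + s_B + s_C), where A, B, and C are the POI coordinates and s_A, s_B, s_C are the object's scores. A minimal Python sketch follows; the function name vibe_position and the triangle coordinates in the example are assumptions for illustration, and scores are assumed to be non-negative.

```python
def vibe_position(scores, poi_coords):
    """Return (x, y) as the score-weighted average of the POI coordinates.

    scores: one non-negative score per POI; poi_coords: one (x, y) pair per POI.
    Returns None when all scores are zero, since the object then has no defined position.
    """
    total = sum(scores)
    if total == 0:
        return None
    x = sum(s * px for s, (px, _) in zip(scores, poi_coords)) / total
    y = sum(s * py for s, (_, py) in zip(scores, poi_coords)) / total
    return (x, y)


if __name__ == "__main__":
    # Hypothetical POI layout: A, B, C at the corners of a triangle.
    pois = [(0.0, 0.0), (4.0, 0.0), (2.0, 4.0)]
    print(vibe_position([5, 0, 0], pois))  # scores on A only: on top of A
    print(vibe_position([3, 1, 0], pois))  # A and B, no C: between them, nearer A
    print(vibe_position([1, 1, 1], pois))  # equal scores: middle of the space
```

The three example calls reproduce the three cases described in the text: an object on top of A, one between A and B pulled toward A, and one at the center of the triangle.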

VIBE works with any type of data. A manufacturer, for example, may use VIBE to display its products in a space defined by POIs such as price, weight, or volume, while a student may wish to present readings in a space with POIs defined by terms such as "history" or "geography." In the latter case, a function is needed to map each concept to a numerical score; this may be as simple as counting how often each POI term occurs in each document. VIBE allows any number of POIs in a multidimensional display. With more than three POIs, however, the display may become ambiguous, because objects with different score profiles can end up in the same position, and the user may need to assign colors to POIs or move POIs around to see how the objects relate to them. One advantage of visualization over statistical methods is that every data object is shown. VIBE has been used with as many as several thousand objects. The display may then reveal clusters, regions where many of the objects fall, while still presenting an icon for each individual object.
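
For text documents, the term-counting score mentioned above, and the ambiguity that can appear with more than three POIs, can both be shown in a short sketch. poi_scores is a hypothetical helper that simply counts whole-word, case-insensitive occurrences of each single-word POI term (ASCII letters only); real systems may use weighted or stemmed term matching. The second part places two objects with quite different score profiles in a space with four POIs at the corners of a square and shows that they land on the same spot.

```python
import re
from collections import Counter


def poi_scores(text, poi_terms):
    """Score a document on each POI by counting case-insensitive whole-word matches of its term."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))  # ASCII words only
    return [counts[term.lower()] for term in poi_terms]


def vibe_position(scores, poi_coords):
    """Score-weighted average of POI coordinates (None if every score is zero)."""
    total = sum(scores)
    if total == 0:
        return None
    x = sum(s * px for s, (px, _) in zip(scores, poi_coords)) / total
    y = sum(s * py for s, (_, py) in zip(scores, poi_coords)) / total
    return (x, y)


if __name__ == "__main__":
    doc = "History of geography: geography and history, mostly geography."
    print(poi_scores(doc, ["history", "geography"]))  # [2, 3]

    # Four hypothetical POIs at the corners of a square.
    square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
    # One object scores only on the 1st and 3rd POI, the other only on the 2nd and 4th,
    # yet both are placed at the center, so position alone cannot tell them apart.
    print(vibe_position([1, 0, 1, 0], square), vibe_position([0, 1, 0, 1], square))
```

With three POIs at the corners of a triangle, each position corresponds to a single ratio of scores, so this kind of collision does not occur; that is why colors or moving the POIs help once a fourth POI is added.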

Pitfalls with Visual Displays

A picture does not always say more than 1,000 words! Graphs may use so many visual effects that they confuse the reader and hide the meaning of the data. Ideally, the size of an effect shown in the graphic should match the size of the effect in the data. Typical distortions are found in time series of monetary data, where values are not adjusted for inflation or where a nonstandard scale is not made apparent to the reader. In a three-dimensional pie chart, a sector in front may look bigger than one at the back even if the quantities are identical. Interestingly, even publications that take accuracy very seriously often fail in their graphics. Beyond controlling the effects of a graphic, it is important to consider whether a graphic is needed at all! A few numbers are often better presented directly in the text or in a table than in a pie chart.
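
Both checks mentioned here can be computed directly. Edward Tufte calls the ratio between the size of an effect shown in a graphic and the size of the effect in the data the "lie factor"; a value near 1 means the graphic is faithful. The sketch below computes it from two data values and the two corresponding drawn lengths, and also deflates a nominal money series with a price index. The function names and example numbers are hypothetical.

```python
def relative_change(first, second):
    """Size of an effect as a relative change; first must be non-zero."""
    return (second - first) / first


def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Change shown in the graphic divided by change in the data (1.0 = undistorted)."""
    return relative_change(drawn_first, drawn_second) / relative_change(data_first, data_second)


def to_constant_money(nominal, price_index, base_index):
    """Express nominal amounts in the money of the base period using a price index."""
    return [value * base_index / index for value, index in zip(nominal, price_index)]


if __name__ == "__main__":
    # Data rise 50% (100 -> 150) but the bars are drawn 2 cm -> 5 cm (a 150% rise).
    print(lie_factor(100, 150, 2, 5))  # 3.0
    # Nominal revenue grows 10% while prices also rise 10%: in real terms it is flat.
    print(to_constant_money([100, 110], [100, 110], base_index=100))  # [100.0, 100.0]
```

In the first example the graphic exaggerates the change threefold; in the second, an apparent 10 percent growth disappears once the values are adjusted for inflation.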

See also Database Management Software; Image Analysis: Medicine; Scientific Visualization.

Kai A. Olsen

Computer Sciences