Standards and Information

It is a tribute to the power and effectiveness of information processing standards that people benefit from so many of them without noticing. Standards allow computers and computer programs to share information, even when the hardware or software has been designed by different individuals or companies. When a new expansion card or peripheral works in a computer without any problems, it is because the device has been designed to conform to standards. When software is able to read a data file sent to a user by a friend, it is because the data is written and read according to a standard format. Information processing is only one of many areas of daily life in which standards are important. For example, automobile parts and the voltage of household electrical current are standardized, and money is a standard medium of exchange.

When a person refers to an information or data processing "standard," he or she may mean any of the following:

A method, protocol, or system for accomplishing a particular task, such as encoding information in a file or sharing it over a network.
Hardware or software that employs or executes that method or protocol (e.g., a word-processing program).
A document that specifies the method or protocol in very detailed, precise technical language.
An agreement that such a document represents among organizations or individuals.

For the purposes of this entry, an adequate definition of "information processing standards" is that they are precisely documented agreements about methods or protocols for information processing that are realized in the operation of computer hardware and software.

Standards as Solutions

The impetus for the creation of any standard is to address a particular problem. As indicated by Martin Libicki (1995), the goal of standardization is almost always to make a process more efficient or reliable, or to define a single consistent interface that allows unlike systems or applications to interoperate.

Consider the problem of representing the content of documents in computer files. The most basic problem is how to represent individual text characters as sequences of binary digits (ones and zeros). Efforts to solve this problem in the 1960s culminated in one of the most successful and widely used data interchange standards: ASCII (the American Standard Code for Information Interchange). For many computer users, the term "ASCII" is synonymous with "plain text." However, all computer files consist of binary data, and an ASCII file is simply a binary file whose contents are meaningful when interpreted according to the ASCII standard. An application that represents text using the ASCII code can exchange information with any other program that reads and writes ASCII files.
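A short sketch can illustrate that an ASCII file is simply binary data read according to an agreed code. The Python example below uses the language's built-in "ascii" codec to turn text into seven-bit codes and back again. The function names are hypothetical and chosen only for illustration. Any program that applies the same code to those bits recovers the same text.

```python
def ascii_bits(text: str) -> list[str]:
    """Return the 7-bit binary form of each character's ASCII code."""
    # encode("ascii") raises UnicodeEncodeError for characters outside ASCII
    data = text.encode("ascii")
    return [format(byte, "07b") for byte in data]


def from_ascii_bits(bits: list[str]) -> str:
    """Interpret 7-bit binary strings according to ASCII and rebuild the text."""
    return bytes(int(b, 2) for b in bits).decode("ascii")


if __name__ == "__main__":
    bits = ascii_bits("Hi!")
    print(bits)                   # ['1001000', '1101001', '0100001']
    print(from_ascii_bits(bits))  # Hi!
```

If you pass a character outside ASCII's repertoire, such as "é", to `ascii_bits`, the call raises `UnicodeEncodeError`. This behavior reflects how limited the code's repertoire is.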

ASCII is very limited as a solution to the problem of representing document content. The code consists of only 128 characters, each represented by a sequence of seven bits. That is not enough even for every European alphabet, let alone the many other alphabets and writing systems of the world. A number of extended, ASCII-based character sets have been proposed for accented Latin letters and for non-Roman alphabets such as the Arabic and Hebrew alphabets. Some have become standards, like the eight-bit ISO 8859 family of character sets (approved by the International Organization for Standardization), in which each part covers one group of alphabets. The Unicode standard (created by the Unicode Consortium) was originally designed as a sixteen-bit encoding, and its code space has since been extended to more than a million code points. It aims to include every major script in the world and every technical symbol in common use.
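You can check the difference in coverage among these character sets directly. The following Python sketch is an illustration only. It uses Python's built-in codec names: "ascii", "latin-1" (ISO 8859-1), "iso8859-8" (the Latin/Hebrew part of ISO 8859), and "utf-16-be" (one of the encoding forms of Unicode). The helper functions are hypothetical. The sketch reports which codes can represent a French word and a Hebrew word.

```python
from typing import Optional

# Python's own codec names for the character sets being compared.
CODECS = ("ascii", "latin-1", "iso8859-8", "utf-16-be")


def try_encode(text: str, codec: str) -> Optional[bytes]:
    """Return the encoded bytes, or None if the codec lacks a code for some character."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


def coverage(text: str) -> dict[str, bool]:
    """Map each codec name to True if it can represent every character of text."""
    return {codec: try_encode(text, codec) is not None for codec in CODECS}


if __name__ == "__main__":
    for word in ("café", "שלום"):  # French and Hebrew
        print(word, coverage(word))
    # A character beyond the original sixteen-bit range needs two 16-bit units.
    print(len("\U0001D11E".encode("utf-16-be")))  # 4
```

Each eight-bit set handles only the alphabets its part was designed for, while the Unicode encoding represents both words. The last line shows that the character U+1D11E (MUSICAL SYMBOL G CLEF) takes four bytes in UTF-16 rather than two.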

Another limitation of ASCII and other plaintext formats is that documents contain forms of information other than alphanumeric characters, punctuation, and blank space. The software industry has produced many technologies for the representation of images, multimedia data, specifications for presentation and formatting, and hypertext links. Some of these technologies have also become standards.

Standards as Documents

Standards documents are notoriously difficult to read. The reputation is deserved, but it is largely due to unavoidable factors. Almost all standards emerge from the work and consensus of many people, and they therefore represent solutions that are general enough to address a variety of problems. For example, the Standard Generalized Markup Language, or SGML (ISO 8879), is a successful and influential standard that enables the structure of a document to be encoded separately from its presentation. Charles Goldfarb invented SGML in the 1970s. Novices would find it much easier to understand if it defined a single character set (such as ASCII or Unicode) for all conforming documents. But SGML is flexible enough to accept any number of different character sets, and for that reason SGML syntax must be described at more than one level of abstraction. This is only one example of how the generality and flexibility of standards make them difficult documents to read. On the other hand, adopting a general solution to a problem often forces one to think more broadly and deeply about an application, which can help avoid additional work and expense in the long run.
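XML is a simplified descendant of SGML that Python's standard library can parse, so it can illustrate two of SGML's properties: structure kept separate from presentation, and structure independent of any one character set. SGML itself is not used here. The element names (memo, title, para) and the helper functions are hypothetical. The sketch stores the same marked-up memo in two character encodings, parses both into the same structure, and then presents that structure in two different ways.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical document: the markup records structure (a memo with a title
# and a paragraph) but says nothing about fonts, sizes, or layout.
BODY = "<memo><title>Réunion</title><para>Lundi à 9 h.</para></memo>"


def encode_document(encoding: str) -> bytes:
    """Store the memo as bytes, declaring which character encoding was used."""
    declaration = f'<?xml version="1.0" encoding="{encoding}"?>'
    return (declaration + BODY).encode(encoding)


def render_plain(root: ET.Element) -> str:
    """One presentation of the structure: capitalized title above the text."""
    return f"{root.findtext('title').upper()}\n{root.findtext('para')}"


def render_html(root: ET.Element) -> str:
    """A different presentation of the same structure, as an HTML fragment."""
    title = escape(root.findtext("title"))
    para = escape(root.findtext("para"))
    return f"<h1>{title}</h1><p>{para}</p>"


if __name__ == "__main__":
    utf8 = ET.fromstring(encode_document("UTF-8"))
    latin1 = ET.fromstring(encode_document("ISO-8859-1"))
    # Different bytes on disk, identical structure after parsing.
    print(encode_document("UTF-8") == encode_document("ISO-8859-1"))  # False
    print(ET.tostring(utf8) == ET.tostring(latin1))                   # True
    print(render_plain(utf8))
    print(render_html(latin1))
```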

There are other reasons why standards documents can be problematic for newcomers. As the adoption of a standard becomes more widespread and formalized, different organizations may publish the same solution (or nearly the same) under different names. For example, ASCII was first published in the United States in 1963 as X3.4 by the American Standards Association, the predecessor of the American National Standards Institute. It was later revised several times, including in 1968 and in 1986 (ANSI X3.4-1986). When ASCII was adopted as an international standard, the International Organization for Standardization published essentially the same encoding as ISO 646. Each of the eight-bit character sets in the ISO 8859 family subsumes ASCII and is in turn subsumed within Unicode. Finally, Unicode is nearly identical to the ISO/IEC 10646 standard.
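Python's built-in codecs can verify this nesting of ASCII within ISO 8859 and of ISO 8859 within Unicode. In the sketch below the helper names are hypothetical. The codec names "ascii", "latin-1" (ISO 8859-1), and "iso8859-8" (ISO 8859-8) are Python's own. In ISO 8859-1, each byte value happens to decode to the Unicode code point with the same number. Other parts of the family, such as ISO 8859-8, also have all their characters in Unicode, but at different code points.

```python
def subsumes_ascii(codec: str) -> bool:
    """True if the codec decodes byte values 0-127 exactly as ASCII does."""
    low = bytes(range(128))
    return low.decode(codec) == low.decode("ascii")


def unicode_mapping(codec: str) -> dict[int, str]:
    """Decode each byte value the codec defines; every result is a Unicode character."""
    table = {}
    for value in range(256):
        try:
            table[value] = bytes([value]).decode(codec)
        except UnicodeDecodeError:
            pass  # this byte value is left undefined by the codec
    return table


if __name__ == "__main__":
    for codec in ("latin-1", "iso8859-8"):
        print(codec, "subsumes ASCII:", subsumes_ascii(codec))
    latin1 = unicode_mapping("latin-1")
    # ISO 8859-1 byte values coincide with the first 256 Unicode code points.
    print(all(ch == chr(value) for value, ch in latin1.items()))  # True
    hebrew = unicode_mapping("iso8859-8")
    print(hex(ord(hebrew[0xF9])))  # 0x5e9, HEBREW LETTER SHIN
```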

Standards as Agreements

There are various criteria for what counts as a standard, and the same person may use different criteria depending on the context in which they are writing or speaking. When a person refers to a particular technology as "standard," the term is usually used in one of three senses:

A de facto standard is a solution that has become widely adopted and is considered standard by virtue of its popularity.
A de jure standard has been reviewed and formally approved by a standards developing organization such as the International Organization for Standardization or one of its member organizations (e.g., the American National Standards Institute in the United States).
A public specification is similar to a de jure standard but is authorized by an industry consortium rather than by a standards developing organization. Such consortia operate according to somewhat different rules.

The three kinds of standards can be compared along four key dimensions: acceptance, openness, stability, and consensus.

Acceptance is the key to the success of any standard, and some technologies are deemed standards solely by virtue of their wide acceptance and popularity. Native file formats for popular word-processing software are examples of these de facto standards. If a person receives such a file on diskette or as an attachment to an e-mail message, the ability to read the file requires that the recipient have access to the word processor that reads and writes that format. Problems of compatibility and interoperability are avoided if most people adopt exactly the same solution (i.e., use the same software).

A de facto information processing standard need not be a commercial product, nor must its popularity extend to the general community of computer users. For example, the file format for the (nonproprietary) TeX typesetting system, designed by Donald Knuth in the late 1970s, has been widely adopted for mathematics and engineering documents, but it is not popular for office documents.

The hallmark of any de jure standard or public specification is its openness. Formal information processing standards are designed and documented with the aim of making every detail public. They are written with the expectation that engineers will develop hardware and software to implement the solutions that the standard represents. For that reason they are documented in exhaustive detail.

Specifications for de facto information standards vary in how open they are. An organization controlling a de facto standard may publish a reference manual in paper and/or electronic form. For example, Adobe Systems Incorporated publishes both digital and paper references for its PostScript and Portable Document Format technologies. Knuth's published documentation for the TeX typesetting system includes both a reference manual and the complete annotated source code for TeX itself. In contrast, the publishers of popular office and productivity software (such as word processors) keep their source code secret to protect their intellectual property. The native file formats and communications protocols used by such programs may be proprietary, and their full specifications unpublished.

Potential adopters of a standard may perceive a tug-of-war between the stability of strict adherence and the freedom to innovate. If developers limit their applications to the functions a standard covers, users of the technology enjoy stable interoperability with other systems. However, standards are slow to change and may not accommodate new functions and capabilities that could improve a system. For that reason, developers may introduce nonstandard extensions to a method, protocol, or format. Such extensions come at the cost of stability. Deploying them changes the format or protocol, and software that implements only the standard may be unable to read the extended data or respond to the extended protocol.
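A deliberately simple, entirely hypothetical record format can illustrate the stability cost of a nonstandard extension. In the Python sketch below, a made-up "standard" defines three fields, and one vendor adds a field of its own (x-color). A reader that conforms strictly to the standard rejects the extended record. A lenient reader accepts it but discards the extra information. Either way, the extension does not interoperate.

```python
# Hypothetical "standard": these are the only fields it defines.
STANDARD_FIELDS = {"title", "author", "date"}


def parse_record(line: str, strict: bool = True) -> dict[str, str]:
    """Parse 'key=value;key=value' text. A strict reader rejects fields the
    standard does not define; a lenient reader silently drops them."""
    record = {}
    for item in line.split(";"):
        key, _, value = item.partition("=")
        key = key.strip()
        if key not in STANDARD_FIELDS:
            if strict:
                raise ValueError(f"nonstandard field: {key!r}")
            continue  # lenient: the extension's information is lost
        record[key] = value.strip()
    return record


def conforms(line: str) -> bool:
    """True if a strictly conforming reader accepts the record."""
    try:
        parse_record(line, strict=True)
        return True
    except ValueError:
        return False


standard = "title=Report;author=Lee;date=1999-05-01"
extended = standard + ";x-color=blue"  # a vendor's nonstandard extension

if __name__ == "__main__":
    print(conforms(standard), conforms(extended))  # True False
    print(parse_record(extended, strict=False))    # x-color is dropped
```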

Finally, standards differ in the type of consensus they represent. De facto information standards represent a consensus among users that an existing application or protocol is worth adopting. Users may adopt it because it meets particular needs, because they judge it better than competing technologies, or simply because so many other people have already adopted it. By contrast, de jure standards are designed from the beginning to address as wide a range of needs as possible. Standards developing organizations such as the International Organization for Standardization and its member organizations involve many different stakeholders and interested parties in coordinating the development of a standard. Industry consortia need not be as inclusive, but in practice they often are. For example, the Unicode Consortium includes churches, libraries, and individual specialists among its members, in addition to corporations. Closer cooperation between consortia and standards developing organizations, such as the cooperation between the International Organization for Standardization and the Unicode Consortium, bodes well for the future role of standards in improving information management and information processing.

Bibliography

Adobe Systems Incorporated. (1993). Portable Document Format Reference Manual. Reading, MA: Addison-Wesley.

Adobe Systems Incorporated. (1999). PostScript Language Reference. Reading, MA: Addison-Wesley.

American National Standards Institute. (1986). Information Systems—Coded Character Sets—7-Bit American National Standard Code for Information Interchange (ANSI X3.4-1986). New York: American National Standards Institute.

Cargill, Carl F. (1997). Open Systems Standardization: A Business Approach. Upper Saddle River, NJ: Prentice-Hall.

Goldfarb, Charles F. (1990). The SGML Handbook. Oxford, Eng.: Oxford University Press.

International Organization for Standardization. (1993). Information Technology—Universal Multiple-Octet Coded Character Set (UCS)—Part 1: Architecture and Basic Multilingual Plane (ISO/IEC 10646-1:1993). Geneva, Switzerland: International Organization for Standardization.

International Organization for Standardization. (1998). Information Technology—8-Bit Single-Byte Coded Graphic Character Sets—Part 1: Latin Alphabet No. 1 (ISO/IEC 8859-1:1998). Geneva, Switzerland: International Organization for Standardization.

Knuth, Donald E. (1986). TeX: The Program. Reading, MA: Addison-Wesley.

Knuth, Donald E. (1990). The TeXbook. Reading, MA: Addison-Wesley.

Libicki, Martin. (1995). Standards: The Rough Road to the Common Byte. Washington, DC: Center for Advanced Concepts and Technology, Institute for National Strategic Studies, National Defense University.

Unicode Consortium. (2000). The Unicode Standard, Version 3.0. Reading, MA: Addison-Wesley.

David S. Dubin

Encyclopedia of Communication and Information