Genetic distance refers to the mathematical reduction of multidimensional genetic differences to one-dimensional lengths, which can then be easily compared. While mathematical precedents existed, the use of genetic distance flowered in the 1960s with the conjunction of two biological programs: (1) racial serology, which had been amassing genetic data on differences across populations but was quantitatively unsophisticated; and (2) numerical taxonomy, which was developing a radical post-Linnaean approach to biological systematics and was mathematically sophisticated but philosophically unpersuasive.
Racial serology, the study of human diversity using immunological reactions of the blood, began during World War I. As the collection and analysis expanded, it became clear that different blood markers showed different patterns of diversity across the human gene pool. For example, diverse populations, such as Navajos and Estonians, might have very different allele frequencies for the ABO blood group, but very similar allele frequencies for the MN blood group. With many populations and many blood group markers, these data quickly become unwieldy.
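A minimal sketch of the reduction described above, assuming a plain Euclidean distance over allele frequencies. The frequencies are invented for illustration and are not measurements from any population. The function names are hypothetical, and published studies have used other measures, such as Nei's distance or the Cavalli-Sforza chord distance.

```python
import math

# Hypothetical allele frequencies, invented purely for illustration.
# Each locus maps to a tuple of allele frequencies that sums to 1.
pop_a = {"ABO": (0.10, 0.05, 0.85), "MN": (0.55, 0.45)}
pop_b = {"ABO": (0.25, 0.15, 0.60), "MN": (0.57, 0.43)}


def locus_distance(p, q):
    """Euclidean distance between two allele-frequency vectors at one locus."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def genetic_distance(a, b):
    """Pool squared frequency differences over all loci into one length."""
    total = 0.0
    for locus in a:
        total += sum((x - y) ** 2 for x, y in zip(a[locus], b[locus]))
    return math.sqrt(total)


# The two loci tell different stories; the pooled value collapses them
# into a single number that can be compared across pairs of populations.
abo = locus_distance(pop_a["ABO"], pop_b["ABO"])
mn = locus_distance(pop_a["MN"], pop_b["MN"])
overall = genetic_distance(pop_a, pop_b)
```

In this made-up case the populations differ strongly at ABO and hardly at all at MN, and the single pooled number hides that contrast, which is both the convenience and the cost of the reduction.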
Numerical taxonomy sought to replace the verbal, impressionistic taxonomy of earlier generations with a rigorous, mathematical approach to scientific classification. Its focus was on establishing patterns of relationships, based on quantifiable similarity, among groups of objects, which came to be called operational taxonomic units, or OTUs. The goal of numerical taxonomy was to create a tree-like structure, or dendrogram, a statistical digest summarizing the similarities of OTUs.
The techniques of numerical taxonomy lent themselves well to the analysis of data from many genetic loci across many human populations. Unfortunately, their formalism tended to obscure layers of subjectivity. At the most fundamental level, different statistical algorithms can produce different trees from the same data. Moreover, the trees are generated regardless of whether or not the OTUs are comparable. Thus, if the African gene pool subsumes the European gene pool, then they cannot be intelligibly contrasted against one another (although the computer programs will mindlessly do so). Likewise, the computer will produce relationships among groups defined geographically, linguistically, politically, ethnically, and racially in the same study, in spite of the fact that such comparisons may be largely meaningless.
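The point that different algorithms can yield different trees from the same data can be illustrated with a small sketch. It assumes two common agglomerative rules, single linkage and complete linkage, which are chosen here for illustration and are not named in the text. The distance table is invented and the helper names are hypothetical.

```python
from itertools import combinations

# Hypothetical pairwise distances among four OTUs, invented for illustration.
DIST = {
    frozenset("AB"): 1.0, frozenset("BC"): 1.2, frozenset("CD"): 1.7,
    frozenset("AC"): 2.2, frozenset("BD"): 2.9, frozenset("AD"): 3.9,
}


def leaves(tree):
    """Return the OTU labels contained in a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def cluster_distance(t1, t2, linkage):
    """Distance between two clusters: min or max over member pairs."""
    return linkage(DIST[frozenset((a, b))]
                   for a in leaves(t1) for b in leaves(t2))


def build_tree(otus, linkage):
    """Agglomerative clustering: repeatedly join the two closest clusters."""
    clusters = list(otus)
    while len(clusters) > 1:
        t1, t2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(pair[0], pair[1], linkage))
        clusters = [c for c in clusters if c not in (t1, t2)]
        clusters.append((t1, t2))
    return clusters[0]


single_tree = build_tree("ABCD", min)    # single linkage: nearest members
complete_tree = build_tree("ABCD", max)  # complete linkage: farthest members
```

With these invented distances, single linkage chains the OTUs together one at a time, while complete linkage pairs A with B and C with D. Neither tree is wrong as arithmetic; the choice between them is a judgment the formalism does not make.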
Further, the meaning of a relatively small genetic difference may be problematic. Above the species level, it likely indicates a close phylogenetic relationship (a recent divergence time between the species being compared). Below the species level, however, it may indicate both phylogenetic proximity and complex patterns of genetic contact (gene flow).
Consequently, the greatest success of genetic distance studies has come above the species level. In 1967, Vincent Sarich and Allan Wilson were able to show that (1) measurable rates of genetic change appear to be roughly constant; (2) the genetic distance between humans and chimpanzees seems to correspond to a divergence time of 3 to 5 million years; and therefore (3) the fossils called Ramapithecus, dated to 14 million years ago, could not be on the uniquely human evolutionary line, because that line was not established for nearly another 10 million years.
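The logic of dating by a roughly constant rate can be sketched as simple proportional scaling. This is an illustration under stated assumptions, not Sarich and Wilson's actual procedure or data: the distance units, the calibration values, and the function name are all hypothetical.

```python
def divergence_time(distance, calib_distance, calib_time_myr):
    """Scale a calibration date by a ratio of distances.

    Assumes distance accumulates at a constant rate, so that time is
    proportional to distance.
    """
    if calib_distance <= 0:
        raise ValueError("calibration distance must be positive")
    return calib_time_myr * distance / calib_distance


# Hypothetical calibration: a pair of lineages whose split is dated
# independently, with an invented distance in arbitrary units.
CALIB_DISTANCE = 6.0
CALIB_TIME_MYR = 30.0

# Hypothetical distance for the pair whose divergence is being estimated.
estimate_myr = divergence_time(1.0, CALIB_DISTANCE, CALIB_TIME_MYR)
```

If the rate is not constant, or if the calibration date is wrong, the estimate inherits the error in direct proportion.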
Meanwhile, direct DNA sequence comparisons were facilitated technologically in the 1980s and 1990s. The most fundamental problem faced by these comparisons is the relationship between the amount of difference observed and the amount of evolution inferred. Where DNA sequence changes are rare, the number of differences observed between two species will approximate the number of evolutionary changes that actually occurred to the DNA. The sample size of those changes is small, however. In contrast, where DNA sequence differences between two species are copious, the sample of evolutionary changes is large. In that case, however, the number of observed differences will underestimate the actual number of mutations that have occurred, because a single observed difference may represent multiple changes (“hits”) at the same nucleotide site.
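One standard way to correct for multiple hits is the Jukes-Cantor model, which is not mentioned in the text and is used here only as an illustration. It assumes that all four nucleotides are equally frequent and that every substitution is equally likely. The sequences and function names below are hypothetical.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jukes_cantor(p):
    """Estimated substitutions per site, given observed proportion p.

    Uses d = -(3/4) * ln(1 - (4/3) * p), which is undefined for p >= 0.75.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Invented toy sequences, far too short for any real inference.
observed = p_distance("ACGTACGTAC", "ACGTTCGTAA")
corrected = jukes_cantor(observed)

# The correction is negligible when differences are rare and large
# when they are copious.
excess_when_rare = jukes_cantor(0.01) - 0.01
excess_when_copious = jukes_cantor(0.50) - 0.50
```

The gap between observed and corrected distance widens as sequences diverge, and as the observed proportion approaches 0.75 the estimate grows without bound, which is one way of seeing why heavily saturated comparisons carry little information.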
Thus, rapidly evolving mitochondrial DNA (mtDNA) may be valuable for estimating reliable and precise genetic distances among human populations over a span of thousands of years. It is less valuable for the distances among ape species over millions of years, where unacceptably high levels of homoplasy (parallel mutations in different lineages) may create a disjuncture between the genetic distances measured and the evolutionary patterns inferred from them.
Mitochondrial DNA comparisons do suggest that human beings are about forty to fifty times more similar to one another than any human is to a chimpanzee. The detectable mtDNA distance between human and Neanderthal appears to be comparable to that between chimpanzee subspecies.
Hull, David L. 1988. Science as a Process: An Evolutionary Account of the Social and Conceptual Development of Science. Chicago, University of Chicago Press.
Marks, Jonathan. 1995. Human Biodiversity: Genes, Race, and History. New York, Aldine de Gruyter.
Sarich, Vincent M., and Allan C. Wilson. 1967. “Immunological Time Scale for Hominid Evolution.” Science 158: 1200–1203.