Evolution of Genes

Individual genes and whole genomes change over time. Indeed, evolution of genes ultimately accounts for the evolution of organisms that is seen in the fossil record: Humans evolved from earlier apes, and those creatures from their ancestors, by gene changes in the earlier creatures. Just as the fossil record can be examined to understand the patterns of organismic evolution, so too can genes be compared to understand genomic evolution.

Natural Selection

Most changes that occur in genes are subject to natural selection, the process first outlined by Charles Darwin in 1859. In natural selection, a heritable change arises by chance. If the organism with that change is better able to survive and reproduce, it will leave more descendants in future generations. These descendants will also carry the new genetic change, and as they reproduce, the change will become more widespread in the population. On the other hand, if the change decreases an organism's survival rate, it will be lost from the population. It is also possible to have a neutral change, with no immediate effect on survival. Such "hidden" genetic variation within a population provides grist for evolution when it offers a selective advantage under new environmental conditions.
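The spread or loss of a heritable change can be sketched with a simple deterministic model. The sketch below assumes a haploid population in which carriers of the change leave (1 + s) offspring for every one left by non-carriers. The function names and the selection coefficient s are illustrative assumptions, not taken from the article, and chance effects in small populations are ignored.

```python
def next_frequency(p, s):
    """One generation of selection in a haploid population.

    p: current frequency of the new variant (between 0 and 1)
    s: selection coefficient (hypothetical parameter); carriers leave
       (1 + s) offspring for every 1 left by non-carriers
    """
    mean_fitness = p * (1 + s) + (1 - p)
    return p * (1 + s) / mean_fitness


def trajectory(p0, s, generations):
    """Return the variant's frequency at generations 0..generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


# A beneficial change (s > 0) rises toward fixation, a harmful one
# (s < 0) declines, and a neutral one (s = 0) stays where it is in
# this simplified model.
beneficial = trajectory(0.01, 0.05, 200)
harmful = trajectory(0.5, -0.05, 200)
neutral = trajectory(0.3, 0.0, 200)
```

With s = 0 the frequency does not move, which is one way to picture the "hidden" neutral variation described above: it persists until conditions change and s becomes nonzero.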

It is important to remember that a genome (all the DNA of an organism) is more than just its genes. The genome includes vast amounts of DNA outside of genes, and this too is subject to change over time. In fact, non-gene portions usually change at a faster rate than genes, because many of these changes have little or no effect on the organism's survival.

Evolution of genes and genomes includes sequence changes to existing genes, gene duplication, recombination of gene segments, and the varied actions of transposable elements as they move through the genome.

Point Mutations in Existing Genes

Genes are long strings built from four kinds of nucleotides (abbreviated A, T, C, and G), whose order dictates the order of amino acids in proteins (or nucleotides in RNA). A point mutation is a change at a single nucleotide position in a gene. Point mutations can convert one type of nucleotide to another (a C to a T, for instance), or delete or add a single nucleotide. While the rate of mutation is slow, over long periods (millions of years) most possible sequence changes will have occurred several times in a population, and so natural selection is likely to have acted on most genes in almost every modern population.
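The three kinds of point mutation can be illustrated on a gene written as a string of letters. This is a minimal sketch: the function names and the nine-letter example sequence are hypothetical, and positions are counted from zero.

```python
BASES = set("ATCG")


def substitute(seq, pos, new_base):
    """Replace the nucleotide at 0-based index pos with new_base."""
    if new_base not in BASES:
        raise ValueError("new_base must be one of A, T, C, G")
    return seq[:pos] + new_base + seq[pos + 1:]


def delete(seq, pos):
    """Remove the single nucleotide at 0-based index pos."""
    return seq[:pos] + seq[pos + 1:]


def insert(seq, pos, new_base):
    """Add a single nucleotide immediately before 0-based index pos."""
    if new_base not in BASES:
        raise ValueError("new_base must be one of A, T, C, G")
    return seq[:pos] + new_base + seq[pos:]


gene = "ATGCCTGAA"                      # hypothetical fragment
c_to_t = substitute(gene, 3, "T")       # "ATGTCTGAA"
one_shorter = delete(gene, 3)           # "ATGCTGAA"
one_longer = insert(gene, 3, "G")       # "ATGGCCTGAA"
```

A substitution leaves the length unchanged, while a deletion or addition shifts every later nucleotide by one position.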

Some point mutations can change the amino acid sequence of the resulting protein, altering its properties. For instance, a digestive enzyme's attraction for its substrate (the food molecule it breaks down) may be altered to allow the organism to digest new foods, or prevent it from digesting old ones. The sickle form of hemoglobin arose because of a point mutation that changed a single amino acid in hemoglobin. The result was a molecule that bound oxygen less tightly under certain conditions, conferring resistance to malaria, but also causing sickle cell disease, a type of hemoglobinopathy.

Other point mutations may leave the protein unchanged, but alter the conditions under which it is expressed. Genes are expressed (that is, are "read" to cause protein production) when transcription factors bind to a region of the gene known as the promoter. Promoters interact with other DNA regions, called enhancers, which are often a long distance from the gene along the chromosome, but are nonetheless close to it because of folding of the DNA. Mutations in either the promoter or enhancer region can have profound effects on the sensitivity of expression to hormones, temperature changes, and other regulatory influences. For instance, a human gene coding for one form of an enzyme called pancreatic elastase appears to have been silenced by an evolutionarily recent enhancer mutation.

Not all DNA sequence changes will lead to changes in the encoded protein or in the way that it is expressed. For many amino acids, there are several DNA "synonyms" that all code for the same amino acid. Written as coding-strand triplets, CCT, CCC, CCA, and CCG all code for the amino acid proline, for example. This is known as the degeneracy of the genetic code. Also, mutations that substitute a chemically similar amino acid may have no significant effect on protein function. For instance, leucine (TTA, among other codons) and isoleucine (ATA, among others) are both nonpolar amino acids of similar size, often found in the interior of proteins. Mutations that exchange one for the other may have little effect on protein structure or function.
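Degeneracy can be checked with a lookup table. The sketch below uses a small excerpt of the standard genetic code, with triplets written for the coding strand and read 5' to 3' (the same letters read from the template strand correspond to a different codon). The function name classify_change is a hypothetical helper.

```python
# Excerpt of the standard genetic code (coding-strand DNA triplets).
CODON_TABLE = {
    "CCT": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
    "TTA": "Leu", "TTG": "Leu", "CTT": "Leu",
    "CTC": "Leu", "CTA": "Leu", "CTG": "Leu",
    "ATT": "Ile", "ATC": "Ile", "ATA": "Ile",
}


def classify_change(before, after):
    """Describe the effect of changing one codon into another.

    Returns "synonymous" when both codons specify the same amino acid,
    otherwise a string such as "Leu -> Ile".
    """
    aa_before = CODON_TABLE[before]
    aa_after = CODON_TABLE[after]
    if aa_before == aa_after:
        return "synonymous"
    return f"{aa_before} -> {aa_after}"


third_position = classify_change("CCT", "CCG")   # "synonymous"
similar_swap = classify_change("TTA", "ATA")     # "Leu -> Ile"
```

The second call changes the amino acid, but because leucine and isoleucine are chemically similar the protein may still function much as before.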

Genes in eukaryotes also contain noncoding regions, called introns. Mutations in these regions often have no effect, since their sequences do not code for part of the finished protein. Changes to the ends of introns and certain internal sequences may have an effect, however, since these control the removal of the intron sequence from the RNA copy after transcription. A mutation here can prevent intron removal, altering the finished protein. One form of the hemoglobin disease beta-thalassemia is due to an intron mutation.


Humans and most other multicellular creatures are diploid, meaning they carry two sets of virtually identical chromosomes (one inherited from the mother, one from the father). The two members of a chromosome pair (called homologous chromosomes) carry identical sets of genes, so that each organism has two copies of each gene. A point mutation in one of these creates a new form of the gene. Different forms of the same gene are called alleles. Creation of alleles is one of the most common forms of gene evolution. The existence of diploidy allows a greater tolerance of new alleles, since a mutation to one allele still leaves the other one functioning, and in many cases this may be sufficient for survival. The existence of alleles allows greater genetic diversity in a population, which may increase that population's ability to adapt to changing environmental conditions.

Gene Duplication

Occasionally a gene on a single chromosome will be duplicated to create a pair of identical genes. Duplication may occur for any one of several reasons. One type of duplication occurs when the RNA transcript of a gene is "reverse transcribed" back into a DNA sequence and reinserted elsewhere in the genome (a process called retroposition), leading to a new, possibly functional copy of the gene in a new location and subject to different regulatory systems. This appears to have occurred with the human gene for phosphoglycerate kinase 2, which is involved in energy use in the cell.

More likely, a retroposed gene will be functionless, since it will not have the promoters and enhancers it needs for expression. Typically, one copy of a duplicated gene is either nonfunctional or accumulates mutations that render it so. After a long period of evolutionary time, the duplicate gene may acquire so many mutations that it may be difficult to see its relationship to its parent gene. Nonfunctional copies of previously functional genes are called pseudogenes.

Analysis of genomes shows that many gene copies are found lying next to each other, linked head to tail in an arrangement called a "tandem repeat." This may occur because of errors of the normal recombination machinery that is responsible for DNA repair and crossing over during meiosis. Tandem repeats are susceptible to amplification, which is the further increase in the number of copies. This can occur during crossing over. Normal crossing over pairs up identical segments on homologous chromosomes, and then exchanges them. If the chromosomes each have a tandem repeat, the crossover machinery may line up incorrectly, leaving one homologue with three gene copies and one with only one. Repeating this process over ensuing generations can lead to dozens of extra gene copies.
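The arithmetic of a misaligned crossover can be sketched by tracking only the number of gene copies on each homologue. This is a simplified model: the function names are hypothetical, the offset is the number of repeat units by which the two homologues are misaligned, and the amplify helper assumes the lineage that gains a copy is followed each time and always pairs with a homologue carrying the starting number of copies.

```python
def unequal_crossover(gainer_copies, loser_copies, offset=1):
    """Copy numbers after a crossover misaligned by `offset` repeat units.

    One recombinant gains `offset` copies and the other loses the same
    number, so the total across both chromosomes is unchanged.
    """
    if not 0 <= offset < loser_copies:
        raise ValueError("the shorter product must keep at least one copy")
    return gainer_copies + offset, loser_copies - offset


def amplify(start_copies, rounds):
    """Copy number in the gaining lineage after repeated one-unit slips."""
    copies = start_copies
    for _ in range(rounds):
        copies, _lost = unequal_crossover(copies, start_copies, offset=1)
    return copies


after_one_event = unequal_crossover(2, 2)   # (3, 1)
after_ten_events = amplify(2, 10)           # 12
```

Two homologues that each carry a two-copy tandem repeat yield one chromosome with three copies and one with a single copy, matching the case described above.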

Duplication of much larger portions of a genome is also possible, including whole chromosomes (one kind of chromosomal aberration) and even the entire genome (called polyploidy). In each case, the number of copies of a gene increases. Such copies are usually removed by natural selection, but it is sometimes advantageous to have several gene copies, particularly for those genes that code for ribosomal RNAs. These are present in dozens or even hundreds of copies, allowing rapid production of new ribosomes during cell growth.

While gene duplication is a rare event in the short term, it is frequent enough in the long term to have been a central feature in the evolution of the genome of eukaryotic organisms. As with alleles, gene duplication frees up a gene copy to accumulate mutations with less selective penalty. Over time, such a gene may change its function slightly, or even acquire a new function, that increases the capabilities of the organism. The modification of duplicated genes therefore provides the diversity that is acted on by natural selection. An increase in the rate of gene duplication is thought to be one factor contributing to the diversification of multicellular organisms during the so-called Cambrian explosion roughly 540 million years ago, which gave rise to most of the basic animal forms that exist today.

Gene Families

The genes that evolve from a single ancestral gene make up a gene family. In humans, gene families are found in many important groups of proteins, and more are being discovered as the human genome is explored. These include the globins, which carry oxygen in the blood (hemoglobin) and store it in muscle (myoglobin); the immunoglobulins (specifically, the heavy chain of the immunoglobulin), which form antibodies of the immune system; the actins, which move the cell cytoskeleton and muscle; the collagens, which form cartilage and other structural materials; and the homeotic genes, master controllers of embryonic development. There are many other examples as well.

A new member of a particular gene family is discovered by comparing its sequence to known members. This is usually done by computerized database search, and is one of the challenges of bioinformatics, a new specialty devoted to collecting and analyzing large amounts of biological information.
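The idea behind such a comparison can be sketched as a percent-identity score. This is a deliberately simplified illustration: it assumes the sequences are the same length and already aligned, whereas real database searches use alignment methods that also handle insertions and deletions. The function names, database entries, and sequences are all hypothetical.

```python
def percent_identity(a, b):
    """Percentage of positions at which two aligned sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_match(query, database):
    """Return (name, identity) for the database entry closest to query.

    database: dict mapping an entry name to its sequence.
    """
    scored = ((name, percent_identity(query, seq))
              for name, seq in database.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical entries: one close relative and one unrelated sequence.
known_genes = {
    "family_member": "ATGCCTGTAC",
    "unrelated":     "GGTTAACCGT",
}
hit = best_match("ATGCCTGAAC", known_genes)   # ("family_member", 90.0)
```

A high score suggests, but does not prove, that the query belongs to the same gene family as the matching entry.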


Over time, some gene copies mutate to lose their function entirely. Such so-called pseudogenes may arise through accumulation of mutations that prevent translation of the gene, such as an insertion or deletion that stops translation at the beginning of the gene sequence. Pseudogenes also arise from mutation in a gene's promoter region. The promoter is the site at the beginning of the gene that attracts the enzyme called RNA polymerase. Without a functional promoter, the gene cannot be transcribed effectively, and so cannot lead to protein production.
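How a single inserted nucleotide can stop translation almost immediately can be sketched with a toy translator. The codon table below is a small excerpt of the standard genetic code (coding-strand triplets), and the gene sequence is a hypothetical example chosen so that the insertion creates a stop codon right after the start.

```python
# Excerpt of the standard genetic code; "*" marks a stop codon.
CODON_TABLE = {
    "ATG": "M", "GAT": "D", "AAC": "N", "CTG": "L",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def translate(seq):
    """Translate codon by codon from index 0 until a stop codon.

    Codons missing from the excerpt table are shown as "X".
    """
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGATAACCTGTGA"                 # reads ATG GAT AAC CTG TGA
mutant = gene[:3] + "T" + gene[3:]       # one T inserted after the start

normal_protein = translate(gene)         # "MDNL"
truncated_protein = translate(mutant)    # "M"
```

The insertion shifts the reading frame so that the second codon becomes TGA, and the copy can no longer produce a full-length protein.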

Retroposition is a very common source of pseudogenes. Pseudogenes are recognized because their sequences are similar to those of functional genes. In humans, pseudogenes are known to exist for topoisomerase (an enzyme that cuts DNA to relieve twisting), ferritin (an iron storage protein), two different forms of actin, and many other genes.

The Role of Transposable Genetic Elements

Transposable genetic elements are DNA segments that move around in the genome. Many biologists consider them to be a form of "selfish DNA," a kind of genetic parasite that serves no useful function for the host, but remains in the genome because it is efficient at getting itself copied. They can be present in large numbers of copies. In humans, more than a million copies of a single element, called Alu, account for about ten percent of the entire genome.

Insertion of a transposable element can disrupt a gene, creating a pseudogene. When a transposable element moves, it occasionally also takes a gene with it, placing it in a new position under the control of different regulatory elements. Alternatively, it may move an enhancer, thus affecting both the gene whose enhancer was removed and the gene (if any) it is now placed closer to. Some transposable elements themselves contain enhancers, further increasing the chances of altering gene expression when they are inserted in a new location.

Exon Shuffling

The coding portions of eukaryotic genes, termed "exons," are interrupted by noncoding regions, termed "introns." The evolutionary role of introns has been controversial since their discovery in 1977. Some scientists propose they are just another form of "junk DNA," and may be the relics of transposable elements or other forms of selfish DNA. Others suggest they may have played a central role in protein evolution.

The argument about the evolutionary importance of introns turns on exactly how they divide up the genes in which they are found. Proteins, which are encoded by genes, are not random strings of amino acids, but rather highly organized three-dimensional shapes, with different functions served by discrete parts, known as domains. A protein may contain half a dozen domains; one may bind a signaling molecule from outside the cell, another embeds the protein in a membrane, another binds an internal protein, and so on. It is often the case that each domain in a protein is folded up from a discrete segment of the amino acid chain.

Just as the domain's amino acids occur in sequence in the protein, the nucleotides that code for them occur in sequence in the gene. Those who propose that introns play a vital role in protein evolution suggest that exons correspond to the protein's domains, and that introns serve to divide the gene into these useful little bits of code. In this view, exons serve as "modules," or useful gene segments, that can be shuffled (via gene duplication and transposable elements, for instance) to create genes for new proteins with novel functions. For instance, a module for a membrane-embedding domain could be linked to a module for an oxygen-binding domain, allowing oxygen to be stored on a membrane, or a hormone-binding domain might be joined to a promoter-binding domain, allowing a hormone to control gene transcription.

The validity of this model of protein evolution depends on whether a gene's exons do indeed correspond to its protein's domains, and whether introns do actually separate domain-coding regions. So far the evidence is mixed, with some genes clearly divided this way, but many others showing complex or conflicting structures.

Because of this, scientists do not yet agree on the importance of exon shuffling in protein evolution. While it likely has occurred, it is unknown how widespread it may be. Also at issue is whether introns themselves arose early or late in life's evolution. If they arose early, exon shuffling may have been central to the development of all forms of life. The absence of introns in bacteria would then presumably be due to a streamlining of their genome by natural selection. If introns arose late, they were probably confined to eukaryotes and were therefore important only in eukaryotic evolution.

While there is much that remains controversial, there is little disagreement about the importance of a related use of exons that occurs continually in many tissues. This is called alternative splicing. In this case, particular exons may be omitted, or they may be reassembled differently from tissue to tissue, creating tissue-specific variants, called isoforms, of the same protein.
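Alternative splicing can be pictured as choosing which exons to join. The sketch below is a minimal illustration in which exons keep their original order and each tissue simply includes or omits some of them. The exon names, RNA sequences, and tissue labels are hypothetical.

```python
def splice(exons, include):
    """Join the selected exons in their original order.

    exons:   dict mapping exon name to RNA sequence, in transcript order
    include: set of exon names retained in this isoform
    """
    return "".join(seq for name, seq in exons.items() if name in include)


# Hypothetical three-exon transcript.
exons = {"E1": "AUGGCU", "E2": "GAACUG", "E3": "UUUUAA"}

# Hypothetical tissue-specific choices: one tissue keeps every exon,
# the other skips the middle one.
isoforms = {
    "tissue_a": splice(exons, {"E1", "E2", "E3"}),   # "AUGGCUGAACUGUUUUAA"
    "tissue_b": splice(exons, {"E1", "E3"}),         # "AUGGCUUUUUAA"
}
```

Both isoforms come from the same gene, yet the second lacks the segment of protein that the middle exon would have encoded.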

see also Alternative Splicing; Bioinformatics; Chromosomal Aberrations; Development, Genetic Control of; Gene; Gene Families; Genetic Code; Hemoglobinopathies; Immune System Genetics; Mutation; Polyploidy; Pseudogenes; RNA Processing; Transposable Genetic Elements.

Richard Robinson



One form of the disorder hemophilia is due to the insertion of a transposable element into a blood clotting gene.