Repetitive DNA Elements
The human genome contains approximately three billion base pairs of DNA. Within this sequence are between 30,000 and 70,000 genes, which together add up to less than 5 percent of the entire genome. Most of the rest is made up of several types of noncoding repeated elements.
Most gene sequences are unique, found only once in the genome. In contrast, repetitive DNA elements are found in multiple copies, in some cases thousands of copies, as shown in Table 1. Unlike genes, most repetitive elements do not code for protein or RNA. Repetitive elements have been found in most other eukaryotic genomes that have been analyzed. What functions they serve, if any, are mainly unknown. Their presence and spread cause several inherited diseases, and they have been linked to major events in evolution.
Types of Repetitive Elements
Repetitive elements differ in their position in the genome, sequence, size, number of copies, and presence or absence of coding regions within them. The two major classes of repetitive elements are interspersed elements and tandem arrays.
Interspersed repeated elements are usually present as single copies and distributed widely throughout the genome. The interspersed repeats alone constitute about 45 percent of the genome. The best-characterized interspersed repeats are the transposable genetic elements, also called mobile elements or "jumping genes" (Figure 1).
Sequences that are "tandemly arrayed" are present as duplicates, either head to tail or head to head. So-called satellites, minisatellites, and microsatellites largely exist in the form of tandem arrays (these elements originally got their name as "satellites" because they separated from the bulk of nuclear DNA during centrifugation). Sequences repeated in tandem are common at the centromere (where the two halves of a replicated chromosome are held together), and at or near the telomeres (the chromosome tips). Because they are difficult to sequence, sequences repeated in tandem at centromeres and telomeres are underrepresented in the draft sequence of the human genome. This makes it difficult to estimate the copy number, but they certainly represent at least 10 percent of the genome.
Satellites (also called classical satellites), which occur in four classes (I-IV), form arrays of 1,000 to 10 million repeated units, particularly in the heterochromatin of chromosomes. They are concentrated in centromeres and account for much of the DNA there. Satellites of one type, called alpha-satellites, occur as repeated units of approximately 171 base pairs (bp) in length, with high levels of sequence variation between the repeated units, as shown in Figure 2.
Table 1. Number of copies of interspersed repeats observed in the draft of the human genome. Columns: repeat type, copy number, fraction of the genome (%).
Minisatellites form arrays of several hundred units of 7 to 100 bp in length. They are present throughout the genome, with an increasing concentration toward the telomeres. They differ from satellites in two ways: their arrays contain only moderate numbers of tandem repeats, and they are highly dispersed throughout the chromosomes.
Microsatellites, or simple sequence repeats (SSRs), are composed of units of one to six nucleotides, repeated up to a length of 100 bp or more. One-third are simple "polyadenylated" repeats, composed of nothing but adenine nucleotides. Other examples of abundant microsatellites are (AC)n, (AAAN)n, (AAAAN)n, and (AAN)n, where N represents any nucleotide and n is the number of repeats. Less abundant, but important because of their direct involvement in the generation of disease, are the (CAG/CTG)n and (CGG/CCG)n trinucleotide (or triplet) repeats.
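The definition above (a unit of one to six nucleotides repeated in tandem) lends itself to a simple pattern search. The following Python sketch is illustrative only: the function name, the minimum of four copies, and the toy sequence in the usage note are assumptions made for this example, not values taken from the genome analyses described here.

```python
import re

def find_microsatellites(seq, max_unit=6, min_copies=4):
    """Return (start, unit, copies) for each tandem run of a short unit.

    max_unit follows the one-to-six-nucleotide definition in the text;
    min_copies is an arbitrary threshold assumed for this sketch.
    """
    seq = seq.upper()
    # The lazy quantifier tries the shortest unit first, and \1 then
    # requires at least (min_copies - 1) further copies directly after it.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), unit, copies))
    return hits
```

Applied to a toy sequence such as "GGT" followed by six copies of CAG, the function would report one (CAG)n run with n = 6. The reported unit depends on where the scan first finds a match, so a run may be reported as a rotation of the unit (for example AGC rather than CAG).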
Telomeric repeats are present at the ends of chromosomes (the telomeres) and are composed of short tandem repeats (STRs) of (TTAGGG)n, up to 30,000 bp long. This sequence is "highly conserved," meaning it has changed very little over evolutionary time, indicating it likely plays a very important role. These STRs function as caps at the ends of the long linear chromosomal DNA molecule and are crucial to the maintenance of intact eukaryotic chromosomes. Subtelomeric repeats act as a transition between the telomere and the rest of the chromosome. They contain units similar to TTAGGG, but they are not conserved.
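Because the telomeric unit is six nucleotides long, an array of 30,000 bp corresponds to roughly 5,000 copies of TTAGGG. The short Python sketch below counts how many consecutive copies of the unit sit at the end of a sequence; the function name and the toy input are assumptions for illustration, and real telomere measurements are not made this way.

```python
def telomeric_repeat_count(seq, unit="TTAGGG"):
    """Count consecutive copies of `unit` at the 3' end of `seq`."""
    seq = seq.upper()
    unit = unit.upper()
    count = 0
    end = len(seq)
    # Step backward one unit at a time while the sequence still matches.
    while end >= len(unit) and seq[end - len(unit):end] == unit:
        count += 1
        end -= len(unit)
    return count
```

Multiplying the returned count by the unit length gives the length of the terminal array in base pairs.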
Transposable elements are classified as either transposons or retrotransposons, depending on their mechanism of amplification. Transposons directly synthesize a DNA copy of themselves, whereas retrotransposons generate an RNA intermediate that is then reverse-transcribed (by the enzyme reverse transcriptase) back into DNA. Transposable elements fall into three major groups: DNA transposons, long terminal repeat (LTR) retrotransposons, and non-LTR retrotransposons. They also are subdivided into autonomous and nonautonomous elements, based on whether they can move independently within the genome or require other elements to perform this process, as shown in Figure 3.
DNA transposons are flanked by inverted repeats and contain two or more open reading frames (ORFs). An ORF is a DNA sequence that can be transcribed and translated to make protein. The ORFs in DNA transposons code for the proteins required for making transposon copies and spreading them through the genome. The nonautonomous miniature inverted-repeat transposable elements (MITEs) are derived from parent DNA transposons that lost their ORF sequences, making them unable to amplify on their own. Instead, they must borrow the factors for amplification from external sources.
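The two structural features just described, flanking inverted repeats and internal ORFs, can both be checked computationally. The Python sketch below is a simplified illustration: it treats an ORF as an ATG start codon followed by an in-frame stop codon on the forward strand only, and all function names and thresholds are assumptions made for this example rather than an established annotation method.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def terminal_inverted_repeat_length(seq, max_len=50):
    """Length of the longest prefix that is the reverse complement of the suffix."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    limit = min(max_len, len(seq) // 2)
    n = 0
    # The start of rc is the reverse complement of the end of seq, so a
    # position-by-position match means the two ends are inverted repeats.
    while n < limit and seq[n] == rc[n]:
        n += 1
    return n

def find_orfs(seq, min_codons=3):
    """Return (start, end) pairs for forward-strand ORFs (ATG to in-frame stop)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count codons before the stop, including the ATG.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

In this simplified picture, an element with intact inverted repeats but no ORFs would resemble a nonautonomous element such as a MITE, although real classification relies on much more evidence than these two checks.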
LTR retrotransposons are very similar to the genomes of retroviruses. They are flanked by 250 to 600 bp direct repeats called long terminal repeats. In general, not only are these elements defective, but they also appear to have deletions typical of nonautonomous families.
Several different groups of non-LTR retrotransposons can be found throughout most, if not all, eukaryotic genomes. One of these groups, the long interspersed repeated elements (LINEs), constitutes about 21 percent of the human genome, with L1 and L2 being the dominant elements. Most of the element copies are incomplete and inactive. Two types of nonautonomous elements are thought to use factors made by LINEs: short interspersed repeated elements (SINEs) and retropseudogenes.
SINEs are derived from two types of genes coding for RNA: 7SL (which aids the movement of new proteins into the endoplasmic reticulum) and transfer RNAs. The most abundant human SINE is Alu, constituting about 13 percent of the human genome.
Retropseudogenes are derived from retrotransposition of mRNA derived from different genes. They can be distinguished from the parental gene by their lack of a functional promoter and by their lack of introns. The human genome is estimated to contain 35,000 copies of different retropseudogenes.
Role of Repetitive DNA in Evolution and Impact on the Human Genome
Most eukaryotic genomes contain repetitive DNA. Although most repeated sequences have no known function, their impact on genomes is evident. Mobile repeated elements have been a critical factor in gene evolution. It has been suggested that some types of repeats may be linked to speciation, since radiation of different species occurred during evolutionary periods when mobile elements were highly active.
Several diseases are linked to, or caused by, repetitive elements. Expansion of triplet repeats has been tied to fragile X syndrome (a common cause of intellectual disability), Huntington's disease, myotonic muscular dystrophy, and several other diseases. In addition, the discovery of STR instability in certain cancers suggests that sequence instability may play a role in cancer progression.
Mobile elements have caused diseases when a new mobile element insertion disrupts an important gene. Neurofibromatosis type 1, for example, can be caused by the insertion of an Alu element in the gene NF1. Alternatively, recombination between two repeated elements within a gene can alter its function, also causing disease. Many examples of cancers (e.g., acute myelogenous leukemia) and inherited diseases (e.g., alpha thalassemia) are caused by mobile-element-based recombinations.
Application of Repeats to Human Genomic Studies
Repeated sequences can be useful genetic tools. Because many of the repeated sequences are stably inherited, highly conserved, and found throughout the genome, they are ideal for genetic studies: They can act as "signposts" for finding and mapping functional genes. In addition, a repeat at a particular locus may be absent in one individual, or it may differ between two individuals (polymorphism). This makes repeats useful for identifying specific individuals (called DNA profiling) and their ancestors (molecular anthropology).
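The idea behind profiling with repeat polymorphisms can be shown with a toy comparison: count the copies of a repeat unit at each locus and compare the counts between two samples. The Python sketch below is a deliberate simplification. The function names and locus names are hypothetical, and each sample is assumed to carry a single repeat count per locus, whereas real profiles record two alleles per locus and are interpreted statistically.

```python
import re

def longest_repeat_run(seq, unit):
    """Largest number of consecutive copies of `unit` found anywhere in `seq`."""
    unit = unit.upper()
    runs = re.findall("(?:%s)+" % re.escape(unit), seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)

def profiles_match(profile_a, profile_b):
    """Compare two profiles (dicts of locus name -> repeat count).

    Returns True only if the profiles share at least one locus and
    agree at every shared locus.
    """
    shared = profile_a.keys() & profile_b.keys()
    return bool(shared) and all(profile_a[k] == profile_b[k] for k in shared)

# Hypothetical usage: build a profile from sequenced loci, then compare.
sample_1 = {"locusA": longest_repeat_run("TTGATAGATAGATACC", "GATA")}
sample_2 = {"locusA": longest_repeat_run("TTGATAGATACC", "GATA")}
same_source = profiles_match(sample_1, sample_2)
```

The more loci that are compared, the less likely it is that two unrelated individuals agree at all of them by chance, which is why practical profiling uses panels of many polymorphic repeats.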
Microsatellites, in particular, have been used to identify individuals, study populations, and construct evolutionary trees. They have also been used as markers for disease-gene mapping and to evaluate specific genes in tumors. LINEs, and particularly the human SINE Alu, have been used for studies of human population genetics, primate comparative genomics, and DNA profiling.
see also Centromere; Chromosome, Eukaryotic; In Situ Hybridization; Polymorphisms; Pseudogenes; Retrovirus; Telomere; Transposable Genetic Elements; Triplet Repeat Disease.
Astrid M. Roy-Engel and Mark A. Batzer