When molecular biologists began analyzing the complete sequence of the human genome in mid-2001, one surprising observation was that humans have relatively few genes. We may have as few as 30,000 genes, only about two times as many as the much simpler fruit fly, Drosophila melanogaster. How can the much greater size and complexity of humans be encoded in only twice the number of genes required by a fly? The answer to this paradox is not fully understood, but it appears that humans and other mammals may be more adept than other organisms at encoding many different proteins from each gene. One way they do this is through alternative splicing, the processing of a single RNA transcript to generate more than one type of protein.
In most eukaryotic genes, the protein-coding sequences, termed exons, are interrupted by stretches of sequence, termed introns, that have no protein-coding information. After the gene is copied, or transcribed, to RNA, the introns are removed from this "pre-mRNA," and the exons are spliced together to form a mature mRNA, consisting of one contiguous protein-coding sequence. In addition, the complete mRNA contains upstream and downstream sequences flanking the coding sequences. These sequences do not encode protein, but help to regulate translation of the mRNA into protein. Variations in the splice pattern lead to alternative transcripts and alternative proteins.
Splicing is accomplished in the cell's nucleus by spliceosomes, which are molecular machines composed of proteins and small RNA molecules. The boundaries between exons and introns in a pre-mRNA are marked very subtly. Certain segments of the pre-mRNA, termed splice sites, direct the spliceosomes to the precise positions in the transcript where they can excise introns and splice together exons. Splice sites are short sequences, typically fewer than ten bases long. 5′ splice sites mark the 5′ end of introns; 3′ splice sites define the 3′ end of introns. ("Five prime" and "three prime" refer to the upstream and downstream ends of the RNA.)
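The removal of introns and joining of exons can be pictured with a short Python sketch. The toy sequence, the exon coordinates, and the function name `splice` are all invented for illustration; real transcripts are far longer, and the cell locates exon boundaries through splice sites rather than from a list of coordinates.

```python
def splice(pre_mrna, exon_spans):
    """Join the exon segments of a transcript.

    exon_spans is a list of (start, end) index pairs, end-exclusive.
    Everything outside these spans is treated as intron and discarded.
    """
    return "".join(pre_mrna[start:end] for start, end in exon_spans)


# Toy pre-mRNA: uppercase letters mark exons, lowercase letters mark introns.
# The case difference is only a visual aid; it carries no biological meaning.
pre_mrna = "AUGGCC" "guaaguuuuuag" "GAUUAC" "guaugccccag" "UUUUAA"

# Hypothetical exon coordinates for the toy sequence above.
exon_spans = [(0, 6), (18, 24), (35, 41)]

mature_mrna = splice(pre_mrna, exon_spans)
print(mature_mrna)
```

The result contains only the three exon segments, joined in their original order into one contiguous sequence.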
Although splice sites often can be recognized as such by common patterns in their base sequence, there are many variations on the basic splice site consensus sequence. These differences affect how readily a particular splice site is recognized and processed by the splicing machinery. Many other molecules within the cell, called splicing factors, also participate in the splicing reaction. The combination of all of these determines the pattern of splicing for a particular pre-mRNA molecule.
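A brief sketch can suggest why short splice-site patterns alone leave room for ambiguity. It relies on a fact not stated above: in most introns the first two bases are GU and the last two are AG. The function name `candidate_introns`, the minimum-length cutoff, and the toy sequence are assumptions made for illustration. Real splice-site recognition depends on longer consensus sequences and on splicing factors, none of which is modeled here.

```python
import re


def candidate_introns(pre_mrna, min_len=6):
    """List every (start, end) span that begins with GU and ends with AG.

    This is a deliberately naive scan: it reports all pairings that are at
    least min_len bases long, without judging which one a cell would use.
    """
    spans = []
    for donor in re.finditer("GU", pre_mrna):
        for acceptor in re.finditer("AG", pre_mrna):
            if acceptor.end() - donor.start() >= min_len:
                spans.append((donor.start(), acceptor.end()))
    return spans


# Toy transcript containing two GU dinucleotides and two AG dinucleotides.
toy = "AUGGCCGUAAGUUUAGGAUUAC"
for start, end in candidate_introns(toy):
    print(start, end, toy[start:end])
```

Even this short sequence yields more than one candidate intron, so a bare two-base pattern cannot by itself determine the splice pattern.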
For many genes the pattern of splicing is always the same. These genes are transcribed into many copies of their corresponding pre-mRNA molecules. The introns are removed in a consistent pattern, producing mature mRNA molecules of identical sequence, all of which encode identical proteins.
For other genes the splice pattern varies depending on the tissue in which the gene is expressed, or the stage of development the organism is in. Because the choice of splice sites depends on so many different factors, identical pre-mRNAs from these genes may be spliced into several, or even many, different mature mRNA variants. 5′ splice sites may be ignored, converting intron sequences into exons; 3′ splice sites can be ignored, converting exon sequences into introns; or different sequences, ordinarily not recognized as splice sites, can function as new splice sites. (To understand why ignoring a 5′ splice site would convert an intron to an exon, recall that transcription of RNA proceeds from 5′ to 3′.) The production of such mRNA variations through the use of different sets of splice sites is known as alternative splicing. It has been estimated that at least one-third of all human genes are alternatively spliced.
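The way one pre-mRNA can give rise to several mature mRNAs can be sketched in Python. The segment names, the toy sequences, and the function `splice_variant` are hypothetical. In this simplified model the caller simply declares which exons are skipped and which introns are retained, whereas in the cell those outcomes follow from which splice sites are recognized.

```python
# Toy pre-mRNA described as ordered (name, kind, sequence) segments.
# Lowercase intron sequences are only a visual aid.
SEGMENTS = [
    ("exon1", "exon", "AUGGCC"),
    ("intron1", "intron", "guaaguuuag"),
    ("exon2", "exon", "GAUUAC"),
    ("intron2", "intron", "guaugcccag"),
    ("exon3", "exon", "UUUUAA"),
]


def splice_variant(segments, skipped=(), retained=()):
    """Build one mature mRNA variant.

    skipped:  names of exons left out (sequence treated as intron).
    retained: names of introns kept in (sequence treated as exon).
    """
    parts = []
    for name, kind, seq in segments:
        if kind == "exon" and name not in skipped:
            parts.append(seq)
        elif kind == "intron" and name in retained:
            parts.append(seq)
    return "".join(parts)


constitutive = splice_variant(SEGMENTS)
exon_skipped = splice_variant(SEGMENTS, skipped={"exon2"})
intron_retained = splice_variant(SEGMENTS, retained={"intron1"})

for label, mrna in [("constitutive", constitutive),
                    ("exon 2 skipped", exon_skipped),
                    ("intron 1 retained", intron_retained)]:
    print(f"{label:18s} {mrna}")
```

All three variants come from the same starting transcript; only the set of segments treated as exons differs.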
Alternative splicing can have profound effects on the structure and function of the protein encoded by a gene. Many proteins are composed of several domains, or modules, each of which serves a particular function. For example, one domain may help the protein bind to another protein, while another domain gives the protein enzymatic activity. By alternative splicing, exons, and, therefore, protein domains, can be mixed and matched, altering the nature of the protein. By regulating which splice patterns occur in which tissue types, an organism can fine-tune the action of a single gene so it can perform many different roles.
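The arithmetic behind this mixing and matching can be sketched as follows. The domain names and the function `enumerate_isoforms` are hypothetical, and the sketch assumes that each optional exon is included or skipped independently of the others, which real genes need not obey.

```python
from itertools import product

# Hypothetical optional exons, each assumed to encode one protein domain.
OPTIONAL_EXONS = ["binding-domain", "enzymatic-domain", "membrane-anchor"]


def enumerate_isoforms(optional_exons):
    """Return every combination of included optional exons, in gene order."""
    isoforms = []
    for choices in product([False, True], repeat=len(optional_exons)):
        included = tuple(name for name, keep in zip(optional_exons, choices) if keep)
        isoforms.append(included)
    return isoforms


isoforms = enumerate_isoforms(OPTIONAL_EXONS)
print(len(isoforms), "possible isoforms")
for isoform in isoforms:
    print(" + ".join(isoform) or "(no optional domains)")
```

Under this assumption, n optional exons allow 2 to the power n combinations, so even a handful of alternatively spliced exons can yield a large family of related proteins.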
The various forms of a protein are known as isoforms. Isoforms are often tissue-specific. The dystrophin gene, for example, has one form in muscle and another in brain tissue. Defects in alternative splicing are associated with several important human diseases, including amyotrophic lateral sclerosis, dementia, and certain cancers.
Alternative splicing can also act to turn genes off or on. In mRNA, codons, consisting of three adjacent nucleotides, either encode an amino acid or signal the ribosome to stop synthesizing a polypeptide. Normally, exon sequences must not encode stop codons (UGA, UAG, or UAA) until after the final amino-acid-coding codon. Alternative splicing can introduce a stop codon in the beginning or middle of a protein-coding sequence, resulting in an mRNA that encodes a prematurely truncated polypeptide.
Human hearing offers a dramatic illustration of how important alternative splicing is in everyday life. Microscopic hair cells lining the inner ear vibrate when stimulated by sound. One of the proteins in the hair cells that plays a role in the hearing sensation is a calcium-activated potassium channel. The gene for this protein can generate more than five hundred different mRNA variants through alternative splicing. The resulting potassium channel proteins have slightly differing physiological properties. This is in part what tunes hair cells to different frequencies.
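The effect of a premature stop codon can be illustrated with a small translation sketch. The exon sequences and the function name `translate` are invented for illustration. The codon table is the standard genetic code, and translation is assumed to begin at the first base of the sequence.

```python
from itertools import product

# Standard genetic code, listed with bases in the order U, C, A, G.
# "*" marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate codon by codon from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical exons. The alternative exon carries an in-frame UGA stop codon.
exon1, exon2, exon3 = "AUGGCC", "GAUUAC", "UUUUAA"
alternative_exon = "CUGUGA"

normal_mrna = exon1 + exon2 + exon3
variant_mrna = exon1 + alternative_exon + exon2 + exon3

print(translate(normal_mrna))
print(translate(variant_mrna))
```

Including the alternative exon places a stop codon ahead of the remaining coding sequence, so the variant mRNA yields a shorter polypeptide even though the downstream exons are still present.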
see also Gene; Proteins; RNA Processing; Transcription.
Paul J. Muhlrad
Alberts, Bruce, et al. Molecular Biology of the Cell, 4th ed. New York: Garland Science, 2002.
Griffiths, Anthony J. F., et al. An Introduction to Genetic Analysis, 7th ed. New York: W. H. Freeman, 2000.
Lodish, Harvey, et al. Molecular Cell Biology, 4th ed. New York: W. H. Freeman, 2000.
The sensitivity of the human ear to a wide range of sound frequencies is due to alternative splicing of a potassium channel gene, giving rise to a set of related proteins whose exact form varies with the position in the cochlea.
The protozoan Trypanosoma brucei, which causes African sleeping sickness, edits some of its messenger RNA molecules after they are transcribed. Uracil nucleotides are added in some locations in the mature RNA and deleted from others. Similar cases of RNA editing occur in other organisms, and even in humans. The human apolipoprotein B gene is edited in the intestine but not in the liver, leading to two distinct forms of the protein, serving different functions in the two organs.