The genetic code, of which the codon is the basic unit, establishes how genes encoded in DNA are decoded into proteins. A critical step in protein synthesis is the interaction between the codon in messenger RNA (mRNA) and the anticodon in an aminoacyl-transfer RNA (aminoacyl-tRNA).
A codon is a triplet of adjacent nucleotides in mRNA that specifies an amino acid to be incorporated in a protein. Because each of the three positions in a codon can be occupied by any of the four ribonucleotides, there are 4³, or 64, possible combinations, and therefore 64 different codons. The first letter of the codon is at the 5′-end, while the last letter is at the 3′-end; for example, 5′-AUG-3′.
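The count of 64 can be checked by listing every triplet directly. The following is a minimal sketch; the names BASES and codons are illustrative and do not come from the original text.

```python
from itertools import product

BASES = "UCAG"  # the four ribonucleotides found in mRNA

# Every ordered triplet of bases, each written 5' to 3'.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))      # number of distinct codons, 4 ** 3
print("AUG" in codons)  # the example codon from the text is among them
```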
The amino acid sequence of a protein will be specified by the sequence of contiguous codons in the mRNA template. The initial codon in the mRNA establishes the reading frame and defines the protein's initial amino acid.
There are three types of codons. There is an initiation codon, AUG, which signifies the initial amino acid (and also codes for methionine residues in internal positions) in the protein. There are 61 codons, including AUG, that designate individual amino acids. The remaining three codons (UAA, UAG, and UGA) are termination codons (also called stop codons or nonsense codons), which do not code for amino acids, but signal the end of the mRNA message and provide the "stop" signal for protein synthesis.
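The way contiguous codons are read from the initiation codon to a termination codon can be sketched in a few lines. This is a simplified illustration, not a model of the ribosome. It assumes the standard code and that translation begins at the first AUG found in the sequence; it writes amino acids in the conventional one-letter abbreviations, with "*" marking a stop codon. The names CODON_TABLE and translate are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code in one-letter amino acid symbols, ordered by
# first, second, and third base (each in the order U, C, A, G); "*" = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(mrna):
    """Read codons from the first AUG until a stop codon (assumed start rule)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation codon, so nothing is translated
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":  # UAA, UAG, or UGA ends the message
            break
        protein.append(residue)
    return "".join(protein)

# AUG GUU CGA UGG UAA, preceded and followed by untranslated bases.
print(translate("GGAUGGUUCGAUGGUAAGC"))  # MVRW
```

In the example, the two leading bases are skipped, AUG sets the reading frame, and UAA ends the message, so the bases after it are never read.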
Two amino acid residues, tryptophan and methionine, have unique codons: UGG and AUG, respectively. All other amino acids may be coded for by more than one codon, such that the code is said to be degenerate. This degeneracy is not uniform, but varies according to the particular amino acid. For example, three amino acids (arginine, leucine, and serine) have six codons, five amino acids have four, isoleucine has three, and nine amino acids have two. The first two letters of each codon provide the primary determinant of specificity. For example, the codons for the amino acid valine are GUU, GUC, GUA, and GUG, which differ only in the third letter. The open reading frame of the mRNA, which extends from the AUG codon to the termination codon, establishes the protein that is to be synthesized.
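The uneven degeneracy described above can be tallied from the standard code table. The sketch below assumes the standard code written in one-letter amino acid symbols, with "*" for stop codons; the variable names are illustrative only.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code, ordered by first, second, and third base (U, C, A, G).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Number of codons assigned to each amino acid (stop codons excluded).
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# How many amino acids have 1, 2, 3, 4, or 6 codons.
degeneracy = Counter(codons_per_amino_acid.values())
for n_codons in sorted(degeneracy):
    print(n_codons, "codon(s):", degeneracy[n_codons], "amino acid(s)")

# The four valine ("V") codons share their first two letters.
valine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "V")
print(valine_codons)
```

The tally accounts for all 61 sense codons: 2 × 1 + 9 × 2 + 1 × 3 + 5 × 4 + 3 × 6 = 61.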
The correspondence between codons and the amino acids that they specify appears to be nearly, but not quite, universal among species. The same code is used by nuclear genes in all species examined, including Escherichia coli, viruses, various plants, and humans; the exceptions are genes encoded in mitochondria and genes found in a small number of other organisms. This is cited as evidence that all life-forms have a common evolutionary ancestor, with the genetic code being preserved throughout evolution.
The genetic information within a gene in DNA is encoded by a sequence of four nucleotides (A, T, G, and C). This must ultimately be translated into the twenty-letter (corresponding to amino acids) language of proteins. It is now known that this information is translated first into an intermediate message form called mRNA, and then converted into a specific protein. This latter process of converting from the "nucleotide alphabet" to the "protein alphabet" requires that specific segments on mRNA correspond to specific amino acids in the protein being manufactured. This connection is provided by the genetic code.
The translation process that occurs at the site of the ribosomes in the cytoplasm requires that the mRNA designate the codons that then specify the amino acid sequence for the protein. The codons on the mRNA must interact with the anticodons on the charged tRNA molecules, which bring to the site the specific amino acid residues. Watson-Crick complementary base pairing provides the specificity for this interaction.
see also Protein Synthesis; Protein Translation; Ribonucleic Acid.
William M. Scovell
Nelson, David L., and Cox, Michael M. (2000). Lehninger Principles of Biochemistry, 3rd edition. New York: Worth Publishers.
The term codon refers to a sequence of three nucleotide bases (the building blocks of nucleic acids such as deoxyribonucleic acid, or DNA) that codes for a specific amino acid (amino acids being the building blocks of proteins). A sequence of codons thus specifies the assembly of a sequence of amino acids; when assembly is complete, the result is a particular protein.
Genetic information is stored in DNA as sequences of three nucleotide bases called base triplets, which act as a template from which messenger RNA (mRNA) is transcribed. A sequence of three successive nucleotide bases in the mRNA transcript represents a codon.
Codons are complementary to base triplets in the DNA. For example, if the base triplet in the DNA sequence is GCT, the corresponding codon on the mRNA strand will be CGA.
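The complementarity between a DNA base triplet and its mRNA codon can be written as a simple base-by-base mapping. This sketch follows the position-by-position convention used in the example above and does not model the antiparallel orientation of the two strands; the names DNA_TO_RNA and codon_from_template are hypothetical.

```python
# Pairing rules for transcription: A->U, C->G, G->C, T->A.
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")

def codon_from_template(triplet):
    """Return the mRNA codon complementary to a DNA template triplet."""
    return triplet.translate(DNA_TO_RNA)

print(codon_from_template("GCT"))  # CGA
```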
When interpreted during protein synthesis, codons direct the insertion of a specific amino acid into the protein chain. Codons may also direct the termination of protein synthesis.
During the process of translation, each codon specifies an amino acid that is incorporated into a polypeptide chain. For example, the codon CGA designates the amino acid arginine. Codons are nonoverlapping, so each nucleotide base belongs to only one codon and therefore contributes to only one amino acid or termination signal. A codon codes for an amino acid by binding to a complementary sequence of RNA nucleotides called an anticodon, located on a molecule of tRNA. The tRNA binds to and transports the amino acid that is specific to the complementary mRNA codon. For example, the codon CGA on the mRNA strand will bind to the anticodon GCU (written base by base opposite the codon) on a tRNA molecule that carries the amino acid arginine.
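Codon and anticodon pairing can be illustrated with strict Watson-Crick rules. This is a simplified sketch: it ignores the relaxed "wobble" pairing that can occur at the third codon position, and the function names are hypothetical. Because the two RNA strands pair in opposite orientations, the anticodon can be written either base by base opposite the codon (3′ to 5′) or reversed into the usual 5′ to 3′ notation.

```python
# Watson-Crick pairing between RNA bases: A-U and G-C.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def anticodon_paired(codon):
    """Anticodon written base by base opposite the codon (3' to 5')."""
    return codon.translate(RNA_COMPLEMENT)

def anticodon_5_to_3(codon):
    """The same anticodon rewritten in 5' to 3' notation."""
    return anticodon_paired(codon)[::-1]

for codon in ("CGA", "GCA", "AUG"):
    print(codon, anticodon_paired(codon), anticodon_5_to_3(codon))
```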
Because any of four possible nucleotide bases can occupy each position of a three-base codon, there are 64 possible codons (4³ = 64). Sixty-one of the 64 codons signify the 20 standard amino acids found in proteins. Many of these are synonymous codons, meaning that more than one codon can specify the same amino acid. For example, in addition to CGA, five additional codons (CGU, CGC, CGG, AGA, and AGG) specify the amino acid arginine. Because more than one nucleotide sequence can encode the same sequence of amino acids, so that the RNA/DNA sequence cannot be uniquely predicted from the protein, the genetic code is said to be degenerate.
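One consequence of degeneracy is that the number of mRNA sequences able to encode a given peptide is the product of the codon counts of its amino acids. The following sketch assumes the standard code and one-letter amino acid symbols ("R" for arginine, "M" for methionine, "W" for tryptophan, "V" for valine); the names SYNONYMS and count_encodings are hypothetical.

```python
from collections import Counter
from math import prod

BASES = "UCAG"
# Standard genetic code, ordered by first, second, and third base (U, C, A, G).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Number of synonymous codons for each amino acid (stops excluded).
SYNONYMS = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

def count_encodings(peptide):
    """Number of distinct mRNA coding sequences for a peptide."""
    return prod(SYNONYMS[aa] for aa in peptide)

arginine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "R")
print(arginine_codons)          # the six arginine codons
print(count_encodings("MRW"))   # 1 * 6 * 1
print(count_encodings("MVRW"))  # 1 * 4 * 6 * 1
```

Even a short peptide therefore has many possible coding sequences, which is why the nucleotide sequence cannot be read back uniquely from the protein.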
The remaining three codons are known as stop codons. They do not specify an amino acid, but instead signal the termination of polypeptide chain synthesis.
Research began on deciphering the genetic code in several laboratories during the 1950s. By the early 1960s, an in vitro system was able to produce proteins through the use of synthetic mRNAs in order to determine the base composition of codons. The enzyme polynucleotide phosphorylase was used to catalyze the formation of synthetic mRNA without using a template to establish the nucleotide sequence. In 1961, genetic experiments by English molecular biologist Francis Crick and his colleagues provided evidence that three nucleotide bases on an mRNA molecule (a codon) designate a particular amino acid in a polypeptide chain. Crick’s work then helped to establish which codons specify each of the 20 amino acids found in a protein. During that same year, Marshall W. Nirenberg and J. Heinrich Matthaei, using synthetic mRNA, were the first to identify the codon for phenylalanine. Despite the presence of all 20 amino acids in the reaction mixture, the synthetic RNA polyuridylic acid (poly U) promoted only the synthesis of polyphenylalanine. Soon after Nirenberg and Matthaei correctly determined that UUU codes for phenylalanine, they discovered that AAA codes for lysine and CCC codes for proline.
Eventually, synthetic mRNAs consisting of different nucleotide bases were developed and used to determine the codons for specific amino acids.
In 1968, American biochemists Marshall W. Nirenberg, Robert W. Holley, and Har Gobind Khorana won the Nobel Prize in Physiology or Medicine for discovering that a three-nucleotide base sequence of mRNA defines a codon able to direct the insertion of amino acids during protein synthesis (translation).