The molecules that give cells and entire organisms their shape, as well as their ability to move, grow, and reproduce, are proteins. Although they come in an almost infinite variety of shapes and sizes, all of them have been shaped by evolution to serve a defined and useful function in the processes of life. Some proteins, like actin and collagen, help give cells and tissues their physical shape. Others, like lactase and pepsin, help digest food. Still others transport signals between cells, help us fight off disease, or repair damaged DNA. For almost every job in a cell, there is a protein suited to do it.
The Building Blocks of Proteins
The building blocks of proteins are amino acids. Living cells use twenty different amino acids to build proteins. These are linked together in a long, linear chain during translation, a process carried out by the ribosomes inside cells. Proteins begin to take on their characteristic three-dimensional shape even while they are being made, folding and twisting as each new amino acid added to the chain tugs or pushes at those added before it. Each amino acid has an amino group (-NH2) and a carboxyl group (-COOH). A peptide bond links the carboxyl group of one amino acid to the amino group of the next, releasing a molecule of water. At one end of a protein, therefore, there is a free amino group, called the N-terminus, and at the other end a free carboxyl group, called the C-terminus.
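Because each peptide bond releases one molecule of water, the mass of a linear chain equals the sum of its free amino acids minus one water per bond. The Python sketch below illustrates that arithmetic; `peptide_mass` is a hypothetical helper, and the mass table covers only three amino acids (standard monoisotopic values) to keep it short.

```python
# Monoisotopic masses (daltons) of a few free amino acids and of water.
FREE_AMINO_ACID_MASS = {
    "G": 75.03203,   # glycine
    "A": 89.04768,   # alanine
    "S": 105.04259,  # serine
}
WATER = 18.01056


def peptide_mass(sequence):
    """Mass of a linear peptide written from N-terminus to C-terminus.

    Each peptide bond joins the carboxyl group of one residue to the amino
    group of the next and releases one water molecule, so a chain of n
    residues has lost n - 1 waters relative to its free amino acids.
    """
    if not sequence:
        raise ValueError("empty sequence")
    total = sum(FREE_AMINO_ACID_MASS[aa] for aa in sequence)
    return total - (len(sequence) - 1) * WATER
```

For example, `peptide_mass("GA")` gives about 146.069 daltons. "GA" and "AG" have the same mass, since the total depends only on composition, not on order.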
The process of determining the order of amino acids in a protein is called protein sequencing. A protein's sequence can often be deduced from its gene sequence, because each group of three bases (a codon) on a DNA strand specifies one amino acid, and the order of the codons specifies the order in which the amino acids are linked during translation. Since the chemistry of DNA sequencing is simpler than that of identifying each amino acid in a chain one by one, this is usually the easier route. Even so, there are two primary reasons to sequence a protein directly. The first is to provide the information needed to design a synthetic DNA probe that can be used to locate the gene coding for the protein. The second is to confirm that a protein isolated or manufactured in the laboratory is what it is believed to be.
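Reading a gene three bases at a time can be sketched in Python using the standard genetic code. This assumes the input is the coding strand of an uninterrupted coding sequence (no introns) that starts at the first base of a codon; `translate` is a hypothetical helper, not a library function.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(coding_dna):
    """Translate a DNA coding sequence (read 5' to 3') into one-letter amino acids.

    Translation stops at the first stop codon; a trailing partial codon is ignored.
    Characters other than A, C, G, T raise ValueError.
    """
    dna = coding_dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        # Each codon maps to one of 64 positions in the table above.
        index = (16 * BASES.index(codon[0])
                 + 4 * BASES.index(codon[1])
                 + BASES.index(codon[2]))
        amino_acid = AMINO_ACIDS[index]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate("ATGGCCAAATAA")` returns `"MAK"`: ATG codes for methionine, GCC for alanine, AAA for lysine, and TAA is a stop codon.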
The most widely used chemical technique for sequencing proteins is the Edman degradation, developed by Pehr Edman in the 1950s. Its reaction steps have since been fully automated. The procedure removes one amino acid at a time from the protein's N-terminus: the reagent phenyl isothiocyanate reacts with the free N-terminal amino group under mildly basic conditions, and acid then cleaves off the labeled terminal amino acid, leaving the rest of the chain intact for the next cycle. Each released amino acid, converted to a stable phenylthiohydantoin (PTH) derivative, is identified by chromatography, a separation technique that distinguishes it from the other nineteen by differences in properties such as size, charge, and polarity.
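The cycle-by-cycle logic can be modeled as repeatedly removing the first residue of the chain. The sketch below also tracks an assumed, hypothetical per-cycle yield (98% by default): because no cycle is perfectly efficient, the fraction of molecules still in step shrinks with every cycle, which is why only a limited number of residues can be read reliably. The function name and parameters are illustrative only.

```python
def edman_degradation(peptide, max_cycles, cycle_yield=0.98):
    """Simulate Edman cycles on a peptide written N-terminus first.

    Returns a list of (cycle, released_residue, fraction_in_phase), where
    fraction_in_phase is the share of molecules that have completed every
    cycle so far, assuming a hypothetical constant yield per cycle.
    """
    chain = list(peptide)       # index 0 is the N-terminal residue
    in_phase = 1.0              # fraction of molecules still in step
    readout = []
    for cycle in range(1, max_cycles + 1):
        if not chain:
            break
        residue = chain.pop(0)      # cleave the current N-terminal residue
        in_phase *= cycle_yield     # losses compound with every cycle
        readout.append((cycle, residue, in_phase))
    return readout
```

With a 98% yield, after 50 cycles only about 36% of the molecules are still in step (0.98^50 ≈ 0.364), so the signal for later residues grows steadily weaker.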
In many automated instruments, high-performance liquid chromatography (HPLC) identifies the released amino acid: each amino acid derivative takes a characteristic time, called its retention time, to travel through the HPLC column. Up to about fifty amino acids from the N-terminus can be identified using Edman degradation. If a scientist is trying to identify a protein that has already been sequenced, usually only the first fifteen to twenty amino acids of the purified protein need to be determined. That partial sequence can then be searched against a database and matched with known proteins having identical or related sequences.
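The two steps in the paragraph above, identifying each residue by its retention time and then matching the partial sequence against known proteins, can be sketched as follows. The retention times, protein names, and sequences are all hypothetical, and the identity score is a simple ungapped comparison from the N-terminus, not the scoring used by real database search tools.

```python
# Hypothetical retention times (minutes) of PTH-amino acid standards on an
# imagined HPLC column; real values depend on the column and conditions.
STANDARD_RETENTION_MIN = {"G": 4.0, "A": 6.5, "K": 9.0, "V": 11.5, "M": 13.0, "L": 16.5}


def identify_residue(retention_time, tol=0.5):
    """Return the standard with the closest retention time, or 'X' if none is within tol."""
    best = min(STANDARD_RETENTION_MIN,
               key=lambda aa: abs(STANDARD_RETENTION_MIN[aa] - retention_time))
    return best if abs(STANDARD_RETENTION_MIN[best] - retention_time) <= tol else "X"


def n_terminal_identity(query, target):
    """Fraction of query positions matching target, aligned at the N-terminus."""
    if not query:
        return 0.0
    matches = sum(1 for q, t in zip(query, target) if q == t and q != "X")
    return matches / len(query)


def search_database(query, database, min_identity=0.8):
    """Return (name, identity) pairs at or above min_identity, best match first."""
    hits = [(name, n_terminal_identity(query, seq)) for name, seq in database.items()]
    return sorted((h for h in hits if h[1] >= min_identity),
                  key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    cycle_times = [13.1, 6.4, 9.2, 11.6]  # hypothetical readings from four cycles
    query = "".join(identify_residue(t) for t in cycle_times)  # "MAKV"
    database = {"hypothetical_protein_1": "MAKVLG", "hypothetical_protein_2": "MGKVLA"}
    print(search_database(query, database))  # [('hypothetical_protein_1', 1.0)]
```

A real search would also tolerate gaps and use substitution scores, which is what lets "related" rather than only identical sequences be found.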
Sequencing a protein from its C-terminus is particularly challenging, and no technique is as robust as Edman degradation. Some limited sequence information can be obtained using enzymes called carboxypeptidases, which remove amino acids one at a time from the C-terminus. These enzymes, however, tend to cleave only certain amino acids.
Carboxypeptidase B, isolated from cow pancreas, for example, can release the amino acids arginine and lysine from the C-terminus of a protein. Carboxypeptidase A, also isolated from cow pancreas, fails to release arginine, lysine, or proline, but can cleave off the other seventeen amino acids. Carboxypeptidases isolated from citrus leaves and yeast can cleave off any amino acid from the C-terminus of a protein, although the rate at which they do this depends on the particular amino acid. If one amino acid is released slowly and the next within the chain is released very quickly, they might appear to be cleaved at the same time, making it difficult to establish their order. C-terminal amino acid identification using enzymes, therefore, is not practical beyond the first several positions.
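The timing problem can be illustrated with a small simulation of sequential first-order reactions, in which a residue can be released only after the residue nearer the C-terminus has been removed. The rate constants are hypothetical, `c_terminal_release` is an illustrative helper, and simple Euler time-stepping is used for clarity rather than accuracy.

```python
def c_terminal_release(rates, t_end, dt=0.001):
    """Simulate stepwise removal of C-terminal residues by a carboxypeptidase.

    rates[i] is a hypothetical first-order rate constant for removing residue
    i + 1 (counted from the C-terminus) once it has become the C-terminal
    residue. Returns the fraction of each residue released by time t_end.
    """
    n = len(rates)
    # state[j] = fraction of chains that have lost exactly j residues so far
    state = [1.0] + [0.0] * n
    for _ in range(int(round(t_end / dt))):
        # Compute all fluxes first so every update uses the same time point.
        flux = [rates[j] * state[j] * dt for j in range(n)]
        for j in range(n):
            state[j] -= flux[j]
            state[j + 1] += flux[j]
    # Residue i + 1 is free in every chain that has lost more than i residues.
    return [sum(state[i + 1:]) for i in range(n)]


if __name__ == "__main__":
    print(c_terminal_release([0.1, 100.0], t_end=10.0))  # slow, then fast
    print(c_terminal_release([100.0, 0.1], t_end=10.0))  # fast, then slow
```

In the first call, the slowly released terminal residue and the rapidly released next residue accumulate almost in lockstep, so their order cannot be read from the time course; swapping the rates separates them clearly.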
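Another method of protein sequencing, mass spectrometry, ionizes a protein or peptide and breaks it into fragments, most commonly by colliding the ions with gas molecules in a tandem mass spectrometer. A detector measures the mass-to-charge ratio of each fragment, and the amino acid sequence is read from the mass differences between fragments that differ by a single amino acid. Because leucine and isoleucine have identical masses, they cannot be told apart by mass alone.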
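Reading a sequence from mass differences can be sketched as below, assuming an idealized, complete ladder of singly charged fragment masses with no modifications or missing peaks; real spectra contain overlapping fragment series (such as b and y ions) and noise. The residue masses are standard monoisotopic values, while `read_ladder` and its tolerance are hypothetical.

```python
# Monoisotopic residue masses in daltons. Leucine and isoleucine share one entry
# because their masses are identical.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(fragment_masses, tol=0.01):
    """Infer residues from consecutive masses in an idealized fragment ladder.

    Each gap between neighboring fragments is matched to the residue whose
    mass lies within tol daltons; unmatched gaps are reported as '?'.
    """
    masses = sorted(fragment_masses)
    residues = []
    for lighter, heavier in zip(masses, masses[1:]):
        gap = heavier - lighter
        matches = [aa for aa, m in RESIDUE_MASSES.items() if abs(m - gap) <= tol]
        residues.append(matches[0] if matches else "?")
    return residues
```

Glutamine and lysine differ by only about 0.036 daltons, so the tolerance must be tight enough to separate them; no tolerance can separate leucine from isoleucine.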
Sequencing of the human genome has been a giant leap toward understanding how the human species evolved and how genetic diseases arise, and advances in DNA sequencing technology made this grand accomplishment possible. The next frontier is to decipher how all the proteins encoded by the genome interact to carry out the processes of life. This is the study of proteomics. Advances in mass spectrometry and protein sequencing instrumentation are bringing this challenging problem closer to its resolution.
see also HPLC: High-Performance Liquid Chromatography; Mass Spectrometry; Proteins; Sequencing DNA.
Frank H. Stephenson and Maria Cristina Abilock
The results of chemical sequencing can often be compared with the amino acid sequence deduced from the gene's DNA sequence. The gene coding for the protein under investigation may be found by screening a DNA library, for example with a DNA probe designed from part of the protein's amino acid sequence. However, the base sequence of the gene gives only the amino acid sequence of the nascent protein, that is, the protein before post-translational modification. The sequence of the functional protein can be found only by analyzing the protein itself.