Proteins are linear chains of amino acids connected by chemical bonds between the carboxyl group of each amino acid and the amine group of the one following. These bonds are called peptide bonds, and chains of only a few amino acids are referred to as polypeptides rather than proteins. Different authorities set the protein/polypeptide dividing line at anywhere from 10 to 100 amino acids.
Many proteins have components other than amino acids. For example, some may have sugar molecules chemically attached. Exactly which types of sugars are involved and where on the protein chain attachment occurs will vary with the specific protein. In a few cases, it may also vary between different people. The A, B, and O blood types, for example, differ in precisely which types of sugar are or are not added to a specific protein on the surface of red blood cells.
Other proteins may have fat-like (lipid) molecules chemically bonded to them. These sugar and lipid molecules are always added after synthesis of the protein's amino acid chain is complete. As a result, discussions of protein structure and synthesis (including this one) may virtually ignore them. Nevertheless, such molecules can significantly affect the protein's properties.
Many other types of molecules may also be associated with proteins. Some proteins, for example, have specific metal ions associated with them. Others carry small molecules that are essential to their activity. Still others associate with nucleic acids in chromosomal or ribosomal structures.
What proteins do
Much of our bodies' dry weight is protein; even our bones are about one-quarter protein. The animals we eat and the microbes that attack us are likewise largely protein. The leather, wool, and silk clothing that we wear is nearly pure protein. The insulin that keeps diabetics alive and the "clot-busting" enzymes that may save heart attack patients are also proteins. Proteins can even be found working at industrial sites: protein enzymes produce not only the high-fructose corn syrup that sweetens most soft drinks, but also fuel-grade ethanol (alcohol) and other gasoline additives.
Within our bodies and those of other living things, proteins serve many functions. They digest foods and turn them into energy; they move our bodies and move molecules about within our cells; they let some substances pass through cell membranes while keeping others out; they turn light into chemical energy, making both vision and photosynthesis possible; they allow cells to detect and react to hormones and toxins in their surroundings; and, as antibodies, they protect our bodies against foreign invaders.
Many of these protein functions are addressed or referred to in other articles in this encyclopedia. Yet there are simply too many proteins—possibly more than 100,000—to even consider mentioning them all. Even trying to discuss every possible type of protein is an exercise in futility. Not only is the number of types enormous, but the types overlap. In producing muscle contraction, for example, the proteins actin and myosin obtain energy by breaking down adenosine triphosphate in an enzyme-like fashion.
Scientists have traditionally addressed protein structure at four levels: primary, secondary, tertiary, and quaternary. Primary structure is simply the linear sequence of amino acids in the peptide chain. Secondary and tertiary structure both refer to the three-dimensional shape into which a protein chain folds. The distinction is partly historical: secondary structure refers to certain highly regular arrangements of amino acids that scientists could detect as long ago as the 1950s, while tertiary structure refers to the complete three-dimensional shape. Determining a protein's tertiary structure can be difficult even today, although researchers have made major strides within the past decade.
The tertiary structure of many proteins shows a "string of beads" organization. The protein includes several compact regions known as domains, separated by short stretches where the protein chain assumes an extended, essentially random configuration. Some scientists believe that domains were originally separate proteins that, over the course of evolution, have come together to perform their functions more efficiently.
Quaternary structure refers to the way in which protein chains—either identical or different—associate with each other. For example, a complete molecule of the oxygen-carrying protein hemoglobin includes four protein chains of two slightly different types. Simple laboratory tests usually allow scientists to determine how many chains make up a complete protein molecule.
Primary structure: peptide-chain synthesis
Proteins are made (synthesized) in living things according to "directions" given by DNA and carried out by RNA and proteins. The synthesized protein's linear sequence of amino acids is ultimately determined by the linear sequence of DNA bases (or of base triplets known as codons) in the gene that codes for it. Each cell possesses elaborate machinery for producing proteins from these blueprints.
The first step is copying the DNA blueprint, essentially fixed within the cell nucleus, into a more mobile form. This form is messenger ribonucleic acid (mRNA), a single-stranded nucleic acid carrying essentially the same sequence of bases as the DNA gene. The mRNA is free to move into the main part of the cell, the cytoplasm, where protein synthesis takes place.
Besides mRNA, protein synthesis requires ribosomes and transfer ribonucleic acid (tRNA). Ribosomes are the actual "factories" where synthesis takes place, while tRNA molecules are the "trucks" that bring amino acids to the ribosome and ensure that they are incorporated at the right spot in the growing chain.
Ribosomes are extremely complex assemblages. They comprise almost 70 different proteins and at least three different types of RNA, all organized into two different-sized subunits. As protein synthesis begins, the previously separate subunits come together at the beginning of the mRNA chain; all three components are essential for the synthetic process.
Transfer RNA molecules are rather small, only about 80 nucleotides long. (Nucleotides are the fundamental building blocks of nucleic acids, as amino acids are of proteins.) Each type of amino acid has at least one corresponding type of tRNA (sometimes more). This correspondence is enforced by the enzymes that attach amino acids to tRNA molecules, which "recognize" both the amino acid and the tRNA type and do not act unless both are correct.
Transfer RNA molecules are not only trucks but translators. As the synthetic process adds one amino acid after another, they "read" the mRNA to determine which amino acid belongs next. They then bring the proper amino acid to the spot where synthesis is taking place, and the ribosome couples it to the growing chain. The tRNA is then released, and the ribosome moves along the mRNA to the next codon, the next base triplet specifying an amino acid. The process repeats until the "stop" signal on the mRNA is reached, whereupon the ribosome releases both the mRNA and the completed protein chain, and its subunits separate to seek out other mRNAs.
The two major types of secondary structure are the alpha helix and the beta sheet, both discovered by Linus Pauling and R. B. Corey in 1951. (Pauling received the first of his two Nobel Prizes for this discovery.) Many scientists consider a structure known as the beta turn part of secondary structure, even though the older techniques used to identify alpha helices and beta sheets cannot detect it. For completeness, some authorities also list random coil (the absence of any regular, periodic structure) as a type of secondary structure.
alpha helix. In an alpha helix, the backbone atoms of the peptide chain (the carboxyl carbon atom, the α-carbon atom to which the side chain is attached, and the amino nitrogen atom) take the form of a three-dimensional spiral. The helix is held together by hydrogen bonds between each nitrogen atom and the oxygen atom of the carboxyl group belonging to the fourth amino acid up the chain. This arrangement requires each turn of the helix to encompass 3.6 amino acids and forces the side chains to stick out from the central helical core like bristles on a brush.
Since amino acids at the end of an alpha helix cannot form these regular hydrogen bonds, the helix tends to become more stable as it becomes longer, that is, as the proportion of unbonded "end" amino acids becomes smaller. However, recent research suggests that most alpha helices end with specific "capping" sequences of amino acids. These sequences provide alternative hydrogen-bonding opportunities to replace those unavailable within the helix itself.
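The bonding pattern and the length argument can be made concrete with a short Python sketch. It assumes the usual convention that the carboxyl oxygen of residue i bonds to the amino nitrogen of residue i + 4. The figure of 3.6 residues per turn comes from the text; the rise of 1.5 angstroms per residue is a standard textbook value that the article does not state. The function names are hypothetical.

```python
RESIDUES_PER_TURN = 3.6   # stated in the text
RISE_PER_RESIDUE = 1.5    # angstroms; textbook value assumed here, not from the article


def helix_hydrogen_bonds(n_residues):
    """List backbone hydrogen bonds as (i, i + 4) pairs, numbering residues from 1."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]


def bonds_per_residue(n_residues):
    """Average number of i -> i + 4 hydrogen bonds per residue in the helix."""
    return len(helix_hydrogen_bonds(n_residues)) / n_residues


for n in (6, 12, 24, 48):
    turns = n / RESIDUES_PER_TURN
    length = n * RISE_PER_RESIDUE
    print(f"{n:2d} residues: {turns:4.1f} turns, {length:5.1f} A long, "
          f"{bonds_per_residue(n):.2f} backbone H-bonds per residue")
```

A helix of n residues has only n - 4 such bonds, so the bonds-per-residue figure climbs toward 1 as the helix lengthens, which is the trend the paragraph above describes.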
beta sheet. Beta sheets feature several peptide chains lying next to each other in the same plane. The stabilizing hydrogen bonds are between nitrogen atoms on one chain and carboxyl-group oxygen atoms on the adjacent chain. Since each amino acid has its amino group hydrogen-bonded to the chain on one side and its carboxyl group to the chain on the other side, sheets can grow indefinitely. Indeed, as with alpha helices, the sheet becomes more stable as it grows larger.
The backbone chains in a beta sheet can all run in the same direction (parallel beta sheet) or alternate chains can run in opposite directions (antiparallel beta sheet). There is no significant difference in stability between the types, and some real-world beta sheets mix the two. In each case, side chains of alternate amino acids stick out from alternate sides of the sheet. The side chains of adjacent backbone chains are aligned, however, creating something of an accordion-fold effect.
beta turn. Many antiparallel beta sheets are formed by a single peptide chain continually looping back on itself. The loop between the two hydrogen-bonded segments, known as a beta turn, consistently contains one to three (usually two) amino acids. The amino acids in a beta turn do not form hydrogen bonds, but other interactions may stabilize their positions. A further consistency is that, from a perspective where the side chain of the final hydrogen-bonded amino acid projects outward toward the viewer, the turn is always to the right.
Tertiary structure and protein folding
Within seconds to minutes of their synthesis on ribosomes, proteins fold up into an essentially compact three-dimensional shape, their tertiary structure. Ordinary chemical forces fully determine both the steps in the folding pathway and the stability of the final shape. Some of these forces are hydrogen bonds between side chains of specific amino acids. Others involve electrical attraction between positively and negatively charged side chains. Perhaps most important, however, are what are called hydrophobic interactions, a scientific restatement of the observation that oil and water do not mix.
Some amino acid side chains are essentially oil-like (hydrophobic; literally, "water-fearing"). They accordingly stabilize tertiary structures that place them in the interior, largely surrounded by other oil-like side chains. Conversely, some side chains are charged or can form hydrogen bonds. These are hydrophilic, or "water-loving," side chains. Unless they form hydrogen or electrostatic bonds with other specific side chains, they will stabilize structures where they are on the exterior, interacting with water.
The forces that govern a protein's tertiary structure are simple. With thousands or even tens of thousands of atoms involved, however, the interactions can be extremely complex. Today's scientists are only beginning to discover ways to predict the shape a protein will assume and the folding process it will go through to reach that shape.
Recent studies show that folding proceeds through a series of intermediate steps. Some of these steps may involve substructures not preserved in the final shape. Furthermore, the folding pathway is not necessarily the same for all molecules of a given protein. Individual molecules may pass through any of several alternative intermediates, all of which ultimately collapse to the same final structure.
The stability of a three-dimensional structure is not closely related to the speed with which it forms. Indeed, speed rather than stability is the main reason that egg white can never be "uncooked." At room temperature or below, the most stable form of the major egg white protein is compact and soluble. At boiling-water temperatures, the most stable form is an extended chain. When the cooked egg is cooled, however, the proteins do not have time to return to their normal compact structures. Instead, they collapse into an aggregated, tangled mass. And although this tangled mass is inherently less stable than the protein structures in the uncooked egg white, it would take millions of years—effectively forever—for the chains to untangle themselves and return to their soluble states. In scientific terminology, the cooked egg white is said to be metastable.
Something very similar could happen in the living cell. That it rarely does so reflects eons of evolution: selection has eliminated protein sequences likely to get trapped in a metastable state. Mutations can upset this balance, however. In the laboratory, scientists have produced many mutations that disrupt a protein's tertiary structure, either rendering it unstable or allowing it to become trapped in a metastable state. In the body, some scientists suspect that cystic fibrosis and an inherited bone disease called osteogenesis imperfecta may be due to mutations interfering with protein folding. And some believe that Alzheimer disease may also be due to improper protein folding, although not because of a mutation.
Scientists were recently surprised to discover that some proteins require an additional mechanism to ensure that they fold properly: association with other proteins. Since a protein's primary sequence completely determines its tertiary structure—as Christian Anfinsen and his National Institutes of Health colleagues had shown in a classic 1960 study—external mechanisms were not anticipated.
Sometimes the associated proteins become part of the final protein complex; in effect, quaternary structure forms before the final tertiary structure. In other instances, folding is assisted by a class of proteins known as chaperonins that dissociate when the process is complete. No one knows the precise role chaperonins play; it may not be the same in all cases. Scientists suspect, however, that one major chaperonin role may be to steer target proteins away from aggregation or other metastable states in which they might become trapped.
Quaternary structure, cooperativity, and hemoglobin
Some proteins have no quaternary structure. They exist in the cell as single, isolated molecules. Others exist in complexes encompassing anywhere from two to dozens of protein molecules belonging to any number of types.
Proteins may exhibit quaternary structure for a variety of reasons. Sometimes several proteins must come together to carry out a single function, or to perform it efficiently, without the substances on which they all act having to diffuse halfway across the cell. At other times the reasons are at least partially structural; for example, several proteins may come together to form an ion channel long enough to reach across the cell membrane. The most interesting reason, however, is that association allows changes to one molecule to affect the shape and activity of the others. Hemoglobin provides an intriguing example of this.
Hemoglobin, which makes up about a third of red blood cells' weight, is the protein that transports oxygen from the lungs to the tissues where it is used. It would be a major oversimplification, but not entirely false, to say that the protein (globin) part of hemoglobin is simply a carrier for the associated heme group.
Heme is a large "ring of rings" comprising 34 carbon, 4 nitrogen, 4 oxygen, and 32 hydrogen atoms. In the center, bonded to the four nitrogen atoms, is an iron atom; attraction between this iron atom and a histidine side chain on the globin is one of several forces holding the heme in place. Another histidine side chain is located slightly further from the iron atom, allowing an oxygen molecule to insert itself reversibly into the gap. In similar proteins lacking this histidine, oxygen alters the iron's oxidation state rather than attaching to it.
Hemoglobin consists of two copies of each of two slightly different protein molecules. All four molecules are in intimate contact with each other; thus, it is easy to see how a change in the shape of one could encourage the others to change shape as well. In fact, that is exactly what happens. When oxygen binds to one hemoglobin molecule, it forces a slight change in that molecule's shape. This change, in turn, alters the other molecules' shape so that oxygen binding is more likely. The end result is that any given hemoglobin tetramer (four-molecule complex) almost always carries either four oxygen molecules or none.
This "cooperativity," discovered by Coryell and Pauling in 1939, is extremely important for hemoglobin's function in the body. In the lungs, where there is a great deal of oxygen, binding of an oxygen molecule is quite likely. This leads to almost immediate binding of three more oxygen molecules, so hemoglobin is nearly saturated with oxygen as it leaves the lungs. In the tissues, where there is less oxygen, the chance that an oxygen molecule will leave the hemoglobin tetramer becomes quite high. As a result, the other three oxygen molecules will be bound less tightly and will probably leave also. The final consequence is that most of the oxygen carried to the tissues will be released there.
Without cooperativity, hemoglobin would pick up less oxygen in the lungs and release less in the tissues. Overall oxygen transport would therefore be less efficient.
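The effect of cooperativity on oxygen transport can be illustrated with the Hill equation, a standard empirical description of binding that this article does not itself use: fractional saturation Y = p^n / (P50^n + p^n), where p is the oxygen pressure, P50 is the pressure at half saturation, and n measures cooperativity (n = 1 means none). The numbers in this sketch are textbook approximations chosen for illustration, not values from the article: P50 of about 26 mmHg and n of about 2.8 for hemoglobin, with oxygen pressures of roughly 100 mmHg in the lungs and 30 mmHg in the tissues. The function names are hypothetical.

```python
def saturation(p_o2, p50=26.0, hill_n=2.8):
    """Fraction of oxygen-binding sites occupied, from the Hill equation."""
    return p_o2 ** hill_n / (p50 ** hill_n + p_o2 ** hill_n)


def delivered(hill_n, p_lungs=100.0, p_tissue=30.0):
    """Fraction of sites that load oxygen in the lungs and unload it in the tissues."""
    return saturation(p_lungs, hill_n=hill_n) - saturation(p_tissue, hill_n=hill_n)


# Compare an assumed cooperative carrier (n = 2.8) with a non-cooperative one (n = 1).
for label, n in (("cooperative", 2.8), ("non-cooperative", 1.0)):
    print(f"{label:16s} lungs {saturation(100.0, hill_n=n):.2f}  "
          f"tissues {saturation(30.0, hill_n=n):.2f}  "
          f"delivered {delivered(n):.2f}")
```

Under these assumed values the cooperative carrier both loads more fully in the lungs and hands over a larger share of its oxygen in the tissues, the same conclusion the paragraph above reaches in words.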
Although we think of proteins as natural products, scientists are now learning to design proteins. Many of today's designs involve making small changes in already existing proteins. For example, by changing two amino acids in an enzyme that normally breaks down proteins into short peptides, scientists have produced one that instead links peptides together. Similarly, changing three amino acids in an enzyme often used to improve detergents' cleaning power doubled the enzyme's wash-water stability.
Researchers have also designed proteins by combining different naturally occurring domains, and are actively investigating possible applications. Medical applications seem especially promising. For example, we might cure cancer by combining cancer-recognizing antibody domains with the cell-killing domains of diphtheria toxin. While native diphtheria toxin kills many types of cells in the body, scientists hope these engineered proteins will attach to, and kill, only the cancer cells against which their antibody domains are directed.
The long-term goal, however, is to design proteins from scratch. This is extremely difficult today, and will remain so until researchers better understand the rules that govern tertiary structure. Nevertheless, scientists have already designed a few small proteins whose stability or instability helps illuminate these rules. Building on these successes, scientists hope they may someday be able to design proteins for a spectrum of industrial and economic needs.
W. A. Thomasson
- Alpha helix: A type of secondary structure in which a single peptide chain arranges itself in a three-dimensional spiral.
- Beta sheet: A type of secondary structure in which several peptide chains arrange themselves alongside each other.
- Domain: A relatively compact region of a protein, separated from other domains by short stretches in which the protein chain is more or less extended; different domains often carry out distinct parts of the protein's overall function.
- Messenger ribonucleic acid (mRNA): A molecule of RNA that carries the genetic information for producing one or more proteins; mRNA is produced by copying one strand of DNA, but is able to move from the nucleus to the cytoplasm (where protein synthesis takes place).
- Peptide bond: A chemical bond between the carboxyl group of one amino acid and the amino nitrogen atom of another.
- Polypeptide: A group of amino acids joined by peptide bonds; proteins are large polypeptides, but no agreement exists regarding how large they must be to justify the name.
- Primary structure: The linear sequence of amino acids making up a protein.
- Quaternary structure: The number and type of protein chains normally associated with each other in the body.
- Ribosome: A complex of proteins and RNA, composed of two subunits, that functions in protein synthesis.
- Secondary structure: Certain highly regular three-dimensional arrangements of amino acids within a protein.
- Tertiary structure: A protein molecule's overall three-dimensional shape.
- Transfer ribonucleic acid (tRNA): A small RNA molecule, specific for a single amino acid, that transports that amino acid to the proper spot on the ribosome for assembly into the growing protein chain.