
Now that the Human Genome Project is essentially complete, should governmental and private agencies commit themselves to the Human Proteome Project, which would study the output of all human genes?

Viewpoint: Yes, governmental and private agencies should now commit themselves to a Human Proteome Project because of the many practical benefits such work could bring, from improvements in rational drug design to the discovery of new disease markers and therapeutic targets.

Viewpoint: No, governmental and private agencies should not commit themselves to a Human Proteome Project; the intended endpoint of the project is unclear, and the battles over access to the data might be even more intense than those that marked the Human Genome Project.

The term genetics was coined at the beginning of the twentieth century to signify a new approach to studies of patterns of inheritance. Within about 50 years, the integration of several classical lines of investigation led to an understanding of the way in which the gene was transmitted and the chemical nature of the gene. Subsequent work revealed how genes work, the nature of the genetic code, and the way in which genetic information determines the synthesis of proteins. During the course of these investigations, classical genetics was largely transformed into the science of molecular biology.

For most of the first half of the twentieth century, scientists assumed that the genetic material must be protein, because no other species of biological molecules seemed to possess sufficient chemical complexity. Nucleic acids, which had been discovered by Friedrich Miescher in the 1860s, were thought to be simple, monotonous polymers. The work of Erwin Chargaff in the 1940s challenged prevailing ideas about DNA and suggested that nucleic acids might be as complicated, highly polymerized, and chemically diverse as proteins. However, because the structure of DNA was still obscure, experiments that indicated that DNA might be the genetic material could not explain the biological activity of the gene. This dilemma was resolved in 1953 when James D. Watson and Francis Crick described an elegant double helical model of DNA that immediately suggested how the gene could function as the material basis of inheritance. Indeed, molecular biologists called the elucidation of the three-dimensional structure of DNA by Watson and Crick one of the greatest achievements of twentieth-century biology.

Based on the Watson-Crick DNA model, researchers were able to determine how genes work, that is, how information stored in DNA is replicated and passed on to daughter molecules and how information in DNA determines the metabolic activities of the cell. In attempting to predict the steps involved in genetic activity, Watson stated: DNA → RNA → protein. The arrows represent the transfer of genetic information from the base sequences of the nucleic acids to the amino acid sequences in proteins. The flow of information from DNA to RNA to protein has been called the Central Dogma of molecular biology.
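The information flow described above can be sketched in a few lines of code. The short DNA sequence and the abridged codon table below are illustrative only; the real genetic code has 64 codons:

```python
def transcribe(dna: str) -> str:
    """Transcription: copy the DNA coding strand, replacing T with U."""
    return dna.replace("T", "U")

# Abridged codon table (RNA triplet -> amino acid, one-letter code).
CODON_TABLE = {
    "AUG": "M",   # methionine (start codon)
    "UUU": "F",   # phenylalanine
    "GGC": "G",   # glycine
    "UAA": None,  # stop codon
}

def translate(rna: str) -> str:
    """Translation: read RNA three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid is None:  # stop codon ends the chain
            break
        protein.append(amino_acid)
    return "".join(protein)

dna = "ATGTTTGGCTAA"                  # coding-strand DNA
rna = transcribe(dna)                 # "AUGUUUGGCUAA"
print(rna, "->", translate(rna))      # AUGUUUGGCUAA -> MFG
```

Running the sketch prints the RNA transcript and the amino acid chain it encodes, tracing each arrow of the Central Dogma in turn.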

Further research demonstrated that specific mutations in DNA are associated with changes in the sequence of amino acids in specific proteins that result in a wide variety of inherited diseases. By the 1980s scientists had established the existence of some 3,000 human genetic diseases, which suggested that developments in genetics would make it possible to manage many diseases at the gene level. In theory, the ability to produce genes and gene products in the laboratory should allow "replacement therapy" for the hundreds of genetic diseases for which the defective gene and gene product are known.

In the 1970s, scientists learned how to cut and splice DNA to form recombinant DNA molecules and place these molecules into host organisms, and then clone them, that is, induce them to reproduce within their hosts. The commercial potential of recombinant DNA and genetic engineering was almost immediately understood by scientists, university development officials, venture capitalists, and entrepreneurs. Genentech (Genetic Engineering Technology) was established in 1976 for the commercial exploitation of recombinant DNA techniques. In 1980, the U.S. Patent Office granted a patent for the recombinant DNA process. This was the first major patent of the new era of biotechnology. Scientists confidently predicted that the techniques of molecular biology would make it possible to treat genetic diseases, manipulate the genetic materials of plants, animals, and microorganisms, synthesize new drugs, and produce human gene products such as insulin, growth hormone, clotting factors, interferons, and interleukins. However, such predictions also raised questions about the potential uses and abuses of recombinant DNA, cloning, genetic engineering, gene therapy, and genetically modified foods.

Refinements in the techniques of molecular biology made it possible to plan and execute the Human Genome Project (HGP), an international effort dedicated to mapping and sequencing all of the bases on the 23 pairs of human chromosomes. In 1988 an international council known as the Human Genome Organization (HUGO) was established to coordinate research on the human genome, and the Human Genome Project was officially launched in 1990. Some scientists called the project the "Holy Grail" of human genetics, but others warned against diverting the limited resources available to biological science to what was essentially routine and unimaginative work. As the project proceeded, however, new techniques for rapid and automated sequencing and data analysis demonstrated the feasibility of the project. Another aspect of the HGP was the growing influence of research methodologies known as "discovery science" and "data mining," which involve generating and utilizing enormous databases.

In 1998 the consortium of scientists involved in the HGP found itself in competition with Celera Genomics, a corporation founded and directed by J. Craig Venter, a former NIH researcher. On June 26, 2000, leaders of the public genome project and Celera held a joint press conference to announce the completion of working drafts of the whole human genome. Both maps were published in February 2001. By January 2002, Celera announced that it would shift its focus to drug development, because its original plan to sell access to its genome databases was not as profitable as anticipated.

With the successful completion of the first major phase of the Human Genome Project, scientists could direct their energies to the daunting task of analyzing the tens of thousands of human genes and their relationship to the hundreds of thousands of human proteins. Based on experience gained during the quest for the complete sequence of the human genome, scientists suggested creating a complete inventory of human proteins, an effort dubbed the Human Proteome Project and coordinated by the Human Proteome Organization (HUPO).

Australian scientists Marc Wilkins and Keith Williams coined the term proteome in 1995. Proteome stands for the "set of PROTEins encoded by the genOME." Although nucleic acids have received most of the attention since Watson and Crick introduced the DNA double helix, proteins are the hard-working macromolecules that do most of the tasks needed for life. Proteins are the essential structural elements of the cell, and they serve as enzymes, hormones, antibodies, and so forth. Because the relationship between genes and proteins is dynamic and complex, the proteome can be thought of as the total set of proteins expressed in a given organelle, cell, tissue, or organism at a given time, with respect to properties such as expression levels, posttranslational modifications, and interactions with other molecules. The term proteomics is routinely used as the name of the science and the process of analyzing and cataloging all the proteins encoded by a genome. Some scientists think of proteomics as "protein-based genomics."

Scientists argue that instead of establishing a mere catalog of proteins, proteomics would create a complete description of cells, tissues, or whole organisms in terms of proteins and the way they change in response to developmental and environmental signals. Many practical benefits are anticipated from the analysis of such patterns of proteins; for instance, they could lead to major improvements in rational drug design and the discovery of new disease markers and therapeutic targets.

The first major conference devoted to the possibility of establishing a major proteome project was held in April 2001 in McLean, Virginia. Comparing the challenges raised by proteomics to those involved in the Human Genome Project, the organizers called the conference "Human Proteome Project: Genes Were Easy." The founders of HUPO hoped to bring academic, commercial, national, and regional proteome organizations into a worldwide effort to study the output of all human genes. However, skeptics warned that the intended endpoint of the proteome project was more nebulous than that of the genome project. Cynics warned that battles over access to the data generated by public and private groups might be even more intense than those that marked the rivalry between the public genome project and Celera.

Publication of the first draft of the human gene map in 2001 was accompanied by claims that the Human Genome Project would provide the complete script from which scientists could read and decode the "nature" of the human race. More sober evaluations suggested that the HGP had produced a parts list, rather than a blueprint or a script. Although advances in proteomics may indeed lead to new drugs and therapeutic interventions, the feasibility and potential of the Human Proteome Project are still points of debate, as indicated by the following essays.


Viewpoint: Yes, governmental and private agencies should now commit themselves to a Human Proteome Project because of the many practical benefits such work could bring, from improvements in rational drug design to the discovery of new disease markers and therapeutic targets.

Human Proteome

A gene is a piece of DNA. The 35,000 or more genes that each person carries are the working units of heredity that pass from parents to child. A genome can be defined as all the DNA in a given organism.

The genome's entire job is to direct cells to make, or express, proteins. The genes in each cell in a human body use DNA-encoded instructions to direct the expression of one or many proteins. So 35,000 genes may generate millions of proteins, each of which is then modified in many ways by the cellular machinery.
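The multiplication from genes to protein forms can be illustrated with back-of-the-envelope arithmetic. Every factor below is a hypothetical round number chosen for illustration, not a measured value:

```python
genes = 35_000                       # approximate human gene count cited above
splice_variants_per_gene = 4         # assumed average number of alternative splice forms
modification_sites = 3               # assumed modifiable sites per protein
modification_states = 2 ** modification_sites  # each site either modified or not

protein_forms = genes * splice_variants_per_gene * modification_states
print(f"{protein_forms:,} possible protein forms")  # 1,120,000 possible protein forms
```

Even with such modest assumptions the count passes one million, consistent with the claim that tens of thousands of genes can yield millions of modified protein forms.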

In the words of Stanley Fields, professor of genetics and medicine at the University of Washington-Seattle, "… Proteins get phosphorylated, glycosylated, acetylated, ubiquitinated, farnesylated, sulphated, linked to … anchors … change location in the cell, get cleaved into pieces, adjust their stability, and … bind to other proteins, nucleic acids, lipids, small molecules, and more."

All these modifications make it possible for proteins to become hair and nails, enzymes and connective tissue, bone and cartilage, tendons and ligaments, the functional machinery of cells, and much more. So the genome supplies these building blocks of life, but proteins do all the work, and, according to David Eisenberg, director of the UCLA-DOE Laboratory of Structural Biology and Molecular Medicine, they do it "as part of an extended web of interacting molecules."

Compared with the genes in the human genome, proteins are so complex and dynamic that some experts say there can be no such thing as an identifiable human proteome—defined as all the proteins encoded in the human genome—and so no basis for a human proteome project.

Others hold that a true understanding of the genetic blueprint represented by the human genome can't begin until all the genes are not only mapped but annotated (explained), along with their products—proteins. And even though the proteome does exist in near-infinite dimensions, it is possible to start planning and identifying short-term milestones and measures of success, and prioritize specific stages of a human proteome project.

The History of Proteomics

According to Dr. Norman Anderson, chief scientist at the Large Scale Biology Corp., what is now proteomics started in the 1960s as the first large-scale engineering project—the original Molecular Anatomy Program at Oak Ridge National Laboratory in Tennessee. It was supported by the National Institutes of Health (NIH) and the Atomic Energy Commission, and aimed to produce as complete an inventory of cells at the molecular level as technology would allow.

The sponsors were especially interested in discovering and purifying new pathogenic viruses, and technology developed in the program included a systematic approach to fractionating (separating into different components) proteins, large-scale vaccine purification methods, high-pressure liquid chromatography, and high-speed computerized clinical analyzers.

In 1975, after a new technology called high-resolution two-dimensional electrophoresis was introduced, the program moved to Argonne National Laboratory to take advantage of image-analysis techniques and emerging automation.

Beginning in 1980, an attempt was made to launch a Human Protein Index (HPI) project as a serious national objective. In 1983 some HPI proponents proposed launching a dual effort—sequencing the human genome and indexing proteins as parallel projects. The Human Genome Project was first to succeed, partly because the basic gene-sequencing technology was widely available.

Now, thanks to advances in mass spectrometry and automation and the expansion of many protein-related technologies and systems, a coordinated program in proteomics could complete the protein-indexing effort first proposed in 1983.

With this in mind, in February 2001 the Human Proteome Organization (HUPO) was launched, with the official formation of a global advisory council of academic and industry leaders in proteomics. HUPO is not necessarily meant to become the official Human Proteome Project, although it could become such a framework.

It was formed in part to help increase awareness of proteomics across society, and create a broader understanding of proteomics and the opportunities it offers for diagnosing and treating disease. It was also formed as a venue for discussing the pros and cons of creating a formal project, and for shaping such a project if it goes forward.

As a global body, HUPO wants to foster international cooperation across the proteomics community and promote related scientific research around the world. Participants include representatives of government and financial interests, to make sure the benefits of proteomics are well distributed. Advisory council members come from Europe, North America, and Asia. Special task forces in Europe and Japan represent HUPO at a regional level.

In April 2001, HUPO held its first three-day meeting in McLean, Virginia, and more than 500 members of the global proteomics community attended. The international conference was called Human Proteome Project: Genes Were Easy.

Genes Were Easy

Topics at the April meeting included such basics as funding and fostering, scope and scale, financial implications, current proteomic efforts and lessons, patent and insurance issues, and potential solutions. Throughout the meeting, a series of round-table discussions among leading figures in international proteomics tackled thornier issues and posed tough questions.

Questions from Round Table 1: Lessons from the Human Genome Project included:

What agreements are needed on data reporting and data quality? Who will own the information? Would a formal project spur technological innovation? What happens if there is no formal project? What kind of funding would be required and where should it come from? What should be the benefits from the project?

Questions from Round Table 2: Lessons from Current Efforts—Defining the End Result of the Human Proteome Project included:

Is it a complete list of all human proteins? Is it a full database of protein-expression levels? Would it include healthy and disease-related protein expression? Would it include protein expression in response to drug treatment? Is it a full database of human protein-protein interactions? Is it a complete understanding of the function of each protein?

Discussions in Round Table 3 focused on the major technical challenges posed by a human proteome project. These included: capability to handle the full range of protein expression levels; reproducibility of protein expression studies; reducing false positives in yeast 2-hybrid studies; and significantly increasing high-throughput capacity and automation.

Six months later, in October 2001, HUPO convened a planning meeting to review the state of the art in proteomics and consider how to further knowledge of the human proteome. The HUPO meeting was held in Leesburg, Virginia, and sponsored by the National Cancer Institute (NCI) and the Food and Drug Administration (FDA).

Meeting attendees agreed that the constantly changing human proteome has quasi-infinite dimensions, and that initiating a large international human proteome project would be much harder than the human genome project because there can be no single goal, like sequencing the human genome. Other complicating factors include the more complex and diverse technologies and resulting data, and a need for more sophisticated data-integration tools for the proteome than for the genome.

Still, the group ambitiously defined the goal of proteomics as identifying all proteins encoded in the human genome; determining their range of expression across cell types, their modifications, their interactions with other proteins, and their structure-function relationships; and understanding protein-expression patterns in different states of health and disease.

Just because a proteome project is harder to conceptualize and carry out than a genome project doesn't mean it shouldn't be done. So the attendees recommended short-term milestones and measures of success. They agreed that private-public partnerships should be formed to work on specific, affordable, and compelling areas of scientific opportunity as individual projects with defined time frames.

They agreed that a proteome project should combine elements of expression proteomics, functional proteomics, and a proteome knowledge base.

Expression proteomics involves identifying and quantitatively analyzing all proteins encoded in the human genome and assessing their cellular localization and modifications.

Functional proteomics involves defining protein interactions, understanding the role of individual proteins in specific pathways and cellular structures, and determining their structure and function.

A proteome knowledge base involves organizing proteome-related data into a database that integrates new knowledge with currently scattered proteome-related data, leading to an annotated human proteome.

They also agreed that technology development should be an integral component of the project, that all users should have open access to data that comes from the project, that strong consideration should be given to integrating proteome knowledge with data at the genome level, that the project should have definable milestones and deliverables, and that a planning and piloting phase should precede the project's production phase.

The Alternative to Cooperation

The debate continues about whether a Human Proteome Project is possible, even while HUPO and the proteomics community hammer together an increasingly solid foundation for such a global project. According to the HUPO literature, this careful and consensual planning is helping "harness the naturally occurring forces of politics and markets, technology and pure research" that will make an international proteomics effort possible.

Now that the human genome is sequenced, the next logical step is to understand proteins—the products of genes—in the context of disease. With the tools of proteomics, specific proteins can be identified as early, accurate markers for disease. Proteins are important in planning and monitoring therapeutic treatments. Understanding protein-expression patterns can help predict potential toxic side effects during drug screening. Proteins identified as relevant in specific disease conditions could have an important role in developing new therapeutic treatments.

Genomics spawned a multibillion-dollar research effort and many commercial successes. But, according to HUPO founding member Prof. Ian Humphery-Smith, "proteins are central to understanding cellular function and disease processes. Without a concerted effort in proteomics, the fruits of genomics will go unrealized."

At the National Institute of General Medical Sciences (NIGMS), John Norvell, director of the $150-million Protein Structure Initiative, found that when researchers had total discretion to pick their own projects, they studied proteins with interesting properties and ignored proteins whose structures could illuminate the structures of many similar proteins. "We decided an organized effort was thus needed," he said.

"Proteome analysis will only contribute substantially to our understanding of complex human diseases," says proteomics researcher Prof. Joachim Klose of Humboldt University in Germany, "if a worldwide endeavor is initiated aiming at a systematic characterization of all human proteins."

For the fledgling Human Proteome Project and the resulting understanding of how proteins work together to carry out cellular processes—what Stanley Fields calls the real enterprise at hand—the alternative to international cooperation through a systematic proteomics effort, says Ian Humphery-Smith, "is sabotage and self-interested competition."


Viewpoint: No, governmental and private agencies should not commit themselves to a Human Proteome Project; the intended endpoint of the project is unclear, and the battles over access to the data might be even more intense than those that marked the Human Genome Project.

Less than 10 years ago, while scientists were in the throes of sequencing the human genome, Marc Wilkins of Proteome Systems in Australia invented the word "proteome" to describe the qualitative and quantitative study of the proteins produced in an organism, including their structure, their interactions, and their biochemical roles in health and disease. Now, at the start of the twenty-first century, with the human genome sequencing project near completion, a proteome research project is being discussed, but with caution and debate surrounding its cost, feasibility, and usefulness.

The Human Proteome Project builds on the "Central Dogma" of molecular biology, which describes how the chemical alphabet of the human genome's 3 billion nucleotide base pairs is read: DNA is transcribed into RNA, which in turn is translated into protein. In addition to gaining basic information about proteins—their structure, cellular interactions, and biochemistry—biotechnology and pharmaceutical companies are particularly interested in investigating the vast number of proteins the human genome encodes as potential drug targets.

So what are the perils of the Proteome Project? Critics cite the complexity of the human proteome, technology, and money.

If sequencing the 3 billion base pairs of the human genome seemed like a huge project, the Proteome Project is far more complex and involves far more data. In addition, the Human Proteome Project investigates many aspects of proteins and is not a single task, like sequencing in the Human Genome Project. The projected 34,000 or so genes encode an estimated 500,000 to 1,000,000 proteins. The DNA in the 250 cell types of the human body, such as skin, liver, or brain neurons, is the same, but the proteins in the different cell types are not. The amount of a protein can vary according to cell type, cell health, stress, or age. At different times and under different conditions, different subproteomes will be expressed. Scientists expect that over the next 10 years several subproteome projects will take place, rather than an "entire" human proteome project. A subproteome project, for example, would investigate the proteins found in one tissue, body fluid, or cell type at a particular moment of life. Will there ever be a complete catalogue of human proteins? According to Denis Hochstrasser of the Swiss Institute of Bioinformatics, we may never know.

The scientific research community agrees that a variety of technologies and technological developments are necessary to investigate the proteome. For example, it is slow and tedious to excise protein spots from commonly run two-dimensional polyacrylamide gel electrophoresis (PAGE) experiments by hand. In addition, big hydrophobic proteins do not dissolve in the solvents currently used in 2D PAGE experiments. It is also hard to distinguish very large or very small proteins in 2D PAGE experiments. Automating the running and analysis of 2D gel technology would also allow a 2D gel experiment to be run in days rather than months or years. Hoping to make the process of separating and identifying proteins easier, Hochstrasser and colleagues are developing a molecular scanner that would automate the separation and identification of thousands of protein types in a cell.

Developments in mass spectrometry and bioinformatics will also be necessary to further the proteome project. Although mass spectrometry is useful for characterizing proteins at the molecular level, the technique can fail to detect rare proteins. Mass spectrometers are also expensive: in 2002, a single instrument was estimated to cost $500,000. Nuclear magnetic resonance (NMR) tools that let scientists take dynamic, "movie" data of proteins instead of static snapshots will be needed as well. And to store and analyze all the information on proteins, researchers will have to develop bioinformatic computer databases.

More words of caution about the proteome project are uttered when the amounts of money necessary for its various parts are estimated. Each genome sequencing project, which leads to the discovery of more proteins and thus more proteome work, is a multimillion-dollar endeavor. Celera raised $1 billion for a new proteomics center, and the University of Michigan has received $15 million in grants from the U.S. National Institutes of Health (NIH). Each component of a Japanese proteome project has a million-dollar budget, which is typical for proteome projects anywhere in the world. Solving 3,000 protein structures in five years is projected to cost $160 million; technology development, $88 million; and a synchrotron alone, $300 million. Critics of the proteome project feel that the era of big public science initiatives, such as sending a man to the Moon, is or should be over, especially for a program that is not well defined.


Further Reading

Abbott, Alison. "Publication of Human Genomes Sparks Fresh Sequence Debate." Nature 409, no. 6822 (2001): 747.

Ashraf, Haroon. "Caution Marks Prospects for Exploiting the Genome." Lancet 351, no. 9255 (2001): 536.

Begley, Sharon. "Solving the Next Genome Puzzle." Newsweek 137, no. 8 (February 19, 2001): 52.

Bradbury, Jane. "Proteomics: The Next Step after Genomics?" Lancet 356, no. 9223 (July 1, 2000): 50.

"Celera in Talks to Launch Private Sector Human Proteome Project." Nature 403, no. 6772 (2000): 815-16.

Eisenberg, David. "Protein Function in the Post-genomic Era." Nature 405 (June 15, 2000).

Ezzell, Carol. "Move Over, Human Genome." Scientific American 286, no. 4 (2002): 40-47.

———. "Proteomics: Biotech's Next Big Challenge." Scientific American (April 2002): 40-47.

Fields, Stanley. "Proteomics in Genomeland." Science 291, no. 5507 (2001): 1221.

Gavaghan, Helen. "Companies of All Sizes Are Prospecting for Proteins." Nature 404, no. 6778 (2000): 684-86.

Human Genome News. Sponsored by the U.S. Department of Energy Human Genome Program [cited July 20, 2002].

HUPO Human Proteome Organization [cited July 20, 2002].

HUPO Workshop, Meeting Report, October 7, 2001 [cited July 20, 2002].

Pennisi, E. "So Many Choices, So Little Money." Science 294, no. 5 (October 5, 2001): 82-85.

Petricoin, Emanuel. "The Need for a Human Proteome Project—All Aboard?" Proteomics 5 (May 1, 2001): 637-40.

"The Proteome Isn't Genome II." Nature 410, no. 6830 (April 12, 2001): 725.

Schrof Fischer, Joannie. "We've Only Just Begun: Gene Map in Hand, the Hunt for Proteins Is On." U.S. News & World Report 129, no. 1 (July 3, 2000): 47.

Vidal, Marc. "A Biological Atlas of Functional Maps." Cell 104 (February 9, 2001): 333-39.



Key Terms

Bioinformatics: The use of computers in biology-related sciences. Although the terms bioinformatics, computational biology, and bioinformation infrastructure are often used interchangeably, bioinformatics typically refers to database-like activities: creating and maintaining sets of data that remain in a consistent state over essentially indefinite periods of time.


DNA: The chemical inside the nucleus of a cell that carries the genetic instructions for making living organisms.


DNA sequencing: Determining the exact order of the base pairs in a segment of DNA.


Gene: The functional and physical unit of heredity passed from parent to offspring. Genes are pieces of DNA, and most genes contain the information for making a specific protein.


Gene expression: The process by which proteins are made from the instructions encoded in DNA.


Gene mapping: Determining the relative positions of genes on a chromosome and the distance between them.


Genome: All the DNA in an organism or cell, including chromosomes in the nucleus and DNA in mitochondria.


Chromatography: A technique in which protein molecules are separated according to their physical properties, such as size, shape, charge, hydrophobicity, and affinity for other molecules. The term high-performance liquid chromatography (HPLC) was coined to describe the separation of molecules under high pressure in a stainless-steel column filled with a matrix.


Human Genome Project: An international research project to map each human gene and to completely sequence human DNA.


Mass spectrometer: An instrument that separates beams of ions according to their mass-to-charge ratio and records beam deflection and intensity directly on a photographic plate or film.


Nucleotide: A structural component, or building block, of DNA and RNA. A nucleotide consists of a base (one of four chemicals: adenine, thymine, guanine, or cytosine) plus a molecule of sugar and one of phosphoric acid.


Central Dogma: The "Central Dogma" of molecular biology states that DNA, a gene, is transcribed into RNA, which is translated into protein. The RNA nucleotides are read in groups of three, called codons, each of which specifies an amino acid. The order and type of codons in an RNA sequence determine the amino acids and the resulting protein.


Proteomics: The study of protein function, regulation, and expression in relation to the normal function of the cell and in disease.


Spot cutter: A device that allows protein biochemists to cut isolated spots out of a gel and sequence their amino acids.


Two-dimensional polyacrylamide gel electrophoresis (2D PAGE): The conventional two-dimensional electrophoresis method of resolving proteins. Proteins are first separated by isoelectric point (the electric charge of the molecule) and then by molecular weight.


Sequence: A series of connected biological molecules. A one-dimensional sequence of amino acids can form a protein; a one-dimensional sequence of nucleotide bases forms DNA.
