Proteomics is the science of studying the multitude of proteomes found in living organisms. A proteome is the entire collection of proteins expressed by a genome or in a tissue. The contents of a proteome can differ in various tissue types, and it can change as a result of aging, disease, drug treatment, or environmental effects.
This contrasts with the concept of a genome, which is an organism's complete collection of DNA. A genome's composition remains more or less constant from tissue to tissue, apart from the mutations and polymorphisms that can occur.
The word "proteome" was first coined in late 1994. By 1997 there were a number of research conferences focusing on proteomics.
According to the first draft of the human genome, based on the work by the Human Genome Project and by Celera Inc., there are only between thirty thousand and seventy thousand genes in the human genome, many fewer than had been estimated previously. However, as of 2002 there were still groups that believed that there are at least 120,000 genes. Regardless of which of these estimates proves more accurate, the number of potential proteins in the human proteome is quite large. Although the first draft of the human genome reduced the estimates for the total number of human genes, it also predicted a greater amount of alternative splicing of genes, and therefore more distinct protein products per gene, than had been anticipated.
At its simplest level, proteomics is the study of protein expression in a proteome, or trying to understand the relative levels (amounts) of each protein within the mixture. Proteomics attempts to characterize proteins, compare variations in their expression levels in normal and disease states, study their interactions with other proteins, and identify their functional roles.
Unlike the traditional approach of studying individual proteins one at a time, proteomics uses an automated, high-throughput approach. High-throughput refers to the number of items (in this case, proteins) that can be analyzed or studied per unit of time. New technologies and substantial bioinformatics tools are required to compare entire proteomes. Expansion of the field of proteomics into the realm of "big science" (meaning many dollars invested by a large number of companies and universities) is several years behind the expansion of genomics. This is primarily because proteins are more difficult to work with in a laboratory setting than are nucleic acids such as DNA.
The development of protein analysis technologies is more difficult than the development of DNA analysis technologies for three reasons. First, the basic alphabet for encoding proteins consists of twenty amino acids, whereas there are only four different nucleotides, the alphabet of DNA. Second, the messenger RNA (mRNA) for some genes can be differentially spliced, meaning that multiple messages can be made from a single gene, resulting in multiple, distinct protein products. Finally, many proteins are modified once they have been synthesized. This is known as post-translational modification. There are a number of types of post-translational modifications, such as the addition of sugar, phosphate, sulfate, lipid, acetyl, or methyl groups. Each of these modifications has the ability to change the functional activity of a protein.
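To see how quickly the second and third factors multiply, a rough count can be sketched in Python. The function name and the example numbers are illustrative assumptions, and the calculation treats every modification site as independently modified or unmodified, so it gives an upper bound rather than a measured figure.

```python
def max_protein_forms(splice_variants: int, modification_sites: int) -> int:
    """Upper bound on the distinct protein forms made from one gene.

    Assumes every splice variant carries the same number of modification
    sites and that each site is independently either modified or not.
    """
    return splice_variants * 2 ** modification_sites


# Hypothetical gene: 3 splice variants, 4 possible modification sites.
print(max_protein_forms(3, 4))  # 3 * 2**4 = 48
```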
The above issues have made the development of reliable, high-throughput techniques for characterizing proteins, including their expression levels, on a proteome-wide level a major challenge. Hence, techniques for high-throughput DNA sequencing and gene expression studies, for example, were developed and commercialized on a large scale sooner than similar protein analysis techniques. This is not to imply that all of the techniques involved in proteomics are new. Some, such as two-dimensional gel electrophoresis, have been around since the 1970s. However, the need to adapt these techniques to a large "proteome" scale brings with it a unique set of challenges.
For researchers involved in areas such as drug discovery, proteomics approaches will need to be used to obtain a greater understanding of disease mechanisms and drugs' mechanisms of action. Large-scale studies looking at gene expression via quantification of mRNA abundance are already possible and well commercialized. These technologies are very powerful, and the highest throughput approaches are capable of analyzing tens of thousands of genes per experiment. Sophisticated bioinformatics systems have been, and continue to be, developed to analyze these vast amounts of data. However, studies have shown that mRNA levels do not necessarily correlate well with protein levels.
Researchers must understand proteins and their roles, since proteins are the functional units within cells. As of 2002, the vast majority of drug targets were proteins. There are a handful of drugs, including some chemotherapeutic agents, that bind to DNA, but most drugs bind to specific protein targets. In the cases where the target is a protein, the drugs themselves are primarily small organic molecules or, in some cases, small proteins, such as hormones, that bind to a larger protein target in the body. Some drugs are actually therapeutic proteins that are delivered to the site of the disease.
The primary attributes used to identify proteins include the protein's mass and apparent mass, its isoelectric point, and its N- and C-terminal sequence tags. A protein's mass and its apparent mass are probably the most common characteristics used. Protein mass is determined by adding the total mass of all the amino acids in the protein to the mass of any molecules added through post-translational modification. A protein's isoelectric point is the pH at which it is neutrally charged. A protein's N- and C-terminal sequence tags are short sequences of amino acids on either end of the protein. Since there are twenty different possible amino acids at each position in a protein, a peptide of only four or five amino acids in length is likely to be unique to a specific protein. There are 160,000 (20^4) combinations of sequences that are four amino acids long.
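The mass calculation and the sequence-tag count described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the example peptide are hypothetical, and the table holds monoisotopic residue masses (average masses would give slightly different totals).

```python
# Monoisotopic masses (in daltons) of amino acid residues, that is, each
# amino acid minus the water lost when a peptide bond forms.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056      # one water accounts for the free N- and C-termini
PHOSPHATE = 79.96633  # mass added by a single phosphorylation (HPO3)


def protein_mass(sequence, modification_masses=()):
    """Sum the residue masses, add one water, then add any modifications."""
    backbone = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return backbone + sum(modification_masses)


def count_sequence_tags(length, alphabet_size=20):
    """Number of distinct sequences of the given length."""
    return alphabet_size ** length


# Hypothetical peptide, unmodified and with one phosphate group attached.
print(round(protein_mass("SAMPLER"), 4))
print(round(protein_mass("SAMPLER", [PHOSPHATE]), 4))
print(count_sequence_tags(4))  # 160000
```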
The most commonly used laboratory techniques in proteomics are two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) and mass spectrometry. These techniques have been modified for use in proteomics. Both can be used in combination with more traditional protein separation techniques, including column chromatography.
Starting in the late 1990s, several companies also started developing "protein chips," another strategy for studying proteomes and other complex protein mixtures. These chips allow a researcher to collect minute quantities of proteins that bind to specific molecules on their surface. By 2001, some companies announced they were developing "antibody chips" onto which antibodies will be attached. The antibodies can then be used as probes to capture and quantify specific proteins found in complex mixtures.
The use of 2-D PAGE allows the simultaneous separation of thousands of proteins, and the technique is still a key tool in proteomics technologies. The first dimension of protein separation on the gel is by isoelectric focusing, in which proteins are separated along a pH gradient until they reach a stationary position, where their net charge is zero.
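The point at which a protein stops migrating in the first dimension can be estimated computationally. The sketch below assumes a simple model in which each ionizable group behaves independently (Henderson-Hasselbalch), and it uses approximate pKa values; published pKa tables differ, so the numbers and function names here are assumptions for illustration only.

```python
# Approximate pKa values (assumed; different tables give different numbers).
POSITIVE_PKA = {"K": 10.5, "R": 12.4, "H": 6.0}
NEGATIVE_PKA = {"D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}
N_TERMINUS_PKA = 9.69
C_TERMINUS_PKA = 2.34


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH, treating groups independently."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERMINUS_PKA))   # N-terminal amine
    charge -= 1.0 / (1.0 + 10 ** (C_TERMINUS_PKA - ph))  # C-terminal carboxyl
    for aa in sequence:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(sequence: str) -> float:
    """Find the pH where the net charge is zero by bisection on pH 0 to 14."""
    low, high = 0.0, 14.0
    for _ in range(60):
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


# A lysine-rich peptide focuses at high pH, an aspartate-rich one at low pH.
print(round(isoelectric_point("KKK"), 2), round(isoelectric_point("DDD"), 2))
```

Bisection works here because the net charge only decreases as the pH rises, so there is a single crossing point.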
The second dimension of separation on the gel is by molecular mass. Sodium dodecyl sulphate (SDS) is applied, and it binds to all the proteins. This provides the proteins with a uniform charge along their length, so that they will migrate across the gel according to their molecular mass when a current is applied. After the 2-D PAGE is run, the gel is stained. The result is a two-dimensional map consisting of hundreds or thousands of protein spots.
Since 2-D PAGE was first used in the 1970s, a number of modifications have been made to make gels more reproducible and more amenable to the higher-throughput use necessary for proteomics applications. However, 2-D PAGE is still something of an art form, and high-quality, reproducible results are difficult to obtain except in the hands of very experienced users. The technology needs to be further simplified to allow casual and novice users to obtain reproducible, quality results.
Mass spectrometry is an analytical technique that very accurately measures the mass of proteins and peptides. There are two common types of mass spectrometry. The first type, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry, can be used to analyze proteins that are embedded in solid samples and measures their mass in a flight tube. The second type, electrospray ionization mass spectrometry, can be used to analyze proteins that are in a liquid solution and measures their mass in either a flight tube or in a device known as a quadrupole. There are also other variations on these techniques.
Mass spectrometry is commonly used for peptide mass fingerprinting. In this process, a protein sample is isolated by 2-D PAGE and cut with an enzyme that specifically targets particular amino acids. Mass spectrometry is used to measure the masses of the resulting cut pieces, or peptides. These masses can be thought of as a fingerprint that can be compared to the fingerprints of proteins whose amino acid sequences have already been analyzed and stored in a database.
To determine the fingerprints of proteins that have already been sequenced, a computer program determines the amino acid composition, and thus the masses, of the pieces that would result if those proteins were also cut by the same enzyme. A list of proteins is generated from the database, sorted by how many peptides they share with the unknown experimental protein.
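The matching procedure described above can be sketched as follows. Trypsin-like cleavage (after K or R, unless the next residue is P) is an assumption, since the text does not name the enzyme; the protein names, sequences, and mass tolerance are hypothetical, and real search tools use more elaborate scoring than a simple count.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def digest(sequence):
    """Cut after every K or R that is not followed by P (assumed enzyme rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def rank_candidates(observed_masses, database, tolerance=0.01):
    """Rank database proteins by how many observed masses they can explain."""
    scores = []
    for name, sequence in database.items():
        theoretical = [peptide_mass(p) for p in digest(sequence)]
        shared = sum(
            any(abs(mass - t) <= tolerance for t in theoretical)
            for mass in observed_masses
        )
        scores.append((name, shared))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical two-protein database and an "unknown" sample.
database = {"protA": "MKTAYIAKQR", "protB": "GASPVTCLNDQK"}
observed = [peptide_mass(p) for p in digest("MKTAYIAKQR")]
print(rank_candidates(observed, database))
```

In this toy case the observed masses are generated from protA itself, so protA ranks first with three shared peptides.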
There are also technologies, including the yeast two-hybrid system, that can be used to study interactions between proteins. These approaches complement 2-D PAGE and mass spectrometry data by helping to elucidate functional cellular pathways.
Databases and Computational Approaches
There is an ever-increasing number of protein and proteome databases being developed. The most comprehensive information about specific proteins is found in databases that store protein sequences. One of the first and probably the best known such database is SWISS-PROT, which was created in 1986.
SWISS-PROT is a curated database that provides not only protein sequences but also such information as descriptions of a protein's function, its domain structure, and post-translational modifications, as well as links to other related databases. Other sequence-based protein databases include the Yeast Proteome Database and Human PSD.
There are also a number of widely used pattern and profile databases that are used to reveal relationships among proteins based on the presence of particular groups of amino acids in the proteins' sequences. Such groups, known as patterns, motifs, domains, signatures, or fingerprints, are found in specific regions of proteins that are important to some function of the protein. They could be in an area that performs some type of enzymatic activity or that is the site of a certain post-translational modification. Both their sequence and structure are typically well conserved. Some of the best known pattern and profile databases are: PROSITE, Pfam, PRINTS, and BLOCKS.
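As an illustration of how such a pattern can be searched for, the sketch below converts a pattern written in PROSITE-style syntax into a regular expression and scans a sequence with it. It handles only the basic syntax elements (x for any residue, square brackets for allowed residues, curly braces for excluded residues, parentheses for repeat counts, and the < and > terminal anchors), and the function names and test sequence are hypothetical. The example pattern, N-{P}-[ST]-{P}, is the N-glycosylation site motif.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # exclusions
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # termini
        parts.append(element)
    return "".join(parts)


def find_motifs(sequence: str, pattern: str):
    """Return (0-based position, matched text) for every, possibly
    overlapping, occurrence of the pattern in the sequence."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))        # N[^P][ST][^P]
print(find_motifs("AANGSAA", "N-{P}-[ST]-{P}"))  # [(2, 'NGSA')]
```

The lookahead wrapper is a design choice that lets overlapping sites be reported, which a plain left-to-right search would skip.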
see also Alternative Splicing; Bioinformatics; Gel Electrophoresis; Genome; Human Genome Project; Mass Spectrometry; Post-translational Control; Proteins.
Anthony J. Recupero
Wilkins, Marc R., et al. eds. Proteome Research: New Frontiers in Functional Genomics. New York: Springer-Verlag, 1997.
"Proteomics." Genetics. Encyclopedia.com. (November 20, 2017). http://www.encyclopedia.com/medicine/medical-magazines/proteomics
Proteomics is a discipline of microbiology and molecular biology that has arisen from the gene sequencing efforts that culminated in the sequencing of the human genome in the last years of the twentieth century. In addition to the human genome, sequences of disease-causing bacteria are being deduced. Although fundamental, knowledge of the sequence of nucleotides that comprise deoxyribonucleic acid reveals only a portion of the protein structure encoded by the DNA. Because proteins are an essential element of bacterial structure and function (e.g., role in causing infection), the knowledge of the three-dimensional structure and associations of proteins is vital. Proteomics is an approach to unravel the structure and function of proteins.
The word proteomics is derived from PROTEin complement to a genOME. Essentially, this is the spectrum of proteins that are produced from the template of an organism's genetic material under a given set of conditions. Proteomics compares the protein profiles of proteomes under different conditions in order to unravel biological processes.
The origin of proteomics dates back to the identification of the double-stranded structure of DNA by Watson and Crick in 1953. More recently, the development of the techniques of protein sequencing and gel electrophoresis in the 1960s and 1970s provided the technical means to probe protein structure. In 1986, the first protein sequence database was created (SWISS-PROT, located at the University of Geneva). By the mid-1990s, the concept of the proteome and the discipline of proteomics were well established. The power of proteomics was manifest in March 2000, when the complete proteome of a whole organism was published, that of the bacterium Mycoplasma genitalium.
Proteomics research often involves the comparison of the proteins produced by a bacterium (for example, Escherichia coli) grown at different temperatures, or in the presence of different food sources, or a population grown in the lab versus a population recovered from an infection. Escherichia coli responds to changing environments by altering the proteins it produces. However, the full extent of the various alterations and their molecular bases are largely unknown. Proteomics research essentially attempts to provide a molecular explanation for bacterial behavior.
Proteomics can be widely applied to research of diverse microbes. For example, the yeast Saccharomyces cerevisiae is being studied to reveal the proteins produced and their functional associations with one another.
The task of sorting out all the proteins that can be produced by a bacterium or yeast cell is formidable. Targeting of the research effort is essential. For example, the comparison of the protein profile of a bacterium obtained directly from an infection (in vivo) with populations of the same microbe grown under defined conditions in the lab (in vitro) could identify proteins that are unique to the infection. Some of these could become targets for diagnosis, therapy, or for prevention of the infection.
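At its simplest, the in vivo versus in vitro comparison described above is a set difference between two protein profiles. The sketch below assumes that each profile has already been reduced to a set of identified protein names; the names are placeholders, not real experimental results.

```python
def infection_specific(in_vivo: set, in_vitro: set) -> set:
    """Proteins detected in the infection sample but not in the lab culture."""
    return in_vivo - in_vitro


# Placeholder protein identifiers for illustration only.
in_vivo_profile = {"protein_A", "protein_B", "protein_C", "protein_D"}
in_vitro_profile = {"protein_A", "protein_B", "protein_E"}

candidates = infection_specific(in_vivo_profile, in_vitro_profile)
print(sorted(candidates))  # ['protein_C', 'protein_D']
```

A real comparison would also weigh differences in abundance, since a protein can be present in both samples at very different levels.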
The study of proteins is difficult. The amount of protein cannot be amplified as easily as can the amount of DNA, making the detection of minute amounts of protein challenging. The structure of proteins must be maintained, which can be difficult. For example, enzymes, heat, light, or the energy of mixing can break down some proteins.
With the advent of the so-called DNA chips, the expression of thousands of genes can be monitored simultaneously. But DNA is static. It exists and is either expressed or not. Moreover, the expression of a protein does not necessarily mean that the protein is active. Also, proteins can be modified after being produced. Proteins can adopt different shapes, which can determine different functions and levels of activity after they have been produced. These functions provide the structural and operational framework for the life of the bacterium. Proteomics represents the next step after gene expression analysis.
Proteomics utilizes various techniques to probe protein expression and structure. The migration of proteins can depend on their net charge and on the size of the protein molecule. When these migrations are in two dimensions, as in 2-D polyacrylamide gel electrophoresis, thousands of proteins can be distinguished in a single experiment. A technique called mass spectrometry analyzes a trait of proteins known as the mass-to-charge ratio, which essentially enables the sequence of amino acids comprising the protein to be determined. Techniques exist that detect modifications after protein manufacture, such as the addition of phosphate groups. Analogous to DNA chips, so-called protein microarrays have been developed. In these, a solid support holds various molecules (antibodies and receptors, as two examples) that will specifically bind protein. The binding pattern of proteins to the support can help determine what proteins are being made and when they are synthesized.
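The mass-to-charge ratio mentioned above has a simple arithmetic form when ions are charged by gaining protons, as is typical in positive-mode electrospray ionization. That ionization mode is an assumption of this sketch, as are the function names and the example mass.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons


def mass_to_charge(neutral_mass: float, charge: int) -> float:
    """m/z of a molecule that has picked up `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz: float, charge: int) -> float:
    """Recover the neutral mass from an observed m/z and a known charge."""
    return charge * (mz - PROTON_MASS)


# Hypothetical 10,000 Da protein observed in several charge states.
for z in (8, 9, 10):
    print(z, round(mass_to_charge(10000.0, z), 3))
```

Because one protein gives a series of peaks at different charge states, the same neutral mass can be recovered from each peak, which serves as an internal consistency check.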
Proteomics typically operates in tandem with bioinformatics, which is an integration of mathematical, statistical, and computational methods to unravel biological data. The vast amount of protein information emerging from a single experiment would be impossible to analyze by manual computation or analysis. Accordingly, comparison of the data with other databases and the use of computer modeling programs, such as those that calculate three-dimensional structures, are invaluable in proteomics.
The knowledge of protein expression and structure, and the potential changes in structure and function under different conditions, could allow the tailoring of treatment strategies. For example, in the lungs of those afflicted with cystic fibrosis, the bacterium Pseudomonas aeruginosa forms adherent populations on the surface of the lung tissue. These populations, known as biofilms, are enclosed in a glycocalyx that the bacteria produce, are very resistant to treatments, and directly and indirectly damage the lung tissue to a lethal extent. Presently, it is known that the bacteria change their genetic expression as they become more firmly associated with the surface. Through proteomics, more details of the proteins involved in the initial approach to the surface and the subsequent, irreversible surface adhesion could be revealed. Once the targets are known, it is conceivable that they can be blocked. Thus, biofilms would not form and the bacteria could be more expeditiously eliminated from the lungs.
See also Biotechnology; Molecular biology and molecular genetics
"Proteomics." World of Microbiology and Immunology. Encyclopedia.com. (November 20, 2017). http://www.encyclopedia.com/science/encyclopedias-almanacs-transcripts-and-maps/proteomics
"proteomics." A Dictionary of Biology. Encyclopedia.com. (November 20, 2017). http://www.encyclopedia.com/science/dictionaries-thesauruses-pictures-and-press-releases/proteomics