DNA microarrays are tools used to analyze and measure the activity of genes. Researchers can use microarrays and other methods to measure changes in gene expression and thereby learn how cells respond to a disease or to some other challenge.
Humans have roughly 20,000 protein-coding genes (early estimates ranged from 30,000 to 70,000), each consisting of a sequence of bases, the building blocks of the hereditary material DNA. Before they can carry out their function, genes are copied to make messenger RNA (mRNA), in a process called transcription. This molecule is in turn used as a template for the synthesis of a protein molecule (translation). This entire process, including transcription of RNA and translation of protein, is referred to as gene expression. Only a subset of the full set of genes is expressed in a given tissue at a given time. In fact, this differential pattern of gene expression is ultimately what distinguishes lung tissue from skin, liver, and muscle tissue.
Even within a given tissue type, different genes are expressed at different times. For example, there is a very tightly controlled sequence of gene expression during the course of embryonic development. Tissues also respond to metabolic and other challenges. The pattern of gene expression changes in the liver in response to the consumption of a large meal. Similarly, muscle gene expression changes in response to vigorous exercise or injury. Drugs can also affect gene expression. Researchers can use microarrays and other methods to measure these changes in gene expression, and from them learn about how cells respond to disease or to other challenges.
Microarrays measure gene expression by taking advantage of the process of hybridization. DNA is made up of four bases: guanine, adenine, cytosine, and thymine, which are abbreviated G, A, C, and T, respectively. G and C can bind to one another, forming a base pair, as can A and T, but no other combinations of bases can form base pairs. G and C are said to be "complementary" bases, as are A and T.
The bases on each of the two strands of DNA that make up a chromosome are complementary to the bases on the opposite strand. Long pieces of DNA will not bind to each other (or "hybridize") unless they are complementary. Hybridization allows researchers to test whether two pieces of DNA are complementary. If they bind to one another (hybridize) then they are opposite strands of a single gene. If they do not bind to one another, then they are unrelated.
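The pairing rules above can be illustrated with a minimal Python sketch. It assumes perfect, full-length complementarity and relies on the fact that the two strands of a double helix run in opposite directions, so one strand is compared with the reversed complement of the other. Real hybridization tolerates some mismatches, which this sketch ignores; the function names and the example sequence are hypothetical.

```python
# Watson-Crick pairing rules: G pairs with C, and A pairs with T.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def can_hybridize(strand_a, strand_b):
    """True if the two strands are perfectly complementary over their full length."""
    return strand_b.upper() == reverse_complement(strand_a)


probe = "GATTACA"  # hypothetical probe sequence
print(reverse_complement(probe))        # TGTAATC
print(can_hybridize(probe, "TGTAATC"))  # True
print(can_hybridize(probe, "TGTAATG"))  # False
```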
Hybridization can be used to measure the levels of hundreds of different mRNAs within a given tissue, thereby providing a picture of gene expression within that tissue. RNA is isolated from the tissue of interest and allowed to hybridize to a solid support to which many different DNA pieces, from many different genes, have been attached. Because the RNA is labeled with a fluorescent tag, the amount bound to a given spot can be measured. The fluorescent intensity of each spot is a measure of the level of that mRNA that was expressed in the original tissue. In this way, the levels of expression of up to 12,000 different genes can be measured with a single microarray.
There are two basic types of microarrays. One type is created by a company called Affymetrix. Affymetrix manufactures silicon and glass chips that resemble semiconductor chips and that are manufactured using the same photolithographic techniques. These chips have sets of very short stretches of DNA (about 25 bases each) representing each gene. A second type of microarray is commonly called a printed array and is made by spotting small amounts of DNA on glass slides. These arrays frequently have smaller numbers of genes on each slide, but researchers can easily modify them for specific experiments.
Microarrays produce enormous amounts of data, and the analysis of that data can be quite complex. The sheer volume of data requires special software and a database in which to store both the measurements and the results of the analyses. The exact form that the analysis takes depends on the nature of the experiment being performed. If just two samples are being directly compared (for example, gene expression in mouse heart tissue is compared with and without the administration of a drug), relatively straightforward statistical tests can be performed. If larger numbers of samples are being measured, the same tests can be performed between two samples at a time, but more sophisticated, "clustering" analyses can be performed as well.
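As a rough illustration of the two-sample case, the Python sketch below compares replicate intensity measurements for each gene with and without a drug. The choice of a log2 fold change and Welch's t statistic is an assumption made for illustration, since the text does not name a specific test, and the gene names and intensity values are invented.

```python
import math
import statistics

# Hypothetical replicate intensities (arbitrary fluorescence units).
# Gene names and values are illustrative only, not real data.
untreated = {"gene_A": [400.0, 390.0, 410.0], "gene_B": [500.0, 510.0, 490.0]}
treated = {"gene_A": [800.0, 780.0, 820.0], "gene_B": [505.0, 495.0, 500.0]}


def log2_fold_change(case, control):
    """Log2 of the ratio of mean intensities; 1.0 means a twofold increase."""
    return math.log2(statistics.mean(case) / statistics.mean(control))


def welch_t(case, control):
    """Welch's t statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(
        statistics.variance(case) / len(case)
        + statistics.variance(control) / len(control)
    )
    return (statistics.mean(case) - statistics.mean(control)) / standard_error


for gene in untreated:
    change = log2_fold_change(treated[gene], untreated[gene])
    t_stat = welch_t(treated[gene], untreated[gene])
    print(f"{gene}: log2 fold change = {change:+.2f}, t = {t_stat:.1f}")
```

Turning the t statistic into a significance level requires the t distribution, and testing thousands of genes at once calls for a multiple-testing correction; both steps are omitted here.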
Clustering analysis identifies groups of genes that react the same way across several different samples. For example, researchers might analyze gene expression in heart tissue from a set of mouse embryos that range in age from five to fifteen days. A clustering analysis would be able to detect a group of genes whose expression levels all increase slowly from days five to nine, peak at day ten, and then fall to zero by day twelve. Only genes that share this pattern of expression would cluster together in this type of analysis.
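The idea of grouping genes by the shape of their expression profiles can be sketched in Python as follows. This is a deliberately simple greedy grouping based on Pearson correlation, standing in for the hierarchical or k-means methods that dedicated software would more likely use; the 0.9 threshold, the gene names, and the profiles (one value per day from day five to day fifteen) are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def cluster_by_profile(profiles, threshold=0.9):
    """Greedy grouping: a gene joins the first cluster whose seed profile it matches."""
    clusters = []  # each cluster is a list of gene names; the first name is the seed
    for gene, profile in profiles.items():
        for cluster in clusters:
            if pearson(profile, profiles[cluster[0]]) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])  # no existing cluster matched
    return clusters


# Hypothetical expression levels on embryonic days 5 through 15.
profiles = {
    "gene_A": [1, 2, 3, 4, 5, 9, 4, 0, 0, 0, 0],    # slow rise, peak at day 10
    "gene_B": [2, 4, 6, 8, 10, 18, 8, 0, 0, 0, 0],  # same shape, twice the level
    "gene_C": [0, 0, 0, 0, 1, 1, 2, 4, 6, 8, 9],    # rises late instead
}
print(cluster_by_profile(profiles))  # [['gene_A', 'gene_B'], ['gene_C']]
```

Correlation is used rather than raw distance so that genes with the same time course but different absolute levels fall into the same group.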
The Role of Bioinformatics
One of the tremendous difficulties in performing any kind of expression analysis is the manipulation of very large amounts of biological data, a field of study called bioinformatics. The usefulness of gene expression data depends on how much information is available for each identified gene. In other words, the identities of the genes associated with each spot on a microarray must be accessible as the analysis is done.
Descriptions and classifications of each gene on the array must be readily available, as no researcher can remember such details about the tens of thousands of genes that may be involved in the analysis. An analysis might be done many times, with slight changes in the parameters of the clustering algorithm each time. The genes that cluster together are examined at the end of each analysis, to look for reproducible patterns. This analysis must be done with the full understanding of the biology of the system being studied. Clusters of genes are most informative if they group in a biologically reasonable way. For this reason, microarray expression analysis is frequently exploratory. The results of the analysis are used to suggest additional, corroborative experiments.
Another bioinformatics challenge in gene expression studies is collecting information about the samples under analysis and storing the information in databases. If gene expression patterns of one hundred different tumor samples are being examined, it may be necessary to restrict the analysis to subgroups of the tumors in order to observe patterns in the data. This subgrouping or stratification of the samples is best performed on the basis of independently determined properties of those samples. For example, samples from only metastatic cancer cells could be grouped together for analysis and compared with those from nonmetastatic cancer cells, or the age of the patient at the onset of disease could be used to segregate the samples into different groups. Such subgroup analysis can only be done if complete information is collected and stored for all samples.
Applications of Microarray Analysis
Microarrays are new enough that their applications are still being developed. Microarray expression analysis can be used to help study complex, multigenic diseases such as Parkinson's disease (PD). The great challenge in understanding the genetics of such disorders is identifying susceptibility genes, which are genes that increase a person's risk of developing the disease. Frequently, the first step in discovering a susceptibility gene is linkage analysis. This technique can identify regions of a chromosome that harbor such a gene, but the regions that are identified are frequently very large, containing hundreds of genes. Screening through all of these genes individually is tremendously slow and labor-intensive. Expression analysis using microarrays can help prioritize these genes for further analysis by providing independent lines of evidence that specific genes are involved in the disease process.
Brain tissue can be collected through anatomical donations from patients with Parkinson's disease and from unaffected individuals, for example. Regions of the brain that are especially affected in Parkinson's patients can be compared to the same regions from unaffected individuals. Genes whose levels of expression vary can be identified. Hundreds or thousands of genes may be identified in this way, but they can then be compared to those that are found, through linkage analysis, to be linked to Parkinson's disease. There may be only tens of genes common to both groups. These genes can be prioritized for detailed examination through other methods. The key here is that expression analysis and linkage analysis provide independent evidence of a given gene's involvement in a disease process. It is the synthesis of information from these two independent lines of evidence that makes this approach powerful.
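Combining the two lines of evidence amounts to intersecting two gene lists, as in the short Python sketch below. The gene identifiers are placeholders, not real Parkinson's disease candidates, and the function name is hypothetical.

```python
def prioritize(differentially_expressed, linked_region_genes):
    """Return the genes supported by both expression and linkage evidence."""
    return sorted(set(differentially_expressed) & set(linked_region_genes))


# Placeholder identifiers; real lists would hold hundreds or thousands of genes.
expression_hits = {"GENE_02", "GENE_07", "GENE_11", "GENE_19", "GENE_23"}
linkage_hits = {"GENE_05", "GENE_07", "GENE_19", "GENE_31"}

print(prioritize(expression_hits, linkage_hits))  # ['GENE_07', 'GENE_19']
```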
Another very powerful application of microarray expression data is called classification analysis. This technique uses gene expression data to separate tissue samples into two or more groups. For example, one type of tumor may respond very well to an aggressive program of chemotherapy treatment, while another type may respond better to surgical removal followed by radiation therapy. Further, these two types of tumors may be difficult or impossible to tell apart under a microscope. Choosing the correct method of treatment and applying that treatment early in the course of disease could significantly improve a patient's chances of survival.
In such a case, expression analysis can be used to give a detailed picture of the genes that are expressed in the two types of tumor. A training set (a small set of samples in each category) can be used to find specific patterns of gene expression that are characteristic for each type of tumor. New tumors can then be analyzed, and their expression profiles can be used to predict the group to which they belong. These approaches have been used with great success to refine the clinical management of cancer patients. A 2001 study by S. M. Dhanasekaran and colleagues, "Delineation of Prognostic Biomarkers in Prostate Cancer," offers an example of this kind of work.
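The training-set approach can be sketched in Python with a nearest-centroid classifier. This is one simple method chosen for illustration and is not necessarily the one used in the studies cited here. Each profile is a short list of hypothetical expression values for the same three genes, and the tumor labels are invented.

```python
import math


def centroid(samples):
    """Average expression of each gene across the samples of one class."""
    return [sum(values) / len(samples) for values in zip(*samples)]


def train(training_set):
    """Compute one centroid per tumor type from the labeled training samples."""
    return {label: centroid(samples) for label, samples in training_set.items()}


def classify(profile, centroids):
    """Assign a new profile to the class with the closest centroid."""
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))


# Hypothetical training set: two samples per tumor type, three genes per sample.
training_set = {
    "type_1": [[9.0, 1.0, 5.0], [8.0, 2.0, 5.0]],
    "type_2": [[2.0, 8.0, 5.0], [1.0, 9.0, 5.0]],
}
centroids = train(training_set)

print(classify([7.5, 2.5, 4.0], centroids))  # type_1
print(classify([2.0, 7.0, 6.0], centroids))  # type_2
```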
Gene expression analysis can also be done using a powerful technique called serial analysis of gene expression (SAGE). Like microarrays, SAGE starts by isolating RNA from the tissue of interest. This RNA is then processed through a long series of steps resulting in the isolation of a set of very short sequences, called tags, from each transcript in the cell. These tags are converted into corresponding segments of DNA. These pieces of DNA, which are 14 base pairs long, are then linked together into long chains, and their sequence of bases is determined. Tens of thousands of these SAGE tags are sequenced from each tissue that is being studied. The tags corresponding to a given gene from one tissue are counted and compared to those from the same gene in another tissue.
For example, a colon cancer tumor sample might generate 50,000 SAGE tags, 33 of which correspond to a specific gene. A second library made from normal colon cells might also have 50,000 tags, 11 of which correspond to the same gene. This would indicate that the gene is expressed at a level three times as great in tumor cells as in normal cells.
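The arithmetic in this example can be written as a small Python sketch. Dividing each count by its library size lets libraries of different sizes be compared; scaling to tags per million is a common convention assumed here, and the function names are hypothetical.

```python
def tags_per_million(tag_count, library_size):
    """Scale a raw tag count to a library of one million tags."""
    return tag_count * 1_000_000 / library_size


def expression_ratio(count_a, size_a, count_b, size_b):
    """Ratio of a gene's tag frequency in library A to its frequency in library B."""
    return (count_a * size_b) / (count_b * size_a)


# Counts from the colon example: 33 of 50,000 tumor tags, 11 of 50,000 normal tags.
print(tags_per_million(33, 50_000))              # 660.0
print(tags_per_million(11, 50_000))              # 220.0
print(expression_ratio(33, 50_000, 11, 50_000))  # 3.0
```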
SAGE data is significantly more difficult and expensive to produce than microarray data, but it offers the advantage of providing very precise and quantitative measurements of expression levels. SAGE has the further advantage that it can detect genes that have not been previously characterized. Such unknown genes cannot be detected by microarrays, because researchers must first know their sequence before they can place them on the array. SAGE therefore can be used as a gene discovery tool.
SAGE has been used most extensively in cancer research. Investigators in the Cancer Genome Anatomy Project have created more than one hundred SAGE libraries from normal and cancerous tissue. Analysis of these libraries has revealed a great deal about the way that gene expression changes in cancerous tissue, which in turn has provided insight into new diagnostic and treatment options.
SAGE has also been used as a tool to help estimate the total number of genes in the human genome, as well as to describe the ways in which genes are regulated and processed at different times. Microarrays and SAGE analysis are only two of the many ways that scientists have examined gene expression. As these techniques become more refined, and as new techniques are developed, they will provide powerful tools to investigate how the incredible diversity and complexity of our tissues can arise, even though every cell in our bodies contains essentially the same set of genes.
See also Bioinformatics; Cancer; Complex Traits; Gene Discovery; In Situ Hybridization; Linkage and Recombination; Mapping.
Michael A. Hauser
Bloom, Mark V., Greg A. Freyer, and David A. Micklos. Laboratory DNA Science: An Introduction to Recombinant DNA Techniques and Methods of Genome Analysis. Menlo Park, CA: Addison-Wesley, 1996.
Dhanasekaran, S. M., et al. "Delineation of Prognostic Biomarkers in Prostate Cancer." Nature 412 (2001): 822-826.
Cancer Genome Anatomy Project. National Cancer Institute. <http://cgap.nci.nih.gov>.
DNA Chips and Microarrays
A DNA (deoxyribonucleic acid) chip is a solid support (typically glass or nylon) onto which single strands of DNA are fixed. The sequences are made synthetically and are arranged in a pattern that is referred to as an array. DNA chips are a means by which a large amount of DNA can be screened for the presence of target regions. Furthermore, samples can be compared to assess the effects of a treatment, environmental condition, or other factor on gene activity. One example of the use of a DNA microarray is screening for the development of a mutation in a gene. The original gene would be capable of binding to the synthetic DNA target, whereas the mutated gene would not bind. Such experiments have been exploited in the search for genetic determinants of antibiotic resistance, and in the manufacture of compounds to which the resistant microorganisms will be susceptible.
A gene chip is wafer-like in appearance and resembles a microtransistor chip. However, instead of transistors, a DNA chip contains an orderly and densely packed array of DNA species. Arrays are made by spotting DNA samples over the surface of the chip in a patterned manner. The spots can be applied by hand or with robotic automation. The latter can produce very small spots, which collectively are termed a microarray.
Each spot in an array is, in reality, a single-stranded piece of DNA. Depending upon the sequence of the tethered piece of DNA, a complementary region of sample DNA can specifically bind. The design of the array is dependent on the nature of the experiment.
The synthetic DNA is constructed so that known sequences are presented to whatever sample is subsequently applied to the chip. DNA or ribonucleic acid (typically messenger RNA) from the samples being examined is prepared for binding: double-stranded DNA is treated so as to separate the double helix into its two single strands, followed by enzymatic treatment that cuts the DNA into smaller pieces. The pieces are labeled with fluorescent dyes. For example, the DNA from one sample of bacteria could be tagged with a green fluorescent dye (a dye that will fluoresce green under illumination with a certain wavelength of light) and the DNA from a second sample of bacteria could be tagged with a red fluorescent dye (which will fluoresce red under illumination with an appropriate wavelength of light). Both sets of DNA are flooded over the chip. Where the sample DNA finds a complementary piece of synthetic DNA, binding will occur. Finally, the nature of the bound sample DNA is ascertained by illuminating the chip and observing the presence and pattern of green and red regions (usually dots).
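Reading such a two-color chip can be sketched in Python by comparing the red and green intensities at each spot. The log2 ratio, the twofold threshold, and the intensity values below are assumptions made for illustration. A spot bound equally by both samples appears yellow when the two signals are overlaid.

```python
import math


def log2_ratio(red, green):
    """Log2 of red over green intensity; 0 means both samples bound equally."""
    return math.log2(red / green)


def call_spot(red, green, threshold=1.0):
    """Label a spot by the dominant dye, using an assumed twofold cutoff."""
    ratio = log2_ratio(red, green)
    if ratio >= threshold:
        return "red"     # mostly DNA from the red-labeled sample
    if ratio <= -threshold:
        return "green"   # mostly DNA from the green-labeled sample
    return "yellow"      # comparable binding from both samples


# Hypothetical (red, green) intensities for three spots.
spots = {"spot_1": (4000.0, 1000.0), "spot_2": (500.0, 2000.0), "spot_3": (1500.0, 1500.0)}
for name, (red, green) in spots.items():
    print(name, call_spot(red, green))
# spot_1 red
# spot_2 green
# spot_3 yellow
```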
A microarray can also be used to determine the level of expression of a gene. For example, an array can be constructed such that the messenger RNA of a particular gene will bind to the target. Thus, the bound RNAs represent genes that were being actively transcribed, or had been transcribed recently. By monitoring gene expression, the response of microorganisms to a treatment or condition can be examined. As an example, messenger RNA from a bacterial species growing in suspension can be compared with that from the same species growing as a surface-adherent biofilm in order to probe the genetic nature of the alterations that occur in the bacteria upon association with a surface. Since the method detects nucleic acid sequences rather than the final products, the survey can be all-encompassing, assaying in the same experiment for changes in the genes responsible for protein, carbohydrate, lipid, and other constituents.
The rise of DNA chip technology is closely tied to the Human Genome Project. This effort began in 1990, with the goal of sequencing the complete human genome, and was projected to take 15 years. Yet a working draft was published in 2001, and the sequence was declared essentially complete in 2003. The project's rapid progress owed much to automated, high-throughput sequencing, and the sequence data it produced supplied the known gene sequences that gene chips require as probes.
Vast amounts of information are obtained from a single experiment. Up to 260,000 distinct DNA sequences can be probed on a single chip. The analysis of this information has spawned a new science called bioinformatics, where biology and computing mesh.
Gene chips are having a profound impact on research. Pharmaceutical companies are able to screen for gene-based drugs much faster than before. In the future, DNA chip technology will extend to the office of the family physician. For example, a patient with a sore throat could be tested with a single-use, disposable, inexpensive gene chip in order to identify the source of the infection and its antibiotic susceptibility profile. Therapy could commence sooner and would be precisely targeted to the causative infectious agent.
See also DNA (Deoxyribonucleic acid); DNA chips and micro arrays; DNA hybridization; Genetic identification of microorganisms; Laboratory techniques in immunology; Laboratory techniques in microbiology; Molecular biology and molecular genetics