Gene discovery is the process of identifying genes that contribute to the development of a trait or phenotype. Researchers often seek the genes involved in specific diseases, but they also look for genes that contribute to many other traits.
Gene discovery begins with clearly defining a trait of interest and determining whether that trait has a genetic and/or environmental basis. Several approaches are used for this, including the sibling recurrence risk ratio, familial aggregation, and twin and adoption studies. The sibling recurrence risk ratio is the frequency of a disease among the siblings of an affected person, divided by the frequency of the disease in the general population. The greater the ratio, the stronger the genetic component of the disease.
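The ratio described above is simple arithmetic, and a minimal sketch may make it concrete. The function name and the counts below are hypothetical and chosen only for illustration; they do not describe any real disease.

```python
def sibling_recurrence_risk_ratio(affected_siblings, total_siblings, population_prevalence):
    """Risk to siblings of affected people divided by the population frequency."""
    if total_siblings <= 0:
        raise ValueError("total_siblings must be positive")
    if not 0 < population_prevalence <= 1:
        raise ValueError("population_prevalence must be in (0, 1]")
    sibling_risk = affected_siblings / total_siblings
    return sibling_risk / population_prevalence


# Hypothetical example: 30 of 1,000 siblings of patients are affected,
# and the disease occurs in 0.5 percent of the general population.
ratio = sibling_recurrence_risk_ratio(30, 1000, 0.005)
print(f"sibling recurrence risk ratio: {ratio:.1f}")
```

A ratio near 1 would mean siblings are at no greater risk than anyone else; larger values are consistent with a stronger genetic component, although a shared environment can also raise the ratio.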
A trait also is suspected of having a strong genetic component when familial aggregation, which is the clustering of patients in a single family, occurs. Familial aggregation can sometimes be misleading, however. Since families often share the same environment, it is difficult to know whether environmental or genetic factors are the cause of clustering.
In twin studies, concordance rates play a critical role. Concordance is the percentage of second twins that exhibit a trait when the trait occurs in the first twin. Twins generally also share environments, so concordance rates are often compared between monozygotic twins, who are genetically identical, and dizygotic or "fraternal" twins, who share on average 50 percent of their genetic material. The greater the difference in concordance rates between monozygotic and fraternal twins, the stronger the genetic contribution to the trait. The combination of evidence from all these approaches indicates whether a particular phenotype is likely to have a genetic basis.
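The comparison of concordance rates can be sketched in a few lines. This is an illustration under stated assumptions: each twin pair is recorded as two booleans, the twin counts are invented, and the function name is hypothetical.

```python
def concordance_rate(twin_pairs):
    """Fraction of second twins showing the trait, among pairs whose first twin shows it.

    twin_pairs is a list of (first_twin_affected, second_twin_affected) booleans.
    """
    index_pairs = [pair for pair in twin_pairs if pair[0]]
    if not index_pairs:
        raise ValueError("no pairs in which the first twin is affected")
    concordant = sum(1 for _, second in index_pairs if second)
    return concordant / len(index_pairs)


# Hypothetical data: 50 monozygotic and 50 dizygotic pairs, first twin affected in all.
mz_pairs = [(True, True)] * 40 + [(True, False)] * 10
dz_pairs = [(True, True)] * 15 + [(True, False)] * 35

mz_rate = concordance_rate(mz_pairs)
dz_rate = concordance_rate(dz_pairs)
difference = mz_rate - dz_rate
print(f"MZ concordance {mz_rate:.0%}, DZ concordance {dz_rate:.0%}, difference {difference:.0%}")
```

In this invented data set the monozygotic rate is well above the dizygotic rate, which is the pattern the text describes as pointing to a genetic contribution.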
Approaches for Identifying Genes
Once genetic influence has been established, two research approaches are commonly used to identify the specific genes involved. These are the candidate gene approach and the genomic screening approach.
Candidate Gene Approach.
In the candidate gene approach, genes are selected based on their known or predicted biological function and on their hypothesized relation to the disease or trait. These genes are subject to mutation analysis to determine whether they are really involved. The problem with the candidate gene approach is that it relies on assumptions about the molecular mechanisms underlying the development of a trait. However, diseases are usually studied because little is known about their causes, so initial ideas about these "molecular mechanisms" often prove to be wrong.
In addition, the candidate gene approach can be very time consuming, and it has been successful only infrequently. Chromosomal abnormalities, such as deletions, inversions, or translocations, in individuals exhibiting a trait, as well as animal models mimicking the trait, are especially important for a candidate gene approach, since they provide clues to the genetic basis of the trait.
Genomic Screen Approach.
A genomic screen is a systematic survey in which polymorphic DNA markers, evenly spaced along all the chromosomes, are used to determine whether a marker is inherited along with the trait, which would indicate genetic linkage. This is done by taking DNA from each individual in the study and identifying which form of each marker that individual carries on his or her chromosomes. These data are then analyzed using statistical programs to see whether the marker and the trait being studied travel together through families significantly more often than would be expected by chance. If a DNA marker is found to be linked with a trait, it suggests that the marker and the gene responsible for the trait are rarely separated by crossing over and are therefore near each other on a chromosome. Further fine mapping of this region with more closely spaced markers can narrow the region where the gene of interest lies.
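The text does not name the statistic used by these programs. One standard choice in linkage analysis is the LOD score, the base-10 logarithm of the odds that the observed inheritance pattern arose under linkage at a recombination fraction theta rather than under free recombination (theta of 0.5). The sketch below assumes the simplest possible case, in which every meiosis can be scored unambiguously as recombinant or nonrecombinant; real pedigree analysis is considerably more involved. The function names and counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for fully informative, phase-known meioses (an assumption)."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    nonrecombinants = meioses - recombinants
    # log10 likelihood if marker and trait gene are linked at recombination fraction theta
    log_linked = recombinants * math.log10(theta) + nonrecombinants * math.log10(1 - theta)
    # log10 likelihood if they assort independently (theta = 0.5)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def best_lod(recombinants, meioses, steps=50):
    """Scan theta over an even grid on (0, 0.5] and return (maximum LOD, theta)."""
    candidates = [0.5 * i / steps for i in range(1, steps + 1)]
    return max((lod_score(recombinants, meioses, t), t) for t in candidates)


# Hypothetical family data: 2 recombinants observed among 20 scored meioses.
lod, theta = best_lod(2, 20)
print(f"maximum LOD {lod:.2f} at theta = {theta:.2f}")
```

A large positive score means the marker and trait were inherited together far more often than chance would predict, which is the signal a genomic screen looks for at each marker.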
Genomic screening does not require prior biological understanding of the pathophysiology of a disease. It requires large sets of data from families containing multiple members who are affected with the trait, and it tends to be the more expensive of the two approaches.
Genomic screening usually leads to the identification of one or several loci, or relatively large areas in the genome, that are linked with a trait but that contain many different genes. The genes in such regions need to be prioritized.
Genes are considered to be good candidates when their putative functions fit with the known or predicted pathway of the disease. If any known gene in the linkage region appears to be a good candidate gene, it is subjected to mutation analysis to determine if there is a potentially disease-causing mutation that segregates only with the affected individuals.
Both the candidate gene approach and the genomic screen approach require collecting data from a large number of families. Recruiting and medically evaluating affected and unaffected individuals for participation in a genetic study, and collecting their DNA samples, is a long and complicated phase of gene discovery.
Once one or more loci have been identified through a genomic screen as possibly containing a gene of interest, additional techniques are needed to locate the exact gene responsible. Positional cloning is the process of identifying a disease gene based on its location on a chromosome. Genomic screening is the initial step of positional cloning. The usual steps are: (1) linkage (locates a chromosome area); (2) fine mapping (narrows the initial genomic area to a smaller region); (3) candidate gene analysis (looks for mutations in genes lying in that small area). By using positional cloning, researchers can identify or "clone" a gene knowing only its location on a chromosome, as determined through linkage analysis. Once the location is identified, a physical representation of the genes and DNA in the linked region is constructed.
Before the Human Genome Project was completed, such a physical representation was constructed by using a "contig," a group of overlapping DNA fragments that together cover the linked region. The contig is a scaffold or platform on which to place genes and other sequences in the correct position. The geneticist continues to collect families looking for new recombinations that reduce the piece of DNA that all the affected family members share but that has not been inherited by any of the unaffected individuals.
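The narrowing step described above can be pictured as an intersection of intervals: each affected person carries some stretch of the disease-linked chromosome, and the gene must lie in the part that all of them share. The sketch below is a simplified illustration only. It assumes each segment is already known as a (start, end) position, ignores the information contributed by unaffected relatives, and uses a hypothetical function name and invented coordinates.

```python
def smallest_shared_region(segments):
    """Return the (start, end) interval common to every segment, or None if there is none.

    segments is a list of (start, end) positions, one per affected individual,
    giving the stretch of the linked chromosome that the individual inherited.
    """
    if not segments:
        raise ValueError("at least one segment is required")
    start = max(seg_start for seg_start, _ in segments)
    end = min(seg_end for _, seg_end in segments)
    if start >= end:
        return None  # no interval is shared by everyone
    return (start, end)


# Hypothetical positions in megabases along one chromosome.
affected_segments = [(10.0, 25.0), (12.5, 30.0), (8.0, 21.0)]
print(smallest_shared_region(affected_segments))

# A newly recruited affected relative with a recombination at 18.0 narrows the region.
affected_segments.append((12.0, 18.0))
print(smallest_shared_region(affected_segments))
```

Each new recombination can only keep the shared interval the same or shrink it, which is why collecting more families helps reduce the number of genes left to test.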
Positional cloning is a laborious process if the region is large and if the genes and polymorphisms making up the contig are not known. This portion of the process has been greatly helped by the Human Genome Project, as it provides the contig all filled out and correctly mapped.
In most cases, the smallest inherited piece of DNA is still quite large molecularly, and can contain many genes. Only one gene causes the disease, though, so each gene must be tested for mutations that segregate with the trait. This can be very time consuming as well. If no mutations are found, the process is repeated with the next gene in the smallest shared region of DNA.
The third gene identified through positional cloning was the cystic fibrosis transmembrane conductance regulator gene. The gene, identified in 1989, regulates chloride ion transport across the plasma membrane and consists of twenty-seven exons spread over a 230-kilobase-pair region on chromosome seven. This gene was difficult to clone because of the lack of chromosome abnormalities that would have helped locate it. No human genome sequence was available at that time. Numerous technical problems also arose in constructing a physical map of the linkage region, creating the need to screen numerous libraries to obtain a full-length clone of the gene. More than 550 mutations were identified by late 2001.
The sequencing of the human genome and the advancement of bioinformatics and molecular tools to identify genes have made the process of gene discovery much easier. Most of the human genome sequence is now available on the Internet, with known and predicted genes annotated. This has reduced the need to laboriously build contigs.
Many of the common single-gene diseases have already been associated with a gene, and research efforts are now shifting to the more difficult task of finding genes that give susceptibility to developing complex diseases. (Such diseases show familial aggregation but do not follow any clear Mendelian inheritance pattern.)
It is thought that several genetic predisposition factors interact with environmental factors to trigger complex diseases. Because multiple genes, rather than a single gene, contribute to such traits, identifying each of these genes is correspondingly more difficult.
Complex traits pose various other challenges for researchers. Genetic heterogeneity occurs when alleles at more than one locus trigger the same phenotype, or when mutations in the same gene cause different phenotypes. Reduced penetrance occurs when a predisposing genotype does not necessarily cause the phenotype to manifest itself. A phenocopy is a trait that looks identical to the one being studied but has a different cause.
To address these challenges, scientists use association studies, which are based on the principle that if a particular allele and trait occur simultaneously at a statistically significant frequency, the allele is likely to be involved in the development of the trait. (Linkage studies, by contrast, are based on finding DNA markers and traits that are linked within families.)
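The text does not specify how "statistically significant" is assessed. One common and simple option is a chi-square test on a two-by-two table of allele counts in affected people (cases) and unaffected people (controls). The sketch below uses the Pearson chi-square statistic with one degree of freedom and no continuity correction; the function name and the counts are hypothetical.

```python
import math


def allelic_chi_square(case_allele, case_other, control_allele, control_other):
    """Pearson chi-square (1 degree of freedom) for allele counts in cases versus controls.

    Returns (chi_square, p_value).
    """
    table = [[case_allele, case_other], [control_allele, control_other]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if min(row_totals) <= 0 or min(col_totals) <= 0:
        raise ValueError("every row and column of the table must have a positive total")
    total = sum(row_totals)
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi_square += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical counts: the allele appears on 120 of 400 case chromosomes
# and on 80 of 400 control chromosomes.
chi_square, p_value = allelic_chi_square(120, 280, 80, 320)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

A small p-value indicates that the allele is more common in cases than chance alone would explain. It does not by itself show that the allele causes the trait, since the allele may simply lie near the responsible variant.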
Alzheimer's disease is one example of a complex trait. Three genes have been found to contribute to the rare, early-onset forms of the disease. Genomic screens have found that a fourth locus is linked to the common, late-onset form of the disease. Association studies have revealed that one allele at this fourth locus increases a person's risk of developing Alzheimer's in a dose-dependent fashion: the risk posed by carrying two copies of the allele is greater than the risk posed by carrying just one.
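A dose-dependent effect of this kind can be illustrated by comparing odds ratios for carriers of one and two copies of a risk allele against people who carry none. The counts below are invented purely for illustration and are not Alzheimer's disease data; the function name is hypothetical.

```python
def odds_ratio(cases_exposed, controls_exposed, cases_reference, controls_reference):
    """Odds of disease in an exposed group divided by the odds in a reference group."""
    if controls_exposed <= 0 or cases_reference <= 0:
        raise ValueError("controls_exposed and cases_reference must be positive")
    return (cases_exposed * controls_reference) / (controls_exposed * cases_reference)


# Hypothetical study: copies of the risk allele -> (number of cases, number of controls)
counts = {0: (100, 300), 1: (150, 150), 2: (50, 15)}

reference_cases, reference_controls = counts[0]
or_one_copy = odds_ratio(*counts[1], reference_cases, reference_controls)
or_two_copies = odds_ratio(*counts[2], reference_cases, reference_controls)
print(f"one copy: odds ratio {or_one_copy:.1f}; two copies: odds ratio {or_two_copies:.1f}")
```

In this made-up example the odds ratio rises from one copy to two, which is the pattern meant by a dose-dependent risk.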
see also Alzheimer's Disease; Bioinformatics; Cloning Genes; Complex Traits; Cystic Fibrosis; DNA Libraries; Human Genome Project; Internet; Linkage and Recombination; Mendelian Genetics; Twins.
Sofia A. Oliveria and Jeffery M. Vance