DNA libraries, like conventional libraries, are used to collect and store information. In DNA libraries, the information is stored as a set of DNA molecules, each of which contains biological sequences that can be used for a variety of applications. All DNA libraries are collections of DNA fragments that represent a particular biological system of interest. By analyzing the DNA from a particular organism or tissue, researchers can answer a variety of important questions. The two most common uses for these DNA collections are DNA sequencing and gene cloning.
The Importance of Vectors
Several types of DNA libraries have been developed for specific purposes, but all share some common features. The DNA fragments that make up the library are attached to other DNA sequences that are used as "handles" to maintain the fragments. These "handles," called vectors, allow the DNA to be replicated and stored, typically within model organisms such as yeast or bacteria.
Different types of vectors can be used to store DNA fragments of different lengths. For example, plasmid vectors can store small fragments (from a few hundred bases up to ten or twenty thousand bases of sequence), while viral vectors, or viral-plasmid hybrids such as cosmids, can store up to fifty thousand bases, and yeast artificial chromosome (YAC) vectors can store hundreds of thousands of bases. In general, plasmid-based vectors are the easiest to manipulate but store the smallest fragments. They are commonly used for applications that involve complex manipulations, such as cloning or gene expression, but that require only small DNA fragments (e.g., cDNA libraries, as described below).
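The size ranges above can be read as a simple lookup from fragment length to vector class. The following is a minimal sketch, not a laboratory rule: the function name and table are hypothetical, the plasmid and cosmid limits are the approximate figures quoted above, and the one-million-base ceiling for YACs is an assumption standing in for "hundreds of thousands of bases."

```python
# Approximate upper insert sizes in base pairs, taken from the ranges quoted
# in the text. Real capacities vary from vector to vector.
VECTOR_CAPACITY_BP = [
    ("plasmid", 20_000),
    ("cosmid", 50_000),
    ("YAC", 1_000_000),  # assumed ceiling; the text says only "hundreds of thousands"
]


def smallest_suitable_vector(insert_bp):
    """Return the smallest-capacity vector class that can hold the insert.

    Returns None when the insert is larger than every listed capacity.
    """
    if insert_bp <= 0:
        raise ValueError("insert size must be positive")
    for name, capacity in VECTOR_CAPACITY_BP:
        if insert_bp <= capacity:
            return name
    return None


if __name__ == "__main__":
    for size in (1_500, 40_000, 150_000):
        print(size, smallest_suitable_vector(size))
```

Choosing the smallest class that fits reflects the trade-off described above: smaller vectors are easier to manipulate, so a larger one is worth using only when the fragment demands it.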
Artificial Chromosome Vectors and Genomic Libraries
Yeast artificial chromosome vectors act like real chromosomes in yeast and can store much longer DNA fragments, some over 150 kilobases in size, big enough for several genes along with their regulatory sequences. However, YAC vectors are difficult to manipulate, are prone to spontaneous rearrangement, and have been supplanted by bacterial artificial chromosome (BAC) vectors.
BAC vectors are derived from the F plasmid of Escherichia coli. This plasmid behaves like a chromosome and not like a typical plasmid. BACs can store very large DNA fragments, in excess of three hundred kilobases in some cases, although typical fragments are about half that size. The unique features of BAC vectors are very well suited to creating and maintaining DNA libraries. For example, once a BAC vector enters a cell, it will exclude all other BAC vectors, which means that a given E. coli clone will contain only one unique library fragment. Furthermore, E. coli cells are relatively easy to grow and store, and DNA purification from the bacterium is straightforward. BAC libraries played a key role in the massive sequencing efforts that made up the Human Genome Project.
Many of the large-format DNA libraries (YACs, BACs) are used exclusively to store genomic DNA for sequencing projects. Larger fragments permit easier assembly of finished DNA sequence and require the maintenance of fewer clones, which is particularly important when sequencing large genomes. DNA sequencing, however, is only one application of DNA libraries.
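The link between fragment size and clone number can be made concrete with a standard estimate that is not part of the original discussion, usually attributed to Clarke and Carbon: N = ln(1 - P) / ln(1 - f), where f is the fragment length divided by the genome length and P is the desired probability that any given sequence appears in the library. The sketch below assumes randomly generated fragments of equal size; the function name and the round genome size are illustrative assumptions.

```python
import math


def clones_needed(genome_bp, insert_bp, probability=0.99):
    """Estimate how many clones give `probability` of containing a given sequence.

    Uses N = ln(1 - P) / ln(1 - f) with f = insert_bp / genome_bp, which
    assumes random, equal-size fragments. Real libraries deviate from this.
    """
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    if not 0 < insert_bp < genome_bp:
        raise ValueError("insert must be positive and smaller than the genome")
    fraction = insert_bp / genome_bp
    # log1p(-x) computes ln(1 - x) accurately when x is very small.
    return math.ceil(math.log(1 - probability) / math.log1p(-fraction))


if __name__ == "__main__":
    genome = 3_000_000_000  # assumed round figure for a large genome
    for insert in (10_000, 150_000):
        print(insert, clones_needed(genome, insert))
```

With an assumed genome of about three billion bases, 150-kilobase inserts call for roughly 92,000 clones at 99 percent probability, while 10-kilobase inserts call for well over a million.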
DNA libraries are created by generating a set of DNA fragments of the desired size and then attaching those fragments to the appropriate vector sequence. For genomic DNA, the fragments are normally generated by either enzymatic digestion or simple mechanical shearing of all the DNA of the genome, including noncoding sequences. Fragments are then enzymatically attached to the vector sequences, in a reaction known as ligation. The collected fragments, now attached to vector sequences, are then moved into the appropriate host organism for growth and evaluation. Conditions are chosen so that only one fragment enters each organism, which can then be grown up into a colony whose individuals all carry the same fragment.
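The enzymatic digestion step can be sketched in a few lines. This is a simplified illustration under stated assumptions: it models only one strand of a linear molecule and uses the restriction enzyme EcoRI, which recognizes GAATTC and cuts after the G, as the default. The function name and the test sequence are hypothetical, and actual library construction often relies on partial digestion or shearing to obtain overlapping fragments.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a linear DNA sequence at every occurrence of a recognition site.

    `cut_offset` is the number of bases into the site at which the strand
    is cut (1 for EcoRI, which cuts between G and AATTC). Only the top
    strand is modeled, so sticky-end overhangs are not represented.
    """
    sequence = sequence.upper()
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    # Drop empty pieces that arise when a cut falls at the very start or end.
    return [fragment for fragment in fragments if fragment]


if __name__ == "__main__":
    print(digest("AAGAATTCCCGAATTCTT"))
    # prints ['AAG', 'AATTCCCG', 'AATTCTT']
```

Joining the fragments in order restores the input sequence, which is a quick way to check that the digestion lost no bases.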
Complementary DNA Libraries
Genomic DNA is not always the source of the fragments in a DNA library. A second major class of libraries uses complementary DNA (cDNA), which is generated by copying the messenger RNA (mRNA) from an organism or tissue of interest. Because it reflects the mRNA content of a biological system (or cell type) at a particular time and under particular conditions, a cDNA library can be considered a "snapshot" of gene expression in that system. This information can be of great value in understanding when and how certain genes are expressed in an organism or cell type. Additionally, cDNA, unlike genomic DNA, lacks introns and other noncoding segments of sequence and is relatively straightforward to clone and express. This greatly facilitates the analysis of gene products (proteins) in eukaryotes.
Creating a cDNA library is similar to creating a genomic DNA library, except that the starting material for cDNA libraries is mRNA, not DNA. The enzyme reverse transcriptase is used to copy the mRNA to DNA. The DNA fragments are then cloned into vectors (typically plasmids) by ligation and moved into a host organism, as with genomic libraries.
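At the sequence level, the copying step performed by reverse transcriptase can be sketched as follows. The first DNA strand is complementary to the mRNA template, so when written in the conventional 5' to 3' direction it is the reverse complement of the mRNA, with T in place of U. The function name and the six-base example are hypothetical, and the sketch ignores priming, second-strand synthesis, and incomplete copies.

```python
# Base-pairing rules for an RNA template copied into DNA.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return the first-strand cDNA (5' to 3') for an mRNA given 5' to 3'."""
    mrna = mrna.upper()
    try:
        # Reading the template backward yields the new strand 5' to 3'.
        return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))
    except KeyError as exc:
        raise ValueError(f"unexpected RNA base: {exc.args[0]!r}") from None


if __name__ == "__main__":
    print(first_strand_cdna("AUGGCC"))
    # prints GGCCAT
```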
Often, cDNA libraries are constructed using plasmid vectors with sequences that allow the cloned cDNA fragments to be expressed as proteins. Such "expression libraries" can be searched with protein-finding tools such as antibodies, and then the gene coding for the protein can be isolated. cDNA libraries are also used for expressed sequence tag (EST) analysis, in which small portions of many cDNAs are sequenced to provide an overview of gene expression in a particular sample.
DNA libraries play important roles in modern molecular biology research. The many genome-sequencing projects that are revolutionizing our understanding of genetics are entirely dependent on genomic DNA library techniques. cDNA libraries are invaluable in the study of gene expression and protein function, and for EST analysis. Continued progress in the development of library techniques and a continued interest in their applications suggest that these tools will remain an important part of the field for years to come.
See also Chromosomes, Artificial; Cloning Genes; Model Organisms; Plasmid; Polymerase Chain Reaction; Restriction Enzymes; Reverse Transcriptase.
Daniel J. Tomso
Bloom, Mark V., Greg A. Freyer, and David A. Micklos. Laboratory DNA Science: An Introduction to Recombinant DNA Techniques and Methods of Genome Analysis. Menlo Park, CA: Addison-Wesley, 1996.