A genome is the complete collection of hereditary information for an individual organism. In cellular life forms, the hereditary information exists as DNA. There are two fundamentally distinct types of cells in the living world, prokaryotic and eukaryotic, and the organization of genomes differs in these two types of cells.
Prokaryotes comprise the bacteria and archaea. The latter were originally designated "extremophiles" because they favor such extreme environments as high acidity, salinity, or temperature. Prokaryotic cells tend to be very small, have few or no cytoplasmic organelles, and have the cellular DNA arranged in a "nucleoid region" that is not separated from the remainder of the cell by any membrane. Eukaryotes exist as unicellular or multicellular organisms. Among the unicellular eukaryotes are the protozoa, some types of algae, and a few forms of fungi, while the multicellular organisms include animals, plants, and most fungi.
Eukaryotic cells are larger than prokaryotic cells, have a complex array of cytoplasmic structures, and have a prominent nucleus that communicates with components in the cytoplasm through an elaborate nuclear envelope. The hereditary information occurs principally in the nucleus of eukaryotic cells; in addition, minuscule (but essential) amounts of hereditary information occur in some cytoplasmic organelles (specifically, in chloroplasts for plants and algae, and in mitochondria for all eukaryotic groups).
Eukaryotic cells pass through a "cycle," progressing from a newly formed cell to a cell that is dividing to produce the next generation of progeny cells. Prior to division, the cell is in an "interphase"; during division, the cell is in a "division phase." During interphase, the nuclear DNA is organized in a dispersed network of chromatin, which is a complex consisting of nucleic acid and basic proteins. Immediately prior to and during division, the chromatin condenses to a series of discrete, compact structures called chromosomes. Thus, the physical organization of the genome varies from interphase to division phase. Finally, viruses (which are noncellular, parasitic "life forms") have genomes of double-stranded DNA, single-stranded DNA, double-stranded RNA, or single-stranded RNA.
In sexually reproducing eukaryotes, progeny organisms receive a portion of their genetic information from each parent, receiving half the information from each. These parental contributions are designated haploid complements. The haploid complement can be represented as a "C value," which expresses the haploid complement as an amount of DNA measured in base pairs. Alternatively, the haploid complement can be expressed as the number of chromosomes contributed by each parent: This number of chromosomes is characteristic of each species. Finally, the haploid complement can be expressed as the number of genes on the haploid set of chromosomes.
Each species has a characteristic number of chromosomes. For species with genetically determined sexes, the haploid set is composed of autosomes plus a sex chromosome. Homo sapiens, for example, have 22 autosomes plus an X chromosome or Y chromosome. The haploid DNA content of chimpanzees is nearly identical, but is organized into 23 autosomes plus a sex chromosome.
The record for minimum number of chromosomes belongs to a sub-species of the ant, Myrmecia pilosula. The females have a single pair of chromosomes, while males have only a single chromosome. Like some other members of the insect class, these ants reproduce by a process called haplodiploidy, in which diploid fertilized eggs develop into females, while haploid unfertilized eggs develop into males.
The record for maximum number of chromosomes is found in the plant kingdom, due to a condition known as polyploidy. In polyploidy, many extra sets of chromosomes beyond the normal diploid number may accumulate over time. Cultivars of wheat exist with diploid numbers of chromosomes equaling 14, 28, or 42 (multiples of the haploid number, which is 7). Polyploids exist for many cultivated plants, including potatoes, strawberries, and cotton, as well as in wild plants such as dandelions. Polyploidy has led to striking numbers, and the known record is held by the fern Ophioglossum reticulatum, which has approximately 630 pairs.
Genome Size or C Value
The C value is the amount of DNA in a haploid complement. Currently, the amount is reported as the total number of base pairs. Generally, more complex organisms have more DNA. For example, the haploid complement of Homo sapiens DNA contains between 3.12 and 3.2 gigabases (the prefix "giga" denotes billions), while the haploid complement of yeast (Saccharomyces cerevisiae) DNA contains 12,057,500 base pairs.
Unexpected genomic sizes occur, however, in a condition called the C value paradox. Two closely related species can have widely divergent amounts of DNA. For example, Paramecium caudatum has a C value of 8,600,000 kilobases (where the prefix "kilo" denotes thousands) while its near relative P. aurelia has a C value of just 190,000 kilobases. Another paradoxical circumstance occurs when a simpler organism has a C value higher than a more complex organism. For example, Amphiuma means (a salamander) and Amoeba dubia (an amoeba) have, respectively, C values that are 26 and 209 times the C value of humans.
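The comparisons in the C value paradox are simple ratios, and working them through can make the scale concrete. The following Python sketch uses only the C values quoted in the text; the human figure of 3.2 gigabases is taken from the upper end of the stated range, and the variable names are illustrative.

```python
# C values from the text, converted to base pairs.
KILOBASE = 1_000
GIGABASE = 1_000_000_000

p_caudatum_bp = 8_600_000 * KILOBASE
p_aurelia_bp = 190_000 * KILOBASE
human_bp = 3.2 * GIGABASE  # assumption: upper end of the 3.12-3.2 gigabase range

# Ratio between two closely related Paramecium species.
paramecium_ratio = p_caudatum_bp / p_aurelia_bp

# Absolute sizes implied by the multiples of the human C value given in the text.
amphiuma_bp = 26 * human_bp
amoeba_bp = 209 * human_bp

print(f"P. caudatum / P. aurelia: {paramecium_ratio:.1f}x")
print(f"Amphiuma means: about {amphiuma_bp / GIGABASE:.1f} gigabases")
print(f"Amoeba dubia: about {amoeba_bp / GIGABASE:.1f} gigabases")
```

On these figures the two Paramecium species differ by a factor of roughly 45, even though they are near relatives.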
Number of Nuclear Genes, "Gene Density," and Intergenic Sequences
An important trend in genome evolution has been the accumulation, both within the genes (intragenic) and between genes (intergenic), of DNA that does not code for any gene products. Homo sapiens have between 31,000 and 70,000 genes; mice have 24,780; Caenorhabditis elegans (a roundworm) has more than 19,099; fruit flies have 13,601; and yeast approximately 6,000. A ratio of gene number to C value indicates that lower organisms have both smaller genes and lower numbers of nongene base pairs between adjacent genes. Higher eukaryotes have a larger number of intragenic inserts (introns), greater intergenic distances, and more abundant repeated sequences.
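The ratio of gene number to C value mentioned above can be worked out directly. This is a minimal sketch that assumes the lower human gene estimate (31,000), the upper human C value (3.2 gigabases), and the yeast figures quoted in this article; the function name `genes_per_megabase` is illustrative.

```python
# Gene counts and haploid genome sizes quoted in the text.
genomes = {
    "Homo sapiens": {"genes": 31_000, "bp": 3_200_000_000},
    "Saccharomyces cerevisiae": {"genes": 6_000, "bp": 12_057_500},
}


def genes_per_megabase(genes, bp):
    """Return gene density as genes per million base pairs."""
    return genes / (bp / 1_000_000)


density = {name: genes_per_megabase(**g) for name, g in genomes.items()}
for name, value in density.items():
    print(f"{name}: {value:.1f} genes per megabase")
```

Under these assumptions yeast packs several hundred genes into each megabase while humans have fewer than ten, which is the pattern the paragraph describes.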
In higher eukaryotes, only a small portion of the genome is organized into genes. For example, in humans less than 2 percent of the genome specifies protein products. Another portion (about 20 percent in humans) is present as gene fragments, pseudogenes (sequences that resemble genes but are not expressed as proteins), and surrounding stretches of nucleotides. The vast majority of nucleotides (approximately 75 percent in humans) constitute extragenic sequences. Two forms of extragenic sequences are prominent: unique sequences and repetitive sequences.
For repetitive sequences, two types of organization occur: short tandem repeats (called satellite sequences) and widely distributed, interspersed repeats. Satellites are recurrent short sequences present in essential chromosomal structures such as centromeres and telomeres. Interspersed repeats are generated from transposons, which are nucleotide sequences that can replicate themselves and become distributed throughout the genome. An example of interspersed repeats that occurs in humans is a sequence of a few hundred nucleotides called Alu, which occurs approximately a million times. In higher plants, satellites and interspersed sequences constitute the bulk of the genome.
Ploidy reflects the reproductive mechanisms of an organism. Animals commonly have both a maternal and a paternal parent. Through meiosis, the former forms a haploid gamete called an ovum (or egg); the latter forms a haploid gamete called a sperm. During fertilization, the egg and sperm unite to form a diploid zygote that matures to an adult organism. Thus, the genome of adult animals is diploid, while the genome of their gametes is haploid.
Plants exhibit an alternation of generations: sporophytes (the mature, visible plant) are diploid; through meiosis, they produce spores that germinate into gametophytes; the gametophytes are haploid and produce gametes that fuse to reestablish the diploid state. Fungi also exhibit an alternation of generations. They commonly exist as multinucleate tubes of cytoplasm called hyphae. The individual nuclei are most often haploid (though they may be diploid in the lower fungi).
Hyphae of different members of a fungal species sometimes fuse; in this circumstance (called heterokaryosis) the genome becomes the sum of the two (dikaryotic) haploid complements. Unicellular protistan organisms, a group that includes protozoans and most algae, exhibit many variations. For example, the ciliates (such as paramecia) have diploid micronuclei and polyploid macronuclei; the former are the basis of inheritance; the latter establish the genetic character of an existing organism.
Mitochondrial and Chloroplast Genomes
Two cytoplasmic organelles responsible for the production of energy are the mitochondria (present in nearly all eukaryotic cells) and chloroplasts (present only in photosynthetic organisms). Both contain small, circular DNA molecules that constitute the nonnuclear portion of a eukaryotic genome. These organelles are descended from formerly free-living bacteria that took up residence in the first eukaryotes.
The human mitochondrial genome contains 16,569 base pairs specifying 13 protein products and 24 RNA products. In both lower eukaryotes and especially plants, larger mitochondrial genomes are present. In extreme cases, mitochondrial genomes may be several hundred thousand or millions of base pairs. Chloroplast genomes contain between 100 and 200 kilobases. It is thought that each was once larger, but over time their genes have been moved to the nucleus.
Prokaryotic genomes are composed of a chromosome plus various accessory elements. The former is most commonly a circular double-stranded DNA molecule but may be a linear molecule in some major groups, such as Streptomyces and Borrelia (the causative agent of Lyme disease). Accessory elements most prominently include plasmids (commonly circular but linear in Actinomycetes and some Proteobacteria) as well as insertion sequence (IS) elements, transposons, and prophages (derived from viruses). Other variations in chromosomal geometry exist: multiple circular chromosomes are found in some organisms; combinations of circular and linear chromosomes occur in others; and, in the extreme (observed in Streptomyces), circular and linear chromosomes can convert between those two topologies.
The smallest bacterial chromosome, with only 580 kilobase pairs (kbp), occurs in Mycoplasma genitalium, and the largest, with 9,200 kbp, occurs in Myxococcus xanthus. Representative sizes cluster between 2,000 and 5,000 kbp (e.g., Escherichia coli MG1655 has 4,639,221 bp). A typical bacterial gene contains approximately a thousand base pairs. M. genitalium has approximately 470 genes, while M. xanthus has more than 10,000, and E. coli has approximately 4,288.
By 2002 the nucleotide sequences of more than seventy-five prokaryotic chromosomes had been mapped. One goal of these sequencing projects is gene annotation: establishing the location, function, and allelic variation for each gene. In E. coli MG1655, for example, the positions of the 4,288 protein-coding genes have been identified; the average distance between genes is 118 base pairs; and the noncoding sequences (some of which may function as regulatory sites) constitute less than 11 percent of the genome. The function of approximately 40 percent of the genes, however, remains unknown. Notably, the chromosomal size and gene content of another isolate of E. coli, the pathogenic O157:H7 strain, are quite different. The O157:H7 chromosome is 20 percent larger, while MG1655 and O157:H7 share 4.1 million base pairs (Mbp). O157:H7 has 1.34 Mbp that are not found in MG1655, and MG1655 has 0.53 Mbp that are not found in O157:H7.
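The shared and strain-specific figures above can be recombined to check the size comparison, and the gene count and average intergenic distance give a rough estimate of the noncoding fraction. This is a back-of-the-envelope Python sketch using only the rounded numbers in the text; the assumption of one intergenic gap per gene is a simplification, and the variable names are illustrative.

```python
# Figures from the text, in millions of base pairs.
shared = 4.1
only_in_pathogenic = 1.34
only_in_mg1655 = 0.53

# Each chromosome is the shared portion plus its strain-specific portion.
mg1655_size = shared + only_in_mg1655
pathogenic_size = shared + only_in_pathogenic
percent_larger = (pathogenic_size / mg1655_size - 1) * 100

# Rough noncoding estimate for MG1655.
# Assumption: one intergenic gap of average length per gene.
genes = 4_288
mean_gap_bp = 118
intergenic_fraction = genes * mean_gap_bp / (mg1655_size * 1_000_000)

print(f"MG1655: {mg1655_size:.2f} million bp")
print(f"Pathogenic strain: {pathogenic_size:.2f} million bp")
print(f"Pathogenic strain is about {percent_larger:.0f} percent larger")
print(f"Intergenic fraction of MG1655: {intergenic_fraction:.1%}")
```

With these rounded inputs the pathogenic chromosome comes out somewhat under the quoted 20 percent larger, and the intergenic estimate falls just below the stated 11 percent.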
The genomes of closely related prokaryotes often have different organizations. These differences arise from rearrangements (such as inversions) between repeated elements, IS elements, and transposons and from the "horizontal transfer" of nucleotide sequences between cells. The latter phenomenon is mediated most commonly by conjugative plasmids, which are nonessential, autonomous accessory genetic elements that can acquire genes (such as antibiotic resistance genes) and then move them from a donor organism to a recipient. The dynamic character of genomic organization in prokaryotes is often designated as "genomic plasticity."
A series of repeated elements exist in the chromosomes of prokaryotes. In some instances the repeats are redundant copies of essential, long nucleotide sequences, as is seen in ribosomal RNA loci. Other repeats are small and have known functions (as in the Chi sequences in E. coli that facilitate genetic crossing over) or unknown functions (as in the REP [repeated extragenic palindromic] sequences in E. coli).
Viral genomes are composed of single-stranded or double-stranded DNA or RNA. Single-stranded RNAs are either positive (capable of being immediately translated into protein) or negative. Double-stranded RNA genomes are most often segmented, with each segment being a single gene, while the other genomes are single circular or linear molecules. The Retroviridae have single-stranded RNA genomes that are converted by an enzyme (reverse transcriptase) into double-stranded DNA that becomes incorporated into the genome of the host.
The smallest known virus, containing 5,386 bases, is a member of the Microviridae, which infects bacteria and is designated φX174. The largest viral genomes occur in Poxviridae, which can possess as many as 309 kbp.
Viruses are extraordinarily efficient in using the coding capacity of their genomes. The virus known as φX174 contains ten genes, and the end of one gene commonly overlaps with the beginning of the following gene. In addition, two smaller genes are nested within larger genes (this compaction being achieved by having the two genes expressed in alternate "reading frames"). As a consequence of this efficiency, only 36 bases are not translated into an amino acid sequence. At the opposite extreme, the various pox viruses share more than 100 similar genes and may have an equal number of unique genes.
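The idea of alternate reading frames can be illustrated by splitting one stretch of sequence into triplets starting at different offsets. The sketch below is in Python; the twelve-base sequence is a made-up example rather than an actual viral sequence, and the helper name `codons` is illustrative.

```python
def codons(sequence, frame):
    """Split a sequence into complete triplets, starting at offset frame (0, 1, or 2)."""
    return [sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3)]


# Hypothetical sequence, chosen only to show how the triplets shift.
sequence = "ATGGCTAAGTGA"

frames = {frame: codons(sequence, frame) for frame in range(3)}
for frame, triplets in frames.items():
    print(f"frame {frame}: {' '.join(triplets)}")
```

Because each frame groups the same bases into different triplets, one stretch of nucleotides can specify two unrelated amino acid sequences, which is how a small gene can be nested inside a larger one.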
see also Archaea; Cell, Eukaryotic; Cell Cycle; Conjugation; Eubacteria; Evolution of Genes; Gene; Genomics; Human Genome Project; Polymorphisms; Polyploidy; Reading Frame; Reverse Transcriptase; Transposable Genetic Elements; Virus.
Brown, T. A. Genomes. New York: Wiley-Liss, 1999.
Casjens, Sherwood. "The Diverse and Dynamic Structure of Bacterial Genomes." Annual Review of Genetics 32 (1998): 339-377.
Gould, Stephen J. "The Ant and the Plant." In Bully for Brontosaurus. New York: W. W. Norton, 1991.
The genome (sometimes spelled geneome) is, in the broadest use of the term, the full set of genes or genetic material carried by a particular organism representing a particular species or population. The size of a genome is usually measured in numbers of genes or base pairs.
With the success of the Human Genome Project and other international genome projects and programs, scientists had, by 2003, largely constructed genetic maps delineating the individual base sequences that constitute the human genome.
A genomic sequence is the actual order of the nitrogenous bases in the DNA nucleotide sequence that, with subtle alterations that create differing gene forms (alleles), makes up an organism's genetic material.
In humans, the genome comprises one representative of each of the chromosome pairs of the adult diploid parent. In this sense, a genome is a single set of genetic instructions.
Not all of the alleles within a genome are expressed; some are masked by the presence of dominant forms. The genomic formula is a mathematical expression of the number of subsets of genomes present in an individual cell or organism. One commonly encountered genomic formula designation is the haploid number (n), which represents a set with a single copy of each gene. This is sometimes called the basic number. The diploid form contains two sets of genes and is designated 2n; the triploid is 3n, and the tetraploid is 4n. Genetic abnormalities in which a chromosome is missing from the genome can be represented in the same manner. For example, a diploid organism with one chromosome missing is a monosomic cell and is represented by 2n-1. A diploid with two chromosomes missing is termed a nullisomic and is represented by 2n-2. Additions of chromosomes can also occur and are represented in the same form; for example, 2n+1 is trisomic.
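The genomic formula notation can be derived mechanically from a chromosome count and the basic number. The following Python sketch is a simplified illustration, not a cytogenetic standard: it assumes the count lies close to a whole multiple of the basic number, uses only the terms given above, and the function name `genomic_formula` is hypothetical.

```python
PLOIDY_NAMES = {1: "haploid", 2: "diploid", 3: "triploid", 4: "tetraploid"}
ANEUPLOIDY_NAMES = {-2: "nullisomic", -1: "monosomic", 1: "trisomic"}


def genomic_formula(chromosome_count, basic_number):
    """Return (formula, term) such as ("2n-1", "monosomic")."""
    # Nearest whole number of chromosome sets, at least one.
    sets = max(1, round(chromosome_count / basic_number))
    # Chromosomes gained (+) or lost (-) relative to complete sets.
    offset = chromosome_count - sets * basic_number

    formula = "n" if sets == 1 else f"{sets}n"
    if offset:
        formula += f"{offset:+d}"
        term = ANEUPLOIDY_NAMES.get(offset, "aneuploid")
    else:
        term = PLOIDY_NAMES.get(sets, "polyploid")
    return formula, term


# Human cells have a basic number of 23.
for count in (23, 46, 45, 44, 47, 69):
    print(count, genomic_formula(count, 23))
```

For a human cell with 45 chromosomes, for example, the function reports the formula 2n-1 and the term monosomic.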
In a report published in the February 2001 issue of the international scientific journal Nature, researchers reported findings indicating that the human genome consists of far fewer genes than previously projected from estimates of phylogenetic relationships between humans and other species. Since then, additional estimates have fixed the size of the human genome at about 30,000 genes. By contrast, some worms carry about 22,000 genes in their genome.
As of 2003, while work continues on the Human Genome Project, scientists are also beginning a "Genomes to Life" research program designed to identify and characterize the protein complexes important in animal (especially human) and microbial cell reactions and to further identify the specific genes that regulate these processes.
Genomic libraries have been used in human genetics as part of the Human Genome Project.
A genomic library is a comprehensive collection of cloned DNA fragments derived from a genome. Each part of the genome is represented in the library several times, and the number of times it is represented on average is called the coverage of a library. The library can be screened for the presence of the sequence of interest by radioactively labelling the DNA (usually between 100 and 500 nucleotides long) and using this as a probe to identify the clone that contains the selected sequence. The clone selected can then be grown in bacteria to produce large amounts of clone DNA, which can be studied. If, for example, the sequence of interest is part of a gene, using this sequence as a probe may isolate the clone containing the whole gene.
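Coverage as defined above is the total cloned DNA divided by the genome size. The Python sketch below computes it for a hypothetical library; the clone count and insert size are made-up values chosen for illustration. The second function uses a standard probability estimate (often attributed to Clarke and Carbon) that is not part of this article and is included only as an assumption-laden guide to how many clones a library might need.

```python
import math


def library_coverage(clone_count, insert_bp, genome_bp):
    """Average number of times each part of the genome is represented."""
    return clone_count * insert_bp / genome_bp


def clones_needed(probability, insert_bp, genome_bp):
    """Clones required so that a given sequence is present with the stated probability.

    Assumption: clones sample the genome uniformly and independently.
    """
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


genome_bp = 3_200_000_000  # human haploid genome, rounded
insert_bp = 150_000        # hypothetical insert size
coverage = library_coverage(100_000, insert_bp, genome_bp)
needed = clones_needed(0.99, insert_bp, genome_bp)

print(f"Coverage of a 100,000-clone library: {coverage:.2f}x")
print(f"Clones for a 99 percent chance of finding a sequence: {needed}")
```

The two numbers are related: a library with several-fold coverage is what makes it likely that any particular sequence of interest is present in at least one clone.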
The genetic material of an organism consists of deoxyribonucleic acid (DNA). A gene is a segment of DNA that encodes a protein (or a structural ribonucleic acid [RNA], for example, ribosomal RNA), along with the regulatory elements that control expression of that gene. The entire complement of DNA within the chromosomes of an organism is called the genome. The more complex organisms, that is, eukaryotes, contain much more DNA in their genomes than is found in genes. This nongene DNA has often been called "junk DNA," as scientists have yet to find a specific function for it. The junk DNA can amount to 90 to 99 percent of the total DNA in the cell nucleus.
Within the nucleus the DNA is part of the chromosomes. The number of chromosomes varies with species but is generally about twenty to forty pairs. However, there are exceptions: The round worm Ascaris megalocephala has but one pair of chromosomes, while the fern Ophioglossum reticulatum has six hundred thirty pairs. Humans have twenty-three pairs. In prokaryotes, such as bacteria, the DNA is found in a single chromosome, and this constitutes the bacterial genome.
The concept of a genome can be extended. Mitochondria, the cellular organelles found in all eukaryotes, as well as plastids such as the chloroplast found in plants, originally evolved from bacteria-like ancestors that took up residence within the primitive eukaryotic cell. These are called endosymbionts. Mitochondria and chloroplasts retain some of the genes of these ancestral endosymbionts, and one can then speak of the mitochondrial or chloroplast genome. In addition, many bacteria harbor plasmids, small circular pieces of DNA containing a few genes that form a plasmid genome.
Genomes do not have to consist of double-stranded DNA. Indeed, it is among the viruses that one finds a wide variety of genome forms. These genomes may be composed of double-stranded or single-stranded DNA. The DNA molecules may be linear or form a circle. Other viruses use RNA as their genetic material. These RNA genomes may be single-stranded or double-stranded. Viroids are another interesting group. Viroids are disease-causing entities in plants, such as the tomato stunt viroid or the avocado sunblotch viroid. Viroids resemble viruses, but unlike viruses they lack a protein coat and consist of a genome of only approximately 240 to 400 bases of RNA.
The study of genomes has been made possible by the development of automated DNA sequencers and high-powered computers that can overlap pieces of genome sequence to derive the entire DNA base sequence. This led to the development in the late 1990s of a new field of study called genomics. Genomics uses genome sequence data to identify genes, to predict the structure of gene products, to study the evolution of individual genes, or to examine the genetic relationships among species. With this technology, genome sequencing is progressing rapidly. The National Institutes of Health maintains a genome database (www.ncbi.nlm.nih.gov). As of May 2001, more than six hundred complete genomes have been deposited in the database. Most of these are viruses, along with four eukaryotes and almost fifty prokaryotes. Several hundred more partial sequences are also available. A first draft of the entire three-billion-plus bases of the human genome was completed in early 2000 and announced on June 26 of that year. The work is expected to be completed in 2003.
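The step in which computers overlap pieces of sequence to derive a longer one can be sketched in a few lines. The Python example below is a toy illustration under strong assumptions: the reads are made up, they are already in order, and they contain no errors. Real assembly software is far more elaborate, and the function names here are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of left that matches a prefix of right."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge(left, right, min_overlap=3):
    """Join two reads at their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


# Hypothetical reads, listed in the order they occur in the genome.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]

contig = reads[0]
for read in reads[1:]:
    merged = merge(contig, read)
    if merged is None:
        raise ValueError(f"read {read} does not overlap the contig")
    contig = merged

print(contig)
```

Each read extends the growing sequence by the bases that lie beyond its overlap, so three seven-base reads yield a single thirteen-base sequence.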
see also Cell Evolution; Chloroplast; Chromosome, Eukaryotic; DNA; DNA Viruses; Gene; Human Genome Project; Mitochondrion
Ralph R. Meyer
Singer, Maxine, and Paul Berg. Genes and Genomes. Mill Valley, CA: University Science Books, 1991.
An organism's genome is the complete set of genetic instructions, passed from one generation to the next. The genome consists of a set of instructions for building each of the components of a living cell or virus. The information is found in nucleic acids: usually deoxyribonucleic acid (DNA), but sometimes ribonucleic acid (RNA).
Genetic information is organized into units called genes, each of which provides instructions to build one cellular component. Genes are parts of large strands of DNA called chromosomes. Much of what we know about human chromosomes comes from the Human Genome Project, begun in 1990. In February 2001 David Baltimore of the International Human Genome Sequencing Consortium and J. Craig Venter of Celera Genomics separately announced initial results on the complete sequencing of human DNA. Scientists were able to determine the chemical code of chromosomes by applying two techniques: recombinant DNA, whereby millions of copies of human DNA were reproduced in bacteria, and polymerase chain reaction (PCR), which copies small sections of DNA for sequencing and can amplify a few strands of human DNA into more than a trillion strands, enough to detect in the sequencers. Together these methods produced enough DNA to enable determination of the genetic code.
Recombinant DNA also provides the basis for the construction of transgenic species such as Bt corn, which contains a gene from the bacterium Bacillus thuringiensis that ultimately helps to protect the corn plant from insects. The terms "biotechnology" and "genomics" are used to describe the application of these new techniques and new genetic information to produce new products. Examples of new biotech products include medical products such as cloned human insulin, human growth factor, and human clotting factors. These products are used to treat diseases caused by genetic errors. The new cloned materials are cheaper and safer than animal substitutes or isolated human materials. Another new biotech product is the herbicide-resistant soybean. These plants allow farmers to kill weeds in fields already planted with soybeans, without affecting the growth of soybeans.
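The PCR amplification mentioned above, from a few strands to more than a trillion, follows from repeated doubling. The Python sketch below assumes ideal conditions in which every cycle exactly doubles the number of copies, which real reactions only approximate; the function name `cycles_to_exceed` is illustrative.

```python
def cycles_to_exceed(target_copies, starting_copies=1):
    """Count doubling cycles needed for the copy number to exceed the target.

    Assumption: each cycle doubles every strand (idealized PCR).
    """
    copies = starting_copies
    cycles = 0
    while copies <= target_copies:
        copies *= 2
        cycles += 1
    return cycles


TRILLION = 10**12

from_one = cycles_to_exceed(TRILLION, starting_copies=1)
from_four = cycles_to_exceed(TRILLION, starting_copies=4)

print(f"Cycles from a single strand: {from_one}")
print(f"Cycles from four strands: {from_four}")
```

Under this idealized model, about forty doublings are enough to pass a trillion copies, which is why a small starting sample can yield enough DNA for sequencing.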
see also Agricultural Chemistry; Chromosome; Genes; Nucleic Acids; Polymerase Chain Reaction (PCR); Recombinant DNA.
Felsenfeld, Gary (1985). "DNA." Scientific American 253(4):58–67.
International Human Genome Sequencing Consortium (2001). "Initial Sequencing and Analysis of the Human Genome." Nature 409:860–921.
Venter, J. Craig, et al. (2001). "The Sequence of the Human Genome." Science 291: 1304–1351.
Weinberg, Robert (1985). "The Molecules of Life." Scientific American 253(4):48–57.
The genome is the full set of genes or genetic material carried by a particular organism. The size of a genome is usually measured in numbers of genes or base pairs (a base, or nucleotide, is the building block of the genetic material).
A genomic sequence is the actual order of the nitrogen-containing bases in the DNA nucleotide sequence. The sequence includes regions that encode a product. Each of these regions constitutes a gene. In eukaryotes (organisms whose genetic material is enclosed within a specialized membrane called the nuclear membrane), the genome is organized into chromosomes. In prokaryotes such as bacteria, where the genetic material is not enclosed in a nuclear membrane, the genome is not organized into chromosomes.
The size of the genome varies among different organisms. Some viruses, which utilize much of a host’s replication machinery to make new copies of themselves, have a genome that may contain fewer than a dozen genes. The human genome contains about 30,000 genes, perhaps a bit humbling when one considers that an earthworm’s genome contains approximately 23,000 genes.
With the explosion of knowledge of the genomic composition of a variety of life, comparison of genes is possible. A genomic library, a comprehensive collection of cloned DNA fragments derived from a genome, can be screened for the presence of a sequence of interest by radioactively labeling the DNA (usually between 100 and 500 nucleotides long) and using this as a probe to identify the clone that contains the selected sequence. The clone selected can then be grown in bacteria to produce large amounts of clone DNA, which can be studied. If, for example, the sequence of interest is part of a gene, using it as a probe may isolate the clone containing the whole gene. Such genomic comparison has revealed genetic similarities between a number of organisms.
ge·nome / ˈjēˌnōm/ • n. Biol. the haploid set of chromosomes in a gamete or microorganism, or in each cell of a multicellular organism. ∎ the complete set of genes or genetic material present in a cell or organism. DERIVATIVES: ge·no·mic / jēˈnämik; -ˈnō-; ji-/ adj.