Can computational chemistry provide reliable information about novel compounds?

Viewpoint: Yes, properly used, computational chemistry can be a reliable guide to the properties of novel compounds.

Viewpoint: No, computational chemistry is not a reliable guide to the properties of novel compounds.

In 1897, British physicist J. J. Thomson announced the discovery of the electron. Fourteen years later, New Zealander Ernest Rutherford, working at the University of Manchester, astounded the physics world by demonstrating that all the positive charge in the atom is concentrated in a tiny nucleus. Rutherford's discovery was unsettling because Newtonian mechanics and the more recently discovered laws of electricity and magnetism were utterly inconsistent with the existence of stable arrays of electrons either standing still or moving around a nucleus. Over the next 20 years, physicists, notably Bohr, Heisenberg, Schrödinger, Dirac, and Pauli, developed quantum mechanics, a comprehensive theory of motion that agrees with Newtonian mechanics for macroscopic objects but is required to describe the motion of subatomic particles. The theory allows for stable electron motion around a point-like nucleus and was confirmed by detailed agreement with experimental results.

Although modern chemistry texts almost invariably begin with a discussion of atomic structure and explain chemical phenomena in terms of the sharing or exchange of electrons, it would be a serious mistake to believe that modern chemistry began with the discovery of the electron. A list of chemical elements can be found in Lavoisier's treatise of 1789, and in 1869 Mendeleyev systematized the elements by their chemical properties in his periodic table. To late-nineteenth-century chemists, molecules were composed of atoms, thought of as small, hard balls held together by somewhat springy chemical bonds that had a definite orientation in space.

Although most chemists now accept that quantum mechanics provides an accurate description of the motions of the electrons and nuclei that underlie chemical phenomena, many chemists differ markedly on the extent to which quantum mechanics actually provides insight that can guide the progress of chemical research and invention. Although the quantum mechanical equations that govern a system of electrons and nuclei are not difficult to write, they are extraordinarily difficult to solve.

It is an unfortunate fact of mathematics that the equations governing the motion of more than two interacting particles, whether those of Newtonian or of quantum mechanics, cannot be solved exactly. In quantum mechanics, the system is described by a wave function that, according to the standard, or Copenhagen (after Bohr), interpretation, gives the probability of finding electrons and nuclei at any set of points in space. The possible wave functions for the system are the solutions of the Schrödinger equation for the system, which includes terms describing the kinetic energy of the particles and the electrostatic interaction between every pair of charged particles. For molecules with more than one electron, the wave function must satisfy the rather esoteric criterion of changing algebraic sign (plus to minus or vice versa) whenever the coordinates of any two electrons are interchanged. This strange mathematical requirement is the basis for the Pauli exclusion principle, which limits the electrons in atoms to only two in each atomic orbital.
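
In standard notation, the equation being discussed is the time-independent Schrödinger equation,

```latex
\hat{H}\Psi = E\Psi, \qquad
\hat{H} =
-\sum_{i}\frac{\hbar^{2}}{2m_{e}}\nabla_{i}^{2}
-\sum_{A}\frac{\hbar^{2}}{2M_{A}}\nabla_{A}^{2}
-\sum_{i,A}\frac{Z_{A}e^{2}}{4\pi\varepsilon_{0}r_{iA}}
+\sum_{i<j}\frac{e^{2}}{4\pi\varepsilon_{0}r_{ij}}
+\sum_{A<B}\frac{Z_{A}Z_{B}e^{2}}{4\pi\varepsilon_{0}R_{AB}}
```

where the first two sums are the kinetic energies of the electrons (indexed i, j) and nuclei (indexed A, B) and the last three are the electrostatic interactions between every pair of charged particles. The sign-change requirement described above is simply the condition that the wave function be antisymmetric: interchanging the coordinates of any two electrons multiplies Ψ by -1.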

Because the Schrödinger equation cannot be solved exactly, except for systems of only two particles, such as the hydrogen atom, techniques have been developed for finding approximate solutions. Approximate methods can be characterized as either ab initio or semiempirical depending on whether they make use of physical data other than those contained in the Schrödinger equation. Both types of calculations, along with certain other modeling techniques that generally require the use of high-speed computers, now constitute the field of computational chemistry.

Ab initio calculations are generally done only for molecules composed of a relatively small number of electrons. When most ab initio techniques are used, it is assumed that the overall electron wave function can be expressed as a sum of terms, each describing an assignment of electrons to molecular orbitals that are combinations of the orbitals of the atoms that make up the molecule. The interaction between electrons consists of an averaged electrical repulsion plus so-called exchange terms that are a consequence of the Pauli principle and the mathematical requirements it places on the overall wave function. The calculation of the exchange terms is generally the most time-consuming part of an ab initio calculation.
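
To make this concrete, here is a minimal sketch of such a calculation using the open-source PySCF package; the water geometry and the small STO-3G basis set are illustrative choices, not taken from the text:

```python
# A minimal ab initio (restricted Hartree-Fock) calculation with the
# open-source PySCF package (pip install pyscf).
from pyscf import gto, scf

# Define the molecule: atoms, Cartesian coordinates in angstroms, and the
# atomic-orbital basis from which the molecular orbitals are built.
mol = gto.M(
    atom="""O  0.000  0.000  0.000
            H  0.757  0.586  0.000
            H -0.757  0.586  0.000""",
    basis="sto-3g",
)

# Solve the Hartree-Fock equations self-consistently. The averaged
# electron-electron repulsion and the exchange terms are rebuilt at each
# iteration until the solution stops changing.
mf = scf.RHF(mol)
energy = mf.kernel()  # total energy in hartrees
print(f"RHF/STO-3G energy of water: {energy:.6f} hartree")
```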

For a molecule such as insulin, the hormone that regulates the metabolism of sugar in the body and that contains several hundred atoms and several thousand electrons, an accurate ab initio calculation might take centuries or longer, even on the fastest supercomputers. Fortunately for biology and medicine, the molecules in living organisms involve relatively few bond types, and the forces between the atoms can be modeled with so-called molecular mechanics programs. Such an approach represents, in spirit, a return to the nineteenth-century view of the molecule as a collection of atoms held together by localized bonds. The results are generally satisfactory for biological molecules and candidate drugs, partly because only a few types of bonds have to be considered and partly because molecules in living cells generally recognize other molecules on the basis of their shape and the distribution of electrical charge within them.

For chemical purposes, the complete wave function actually contains more information than is needed. The strengths of the chemical bonds, their vibrational characteristics, the charge distribution within the molecule, and the overall shape of the molecule are completely determined if the accurate time-averaged distribution of electrons, the so-called electron density, is known. Unfortunately, quantum theory does not provide a method by which the electron density can be computed directly. The so-called density functional methods, honored with the award of the 1998 Nobel Prize to Walter Kohn, use a simple formula to approximate the exchange terms in the calculation of the molecular orbitals according to the calculated electron density. In practical density functional calculations, the formula chosen is the one that best describes a related series of compounds. This blurs the boundary between ab initio and semiempirical methods, causing some chemists to question whether it is quantum theory or chemical insight that is responsible for the success of the methods.
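
In most quantum chemistry codes, choosing the approximating formula is literally a one-line decision. Here is a hedged sketch, again with PySCF, using the widely used B3LYP functional as an arbitrary example choice:

```python
# A Kohn-Sham density functional calculation in PySCF. Compared with the
# Hartree-Fock sketch above, the only conceptual change is that the exact
# exchange terms are replaced by an approximate exchange-correlation
# formula, selected here by name.
from pyscf import gto, dft

mol = gto.M(atom="O 0 0 0; H 0.757 0.586 0; H -0.757 0.586 0",
            basis="sto-3g")

mf = dft.RKS(mol)   # restricted Kohn-Sham DFT
mf.xc = "b3lyp"     # the functional is a modeling choice; in practice one
                    # picks whichever formula best fits the family of
                    # compounds under study
print(mf.kernel())  # total energy in hartrees
```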

Computational chemistry as currently practiced seems to be a reliable guide to the behavior of biological molecules and some families of catalysts. It has reduced greatly the cost and time associated with developing new drugs. On the other hand, chemists continue to debate the value of computational methods in explaining the character of compounds, such as aluminum monoxide (AlO) and carbon monophosphide (CP), that stretch traditional notions of bonding. It is these chemical novelties that can be counted on to keep the debate alive for some time.

—DONALD R. FRANCESCHETTI

Viewpoint: Yes, properly used, computational chemistry can be a reliable guide to the properties of novel compounds.

The cartoon image of a chemist standing amid boiling, fuming beakers and test tubes in a cluttered chemistry laboratory needs to be redrawn. Many twenty-first century research chemists are likely to be seen in front of high-resolution computer displays constructing colorful three-dimensional images of a novel compound. These scientists may be using computer-assisted drug design (CADD) software that incorporates computational methods to develop pharmacophores, sets of generalized molecular features that are responsible for particular biological activities. There is also a good possibility the scientist has access to sophisticated molecular modeling programs to make virtual designer molecules, as for drug candidates, or possibly catalysts that could accelerate production of advanced chemical compounds.

In the past two decades, computational power and software have advanced to the point at which mathematical models can produce three-dimensional structures of molecules as complex as novel proteins and can relate receptor structures to drug candidates. Chemists can work "in silico" using computational techniques to perform virtual experiments to narrow options for the ideal drug or catalyst before moving into a "wet" laboratory to test the designs.

Overview of Computational Chemistry

When theoretical chemistry, the mathematical description of chemistry, is implemented on a computer, it is called computational chemistry. Theoretical chemistry rests on quantum mechanical equations so complex that they can be solved only approximately. Approximate computations produce results that are useful but should not be considered "exact" solutions. When computations are derived directly from theoretical principles, with no experimental data used, the term ab initio is used to describe the results. Ab initio is Latin for "from the beginning," that is, from first principles. Because the energy of the electrons associated with atoms accounts for the chemistry of the atoms, calculations of the wave function and the electron density are central to computational chemistry.

Wave function is a mathematical expression used in quantum mechanics to describe properties of a moving particle, such as an electron in an atom. Among the properties are energy level and location in space. Electron density is a function defined over a given space, such as an electron cloud. It relates to the number of electrons present in the space. Ab initio schemes include the quantum Monte Carlo method and the density functional theory method. As the name implies, the density functional theory method entails electron density. The quantum Monte Carlo method uses a sophisticated form of guessing to evaluate required quantities that are difficult to compute.
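
The flavor of that guessing can be shown with the simplest possible Monte Carlo estimate, a toy one-dimensional stand-in for the very high-dimensional integrals of quantum Monte Carlo:

```python
import random

# Monte Carlo estimation in miniature: approximate the integral of x**2
# over [0, 1] (exact value 1/3) by averaging the integrand at randomly
# chosen points. Quantum Monte Carlo applies the same averaging idea to
# integrals over the coordinates of every electron in a molecule.
def mc_integral(f, n_samples=100_000):
    total = sum(f(random.random()) for _ in range(n_samples))
    return total / n_samples

print(mc_integral(lambda x: x * x))  # ~0.333; the estimate improves as
                                     # the number of samples grows
```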

Modifications of ab initio schemes include tactics such as semiempirical calculations, which incorporate some real-world data from the laboratory to simplify the calculations. Semiempirical calculations work best when only a few elements are involved and the molecules are of moderate size; the method works well for both organic and inorganic molecules. For larger molecules it is possible to avoid quantum mechanics altogether and use methods referred to as molecular mechanics. These methods set up a simple algebraic expression for the total energy of a compound, bypassing the need to compute a wave function or total electron density; all of the information used must come from experimental data. The molecular mechanics method can be used to model enormous molecules, such as proteins and segments of DNA, and is a favorite tool of computational biochemists. Powerful software packages are based on the molecular mechanics method. Although the method has shortcomings on the theoretical side, the software is easy to use.
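
A toy version of such an algebraic expression is easy to write down. The sketch below keeps only harmonic bond-stretch terms, one spring per bond; the spring constants and equilibrium lengths are invented for illustration, not taken from any published force field:

```python
import math

# Toy molecular-mechanics energy: treat each bond as a spring, so that
# E_total = sum over bonds of 0.5 * k * (r - r0)**2. Real force fields add
# angle-bend, torsion, and nonbonded terms, but the algebraic spirit is
# the same. All parameters below are invented for illustration.
def bond_stretch_energy(bonds, positions):
    """Sum harmonic stretch energies over a list of bonds.

    bonds: list of (atom_i, atom_j, r0, k) tuples
    positions: dict mapping atom name -> (x, y, z) in angstroms
    """
    total = 0.0
    for i, j, r0, k in bonds:
        r = math.dist(positions[i], positions[j])
        total += 0.5 * k * (r - r0) ** 2
    return total

# A slightly stretched water molecule with made-up spring constants.
positions = {"O": (0.0, 0.0, 0.0), "H1": (1.05, 0.0, 0.0), "H2": (-0.30, 1.00, 0.0)}
bonds = [("O", "H1", 0.96, 1000.0), ("O", "H2", 0.96, 1000.0)]
print(bond_stretch_energy(bonds, positions))
```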

1998 Nobel Prize in Chemistry

The 1998 Nobel Prize in chemistry was jointly awarded to Professor Walter Kohn of the University of California at Santa Barbara and Professor John A. Pople of Northwestern University. The laureates were honored for their contributions in developing methods that can be used for theoretical studies of the properties of molecules and the chemical processes in which they are involved. The citation for Walter Kohn notes his development of the density functional theory, and the citation for John Pople notes his development of computational methods in quantum chemistry.

A press release by the Royal Swedish Academy of Sciences pointed out that although quantum mechanics had been used in physics since very early in the 1900s, applications within chemistry were long in coming. It was not possible to handle the complicated mathematical relations of quantum mechanics for complex systems such as molecules until computers came into use at the start of the 1960s. By the 1990s theoretical and computational developments had revolutionized all of chemistry. Walter Kohn and John Pople were recognized for being the two most prominent figures in this movement. Computer-based calculations are now widely used to supplement experimental techniques.

The Swedish Academy press release notes that Walter Kohn showed it was not necessary to consider the motion of each individual electron: it is sufficient to know the average number of electrons located at any one point in space. This insight led Kohn to a computationally simpler method, the density functional theory. John Pople was a leader in new methods of computation. He designed a computer program that was in many respects superior to any other being developed at the time, and he continued to refine the method and build up a well-documented model of chemistry. He later incorporated Kohn's density functional theory, opening up the analysis of even complex molecules. Applications of Pople's methods include predicting how small molecules bind to receptor sites in proteins, a capability useful in drug research.

Computer-Assisted Drug Design

CADD has gained in importance in the pharmaceutical industry because drug design is a slow and expensive process. According to the Pharmaceutical Research and Manufacturers of America (PhRMA), it takes 12 to 15 years to bring a new drug to market, the first six and one-half years being discovery and preclinical testing. The Tufts Center for the Study of Drug Development estimates the average cost of developing one drug is approximately $802 million. That figure includes the cost of failures, and far more candidate drugs do not make it to market than do.

It is important to eliminate bad candidates early in the drug development process. A particular compound may be effective against a disease-causing agent outside the body but be ineffective when the agent is in the body. That is, the drug candidate may not meet ADMET criteria. ADMET is an acronym for absorption, distribution, metabolism, excretion, and toxicity. It refers to how a living creature's biology interacts with a drug. Animal testing has not been eliminated in drug design, but it has been greatly reduced with the new in silico methods.

Computational chemistry is used to explore potential drug candidates for lead compounds that have some activity against a given disease. Techniques such as modeling quantitative structure-activity relationships (QSAR), which help to identify molecules that might bind tightly to disease-related target molecules, are widely used. At the beginning of drug design, computational chemistry is used to establish virtual libraries of potential drug candidates: large collections of potentially useful chemical structures are called libraries, and when the compounds exist only as computer designs the collection is called a virtual library. Computational chemistry is used in conjunction with combinatorial chemistry, a grid-based synthesis technique that can simultaneously generate libraries of tens, hundreds, or even thousands of related chemical compounds.
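
As a small illustration of the descriptor side of this work, open-source toolkits can compute QSAR-style properties directly from a textual description of a structure. The sketch below assumes the RDKit package, with aspirin as an arbitrary example molecule:

```python
# Compute simple drug-design descriptors with the open-source RDKit
# toolkit (pip install rdkit). Aspirin is used purely as an example.
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin as a SMILES string

print("molecular weight:", Descriptors.MolWt(mol))
print("logP (lipophilicity):", Descriptors.MolLogP(mol))
print("H-bond donors:", Descriptors.NumHDonors(mol))
print("H-bond acceptors:", Descriptors.NumHAcceptors(mol))
```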

The chemical libraries have to be screened for real usefulness in a particular drug application. The most promising candidates from a first screening are called hits; when the hits are optimized—refined to maximize the desired properties—they are called leads. Once a number of lead compounds have been found, computational chemistry is used to check them against the ADMET criteria. One challenge in using CADD and combinatorial chemistry for drug design is the sheer volume of data produced, which has to be mined (sorted) and managed; these processes would be impossible without a computer. An example of the size of libraries can be seen at the biotechnology company 3-Dimensional Pharmaceuticals (3DP). 3DP has a probe library of more than 300,000 individually synthesized compounds that have been selected for maximum diversity and compound-screening utility. The company also has a synthetically accessible library of more than 4 billion compounds that have been developed through computational chemistry and can be synthesized on demand.

Computational Chemistry for Catalyst Development

The second largest application of combinatorial chemistry, which generally includes computational chemistry, is the development of catalysts for industrial applications, including the vast chemical and petrochemical industries. It also extends to emission control, driven by growing worldwide pressure for cleaner, greener processing of fuels and chemicals. BASF, one of the world leaders among chemical companies, uses molecular modeling as the starting point in its catalyst research. The challenges in catalyst development differ from those in drug design, however, except for data management and mining: complex interrelations between composition and processing variables make the development of catalysts demanding. Intense research and development are being conducted internationally.

In the United States, Symyx, a high-tech company in California, is a leader in combinatorial chemistry. Its closest competitor is Avantium, a Dutch company formed in February 2000 that has produced software, under the trademark Virtual Lab, with informatics and simulation capability. By starting with computational chemistry, catalyst researchers save laboratory time and chemicals. Symyx representatives estimate that their approach is up to 100 times faster than traditional research methods and reduces the cost per experiment to as little as 1% of that of traditional methods. Eliminating chemical waste in research and development is a valuable added benefit.

Software Solutions

For computational chemistry to be widely used by chemists, it has to be packaged in familiar formats that bring sophisticated computational methods and state-of-the-art molecular modeling and analysis tools to the desktop. A number of companies specialize in software for the scientific community. Tripos, headquartered in St. Louis, has been in the business since 1979 and so has "grown up" with the advances in computational chemistry. Since 1998, Tripos has been developing a flexible drug discovery software platform that has attracted the attention of Pfizer, the world's largest drug maker. In January 2002, Pfizer and Tripos agreed to jointly design, develop, and test a range of methods for analyzing and interpreting drug design data.

In June 2000 the drug design company Pharmacopeia spun off a wholly owned subsidiary, Accelrys, to produce software for computation, simulation, and data management and mining for biologists, chemists, and materials scientists. Pharmacopeia had long maintained a software segment to meet its computational chemistry needs; as those needs grew, the segment grew with them through the acquisition of a number of small specialty companies. Accelrys offers a suite of approximately 130 programs, including tools for the analysis of DNA. The company also has software to annotate protein sequences, a capability that aids in the use of genomic data, now becoming available from the Human Genome Project, for drug discovery.

Is computational chemistry a reliable guide to the properties of novel compounds? The research and development results of every major chemistry and biochemistry laboratory indicate that it is.

—M. C. NAGEL

Viewpoint: No, computational chemistry is not a reliable guide to the properties of novel compounds.

In recent years computational chemistry and the modeling of molecules have revolutionized chemistry and biochemistry. The 1998 Nobel Prize in chemistry was awarded for foundational work in computational methods and molecular modeling. Some chemists have even foreseen a time when the lab-coated chemistry researcher will be a thing of the past, the chemist more computer scientist than laboratory experimenter. Despite the rhetoric and the much-heralded successes of molecular modeling techniques, computational chemistry has many limitations and is not a reliable method when applied to novel compounds.

Even the Best Model Is Only a Model

The most accurate and complete theoretical model available for computational chemistry is quantum mechanics and the Schrödinger wave equation. Unfortunately, Schrödinger's equation is too complex to be solved exactly for all but the simplest situations, such as a single particle in a box or the hydrogen atom. To reduce the problem of calculating the properties of atoms and molecules to something tractable, approximate methods must be used. When this is done, errors and strange artifacts often are introduced into the calculated models.

A number of models approximate the Schrödinger wave equation, some of which are more exhaustive than others in their calculations and so can suffer from extremely long computation times. The so-called ab initio methods of molecular orbital calculation can lead to accurate approximations of a molecule. Such methods can be used to calculate the properties of a molecule from knowledge of only the constituent atoms, although in practice information often is added or approximated to reduce calculation times. For example, the Hartree-Fock method works on the principle that if the wave functions for all but one electron are known, it is possible to calculate the wave function of the remaining electron: the effects of all the other electrons are averaged, and the motion of the chosen electron in that averaged field is estimated. The process is repeated for each electron in turn until a self-consistent solution is obtained. A number of approximations are nevertheless introduced in practical calculations. The original Hartree scheme, for example, effectively omitted particle spin and with it the Pauli exclusion principle—that no two electrons can occupy the same quantum state; the Fock refinement repairs this, but the method still replaces the correlated motion of individual electrons with an average. Relativistic effects resulting from heavy nuclei or high-speed particles are also ignored in standard ab initio methods. More drastic approximations can be used to increase the speed of calculation, but at the expense of accuracy. Even so, the method can still demand computations that would be far too long for practical purposes, even when run on a supercomputer.

The Born-Oppenheimer approximation ignores the correlation between the motion of electrons and that of nuclei, because the electrons move much faster. It is a good approximation but still requires exhaustive computations of the electron orbitals. Hückel theory, developed in 1931, uses a more extreme approximation, ignoring electrons not directly involved in chemical bonds and assuming that some electron orbitals can be treated as identical. Application of this theory greatly reduces the number of calculations. There is always a tradeoff between the accuracy of the model and the speed at which it can be generated. In general, the larger the molecule to be modeled, the less accurate the model can be, given reasonable time constraints.
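
Hückel theory is simple enough to show in full for a small case. The sketch below diagonalizes the 4x4 Hückel matrix for the π system of butadiene with NumPy, using the customary convention α = 0 and β = -1 so that energies come out in units of |β|:

```python
import numpy as np

# Hueckel treatment of butadiene's pi system: four carbons in a chain,
# each contributing one p orbital. The Hamiltonian has alpha on the
# diagonal and beta between bonded neighbors; with alpha = 0, beta = -1
# the eigenvalues are the pi orbital energies in units of |beta|.
beta = -1.0
H = beta * np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

energies = np.linalg.eigvalsh(H)  # ascending order
print(energies)  # approx [-1.618, -0.618, 0.618, 1.618]
```

The two lowest levels hold butadiene's four π electrons, giving a total π energy of 4α + 4.472β, a classic textbook result.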

Even with the great advances in computational speed, parallel processing, and other innovations in computer technology, such approximations can take many days, months, or years to run for a single molecule. Yet even if speed issues are overcome with new generations of super-fast computers, ab initio models are only models, and they still suffer from errors caused by the approximations used in calculation. A number of common, simple molecules resist ab initio computation, producing results that do not agree with experiment. For example, ab initio calculations of AlO, CP, and nitric oxide (NO) all give vibrational frequencies that wildly disagree with experimental values. Methods exist to correct the errors for well-studied molecules such as these. For novel compounds without experimental confirmation, however, there can be no knowledge of what the errors may be, or of how to compensate for them.

Nonquantum Models

Quantum mechanical models give much more information than is needed for chemical analysis, especially if molecular geometry is the main focus. Many of the computer programs used to model molecules ignore quantum effects and reduce the problem to classical calculations, which can be performed much faster. There is a limit to the information that can be gained from such methods, and faster turnaround times come with greater risk of inaccuracies and errors in the model. Force-field methods, which evolved out of the study of vibrational spectroscopy, are related to the old practice of building physical three-dimensional wire-frame models. The models generated can be thought of as a collection of balls (the atoms) connected by springs (the chemical bonds). By calculating the energies and bond angles in the molecule, a computer-generated model of the likely geometry of the molecule is produced. The calculations are fast in comparison with quantum methods, and the results are sufficiently accurate to be useful, but they cannot be relied on completely and give poor results in some circumstances, such as nonbonded interactions and metal-carbon bonds. As with quantum methods, the final results usually are adjusted with additional parameters so that the generated results agree with those of experiments. In the case of novel compounds, such a process can only be a best guess.

There are less accurate modeling methods that are faster still. Previously generated data often are used as a starting point. In the case of novel compounds, this method can only be a general guide. Minimization algorithms are used to attempt to find a best fit to an approximate molecular geometry. Small adjustments in the likely geometry of a molecule are computed to find the lowest energy configuration. For rigid molecules, this is likely to be a good fit to the real geometry, but for flexible molecules, other computations should be added to give validity to the computer models. One textbook compares the process of minimization to "walking around a hillside in a thick fog, trying to find the way down hill." Whereas going down to a local minimum is not much of a problem, if there are isolated valleys that do not connect, how can you be sure you have reached the lowest point? Conformation searches are a more systematic method of finding the lowest point; they are akin to having the fog-bound walker go down many paths rather than only one. However, such an algorithm takes many times longer, and so again many computer programs use approximate methods to increase the speed of calculations.
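
The fog-walker problem is easy to reproduce in a few lines. The sketch below minimizes an invented one-dimensional double-well "energy" with SciPy's general-purpose minimizer; different starting points settle into different valleys, and nothing in the algorithm reveals which valley is lowest:

```python
from scipy.optimize import minimize

# A toy double-well energy surface with two minima, one lower than the
# other. The function is invented purely for illustration.
def energy(x):
    x = x[0]
    return (x**2 - 1.0)**2 + 0.3 * x  # wells near x = -1 and x = +1

# Local minimization from two different starting "geometries".
for start in (-2.0, 2.0):
    result = minimize(energy, x0=[start])
    print(f"start {start:+.1f} -> x = {result.x[0]:+.3f}, "
          f"energy = {result.fun:+.3f}")
# The walk from +2.0 stops in the shallower valley near x = +1; only the
# walk from -2.0 finds the deeper minimum near x = -1.
```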

More than a Question of Speed

The problem of speed in chemical modeling is not just a matter of waiting until computers become faster. In many cases the theoretical estimate for performing a rigorous modeling computation on a large molecule or a complex molecular reaction, even on an ideal computer, is longer than the estimated age of the universe. Unfortunately, the types of molecules that fall into this category are precisely those that chemists, biochemists, and medical researchers most often want to model. As a result, only the simplest and quickest computational methods can be used. Experimental data often are plugged into the model to increase the speed of calculation. Once again, the problem with novel compounds is that such data are unavailable or unreliable, so the entire method can only be considered approximate.

Because full conformational modeling is impossible for complex molecules, further shortcuts and approximations often are made to provide useful, but rough, guides to the potential properties of a new molecule. One method of molecular analysis developed for use with novel compounds in the drug industry is to compare a new compound with a similar, well-studied compound. In many drug development studies, small modifications are made to a molecule in an effort to improve the overall effect of the drug and remove unwanted side effects. A computer can quickly check (in less than one minute) whether a newly constructed chemical acts by a previously recognized mechanism. This method focuses on key areas of the molecule, such as receptor sites, but can only recognize constructions that fit a predetermined pattern. Although such algorithms have had some success, they cannot identify possible drugs that function by a new mechanism. The only way to be sure a potential molecule will work is with experimental methods.

Experimental Confirmation Needed

Many computer models agree closely with experimental data because the programs have been tweaked: parameters and corrections have been added to compensate for errors in the approximations used to generate the model. These parameters can be guessed for novel compounds when the new structures closely resemble previously studied molecules. However, such guesses must be checked against experimental results, and the process breaks down for novel compounds that differ too greatly from known compounds.

Computational chemistry is a rapidly changing field. As computers increase in speed and the modeling programs used to analyze molecules are adjusted and improved, the range of molecules and interactions between molecules that can be represented increases. There are, however, limits to the accuracy of a computed model, even with an ideal computer. As a result, approximate methods must be used to generate chemical models in a reasonable time. The errors often found in such models are corrected by the addition of values derived from experimental data. With novel compounds, such data do not exist, and further approximations and guesses must be made, limiting the accuracy of any computed model. The accurate modeling of large and novel molecules represents a great challenge to computational chemistry. Although great strides are being made in the field, theoretical limits continue to make such models unreliable. The final analysis must always be experiments in the real world.

—DAVID TULLOCH

Further Reading

Cramer, Christopher J. Essentials of Computational Chemistry: Theories and Models. New York: Wiley, 2002.

Goodman, Jonathan M. Chemical Applications of Molecular Modelling. Cambridge, England: Royal Society of Chemistry, 1998.

Grant, Guy H., and W. Graham Richards. Computational Chemistry. Oxford, England: Oxford University Press, 1995.

Leach, Andrew R. Molecular Modeling: Principles and Applications. 2nd ed. London, England: Pearson Education Corporate Communications, 2001.

Lebl, M. "Parallel Personal Comments on 'Classical' Papers in Combinatorial Chemistry." Journal of Combinatorial Chemistry 1 (1999): 3-24.

"Press Release: The 1998 Nobel Prize in Chemistry." Royal Swedish Academy of Sciences. October 13, 1998 [cited July 26, 2002]. <http://www.nobel.se/chemistry/laureates/1998/press.html>.

Schlecht, Matthew F. Molecular Modeling on the PC. New York: Wiley, 1998.

Wilson, Elizabeth K. "Picking the Winners." Chemical and Engineering News, April 29, 2002 [cited July 26, 2002]. <http://pubs.acs.org/cen/coverstory/8017/8017computers.html>.

Young, David C. Computational Chemistry: Applying Computational Techniques to Real-World Problems. New York: Wiley, 2001.

Young, David. Introduction to Computational Chemistry. Dallas: Cytoclonal Pharmaceutics [cited July 26, 2002]. <http://server.ccl.net/cca/documents/dyoung/topics-orig/compchem.html>.

KEY TERMS

AB INITIO:

Latin for "from the beginning." A term that describes calculations that require no added parameters, only the most basic starting information: the properties are worked out from first principles. In practice such methods can be overly time-consuming, and some parameters often are introduced to increase the speed of calculation.

CATALYST:

A chemical added to a reaction to accelerate a process but that is not consumed by the process.

COMBINATORIAL CHEMISTRY:

Technology designed to greatly accelerate the rate at which chemical discoveries can be made. With robots and other advanced techniques, hundreds or thousands of different chemical compounds can be produced simultaneously in rows of minute wells called microarrays.

ELECTRON CLOUD:

The physical space in which electrons are most likely to be found.

INFORMATICS:

The application of computational and statistical techniques to the management of information.

MOLECULAR MODELING:

Any representation of a molecule, from a two-dimensional pencil drawing to a three-dimensional wire-frame construction. The term has become linked to the computational methods of modeling molecules and is sometimes used synonymously with computational chemistry.

QUANTUM MECHANICS:

A physical theory that describes the motion of subatomic particles according to the principles of quantum theory. Quantum theory states that energy is absorbed or radiated not continuously but discontinuously in multiples of discrete units.

SCHRÖDINGER'S WAVE EQUATION:

A mathematical formula that describes the behavior of atomic particles in terms of their wave properties, allowing calculation of the allowed energy levels of a quantum system. The square of the solution's magnitude gives the probability of finding a particle at a given position.

WHAT ARE CHEMICAL LIBRARIES?

Chemicals are not what one usually expects to find in libraries. But in the sense that libraries are organized collections of useful information, the term chemical library is an appropriate name for the collections of systematically produced chemical compounds that are the result of combinatorial chemistry.

Combinatorial chemistry is a relatively new way to "do" chemistry. Whereas traditional research chemists follow the conventional scientific method of mixing reactants in test tubes, beakers, or flasks to produce one compound at a time, combinatorial research chemists use miniaturized equipment and robotics to prepare large numbers of compounds in grid-based plates (called microtiter sheets) that have very small wells arranged in rows in which many experiments can be run simultaneously.

Based on the work of Robert Bruce Merrifield, winner of the 1984 Nobel Prize in chemistry, combinatorial chemistry was the Eureka! discovery of Richard Houghten in 1985. Houghten wrote about it later, "I woke up in the middle of the night with the 'flash'… if I could make mixtures in a systematic manner … " (Lebl, 1999).

Most combinatorial chemistry reactions are performed on tiny beads of plastic called microspheres. One reactant is attached to each bead, and the beads are placed systematically one to a well. One row of wells has the same reactant on each bead, but the reactants are slightly different on each successive row. The other chemical needed for the reaction is added in solution to the plate column by column. The wells in a column have the same reactant, but each column is systematically slightly different from the previous column. That is why combinatorial chemistry is described as grid-based. The method is also called parallel synthesis. There is another way to do combinatorial chemistry in which the beads are contained in small mesh containers. That was the way Houghten first did it. He called it "tea bag" synthesis.
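
The grid logic amounts to a Cartesian product of the row and column reactant sets, as a minimal sketch shows (the reactant names below are placeholders, not real reagents):

```python
from itertools import product

# Grid-based combinatorial synthesis in miniature: every bead-bound
# reactant in a row is combined with every reactant added down a column,
# so an m x n plate yields m * n distinct products in one pass.
row_reactants = ["amine_A", "amine_B", "amine_C"]
column_reactants = ["acid_1", "acid_2", "acid_3", "acid_4"]

library = [f"{r} + {c}" for r, c in product(row_reactants, column_reactants)]
print(len(library), "products from a",
      len(row_reactants), "x", len(column_reactants), "plate")  # 12 products
```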

Making a large number of related compounds quickly is only the beginning. Sorting them and identifying the useful products is called high-throughput screening and assaying, a process that also is automated. Running a combinatorial chemistry laboratory requires sophisticated data-management and data-mining software.

There are two major applications for combinatorial chemistry. Drug design is the first use, and right behind it is research to find new and better catalysts for cleaner, greener chemical manufacturing and fossil fuel burning.

M. C. Nagel