Reviews in Computational Chemistry [Vol.23]
 9780470082010, 0470082011


Reviews in Computational Chemistry
Volume 23

Edited by

Kenny B. Lipkowitz
Thomas R. Cundari

Editor Emeritus

Donald B. Boyd

WILEY-VCH


Kenny B. Lipkowitz Department of Chemistry Howard University 525 College Street, N.W. Washington, D. C., 20059, U.S.A [email protected] Thomas R. Cundari Department of Chemistry University of North Texas Box 305070, Denton, Texas 76203-5070, U.S.A. [email protected]

Donald B. Boyd Department of Chemistry and Chemical Biology Indiana University-Purdue University at Indianapolis 402 North Blackford Street Indianapolis, Indiana 46202-3274, U.S.A. [email protected]

Copyright © 2007 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the Publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Wiley Bicentennial Logo: Richard J. Pacifico

Library of Congress Cataloging-in-Publication Data:
ISBN 978-0-470-08201-0
ISSN 1069-3599

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

Preface

Students wanting to become computational chemists face a steep learning curve that can be intellectually and emotionally challenging. Those students are expected to know basic physics, from quantum mechanics to statistical mechanics, along with a full comprehension of electricity and magnetism; they are required to be conversant in calculus, algebra, graph theory, and statistics; they are expected to be cognizant of algorithmic issues of computer science, and they are expected to be well versed in the experimental aspects of the topic they intend to model, whether it be in the realm of materials science, biology, or engineering.

Beginning in the mid-1990s, and continuing into this century, there appeared a series of books on molecular modeling and computational chemistry that addressed the needs of such students. Those books are very well organized, are extremely well written, and have been received enthusiastically by the community at large. The editors of Reviews in Computational Chemistry knew that such books would avail themselves to a hungry public, and further knew that only an introduction to the wide range of topics in this discipline could be covered in a single book. Accordingly, a decision was made to provide lengthier, more detailed descriptions of the many computational tools that a computational scientist would need for his or her career. Reviews in Computational Chemistry thus set out on a trajectory of providing pedagogically driven chapters for both the novice who wants to become a computational molecular scientist as well as for the seasoned professional who wants to quickly learn about a computational method outside of his or her area of expertise. In this, the 23rd volume of the series, we continue that tradition by providing seven chapters on a wide variety of topics.

Most bench chemists who use software for computing quantum mechanical properties, structures, and energies of molecular systems are well aware of the n^4 bottleneck associated with the calculation of the required electron repulsion integrals and quickly find this scaling problem to be a major impediment to their studies. In Chapter 1, Christian Ochsenfeld, Jörg Kussmann, and Daniel Lambrecht provide a tutorial on the topic of linear scaling methods in quantum chemistry. The authors begin by putting into perspective the existing scaling problems associated with approximating the solution to the Schrödinger equation. They review the basics of self-consistent field (SCF) theory within the Born–Oppenheimer (BO) approximation and focus the readers' attention on the interplay between the cubic scaling for diagonalization of the Fock matrix and the quadratic scaling for the formation of that matrix. They then describe how one can reduce this problem by selecting numerically significant integrals using Schwarz or multipole-based integral estimates, illustrating those concepts with easy-to-follow diagrams and demonstrating the results with simple plots. The calculation of integrals by multipole expansion is then presented, beginning with a very simple example that shows the novice how individual pair-wise interactions between point charges can be collected into charge distributions that, when combined with a clever tree algorithm, can avoid the quadratic scaling step. The authors provide the reader with a basic understanding of multipole expansions and then describe the fast multipole method (FMM) and its generalization for continuous (Gaussian) distributions, the continuous FMM method, before providing an overview of other multipole expansions and tree codes used to speed up the calculation of two-electron integrals. Exactly how this linear scaling is accomplished is illustrated nicely by the authors through the use of an example of a molecule of substantial size—an octameric fragment of DNA. Having described how to reduce the scaling behavior for the construction of the Coulomb part of the Fock matrix, the authors bring to the fore the remaining component within Hartree–Fock (HF) theory, the exchange part [also required for hybrid density functional theory (DFT)]. As before, with a didactic style, the authors show how one can exploit localization properties of the density matrix to achieve linear scaling of the exchange part of the Hamiltonian. Then the authors show how one can avoid the conventional diagonalization of the assembled Fock matrix and reduce what would be a cubic scaling process to one that is linear. The tensor formalism is introduced, properties of the one-particle density matrix are described, and the density-matrix-based energy functional is introduced to solve the SCF problem. The authors go beyond just the computation of energies by then explaining energy gradients and molecular response properties. The chapter concludes with an overview of what it takes to reduce the scaling behavior of post-HF methods for large systems. Linear scaling techniques in quantum chemistry are becoming more widely implemented in software packages. The bench chemist who is inclined to use this software ought not treat it as a black box, but instead should be cognizant of the assumptions, approximations, and pitfalls associated with linear scaling methodology. This chapter makes all of this visible to the user in a clear and coherent manner.
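As a quick illustrative aside for the reader (not material from the chapter itself), the Schwarz screening idea can be sketched in a few lines; the pair-estimate matrix Q and the threshold are hypothetical stand-ins for the quantities the chapter defines rigorously.

```python
import numpy as np

def significant(Q, i, j, k, l, threshold=1e-10):
    """Schwarz screening test for a two-electron integral (ij|kl).

    The Cauchy-Schwarz inequality gives the rigorous upper bound
    |(ij|kl)| <= sqrt((ij|ij)) * sqrt((kl|kl)) = Q[i, j] * Q[k, l],
    so any integral whose bound falls below the threshold can be
    skipped without ever being computed.
    """
    return Q[i, j] * Q[k, l] >= threshold
```

The multipole idea can be previewed in the same spirit: the pairwise Coulomb interaction between two well-separated groups of point charges collapses, to leading (monopole) order, into a single term between their total charges. The cluster sizes, positions, and charges below are invented for illustration.

```python
rng = np.random.default_rng(1)
na = nb = 40
qa, qb = rng.uniform(0.5, 1.5, na), rng.uniform(0.5, 1.5, nb)
ra = rng.normal(0.0, 1.0, (na, 3))
rb = rng.normal(0.0, 1.0, (nb, 3)) + np.array([40.0, 0.0, 0.0])

# Exact cluster-cluster Coulomb energy: na*nb pairwise terms
exact = sum(qi * qj / np.linalg.norm(ri - rj)
            for qi, ri in zip(qa, ra) for qj, rj in zip(qb, rb))

# Monopole approximation: each cluster collapsed to its total charge at its
# charge-weighted center, one single term (the dipole vanishes about this
# center for same-sign charges, so the leading error is quadrupole-like)
Ra = (qa[:, None] * ra).sum(axis=0) / qa.sum()
Rb = (qb[:, None] * rb).sum(axis=0) / qb.sum()
approx = qa.sum() * qb.sum() / np.linalg.norm(Ra - Rb)
print(exact, approx)  # relative deviation ~ (cluster size / separation)**2
```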

The BO approximation is sufficient in quantum chemistry for describing most chemical processes. However, many nonadiabatic processes exist in nature that cannot be described adequately within this context, examples of which include the ubiquitous photophysical and photochemical processes associated with photosynthesis, vision and charge transfer reactions, among others. Nonadiabatic phenomena occur when two or more potential energy surfaces approach each other, and the coupling between those surfaces becomes important. Conical intersections are the actual crossings of those surfaces. In Chapter 2, Spiridoula Matsika highlights where the BO approximation breaks down, the differences between adiabatic and diabatic representations for studying nuclear dynamics, and the significance of the noncrossing rule. She follows this with an introduction to, and explanation of, conical intersections by addressing the Jahn–Teller effect, symmetry-allowed conical intersections, accidental intersections, the branching plane, and how topography is used to characterize conical intersections. Post-HF methods, including MCSCF, MRCI, and CASPT2, and single reference methods are surveyed along with the many considerations a user must take into account when choosing a particular electronic structure method for computing conical intersections. These explanations are followed by a description of how to actually locate conical intersections using Lagrange multipliers and projected gradient techniques. Then, with this background in hand, the author provides us with several applications to show what can be done to analyze such intersections. Matsika's review of the field covers inorganic and organic molecules but focuses primarily on biologically relevant systems, especially on nucleic acids. Most of the tutorial focuses on two-state conical intersections, but a description of three-state intersections is also given. That is followed by a discussion of spin-orbit coupling that, when included in the Hamiltonian, provides new and qualitatively different effects in the radiationless behavior of chemical systems. In this part of the review, the author points out how such effects can couple states of different spin multiplicity whose intersections would otherwise not be conical, along with explaining the influence of such coupling on systems with an odd number of electrons for which there are qualitative changes in the characteristics of the intersection. Novice molecular modelers intending to carry out quantum mechanical calculations are encouraged to peruse this chapter to determine whether their systems are susceptible to nonadiabatic processes that would require evaluation of conical intersections, and then to read this tutorial to ensure that a proper treatment of the system is being pursued. We also urge the reader to see the chapter by Michael A. Robb, Marco Garavelli, Massimo Olivucci, and Fernando Bernardi in Volume 15 that also examined some of these issues.
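For orientation (a standard two-state textbook identity, not an equation quoted from the chapter), the origin of the noncrossing rule can be seen from the adiabatic energies of two coupled electronic states:

```latex
% Two-state model: adiabatic energies of two coupled states
E_{\pm} = \frac{H_{11} + H_{22}}{2}
          \pm \sqrt{\left(\frac{H_{11} - H_{22}}{2}\right)^{2} + H_{12}^{2}}
```

Degeneracy requires both H11 = H22 and H12 = 0; satisfying these two independent conditions in a molecule with N internal degrees of freedom leaves a seam of intersection of dimension N - 2, and the two remaining directions span the branching plane, in which the degeneracy lifts linearly and the surfaces take the characteristic double-cone shape that gives conical intersections their name.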

Most of us are content with computing structures, energies, and some properties of molecules in the ground or excited states. For other researchers, however, kinetic information is required when rate constants for chemical reactions must be evaluated. How does one go about computing rate constants for, say, a large catalytic system like an enzyme in which a critical step is the transfer of a light particle such as a hydride or a proton that is subject to quantum tunneling effects? In Chapter 3, Antonio Fernandez-Ramos, Benjamin Ellingson, Bruce Garrett, and Donald Truhlar provide a tutorial on the topic of variational transition state theory (VTST) with multidimensional tunneling that gives us a good starting point to answer that question. The tutorial begins with a description of conventional transition state theory (TST), highlighting the tenets upon which it is constructed, the merits of its use, and pointing out that it only provides an approximation to the true rate constant because it assesses the one-way flux through a dividing surface that is appropriate for small, classical vibrations around a saddle point. Canonical and other types of variational TSTs are introduced by the authors at this point along with highlighting the influences that quantum effects have on the reaction coordinate; a section on practical methods for quantized VTST calculations is subsequently presented to address these concerns. Here the authors cover some algorithms used to calculate the reaction path by describing the minimum energy path and an algorithm for a variational reaction path, showing us how to evaluate partition functions using both rectilinear and curvilinear coordinates, describing the influence of anharmonic vibrational levels on those partition functions, and demonstrating how to calculate the number of states needed for microcanonical variational theory calculations. The authors then focus on quantum effects on reaction coordinate motion; such effects are usually dominated by tunneling but also include nonclassical reflection, both of which are incorporated in a multiplicative transmission coefficient. In this part of the tutorial, multidimensional tunneling corrections are highlighted for the novice. Because the reaction path is a curvilinear coordinate, the curvature of that path couples motion along the reaction coordinate to local vibrational modes that are perpendicular to it. The coupling causes the system to take a shorter path than the reaction coordinate by tunneling. Both small- and large-curvature tunneling motions, with and without vibrational excitations, are compared in this part of the tutorial.

In the second part of Chapter 3, the authors deal with VTST in complex systems. Because an analytical potential energy surface (PES) is usually not available, the authors begin by describing how one can build the PES from electronic structure calculations using "on the fly" quantum methods for direct dynamics calculations, i.e., without the fitting of those energies in the form of a potential function, and then they explain how one can derive those surfaces by interpolation through the use of their multiconfiguration molecular mechanics (MCMM) algorithm and by a mapping procedure. This, in turn, is followed by a description of how to incorporate both low-level and high-level calculations to generate the PES so as to make the calculation of rate constants very fast. In this chapter, the authors also cover other topics of relevance to the prediction of accurate rate calculations, including reactions in liquids and, because there is more than one reaction coordinate, how to use ensemble-averaged VTST. Finally, the authors provide two insightful examples, one in the gas phase and the other in solution, to demonstrate the speed and accuracy of modern methods for predicting rate constants.
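As a hedged illustration of that tutorial's starting point (a generic textbook expression, not the authors' working equations), conventional TST with a multiplicative transmission coefficient can be sketched as follows; the function signature and inputs are illustrative assumptions.

```python
import math

KB = 1.380649e-23   # Boltzmann constant, J/K
H  = 6.62607015e-34 # Planck constant, J*s

def tst_rate(temperature, q_ratio, barrier_j, kappa=1.0):
    """Conventional TST rate constant with transmission coefficient kappa.

    k(T) = kappa * (kB*T/h) * (Q_TS / Q_R) * exp(-V / (kB*T))
    where q_ratio is the transition-state/reactant partition-function
    ratio (per unit volume as appropriate for the molecularity), barrier_j
    is the classical barrier height in joules per molecule, and kappa > 1
    once tunneling through the barrier dominates over classical passage.
    """
    return kappa * (KB * temperature / H) * q_ratio * math.exp(-barrier_j / (KB * temperature))

# Example: a ~48 kJ/mol barrier (8e-20 J per molecule) at room temperature
print(tst_rate(300.0, 1e-3, 8.0e-20))
```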

Chemists have traditionally worked on a scale of size ranging from Angstrom units to nanometers; we take a molecular view of the scientific problems at hand in which atomic-level detail is de rigueur. What happens, though, if the career path you take or the research project you are engaged with involves, say, long-chain polymeric systems consisting of a few thousand monomers in a melt where the relevant length scales run from bond lengths to the contour length of the chain, which is on the order of micrometers, and where the relevant relaxation times increase as N^3.4 for chains of length N? One approach to addressing such a problem is to invoke coarse-grained techniques, and in Chapter 4, Roland Faller shows us how this is accomplished. The author sets the stage for such a computational scene by first pointing out that one needs to define the system to be evaluated and then one needs to select a suitable model to combine simulations on a variety of length and time scales. An explanation is then provided about how one assigns interaction sites on the coarse-grain scale. Because it may be necessary to use two or more models to cover the range of relevant interactions of interest to the scientist or engineer, the author emphasizes that a meaningful mapping between scales is needed for meaningful results. For example, atomistic models can treat length scales from a few hundred picometers to tens of nanometers, whereas meso-scale models are useful from the multi-nanometer scale up to a few micrometers in size, but if we want to enter the realm of micrometers and beyond, a second or third mapping is needed. A brief tutorial on the various types of existing mapping strategies is given for the novice modeler. First, static mapping methods are discussed, including single-chain distribution models, iterative structural coarse-graining, and mapping onto simple models. Then the author teaches us about dynamic mapping, including mapping by chain diffusion, mapping through local correlation times, and direct mapping of the Lennard–Jones time. Following that part of the tutorial, the author describes coarse-grained Monte Carlo simulations and reverse mapping, in which atomistic detail is reintroduced at the end of the simulation. Faller then goes beyond polymers to describe examples of coarse-grain modeling of lipid bilayer systems. Nowadays the scientific community is expecting more than it has in the past from a computational chemist in terms of both the quality and the scope of the modeling endeavor. Because advances in computing machinery will not likely allow us to take a fully atomistic approach to such modeling in the next decade, this chapter, written from the perspective of an engineer, gives us the insights needed to carry out simulations on both small and large scales.
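A two-line back-of-the-envelope calculation (our illustration, using the N^3.4 scaling just quoted) shows why brute-force atomistic simulation of such melts quickly becomes hopeless:

```python
# With relaxation times scaling as tau ~ N**3.4, longer chains rapidly
# outrun what atomistic molecular dynamics can reach:
print(2.0 ** 3.4)   # doubling the chain length: ~11x longer relaxation time
print(10.0 ** 3.4)  # a 10x longer chain: ~2500x -- hence coarse-graining
```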

Many of the readers of this book series work in the pharmaceutical industry where informatics is especially relevant. Different databases are available free of charge in some cases but more usually for a fee, even if that fee comes from within the company where large investments are made in developing a proprietary database. One might want to know in advance if a given database is more diverse than another, or one might want to answer the question: "How much additional chemical diversity is added when we double the size of the current database?" Given the costs of generating compound libraries (real or virtual), answering such a question requires that a management team should have good insights about information in general; otherwise, poor decisions could have costly ripple effects that negatively influence both big and small companies alike. In Chapter 5, Jeffrey Godden and Jürgen Bajorath provide a tutorial on the analysis of information content that focuses on Shannon entropy (SE) applied to molecules. Here, any structural representation of a molecule, including the limitless number of molecular descriptors currently in use today or to be used in the future, is to be understood as a communication carrying a specific amount of information. The authors begin their tutorial by providing a historical account of how this area of informatics developed, and they explain the relationship between Shannon entropy used in the telecommunications industry and information being conveyed in a typical molecular database. Simple equations and simple examples are used to illustrate the concepts. The authors then use these concepts to show the reader how one would compare descriptors in, say, the Available Chemical Directory (ACD) with those in the Molecular Drug Data Report (MDDR). Here it is emphasized that because SE is a nonparametric distribution measure, this entropy-based approach is well suited for information content analysis of descriptors with different units, different numerical ranges, and different variability. Now one can begin answering questions such as "Which descriptors carry high levels of information for a specific compound set?" [which in turn could be used for deriving a statistically meaningful quantitative structure-activity relationship/quantitative structure-property relationship (QSAR/QSPR) model] and "Which descriptors carry low levels of information?" The authors continue their tutorial by describing the influence of boundary effects on such analyses, and they give hints about what to do and what not to do for the novice modeler who would otherwise become trapped in one or more computational pitfalls that are not visible to a beginner. An extension of the method called differential Shannon entropy (DSE) analysis is then introduced, and the reader is shown how DSE can reveal descriptors that are sensitive to systematic differences in the properties of different databases or classes of molecules. A brief glimpse into the information content of organic molecules is given, and then uses of SE in quantum mechanical calculations, molecular dynamics simulations, and other types of modeling are presented. The authors end this chapter with examples of SE and DSE analysis for the modeling of physicochemical properties and for accurate classification of molecules, a topic that is described in the following chapter.
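For readers who want to see the core quantity immediately, here is a minimal sketch of a Shannon entropy calculation for a descriptor distribution; the binning choices are illustrative assumptions, not the authors' exact protocol.

```python
import numpy as np

def shannon_entropy(values, bins=32, value_range=(0.0, 1.0)):
    """Shannon entropy (in bits) of a molecular-descriptor distribution.

    The descriptor values are histogrammed over a fixed range, and
    SE = -sum(p_i * log2 p_i) over the populated bins. A descriptor spread
    uniformly over all bins gives the maximum, log2(bins); a descriptor
    taking nearly the same value for every molecule gives a value near 0.
    """
    counts, _ = np.histogram(values, bins=bins, range=value_range)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

# A spread-out descriptor carries more information than a sharply peaked one:
rng = np.random.default_rng(0)
print(shannon_entropy(rng.uniform(0.0, 1.0, 10_000)))    # close to log2(32) = 5
print(shannon_entropy(rng.normal(0.5, 0.01, 10_000)))    # much smaller
```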

Many of us make binary, black/white, either/or type decisions every day: "Should I buy this house now or wait?", "Should I say something to my boss or not?", and so on. These types of queries are commonly posed in a scientific setting as well, where, for example, the question might be on a health-related issue like "Is this cell cancerous or not?", and in the business world, where we might ask, "Is this lead molecule toxic or not?" Robust methods for simple classification do exist. One of the more popular and successful techniques involves a group of supervised learning methods called support vector machines (SVM) that can be applied to classification as well as to regression. In Chapter 6, Ovidiu Ivanciuc covers the topic of SVMs in chemistry. Following a historical introduction that covers the development of SVM and other kernel-based techniques, the author provides a non-mathematical introduction to SVM, beginning with the classification of linearly separable classes and then continues by teaching us about partitioning classes that cannot be separated with a linear classifier, which is a situation where mapping into a high-dimensional feature space is accomplished with nonlinear functions called kernels. The author uses a simple "synthetic" dataset to demonstrate the concepts for the beginner, and he provides simple MATLAB-generated plots to illustrate what should and should not be done for both classification and regression. The next topic unveiled in this tutorial is pattern classification, which is used, for example, in clinical diagnostics and in speech analysis as well as for chemical problems where one might need to recognize, say, the provenance of agricultural products like wine, olive oil, or honey based on chemical composition or spectral analysis. Again, very simple examples and clear plots are presented to show the utility of this method for pattern classification along with restrictions on its use. Because SVMs are based on structural risk minimization, which in turn is derived from statistical learning theory, the machine algorithm is considered deterministic. Accordingly, concepts related to the expected risk or to the expected error are next introduced by describing the Vapnik–Chervonenkis dimension, a construct used to indicate how high in complexity a classifier must be to minimize the empirical risk. With this background, pattern classification with linear support machines is described for the reader, showing how to establish the optimum separation hyperplane for a given finite set of learning patterns. The equations needed to accomplish this are developed in a clear and concise manner, and again, simple examples are given for SVM classification of linearly separable data, and then for nonlinearly separable data. Because there are cases where complex relationships exist between input parameters and the class of a pattern, the author devotes a full section to nonlinear support machines, showing first how patterns are mapped to a feature space, and then describing feature functions and kernels, including linear kernels, polynomial kernels, and Gaussian and exponential radial basis function kernels, along with others like neural, Fourier series, spline, additive, and tensor product kernels. Also covered in this section of the chapter are weighted SVMs for imbalanced classification, and multiclass SVM classification. A significant portion of the review describes SVM regression. Here simple examples and clear diagrams are used to illustrate the concepts being described. This precedes a section on optimizing the SVM model, i.e., finding good prediction statistics. Given this background, the author then spends the remainder of the chapter first on practical aspects of SVM classification, providing guidelines for their use in cheminformatics and QSAR, and then on applications of SVM regression. Several examples from this section of the review include predicting the mechanism of action for polar and nonpolar narcotic compounds, classifying the carcinogenic activity of polycyclic aromatic hydrocarbons, and using SVM regression for developing a QSAR for benzodiazepine receptor ligands. The chapter ends with a literature review of SVM applications in chemistry. SVM resources on the Web are identified, and then SVM software for chemists interested in cheminformatics and computational chemistry are tabulated in a convenient, easy-to-read list that describes what those programs can do.
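As an illustrative aside (using the scikit-learn library, which is our choice for this sketch and not a package discussed in the chapter), the soft-margin, kernel-based classification workflow looks like this:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A synthetic, non-linearly-separable two-class dataset
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A Gaussian (RBF) kernel maps the patterns into a feature space where a
# separating hyperplane exists; C controls the soft-margin penalty.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # classification accuracy on held-out patterns
```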

In the final chapter, Donald B. Boyd presents a historical account of the growth of computational chemistry that covers hardware, software, events, trends, hurdles, successes, and people in the pharmaceutical industry. In the 1960s, there were no computational chemists in that industry. That term had not yet been invented. A smattering of theoretical chemists, statisticians, quantum chemists, and crystallographers were among the computer-savvy scientists at that time who set the stage for modern computational science and informatics to be played out in the pharmaceutical industry. The chapter conveys to the novice molecular modeler what it was like to rely on huge, offsite mainframes like the IBM 7094 or onsite machines used mostly for payroll and bookkeeping with little time available for scientific computing. The smell and sounds of a computer center, replete with loud chunking noises from card punch machines and high-pitched ripping sounds from line printers, are well depicted for the young reader who grew up with quiet personal computers, graphical interfaces, and the Internet, all of which were only futuristic thoughts in the 1960s. Also brought to light is the fact that the armamentarium of the computational scientist in those days consisted of programs like Extended Hückel Theory and early versions of semiempirical HF-based quantum methods like CNDO. Molecular mechanics for fast geometry optimization of pharmaceutically relevant molecules was just being developed in academic laboratories. Preparation of input data involving atomic coordinates was a tedious process because it involved using handheld mechanical models, protractors, and tables of standard bond lengths and bond angles. But despite hardware and software limitations, there were useful insights from such computational endeavors deemed valuable by management and that, in turn, led eventually to the accelerated growth of computational chemistry in the 1980s and the fruition of such research in the 1990s. Woven into this historical tapestry are the expensive threads of hardware purchases beginning with IBM and CDC mainframes and followed by interactive machines like the DEC-10, department-sized super-minicomputers like the VAX 11/780, PCs, Macintoshes, UNIX workstations, supercomputers, array processors, servers, and now clusters of PCs. Interlaced throughout this story are the historical strands of software availability and use of venerable programs like CNDO/2, MINDO/3, MOPAC, MM2, and molecular modeling packages like CHEMGRAF, SYBYL, and MacroModel along with other programs of utility to the pharmaceutical industry, including MACCS and REACCS. Along with the inanimate objects of computers and software, this chapter reveals some social dynamics involving computational chemists, medicinal chemists, and management. Stitched throughout this chapter are the nascent filaments of what we now call informatics, showing how the fabric of that industry evolved from dealing with a small number of molecules to now treating the enormous numbers of potential drug candidates coming from experimental combi-chem studies or from virtual screening by computer. This chapter conveys to the reader, in a compelling way, both the hardships and the successes of computational chemistry in the pharmaceutical industry.

Reviews in Computational Chemistry is highly rated and well received by the scientific community at large; the reason for these accomplishments rests firmly on the shoulders of the authors whom we have contacted to provide the pedagogically driven reviews that have made this ongoing book series so popular. To those authors we are especially grateful. We are also glad to note that our publisher has plans to make our most recent volumes available in an online form through Wiley InterScience. Please check the Web (http://www.interscience.wiley.com/onlinebooks) or contact [email protected] for the latest information. For readers who appreciate the permanence and convenience of bound books, these will, of course, continue. We thank the authors of this and previous volumes for their excellent chapters.

Kenny B. Lipkowitz
Washington

Thomas R. Cundari
Denton

April 2006

Contents

1. Linear-Scaling Methods in Quantum Chemistry  1
   Christian Ochsenfeld, Jörg Kussmann, and Daniel S. Lambrecht

   Introduction  1
   Some Basics of SCF Theory  4
   Direct SCF Methods and Two-Electron Integral Screening  8
   Schwarz Integral Estimates  9
   Multipole-Based Integral Estimates (MBIE)  11
   Calculation of Integrals via Multipole Expansion  15
   A First Example  16
   Derivation of the Multipole Expansion  20
   The Fast Multipole Method: Breaking the Quadratic Wall  27
   Fast Multipole Methods for Continuous Charge Distributions  32
   Other Approaches  34
   Exchange-Type Contractions  35
   The Exchange-Correlation Matrix of KS-DFT  40
   Avoiding the Diagonalization Step—Density Matrix-Based SCF  42
   General Remarks  42
   Tensor Formalism  43
   Properties of the One-Particle Density Matrix  47
   Density Matrix-Based Energy Functional  49
   "Curvy Steps" in Energy Minimization  53
   Density Matrix-Based Quadratically Convergent SCF (D-QCSCF)  55
   Implications for Linear-Scaling Calculation of SCF Energies  56
   SCF Energy Gradients  57
   Molecular Response Properties at the SCF Level  59
   Vibrational Frequencies  60
   NMR Chemical Shieldings  61
   Density Matrix-Based Coupled Perturbed SCF (D-CPSCF)  62
   Outlook on Electron Correlation Methods for Large Systems  64
   Long-Range Behavior of Correlation Effects  67
   Rigorous Selection of Transformed Products via Multipole-Based Integral Estimates (MBIE)  72
   Implications  72
   Conclusions  73
   References  74

2. Conical Intersections in Molecular Systems  83
   Spiridoula Matsika

   Introduction  83
   General Theory  85
   The Born–Oppenheimer Approximation and its Breakdown: Nonadiabatic Processes  85
   Adiabatic-Diabatic Representation  87
   The Noncrossing Rule  88
   The Geometric Phase Effect  89
   Conical Intersections and Symmetry  90
   The Branching Plane  91
   Characterizing Conical Intersections: Topography  93
   Derivative Coupling  96
   Electronic Structure Methods for Excited States  97
   Multiconfiguration Self-Consistent Field (MCSCF)  98
   Multireference Configuration Interaction (MRCI)  99
   Complete Active Space Second-Order Perturbation Theory (CASPT2)  101
   Single Reference Methods  101
   Choosing Electronic Structure Methods for Conical Intersections  102
   Locating Conical Intersections  102
   Dynamics  104
   Applications  105
   Conical Intersections in Biologically Relevant Systems  106
   Beyond the Double Cone  110
   Three-State Conical Intersections  110
   Spin-Orbit Coupling and Conical Intersections  112
   Conclusions and Future Directions  115
   Acknowledgments  116
   References  116

3. Variational Transition State Theory with Multidimensional Tunneling  125
   Antonio Fernandez-Ramos, Benjamin A. Ellingson, Bruce C. Garrett, and Donald G. Truhlar

   Introduction  125
   Variational Transition State Theory for Gas-Phase Reactions  127
   Conventional Transition State Theory  127
   Canonical Variational Transition State Theory  131
   Other Variational Transition State Theories  136
   Quantum Effects on the Reaction Coordinate  138
   Practical Methods for Quantized VTST Calculations  139
   The Reaction Path  140
   Evaluation of Partition Functions  148
   Harmonic and Anharmonic Vibrational Energy Levels  158
   Calculations of Generalized Transition State Number of States  163
   Quantum Effects on Reaction Coordinate Motion  163
   Multidimensional Tunneling Corrections Based on the Adiabatic Approximation  164
   Large Curvature Transmission Coefficient  172
   The Microcanonically Optimized Transmission Coefficient  188
   Building the PES from Electronic Structure Calculation  190
   Direct Dynamics with Specific Reaction Parameters  191
   Interpolated VTST  192
   Dual-Level Dynamics  199
   Reactions in Liquids  203
   Ensemble-Averaged Variational Transition State Theory  206
   Gas-Phase Example: H + CH4  212
   Liquid-Phase Example: Menshutkin Reaction  217
   Concluding Remarks  221
   Acknowledgments  222
   References  222

4. Coarse-Grain Modeling of Polymers  233
   Roland Faller

   Introduction  233
   Defining the System  235
   Choice of Model  235
   Interaction Sites on the Coarse-Grained Scale  237
   Static Mapping  238
   Single-Chain Distribution Potentials  238
   Simplex  239
   Iterative Structural Coarse-Graining  240
   Mapping Onto Simple Models  245
   Dynamic Mapping  246
   Mapping by Chain Diffusion  247
   Mapping through Local Correlation Times  247
   Direct Mapping of the Lennard-Jones Time  250
   Coarse-Grained Monte Carlo Simulations  250
   Reverse Mapping  252
   A Look Beyond Polymers  254
   Conclusions  257
   Acknowledgments  258
   References  258

5. Analysis of Chemical Information Content Using Shannon Entropy  263
   Jeffrey W. Godden and Jürgen Bajorath

   Introduction  263
   Shannon Entropy Concept  265
   Descriptor Comparison  269
   Influence of Boundary Effects  273
   Extension of SE Analysis for Profiling of Chemical Libraries  275
   Information Content of Organic Molecules  278
   Shannon Entropy in Quantum Mechanics, Molecular Dynamics, and Modeling  279
   Examples of SE and DSE Analysis  280
   Conclusions  286
   References  287

6. Applications of Support Vector Machines in Chemistry  291
   Ovidiu Ivanciuc

   Introduction  291
   A Nonmathematical Introduction to SVM  292
   Pattern Classification  301
   The Vapnik–Chervonenkis Dimension  306
   Pattern Classification with Linear Support Vector Machines  308
   SVM Classification for Linearly Separable Data  308
   Linear SVM for the Classification of Linearly Non-Separable Data  317
   Nonlinear Support Vector Machines  323
   Mapping Patterns to a Feature Space  323
   Feature Functions and Kernels  326
   Kernel Functions for SVM  329
   Hard Margin Nonlinear SVM Classification  334
   Soft Margin Nonlinear SVM Classification  335
   ν-SVM Classification  337
   Weighted SVM for Imbalanced Classification  338
   Multi-class SVM Classification  339
   SVM Regression  340
   Optimizing the SVM Model  347
   Descriptor Selection  347
   Support Vectors Selection  348
   Jury SVM  348
   Kernels for Biosequences  349
   Kernels for Molecular Structures  350
   Practical Aspects of SVM Classification  350
   Predicting the Mechanism of Action for Polar and Nonpolar Narcotic Compounds  352
   Predicting the Mechanism of Action for Narcotic and Reactive Compounds  355
   Predicting the Mechanism of Action from Hydrophobicity and Experimental Toxicity  359
   Classifying the Carcinogenic Activity of Polycyclic Aromatic Hydrocarbons  360
   Structure-Odor Relationships for Pyrazines  361
   Practical Aspects of SVM Regression  362
   SVM Regression QSAR for the Phenol Toxicity to Tetrahymena pyriformis  363
   SVM Regression QSAR for Benzodiazepine Receptor Ligands  366
   SVM Regression QSAR for the Toxicity of Aromatic Compounds to Chlorella vulgaris  367
   SVM Regression QSAR for Bioconcentration Factors  369
   Review of SVM Applications in Chemistry  371
   Recognition of Chemical Classes and Drug Design  371
   QSAR  376
   Genotoxicity of Chemical Compounds  378
   Chemometrics  379
   Sensors  381
   Chemical Engineering  383
   Text Mining for Scientific Information  384
   SVM Resources on the Web  385
   SVM Software  387
   Conclusions  391
   References  392

7. How Computational Chemistry Became Important in the Pharmaceutical Industry  401
   Donald B. Boyd

   Introduction  401
   Germination: The 1960s  404
   Gaining a Foothold: The 1970s  408
   Growth: The 1980s  414
   Gems Discovered: The 1990s  424
   Final Observations  437
   Acknowledgments  443
   References  443

Author Index  453

Subject Index  471

Contributors

Jürgen Bajorath, Department of Life Science Informatics, B-IT International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität, Görresstrasse 13, D-53113 Bonn, Germany (Electronic mail: bajorath@bit.uni-bonn.de)

Donald B. Boyd, Department of Chemistry and Chemical Biology, Indiana University-Purdue University at Indianapolis (IUPUI), 402 North Blackford Street, Indianapolis, Indiana 46202-3274, U.S.A. (Electronic mail: boyd@chem.iupui.edu)

Benjamin A. Ellingson, Department of Chemistry and Supercomputing Institute, University of Minnesota, 207 Pleasant Street S.E., Minneapolis, MN 55455, U.S.A. (Electronic mail: [email protected])

Roland Faller, Department of Chemical Engineering and Materials Science, University of California-Davis, 1 Shields Avenue, Davis, CA 95616, U.S.A. (Electronic mail: [email protected])

Antonio Fernandez-Ramos, Departamento de Quimica Fisica, Universidade de Santiago de Compostela, Facultade de Quimica, 15782 Santiago de Compostela, Spain (Electronic mail: [email protected])

Bruce C. Garrett, Chemical and Materials Sciences Division, Pacific Northwest National Laboratory, MS K9-90, P.O. Box 999, Richland, WA 99352, U.S.A. (Electronic mail: [email protected])

Jeffrey W. Godden, Department of Life Science Informatics, B-IT International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität, Görresstrasse 13, D-53113 Bonn, Germany (Electronic mail: godden@bit.uni-bonn.de)


Ovidiu Ivanciuc, Sealy Center for Structural Biology, Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, 301 University Boulevard, Galveston, TX 77555, U.S.A. (Electronic mail: oiivanci@utmb.edu)

Jörg Kussmann, Institut für Physikalische und Theoretische Chemie, Universität Tübingen, Auf der Morgenstelle 8, D-72076 Tübingen, Germany (Electronic mail: [email protected])

Daniel S. Lambrecht, Institut für Physikalische und Theoretische Chemie, Universität Tübingen, Auf der Morgenstelle 8, D-72076 Tübingen, Germany (Electronic mail: [email protected])

Spiridoula Matsika, Department of Chemistry, Temple University, 1901 N. 13th Street, Philadelphia, PA 19122, U.S.A. (Electronic mail: smatsika@temple.edu)

Christian Ochsenfeld, Institut für Physikalische und Theoretische Chemie, Universität Tübingen, Auf der Morgenstelle 8, D-72076 Tübingen, Germany (Electronic mail: [email protected])

Donald G. Truhlar, Department of Chemistry and Supercomputing Institute, University of Minnesota, 207 Pleasant Street S.E., Minneapolis, MN 55455, U.S.A. (Electronic mail: [email protected])

Contributors to Previous Volumes

Volume 1 (1990)

David Feller and Ernest R. Davidson, Basis Sets for Ab Initio Molecular Orbital Calculations and Intermolecular Interactions. James J. P. Stewart, Semiempirical Molecular Orbital Methods. Clifford E. Dykstra, Joseph D. Augspurger, Bernard Kirtman, and David J. Malik, Properties of Molecules by Direct Calculation. Ernest L. Plummer, The Application of Quantitative Design Strategies in Pesticide Design. Peter C. Jurs, Chemometrics and Multivariate Analysis in Analytical Chemistry. Yvonne C. Martin, Mark G. Bures, and Peter Willett, Searching Databases of Three-Dimensional Structures. Paul G. Mezey, Molecular Surfaces. Terry P. Lybrand, Computer Simulation of Biomolecular Systems Using Molecular Dynamics and Free Energy Perturbation Methods. Donald B. Boyd, Aspects of Molecular Modeling. Donald B. Boyd, Successes of Computer-Assisted Molecular Design. Ernest R. Davidson, Perspectives on Ab Initio Calculations.


Volume 2 (1991)

Andrew R. Leach, A Survey of Methods for Searching the Conformational Space of Small and Medium-Sized Molecules. John M. Troyer and Fred E. Cohen, Simplified Models for Understanding and Predicting Protein Structure. J. Phillip Bowen and Norman L. Allinger, Molecular Mechanics: The Art and Science of Parameterization. Uri Dinur and Arnold T. Hagler, New Approaches to Empirical Force Fields. Steve Scheiner, Calculating the Properties of Hydrogen Bonds by Ab Initio Methods. Donald E. Williams, Net Atomic Charge and Multipole Models for the Ab Initio Molecular Electric Potential. Peter Politzer and Jane S. Murray, Molecular Electrostatic Potentials and Chemical Reactivity. Michael C. Zerner, Semiempirical Molecular Orbital Methods. Lowell H. Hall and Lemont B. Kier, The Molecular Connectivity Chi Indexes and Kappa Shape Indexes in Structure-Property Modeling. I. B. Bersuker and A. S. Dimoglo, The Electron-Topological Approach to the QSAR Problem. Donald B. Boyd, The Computational Chemistry Literature.

Volume 3 (1992)

Tamar Schlick, Optimization Methods in Computational Chemistry. Harold A. Scheraga, Predicting Three-Dimensional Structures of Oligopeptides. Andrew E. Torda and Wilfred F. van Gunsteren, Molecular Modeling Using NMR Data. David F. V. Lewis, Computer-Assisted Methods in the Evaluation of Chemical Toxicity.


Volume 4 (1993)

Jerzy Cioslowski, Ab Initio Calculations on Large Molecules: Methodology and Applications. Michael L. McKee and Michael Page, Computing Reaction Pathways on Molecular Potential Energy Surfaces. Robert M. Whitnell and Kent R. Wilson, Computational Molecular Dynamics of Chemical Reactions in Solution. Roger L. DeKock, Jeffry D. Madura, Frank Rioux, and Joseph Casanova, Computational Chemistry in the Undergraduate Curriculum.

Volume 5 (1994)

John D. Bolcer and Robert B. Hermann, The Development of Computational Chemistry in the United States. Rodney J. Bartlett and John F. Stanton, Applications of Post-Hartree–Fock Methods: A Tutorial. Steven M. Bachrach, Population Analysis and Electron Densities from Quantum Mechanics. Jeffry D. Madura, Malcolm E. Davis, Michael K. Gilson, Rebecca C. Wade, Brock A. Luty, and J. Andrew McCammon, Biological Applications of Electrostatic Calculations and Brownian Dynamics Simulations. K. V. Damodaran and Kenneth M. Merz Jr., Computer Simulation of Lipid Systems. Jeffrey M. Blaney and J. Scott Dixon, Distance Geometry in Molecular Modeling. Lisa M. Balbes, S. Wayne Mascarella, and Donald B. Boyd, A Perspective of Modern Methods in Computer-Aided Drug Design.

Volume 6 (1995)

Christopher J. Cramer and Donald G. Truhlar, Continuum Solvation Models: Classical and Quantum Mechanical Implementations.


Clark R. Landis, Daniel M. Root, and Thomas Cleveland, Molecular Mechanics Force Fields for Modeling Inorganic and Organometallic Compounds. Vassilios Galiatsatos, Computational Methods for Modeling Polymers: An Introduction. Rick A. Kendall, Robert J. Harrison, Rik J. Littlefield, and Martyn F. Guest, High Performance Computing in Computational Chemistry: Methods and Machines. Donald B. Boyd, Molecular Modeling Software in Use: Publication Trends. Eiji Ōsawa and Kenny B. Lipkowitz, Appendix: Published Force Field Parameters.

Volume 7 (1996)

Geoffrey M. Downs and Peter Willett, Similarity Searching in Databases of Chemical Structures. Andrew C. Good and Jonathan S. Mason, Three-Dimensional Structure Database Searches. Jiali Gao, Methods and Applications of Combined Quantum Mechanical and Molecular Mechanical Potentials. Libero J. Bartolotti and Ken Flurchick, An Introduction to Density Functional Theory. Alain St-Amant, Density Functional Methods in Biomolecular Modeling. Danya Yang and Arvi Rauk, The A Priori Calculation of Vibrational Circular Dichroism Intensities. Donald B. Boyd, Appendix: Compendium of Software for Molecular Modeling.

Volume 8 (1996)

Zdenek Slanina, Shyi-Long Lee, and Chin-hui Yu, Computations in Treating Fullerenes and Carbon Aggregates.


Gernot Frenking, Iris Antes, Marlis Böhme, Stefan Dapprich, Andreas W. Ehlers, Volker Jonas, Arndt Neuhaus, Michael Otto, Ralf Stegmann, Achim Veldkamp, and Sergei F. Vyboishchikov, Pseudopotential Calculations of Transition Metal Compounds: Scope and Limitations. Thomas R. Cundari, Michael T. Benson, M. Leigh Lutz, and Shaun O. Sommerer, Effective Core Potential Approaches to the Chemistry of the Heavier Elements. Jan Almlöf and Odd Gropen, Relativistic Effects in Chemistry. Donald B. Chesnut, The Ab Initio Computation of Nuclear Magnetic Resonance Chemical Shielding.

Volume 9 (1996)

James R. Damewood, Jr., Peptide Mimetic Design with the Aid of Computational Chemistry. T. P. Straatsma, Free Energy by Molecular Simulation. Robert J. Woods, The Application of Molecular Modeling Techniques to the Determination of Oligosaccharide Solution Conformations. Ingrid Pettersson and Tommy Liljefors, Molecular Mechanics Calculated Conformational Energies of Organic Molecules: A Comparison of Force Fields. Gustavo A. Arteca, Molecular Shape Descriptors.

Volume 10 (1997)

Richard Judson, Genetic Algorithms and Their Use in Chemistry. Eric C. Martin, David C. Spellmeyer, Roger E. Critchlow Jr., and Jeffrey M. Blaney, Does Combinatorial Chemistry Obviate Computer-Aided Drug Design? Robert Q. Topper, Visualizing Molecular Phase Space: Nonstatistical Effects in Reaction Dynamics. Raima Larter and Kenneth Showalter, Computational Studies in Nonlinear Dynamics.


Stephen J. Smith and Brian T. Sutcliffe, The Development of Computational Chemistry in the United Kingdom.

Volume 11 (1997)

Mark A. Murcko, Recent Advances in Ligand Design Methods. David E. Clark, Christopher W. Murray, and Jin Li, Current Issues in De Novo Molecular Design. Tudor I. Oprea and Chris L. Waller, Theoretical and Practical Aspects of Three-Dimensional Quantitative Structure–Activity Relationships. Giovanni Greco, Ettore Novellino, and Yvonne Connolly Martin, Approaches to Three-Dimensional Quantitative Structure–Activity Relationships. Pierre-Alain Carrupt, Bernard Testa, and Patrick Gaillard, Computational Approaches to Lipophilicity: Methods and Applications. Ganesan Ravishanker, Pascal Auffinger, David R. Langley, Bhyravabhotla Jayaram, Matthew A. Young, and David L. Beveridge, Treatment of Counterions in Computer Simulations of DNA. Donald B. Boyd, Appendix: Compendium of Software and Internet Tools for Computational Chemistry.

Volume 12 (1998)

Hagai Meirovitch, Calculation of the Free Energy and the Entropy of Macromolecular Systems by Computer Simulation. Ramzi Kutteh and T. P. Straatsma, Molecular Dynamics with General Holonomic Constraints and Application to Internal Coordinate Constraints. John C. Shelley and Daniel R. Bérard, Computer Simulation of Water Physisorption at Metal–Water Interfaces. Donald W. Brenner, Olga A. Shenderova, and Denis A. Areshkin, Quantum-Based Analytic Interatomic Forces and Materials Simulation. Henry A. Kurtz and Douglas S. Dudis, Quantum Mechanical Methods for Predicting Nonlinear Optical Properties. Chung F. Wong, Tom Thacher, and Herschel Rabitz, Sensitivity Analysis in Biomolecular Simulation.


Paul Verwer and Frank J. J. Leusen, Computer Simulation to Predict Possible Crystal Polymorphs. Jean-Louis Rivail and Bernard Maigret, Computational Chemistry in France: A Historical Survey.

Volume 13 (1999)

Thomas Bally and Weston Thatcher Borden, Calculations on Open-Shell Molecules: A Beginner's Guide. Neil R. Kestner and Jaime E. Combariza, Basis Set Superposition Errors: Theory and Practice. James B. Anderson, Quantum Monte Carlo: Atoms, Molecules, Clusters, Liquids, and Solids. Anders Wallqvist and Raymond D. Mountain, Molecular Models of Water: Derivation and Description. James M. Briggs and Jan Antosiewicz, Simulation of pH-dependent Properties of Proteins Using Mesoscopic Models. Harold E. Helson, Structure Diagram Generation.

Volume 14 (2000)

Michelle Miller Francl and Lisa Emily Chirlian, The Pluses and Minuses of Mapping Atomic Charges to Electrostatic Potentials. T. Daniel Crawford and Henry F. Schaefer III, An Introduction to Coupled Cluster Theory for Computational Chemists. Bastiaan van de Graaf, Swie Lan Njo, and Konstantin S. Smirnov, Introduction to Zeolite Modeling. Sarah L. Price, Toward More Accurate Model Intermolecular Potentials For Organic Molecules. Christopher J. Mundy, Sundaram Balasubramanian, Ken Bagchi, Mark E. Tuckerman, Glenn J. Martyna, and Michael L. Klein, Nonequilibrium Molecular Dynamics. Donald B. Boyd and Kenny B. Lipkowitz, History of the Gordon Research Conferences on Computational Chemistry.


Mehran Jalaie and Kenny B. Lipkowitz, Appendix: Published Force Field Parameters for Molecular Mechanics, Molecular Dynamics, and Monte Carlo Simulations.

Volume 15 (2000)

F. Matthias Bickelhaupt and Evert Jan Baerends, Kohn-Sham Density Functional Theory: Predicting and Understanding Chemistry. Michael A. Robb, Marco Garavelli, Massimo Olivucci, and Fernando Bernardi, A Computational Strategy for Organic Photochemistry. Larry A. Curtiss, Paul C. Redfern, and David J. Frurip, Theoretical Methods for Computing Enthalpies of Formation of Gaseous Compounds. Russell J. Boyd, The Development of Computational Chemistry in Canada.

Volume 16 (2000)

Richard A. Lewis, Stephen D. Pickett, and David E. Clark, Computer-Aided Molecular Diversity Analysis and Combinatorial Library Design. Keith L. Peterson, Artificial Neural Networks and Their Use in Chemistry. Jörg-Rüdiger Hill, Clive M. Freeman, and Lalitha Subramanian, Use of Force Fields in Materials Modeling. M. Rami Reddy, Mark D. Erion, and Atul Agarwal, Free Energy Calculations: Use and Limitations in Predicting Ligand Binding Affinities.

Volume 17 (2001)

Ingo Muegge and Matthias Rarey, Small Molecule Docking and Scoring. Lutz P. Ehrlich and Rebecca C. Wade, Protein-Protein Docking. Christel M. Marian, Spin-Orbit Coupling in Molecules. Lemont B. Kier, Chao-Kun Cheng, and Paul G. Seybold, Cellular Automata Models of Aqueous Solution Systems. Kenny B. Lipkowitz and Donald B. Boyd, Appendix: Books Published on the Topics of Computational Chemistry.


Volume 18 (2002)

Geoff M. Downs and John M. Barnard, Clustering Methods and Their Uses in Computational Chemistry. Hans-Joachim Böhm and Martin Stahl, The Use of Scoring Functions in Drug Discovery Applications. Steven W. Rick and Steven J. Stuart, Potentials and Algorithms for Incorporating Polarizability in Computer Simulations. Dmitry V. Matyushov and Gregory A. Voth, New Developments in the Theoretical Description of Charge-Transfer Reactions in Condensed Phases. George R. Famini and Leland Y. Wilson, Linear Free Energy Relationships Using Quantum Mechanical Descriptors. Sigrid D. Peyerimhoff, The Development of Computational Chemistry in Germany. Donald B. Boyd and Kenny B. Lipkowitz, Appendix: Examination of the Employment Environment for Computational Chemistry.

Volume 19 (2003)

Robert Q. Topper, David L. Freeman, Denise Bergin, and Keirnan R. LaMarche, Computational Techniques and Strategies for Monte Carlo Thermodynamic Calculations, with Applications to Nanoclusters. David E. Smith and Anthony D. J. Haymet, Computing Hydrophobicity. Lipeng Sun and William L. Hase, Born-Oppenheimer Direct Dynamics Classical Trajectory Simulations. Gene Lamm, The Poisson–Boltzmann Equation.

Volume 20 (2004)

Sason Shaik and Philippe C. Hiberty, Valence Bond Theory: Its History, Fundamentals and Applications. A Primer. Nikita Matsunaga and Shiro Koseki, Modeling of Spin Forbidden Reactions.


Stefan Grimme, Calculation of the Electronic Spectra of Large Molecules. Raymond Kapral, Simulating Chemical Waves and Patterns. Costel Sârbu and Horia Pop, Fuzzy Soft-Computing Methods and Their Applications in Chemistry. Sean Ekins and Peter Swaan, Development of Computational Models for Enzymes, Transporters, Channels and Receptors Relevant to ADME/Tox.

Volume 21 (2005)

Roberto Dovesi, Bartolomeo Civalleri, Roberto Orlando, Carla Roetti, and Victor R. Saunders, Ab Initio Quantum Simulation in Solid State Chemistry. Patrick Bultinck, Xavier Gironés, and Ramon Carbó-Dorca, Molecular Quantum Similarity: Theory and Applications. Jean-Loup Faulon, Donald P. Visco, Jr., and Diana Roe, Enumerating Molecules. David J. Livingstone and David W. Salt, Variable Selection - Spoilt for Choice. Nathan A. Baker, Biomolecular Applications of Poisson-Boltzmann Methods. Baltazar Aguda, Georghe Craciun, and Rengul Cetin-Atalay, Data Sources and Computational Approaches for Generating Models of Gene Regulatory Networks.

Volume 22 (2006)

Patrice Koehl, Protein Structure Classification. Emilio Esposito, Dror Tobi, and Jeffry Madura, Comparative Protein Modeling. Joan-Emma Shea, Miriam Friedel, and Andrij Baumketner, Simulations of Protein Folding. Marco Saraniti, Shela Aboud, and Robert Eisenberg, The Simulation of Ionic Charge Transport in Biological Ion Channels: An Introduction to Numerical Methods. C. Matthew Sundling, Nagamani Sukumar, Hongmei Zhang, Curt Breneman, and Mark Embrechts, Wavelets in Chemistry and Chemoinformatics.

CHAPTER 1

Linear-Scaling Methods in Quantum Chemistry

Christian Ochsenfeld, Jörg Kussmann, and Daniel S. Lambrecht
Institut für Physikalische und Theoretische Chemie, Universität Tübingen, D-72076 Tübingen, Germany

INTRODUCTION

With the introduction of the Schrödinger equation in 1926,1 it was in principle clear how to describe a molecular system and its properties exactly in a nonrelativistic sense. However, for most molecular systems of chemical interest, the analytic solution of the Schrödinger equation is not possible. Therefore, since 1926, a multitude of hierarchical approximations (some of which are displayed in Figure 1) have been devised that allow for a systematic approach to the exact solution of the Schrödinger equation. Although the Schrödinger equation as the fundamental equation in electronic structure theory is already quite old, the field of quantum chemistry is still fairly young and fast moving, and much can be expected in the future for developing and applying quantum chemical methods for the treatment of molecular systems. The importance of the systematic hierarchy for solving the Schrödinger equation cannot be overemphasized, because it allows one, in principle, to systematically approach the exact result for a molecular property of interest. The simplest approach in this hierarchy is the Hartree–Fock (HF) method, which describes electron–electron interactions within a mean-field approach.2–4 The electron-correlation effects neglected in this approach can be described by the so-called post-HF methods, with prominent examples such as perturbation theory (e.g., MP2: Møller–Plesset second-order perturbation theory5) or the coupled-cluster (CC) expansion (see, e.g., Ref. 6 for a review; CCSD: CC singles doubles; CCSD(T): CCSD with perturbative triples; or CCSDT: CC singles doubles triples). In this way, the hierarchy of ab initio methods allows for reliable "measurements" and for estimating the error bars of simpler approximations. In Figure 1, we also list density functional theory (DFT),7–9 although it does not provide (at least in its current form) a systematic way of improving upon the result. Despite this deficiency in its current form, DFT has pragmatically proven to be highly useful for the description of many molecular systems, while offering a good compromise between accuracy and computational cost. Therefore, DFT has become a standard tool of modern quantum chemistry.

Figure 1 The hierarchy of ab initio methods: A selection of common approximations for solving the electronic Schrödinger equation is displayed. In addition, the asymptotic scaling order O(·) with respect to molecular size M is listed.

The main difficulty associated with the hierarchy of quantum chemical methods is the strong increase of the computational effort with molecular size (M) (compare Figures 1 and 2), especially when approaching the exact solution. Even the simplest approach, the HF method,4 scales conventionally as O(M^3), where O(·) denotes the order of the asymptotic scaling behavior. This means that when choosing another molecule to study that is 10 times larger than the current molecule, the computational effort is increased by a factor of 1000. The increase becomes even more dramatic if the electron correlation effects neglected in the HF approach are accounted for by, e.g., MP2 or CCSD, for which the scaling behavior is O(M^5) or O(M^6), respectively. The O(M^6) scaling entails an increase of the computational effort by a factor of 1,000,000 for a 10-fold larger system. At this stage it is worthwhile to spend some time to clarify the scaling behavior.

Figure 2 The computation time behaves approximately as: computation time $= a \cdot M^n$. Here, $M^n$ is called the scaling behavior, and $a$ is the prefactor. The graphs provide a schematic comparison of computation times for (a) different scaling behaviors [$O(M^5)$, $O(M^3)$, and $O(M)$] and (b) different prefactors ($a = 1$ and $a = 2.7$, for $M$ and $M^2$ scaling).

The focus of this chapter is on methods whose computational effort increases only linearly with molecular size $M$ (defined by, e.g., the number of atoms), while the atom-centered basis-set framework is retained. Within a given atomic-orbital (AO) basis, the total number of basis functions ($N$) scales in the same way with molecular size, so the scaling behavior can equally be described in terms of the number of AOs. However, increasing the number of basis functions for a specific molecule would typically not lead to linear-scaling behavior: The size of the atom-centered basis simply determines the prefactor of the calculation (i.e., the constant factor by which the scaling behavior is multiplied; see Figure 2). In the current tutorial, we therefore mainly employ the molecule size $M$ to describe the scaling property. To illustrate how prohibitive even an $O(M^3)$ scaling would be for the calculation of large molecules, consider "Moore's law,"10 the empirical observation proposed in 1965 that computer speed roughly doubles every 1.5 years; as a rule of thumb, it has held astonishingly well over the last decades. The factor of 1000 for a 10-fold larger molecule is roughly $2^{10}$, which, under Moore's assumption, corresponds to 15 years of computer development, whereas an $O(M^6)$ scaling would require 30 years. In other words, one would need to wait 15 years for computers to evolve before an HF calculation on a 10-fold larger molecule could be performed within the same time frame. This is clearly not an option for any enthusiastic researcher attempting to gain deeper insights into molecular processes in chemistry, biochemistry, or even biology. Therefore, the aim of this didactical review is to provide some insights into reducing the scaling behavior of quantum chemical methods so that they scale linearly with molecular size. In this way, any increase in computer speed translates directly into an increase in the treatable molecular size.
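The arithmetic of this argument is easy to make concrete. The following minimal Python sketch (our own illustration, not part of any quantum chemistry package) converts a scaling order and a size factor into the number of years of Moore's-law hardware development needed to compensate:

```python
import math

def moore_years(scaling_order, size_factor=10.0, doubling_time=1.5):
    """Years of Moore's-law growth needed to absorb an O(M^n) cost increase."""
    cost_factor = size_factor ** scaling_order   # e.g., 10^3 = 1000 for O(M^3)
    doublings = math.log2(cost_factor)           # 1000 is roughly 2^10 doublings
    return doublings * doubling_time

print(moore_years(3))   # ~15 years for O(M^3)
print(moore_years(6))   # ~30 years for O(M^6)
print(moore_years(1))   # ~5 years for O(M): cost grows only 10-fold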


The focus of this review is on presenting some basic ideas of these linear-scaling methods, without giving a complete overview of the many different approaches introduced in the literature. For basic aspects of quantum mechanics and quantum chemical methods, the reader is referred to the textbook literature, e.g., Refs. 4 and 11–13. In this chapter, we describe mainly linear-scaling self-consistent field (SCF) methods such as HF and DFT, which are closely related in the way energies, energy gradients, and molecular properties are computed. With these linear-scaling methods, molecular systems with more than 1000 atoms can nowadays be computed on simple one-processor workstations. In addition, we provide a brief outlook concerning electron-correlation methods and what might be expected in the future for reducing their scaling behavior while preserving rigorous error bounds. The review is structured as follows:

- After a brief introduction to some basics of SCF theories, we describe in the following four sections how Fock-type matrices can be built in a linear-scaling fashion, which is one of the key issues in SCF theories.
- The reduction of the scaling for forming Fock-type matrices then creates the need to avoid the second rate-determining step in SCF energy computations, the cubically scaling diagonalization.
- With the described methods, the linear-scaling calculation of SCF energies becomes possible. However, for characterizing stationary points on potential energy surfaces, the calculation of energy gradients is crucial; this is described in the succeeding section.
- To establish a link to experimental studies, the computation of response properties is often very important. Examples include vibrational frequencies or nuclear magnetic resonance (NMR) chemical shifts, for which the response of the one-particle density matrix with respect to a perturbation needs to be computed. We therefore describe ways to reduce the strong increase of the computational effort with molecular size.
- Finally, we provide in the last section a brief outlook on the long-range behavior of electron-correlation effects for the example of MP2 theory and show how significant contributions to the correlation energy can be preselected, so that the scaling behavior can be reduced to linear.

SOME BASICS OF SCF THEORY

The simplest approximation used to solve the time-independent Schrödinger equation

$$\hat{H}\Psi = E\Psi \qquad [1]$$


within the commonly used Born–Oppenheimer approach14,15 of clamped nuclei and the electronic Hamiltonian

$$\hat{H}_{\mathrm{el}} = -\frac{1}{2}\sum_i \nabla_i^2 - \sum_i \sum_A \frac{Z_A}{r_{iA}} + \sum_i \sum_{j>i} \frac{1}{r_{ij}} = \sum_i \hat{h}_i + \sum_i \sum_{j>i} \frac{1}{r_{ij}} \qquad [2]$$

is the expansion of the wave function in a Slater determinant16 as an antisymmetrized product of one-particle functions $\varphi_i$ (spin orbitals):

$$\Psi(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) = |\varphi_1 \varphi_2 \cdots \varphi_N\rangle = \frac{1}{\sqrt{N!}} \begin{vmatrix} \varphi_1(\mathbf{r}_1) & \varphi_2(\mathbf{r}_1) & \cdots & \varphi_N(\mathbf{r}_1) \\ \varphi_1(\mathbf{r}_2) & \varphi_2(\mathbf{r}_2) & \cdots & \varphi_N(\mathbf{r}_2) \\ \vdots & \vdots & \ddots & \vdots \\ \varphi_1(\mathbf{r}_N) & \varphi_2(\mathbf{r}_N) & \cdots & \varphi_N(\mathbf{r}_N) \end{vmatrix} \qquad [3]$$

With this expansion for the wave function, the expectation value using the electronic Hamiltonian (Eq. [2]) can be calculated using the Slater–Condon rules.4 The result is (in Dirac notation):

$$E_{\mathrm{HF}} = \sum_i \langle \varphi_i | \hat{h} | \varphi_i \rangle + \frac{1}{2} \sum_i \sum_j \langle \varphi_i \varphi_j || \varphi_i \varphi_j \rangle \qquad [4]$$

Minimizing the HF expectation value (Eq. [4]) with respect to orbital rotations while imposing orthonormality constraints leads to the well-known HF equation:2–4

$$\hat{F} \varphi_i = \varepsilon_i \varphi_i \qquad [5]$$

with $\hat{F}$ as the Fock operator and $\varepsilon_i$ as the orbital energy. To algebraize this equation and allow for a suitable solution on computers, it is necessary to expand the one-particle functions in a set of fixed basis functions $\chi_\mu$ (typically contracted Gaussian basis functions are used in quantum chemistry):

$$\varphi_i = \sum_\mu C_{\mu i}\, \chi_\mu \qquad [6]$$

leading to the Roothaan–Hall equations17,18

$$\mathbf{F}\mathbf{C} = \mathbf{S}\mathbf{C}\boldsymbol{\varepsilon} \qquad [7]$$


where $\mathbf{F}$ is the Fock matrix, $\mathbf{S}$ the overlap matrix, $\mathbf{C}$ the coefficient matrix of the molecular orbitals (MOs), and $\boldsymbol{\varepsilon}$ the diagonal matrix of the molecular-orbital energies. The Fock matrix of a closed-shell molecule is built by contracting the one-particle density matrix

$$P_{\mu\nu} = \sum_i^{N_{\mathrm{occ}}} C_{\mu i}\, C_{\nu i} \qquad [8]$$

with the four-center two-electron integrals and adding the one-electron part $h_{\mu\nu}$:

$$F_{\mu\nu} = h_{\mu\nu} + \sum_{\lambda\sigma} P_{\lambda\sigma} \left[ 2(\mu\nu|\lambda\sigma) - (\mu\sigma|\lambda\nu) \right] \qquad [9]$$

We use the Mulliken notation for two-electron integrals over (real-valued) Gaussian atomic basis functions in the following:

$$(\mu\nu|\lambda\sigma) = \int \chi_\mu(\mathbf{r}_1)\,\chi_\nu(\mathbf{r}_1)\, \frac{1}{r_{12}}\, \chi_\lambda(\mathbf{r}_2)\,\chi_\sigma(\mathbf{r}_2)\; d\mathbf{r}_1\, d\mathbf{r}_2 \qquad [10]$$
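As a compact illustration of Eqs. [8]–[10], the following Python sketch builds the closed-shell Fock matrix from a hypothetical dense four-index ERI tensor stored in Mulliken order, eri[m, n, l, s] = (mn|ls). All array names are our own, and no screening or symmetry is exploited; this is the naive reference algorithm whose cost the rest of the chapter aims to reduce:

```python
import numpy as np

def fock_matrix(h, C_occ, eri):
    """h: one-electron matrix; C_occ: occupied MO coefficients; eri[m,n,l,s] = (mn|ls)."""
    P = C_occ @ C_occ.T                    # Eq. [8]: P_mn = sum_i C_mi C_ni
    J = np.einsum('mnls,ls->mn', eri, P)   # Coulomb part, sum_ls P_ls (mn|ls)
    K = np.einsum('msln,ls->mn', eri, P)   # exchange part, sum_ls P_ls (ms|ln)
    return h + 2.0 * J - K                 # Eq. [9]
```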

Because the Fock matrix depends on the one-particle density matrix $\mathbf{P}$, which is constructed conventionally from the MO coefficient matrix $\mathbf{C}$ as the solution of the pseudo-eigenvalue problem (Eq. [7]), the SCF equations need to be solved iteratively. The same holds for Kohn–Sham density functional theory (KS-DFT),8,9 where the exchange part of the Fock matrix (Eq. [9]) is at least partly replaced by a so-called exchange-correlation functional term. For both HF and DFT, Eq. [7] needs to be solved self-consistently, and accordingly, these methods are denoted as SCF methods. Two rate-determining steps occur in the iterative SCF procedure. The first is the formation of the Fock matrix, and the second is the solution of the pseudo-eigenvalue problem. The latter step is conventionally done as a diagonalization to solve the generalized eigenvalue problem (Eq. [7]), and thus, the computational effort of conventional SCF scales cubically with system size, $O(M^3)$. The construction of the Fock matrix scales formally as $M^4$ (or, more precisely, $N^4$; see the discussion above) because the two-electron integrals are four-index quantities. However, the asymptotic scaling of the number of two-electron integrals reduces to $O(M^2)$ for larger molecular systems. This can be understood by considering the following example: The charge distribution of electron 1 in a two-electron integral (Eq. [10]) is described by the product of basis functions $\chi_\mu \cdot \chi_\nu$. If we consider a selected basis function $\chi_\mu$, then only basis functions $\chi_\nu$ that are "close" to the center of $\chi_\mu$ will form nonvanishing charge distributions.


This is because the Gaussian basis functions decay exponentially with distance. Therefore, the number of basis functions $\chi_\nu$ overlapping with the function $\chi_\mu$ will asymptotically (for large molecules) remain constant with increasing molecular size (one can imagine a "sphere" around the selected basis function, as shown in Figure 3). Overall, there are $O(M)$ basis-function pairs describing each of the two electrons, so that a total of $O(M^2)$ two-electron integrals results:

$$\int \underbrace{\chi_\mu(\mathbf{r}_1)\,\chi_\nu(\mathbf{r}_1)}_{O(M)}\; \frac{1}{r_{12}}\; \underbrace{\chi_\lambda(\mathbf{r}_2)\,\chi_\sigma(\mathbf{r}_2)}_{O(M)}\; d\mathbf{r}_1\, d\mathbf{r}_2 \qquad [11]$$

Figure 3 Illustration of the basis-function pair domain behavior. For a given basis function (or shell) $\mu$, only those $\nu$ must be considered whose overlap integrals $S_{\mu\nu} = (\mu|\nu)$ exceed a certain threshold. In the upper graph, the value of the overlap integral is depicted as a function of the $\mu$–$\nu$ distance. In the lower graph, we consider a linear chain of 15 Gaussian functions (circles) at selected points in space. For $\mu = 8$ (center atom) and a numerical threshold of $10^{-7}$, only the shell pairs closer than 4.01 a.u. = 2.12 Å ($\nu$ = 4–12) are numerically significant (shaded area); all other $\nu$ (1–3, 13–15) are numerically insignificant for the formation of the selected charge distribution $\Omega_{\mu\nu}$ and may be neglected. (The chosen spacing is 1 a.u.; each Gaussian is an s function of unit exponent.)


Although the diagonalization of the Fock matrix scales cubically with system size, as compared with the quadratic scaling for the formation of the Fock matrix, its prefactor is rather small, as shown schematically in Figure 4. Therefore, the diagonalization dominates only for large molecules and/or for fast Fock-formation methods, as described later.
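For orientation, this is what the conventional, cubically scaling step looks like in practice: a minimal sketch of one Roothaan–Hall iteration (Eq. [7]) using SciPy's generalized symmetric eigensolver. The function name and the simple aufbau occupation are our own illustrative choices:

```python
from scipy.linalg import eigh

def roothaan_hall_step(F, S, n_occ):
    """Solve FC = SCe and rebuild the density; the O(M^3) cost lives in eigh."""
    eps, C = eigh(F, S)          # generalized symmetric eigenproblem with metric S
    C_occ = C[:, :n_occ]         # occupy the n_occ energetically lowest MOs
    P = C_occ @ C_occ.T          # Eq. [8]
    return P, eps, C
```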

DIRECT SCF METHODS AND TWO-ELECTRON INTEGRAL SCREENING

In "nondirect" SCF methods, all two-electron integrals are calculated once, stored on disk, and later reused in the subsequent SCF iterations. Because the number of integrals scales formally as $M^4$, storing and retrieving the two-electron integrals is an extremely expensive step as far as disk space and input/output (I/O) time are concerned. For large molecules, the required disk space (and calculation time; see the discussion below) easily exceeds all available capacities. Almlöf et al.19 observed in a seminal paper that recomputing integrals whenever needed, rather than storing them on disk, can not only be competitive with the methods used so far, but can even surpass them in computational and storage efficiency.

Figure 4 Typical timing behavior of the quadratically scaling Fock matrix formation versus the cubically scaling diagonalization step (small prefactor) in SCF energy calculations. Depicted are CPU times (in hours) versus the number of basis functions for a conventional Fock matrix formation, the linear-scaling CFMM/LinK schemes (as explained later in this review), and the Fock matrix diagonalization, for a series of DNA molecules (A–T)$_n$, $n$ = 1–16. The integral threshold is $10^{-6}$; the basis set is 6-31G*.


Direct schemes have two advantages: First, storage requirements are greatly decreased; second, calculations for large molecules can actually be made much faster than with nondirect methods, because within the SCF iterations, information on the locality of the molecular system (via the one-particle density matrix; see the discussion below) can be exploited. In this way, the introduction of the "direct" SCF approach constitutes an important step toward the applicability of quantum chemical methods to large molecules. The formation of Fock-type matrices can be schematically divided into two steps:

- Selection of the numerically significant integrals: integral screening.
- Calculation of the integrals and formation of the final matrices.

In the following section, we focus first on the selection of numerically significant integrals; later, we discuss the different contractions of the two-electron integrals.

Schwarz Integral Estimates

Although the asymptotic $O(M^2)$ scaling of the four-center two-electron integrals had been known at least since 1973,20 it was only in 1989, in the seminal work of Häser and Ahlrichs,21 that an efficient and widely accepted way of rigorously preselecting the numerically significant two-electron integrals was introduced:

$$|(\mu\nu|\lambda\sigma)| \leq |(\mu\nu|\mu\nu)|^{1/2} \cdot |(\lambda\sigma|\lambda\sigma)|^{1/2} = Q_{\mu\nu}\, Q_{\lambda\sigma} \qquad [12]$$

This so-called Schwarz integral screening provides a rigorous upper bound to the four-index integrals, while requiring just the computation of simple two-index quantities. In this way, small four-index integrals below a chosen threshold can be neglected, and the formal $M^4$ scaling associated with the formation of the Fock matrix in HF theory is reduced to $O(M^2)$. As we will see shortly, this was a breakthrough for direct SCF methods19,21,22 and increased the applicability of SCF methods dramatically. In contrast to computing the two-electron integrals before an SCF calculation, the key feature of the direct SCF method is the recalculation of the two-electron integrals in each SCF iteration. Although this recalculation has the disadvantage of computing the integrals multiple times for building the Coulomb and exchange parts (denoted J and K) of the Hamiltonian,

$$J_{\mu\nu} - K_{\mu\nu} = \sum_{\lambda\sigma} P_{\lambda\sigma}\left[2(\mu\nu|\lambda\sigma) - (\mu\sigma|\lambda\nu)\right] \qquad [13]$$


it not only avoids the bottleneck of storing the huge number of four-center two-electron integrals, but it also becomes possible to screen the two-electron integrals in combination with the corresponding one-particle density matrix $\mathbf{P}$ available in each iteration. For example, when calculating the Coulomb part $J_{\mu\nu}$, integrals are neglected if their contributions to the Coulomb matrix are below a selected threshold of $10^{-\vartheta}$:

$$\text{neglect } (\mu\nu|\lambda\sigma) \quad \text{if} \quad |P_{\lambda\sigma}| \cdot Q_{\mu\nu}\, Q_{\lambda\sigma} \leq 10^{-\vartheta} \qquad [14]$$

An analogous screening criterion may be formulated for the exchange matrix $K_{\mu\nu}$.
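A minimal sketch of how Eqs. [12] and [14] are typically used: Precompute the two-index quantities Q and skip every integral quadruple whose density-weighted Schwarz bound falls below the threshold. Real codes loop over shell pairs sorted by the magnitude of Q rather than over raw basis-function quadruples; the brute-force loops here are only for clarity, and all names are illustrative:

```python
def coulomb_significant(Q, P, thresh=1e-10):
    """Yield basis-function quadruples (m, n, l, s) surviving Eq. [14]."""
    N = Q.shape[0]
    for m in range(N):
        for n in range(N):
            for l in range(N):
                for s in range(N):
                    # rigorous upper bound to |P_ls (mn|ls)|, Eqs. [12] + [14]
                    if abs(P[l, s]) * Q[m, n] * Q[l, s] > thresh:
                        yield m, n, l, s
```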

It has to be pointed out that in nondirect SCF, the density matrix is not available for screening, because all integrals are calculated prior to the SCF run; Eq. [12] has to be employed for screening instead of Eq. [14]. This procedure has a severe disadvantage: Although an integral $(\mu\nu|\lambda\sigma)$ itself may be large, its contribution $P_{\lambda\sigma}(\mu\nu|\lambda\sigma)$ to the Coulomb (or, analogously, the exchange) matrix, and finally to the total energy, may be negligible, because the density matrix elements $P_{\lambda\sigma}$ are often small. Integral screening for nondirect SCF is therefore much less efficient than for direct SCF, because a large number of integrals whose contributions to the final result are negligible cannot be discarded, due to the missing coupling with the density matrix. A further improvement in integral screening can be achieved by employing difference densities (cf. Refs. 19 and 21). The Fock matrices of iterations $n$ and $n-1$ are given by

$$\mathbf{F}^{(n)} = \mathbf{h} + \mathbf{P}^{(n)} \ast \mathbf{II}, \qquad \mathbf{F}^{(n-1)} = \mathbf{h} + \mathbf{P}^{(n-1)} \ast \mathbf{II} \qquad [15]$$

with $\mathbf{II}$ as the antisymmetrized two-electron integrals. Instead of constructing the full Fock matrix in each iteration, a recursive scheme as in the following equation may be used:

$$\mathbf{F}^{(n)} = \mathbf{F}^{(n-1)} + \Delta\mathbf{P}^{(n)} \ast \mathbf{II} \qquad [16]$$

with the difference density $\Delta\mathbf{P}^{(n)}$ for the $n$th iteration defined as

$$\Delta\mathbf{P}^{(n)} = \mathbf{P}^{(n)} - \mathbf{P}^{(n-1)} \qquad [17]$$

Within this scheme, the two-electron integrals needed for the Fock matrix update $\Delta\mathbf{P}^{(n)} \ast \mathbf{II}$ in each iteration may be screened by replacing $P_{\lambda\sigma}$ with $\Delta P_{\lambda\sigma}$ in Eq. [14] for the Coulomb part,

$$\text{neglect } (\mu\nu|\lambda\sigma) \quad \text{if} \quad \left|\Delta P^{(n)}_{\lambda\sigma}\right| \cdot Q_{\mu\nu}\, Q_{\lambda\sigma} \leq 10^{-\vartheta} \qquad [18]$$

and in an analogous fashion for the exchange part.


As the SCF calculation approaches convergence, the change $\Delta\mathbf{P}^{(n)}$ in the density matrix becomes smaller and smaller and finally approaches zero (within numerical accuracy). The number of two-electron integrals surviving the screening of Eq. [18] is therefore significantly smaller than without difference-density screening (Eq. [14]). For an improved algorithm by Häser and Ahlrichs, in which the norm of the difference densities $\Delta\mathbf{P}^{(n)}$ is minimized further, the reader is referred to Ref. 21.
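The incremental build of Eqs. [15]–[17] condenses to a few lines; in this sketch, build_G stands for whichever (screened) routine contracts a density with the antisymmetrized integrals II, and is an assumed callback rather than an actual library function:

```python
def incremental_fock(F_prev, P, P_prev, build_G):
    """Eq. [16]: update the Fock matrix using only the difference density."""
    dP = P - P_prev              # Eq. [17]; becomes small near convergence
    return F_prev + build_G(dP)  # only integrals surviving Eq. [18] contribute
```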

Multipole-Based Integral Estimates (MBIE)

The Schwarz estimates introduced by Häser and Ahlrichs21 are now used in almost every quantum chemical code for two-electron integral screening. However, they are not optimal in a certain sense: They do not describe the $1/R$ decay behavior between the charge distributions of the two-electron integrals, as we will explain shortly. Consider a two-electron repulsion integral (ERI),

$$(\mu\nu|\lambda\sigma) \equiv (A|B) = \int \Omega_A(\mathbf{r}_1)\, \frac{1}{r_{12}}\, \Omega_B(\mathbf{r}_2)\; d\mathbf{r}_1\, d\mathbf{r}_2 \qquad [19]$$

which consists of the two charge distributions $\Omega_A$ and $\Omega_B$ describing the spatial distribution of electrons 1 and 2, respectively (see Figure 5). Here, $A$ and $B$ are collective indices for the "bra" and "ket" basis functions, i.e., $A = \mu\nu$ and $B = \lambda\sigma$. The $\Omega_A(\mathbf{r}_1)$ and $\Omega_B(\mathbf{r}_2)$ are Gaussian distributions built as products of two Gaussians, $\Omega_A(\mathbf{r}_1) = \chi_\mu(\mathbf{r}_1)\cdot\chi_\nu(\mathbf{r}_1)$ and $\Omega_B(\mathbf{r}_2) = \chi_\lambda(\mathbf{r}_2)\cdot\chi_\sigma(\mathbf{r}_2)$, respectively. The integral describes the Coulomb repulsion between electrons $e_1^-$ and $e_2^-$, whose spatial distributions are represented by $\Omega_A$ and $\Omega_B$, as illustrated in Figure 5. As stated by Coulomb's law, the repulsion energy between two charges is proportional to $1/R$, where $R$ is the distance between the two particles. Similarly, for the two-electron integral, one finds (see Ref. 13) that

$$(\mu\nu|\lambda\sigma) \sim \frac{S_{\mu\nu}\, S_{\lambda\sigma}}{R} \qquad [20]$$

Figure 5 The spatial distributions of electrons $e_1^-$ and $e_2^-$ are described by the orbital products $\Omega_A = \chi_\mu \chi_\nu$ and $\Omega_B = \chi_\lambda \chi_\sigma$ centered about $A$ and $B$, respectively. The distance between the two centers is denoted as $R$.


for sufficiently large separations. In the following, we will refer to these two behaviors as the exponential coupling and the $1/R$ coupling:

$$\begin{aligned} \text{exponential coupling:}&\quad (\mu\nu|\lambda\sigma) \sim S_{\mu\nu}\, S_{\lambda\sigma} \sim e^{-\alpha_{\mu\nu} R_{\mu\nu}^2}\; e^{-\alpha_{\lambda\sigma} R_{\lambda\sigma}^2} \\ 1/R\ \text{coupling:}&\quad (\mu\nu|\lambda\sigma) \sim \frac{1}{R} \end{aligned} \qquad [21]$$

where $R_{\mu\nu}$ ($R_{\lambda\sigma}$) is the distance between the centers of the basis functions $\chi_\mu$ and $\chi_\nu$ ($\chi_\lambda$ and $\chi_\sigma$), and $\alpha_{\mu\nu}$ and $\alpha_{\lambda\sigma}$ are constants irrelevant for the following discussion. Although the Schwarz integral estimates account correctly for the exponential coupling of $\mu\nu$ and $\lambda\sigma$, the $1/R$ decay upon increasing the distance $R$ between the charge distributions is entirely missing. This is illustrated in Figure 6, where both the exact behavior of a two-electron integral and that of the Schwarz estimate (abbreviated QQ) are shown. The $1/R$ distance decay becomes important not only for the treatment of large molecules in SCF theories, but also in electron-correlation methods, where the decay behavior is at least $1/R^4$.23,24 We will return to the latter issue in our outlook on electron-correlation methods later in this review. Almlöf pointed out in 197225 that the missing $1/R$ dependence in the Schwarz screening might be approximated via overlap integrals ($S_A$ and $S_B$):

$$(\mu\nu|\lambda\sigma) \equiv (A|B) \approx \frac{S_A\, S_B}{R_{AB}} \qquad [22]$$

Figure 6 Comparison of the integral estimates MBIE-0, MBIE-1, and QQ (Schwarz) with the exact $1/R$ dependence of two-electron repulsion integrals in a hydrogen fluoride dimer: The integral value is plotted against the H–F distance (in Å) for the integral $(d_{zz} d_{zz}|p_z p_z)$ with minimum exponents on the bra and ket sides of $\zeta_{\min} = 8.000000 \times 10^{-1}$ and $\xi_{\min} = 6.401217 \times 10^{-1}$, respectively, using a 6-31G basis.


However, Eq. [22] does not represent a rigorous upper bound to the two-electron integral. Almlöf26 as well as Häser and Ahlrichs21 noted later that nonrigorous integral bounds cannot be used in screening as efficiently as rigorous bounds, because the error is uncontrollable: To achieve sufficient accuracy with nonrigorous integral bounds, the thresholds would need to be lowered to an extent that renders them virtually useless for practical applications. Recently, new multipole-based integral estimates (MBIE) have been introduced by Lambrecht and Ochsenfeld.23 They are simple, rigorous, and tight upper bounds to the two-electron integrals, and at the same time, they account for the $1/R$ decay behavior. Because these estimates can be applied generally in quantum chemistry and are expected to be particularly important for electron-correlation theories of larger molecules, we briefly outline the main ideas of the MBIE method; for a discussion in the context of electron correlation, see also the last section of this tutorial. For a two-electron integral with well-separated charge distributions (we will define this in more detail in the section on multipole expansions of two-electron integrals), it is possible to expand the $r_{12}^{-1}$ operator in a multipole series as13,27–29

$$(\mu\nu|\lambda\sigma) = \frac{MM^{(0)}}{R} + \frac{MM^{(1)}}{R^2} + \frac{MM^{(2)}}{R^3} + \ldots \qquad [23]$$

where the $MM^{(n)}$ denote the $n$th-order multipole terms. For example, $MM^{(0)}$ describes the monopole–monopole (overlap–overlap) interaction, $MM^{(1)}$ stands for the dipole–monopole terms, and $MM^{(2)}$ contains the quadrupole–monopole and dipole–dipole interactions. This series intrinsically contains the $1/R$ dependence for which we aim. With the definition of "absolute spherical multipoles" of order $n$, $M^{(n)}$, as the absolute value of the radial part of the spherical multipoles,

$$M_A^{(n)} \equiv \int \left|\Omega_A(\mathbf{r})\, r^n\right|\, r^2\, dr \qquad [24]$$

and collecting all the $n$th-order terms over the absolute multipole integrals into $MM^{(n)}$, we obtain an upper bound to the two-electron integral:

$$|(\mu\nu|\lambda\sigma)| \leq \frac{MM^{(0)}}{R} + \frac{MM^{(1)}}{R^2} + \frac{MM^{(2)}}{R^3} + O(R^{-4}) \qquad [25]$$

Here, $MM^{(n)}$ stands for expressions involving absolute multipole integrals of order $n$. Although this expansion represents a rigorous upper bound to the two-electron integral, it is of no practical use in this form, because the series involves, in principle, an infinite (or at least a large) number of terms, and discarding the higher order terms would, of course, not lead to a rigorous upper bound.


The key feature of the MBIE method is to replace the higher order terms by lower order ones, while preserving the rigorous upper bound. This is not trivial, but analytical expressions can be derived that relate higher order multipoles to lower order terms.23 The key idea is illustrated in the following equation:

$$|(\mu\nu|\lambda\sigma)| \leq \left| \frac{MM^{(0)}_{\mu\nu\lambda\sigma}}{R'} \sum_{n=0}^{\infty} \left(\frac{1}{R'}\right)^n \right| \qquad [26]$$

Here, all multipoles with $n \geq 1$ are related to the monopole (overlap) term $MM^{(0)}$ by virtue of the analytically derived modified distance $R'$ (to be described shortly). This replacement greatly simplifies the form of the series; summing up the geometric series, we obtain the estimate to zeroth order (MBIE-0):

$$|(\mu\nu|\lambda\sigma)| \leq \left| \frac{M^{(0)}_{\mu\nu}\, M^{(0)}_{\lambda\sigma}}{R' - 1} \right| \qquad [27]$$

Note that this integral bound contains the $1/R$ coupling through the modified distance $R'$. The crucial point of MBIE is that $R'$ must be chosen analytically such that the MBIE expression remains a rigorous upper bound. After a tedious derivation,23 it was found that

$$R' \equiv R - R_{A+B} = R_{AB} - R_A - R_B \qquad [28]$$

and

$$R_A \geq \left(\frac{K}{2\zeta}\right)^{\frac{n+1}{2n}}, \quad\text{with}\quad K = \frac{(n+1+l)^{\frac{n+1+l}{2n}}}{l^{\,l/2n}}\left(\frac{n+1}{2n}\cdot\frac{1}{e}\right) \qquad [29]$$

guarantees that MBIE-0 is indeed a rigorous upper bound. In the previous equation, $n$ is the multipole order up to which MBIE is valid, and $l$ and $\zeta$ denote the total angular momentum and the orbital exponent of the Gaussian basis-function product, respectively. In the foregoing outline, all terms with $n \geq 1$ were related back to monopoles ($n' = 0$). In a similar fashion, we can relate higher order terms back to dipoles (MBIE-1), quadrupoles (MBIE-2), etc. ($n' = 1, 2, \ldots$). For example, the MBIE-1 criterion, where all higher order terms are related back to expressions over dipoles, has the following form:

$$|(\mu\nu|\lambda\sigma)| \leq \left|\frac{M^{(0)}_{\mu\nu}\, M^{(0)}_{\lambda\sigma}}{R}\right| + \left|\frac{M^{(1)}_{\mu\nu}\, M^{(0)}_{\lambda\sigma} + M^{(0)}_{\mu\nu}\, M^{(1)}_{\lambda\sigma}}{R'^2 - R'}\right| \qquad [30]$$


It is important to note that, independent of the order, MBIE always guarantees upper bounds to the two-electron integral.23 The efficiency of the MBIE integral estimates is illustrated in Figure 6, which shows that, in contrast to the Schwarz estimates (QQ), MBIE accounts for the $1/R$ decay behavior of two-electron integrals. For SCF methods, we have found MBIE-1 to be a sufficiently good screening criterion: It overestimates the true number of significant two-electron integrals by just a few percent, while the screening overhead is negligible. The presentation of actual timings with MBIE screening is deferred to a later section, once we have introduced the linear-scaling methods for forming the Fock matrix. As mentioned, the MBIE estimates require the validity of the multipole expansion for two-electron integrals, similar to the requirements for the fast multipole methods (these multipole expansions will be presented in detail in the next section). For the near-field part of the integrals, i.e., for charge distributions that are so close that the multipole expansion is not applicable, MBIE cannot be used; here, one can resort to, for example, the Schwarz bounds. MBIE23 is the first rigorous integral-screening criterion that takes both the exponential and the $1/R$ coupling into account. We also point out that the Schwarz bound significantly overestimates the true integral value if the "bra" and "ket" basis-function exponents are very different, as has been discussed in the work of Gill et al.30 In contrast, MBIE does not suffer from such a problem. Because the computation, handling, and contraction of two-electron integrals are central to many quantum chemical methods, it is clear that MBIE can be widely applied. By introducing the $1/R$ dependence into the two-electron integral estimates, MBIE not only gains performance for the treatment of large molecules in SCF theories, but also becomes the first screening criterion that allows for the rigorous preselection of contributions to the computation of electron-correlation effects in AO-based theories. In these, a coupling of at least $1/R^4$ is observed, which finally leads to linear scaling for electron-correlation methods, as we will outline in our outlook on electron-correlation methods in a later section of this review.
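To summarize the contrast between the two estimates in code form, here is a hedged sketch: Q_bra/Q_ket are the Schwarz factors of Eq. [12], M0_bra/M0_ket are the absolute monopoles of Eq. [24], and R_prime is the modified distance of Eqs. [28] and [29]; all are assumed to be precomputed two-index quantities (see Ref. 23 for the actual working equations), and the function names are our own:

```python
def schwarz_estimate(Q_bra, Q_ket):
    """Eq. [12]: rigorous, but independent of the bra-ket separation."""
    return Q_bra * Q_ket

def mbie0_estimate(M0_bra, M0_ket, R_prime):
    """Eq. [27]: rigorous and decaying like ~1/R' for well-separated pairs."""
    assert R_prime > 1.0, "valid only where the multipole expansion applies"
    return M0_bra * M0_ket / (R_prime - 1.0)
```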

CALCULATION OF INTEGRALS VIA MULTIPOLE EXPANSION

As we have seen, the number of Coulomb integrals scales as $O(M^2)$ for sufficiently large molecules. To overcome this potential bottleneck, the naive pairwise summation over electron–electron interactions has to be circumvented. We will see that the multipole expansion of the two-electron integrals can be used for this purpose, allowing us to achieve an overall $O(M)$ scaling for calculating the Coulomb matrix.


Figure 7 The interaction of several point charges $\{q_i\}$ and $\{q_j\}$ (small open circles) can be approximated as the net interaction between two charge distributions $\Omega_A$ and $\Omega_B$ (large circles) centered at $A$ and $B$ and separated by $R$.

One advantage of using the multipole expansion in tackling the Coulomb problem is that, instead of treating individual pairwise interactions between point charges, one can collect the charges into distributions and use the total net interaction between these distributions, as illustrated in Figure 7. Combined with a clever tree algorithm, the multipole expansion can be used to avoid the quadratic step of summing over pairwise interactions and to obtain an $O(M)$ scaling behavior. Another advantage is the separation of "bra" and "ket" quantities, which makes it possible to precalculate some auxiliary quantities before the integral calculation itself starts, thus reducing the scaling prefactor. In the next section, we consider as an introductory example the replacement of individual interactions by effective interactions using a multipole series. After gaining some basic understanding of the multipole expansion, we derive in detail the spherical multipole expansion, one of the most prominent types of multipole expansions used in the calculation of molecular integrals. Once our mathematical tools are derived, we explain an algorithm that scales linearly with the number of interacting particles, namely the fast multipole method (FMM), which, however, is suitable only for point charges. We then account for continuous (Gaussian) charge distributions by introducing a generalization of the FMM, the continuous fast multipole method (CFMM), in the next section. We complete our tour through multipole methods with a brief overview of other approaches that make use of multipole expansions and tree codes to speed up the calculation of two-electron integrals.

A First Example

Before deriving and discussing the multipole expansion in detail, let us first have a glimpse at its usefulness by means of a simple example. Imagine we want to calculate the Coulomb interaction energy between a point charge $q_1$ and a set of point charges $\{q_2, q_3, q_4, q_5\}$ (all of unit charge), as depicted in Figure 8(a).

Calculation of Integrals via Multipole Expansion

17

Figure 8 Cartoon illustrating the usefulness of the multipole expansion. (a) Naive approach: The interaction between a point charge (open circle) located at $A$ and a set of four point charges distributed around $B$ is calculated by summing over each individual pairwise interaction term. (b) Multipole expansion: The four individual charges have been replaced by their net effect, so that they behave like a single new (more complicated) charge distribution $\Omega_B$. The separation between $A$ and $B$ is $R = 10$ a.u.; the point charges of $\Omega_B$ have a distance of $r = 1$ a.u. from $B$.

The simplest way to calculate the interaction energy $U_{1B}$ between $q_1$ and all other charges is to sum over all four pairwise interaction terms. Using the geometry described in Figure 8(a), this yields the following result:

$$U_{1B} = q_1 \sum_{i=2}^{5} \phi_i(\mathbf{r}_1) = q_1 \sum_{i=2}^{5} \frac{q_i}{r_{1i}} = 0.4010 \text{ a.u.} \qquad [31]$$

Here, $\phi_i(\mathbf{r}_1)$ denotes the electrostatic potential generated by charge $i$ as felt at the location of $q_1$. This calculation of the repulsion energy is neither difficult nor time-consuming, but it must not be forgotten that the total number of interaction terms in this naive approach scales as $O(M^2)$: If we want to evaluate all interaction energies between two sets of $M$ particles $\{q_{1a}, q_{2a}, \ldots, q_{Ma}\}$ and $\{q_{1b}, q_{2b}, \ldots, q_{Mb}\}$, we end up with a number of pairwise interaction terms on the order of $M^2$. As the number of interacting particles grows, the number of interaction terms soon becomes intractable. For example, a DNA molecule with 1052 atoms leads to 1,359,959 significant charge distributions (with a 6-31G basis set and an integral screening threshold of $10^{-6}$) and would thus require the calculation of $1{,}359{,}959 \cdot (1{,}359{,}959 - 1)/2 = 924{,}743{,}560{,}861$ pair interactions, presenting the researcher with the enormous number of almost one trillion interaction terms! For large molecules described by several thousand basis functions, we must therefore avoid the naive quadratic loop over pairs of interacting particles. This is where the strength of the multipole expansion comes into play. As we will derive later, the potential $\phi$ arising from an arbitrary charge distribution $\Omega$ can be expanded in terms of monopole ($q$), dipole ($\mathbf{D}$), quadrupole ($\mathbf{Q}$), and higher order multipole interactions:

$$\phi(\mathbf{r}) = \phi^{(0)}(\mathbf{r}) + \phi^{(1)}(\mathbf{r}) + \phi^{(2)}(\mathbf{r}) + \ldots = \frac{q}{r} + \frac{\mathbf{D}\cdot\hat{\mathbf{r}}}{r^2} + \frac{1}{2}\,\frac{\hat{\mathbf{r}}\cdot\mathbf{Q}\cdot\hat{\mathbf{r}}}{r^3} + \ldots \qquad [32]$$


where $\hat{\mathbf{r}}$ denotes the unit vector in the direction of $\mathbf{r}$ and $r = |\mathbf{r}|$ is the length of $\mathbf{r}$. Even if a charge distribution has a complicated structure, for example, if it is composed of several point charges as in our examples (Figures 7 and 8), its potential can always be expanded in the form of Eq. [32], where the multipoles are those of the composite charge distribution $\Omega$. Instead of looking at the field generated by the individual point charges, we can therefore consider their net effect [Figure 8(b)]. The expansion can be truncated after a finite number of, say, $L$ terms. For the spherical multipole expansion, $L = 15$–$21$ is known to provide accuracies on the order of $10^{-7}$ a.u. and better in the total energy.31 Instead of taking into account all $M$ field terms of the individual charges $q_i$, the total field is then given by the $L$ terms of the multipole expansion of the total charge distribution $\Omega$ as

$$\phi_{\mathrm{total}}(\mathbf{r}) = \sum_i^M \phi_i(\mathbf{r}) \quad\rightarrow\quad \phi^{\mathrm{net}}_{\mathrm{MP}}(\mathbf{r}) \approx \sum_{n=0}^{L} \phi^{(n)}(\mathbf{r}) \qquad [33]$$

In the same manner, the total interaction energy of a point charge $q_i$ with all other charges $q_j$ is replaced by the net interaction

$$U_{\mathrm{total}} = q_i \sum_j^M \frac{q_j}{r_{ij}} \quad\rightarrow\quad U^{\mathrm{net}}_{\mathrm{MP}} \approx q_i \sum_{n=0}^{L} \phi^{(n)}(\mathbf{r}_i) \qquad [34]$$

Thus, instead of having $O(M)$ interaction terms for each point charge $q_i$, we end up with a sum over the different orders of the multipoles of the charge distribution(s) $\Omega$. Because $L$ is constant for a given level of accuracy, we can calculate the interaction energy between $q_i$ and one composite charge distribution with a complexity that is independent of the number of particles belonging to $\Omega$. To see how this works, let us return to our previous example [Figure 8(a,b)], where we combine the four point particles $\{q_2, q_3, q_4, q_5\}$ into one new charge distribution $\Omega_B$ using the multipole series. The index $B$ denotes the center of the new charge distribution and also the center of the multipole expansion. The new "particle" $\Omega_B$ certainly has a more complicated structure than its constituent point charges; the composite charge distribution has not only a charge (monopole), but in general also a dipole, quadrupole, and higher moments. We now calculate these lower order multipole moments to evaluate the interaction energy. The monopole is simply the total charge of the constituting point charges:

$$q_B = \sum_{j=2}^{5} q_j = 4.0000 \qquad [35]$$


The interaction energy between $q_1$ and $\Omega_B$ is, therefore, to zeroth order:

$$U^{(0)}_{1B} = \frac{q_1\, q_B}{R} = 0.4000 \text{ a.u.} \qquad [36]$$

For the next-highest term, the dipole interaction, we obtain exactly zero for symmetry reasons:

$$\mathbf{d}_B = \sum_{j \in B} q_j\, \mathbf{r}_j = \mathbf{0}, \qquad U^{(1)}_{1B} = 0.0000 \text{ a.u.} \qquad [37]$$

The quadrupole moment tensor of $\Omega_B$ is given as

$$\mathbf{Q}_B = \sum_i \begin{pmatrix} 3x_i^2 - r_i^2 & x_i y_i & x_i z_i \\ y_i x_i & 3y_i^2 - r_i^2 & y_i z_i \\ z_i x_i & z_i y_i & 3z_i^2 - r_i^2 \end{pmatrix} = \begin{pmatrix} 2.0000 & 0.0000 & 0.0000 \\ 0.0000 & -4.0000 & 0.0000 \\ 0.0000 & 0.0000 & 2.0000 \end{pmatrix} \qquad [38]$$

and the second-order quadrupole interaction term gives the result

$$U^{(2)}_{1B} = \frac{1}{2}\,\frac{\hat{\mathbf{r}}\cdot\mathbf{Q}_B\cdot\hat{\mathbf{r}}}{R^3} = 0.0010 \text{ a.u.} \qquad [39]$$

where $\hat{\mathbf{r}}$ is the unit vector pointing from $A$ to $B$.

We truncate our expansion here: For the symmetric arrangement of the four point charges, the dipole and octupole moments vanish, and contributions beyond the quadrupole are negligible at this separation. For high-accuracy calculations on more complicated distributions, the expansion must be carried out to higher orders. Putting everything together, we can approximate the interaction energy by

$$U_{1B} \approx U^{(0)}_{1B} + U^{(1)}_{1B} + U^{(2)}_{1B} = 0.4010 \text{ a.u.} \qquad [40]$$

Note that for this example, the exact result of Eq. [31] is reproduced to the printed precision. Instead of treating each interaction term of $q_1$ with $\{q_2, \ldots, q_5\}$ explicitly, we approximated the total net influence of all charges using the multipole expansion and ended up with $L$ (instead of $M$) interaction terms. The computational workload is thus of order $O(L)$ instead of $O(M)$ when evaluating the total interaction energy between a single point charge and a set of other point charges. It is clear that if the number of point charges in $\Omega_B$ is large, using the multipole series with only $L$ net field terms can lead to significant savings in CPU time.
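The worked example is easily verified numerically. The sketch below assumes, consistently with Figure 8 and the quoted numbers, that the four unit charges sit in the x–z plane at B ± x̂ and B ± ẑ (r = 1 a.u.), with R = 10 a.u. between A and B along x; this placement is our reading of the figure, but with it both the exact pair sum (Eq. [31]) and the monopole-plus-quadrupole estimate (Eqs. [36]–[40]) print as 0.4010 a.u.:

```python
import numpy as np

A = np.zeros(3)
B = np.array([10.0, 0.0, 0.0])                      # R = 10 a.u. along x
shifts = [(1, 0, 0), (-1, 0, 0), (0, 0, 1), (0, 0, -1)]
charges = [B + np.array(s, dtype=float) for s in shifts]

# Eq. [31]: exact pairwise sum (all charges are +1)
U_exact = sum(1.0 / np.linalg.norm(c - A) for c in charges)

# Eqs. [35] and [38]: monopole and quadrupole of the composite distribution
R = np.linalg.norm(B - A)
r_hat = (B - A) / R
Q = sum(3.0 * np.outer(c - B, c - B) - np.dot(c - B, c - B) * np.eye(3)
        for c in charges)                           # yields diag(2, -4, 2)

# Eqs. [36] + [39]; the dipole term vanishes by symmetry (Eq. [37])
U_mp = 4.0 / R + 0.5 * (r_hat @ Q @ r_hat) / R**3

print(f"exact: {U_exact:.4f}  multipole: {U_mp:.4f}")   # both 0.4010
```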


Derivation of the Multipole Expansion

The multipole expansion may be carried out in several coordinate systems, which may be chosen depending on the symmetry properties of the problem under investigation. Spherical polar and Cartesian coordinates are used most commonly when calculating two-electron integrals. We outline here the derivation of the spherical series. The interested reader may find more detailed discussions, for example, in the books of Eyring, Walter, and Kimball27 or Morse and Feshbach.32,33 A discussion of the multipole expansion in the framework of atomic and molecular interactions and potentials may be found in the article of Williams in this series of reviews (Ref. 34) or in the book by Hirschfelder, Curtiss, and Bird.28 Before deriving the multipole series, let us start with some nomenclature. We want to evaluate the electron repulsion integral

$$(\mu\nu|\lambda\sigma) \equiv (A|B) = \int \Omega_A(\mathbf{r}_1)\, \frac{1}{r_{12}}\, \Omega_B(\mathbf{r}_2)\; d\mathbf{r}_1\, d\mathbf{r}_2 \qquad [41]$$

Here, $A$ and $B$ are again collective indices for the "bra" and "ket" basis functions, i.e., $A = \mu\nu$ and $B = \lambda\sigma$, and $\Omega_A(\mathbf{r}_1) = \chi_\mu(\mathbf{r}_1)\cdot\chi_\nu(\mathbf{r}_1)$ and $\Omega_B(\mathbf{r}_2) = \chi_\lambda(\mathbf{r}_2)\cdot\chi_\sigma(\mathbf{r}_2)$ are Gaussian distributions, products of two Gaussians, with centers $\mathbf{A}$ and $\mathbf{B}$, respectively. For our task, it is handy to express the electronic coordinates by their positions relative to the centers $\mathbf{A}$ and $\mathbf{B}$ as

$$\mathbf{r}_1 = \mathbf{r}_{1A} + \mathbf{A}, \qquad \mathbf{r}_2 = \mathbf{r}_{2B} + \mathbf{B} \qquad [42]$$

and to introduce the vector $\Delta\mathbf{r}_{12}$ as

$$\Delta\mathbf{r}_{12} = \mathbf{r}_{1A} - \mathbf{r}_{2B}, \qquad \Delta r_{12} = |\Delta\mathbf{r}_{12}| \qquad [43]$$

The separation between the centers is given by

$$\mathbf{R} = \mathbf{B} - \mathbf{A}, \qquad R = |\mathbf{R}| \qquad [44]$$

Our objective is to find a series expansion of the interelectronic distance r12 , which facilitates the separation into an angular and a radial part and which decouples the coordinates of electrons 1 and 2. We follow here the derivation of Eyring et al.27


With the definitions introduced in the previous paragraph, the interelectronic separation can be expressed as

$$r_{12} = |\mathbf{r}_{1A} - \mathbf{r}_{2B} + \mathbf{R}| = \sqrt{\Delta r_{12}^2 + R^2 - 2\, \Delta r_{12}\, R \cos\theta} \qquad [45]$$

Here, $\theta$ is the angle subtended between the vectors $\Delta\mathbf{r}_{12}$ and $\mathbf{R}$. We denote the larger of the two radial parts by $r_>$ and the smaller by $r_<$:

$$r_> = \begin{cases} R & R > \Delta r_{12} \\ \Delta r_{12} & R < \Delta r_{12} \end{cases}, \qquad r_< = \begin{cases} \Delta r_{12} & R > \Delta r_{12} \\ R & R < \Delta r_{12} \end{cases} \qquad [46]$$

and introduce the fraction $x$ as

$$x = \frac{r_<}{r_>} \qquad [47]$$

The interelectronic distance may now be expressed in terms of $x$ and $r_>$ (containing all radial dependence) and the angle $\theta$ (containing all angular dependence):

$$r_{12} = r_> \sqrt{1 + x^2 - 2x\cos\theta} \qquad [48]$$

We are now ready to separate the angular and radial parts of $r_{12}$. To this end, the angular part of the Coulomb term is expanded in Legendre polynomials $P_n(\cos\theta)$, which form a complete and orthogonal set of eigenfunctions on the interval $\cos\theta \in [-1, 1]$, and the radial part is expressed through coefficients $a_n(x)$ that will be determined shortly:

$$r_{12}^{-1} = \frac{1}{r_>}\,\frac{1}{\sqrt{1 + x^2 - 2x\cos\theta}} = \frac{1}{r_>} \sum_{n=0}^{\infty} a_n(x)\, P_n(\cos\theta) \qquad [49]$$

The coefficients are determined by exploiting the orthogonality of the Legendre polynomials:

$$\int_0^\pi P_n(\cos\theta)\, P_m(\cos\theta)\, \sin\theta\, d\theta = \frac{2}{2n+1}\,\delta_{nm} \qquad [50]$$

22

Linear-Scaling Methods in Quantum Chemistry

Squaring the left- and right-hand sides of Eq. [49], multiplying by the surface element of the unit sphere ($\sin\theta\, d\theta$), integrating, and canceling the common factor $1/r_>^2$, we obtain

$$\int_0^\pi \frac{\sin\theta\, d\theta}{1 + x^2 - 2x\cos\theta} = \sum_{n=0}^{\infty}\sum_{m=0}^{\infty} a_n(x)\, a_m(x) \int_0^\pi P_n(\cos\theta)\, P_m(\cos\theta)\, \sin\theta\, d\theta = \sum_{n=0}^{\infty} \frac{2\, a_n^2(x)}{2n+1} \qquad [51]$$

The left-hand side integral may be calculated easily and yields a logarithmic function of $x$; Taylor-expanding this gives

$$\int_0^\pi \frac{\sin\theta\, d\theta}{1 + x^2 - 2x\cos\theta} = \frac{1}{x}\ln\frac{1+x}{1-x} = \sum_{n=0}^{\infty} \frac{2}{2n+1}\, x^{2n} \qquad [52]$$

Comparing the right-hand sides of Eqs. [51] and [52], we find that $a_n(x) = x^n$, and our expansion reads as

$$r_{12}^{-1} = \frac{1}{r_>} \sum_{n=0}^{\infty} x^n\, P_n(\cos\theta) \qquad [53]$$
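Equation [53] is the classic Legendre generating function and is easy to check numerically, e.g., with SciPy (test values are arbitrary, chosen by us):

```python
import numpy as np
from scipy.special import eval_legendre

x, theta = 0.4, 1.1     # arbitrary test values with x = r_< / r_> < 1
exact = 1.0 / np.sqrt(1.0 + x**2 - 2.0 * x * np.cos(theta))
series = sum(x**n * eval_legendre(n, np.cos(theta)) for n in range(30))
print(abs(exact - series))   # ~1e-12; the terms decay like x^n
```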

This expansion facilitates the separation of the radial and angular parts of the electron–electron and nucleus–nucleus distance vectors. But we are not yet done with our derivation, because one important goal has not yet been achieved, namely the separation of the electronic coordinates $\mathbf{r}_1$ and $\mathbf{r}_2$: These are still coupled through the angular parts in the Legendre polynomials and through the powers of $\Delta r_{12}$ occurring in $x^n$. Without decoupling the electron coordinates, we cannot precalculate terms of the series that depend on electrons 1 and 2 independently, and accordingly, we cannot circumvent a quadratic loop over the electronic coordinates, which would spoil our aim of an $O(M)$ method. Invoking the addition theorem of the spherical harmonic functions $Y_n^m(\theta, \phi)$ as expressed in

$$P_n(\cos\theta) = \sum_{m=-n}^{n} \frac{4\pi}{2n+1}\; Y_n^{m*}(\theta_{12}, \phi_{12})\; Y_n^m(\theta_{AB}, \phi_{AB}) \qquad [54]$$

the angular parts of the electronic and nuclear coordinates can be decoupled, and the multipole series becomes

$$r_{12}^{-1} = \frac{1}{r_>}\sum_{n=0}^{\infty}\sum_{m=-n}^{n}\frac{4\pi}{2n+1}\,\frac{r_<^{\,n}}{r_>^{\,n}}\,Y_n^{m*}(\theta_{12},\phi_{12})\,Y_n^m(\theta_{AB},\phi_{AB}) = \begin{cases} \displaystyle \frac{1}{R}\sum_{n=0}^{\infty}\sum_{m=-n}^{n}\frac{4\pi}{2n+1}\left(\frac{\Delta r_{12}}{R}\right)^{n} Y_n^{m*}(\theta_{12},\phi_{12})\,Y_n^m(\theta_{AB},\phi_{AB}) & \text{for } \Delta r_{12} < R \\[2ex] \displaystyle \frac{1}{\Delta r_{12}}\sum_{n=0}^{\infty}\sum_{m=-n}^{n}\frac{4\pi}{2n+1}\left(\frac{R}{\Delta r_{12}}\right)^{n} Y_n^{m*}(\theta_{12},\phi_{12})\,Y_n^m(\theta_{AB},\phi_{AB}) & \text{for } \Delta r_{12} > R \end{cases} \qquad [55]$$


Here, $\Delta r_{12}$, $\theta_{12}$, $\phi_{12}$ are the spherical components of $\Delta\mathbf{r}_{12}$, and those of $\mathbf{R} = \mathbf{R}_{AB}$ are denoted accordingly. So far, this expansion is convergent and holds exactly for both cases, $\Delta r_{12} < R$ and $\Delta r_{12} > R$, because $|P_n(\cos\theta)| \leq 1$. (For a more detailed discussion, the mathematically inclined reader is referred to Refs. 32, 33, and 35.) Clearly, the upper branch of the series complies with our goal: It decouples the radial parts of the interelectronic coordinates, because only positive powers of $\Delta r_{12}$ occur. This, in the end (see the next section), facilitates the factorization of the resulting integrals into parts depending only on $\mathbf{r}_{1A}$ and $\mathbf{r}_{2B}$; the integrals are also easy to compute using standard algorithms for molecular integral evaluation.36–41 The lower branch of the expansion, however, does not allow factorization of the integrals and is difficult to calculate. For that reason, when employing the multipole expansion of the two-electron integral, one usually assumes that $\Delta r_{12} < R$; that is, the charge distributions $\Omega_A$ and $\Omega_B$ are required to be nonoverlapping. With this presumption, only the upper branch of the series remains, and the multipole expansion of the Coulomb operator for nonoverlapping distributions is

$$r_{12}^{-1} = \frac{1}{R} \sum_{n=0}^{\infty}\sum_{m=-n}^{n} \frac{4\pi}{2n+1} \left(\frac{\Delta r_{12}}{R}\right)^{n} Y_n^{m*}(\theta_{12}, \phi_{12})\; Y_n^m(\theta_{AB}, \phi_{AB}) \qquad [56]$$

The presumption $\Delta r_{12} < R$ deserves some comment. The cautious reader will have noticed that when dealing with continuous charge distributions like Gaussian products, there are always regions of integration in which the charge distributions overlap to some extent, such that $\Delta r_{12} > R$. Strictly speaking, one would always have to include both branches of the expansion when dealing with charge distributions extending over all space in order to obtain a convergent and exact series representation of the two-electron integral. In the literature, this is sometimes described by the notion of "asymptotic convergence" of the multipole expansion,13 which means that the series converges exactly only if the overlap tends to zero, i.e., if the separation $R$ between the "bra" and "ket" charge distributions $\Omega_A$ and $\Omega_B$ goes to infinity. For an in-depth investigation of this mathematically involved topic, the reader is referred to the original literature, cf. Ref. 42 and references therein. In practice, however, this does not pose serious problems, because one can derive useful estimates of the error introduced by dropping the second branch of the series. One often defines the extent of a Gaussian distribution $\Omega_A$ as

$$R_A = \frac{\mathrm{erfc}^{-1}(10^{-\vartheta})}{\sqrt{\zeta_A}} \qquad [57]$$


where $10^{-\vartheta}$ denotes the desired accuracy. For Gaussian distributions of $s$ angular momentum being separated by

$$R_{AB} > R_A + R_B \qquad [58]$$

one can then show that employing the multipole expansion (Eq. [56]) for calculating the $[ss|ss]$ integral leads to an error on the order of $10^{-\vartheta}$ times the size of the integral:

$$[A|B]^{(\mathrm{exact})} = [A|B]^{(\mathrm{MP})} + \mathrm{error}, \qquad \mathrm{error} \approx 10^{-\vartheta} \cdot [A|B]^{(\mathrm{MP})} \qquad [59]$$
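A small sketch of Eqs. [57] and [58], using SciPy's inverse complementary error function; the function names are ours, and the printed value reproduces the roughly 7.5 a.u. (about 4.0 Å) separation required in the unit-exponent example discussed next:

```python
from scipy.special import erfcinv

def gaussian_extent(zeta, theta=7):
    """Eq. [57]: extent R_A of a Gaussian with exponent zeta (in a.u.)."""
    return erfcinv(10.0 ** (-theta)) / zeta ** 0.5

def well_separated(R_AB, zeta_A, zeta_B, theta=7):
    """Eq. [58]: is the multipole expansion applicable for this pair?"""
    return R_AB > gaussian_extent(zeta_A, theta) + gaussian_extent(zeta_B, theta)

print(2 * gaussian_extent(1.0))   # ~7.5 a.u., i.e., ~4.0 Angstrom
```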

For example, to calculate an $[ss|ss]$ integral with exponents $\zeta_A = \zeta_B = 1$ to an accuracy of $10^{-7}$ using the multipole expansion, the centers $A$ and $B$ have to be separated by $R_A + R_B = 4.0$ Å. Similar expressions can be derived for higher angular momenta, but the expression for $s$ Gaussians is usually sufficiently accurate. Together with judicious convergence criteria, the multipole expansion for ERIs produces results that are numerically exact for all practical purposes.

Spherical Multipole Expansion for Two-Electron Integrals

The spherical multipole expansion as derived above can be cast into a different form that achieves higher efficiency in computer implementations and finally decouples the angular parts of the electron coordinates. Notice that when carrying out the multipole summation in the above formulation, each term has to be multiplied by the normalization constant $\frac{4\pi}{2n+1}$ and an inverse power of $R$. We can introduce new angular functions replacing the spherical harmonics, which already include the normalization constants and can be precomputed prior to the evaluation of the two-electron integrals. To remove the constant fraction in front of each term, the solid harmonics in Racah's normalization are used:

$$C_n^m(\theta, \phi) = \sqrt{\frac{4\pi}{2n+1}}\; Y_n^m(\theta, \phi) \qquad [60]$$

These are employed to define the scaled regular and irregular solid harmonics

$$R_{nm}(\mathbf{r}) = \frac{1}{\sqrt{(n-m)!\,(n+m)!}}\; r^n\, C_n^m(\theta, \phi), \qquad I_{nm}(\mathbf{r}) = \sqrt{(n-m)!\,(n+m)!}\; r^{-(n+1)}\, C_n^m(\theta, \phi) \qquad [61]$$

In terms of these functions, the one-center multipole expansion of the Coulomb operator reads as

$$r_{12}^{-1} = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} R_{nm}(\Delta\mathbf{r}_{12})\; I_{nm}^{*}(\mathbf{R}) \qquad [62]$$


Using the addition theorem for regular solid harmonics,

$$R_{nm}(\mathbf{r} + \mathbf{s}) = \sum_{k=0}^{n} \sum_{l=-k}^{k} R_{n-k,m-l}(\mathbf{r})\; R_{kl}(\mathbf{s}) \qquad [63]$$

we can finally separate the electronic coordinates. Exploiting the behavior of $R_{kl}(\mathbf{s})$ in going from $\mathbf{s}$ to $-\mathbf{s}$ and rewriting the summation indices, our final multipole expansion of $r_{12}^{-1}$ with the electron coordinates 1 and 2 separated reads:

$$r_{12}^{-1} = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}\sum_{k=0}^{\infty}\sum_{l=-k}^{k} (-1)^k\, R_{nm}(\mathbf{r}_{1A})\; I_{n+k,m+l}^{*}(\mathbf{R})\; R_{kl}(\mathbf{r}_{2B}) \qquad [64]$$

Until here, we have concentrated on the multipole expansion of the Coulomb operator; now we obtain the series for the two-electron integrals. Inserting the expansion into the two-electron integral and absorbing the $(-1)^k$ prefactor into the interaction matrix $\mathbf{T}$, we arrive at an efficient multipole expansion of the two-electron integrals:

$$[A|B] = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}\sum_{k=0}^{\infty}\sum_{l=-k}^{k} q^A_{nm}(\mathbf{A})\; T_{nm,kl}(\mathbf{R})\; q^B_{kl}(\mathbf{B})$$

$$q^A_{nm}(\mathbf{A}) = \int \Omega_A(\mathbf{r})\, R_{nm}(\mathbf{r}_A)\, d\mathbf{r}, \qquad q^B_{kl}(\mathbf{B}) = \int \Omega_B(\mathbf{r})\, R_{kl}(\mathbf{r}_B)\, d\mathbf{r}$$

$$T_{nm,kl}(\mathbf{R}) = (-1)^k\; I^{*}_{n+k,m+l}(\mathbf{R}) \qquad [65]$$

Here the q’s are spherical multipole moments (monopole, dipole, quadrupole, etc.) of charge distributions A and B , respectively. Note that we used square brackets to denote an uncontracted two-electron integral. A suitable generalization for contracted integrals is described in the next section. Collecting all multipole moments of center A into a vector qA ðAÞ, those of B into qB ðBÞ, and arranging the elements of the interaction tensor in matrix form TðRÞ, the multipole expansion can also be formulated in matrix notation as ½AjB ¼ qA ðAÞ  TðRÞ  qB ðBÞ

½66

In the following, we will often drop the arguments, because it is clear on which variables the terms depend. We notice that because each multipole vector has $O(L^2)$ components [$n = 0, \ldots, L$; for each $n$, there are $2n+1$ $m$-components, thus a total of $(L+1)^2$], the total cost of evaluating a single integral using the multipole expansion has $O(L^4)$ complexity. The spherical multipole integrals may be


calculated by means of some well-known recursive algorithms (cf. Refs. 13, 36–41, 43, and 44) in $O(L^2 M)$ work; for the interaction tensor, efficient recursion algorithms also exist (cf. Ref. 13). The two-electron integrals are, for our purposes, real quantities, so using complex terms (solid harmonics) in the multipole expansion is unnecessary and only makes a computer implementation slower and more difficult. A reformulation in terms of real multipole integrals and interaction matrix elements is possible by splitting each term into a real and an imaginary part and dealing with them separately. After some algebra, one finds that the imaginary part drops out, and one obtains the multipole expansion in terms of real-valued multipole moments and interaction matrix elements. The real-valued multipole expansion may be cast into exactly the same form as that of the complex series; as both formulations are formally very similar, we do not introduce the real formulation here but instead refer the interested reader to the literature, e.g., Ref. 13.

The Multipole Translation Operator

So far, we have derived the multipole expansion only for primitive Gaussian distributions. As pointed out in the introductory example, one of the main strengths of the multipole expansion is that it can be used to treat the interactions of several primitive charge distributions simultaneously by combining them into one single, albeit more complicated, distribution. It will turn out to be useful to translate the centers of multipole expansions to different points in space; e.g., if $\mathbf{q}(\mathbf{A})$ is an expansion about $\mathbf{A}$, we must find a way to convert it to a series about $\mathbf{A} - \mathbf{t}$, where $\mathbf{t}$ is a translation vector. We first consider a simple case: the transition from primitive to contracted Gaussian distributions. To this end, the multipole expansions for the primitive charge distributions $\Omega_{\alpha\beta}$ (centered at $\mathbf{a}_{\alpha\beta}$) and $\Omega_{\gamma\delta}$ (centered at $\mathbf{b}_{\gamma\delta}$) have to be contracted with the contraction coefficients $k^{\mu\nu}_{\alpha\beta}$ and $k^{\lambda\sigma}_{\gamma\delta}$ as

$$(\mu\nu|\lambda\sigma) = (A|B) = \sum_{\alpha\beta}\sum_{\gamma\delta} k^{\mu\nu}_{\alpha\beta}\, k^{\lambda\sigma}_{\gamma\delta}\; \underbrace{\mathbf{q}^{\alpha\beta}(\mathbf{a}_{\alpha\beta}) \cdot \mathbf{T}^{\alpha\beta\gamma\delta}(\mathbf{R}_{\alpha\beta\gamma\delta}) \cdot \mathbf{q}^{\gamma\delta}(\mathbf{b}_{\gamma\delta})}_{=\,[\alpha\beta|\gamma\delta]} \qquad [67]$$

Here, $\mathbf{R}_{\alpha\beta\gamma\delta}$ is the distance vector between the primitive centers $\mathbf{a}_{\alpha\beta}$ and $\mathbf{b}_{\gamma\delta}$. Using contracted multipole integrals instead of primitive integrals, we obtain the following expansion for contracted integrals:

$$(A|B) = \mathbf{q}^{\mu\nu}(\mathbf{A}) \cdot \mathbf{T}(\mathbf{R}) \cdot \mathbf{q}^{\lambda\sigma}(\mathbf{B})$$

$$q^{\mu\nu}_{nm}(\mathbf{A}) = \sum_{\alpha\beta} k^{\mu\nu}_{\alpha\beta} \int \Omega_{\alpha\beta}(\mathbf{r})\, R_{nm}(\mathbf{r}_a)\, d\mathbf{r}, \qquad q^{\lambda\sigma}_{nm}(\mathbf{B}) = \sum_{\gamma\delta} k^{\lambda\sigma}_{\gamma\delta} \int \Omega_{\gamma\delta}(\mathbf{r})\, R_{nm}(\mathbf{r}_b)\, d\mathbf{r} \qquad [68]$$


Note that the primitive expansions are centered at the points $\mathbf{a}_{\alpha\beta}$ and $\mathbf{b}_{\gamma\delta}$, which are, in general, different from the centers $\mathbf{A}$ and $\mathbf{B}$ of the contracted charge distributions. By contracting, we have in fact carried out a translation from the primitive to the contracted centers. What we now must find out is how to translate expansions to an arbitrary center. Recalling the addition theorem for regular solid harmonics (Eq. [63]), we see that a multipole expansion centered at $\mathbf{a}$ can be translated by a vector $\mathbf{t}$ to a new center $\mathbf{A} = \mathbf{a} - \mathbf{t}$ as

$$q^A_{nm}(\mathbf{A}) = \int \Omega_a(\mathbf{r})\, R_{nm}(\mathbf{r}_a - \mathbf{t})\, d\mathbf{r} = \sum_{k=0}^{n}\sum_{l=-k}^{k} \underbrace{R_{n-k,m-l}(-\mathbf{t})}_{=\,W_{nm,kl}(\mathbf{t})}\; \underbrace{\int \Omega_a(\mathbf{r})\, R_{kl}(\mathbf{r}_a)\, d\mathbf{r}}_{=\,q^a_{kl}(\mathbf{a})} \qquad [69]$$

We can therefore translate an expansion by simply multiplying with the "translation operator" $W_{nm,kl}(\mathbf{t})$ and summing over the angular momenta. Translating an expansion from a center $\mathbf{a}$ by $\mathbf{t}_a$ to a new center $\mathbf{A}$ may be written in matrix form as

$$\mathbf{q}^A(\mathbf{A}) = \mathbf{W}(\mathbf{t}_a) \cdot \mathbf{q}^a(\mathbf{a}) \qquad [70]$$

with a similar expression for the translation from $\mathbf{b}$ by $\mathbf{t}_b$ to $\mathbf{B}$. With this, the multipole expansion reads as

$$(A|B) = \sum_{a \in A}\sum_{b \in B} \left[\mathbf{W}(\mathbf{t}_a)\, \mathbf{q}^a(\mathbf{a})\right] \cdot \mathbf{T}(\mathbf{R}) \cdot \left[\mathbf{W}(\mathbf{t}_b)\, \mathbf{q}^b(\mathbf{b})\right] \qquad [71]$$

where the summation runs over all centers $\mathbf{a}$ and $\mathbf{b}$ that need to be shifted to the new centers $\mathbf{A}$ and $\mathbf{B}$, respectively. To achieve a specified numerical accuracy, it is sufficient to truncate the translation series after roughly $O(L^2)$ terms; translating the expansion center of a charge distribution is therefore an $O(L^4)$ step. In the next section, we will see that the translation operator is necessary to obtain a true linear-scaling method.

The Fast Multipole Method: Breaking the Quadratic Wall

We have seen in the introductory example that the multipole expansion may be used to replace particle interactions by their net effects (see Figure 7). But this alone does not reduce the scaling exponent: If there are $d$ distributions with $p$ particles on average, we will have to evaluate $O(d^2)$ interaction terms. As $d = M/p$, this so-called "naive multipole method" still has a


complexity of $O(M^2)$ (though with a reduced prefactor). For very large molecules, we would thus again run into problems because of quadratic scaling. We now outline the FMM of Greengard and coworkers (cf. Refs. 45–47) as a way to reduce the scaling exponent to linear without sacrificing accuracy. An efficient derivation and implementation of the FMM has been presented by White and Head-Gordon.48,49 It is important to know that the FMM was designed for point charges. Molecular integrals, however, involve (Gaussian) charge distributions, which are continuous in space; a generalization of the FMM to continuous charge distributions was developed by White, Johnson, Gill, and Head-Gordon31 and is explained in the next section. First, let us consider an example that captures one of the essentials of the FMM. Imagine six boxes $A$, $B$, and $C_1$ to $C_4$, which, for example, represent fragments of a DNA molecule [Figure 9(a)]. Say that $A$ and $B$ are rather close, whereas $C_1$ to $C_4$ are approximately four times as far from $A$. Throughout the following discussion, we will assume that the interacting particles are evenly distributed among the boxes (for a discussion of approaches to unevenly distributed charges, see Ref. 46). The interaction energies $U_{AC_1}, \ldots, U_{AC_4}$ will then be approximately four times smaller than $U_{AB}$, because the interaction energy is proportional to $1/R$.

Figure 9 In the naive multipole method (a), the molecule is divided into boxes of equal size (for simplicity, only a few selected boxes are shown). For calculating interactions between remote boxes, e.g., $A$ and $C_1$–$C_4$, one can use much larger boxes (b) and still achieve high numerical accuracy. This reduces the computational complexity considerably and is an important stepping stone on the way to linear scaling.


The absolute error caused by truncating the multipole expansion is also proportional to $1/R$.13 The error is therefore larger for close pairs of boxes of a given size and smaller for remote pairs of the same size. In our example, we could thus combine the four boxes $C_1$ to $C_4$ into a four-times-larger box $C$ [Figure 9(b)], and the errors in the interaction energies of the close boxes ($U_{AB}$) and of the remote boxes ($U_{AC}$) would still be of the same order of magnitude. The computational cost, however, is reduced drastically by increasing the size of remote boxes. This idea of resorting to fine grains when describing close interactions and coarse grains for remote interactions is one of the key concepts of the FMM.45 In the framework of the FMM, the graining is realized by a hierarchical boxing scheme, which we outline now, for the sake of simplicity, in one (rather than three) dimension. The molecule is divided into a hierarchy of boxes, each box being divided into two smaller boxes when going from one level to the next: At level 0 there is one box, at level 1 there are two, level 2 contains four boxes, and so on. This is illustrated for four hierarchy levels in Figure 10 (in general, there will be more levels). The larger box we call the parent (abbreviated P), and the boxes it contains are its child boxes (abbreviated C); in Figure 10 at the bottom, for example, the parent of box $A$ is called P($A$). In the FMM, one distinguishes between near-field (NF) and far-field (FF) interactions: All interactions that can be treated by multipole expansions belong to the far field; all others are near field. In our illustration of Figure 10, all charges separated by more than two boxes (WS = 2) are far field, where WS is the so-called well-separatedness (WS) criterion. The NF interactions are calculated by conventional methods, e.g., by summing over all pairs of charges within the near field. For each level-3 box, there are two to four near-field boxes, depending on whether the box is located at the end or in the inner region of the molecule; the number of particles within the near field of a particular box therefore remains constant.


Figure 10 Illustration of the far-field calculation in FMM.


there are $\mathcal{O}(M)$ lowest level boxes, and therefore, the total computational cost for evaluating the near-field interactions scales linearly. The calculation of the far-field interactions in $\mathcal{O}(M)$ work is the crucial step of the algorithm. We outline it now for the electrostatic field V felt by the charges in box A (see Figure 10). Note that V is the algebraized form of the multipole-expanded electrostatic potential $\phi^{(n)}(\mathbf{r})$ we used in the introductory example. For A's parent P(A), there is only one far-field box with which it interacts: the ‘‘parent far-field’’ PFF(A), as indicated by an arrow in the illustration. We denote this field as $V_{P(A)}^{\text{PFF}}$. Box A feels the field of, in principle, five level 3 far-field boxes. Three of them are marked ‘‘FF’’ in the figure. Their field will be denoted as $V_A^{\text{FF}}$. The interaction with the remaining two level 3 boxes can be described at a coarser grain, that is, at level 2. We call every far-field interaction that can be described at a higher boxing level parent far-field (PFF), and all remaining (same-level) interactions are denoted far-field (FF). The union of PFF and FF we call the total far field. The interaction of box A with the remaining two far-field boxes is contained in the parent far-field $V_{P(A)}^{\text{PFF}}$. Here the translation operator W comes into play: $V_{P(A)}^{\text{PFF}}$ is the field felt at the center of P(A); to obtain the field experienced at the center of A, we must apply the translation operator. Altogether, the total far field of A is given as

$$V_A^{\text{total FF}} = V_A^{\text{FF}} + W_{AP(A)} \cdot V_{P(A)}^{\text{PFF}} \qquad [72]$$

Note that the interaction of A with the three closest boxes is calculated at level 3, whereas the more remote interactions are calculated at level 2. In general, FF interactions are evaluated at the coarsest possible FMM level, i.e., using the largest possible boxes. Consider now a molecule that is twice as large as the previous one, as illustrated in Figure 11. The area shaded in gray denotes a subunit of the size of our previous example (for comparison). Considering again the FF of a box A, then in proceeding from the gray subunit to the total system (doubling the size of the system) displayed in Figure 11, there are only three additional interactions with the new part of the system at levels 2 and 3. The number of these additional interactions is constant and does not grow with the total system size when choosing larger systems, because they are done at the coarsest possible level of boxes. Therefore, because we have a total number of $\mathcal{O}(M)$ child boxes, the total effort scales with $\mathcal{O}(M)$. Note that the inheritance of the field vectors from parent to child (and vice versa) is crucial for the overall linear scaling of the algorithm. If this could not be done, forming a V vector would involve a summation over the higher level boxes, which would lead to an $\mathcal{O}(M \log M)$ scaling. Finally, we note that ‘‘inheritance’’ is only possible through the multipole translation operator, because the field experienced at a parent's center must be shifted to the child's center.


Figure 11 Illustration of the far-field calculation in FMM for a molecule that is twice as large as the example of Figure 10.

Now we are ready to give a detailed description of the algorithm. The FMM can be divided into four steps or, in FMM language, ‘‘passes.’’ In Pass 1, the multipole expansions of all boxes at the lowest level are calculated. In addition, multipole expansions at higher levels are formed by translating the expansions of the lowest box level to higher ones. These are then converted to the multipole-expanded far fields in Pass 2. Then, in Pass 3, the FF is calculated for all boxes by adding their own FF and their parent's FF (the inheritance trick). Finally, in Pass 4, the interaction energies of all lowest level boxes with their NF and FF are evaluated to yield the total interaction energy. Altogether, FMM calculates the total interaction energy in $\mathcal{O}(M)$ work.

Pass 1:
1. Calculate multipole expansions for all particles i and boxes A at the lowest level:

$$q_A = \sum_{i \in A} q_i^A$$

2. Generate multipole expansions for all boxes at higher levels by translating the children's expansions to the parent's center:

$$q_A = \sum_{B \in C(A)} W_{AB} \cdot q_B$$


Pass 2:
3. Calculate the far-field vector for each box at every level:

$$V_A^{\text{FF}} = \sum_{B \in \text{FF}(A)} T_{AB} \, q_B$$

Pass 3:
4. Generate the total FF vector for each box at every level by adding the current box's FF and the parent box's FF:

$$V_A^{\text{total FF}} = V_A^{\text{FF}} + V_A^{\text{PFF}} = V_A^{\text{FF}} + W_{AP(A)} \, V_{P(A)}^{\text{FF}}$$

Special cases: Levels 0 and 1 have no FF. For level 2 there is no PFF:

$$V_A^{\text{total FF}} = V_A^{\text{FF}}$$

Pass 4:
5. Calculate the interaction energy of each lowest level box with its total FF:

$$U_A^{\text{total FF}} = \sum_{i \in A} q_i^A \cdot V_A^{\text{total FF}}$$

6. Calculate the total interaction energy of each lowest level box (NF + total FF):

$$U_A = U_A^{\text{NF}} + U_A^{\text{total FF}} = \sum_{i,j \in \text{NF}(A)} \frac{q_i q_j}{R_{ij}} + U_A^{\text{total FF}}$$

7. The total Coulomb interaction energy is given by the sum over the interaction energies of all lowest-level boxes (the factor of one half accounts for double counting of interactions):

$$U = \frac{1}{2} \sum_A U_A$$
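To make the four passes concrete, the toy Python program below runs the boxing hierarchy and pass structure just described in one dimension, but truncates every multipole expansion at order zero (one monopole per box), so that the translation operators W and T collapse to trivial shifts and bare $1/R$ factors. It is a sketch of the bookkeeping only, not the full method of Refs. 45–47; all names are ours, and the brute-force near-field loop is written for clarity rather than speed.

    import numpy as np

    def fmm_1d_monopole(positions, charges, levels=4, ws=2):
        # Toy 1-D FMM with all expansions truncated at order zero (monopoles).
        # Returns an approximation to sum_{i<j} q_i q_j / |r_i - r_j|.
        lo, hi = positions.min(), positions.max() + 1e-12
        nbox = 2 ** levels
        width = (hi - lo) / nbox
        box_of = np.minimum(((positions - lo) / width).astype(int), nbox - 1)

        # Pass 1: monopoles of the lowest-level boxes, then child -> parent.
        q = {levels: np.zeros(nbox)}
        np.add.at(q[levels], box_of, charges)
        for lev in range(levels - 1, -1, -1):
            q[lev] = q[lev + 1][0::2] + q[lev + 1][1::2]
        centers = {lev: lo + (np.arange(2 ** lev) + 0.5) * (hi - lo) / 2 ** lev
                   for lev in range(levels + 1)}

        # Passes 2 and 3: far-field potential at each box center; box B enters
        # the list of A if A, B are well separated here but their parents are not.
        v = {lev: np.zeros(2 ** lev) for lev in range(levels + 1)}
        for lev in range(2, levels + 1):
            for a in range(2 ** lev):
                v[lev][a] = v[lev - 1][a // 2]   # inherit the parent far field
                for b in range(2 ** lev):
                    if abs(a - b) > ws and abs(a // 2 - b // 2) <= ws:
                        v[lev][a] += q[lev][b] / abs(centers[lev][a] - centers[lev][b])

        # Pass 4: far-field energy (factor 1/2 for double counting) plus the
        # exact near-field pairs; a real code loops only over neighboring boxes.
        energy = 0.5 * float(np.sum(charges * v[levels][box_of]))
        for i in range(len(positions)):
            for j in range(i + 1, len(positions)):
                if abs(box_of[i] - box_of[j]) <= ws:
                    energy += charges[i] * charges[j] / abs(positions[i] - positions[j])
        return energy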

Fast Multipole Methods for Continuous Charge Distributions

So far the FMM considers only point charges. In quantum chemistry, however, we must deal with continuous charge distributions as, for example, in the form of Gaussian distributions. For these continuous distributions, one encounters two difficulties: how to define the spatial extent of a continuous charge distribution (they may extend over the whole space in general), and


how to treat different extents of charge distributions in an efficient way. We discuss here a prominent generalization of FMM to continuous charge distributions, the CFMM,31 which addresses these issues. When treating continuous charge distributions with the multipole expansion, we must ensure that the ‘‘bra’’ and ‘‘ket’’ distributions are nonoverlapping to guarantee convergence of the multipole series. Because Gaussian distributions extend over the whole space, they are never nonoverlapping in a strict sense. However, if the contributions of the overlapping regions to the two-electron integrals are numerically negligible, employing the multipole series causes no problems in practice.13 From the analytic expressions for the two-electron integral $(A|B)$ over s Gaussians, we pointed out that the error caused by employing the multipole expansion for calculating the ERI is on the order of $\varepsilon$, if the ‘‘bra–ket’’ distance R is chosen such that the following equation holds:

$$R > R_A + R_B, \qquad R_A = \frac{\operatorname{erfc}^{-1}(\varepsilon)}{\sqrt{A}}, \qquad R_B = \frac{\operatorname{erfc}^{-1}(\varepsilon)}{\sqrt{B}} \qquad [73]$$

Extended criteria can be derived for higher angular momenta, but in practice, it is usually sufficient to use Eq. [73]. With it, we have a criterion at hand that ensures convergence of the multipole series to the exact value of the integral within numerical accuracy of $\mathcal{O}(\varepsilon)$ even for Gaussians. We now outline the CFMM as first formulated by White et al.31 For a discussion of performance issues and a detailed description of implementational considerations, the interested reader is referred to the original literature. A pedagogical introduction to (C)FMM can be found in the book by Helgaker, Jørgensen, and Olsen.13 It is important to notice that although the error estimate of Eq. [73] holds rigorously only for s functions, the maximum box–box interaction error is used in CFMM. The CFMM error estimate is therefore generally considered an upper bound to the true error.31 We must keep track of the spatial extents of Gaussian distributions in order to know when the multipole expansion is applicable and when it is not; in other words, we must know the size of the NF for each Gaussian. To that end, a ‘‘well-separatedness criterion’’ (WS) or, synonymously, an NF width parameter is introduced, which stores the number of boxes by which a pair of equal Gaussian distributions has to be separated to be treated as an FF pair:

$$\mathrm{WS}_n = 2n, \qquad n = 1, 2, \ldots \qquad [74]$$


With this nomenclature, WS1 means that the NF is two boxes wide, whereas WS2 stands for four boxes, and so on (in one dimension). All Gaussian distributions are sorted into boxes (as in FMM) and into branches of WS parameters according to their extents. That is, a Gaussian of extent $R_A$ is sorted into the branch with

$$\mathrm{WS}_n = \max\left(\mathrm{WS}_1,\; 2\left\lceil \frac{R_A}{L_{\text{box}}} \right\rceil\right) \qquad [75]$$

where $\lceil\,\rceil$ is the ceiling function (smallest integer greater than or equal to its argument) and $L_{\text{box}}$ denotes the size of a lowest level box. Accordingly, the tightest distributions, which have only very small extents, are assigned to the WS1 branch; less tight distributions are assigned to branches WSn with larger n; and the most diffuse distributions belong to the branch with the largest well-separatedness criterion. WS is chosen to be as small as possible while containing the distribution completely; i.e., the far field is chosen as large as possible so as to benefit from the multipole expansion. The WS criterion for two distributions of (in general) different extents is given by

$$\mathrm{WS}_{nm} = \frac{\mathrm{WS}_n + \mathrm{WS}_m}{2} \qquad [76]$$

This means the two distributions have to be separated by $\mathrm{WS}_{nm}$ boxes in order to be treated as FF interactions. Apart from the additional assignment of charge distributions to WS branches, the CFMM steps are formally similar to those of FMM. The only important difference between CFMM and FMM is that, in the former, the width of the NF is chosen according to the spatial extent of the charge distributions and the multipole moments are calculated by integration (instead of by summation over point charges); everything else stays essentially the same. Finally, we comment on the computational complexity of CFMM. It is important to notice that the overall computational complexity of CFMM for calculating the Coulomb integral matrix is $\mathcal{O}(M)$, provided the Gaussian charge distributions are not extremely diffuse. In the limit of exceedingly diffuse distributions, the NF would extend over the whole molecule, which ultimately results in a late onset of linear scaling (but with a reduced prefactor in comparison with conventional methods). In practice, however, this problem is usually not observed for the basis sets commonly used in quantum chemistry for calculations on large molecules: Calculating the Coulomb matrix via CFMM is a linear-scaling step.
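As a small illustration of this bookkeeping, the following sketch (hypothetical helper names) implements the branch assignment of Eq. [75] and the pair criterion of Eq. [76]:

    import math

    def ws_branch(extent, l_box):
        # Eq. [75]: WS branch for a Gaussian of extent R_A; WS_1 = 2.
        return max(2, 2 * math.ceil(extent / l_box))

    def ws_pair(ws_n, ws_m):
        # Eq. [76]: required box separation for an FF pair of different extents.
        return (ws_n + ws_m) // 2

For example, a tight distribution (ws_branch returning 2) paired with a more diffuse one (ws_branch returning 6) must be separated by four boxes before the multipole expansion is applied.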

Other Approaches

We concentrated here on the linear-scaling calculation of the Coulomb matrix in the framework of (C)FMM, which is commonly used in quantum chemical calculations of large molecules. It should be noted that other tree


codes for large molecules exist, like, for example, Barnes–Hut (BH) tree methods50 or the quantum chemical tree code (QCTC) of Challacombe, Schwegler, and Almlöf.51,52 These differ in the structure of the tree used for organizing the boxes or in the kinds of expansions used for the integrals. BH methods traditionally use Cartesian multipole expansions,29,53 whereas the QCTC employs the fast Gauss transform.54 Several variations of these methods have been reported; see the references in Ref. 52, for example. Another variation of FMM is the very fast multipole method (vFMM).55 In the original FMM formulation, the multipole series is truncated after a maximum angular momentum L, which is kept constant during the whole calculation. Depending on the shape of the charge distributions and the box size, it may not be necessary to carry out the summation up to L; rather, a smaller angular momentum $L_{\text{eff}} < L$ suffices to reach a defined level of accuracy for some boxes. This is essentially the idea behind vFMM, which truncates the multipole series at a certain angular momentum $L_{\text{eff}}$ based on an empirical criterion. Strain, Scuseria, and Frisch developed a variation of CFMM, the Gaussian very fast multipole method (GvFMM),56 which truncates the series after $L_{\text{eff}} < L$ terms in the spirit of vFMM.55 Finally, we note that one of the most costly steps in calculating the Coulomb matrix using CFMM or GvFMM is the explicit evaluation of the near-field integrals. Although this step scales linearly with the size of the molecule, one can further decrease the prefactor by resorting to special methods that speed up the near-field integral calculation. Here we would like to mention just two recently developed methods: the use of auxiliary basis set expansions57–61 in the multipole-accelerated resolution-of-the-identity (MARI-J) approach62 and the Fourier transform Coulomb (FTC) method.63–66

EXCHANGE-TYPE CONTRACTIONS

Now that we have described how to reduce the scaling behavior for the construction of the Coulomb part of the Fock matrix (Eq. [9]), the remaining part within HF theory, which is required in hybrid DFT as well, is the exchange part. The exchange matrix is formed by contracting the two-electron integrals with the one-particle density matrix P, where the density matrix elements couple the two sides of the integral:

$$K_{\mu\nu} = \sum_{\lambda\sigma} P_{\lambda\sigma}\, (\mu\lambda|\nu\sigma) \qquad [77]$$

At first sight, it seems that the formation requires asymptotically $\mathcal{O}(M^2)$ four-center two-electron integrals, so that the overall scaling of the computational effort for building the exchange part would be quadratic. However, the coupling of the two charge distributions of the two-electron integral by the


Figure 12 Significant elements in (a) the canonical MO coefficient matrix (C) and (b) the one-particle density matrix (P) computed at the HF/6-31G level for four DNA base pairs (DNA$_4$). Significant elements with respect to a threshold of $10^{-7}$ are colored in black.

one-particle density matrix is of central importance for the scaling behavior, as will become clear from the discussion below. Therefore, it is crucial to discuss first the behavior of the one-particle density matrix (Eq. [8]). The canonical MO coefficient matrix (C) and the one-particle density matrix (P) are depicted in Figure 12 as computed at the HF/6-31G level for a DNA fragment with four base pairs (DNA$_4$). Here, negligible matrix elements below a threshold of $10^{-7}$ are plotted in white, whereas significant elements are shown in black. The figure clearly illustrates that basically no nonsignificant elements occur in the canonical MO coefficient matrix, because the canonical MOs are typically delocalized over the entire system. This is different for the one-particle density matrix, where a considerable number of negligible elements occurs. To reduce the computational scaling behavior by exploiting the localization of the one-particle density matrix, it is not sufficient to have many zero elements in the matrix; rather, it is necessary that the number of significant elements scales only linearly with system size. This favorable behavior of the density matrix is indeed observed, as shown in Figure 13, again for DNA fragments, in comparison with the $\mathcal{O}(M^2)$ behavior of the canonical coefficient matrix. It is important to note that the scaling behavior of the number of significant elements in the one-particle density matrix is closely related to the highest occupied molecular orbital–lowest unoccupied molecular orbital (HOMO–LUMO) gap of molecular systems; see Refs. 67–70. Therefore, the asymptotic linear-scaling behavior holds only for systems with a nonvanishing HOMO–LUMO gap, so that for a ‘‘truly metallic’’ system, for instance, a quadratic behavior would result. Nevertheless, for a multitude of important chemical and biochemical systems, the scaling of the one-particle density matrix is


Figure 13 Scaling behavior of the number of significant elements in the one-particle density matrix for DNA fragments computed at the HF/6-31G level, as compared with the scaling behavior of the MO coefficient matrix. Two different thresholds of $10^{-5}$ and $10^{-6}$ are shown, as compared with the $M^2$ behavior of the coefficient matrix.

asymptotically linear. Therefore, the main goal in many linear-scaling theories is to exploit the scaling behavior of the density matrix and to avoid entirely the use of the nonlocal molecular orbital coefficient matrix. For the linear-scaling formation of the exchange part of the Hamiltonian, the favorable scaling behavior of the one-particle density matrix P needs to be exploited. If we have a linear-scaling density P, then for each index $\mu$ of a matrix element $P_{\mu\nu}$, there can be only a constant number of indices $\nu$ for which the element is significant with respect to a given threshold. This is nothing else than the definition of a linear-scaling matrix. However, this means that if we consider the formation of the exchange part of the Fock matrix (Eq. [77]), the asymptotically $\mathcal{O}(M^2)$ four-center two-electron integrals are coupled through the one-particle density matrix elements, so that the overall number of required two-electron integrals is reduced to linear for a linear-scaling density matrix:

$$K_{\mu\nu} = \sum_{\lambda\sigma} P_{\lambda\sigma}\, (\mu\lambda|\nu\sigma) \qquad [78]$$

where the numbers of significant charge distributions $\mu\lambda$ and $\nu\sigma$ each scale as $\mathcal{O}(M)$, and the coupling via the linear-scaling $P_{\lambda\sigma}$ reduces the number of required integrals to $\mathcal{O}(M)$ overall.


This result also becomes clear by looking at the pseudo-code for the formation of the exchange part:

    Loop over mu                 ! O(M)
      Loop over lambda           ! O(1): coupled to mu via overlap
        Loop over sigma          ! O(1): coupled to lambda via P_{lambda,sigma}
          Loop over nu           ! O(1): coupled to sigma via overlap
            K_{mu,nu} += P_{lambda,sigma} (mu lambda | nu sigma)
          endloop
        endloop
      endloop
    endloop
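The same loop structure can be rendered as a short Python sketch; the sparse containers overlapping (significant bra/ket partners) and P_sparse (significant density elements), as well as the integral routine eri, are hypothetical stand-ins for the actual screening machinery:

    from collections import defaultdict

    def exchange_matrix(n_basis, P_sparse, overlapping, eri):
        # O(M) exchange contraction of Eq. [78]: every inner loop runs over a
        # constant number of partners for a localized density matrix.
        K = defaultdict(float)
        for mu in range(n_basis):                    # O(M)
            for lam in overlapping[mu]:              # O(1): significant mu-lambda pairs
                for sig in P_sparse[lam]:            # O(1): significant P_{lambda,sigma}
                    for nu in overlapping[sig]:      # O(1): significant nu-sigma pairs
                        K[(mu, nu)] += P_sparse[lam][sig] * eri(mu, lam, nu, sig)
        return K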

Here, the outermost loop runs over all basis functions $\mu$ [scaling as $\mathcal{O}(M)$]. The second loop is over the second index of all significant charge distributions $\mu\lambda$ (see above), so that this second loop scales as $\mathcal{O}(1)$ (asymptotically independent of molecular size). The third loop is coupled to the $\mathcal{O}(1)$ $\lambda$ loop over the linear-scaling one-particle density matrix and therefore also scales independently of system size [$\mathcal{O}(1)$]. Finally, the last loop behaves as $\mathcal{O}(1)$ due to the coupling in forming the charge distributions $\nu\sigma$. These two simple considerations illustrate that the scaling of the exchange part is linked closely to the scaling behavior of the one-particle density matrix. It is important to note, however, that the onset of the scaling behavior for the exchange formation can be significantly earlier than the one for the density matrix using the same thresholds, due to the coupling over the two-electron integrals. We will discuss this in more detail and present timings in the context of the calculation of energy gradients. The key to a truly linear-scaling exchange is the efficient implementation of the screening for significant contributions to the exchange matrix in a nonquadratic manner. The time effort for a quadratic screening can be reduced significantly (see Ref. 71), but this depends strongly on the molecular system being studied and the type of exchange contraction (e.g., exchange-type contraction of perturbed densities). It is clear that asymptotically an $\mathcal{O}(M^2)$ screening procedure would dominate the calculation. The first attempts to reduce the scaling of the exchange formation required assumptions about the long-range density and exchange behavior.72,73 Because of these assumptions, they were not able to readily ensure a prescribed accuracy. This difficulty was overcome by Schwegler et al. in 199774 in their ONX (order N exchange) algorithm by employing the traditional density-weighted Schwarz integral estimates of direct SCF methods21 within a novel loop structure. For nonmetallic systems, i.e., systems with a nonvanishing HOMO–LUMO gap, this achieves effective linear scaling by using preordered integral estimates, which allow the calculation to leave a loop early and to avoid unnecessary computational effort. However, unlike conventional direct


integral contraction schemes,19,21,22 the original (compare as well Ref. 75) ONX does not exploit the permutational symmetry of the two-electron integrals. It is clear that, as long as the exchange formation is dominated by the integral computation, it is favorable to avoid sacrificing permutational symmetry, which amounts, in the ideal case, to a factor of four. The need to exploit permutational symmetry is of particular importance if the algorithm reverts to quadratic scaling for small molecules or small band gap systems. To avoid these problems, the LinK method was introduced by Ochsenfeld et al. in 1998.71 It reduces the computational scaling for the exchange part to linear for systems with a nonvanishing HOMO–LUMO gap, while preserving the highly optimized structure of conventional direct SCF methods with only negligible prescreening overhead and without imposing predefined decay properties. The LinK method leads to early advantages as compared with conventional methods for systems with larger band gaps. Due to negligible screening overhead, it is also competitive with conventional SCF schemes both for small molecules and for systems with small band gaps. For the formation of an exchange-type matrix in, e.g., coupled perturbed SCF theory, the LinK method achieves sub-linear scaling or, more precisely, independence of the computational effort from molecular size for local perturbations.71 Because implementing the linear-scaling screening is tricky and does not provide much further insight for the current tutorial, we refer the reader to the original literature for details.71,76 We conclude this section by presenting some illustrative results comparing Schwarz and MBIE screening using test calculations on DNA molecules


Figure 14 Illustrative timings comparing Schwarz (QQ) and MBIE screening for calculating the exchange matrix for a series of DNA$_n$ molecules ($n = 1, 2, 4, 8, 16$) with up to 1052 atoms (10674 basis functions). All calculations were performed within the LinK method and with a 6-31G* basis at a threshold of $10^{-7}$ on an Intel Xeon 3.6 GHz machine.


with up to 1052 atoms. In Figure 14, the calculation time for building one Hartree–Fock exchange matrix using Schwarz (QQ) and MBIE screening, respectively, is shown; in both schemes, the LinK method is used. We observe that, in both schemes, the calculation time indeed scales linearly with the molecule size, as pointed out in the foregoing discussion. A speedup of the calculation by a factor of 2.1 is observed by employing MBIE as compared with the QQ screening, whereas the numerical error in the exchange energy is preserved and is on the order of 0.1 mHartree for both screening approaches using a threshold of $10^{-7}$. We have also compared these timings to ‘‘exact’’ screening, that is, the estimated calculation time that would result if the two-electron integrals were known exactly in the screening process. From the fact that the MBIE and ‘‘exact’’ graphs almost coincide, it is evident that MBIE screening is close to optimal for SCF.

THE EXCHANGE-CORRELATION MATRIX OF KS-DFT

Although the Kohn–Sham-DFT method has been well established in solid-state physics for many years, it was introduced to the computational chemistry community by a reformulation within a finite Gaussian basis set.59,77–80 Nowadays basically all popular ab initio packages provide a variety of exchange-correlation (XC) functionals that are widely used in computational chemistry and physics. In this section, we will not present the different types of XC functionals (see Ref. 9 and references therein; Refs. 81 and 82 also treat the recently developed meta-GGA functionals) but discuss only briefly the $\mathcal{O}(M)$ formation of the XC potential matrix $V_{xc}$ in the given basis.83,84 It has to be mentioned that hybrid XC functionals85 also contain a certain amount of exact exchange K, which can be formed in $\mathcal{O}(M)$ fashion within the LinK scheme71,76 as described above. The XC energy $E_{xc}$ is in general a functional of the density $\rho$. Within the GGA, $E_{xc}$ is also a functional of the density gradient $\nabla\rho$, and within the meta-GGA, it is a functional of $\rho$, $\nabla\rho$, and additionally the kinetic energy density $\tau$:

$$E_{xc} = \int f_{xc}\left[\rho_\alpha(\mathbf{r}), \rho_\beta(\mathbf{r}), \nabla\rho_\alpha(\mathbf{r}), \nabla\rho_\beta(\mathbf{r}), \tau_\alpha(\mathbf{r}), \tau_\beta(\mathbf{r})\right] d\mathbf{r} \qquad [79]$$

The potential $v_{xc}$ arising from exchange-correlation interactions between electrons is defined by the derivative of the XC energy functional $E_{xc}$ with respect to the one-particle density $\rho(\mathbf{r})$ as

$$v_{xc}(\mathbf{r}) = \frac{\partial f_{xc}(\mathbf{r})}{\partial \rho(\mathbf{r})} \qquad [80]$$


and the discrete representation in the given basis results from integration over r as

$$(V_{xc})_{\mu\nu} = \langle \chi_\mu | \hat{v}_{xc} | \chi_\nu \rangle = \int \frac{\partial f_{xc}(\mathbf{r})}{\partial \rho(\mathbf{r})}\, \chi_\mu(\mathbf{r})\, \chi_\nu(\mathbf{r})\, d\mathbf{r} \qquad [81]$$

Because it is typically not possible to determine $E_{xc}$ and $V_{xc}$ by analytic integration, a numerical quadrature has to be used. Therefore, Eq. [79] is rewritten to become

$$E_{xc} = \sum_A^{N_A} \sum_i^{N_{\text{grid}}^A} p_A\, w_i\, f_{xc}(\mathbf{r}_i) \qquad [82]$$

where $N_{\text{grid}}^A$ is the number of grid points $\{\mathbf{r}_i\}$ and $w_i$ is the weight of the given grid point of atom A. $p_A$ is the nuclear partition function that enables a split of the molecular grid into single atomic integral contributions. In a first step, the atomic grids are usually constructed by a combination of radial and angular grid points.86 After determining the partition factors $p_A$ by, e.g., the popular method proposed by Becke,87 the different atomic grids are merged to yield the molecular grid in $\mathcal{O}(M)$ fashion. For each atomic grid, the integral contribution is calculated with a scaling behavior independent of system size. After determining the constant number of basis functions $\chi_\mu$ required for the actual subgrid, as well as the corresponding basis function pairs $\chi_\mu \chi_\nu$, the representation of the one-particle density within the partial grid is formed by

$$\rho(\mathbf{r}_i) = \sum_{\mu\nu} P_{\mu\nu}\, \chi_\mu(\mathbf{r}_i)\, \chi_\nu(\mathbf{r}_i) \qquad [83]$$

with analogous equations for $\nabla\rho(\mathbf{r}_i)$ and $\tau(\mathbf{r}_i)$. At this point it is important to note that the localization or delocalization of the electrons, resulting in a sparse or dense discrete density matrix P, does not affect the scaling behavior of the algorithm; i.e., the strict $\mathcal{O}(M)$ scaling holds even for metallic systems due to the overlap-type coupling of $\chi_\mu$ and $\chi_\nu$ (see also the discussion of extremely diffuse basis functions in the context of CFMM). The evaluation of the XC functional and its derivatives at each point of the subgrid is followed by the summation of the energy functional values to yield the XC energy $E_{xc}$. To form the matrix representation of the corresponding XC potential $V_{xc}$ in the given basis, the different first-order derivatives have to be contracted with the corresponding basis function values as

$$\langle \chi_\mu | \hat{v}_{xc} | \chi_\nu \rangle = \sum_A^{N_A} \sum_i^{N_{\text{grid}}^A} p_A\, w_i\, \frac{\partial f_{xc}(\mathbf{r}_i)}{\partial \rho(\mathbf{r}_i)}\, \chi_\mu(\mathbf{r}_i)\, \chi_\nu(\mathbf{r}_i) \qquad [84]$$

For determining higher order derivatives of the XC potential, which are needed for response properties, the implementation can be done in a similar fashion, so that an $\mathcal{O}(M)$ scaling behavior is ensured as well.
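The structure of Eqs. [82]–[84] can be summarized in a few lines of dense NumPy. This sketch evaluates the full basis at every grid point and is therefore not linear scaling by itself; the $\mathcal{O}(M)$ behavior described above comes from restricting each atomic subgrid to its constant number of significant basis functions. All names are hypothetical, and f_xc/df_xc stand for a local (LDA-type) functional and its density derivative:

    import numpy as np

    def xc_quadrature(P, chi, weights, f_xc, df_xc):
        # chi[p, mu]: basis function values on the grid; weights[p] = p_A * w_i.
        rho = np.einsum('pm,mn,pn->p', chi, P, chi)   # Eq. [83] at all points
        E_xc = np.sum(weights * f_xc(rho))            # Eq. [82]
        V_xc = np.einsum('p,pm,pn->mn', weights * df_xc(rho), chi, chi)  # Eq. [84]
        return E_xc, V_xc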


AVOIDING THE DIAGONALIZATION STEP—DENSITY MATRIX-BASED SCF

In the last sections, we have seen that the Fock matrix can be formed in a linear-scaling fashion. This, however, means that the second rate-determining step within the SCF approach becomes more important for large molecules due to its $\mathcal{O}(M^3)$ scaling, although it shows a rather small prefactor: the solution of the generalized eigenvalue problem, which is typically done by a diagonalization of the Fock matrix. We now discuss general approaches for avoiding the diagonalization step entirely and reducing the cubic scaling to linear. The necessity for diagonalization alternatives is illustrated by the following example: For a DNA$_8$ molecule (eight stacked DNA base pairs) with 5290 basis functions, both the Fock matrix diagonalization and the Fock matrix construction using LinK/CFMM require approximately 22 minutes on a 3.6-GHz Xeon processor (HF/6-31G*, threshold $10^{-7}$, MBIE screening).23 This changes for a DNA$_{16}$ system (with 1052 atoms and 10674 basis functions), where the diagonalization is already more costly than calculating the two-electron integral matrices—141 versus 51 minutes. Therefore, it is clearly necessary to circumvent the diagonalization for large molecules. By diagonalizing the Fock matrix, the canonical MO coefficient matrix (C) is obtained (see Eq. [7]). However, we have seen in a previous section that almost all elements in the coefficient matrix are significant, which contrasts with the favorable behavior of the one-particle density matrix (P). The density matrix is conventionally constructed from the coefficient matrix by a matrix product (Eq. [8]). Although the Roothaan–Hall equations are useful for small- to medium-sized molecules, it makes no sense to first solve for a nonlocal quantity (C) and then generate from it the local quantity (P) in order to compute the Fock matrix or the energy of a molecule. Therefore, the goal is to solve directly for the one-particle density matrix as a local quantity and avoid entirely the use of the molecular orbital coefficient matrix.

General Remarks

In the following sections, we will provide an overview of density matrix-based SCF theory that allows one to exploit the naturally local behavior of the one-particle density matrix for molecular systems with a nonvanishing HOMO–LUMO gap. Besides the density matrix-based theories sketched below,68,88–94 a range of other methods exists, including divide-and-conquer methods,95–98 Fermi operator expansions (FOE),99,100 Fermi operator projection (FOP),101 orbital minimization (OM),102–105 and optimal basis density-matrix minimization (OBDMM).106,107 Although different in detail, many share as a common feature the idea of (imposed or natural) localization regions in order to achieve an overall $\mathcal{O}(M)$ complexity. This notion implies that the density matrix (or the molecule) may be divided into smaller


submatrices (submolecules), of which only a linear-scaling number of fragments may interact with each other. For an overview of these methods, the reader is referred to reviews by Goedecker,108,109 Scuseria,84,110 and Bowler et al.111,112 In the field of ab initio quantum chemistry, it seems that density matrix-based schemes are (so far) favored, whereas other diagonalization alternatives are mainly applied to large tight-binding or semi-empirical calculations. We will therefore focus on density matrix-based approaches, which not only allow one to avoid the diagonalization step, but also provide a way for the efficient calculation of molecular response properties such as NMR chemical shifts for large systems.113 In the next section, we begin by describing some basics of tensor formalisms that are useful (but not necessary) for understanding methods employing nonorthogonal basis functions. That section is followed by a brief outline of selected properties of the density matrix. With this we then turn to the formulation of diagonalization alternatives based on solving directly for the one-particle density matrix.

Tensor Formalism

To account correctly for the metric of the space spanned by the basis functions (overlap matrix $S_{\mu\nu}$; see also Figure 15), it is convenient to handle operations in a nonorthogonal basis (like the AO basis) using tensor notation. As we will be concerned with AO formulations of quantum chemical methods in the following, a basic understanding of this topic is useful for comprehending the succeeding sections, although we can only give a very brief introduction here. For a more thorough and yet pedagogical introduction to tensor theory in quantum chemistry, along with a detailed list of references to the literature, the interested reader is referred to the review of Head-Gordon et al.114


Figure 15 Basis vectors and functions may be (a) orthogonal, (b) nonorthogonal, or even (c) curvilinear, like (a) (x, y), (b) (f, g), and (c) ($\chi_1$, $\chi_2$) in this illustration. The metric $g_{\mu\nu} \equiv S_{\mu\nu}$ (which can be identified with the overlap matrix in quantum chemistry) uniquely describes the kind of coordinate system (a–c) spanned by the basis and provides a measure for distances, volumes, etc.


A general introduction to tensor analysis and its relation to Dirac's notation may be found in the book of Schouten.115 Every vector x in an n-dimensional linear vector space may be expressed as a linear combination of basis vectors $\mathbf{e}_i$:

$$\mathbf{x} = \sum_{i=1}^{n} x^i \mathbf{e}_i \equiv x^i \mathbf{e}_i \qquad [85]$$

where $x^i$ are the components of x in the $\mathbf{e}_i$ representation. On the right-hand side, we used Einstein's sum convention, which we employ for the sake of brevity whenever applicable. Vectors with lower indices will be called ‘‘covariant,’’ e.g., the $\mathbf{e}_i$ are covariant basis vectors. The basis vectors are nonorthogonal in general. That is, the scalar product of every pair gives a number $S_{ij}$:

$$\mathbf{e}_i \cdot \mathbf{e}_j = S_{ij}, \qquad 0 \le |S_{ij}| \le 1 \qquad [86]$$

where we assume that the basis vectors are normalized such that $S_{ii} = 1$. One could wish, for reasons that will become clear later, to find a second set of basis vectors $\mathbf{e}^j$, such that for every $\mathbf{e}_i$, there is an $\mathbf{e}^j$ with

$$\mathbf{e}_i \cdot \mathbf{e}^j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} \qquad [87]$$

Equation [87] is similar to the case of normalized orthogonal basis sets (where $\mathbf{e}_i \cdot \mathbf{e}_j = \delta_{ij}$), with the difference being that one vector comes from the first basis set and the second from the other basis set. For that reason, Eq. [87] is referred to as the ‘‘biorthogonality’’ or ‘‘biorthonormality’’ condition. Basis vectors meeting the biorthogonality requirement with respect to a covariant basis $\mathbf{e}_i$ will be denoted with an upper index and are called ‘‘contravariant.’’ The contravariant basis vectors will also be nonorthogonal in general; i.e., $\mathbf{e}^i \cdot \mathbf{e}^j = S^{ij}$. Instead of expanding x in terms of covariant basis vectors, one may equally well use the contravariant basis as

$$\mathbf{x} = \sum_{i=1}^{n} x_i \mathbf{e}^i \equiv x_i \mathbf{e}^i \qquad [88]$$

Here the $x_i$ are the components of x in contravariant representation. So far we have restricted ourselves to vectors so as to simplify the discussion. Now we turn to tensors. A tensor T(k) of rank k may be seen as an entity whose components are described by

$$\mathbf{T}(k) = \sum_{i_1, i_2, \ldots, i_k} T_{i_1, i_2, \ldots, i_k}\; \mathbf{e}_{i_1} \mathbf{e}_{i_2} \cdots \mathbf{e}_{i_k} \qquad [89]$$


with k indices $i_1, i_2, \ldots, i_k$. Note that here the indices are placed below T and the e's to denote that each may be either co- or contravariant depending on the chosen representation.115 For example, one could choose a set of covariant basis vectors ($T^{i_1 i_2 \ldots i_k}\, \mathbf{e}_{i_1} \mathbf{e}_{i_2} \cdots \mathbf{e}_{i_k}$), a contravariant set ($T_{i_1 i_2 \ldots i_k}\, \mathbf{e}^{i_1} \mathbf{e}^{i_2} \cdots \mathbf{e}^{i_k}$), or even a mixed representation. To be called a tensor, such an entity must obey certain rules concerning coordinate transformations, which we will, however, not discuss here and assume to be fulfilled in the following. Consider the following examples for illustration: A vector a in n-dimensional space is described completely by its n components $a_i$. It may therefore be seen as a one-index quantity or a tensor of rank one. A matrix A has $n^2$ components $A_{ij}$ (two indices) and is a rank-two tensor. A tensor of rank three has $n^3$ components, and its components have three indices, $T_{ijk}$, and so on. As a special case, scalars have only $n^0 = 1$ component and are tensors of rank zero. The following important rules of tensor analysis should be mentioned: Addition and subtraction are only defined for tensors of the same rank and of the same transformation properties (co-/contravariance). For example, adding a matrix and a vector is not valid. Multiplication (also called tensor contraction) is only defined for pairs of indices where one index is co- and the other is contravariant. For example, $x_i x^i$ is a valid tensor contraction, but $x_i x_i$ is not.

½90

where the first index refers to the ‘‘bra’’ side and the second to the ‘‘ket.’’ For the sake of simplicity, we do not pay attention to the order of indices in the following; e.g., we use dmn instead of dmn or dmn . For an in-depth discussion of this point, see Ref. 116. Co- and contravariant basis functions are nonorthogonal in general; i.e., the following equation holds: hwm jwn i ¼ Smn and hwm jwn i ¼ Smn

½91

It can be shown that co- and contravariant tensors may be converted into each other by applying the contravariant and covariant metric tensors gmn ¼ ðS1 Þmn  Smn and gmn ¼ Smn as jwm i ¼ gmn jwn i and jwm i ¼ gmn jwn i

½92

where Smn is the well-known overlap matrix and Smn ¼ ðS1 Þmn is its inverse.

46

Linear-Scaling Methods in Quantum Chemistry

For tensors Tð2Þ of rank two, we have the following choices as far as coand contravariance of the component indices are concerned: (1) Tmn , (2) T mn , (3) Tmn , and (4) Tnm . Alternative (1) is said to be ‘‘fully covariant,’’ (2) is ‘‘fully contravariant,’’ and the other two are ‘‘mixed’’ representations. In principle, one is free to formulate physical laws and quantum chemical equations in any of these alternative representations, because the results are independent of the choice of representation. Furthermore, by applying the metric tensors, one may convert between all of these alternatives. It turns out, however, that it is convenient to use representations (3) or (4), which are sometimes called the ‘‘natural representation.’’ In this notation, every ‘‘ket’’ is considered to be a covariant tensor, and every ‘‘bra’’ is contravariant, which is advantageous as a result of the condition of biorthogonality; in the natural representation, one obtains equations that are formally identical to those in an orthogonal basis, and operator equations may be translated directly into tensor equations in this natural representation. On the contrary, in fully co- or contravariant equations, one has to take the metric into account in many places, leading to formally more difficult equations. Let us look at how to translate the idempotency requirement of the density operator (which will be discussed in the next section more extensively) into a tensor equation as, for example, ^ ^2 ¼ r r

½93

Introducing the matrix elements of the density operator in the natural reprerjwn i, one may easily cast this operator equation into tensentation, Pmn ¼ hwm j^ sor form: Pml Pln ¼ Pmn

½94

This natural tensor equation is formally similar to the operator equation. If we wish to cast this equation into another (nonorthogonal) representation, we can do so by applying the metric tensor as described above. Let us, for example, rewrite Eq. [94] using the fully contravariant form of the density matrix: Pma gal Plb gbn ¼ Pmb gbn |fflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflffl} |fflfflffl{zfflfflffl}

½95

Pma Sal Pln ¼ Pmn ; or in matrix notation : PSP ¼ P

½96

Pml Pln

Pmn

Because gbn occurs on both sides of the equation, we can remove it, and inserting gal ¼ Sal ; we arrive at

This is the same result that is used in AO-based density matrix-based formulations of quantum chemistry discussed in later sections.

Avoiding the Diagonalization Step—Density Matrix-Based SCF

47

Atomic basis functions in quantum chemistry transform like covariant tensors. Matrices of molecular integrals are therefore fully covariant tensors; e.g., the matrix elements of the Fock matrix are Fmn ¼ hwm jFjwn i. In contrast, rjwn i. This reprethe density matrix is a fully contravariant tensor, Pmn ¼ hwm j^ 114,116 sentation is called the ‘‘covariant integral representation.’’ The derivation of working equations in AO-based quantum chemistry can therefore be divided into two steps: (1) formulation of the basic equations in natural tensor representation, and (2) conversion to covariant integral representation by applying the metric tensors. The first step yields equations that are similar to the underlying operator or orthonormal-basis equations and are therefore simple to derive. The second step automatically yields tensorially correct equations for nonorthogonal basis functions, whose derivation may become unwieldy without tensor notation because of the frequent occurrence of the overlap matrix and its inverse. In the following we will tacitly assume some basic knowledge of tensor analysis, especially as far as co- and contravariance is concerned. We will, however, in general not use upper and lower indices to discriminate co- and contravariance, because this is traditionally omitted in quantum chemistry and would greatly complicate the notation. The rare occasions where this tensor notation is needed will be pointed out explicitly.

Properties of the One-Particle Density Matrix A system with N electrons is fully described by the corresponding wave function and, following the interpretation of Born,117 j j2 dr1 dr2 . . . drN ¼  ðr1 r2 . . . rN Þ ðr1 r2 . . . rN Þdr1 dr2 . . . drN

½97

represents the probability for finding electron 1 in dr1 , electron 2 in dr2 , and so on. The probability for an arbitrary electron to be found at r1 is obtained by integrating over the positions of the remaining electrons and accounting for the indistinguishability of fermions: ð ½98 rðr1 Þ ¼ N ðr1 r2 . . . rN Þ  ðr1 r2 . . . rN Þdr2 . . . drN

which defines the so-called one-particle density function (see Ref. 118). It is important to note that these functions are quadratic in the wavefunction and invariant to unitary transformations of the wave function. If we consider the Hartree–Fock approach of building a Slater determinant from one-particle functions ji (compare our section on the basics of SCF theory), we can similarly define the one-particle density as X ji ðrÞji ðrÞ ½99 rðrÞ ¼ i2occ


and the corresponding density operator $\hat{\rho}$ as

$$\hat{\rho} = \sum_{i \in \text{occ}} |\varphi_i\rangle \langle \varphi_i| \qquad [100]$$

The density operator is a projector onto the occupied space, which becomes more clear if one considers, for example, an arbitrary function f that is expanded in the basis $\{\varphi_r\}$ as

$$f = \sum_r a_r |\varphi_r\rangle \qquad [101]$$

By projection with the density operator, the orthonormality causes all components other than those corresponding to the occupied space to disappear:

$$\hat{\rho} f = \sum_{i \in \text{occ}} \sum_r a_r\, |\varphi_i\rangle \langle \varphi_i | \varphi_r \rangle = \sum_{i \in \text{occ}} \sum_r a_r\, |\varphi_i\rangle\, \delta_{ir} = \sum_{i \in \text{occ}} a_i\, |\varphi_i\rangle \qquad [102]$$

X ^ jji ihji j ¼ r jji i hji jjj i hjj j ¼ |fflfflffl{zfflfflffl} i2occ i2occ

X j2occ

½103

dij

If the one-particle functions ji are expanded in basis functions wm as X ji ¼ Cmi wm ½104 m

the density operator can be written as XX X X ^¼ jji ihji j ¼ Pmn jwm ihwn j Cmi Cni jwm ihwn j ¼ r i2occ

mn i2occ

mn

½105

with the one-particle density matrix P introduced in Eq. [8]. If we consider again the idempotency property of the density operator, X XX ^ ^2 ¼ Pmn jwm ihwn j ¼ r ½106 Pmn jwm i hwn jwl i hws j Pls ¼ r |fflfflffl{zfflfflffl} mn mn ls Snl

then it becomes immediately clear that the following holds: PSP ¼ P

½107

Avoiding the Diagonalization Step—Density Matrix-Based SCF

49

Note that we have already derived this equation by the help of tensor notation in the previous section. The overlap matrix S appears in a nonorthogonal basis and is important for correct contraction with co- and contravariant basis sets. Therefore, either PS or SP is a projector onto the occupied space depending on the tensor properties of the quantity to which it is applied. The same holds for the complementary projector onto the virtual space ð1  PSÞ or ð1  SPÞ. An important technique that we will exploit in the next sections is the decomposition of matrices into occupied–occupied (oo), occupied–virtual (ov/vo), and virtual–virtual (vv) blocks.118 This is done by projecting these matrices onto the occupied and/or virtual space using the projectors PS or SP (occupied) and ð1  PSÞ or ð1  SPÞ (virtual) as it is shown in the following equation for a covariant matrix A: A ¼ 1A1 ¼ SPAPS þ SPAð1  PSÞ þ ð1  SPÞAPS þ ð1  SPÞAð1  PSÞ ¼ Aoo þ Aov þ Avo þ Avv

½108

Density Matrix-Based Energy Functional As discussed, for a formulation of SCF theories suitable for large molecules, it is necessary to avoid the nonlocal MO coefficient matrix, which is conventionally obtained by diagonalizing the Fock matrix. Instead we employ the one-particle density matrix throughout. For achieving such a reformulation of SCF theory in a density matrix-based way, we can start by looking at SCF theory from a slightly different viewpoint. To solve the SCF problem, we need to minimize the energy functional of   1 ½109 E ¼ tr Ph þ PGðPÞ 2 where GðPÞ denotes the two-electron integral matrices contracted with the density matrix. We minimize the energy with respect to changes in the one-particle density matrix, dE ! ¼0 dP

½110

enforcing two constraints: First, the idempotency condition of the following equation needs to be accounted for: PSP ¼ P

½111

and, second, the number of electrons N must be correct: tr ðPSÞ ¼ N

½112

50

Linear-Scaling Methods in Quantum Chemistry

These conditions are automatically fulfilled upon diagonalization of the Fock or the Kohn–Sham matrices and the formation of the density matrix (Eq. [8]). The question now becomes: How do we impose these properties without diagonalization? Li, Nunes, and Vanderbilt (abbreviated as LNV)88 first realized in the context of tight-binding (TB) calculations (see also the related work of Daw119) that insertion of a purification transformation introduced by McWeeny in 1959120,121 allows one to incorporate the idempotency constrain directly into the energy functional (Eq. [109]). In addition, the constraint of having the correct number of electrons was imposed by fixing the chemical potential mchem as in:   TB ~ ½113 ELNV ¼ tr PðH  mchem 1Þ ~ denotes the purified density matrix: where 1 is the unit matrix and P ~ ¼ 3PSP  2PSPSP P

½114

This purification transformation of McWeeny120,121 allows one to create, out of a nearly idempotent density matrix P; a more idempotent matrix ~. The method converges quadratically toward the fully idempotent matrix.120 P ~ ðxÞ ¼ 3x2  2x3 (for S ¼ 1) is shown in Figure 16 and possesses The function x two stationary points at f ð0Þ ¼ 0 and f ð1Þ ¼ 1. The purification transformation (Eq. [114]) converges quadratically to an idempotent density matrix whose eigenvalues are either 0 or 1, which correspond to virtual or occupied states, respectively. The necessary convergence condition is that the starting eigenvalues of P are in the range (0.5, 1.5).

~x

2 1.5 1 0.5

−1

0

−0.5

0.5

1

1.5

−0.5 −1

~ ¼ 3x2  2x3 . Figure 16 ‘‘Purification’’ function x

x

Avoiding the Diagonalization Step—Density Matrix-Based SCF

51

Let us illustrate the purification transformation by some numerical examples. Suppose x is close to zero, say, x ¼ 0:1. Purification will then bring ~ ¼ 3ð0:1Þ2  2ð0:1Þ3 ¼ 0:028. Suppose, on the other hand, it closer to zero: x that x ¼ 0:9, that is, close to one. This time the purity transformation brings it ~ ¼ 3ð0:9Þ2  2ð0:9Þ3 ¼ 0:972. We have also illustrated the concloser to 1: x vergence of the purification transformation for several starting values of x and the density matrix in Figure 17. In passing by, we note that the original LNV approach was designed for orthogonal basis functions. Nunes and Vanderbilt later presented a generalization to nonorthogonal problems89 (see as well later work by White et al. in Ref. 122). A modified LNV scheme for SCF theories was introduced by Ochsenfeld and Head-Gordon.68 Similarly Millam and Scuseria90 presented as well an extension of the LNV algorithm to the HF method. In the derivation of density matrix-based SCF theory below, we do not employ the chemical potential introduced by LNV,88 but instead we follow the derivation of Ochsenfeld and Head-Gordon, because McWeeny’s purification automatically preserves the electron number.68 Therefore, to avoid the diagonalization within the SCF procedure, we minimize the energy functional   ~GðP ~Þ ~ ¼ tr P ~h þ 1 P E 2

½115

~ is the inserted purified density. with respect to density-matrix changes, where P The simplest approach is therefore to optimize the density matrix (e.g., starting

Figure 17 Convergence of purification transformation for different starting values (left). Purification of the density matrix after a typical geometry optimization step within DQCSCF (see the following section for a definition) calculation (right), the logarithmic value of the norm of the residual (logjjPðiÞ  Pði1Þ jj) is plotted.

52

Linear-Scaling Methods in Quantum Chemistry

with a trial density matrix PðiÞ ) by searching for an energy minimum along the direction of the negative energy gradient: Pðiþ1Þ ¼ PðiÞ  s 

qE½PðiÞ  qPðiÞ

½116

where s is the step length. The gradient is built by forming the derivative of the energy functional (Eq. [115]): ~ qE ~ qP ~ qE ¼ ¼ 3FPS þ 3SPF  2SPFPS  2FPSPS  2SPSPF ~ qP qP qP

½117

At convergence this energy gradient expression reduces to the usual criterion of FPS  SPF ¼ 0. It is important to note that the covariant energy gradient (Eq. [117]) cannot be added directly to the contravariant one-particle density matrix, so that a transformation with the metric is required. Therefore, let us look briefly at the tensor properties of the energy gradient. Rewriting the energy gradient (Eq. [117]) in tensor notation, ! ~ qE ¼ ðrEÞmn ¼ 3Fml Pls Ssn þ 3Sml Pls Fsn qP mn  2Sml Pls Fsa Pab Sbn  2Fml Pls Ssa Pab Sbn ls

ab

 2Sml P Ssa P Fbn

½118

we note immediately that this gradient is a fully covariant tensor. Because the density matrix is fully contravariant in ‘‘covariant integral representation,’’ it is tensorially inconsistent to generate a new density matrix by adding the fully covariant gradient to the fully contravariant density matrix PðiÞ . Converting the covariant to contravariant indices by applying the inverse metric, we find the tensorially consistent formulation of the energy gradient116 as follows: ðrEÞmn ¼ gml ðrEÞls gsn

¼ ðS1 Þml ðrEÞls ðS1 Þsn

½119

With this the fully contravariant energy gradient results: ðrEÞ ¼

qE ¼ 3S1 FP þ 3PFS1  2PFP  2S1 FPSP  2PSPFS1 qP

½120

Because all matrices for its formation can be built in a linear-scaling fashion and because they are sparse for systems with a nonvanishing

Avoiding the Diagonalization Step—Density Matrix-Based SCF

53

HOMO–LUMO gap, the energy gradient with respect to the density matrix can be built with linear-scaling effort. Due to symmetries, only a few sparse matrix multiplications are required for the computation of the gradient. In this way, it is possible to avoid the diagonalization in the SCF procedure, thereby reducing the computational scaling asymptotically to linear68,122 for large molecules.

‘‘Curvy Steps’’ in Energy Minimization In the simplest density matrix-based method, optimization steps are taken in the direction of the negative gradient rE. The best one can therefore do is to find the minimum energy along a straight line defined by the gradient direction in each step. One can show, however, that it is possible to find the minimum along a curved path at essentially no additional cost,91,92 which potentially leads to more efficient minimization steps. With this approach, the idempotency condition is fulfilled through higher orders than in the density matrix-based scheme described above, where just the lowest-order purification transformation of McWeeny121 enters the formulation. It is useful to describe the generation of a new density matrix from the previous matrix by unitary transformation: ~ ~ P ¼ Uy PU

½121

Every unitary matrix U can be represented by an exponential function of an anti-Hermitian matrix D13 U ¼ eD ¼ 1 þ D þ

1 2 D þ ... 2!

½122

or in the tensor notation introduced in the previous section as m

Unm ¼ ðeD Þmn  en

½123

In this notation, the exponential parametrization of the new density matrix becomes ~m y m l s ml l sn ~ Ps e P n ¼ ðU Þl Ps Un ¼ e

½124

The density matrix (and thus the Hartree–Fock energy) can now be seen as functions of the parameter D and the requirement for an energy minimum becomes ðrEÞnm ¼

~ ~~ l ~l qE qE qP s q Ps s ¼0 m ¼ Fl m ¼ ~l qn qP qmn ~ qn s

½125

54

Linear-Scaling Methods in Quantum Chemistry

If one is not yet at a minimum, one can for example use the method of steepest descent to arrive at the optimum density matrix. Inserting the explicit depen~m m ~ dence of P n on n , e.g., in the form of the Taylor expansion of Eq. [124] m around n ¼ 0, one obtains for the direction of steepest descent: ! qE  m ¼ ðFlm Pln  Pml Fnl Þ ½126 n ¼   qnm  ¼0

Until now this ansatz is similar to the LNV approach (reformulated in natural tensor notation), starting from an almost idempotent density matrix. (Inserting the purity-transformed density matrix and going to covariant integral representation, the previous equation yields exactly the same result as in the LNV approach.) One could search along this direction in a steepest descent manner to reach the energy minimum. It is instructive to notice that these searches along a straight line may be interpreted as truncating the Taylor series of the exponential transformation after the linear term:     ~ ~ ¼ 1  D þ 1 D2 þ . . . P 1 þ D þ 1 D2 þ . . . P  DP þ PD ½127 P 2! 2!

Now we introduce ‘‘curvy steps.’’ An intuitive interpretation of what is done here is to expand the Taylor series of the exponential transformation to higher orders, such that the step directions are no longer straight lines, but instead they are curved. Invoking the Baker–(Campbell–)Hausdorff lemma (see, e.g., Ref. 123), the unitary transformation of the density matrix can be written as 1 1 ~ ~ P ¼ P þ ½P; D þ ½½P; D; D þ ½½½P; D; D; D þ . . . 2! 3!

½128

or X 1 m ~ ~m ¼ P P½ j n n j! j¼0 where the P½j are short-hand notations for nested commutators, which can be calculated by recursion using

m ½129 P½ jþ1 ¼ ½P½ j ; Dmn n

In a similar way, the Hartree–Fock trial energy, as a function of the trans~ ~, can be written as a series in the step length s, as ~ ½P formed density matrix E

m X sj ~ ½ j ¼ P½ j Fn ~ ½ j ; with E ~¼ ½130 E E m n j! j¼0

Avoiding the Diagonalization Step—Density Matrix-Based SCF

55

This equation describes the dependence of the trial energy on the step length ~ ½ j ’s and P½ j ’s. along a curved step direction given by the E In the ‘‘curvy steps’’ approach, higher terms of the Taylor expansion may be retained by including higher order commutators in the sum of Eq. [130] (letting j run to high orders), which corresponds to making steps along curved (polynomial) directions. If the series of Eq. [130] was truncated after j ¼ 1, one would obtain essentially the same step directions as in the LNV approach (starting from an idempotent density matrix). A step along a curved direction is superior to one along a straight line and should lead to faster convergence as far as the number of iterations is concerned, because higher order terms of the Taylor expansion are kept in the transformation of the density matrix. If all intermediate matrices are stored in memory, searching along curved directions is not more expensive than for straight-line steps; therefore, Head-Gordon and coworkers.91 find the curvy steps method to be faster than the LNV approach.

Density Matrix-Based Quadratically Convergent SCF (D-QCSCF) We have shown, in principle, how to circumvent the diagonalization and introduced two alternatives for choosing the density updates—the methods of steepest descent and ‘‘curvy steps.’’ Now we derive another density update, on which the density matrix-based quadratically convergent SCF method (D-QCSCF) of Ochsenfeld and Head-Gordon68 is based. This will also be our starting point in deriving linear-scaling methods for energy derivatives needed to determine response properties like vibrational frequencies or NMR chemical shifts, which are described in the next two sections. To minimize the energy functional (Eq. [115]) with respect to density changes ~ ! dE ¼ 0 dP

½131

we can use, for example, a Newton–Raphson scheme.124 The Taylor expansion of the energy functional around P in changes of the density matrix (P) is given as  ~  ~ dE 1 d2 E ~ ~   ðP Þ2 þ . . . ðP Þ þ EðP þ P Þ ¼ EðPÞ þ dP P ¼0 2 dP2 P ¼0

½132

For small changes P in the density P, terms higher than linear in the expansion can be discarded. We want to minimize the energy gradient of Eq. [132] as

\frac{d\tilde{E}(P + \Delta P)}{d\Delta P} \overset{!}{=} 0    [133]

Neglecting all terms higher than linear and differentiating Eq. [132], we immediately arrive at the Newton-Raphson equation, which has to be solved iteratively to obtain the density update \Delta P:

\frac{d^2\tilde{E}}{dP^2}\, \Delta P = -\frac{d\tilde{E}}{dP}    [134]

The term on the right-hand side of Eq. [134] is already known from the simple energy gradient (Eq. [117]), and the left-hand side can be calculated as

\frac{\partial}{\partial P}\left[\mathrm{tr}\!\left(\Delta P\, \frac{\partial \tilde{E}(P)}{\partial P}\right)\right] = 3F\Delta P S + 3S\Delta P F - 2F\Delta P SPS - 2FPS\Delta P S - 2S\Delta P FPS - 2SPF\Delta P S - 2S\Delta P SPF - 2SPS\Delta P F + 3G(X)PS + 3SPG(X) - 2G(X)PSPS - 2SPSPG(X) - 2SPG(X)PS    [135]

with

X = 3\Delta P SP + 3PS\Delta P - 2\Delta P SPSP - 2PS\Delta P SP - 2PSPS\Delta P    [136]

After the density update has been determined using the Newton-Raphson equations shown above, the density matrix may be updated as

P^{(i+1)} = P^{(i)} + s\, \Delta P^{(i)}    [137]

The procedure of determining \Delta P and updating the density is iterated until self-consistency is reached. For molecules with a nonvanishing HOMO-LUMO gap, all matrices involved are sparse, so that solving the SCF eigenvalue problem becomes altogether an asymptotically linear-scaling step.
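A compact sketch of one D-QCSCF iteration under these equations is given below. The Hessian action of Eq. [135] is wrapped as a linear operator and Eq. [134] is solved with a generic Krylov method; G is assumed to be a caller-supplied Coulomb/exchange build, grad stands for the gradient of Eq. [117], all names are illustrative, and dense algebra stands in for the sparse multiplications of a linear-scaling code:

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, gmres

    def dqcscf_step(P, F, S, G, grad, s=1.0):
        # One Newton-Raphson density update, Eqs. [134]-[137].
        n = P.shape[0]

        def hessian_times(dP):
            # Left-hand side of Eq. [134], i.e., Eq. [135] acting on dP.
            X = (3*dP@S@P + 3*P@S@dP - 2*dP@S@P@S@P
                 - 2*P@S@dP@S@P - 2*P@S@P@S@dP)            # Eq. [136]
            GX = G(X)
            return (3*F@dP@S + 3*S@dP@F
                    - 2*F@dP@S@P@S - 2*F@P@S@dP@S
                    - 2*S@dP@F@P@S - 2*S@P@F@dP@S
                    - 2*S@dP@S@P@F - 2*S@P@S@dP@F
                    + 3*GX@P@S + 3*S@P@GX
                    - 2*GX@P@S@P@S - 2*S@P@S@P@GX - 2*S@P@GX@P@S)

        A = LinearOperator((n*n, n*n),
                           matvec=lambda v: hessian_times(v.reshape(n, n)).ravel())
        dP_flat, info = gmres(A, -grad.ravel())   # solve Eq. [134]
        return P + s * dP_flat.reshape(n, n)      # update of Eq. [137]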

Implications for Linear-Scaling Calculation of SCF Energies

This concludes our derivation of tools for the linear-scaling calculation of SCF energies. We have outlined methods that enable the linear-scaling execution of the two expensive steps of SCF calculations: first, efficient integral screening and linear-scaling formation of Fock-type matrices, and second, methods for circumventing the diagonalization step, used conventionally for solving the SCF pseudo-eigenvalue problem. With these methods, it is now possible to calculate HF and DFT energies with an effort that scales asymptotically linearly, so that molecular systems with 1000 and more atoms can be handled with today's computers.

SCF ENERGY GRADIENTS

Up to this stage of our review, we have focused mainly on the computation of the energy of a molecule. However, to obtain suitable instruments for studying molecular systems and to be able to establish useful connections to experimental investigations, we also need to compute molecular properties other than the energy. The first step toward this is the calculation of energy gradients, e.g., with respect to nuclear coordinates, which allow us to locate stationary points on a potential energy surface. In addition, energy gradients are crucial for performing direct Born-Oppenheimer molecular dynamics. The energy gradient with respect to a nuclear coordinate x, as an example, can be obtained by differentiating the SCF energy expression of Eq. [109]:

\frac{\partial E}{\partial x} = \mathrm{tr}(P h^x) + \frac{1}{2}\,\mathrm{tr}(P\, G^x(P)) + \mathrm{tr}(P^x F)    [138]

where G^x(P) denotes the contraction of the derivative two-electron integrals with the one-particle density matrix, and h^x the derivative of the core-Hamiltonian matrix. Note that the computation of these integral derivatives can be done in a linear-scaling fashion by slight modifications of the previously introduced O(M) algorithms like CFMM and LinK, for example. Although the derivative density matrix P^x occurs, Pulay pointed out125-127 that it can be avoided by exploiting the solved Roothaan-Hall equations and the derivative of the orthonormality relation. In this way, the perturbed density matrix is replaced by an expression containing the overlap integral derivative S^x as

\frac{\partial E}{\partial x} = \mathrm{tr}(P h^x) + \frac{1}{2}\,\mathrm{tr}(P\, G^x(P)) - \mathrm{tr}(W S^x)    [139]

where W is the "energy-weighted density matrix" expressed as

W_{\mu\nu} = \sum_i \epsilon_i\, C_{\mu i} C_{\nu i}    [140]
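In code, Eq. [140] is a one-liner, which makes the problem explicit: W is assembled from the canonical MO coefficients (C_occ and eps_occ are illustrative names), and it is precisely this dependence on delocalized MOs that must be avoided:

    import numpy as np

    def energy_weighted_density(C_occ, eps_occ):
        # W_mu,nu = sum_i eps_i C_mu,i C_nu,i (Eq. [140]): scale each occupied
        # column of C_occ by its orbital energy, then contract over i.
        return (C_occ * eps_occ) @ C_occ.T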

This formulation requires one to compute the energy-weighted density matrix using the molecular orbital coefficient matrix C, which must be avoided to achieve an overall linear-scaling behavior. Therefore, we need to derive an alternative expression for substituting tr(P^x F) (see also Ref. 128). To obtain equations that are independent of P^x, it is necessary to consider the different contributions to the derivative density. As for any matrix representation of operators, it is possible to split the contributions into different subspace projections (compare Eq. [108]):

P^x = P^x_{oo} + P^x_{ov} + P^x_{vo} + P^x_{vv}    [141]

These projections will be analyzed below. At SCF convergence, the following equations are valid:

FPS = SPF    [142]
P = PSP    [143]

In addition, after introducing the perturbation "x," the derivative of the idempotency relation (Eq. [143]) has to be fulfilled:

P^x = P^x SP + P S^x P + PS P^x    [144]

Projecting Eq. [144] onto the occupied space and employing the idempotency relation (Eq. [143]) allows us to identify P^x_{oo}:

P^x_{oo} = PSP^x SP = PSP^x SP + PS^x P + PSP^x SP \;\Rightarrow\; P^x_{oo} = -PS^x P    [145]

This shows that the occupied-occupied part of P^x is directly linked to the derivative of the overlap integrals. In addition, the virtual-virtual block of P^x vanishes:

P^x_{vv} = (1 - PS)\, P^x\, (1 - SP) = \underbrace{P^x - PSP^x - P^x SP}_{PS^x P} + \underbrace{PSP^x SP}_{-PS^x P} = 0    [146]

With these properties of P^x at hand, the remaining part of tr(P^x F) consists of the occ/virt and virt/occ blocks:

\mathrm{tr}(P^x_{ov} F) = \mathrm{tr}(PSP^x (1 - SP)F) = \mathrm{tr}(P^x (1 - SP)FPS) = \mathrm{tr}(P^x F_{vo})    [147]
\mathrm{tr}(P^x_{vo} F) = \mathrm{tr}(P^x F_{ov})    [148]

In Eqs. [147] and [148], the cyclic permutation possible within a trace has been exploited, which shows that the projection of P^x can be transferred to the Fock matrix. At this stage it is important to recall what is done in solving the SCF equations at the HF or DFT level: the Fock matrix is diagonalized, so that the blocks coupling virtual and occupied parts vanish:

F_{ov} = SPF(1 - PS) = SPF - SP\underbrace{FPS}_{SPF} = SPF - S\underbrace{PSP}_{P}F = 0    [149]
F_{vo} = 0    [150]

Therefore, the term involved in the energy gradient simplifies to

\mathrm{tr}(P^x F) = \mathrm{tr}(P^x_{oo} F) = -\mathrm{tr}(PS^x PF) = -\mathrm{tr}(PFPS^x)    [151]

In this way, the final density matrix-based energy gradient expression results, which avoids the conventional energy-weighted density matrix (conventionally constructed via the delocalized, canonical MO coefficients):

\frac{\partial E}{\partial x} = \mathrm{tr}(P h^x) + \frac{1}{2}\,\mathrm{tr}(P\, G^x(P)) - \mathrm{tr}(PFPS^x)    [152]

Here, only quantities that are sparse for systems with a nonvanishing HOMO-LUMO gap enter the formulation, and the derivative quantities h^x, S^x, and the contraction of the one-particle density matrix with the derivative integrals can be computed in a linear-scaling fashion. For the integral contractions, slightly modified CFMM- and LinK-type schemes can be used to reduce the scaling.76,129 It is important to note that, for example, in the formation of the exchange energy gradient E^x_K, the derivative two-electron integrals are contracted with two one-particle density matrices:

E^x_K = \sum_{\mu\nu} \sum_{\lambda\sigma} P_{\mu\nu}\, (\mu\lambda|\nu\sigma)^x\, P_{\lambda\sigma}    [153]

Therefore, the coupling that can be exploited to reduce the scaling behavior is even stronger than in the calculation of the Fock matrix. An example of the scaling behavior of the energy-gradient calculation as compared with the conventional quadratic scaling behavior is displayed in Figure 18 for DNA fragments. It is important to note that the numerical accuracy is the same in both cases.
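The equivalence of Eq. [139] and Eq. [152] rests on the identity W = PFP at SCF convergence; the following toy check (random stand-in matrices, single-occupancy convention, all names illustrative) verifies it numerically:

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(0)
    n, nocc = 12, 4
    A = rng.standard_normal((n, n)); S = A @ A.T + n * np.eye(n)  # SPD overlap stand-in
    F = rng.standard_normal((n, n)); F = 0.5 * (F + F.T)          # symmetric Fock stand-in

    e, C = eigh(F, S)                 # Roothaan-Hall: F C = S C e, with C^T S C = 1
    Cocc = C[:, :nocc]
    P = Cocc @ Cocc.T                 # converged one-particle density
    W = (Cocc * e[:nocc]) @ Cocc.T    # energy-weighted density, Eq. [140]

    Sx = rng.standard_normal((n, n)); Sx = 0.5 * (Sx + Sx.T)      # mock overlap derivative
    print(np.trace(W @ Sx), np.trace(P @ F @ P @ Sx))             # identical at convergence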

Figure 18. Comparison of timings for standard O(M^2) (STD JK^grad) and linear-scaling energy gradients for DNA fragments (HF/6-31G*) using the LinK76 and CFMM129 methods.

MOLECULAR RESPONSE PROPERTIES AT THE SCF LEVEL

With the energy gradients at hand, one can now locate stationary points on a potential hypersurface or study the dynamics of a system. However, a huge class of important molecular properties is more complicated to calculate, because these properties are linked to the response of the molecular system with respect to a perturbation. Examples include vibrational frequencies, NMR chemical shifts, and polarizabilities. An excellent review is given in Ref. 130, so we focus in the following on the key issues that need to be resolved to reduce the scaling behavior for the computation of response properties to linear at the SCF level. In the following, we first describe briefly how properties such as vibrational frequencies and NMR chemical shifts are computed. Then we focus on how to calculate the common difficult part, namely, the response of the one-particle density matrix with respect to a perturbation, and on how to reduce the computational scaling to linear.

Vibrational Frequencies

The second derivatives with respect to nuclear displacements are crucial for characterizing stationary points on a potential hypersurface. They also provide the normal modes of the system and can be linked, within the harmonic approximation, to the vibrational frequencies of the system, which can be measured experimentally by IR or Raman spectroscopy. By taking the derivative of the SCF energy gradient expression (Eq. [152]) with respect to another nuclear coordinate y, we obtain the following expression for the second derivatives:

\frac{\partial^2 E}{\partial x \partial y} = \mathrm{tr}\!\left(P \frac{\partial^2 h}{\partial x \partial y}\right) + \frac{1}{2}\,\mathrm{tr}\!\left(P \frac{\partial^2 II}{\partial x \partial y} P\right) - \mathrm{tr}\!\left(PFP \frac{\partial^2 S}{\partial x \partial y}\right) - \mathrm{tr}\!\left(P \frac{\partial F}{\partial y} P \frac{\partial S}{\partial x}\right) + \mathrm{tr}\!\left(\frac{\partial P}{\partial y} \frac{\partial h}{\partial x}\right) + \mathrm{tr}\!\left(\frac{\partial P}{\partial y} \frac{\partial II}{\partial x} P\right) - \mathrm{tr}\!\left(\frac{\partial P}{\partial y}\left[FP \frac{\partial S}{\partial x} + \frac{\partial S}{\partial x} PF\right]\right)    [154]

where II abbreviates the antisymmetrized two-electron integrals. In contrast to the simple energy gradient expression, the computation of the perturbed one-particle density matrix can no longer be avoided for the second derivatives. To obtain this response of the one-particle density matrix with respect to the perturbation y, the coupled-perturbed Hartree-Fock or coupled-perturbed Kohn-Sham equations need to be solved.131-136 The standard path for a solution in the MO basis scales as O(M^5),135,136 whereas an AO formulation introduced later128,136 reduces the computational effort to O(M^4). The scaling behavior of the latter scheme is due to a partial solution of the coupled-perturbed self-consistent field (CPSCF) equations in the MO basis. To reduce this scaling behavior, Ochsenfeld and Head-Gordon68 reformulated the CPSCF theory in a density matrix-based scheme (D-CPSCF), so that an asymptotically linear-scaling behavior becomes possible. Closely related to this density matrix-based approach, Ochsenfeld, Kussmann, and Koziol recently introduced an even more efficient density-based approach for the solution of the CPSCF equations in the context of the calculation of NMR chemical shieldings.113 Therefore, we focus in the following on the calculation of NMR shieldings and formulate the corresponding D-CPSCF theory within this context.

NMR Chemical Shieldings

The routine calculation of NMR chemical shifts137-141 using quantum chemical methods has become possible since the introduction of local gauge-origin methods,142-148 which provide a solution to the gauge-origin problem within approximated schemes. In our formulation, we use gauge-including atomic orbitals (GIAO):144-146

\chi_\mu(B) = \chi_\mu(0)\, \exp\!\left(-\frac{i}{2c}\,[B \times (R_\mu - R_0)] \cdot r\right)    [155]

where B is the magnetic field vector and \chi_\mu(0) denotes the standard field-independent basis functions. The locations of the basis functions and the gauge origin are described by R_\mu and R_0, respectively. The use of the GIAO functions permits us to avoid the gauge-origin problem and has proven to be particularly successful.137 In the following, we restrict ourselves to the HF method (GIAO-HF),145,146,148 which provides useful results for many molecular systems.

For example, in many systems we found the GIAO-HF method to yield 1H-NMR chemical shifts with an accuracy of typically 0.2-0.5 ppm.149-152 For other nuclei, the inclusion of correlation effects can become more important.137,139,153,154 The computation at the GIAO-DFT level is closely related. NMR chemical shifts are calculated as second derivatives of the energy with respect to the external magnetic field B and the nuclear magnetic spin m^N_j of a nucleus N:

\sigma^N_{ij} = \frac{\partial^2 E}{\partial B_i\, \partial m^N_j}    [156]

where i, j denote the x, y, z coordinates. This leads to

\sigma^N_{ij} = \sum_{\mu\nu} P_{\mu\nu}\, \frac{\partial^2 h_{\mu\nu}}{\partial B_i\, \partial m^N_j} + \sum_{\mu\nu} \frac{\partial P_{\mu\nu}}{\partial B_i}\, \frac{\partial h_{\mu\nu}}{\partial m^N_j}    [157]

This equation shows that, similar to the calculation of vibrational frequencies, the response of the one-particle density matrix to a perturbation is necessary; in the case of NMR shieldings, the perturbation is the magnetic field B_i. Therefore, the CPSCF equations need to be solved for the perturbed one-particle density matrices \partial P_{\mu\nu}/\partial B_i (short: P^{B_i}). In the context of NMR shieldings, the computational effort of conventional schemes146,148 scales cubically with molecular size. To reduce the scaling behavior for the calculation of response properties, we focus in the following on a reformulation of the CPSCF equations in a density matrix-based scheme, so that the scaling of the computational effort can be reduced to linear for systems with a nonvanishing HOMO-LUMO gap.
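Once the perturbed densities are available, assembling Eq. [157] is only a pair of contractions over AO indices; a minimal sketch with illustrative array names (h_Bm[i][j] for the mixed second derivative of h, h_m[j] for its derivative with respect to the nuclear spin of one nucleus) reads:

    import numpy as np

    def shielding_tensor(P, P_B, h_Bm, h_m):
        # Eq. [157]: sigma_ij = sum_mn P_mn d2h_mn/(dB_i dm_j)
        #                      + sum_mn (dP_mn/dB_i) (dh_mn/dm_j)
        sigma = np.zeros((3, 3))
        for i in range(3):
            for j in range(3):
                sigma[i, j] = np.sum(P * h_Bm[i][j]) + np.sum(P_B[i] * h_m[j])
        return sigma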

Density Matrix-Based Coupled Perturbed SCF (D-CPSCF)

The solution of the CPSCF equations is necessary for obtaining the response of the one-particle density matrix with respect to a perturbation. As mentioned, conventional formulations of CPSCF theory involve AO-MO transformations, so that again the delocalized, canonical MO coefficients are required. In this way, it is not possible to reduce the computational effort to linear. Therefore, the key feature of linear-scaling CPSCF theory is to avoid these MO transformations and, instead, to solve directly for the perturbed one-particle density matrices. The quadratically convergent density matrix-based SCF method described above in the context of avoiding the diagonalization within the SCF cycle can be used as the basis for the reformulation of the response equations.68 Related alternative approaches were proposed later in Refs. 155 and 156, but we follow in this review our derivation presented in Refs. 68 and 113, which we have found useful for obtaining an efficient density matrix-based CPSCF scheme for large molecules.

In the following, we focus on the determination of the density matrix perturbed with respect to the magnetic field (P^{B_i}), whereas an extension to other perturbations is straightforward. Within a linear response formalism (only terms linear in the external perturbation are considered), we can solve for the perturbed density matrix P^{B_i} directly:

\frac{\partial^2 \tilde{E}[P]}{\partial P^2}\, P^{B_i} = -\frac{\partial}{\partial B_i}\left[\frac{\partial \tilde{E}[P]}{\partial P}\right]    [158]

where \tilde{E} is the functional described in Eq. [115]. Inserting Eq. [135] and the derivative of Eq. [117] with respect to the perturbation B_i into Eq. [158], we obtain68

3FP^{B_i}S + 3SP^{B_i}F - 2FP^{B_i}SPS - 4FPSP^{B_i}S - 4SP^{B_i}SPF - 2SPSP^{B_i}F + G(X)PS + SPG(X) - 2SPG(X)PS = FPS^{B_i} + S^{B_i}PF + 2FPS^{B_i}PS + 2SPS^{B_i}PF - F^{(B_i)}PS - SPF^{(B_i)} + 2SPF^{(B_i)}PS    [159]

with

F^{(B_i)} = h^{B_i} + G^{B_i}(P)    [160]
X = P^{B_i}SP + PSP^{B_i} - 2PSP^{B_i}SP - PS^{B_i}P = P^{B_i}    [161]

At this stage, it is worthwhile to consider some properties of the derivative density matrix, which can, as described in the section on density matrix-based energy gradients, be split into a sum of subspace projections (Eq. [108]):

P^{B_i} = P^{B_i}_{oo} + P^{B_i}_{ov} + P^{B_i}_{vo} + P^{B_i}_{vv}    [162]

The comparison with the derivative of the idempotency relation

P^{B_i} = P^{B_i}SP + PS^{B_i}P + PSP^{B_i}    [163]

clarifies the different contributions of P^{B_i}:

P^{B_i}_{oo} = -S^{B_i}_{oo}    [164]
P^{B_i}_{ov} = -P^{B_i}_{vo}    [165]
P^{B_i}_{vv} = 0    [166]

where the sign on the right of Eq. [164] originates from the fact that the first-order matrices with respect to the magnetic perturbation are skew-symmetric. As we can directly calculate S^{B_i}, we only have to determine the occupied-virtual part P^{B_i}_{ov}. To solve only for P^{B_i}_{ov} and P^{B_i}_{vo}, the equation system of Eq. [159] can be projected by SP from the left and PS from the right, respectively, and the two resulting equations are added. In this way, we obtain the following density matrix-based CPSCF equations,113 which provide superior convergence properties, in particular if sparse algebra is employed:

FP^{B_i}SPS + SPSP^{B_i}F - FPSP^{B_i}S - SP^{B_i}SPF + G(P^{B_i})PS + SPG(P^{B_i}) - 2SPG(P^{B_i})PS = FPS^{B_i} + S^{B_i}PF - FPS^{B_i}PS - SPS^{B_i}PF - F^{(B_i)}PS - SPF^{(B_i)} + 2SPF^{(B_i)}PS    [167]

with

F^{(B_i)}_{\mu\nu} = \frac{\partial h_{\mu\nu}}{\partial B_i} + \sum_{\lambda\sigma} P_{\lambda\sigma}\, \frac{\partial\left[(\mu\nu|\lambda\sigma) - \frac{1}{2}(\mu\lambda|\nu\sigma)\right]}{\partial B_i}    [168]

G_{\mu\nu}(P^{B_i}) = -\frac{1}{2} \sum_{\lambda\sigma} \frac{\partial P_{\lambda\sigma}}{\partial B_i}\, (\mu\lambda|\nu\sigma)    [169]

The convergence properties of the density matrix-based equations, i.e., the number of iterations needed to converge P^{B_i}, are similar to those encountered for a solution in the MO space, so that the advantage of using sparse multiplications within the density-based approach allows us to reduce the scaling of the computational effort efficiently. In this way, NMR chemical shift calculations with linear-scaling effort become possible, and systems with 1000 and more atoms can be treated at the HF or DFT level on today's computers.113 Extensions to other molecular properties can be formulated in a similar fashion.
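To make the structure of Eq. [167] explicit, the sketch below treats its left-hand side as a linear map acting on P^{B_i} and refines the perturbed density by a simple damped residual iteration; G and the right-hand side rhs are assumed to be supplied by the caller, and a production code would instead use DIIS- or conjugate-gradient-type acceleration with sparse algebra:

    import numpy as np

    def dcpscf_solve(P, F, S, G, rhs, tol=1e-6, maxiter=200, damp=0.1):
        # Illustrative solver for the D-CPSCF equations, Eq. [167].
        def lhs(PB):
            return (F@PB@S@P@S + S@P@S@PB@F - F@P@S@PB@S - S@PB@S@P@F
                    + G(PB)@P@S + S@P@G(PB) - 2*S@P@G(PB)@P@S)
        PB = np.zeros_like(P)
        for _ in range(maxiter):
            R = rhs - lhs(PB)            # residual of Eq. [167]
            if np.linalg.norm(R) < tol:
                break
            PB = PB + damp * R           # damped update (illustrative step)
        return PB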

OUTLOOK ON ELECTRON CORRELATION METHODS FOR LARGE SYSTEMS

Although the main focus of the current review is to provide insights into reducing the scaling behavior of HF and DFT methods, it seems appropriate to provide a brief outlook on the behavior of post-HF methods. The importance of these methods cannot be overemphasized, because it is the systematic hierarchy of approaches to the exact solution of the electronic Schrödinger equation that allows for systematic and reliable studies of molecular systems. A concise overview of the huge amount of interesting and successful work done in the field of reducing the scaling behavior of post-HF methods is beyond the scope of this chapter; therefore, we just provide some insights into why a reduction of the scaling behavior with rigorous error bounds should also be possible here. For a firsthand account of the impressive progress made in the field, the reader is referred to the work of Pulay and Saebø,157-160 Werner and coworkers,161-163 Head-Gordon and coworkers,164,165 Almlöf and Häser,166-168 Ayala and Scuseria,169,170 Friesner and coworkers,171 Carter and coworkers,172 Schütz and Werner,173 and Schütz.174 Recent reviews may be found in Refs. 175 and 114. To explain some principles, we focus here on the simplest of these approaches, Møller-Plesset perturbation theory to second order (MP2). In the conventional, canonical MO-based formulation, the closed-shell MP2 correlation energy is given by

E_{\mathrm{MP2}} = -\sum_{ijab} \frac{(ia|jb)\left[2(ia|jb) - (ib|ja)\right]}{\epsilon_a + \epsilon_b - \epsilon_i - \epsilon_j}    [170]

with the MO integrals

(ia|jb) = \int \varphi_i(\mathbf{r}_1)\, \varphi_a(\mathbf{r}_1)\, \frac{1}{r_{12}}\, \varphi_j(\mathbf{r}_2)\, \varphi_b(\mathbf{r}_2)\, d\mathbf{r}_1\, d\mathbf{r}_2    [171]

Indices i, j denote occupied orbitals, whereas a, b are virtuals. The difficulty is that the integrals computed in the AO basis need to be transformed into the MO basis:

(ia|jb) = \sum_{\mu\nu\lambda\sigma} C_{\mu i} C_{\nu a} C_{\lambda j} C_{\sigma b}\, (\mu\nu|\lambda\sigma)    [172]

If the transformations are done in a successive instead of a simultaneous way, the computational effort is reduced from formally O(M^8) to O(M^5). However, due to the nonlocality of the canonical MOs (discussed above in the context of SCF methods), this O(M^5) effort holds even in the asymptotic limit, so that no scaling reduction can be expected; the four transformations merely scale differently depending on whether occupied or virtual indices are transformed. To avoid the canonical, delocalized orbitals, Almlöf suggested in 1991166 using a Laplace transform to eliminate the disturbing denominator x_q = \epsilon_a + \epsilon_b - \epsilon_i - \epsilon_j:

\frac{1}{x_q} = \int_0^{\infty} \exp(-x_q t)\, dt \approx \sum_{\alpha=1}^{\tau} \omega^{(\alpha)} \exp(-x_q t^{(\alpha)})    [173]
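A quick numerical check of Eq. [173] is easy to set up. The sketch below uses a plain logarithmic trapezoidal quadrature rather than the optimized minimax exponential fits of Refs. 166 and 167 (which reach mHartree accuracy with only a handful of points), so it needs more grid points, but it shows the principle:

    import numpy as np

    # Approximate 1/x over a typical denominator range via the substitution
    # t = exp(u): 1/x = \int exp(-x e^u) e^u du, sampled with the trapezoid rule.
    x = np.linspace(0.2, 20.0, 200)            # denominator values (a.u.)
    u, du = np.linspace(-6.0, 4.0, 40, retstep=True)
    t = np.exp(u)                              # quadrature points t^(alpha)
    w = t * du                                 # weights omega^(alpha)
    approx = (np.exp(-np.outer(x, t)) * w).sum(axis=1)
    print(np.max(np.abs(approx - 1.0 / x)))    # max absolute error over the range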

where the integral can be replaced by a summation over a few grid points. In typical applications, it has been found by Häser and Almlöf167 that \tau = 5-8 provides mHartree accuracy. This approach was employed by Häser to formulate an AO-MP2 formalism,168 which we briefly review in the following. With the definition of two pseudo-density matrices,

P^{(\alpha)}_{\mu\nu} = |\omega^{(\alpha)}|^{1/4} \sum_i^{\mathrm{occ}} C_{\mu i} \exp\!\left((\epsilon_i - \epsilon_F)\, t^{(\alpha)}\right) C_{\nu i}    [174]

\bar{P}^{(\alpha)}_{\mu\nu} = |\omega^{(\alpha)}|^{1/4} \sum_a^{\mathrm{virt}} C_{\mu a} \exp\!\left((\epsilon_F - \epsilon_a)\, t^{(\alpha)}\right) C_{\nu a}    [175]

where \epsilon_F is (\epsilon_{\mathrm{HOMO}} + \epsilon_{\mathrm{LUMO}})/2,166-168 the MP2 energy expression becomes

E_{\mathrm{MP2}} = -\sum_{\alpha=1}^{\tau} e^{(\alpha)}_{JK}    [176]

with

e^{(\alpha)}_{JK} = \sum_{\mu\nu\lambda\sigma} \sum_{\mu'\nu'\lambda'\sigma'} P^{(\alpha)}_{\mu\mu'} \bar{P}^{(\alpha)}_{\nu\nu'}\, (\mu'\nu'|\lambda'\sigma')\, P^{(\alpha)}_{\lambda\lambda'} \bar{P}^{(\alpha)}_{\sigma\sigma'} \left[2(\mu\nu|\lambda\sigma) - (\mu\sigma|\lambda\nu)\right]    [177]

For each integration point (\alpha = 1 \ldots \tau), four formally O(M^5)-scaling transformations are necessary to yield the transformed two-electron integrals

(\underline{\mu}\,\bar{\nu}|\underline{\lambda}\,\bar{\sigma})^{(\alpha)} = \sum_{\mu'\nu'\lambda'\sigma'} P^{(\alpha)}_{\mu\mu'} \bar{P}^{(\alpha)}_{\nu\nu'}\, (\mu'\nu'|\lambda'\sigma')\, P^{(\alpha)}_{\lambda\lambda'} \bar{P}^{(\alpha)}_{\sigma\sigma'}    [178]

which are contracted in a final, formally O(M^4)-scaling step in a Coulomb- (e^{(\alpha)}_J) and an exchange-type (e^{(\alpha)}_K) fashion:

e^{(\alpha)}_{JK} = 2e^{(\alpha)}_J - e^{(\alpha)}_K = \sum_{\mu\nu\lambda\sigma} (\underline{\mu}\,\bar{\nu}|\underline{\lambda}\,\bar{\sigma})^{(\alpha)} \left[2(\mu\nu|\lambda\sigma) - (\mu\sigma|\lambda\nu)\right]    [179]

Here \mu, \underline{\mu}, and \bar{\mu} refer to the same index; the underline or bar only indicates that the index has been transformed with P or \bar{P}, respectively. In contrast to the conventional MO-based formulation, the AO-based Laplace formalism allows one to reduce the conventional O(N^5) scaling of the computational cost for large molecules. For small molecules, however, the overhead consists of the need to compute \tau = 5-8 exponentials and of the larger prefactor of the transformations, which scale formally as N^5 as compared with the n_{occ} \cdot N^4, \ldots, and n^2_{occ} \cdot n^2_{virt} \cdot N scalings of the different MO-based transformations (n_{occ} and n_{virt} denote the numbers of occupied and virtual orbitals, respectively). Despite this overhead for small molecules, the central drawback of MO-based transformations caused by the delocalized nature of canonical MOs is avoided, and the scaling can be reduced for large molecules.

The AO-MP2 method introduced in 1993 by Häser168 applies screening criteria to the intermediate four-index quantities in order to reduce the computational scaling for larger molecules. Here, the Schwarz inequality introduced earlier in this review21,168,177

|(\mu\nu|\lambda\sigma)| \le (\mu\nu|\mu\nu)^{1/2}\, (\lambda\sigma|\lambda\sigma)^{1/2} = Q_{\mu\nu}\, Q_{\lambda\sigma}    [180]

is used, which we denote as QQ-screening. Häser168 adapted the Schwarz screening for estimating the transformed quantities occurring in AO-MP2 theory, which we abbreviate in the following as QQZZ or pseudo-Schwarz screening, where Z is defined as an upper-bound approximation to the transformed Schwarz criterion (see Ref. 168):

(\underline{\mu}\,\bar{\nu}|\underline{\mu}\,\bar{\nu})^{1/2} \le Z_{\mu\nu}    [181]

As pointed out by Häser,168 this screening protocol yields asymptotically a quadratically scaling MP2 method for systems with a significant HOMO-LUMO gap. This quadratic scaling of AO-MP2 was further reduced to linear by Ayala and Scuseria by "introducing interaction domains and neglecting selective domain-domain interactions."169 In this tutorial, we use the Laplace approach to explain some aspects of the long-range behavior of electron-correlation methods, without commenting on which one of the many approaches for reducing the computational effort will become the standard replacement of conventional correlation formulations. We follow here our discussion presented in a recent publication,24 which permits one, for the first time, to determine rigorously which of the transformed integral products contribute to the MP2 energy.
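Collecting Eqs. [174]-[179], a dense toy implementation of Laplace-transform AO-MP2 fits in a few lines. All names are illustrative: C and eps come from a converged SCF, ao_ints holds the full AO integral tensor (\mu\nu|\lambda\sigma), and (t, w) is a Laplace grid such as the one sketched above; no screening is applied, so this is a correctness sketch only, not a linear-scaling code:

    import numpy as np

    def ao_mp2(C, eps, nocc, ao_ints, t, w):
        eF = 0.5 * (eps[nocc - 1] + eps[nocc])   # (e_HOMO + e_LUMO)/2
        E = 0.0
        for ta, wa in zip(t, w):
            # Pseudo-densities of Eqs. [174] and [175]:
            Po = abs(wa)**0.25 * (C[:, :nocc] * np.exp((eps[:nocc] - eF) * ta)) @ C[:, :nocc].T
            Pv = abs(wa)**0.25 * (C[:, nocc:] * np.exp((eF - eps[nocc:]) * ta)) @ C[:, nocc:].T
            # Four-index transformation of Eq. [178]:
            T = np.einsum('mp,nq,pqrs,lr,ks->mnlk', Po, Pv, ao_ints, Po, Pv)
            # Contraction of Eq. [179] and accumulation per Eq. [176]:
            E -= np.einsum('mnlk,mnlk->', T,
                           2.0 * ao_ints - ao_ints.transpose(0, 3, 2, 1))
        return E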

Long-Range Behavior of Correlation Effects

The formation of the correlation energy in AO-MP2 consists of the transformation (Eq. [178]) and the contraction step (Eq. [179]). We start our discussion by considering the distance dependence of the correlation contributions.

Transformed Integrals

Some discussion in this section is similar to considerations of Ayala and Scuseria.169 However, we present here a different argument for deriving rigorous and tight upper bounds for estimating transformed integral products, following the work in Ref. 24.


For nonoverlapping charge distributions \Omega_A = \Omega_{\mu\nu} = \chi_\mu \chi_\nu and \Omega_B = \Omega_{\lambda\sigma}, centered at A and B, respectively, the two-electron integral (\mu\nu|\lambda\sigma) is bounded from above (see Refs. 23 and 31) by

\left\langle \frac{1}{r_{12}} \right\rangle \le \left|\frac{1}{R}\right| \sum_{n=0}^{\infty} \frac{\langle (r_{1A} - r_{2B})^n \rangle}{R^n}    [182]

with R = |B - A| and the positions of the electrons r_1 = r_{1A} + A and r_2 = r_{2B} + B, whereas \langle\;\rangle abbreviates the two-electron integral. This expansion in multipoles, such as the overlap M^{(0)} = S, the dipole M^{(1)}, and higher order terms M^{(2)}, M^{(3)}, \ldots, leads to

|(\mu\nu|\lambda\sigma)| \le R^{-1} \left|M^{(0)}_{\mu\nu} M^{(0)}_{\lambda\sigma}\right| + R^{-2} \left|M^{(1)}_{\mu\nu} M^{(0)}_{\lambda\sigma} - M^{(0)}_{\mu\nu} M^{(1)}_{\lambda\sigma}\right| + R^{-3} \left|M^{(2)}_{\mu\nu} M^{(0)}_{\lambda\sigma} - 2M^{(1)}_{\mu\nu} M^{(1)}_{\lambda\sigma} + M^{(0)}_{\mu\nu} M^{(2)}_{\lambda\sigma}\right| + R^{-4} \left|M^{(3)}_{\mu\nu} M^{(0)}_{\lambda\sigma} - 3M^{(2)}_{\mu\nu} M^{(1)}_{\lambda\sigma} + 3M^{(1)}_{\mu\nu} M^{(2)}_{\lambda\sigma} - M^{(0)}_{\mu\nu} M^{(3)}_{\lambda\sigma}\right| + O(R^{-5})    [183]

Due to the orthogonality properties of P and \bar{P} (similar to the standard one-particle density matrix P of SCF theory and its complement 1 - P in an orthogonal basis), the transformation of the overlap leads to

\sum_{\mu'} \sum_{\nu'} P_{\mu\mu'} S_{\mu'\nu'} \bar{P}_{\nu\nu'} = S_{\underline{\mu}\bar{\nu}} = M^{(0)}_{\underline{\mu}\bar{\nu}} = 0    [184]

so that all terms involving the overlap (M^{(0)}) vanish. Therefore, the expansion for the transformed integrals becomes

|(\underline{\mu}\,\bar{\nu}|\underline{\lambda}\,\bar{\sigma})| \le R^{-3} \left|-2M^{(1)}_{\underline{\mu}\bar{\nu}} M^{(1)}_{\underline{\lambda}\bar{\sigma}}\right| + R^{-4} \left|-3M^{(2)}_{\underline{\mu}\bar{\nu}} M^{(1)}_{\underline{\lambda}\bar{\sigma}} + 3M^{(1)}_{\underline{\mu}\bar{\nu}} M^{(2)}_{\underline{\lambda}\bar{\sigma}}\right| + O(R^{-5})    [185]

and an O(1/R^3) dependence for the transformed integrals results. Together with the O(1/R) behavior of the untransformed integrals, this leads to an overall O(1/R^4) decay in the contraction step (Eq. [179]). It is important to note that this distance dependence results solely from the orthogonality properties of the pseudo-density matrices; the only requirement is the validity of the multipole expansion (Eq. [182]) for the untransformed integrals. No locality of the pseudo-density matrices has been exploited at this stage, which leads to an even stronger decay, as discussed below.

Coulomb-Type Contraction

If the charge distributions \Omega_{\mu\nu}, \Omega_{\lambda\sigma} and \Omega_{\mu'\nu'}, \Omega_{\lambda'\sigma'}, respectively, are nonoverlapping in the sense that the multipole expansion (Eq. [183]) is applicable to the untransformed two-electron integrals (\mu\nu|\lambda\sigma) and (\mu'\nu'|\lambda'\sigma'), then the corresponding Coulomb terms can be written as

|e^{(\alpha)}_J| = \left|\sum_{\mu\nu\lambda\sigma} \sum_{\mu'\nu'\lambda'\sigma'} (\mu\nu|\lambda\sigma)\, P^{(\alpha)}_{\mu\mu'} \bar{P}^{(\alpha)}_{\nu\nu'}\, (\mu'\nu'|\lambda'\sigma')\, P^{(\alpha)}_{\lambda\lambda'} \bar{P}^{(\alpha)}_{\sigma\sigma'}\right|
\le \sum_{\mu\nu\lambda\sigma} \Big[ R^{-1}\left|M^{(0)}_{\mu\nu} M^{(0)}_{\lambda\sigma}\right| \cdot R^{-3}\left|-2M^{(1)}_{\underline{\mu}\bar{\nu}} M^{(1)}_{\underline{\lambda}\bar{\sigma}}\right|^{(\alpha)}
+ R^{-1}\left|M^{(0)}_{\mu\nu} M^{(0)}_{\lambda\sigma}\right| \cdot R^{-4}\left|3M^{(2)}_{\underline{\mu}\bar{\nu}} M^{(1)}_{\underline{\lambda}\bar{\sigma}} - 3M^{(1)}_{\underline{\mu}\bar{\nu}} M^{(2)}_{\underline{\lambda}\bar{\sigma}}\right|^{(\alpha)}
+ R^{-2}\left|M^{(1)}_{\mu\nu} M^{(0)}_{\lambda\sigma} - M^{(0)}_{\mu\nu} M^{(1)}_{\lambda\sigma}\right| \cdot R^{-3}\left|-2M^{(1)}_{\underline{\mu}\bar{\nu}} M^{(1)}_{\underline{\lambda}\bar{\sigma}}\right|^{(\alpha)}
+ R^{-2}\left|M^{(1)}_{\mu\nu} M^{(0)}_{\lambda\sigma} - M^{(0)}_{\mu\nu} M^{(1)}_{\lambda\sigma}\right| \cdot R^{-4}\left|3M^{(2)}_{\underline{\mu}\bar{\nu}} M^{(1)}_{\underline{\lambda}\bar{\sigma}} - 3M^{(1)}_{\underline{\mu}\bar{\nu}} M^{(2)}_{\underline{\lambda}\bar{\sigma}}\right|^{(\alpha)}
+ \cdots \Big]    [186]

For the sake of notational simplicity, we have not made a distinction between distances of centers of untransformed and transformed charge distributions, because it is clear from the context. Considering in more detail the summation over the \Omega_{\mu\nu} part (the \lambda\sigma terms are omitted) of the first term of order 1/R^4,

\sum_{\mu\nu} \frac{M^{(0)}_{\mu\nu}}{R} \sum_{\mu'\nu'} \left[P^{(\alpha)}_{\mu\mu'}\, \frac{M^{(1)}_{\mu'\nu'}}{R^3}\, \bar{P}^{(\alpha)}_{\nu\nu'}\right] = \sum_{\mu'\nu'} \left[\sum_{\mu\nu} P^{(\alpha)}_{\mu\mu'}\, \frac{M^{(0)}_{\mu\nu}}{R}\, \bar{P}^{(\alpha)}_{\nu\nu'}\right] \frac{M^{(1)}_{\mu'\nu'}}{R^3}    [187]

makes clear that we can either perform the \mu', \nu' summation first or the summation over \mu, \nu. In the second representation, the M^{(0)} term is multiplied by P (and \bar{P}), which would show that the 1/R^4 term becomes zero. However, this is only true if P and \bar{P} are still fully orthogonal in the restricted space of indices where the multipole expansion is valid. Otherwise, missing indices would lead to nonvanishing overlap contributions after the projection with P and \bar{P}. This leads to the following requirements:

- \Omega_{\mu\nu} and \Omega_{\lambda\sigma} are nonoverlapping: valid multipole expansion for (\mu\nu|\lambda\sigma).
- \Omega_{\mu'\nu'} and \Omega_{\lambda'\sigma'} are nonoverlapping: valid multipole expansion for (\mu'\nu'|\lambda'\sigma').
- For each \mu and \nu of \Omega_{\mu\nu}, the elements \mu' and \nu' coupled via the significant elements of P_{\mu\mu'} and \bar{P}_{\nu\nu'} have to be contained in the sum over \mu'\nu'. In other words: for each \mu and \nu of \Omega_{\mu\nu}, the shell pairs coupled via P_{\mu\mu'} and \bar{P}_{\nu\nu'} (the significant elements) have to be nonoverlapping with \Omega_{\lambda'\sigma'}, so that the multipole expansion can be applied.

If these criteria are fulfilled in the restricted space of indices defined by the multipole expansion within a threshold, then (PM^{(0)}\bar{P})_{\mu'\nu'} is zero, and with it the 1/R^4 term in Eqs. [186] and [187]; moreover, two of the three 1/R^5 terms in Eq. [186] disappear. If the analogous argumentation holds at the same time for \Omega_{\lambda\sigma}, the third 1/R^5 contribution is zero as well, so that an overall 1/R^6 dependence of the required transformed integrals for the Coulomb-type contraction results. Therefore, for well-separated charge distributions in the above sense, the 1/R^4 behavior of the transformed integrals turns into a 1/R^6 distance dependence. Such a behavior is well known for van der Waals/dispersion-type interactions. In contrast to the 1/R^4 dependence, however, the 1/R^6 behavior is linked closely to the exponential decay of the pseudo-density matrices P and \bar{P} for systems with nonvanishing HOMO-LUMO gaps. Our experience from SCF theories shows that the one-particle density matrix is fairly long-ranged. Although we obtain, e.g., for DNA fragments a relatively early onset of linear-scaling behavior for the computation of Hartree-Fock exchange, it has to be stressed that this feature is strongly enhanced by the integral contractions (see Refs. 71 and 76) and is not due solely to the locality of the one-particle density matrix by itself. In this context, the true locality of the one-particle density matrix is needed for the 1/R^6 decay, so this behavior is expected to set in only at significantly larger distances than the 1/R^4 decay. Nevertheless, it is clear that the 1/R^6 decay can be exploited in an analogous fashion by imposing the criteria listed above. The implications of this decay behavior for the Coulomb-type products are illustrated in Figure 19 for the example of linear alkanes. For an alkane with four to five carbon atoms, the exact number of required transformed products (MP2/6-31G*, providing an accuracy of 0.1 mHartree for the first Laplace coefficient) already scales as low as N^{1.48}, approaching the asymptotic linear scaling.24 However, the pseudo-Schwarz screening drastically overestimates the number of required products, and no linear scaling can be achieved using this criterion, because the distance dependence of the transformed products is not accounted for.

Exchange-Type Contraction

The exchange-type part of the AO-MP2 energy is computed as

e^{(\alpha)}_K = \sum_{\mu\nu\lambda\sigma} (\underline{\mu}\,\bar{\nu}|\underline{\lambda}\,\bar{\sigma})^{(\alpha)}\, (\mu\sigma|\lambda\nu)    [188]

Figure 19. Comparison of the number of significant Coulomb-type integral products (C_nH_{2n+2}, 6-31G* basis; in units of 10^9; plotted against the number of basis functions for C5H12 through C40H82) as estimated over shells by Schwarz-type screening (QQZZ; 10^-5) and MBIE (10^-5) with the exact number of products selected via basis functions. For the latter, a threshold of 10^-8 has been selected to provide comparable accuracy in the absolute energies of 0.1 mH (only data for the first Laplace coefficient in computing the MP2 energy are listed).

As discussed, the transformed two-electron integrals decay as 1/R^3, whereas the untransformed ones decay as 1/R, resulting in a total distance dependence of 1/R^4. In addition, in the exchange contraction step, the exponentially decaying charge densities of the untransformed integral, \Omega_{\mu\sigma} and \Omega_{\lambda\nu}, couple the two sides of the transformed integral. Therefore, as long as the transformed charge distributions decay exponentially, an overall exponential decay for the exchange-type contraction results. The exponential coupling is similar to the one encountered for the formation of exchange-type contributions in SCF theories using our LinK method for computing energies,71 energy gradients,76 or NMR chemical shifts,113 where the coupling of the two sides of the two-electron integrals is mediated by the one-particle densities or their derivatives. Therefore, in this context, the exchange-type contribution (Eq. [188]) to the correlation energy decays not only as 1/R^4, but exponentially for systems with a nonvanishing HOMO-LUMO gap.24


Rigorous Selection of Transformed Products via Multipole-Based Integral Estimates (MBIE)

The discussion above suggests that, to exploit the strong long-range decay behavior of at least 1/R^4 for electron correlation effects, it is crucial to introduce the distance dependence into the integral estimates for transformed and untransformed two-electron integrals. Here, the MBIE scheme introduced by Lambrecht and Ochsenfeld,23 discussed in the introductory parts of the current review, allows one to rigorously preselect which of the transformed products actually contribute to the correlation energy.24 To preselect the transformed integral products required for computing the MP2 energy, one can modify the MBIE integral bounds23,24 so that an upper bound to the transformed integrals is obtained. In addition to this screening for the number of contributing products, one needs to select the significant untransformed integrals required for the integral transformations. We will not discuss the derivation of MBIE bounds for AO-MP2 further in the current context, because the details would not provide more insight; for a detailed derivation, the reader is referred to Ref. 24.

The performance of our MBIE method in its current stage for preselecting the significant number of contributing transformed products is illustrated in Figure 19. Although MBIE in its current stage still overestimates the number of products, it is always a true upper bound, so that noncontributing products can be safely discarded and only a linear-scaling number of products needs to be preselected. The MBIE estimate in Figure 19 has been optimized with respect to the transformation as compared with the one described in Ref. 24. We expect further improvements in the future to approach the true number of required products and thus to reduce the computational effort. The MBIE screening formulas are crucial for the correct estimation of the long-range behavior of correlation effects:

- First, MBIE describes the exponential coupling of the "bra" and "ket" indices, as does the QQZZ screening.
- Second, and most importantly, MBIE correctly describes the 1/R^4 dependence of the transformed products, so that for larger separations between "bra" and "ket" centers the integral products vanish.
- Third, the MBIE estimates are rigorous upper bounds.

In addition, it is possible to exploit the 1/R^6 behavior as described above; however, its onset is expected to occur only for significantly larger "bra-ket" separations, so we focused here on the exploitation of the 1/R^4 decay.
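The effect of a distance-aware bound on the number of surviving products can be illustrated with a deliberately crude toy model (this is not the actual MBIE estimate of Refs. 23 and 24): for a one-dimensional chain of charge-distribution centers, a Schwarz-type product bound keeps essentially all pairs, whereas a bound augmented with the 1/R^4 decay of Eq. [185] keeps only a linear-scaling number:

    import numpy as np

    def count_significant(n_centers, thresh=1e-7, spacing=2.0):
        R = spacing * np.arange(n_centers)
        Q = np.ones(n_centers)          # Schwarz factors, roughly constant per pair
        n_schwarz = n_decay = 0
        for a in range(n_centers):
            for b in range(n_centers):
                Rab = max(abs(R[a] - R[b]), spacing)
                n_schwarz += Q[a] * Q[b] > thresh          # distance-blind bound
                n_decay += Q[a] * Q[b] / Rab**4 > thresh   # with 1/R^4 decay
        return n_schwarz, n_decay

    for n in (10, 20, 40, 80):
        print(n, count_significant(n))  # first count grows as n^2, second as n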

Implications

The considerations presented in this last section of the chapter illustrate that dynamic correlation is a local effect and that its description should, therefore, scale linearly with the size of the molecule. This is true not only for the simple MP2 theory (where the "correlation interaction" between electrons decays as 1/R^4 and faster), on which we based our argumentation, but also for more sophisticated approaches such as, for example, coupled-cluster theory. Although a tremendous amount of work has been done by many research groups in this field, much remains to be done. The path to such improvements is, in principle, set, and based on the example of the foregoing analysis, we can be optimistic that the scientific community will eventually reach the exciting goal of performing highly accurate ab initio calculations for very large molecular systems.

CONCLUSIONS

Much work has been done by many scientists over the last decades to bring quantum chemistry to the impressive stage it is at today. Thinking back to just a bit more than 15 years ago, computing a nonsymmetric molecule with, say, 10-20 atoms at the Hartree-Fock level was painful. Today, molecules with more than 1000 atoms can be tackled at the HF or DFT level on one-processor computers, and widespread applicability to a multitude of chemical and biochemical problems has been achieved. Although advances in quantum chemistry certainly go hand in hand with the fast-evolving increase of computer speed, it is clear that the introduction of linear-scaling methods over the last ten years or so has made important contributions to this success. In this tutorial, we have described some of the basic ideas for reducing the computational scaling of quantum chemical methods, without going into the details of the many different approaches followed by the numerous research groups involved in this field. We have presented linear-scaling methods for the calculation of SCF energies, energy gradients, and response properties, which open new possibilities for studying molecular systems with 1000 and more atoms on today's computers. In addition, the given outlook on linear-scaling electron correlation methods indicates that much more can be expected and that more and more highly accurate approaches in the ab initio hierarchy will become available for large molecules as well.

Despite the success of linear-scaling methods, a multitude of challenges and open questions remain in the linear-scaling community. Some of the more important challenges include the following issues:

- Many molecular properties remain for which so far no linear-scaling methods have been devised and implemented.
- Reducing the prefactors remains an important issue and becomes even more important for linear-scaling methods (because any gain in the prefactor directly translates into the treatable molecule size).

- The results for some molecular properties or electron correlation energies depend strongly on the size of the basis set; post-HF methods, in particular, require large basis sets. Even if a method scales linearly with molecular size, the computational cost may increase dramatically with the basis set size. Therefore, much more work needs to be devoted to tackling this basis-set problem.
- The more "metallic" a system is (small HOMO-LUMO gap), the less local is the one-particle density matrix. Therefore, the question of how to deal with strong delocalization in an efficient manner remains an important challenge.
- Because matrix multiplications are central to many aspects of linear-scaling schemes, any further speed-up in sparse matrix multiplications will be of importance, in particular if the systems are more "metallic."
- Although a multitude of open questions still exists even for HF and DFT linear-scaling schemes, the rigorous and efficient reduction of the scaling of post-HF methods, which account for the missing electron correlation effects, remains one of the central challenges for the success of quantum chemistry.
- Many large molecular systems are flexible, and dynamic effects are necessary for a realistic description. Therefore, molecular dynamics simulations are needed, which require the computation of a huge number of points on a hypersurface, resulting in an extremely high computational cost for reliable methods.

From this small list of challenges, it becomes clear that there is still a great need for developing and improving linear-scaling methods. Nevertheless, the foregoing discussion has shown that much has been achieved for the approximate solution of the Schrödinger equation even for large molecules. For the future, the ultimate goal of solving the molecular Schrödinger equation to the highest accuracy and efficiency appears to be reachable. Accomplishing this goal will allow us to rationalize and understand, to predict, and, ultimately, to control the chemical and biochemical processes of very large molecular systems.

REFERENCES

1. E. Schrödinger, Ann. Phys., 79, 361 (1926). Quantisierung als Eigenwertproblem (erste Mitteilung).
2. D. R. Hartree, Proc. Cambridge Philos. Soc., 24, 89 (1928). The Wave Mechanics of an Atom with a Non-Coulomb Central Field. I. Theory and Methods.
3. V. Fock, Z. Phys., 61, 126 (1930). Näherungsmethode zur Lösung des quantenmechanischen Mehrkörperproblems; Z. Phys., 62, 795 (1930). "Self-Consistent Field" mit Austausch für Natrium.


4. A. Szabo and N. S. Ostlund, Modern Quantum Chemistry - Introduction to Advanced Electronic Structure Theory, Dover Publications, Inc., Mineola, New York, 1989.
5. C. Møller and M. S. Plesset, Phys. Rev., 46, 618 (1934). Note on an Approximation Treatment for Many-Electron Systems.
6. R. J. Bartlett and J. F. Stanton, in Reviews in Computational Chemistry, Vol. 5, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1990, pp. 65-169. Application of Post-Hartree-Fock Methods: A Tutorial.
7. P. Hohenberg and W. Kohn, Phys. Rev. B, 136, 864 (1964). Inhomogeneous Electron Gas.
8. W. Kohn and L. J. Sham, Phys. Rev., 140, A1133 (1965). Self-Consistent Equations Including Exchange and Correlation Effects.
9. R. G. Parr and W. Yang, Density-Functional Theory of Atoms and Molecules, International Series of Monographs on Chemistry 16, Oxford Science Publications, Oxford, United Kingdom, 1989.
10. G. E. Moore, Electronics Magazine, 19 April, 1965. Cramming More Components onto Integrated Circuits.
11. W. Kutzelnigg, Einführung in die Theoretische Chemie, VCH, Weinheim, Germany, 2001.
12. I. N. Levine, Quantum Chemistry, 5th ed., Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 2000.
13. T. Helgaker, P. Jørgensen, and J. Olsen, Molecular Electronic Structure Theory, Wiley, Chichester, United Kingdom, 2000.
14. M. Born and R. Oppenheimer, Ann. Phys., 84, 457 (1927). Zur Quantentheorie der Molekeln.
15. B. T. Sutcliffe, in Computational Techniques in Quantum Chemistry, G. H. F. Diercksen, B. T. Sutcliffe, and A. Veillard, Eds., Reidel, Boston, Massachusetts, 1975, pp. 1-105. Fundamentals of Computational Quantum Chemistry.
16. J. C. Slater, Quantum Theory of Matter, 2nd ed., McGraw-Hill, New York, 1968.
17. C. C. J. Roothaan, Rev. Mod. Phys., 23, 69 (1951). New Developments in Molecular Orbital Theory.
18. G. G. Hall, Proc. Roy. Soc., A205, 541 (1951). The Molecular-Orbital Theory of Chemical Valency. VIII. A Method of Calculating Ionization Potentials.
19. J. Almlöf, K. Faegri, and K. Korsell, J. Comput. Chem., 3, 385 (1982). Principles for a Direct SCF Approach to LCAO-MO Ab-initio Calculations.
20. V. Dyczmons, Theoret. Chim. Acta, 28, 307 (1973). No N4-Dependence in the Calculation of Large Molecules.
21. M. Häser and R. Ahlrichs, J. Comput. Chem., 10, 104 (1989). Improvements on the Direct SCF Method.
22. D. Cremer and J. Gauss, J. Comput. Chem., 7, 274 (1986). An Unconventional SCF Method for Calculations on Large Molecules.
23. D. S. Lambrecht and C. Ochsenfeld, J. Chem. Phys., 123, 184101 (2005). Multipole-Based Integral Estimates for the Rigorous Description of Distance Dependence in Two-Electron Integrals.
24. D. S. Lambrecht, B. Doser, and C. Ochsenfeld, J. Chem. Phys., 123, 184102 (2005). Rigorous Integral Screening for Electron Correlation Methods.
25. J. E. Almlöf, USIP Report 72-09 (1972), republished in the Theor. Chem. Acc. memorial issue: P. R. Taylor, Theor. Chem. Acc., 97, 10 (1997). Methods for the Rapid Evaluation of Electron Repulsion Integrals in Large-Scale LCGO Calculations.
26. J. E. Almlöf, in Modern Electronic Structure Theory, D. Yarkony, C.-Y. Ng, Eds., World Scientific, Singapore, 1994, pp. 121-151. Direct Methods in Electronic Structure Theory.
27. H. Eyring, J. Walter, and G. E. Kimball, Quantum Chemistry, Wiley, New York, 1947.


28. J. O. Hirschfelder, C. F. Curtiss, and R. B. Bird, Molecular Theory of Gases and Liquids, Wiley, New York, 1954.
29. A. D. Buckingham, in Intermolecular Interactions: From Diatomics to Biopolymers, B. Pullman, Ed., Wiley, New York, 1987, pp. 1-67. Basic Theory of Intermolecular Forces: Applications to Small Molecules.
30. P. M. W. Gill, B. G. Johnson, and J. A. Pople, Chem. Phys. Lett., 217, 65 (1994). A Simple yet Powerful Upper Bound for Coulomb Integrals.
31. C. A. White, B. G. Johnson, P. M. W. Gill, and M. Head-Gordon, Chem. Phys. Lett., 230, 8 (1994). The Continuous Fast Multipole Method.
32. P. M. Morse and H. Feshbach, Methods of Theoretical Physics, Volume I, McGraw-Hill Education, Tokyo, Japan, 1953.
33. P. M. Morse and H. Feshbach, Methods of Theoretical Physics, Volume II, McGraw-Hill Education, Tokyo, Japan, 1953.
34. D. E. Williams, in Reviews in Computational Chemistry, Vol. 2, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1991, pp. 219-271. Net Atomic Charge and Multipole Models for the Ab Initio Molecular Electric Potential.
35. G. B. Arfken and H. J. Weber, Mathematical Methods for Physicists, Academic Press, London, United Kingdom, 2001.
36. S. Obara and A. Saika, J. Chem. Phys., 84, 3963 (1986). Efficient Recursive Computation of Molecular Integrals over Cartesian Gaussian Functions.
37. M. Head-Gordon and J. A. Pople, J. Chem. Phys., 89, 5777 (1988). A Method for Two-Electron Gaussian Integral and Integral Derivative Evaluation Using Recurrence Relations.
38. P. M. W. Gill and J. A. Pople, Int. J. Quantum Chem., 40, 753 (1991). The Prism Algorithm for Two-Electron Integrals.
39. L. E. McMurchie and E. R. Davidson, J. Comput. Phys., 26, 218 (1978). One- and Two-Electron Integrals over Cartesian Gaussian Functions.
40. P. M. W. Gill, B. G. Johnson, and J. A. Pople, Int. J. Quantum Chem., 40, 745 (1991). Two-Electron Repulsion Integrals Over Gaussian s Functions.
41. P. M. W. Gill, M. Head-Gordon, and J. A. Pople, Int. J. Quantum Chem., 23, 269 (1989). An Efficient Algorithm for the Generation of Two-Electron Repulsion Integrals over Gaussian Basis Functions.
42. A. V. Scherbinin, V. I. Pupyshev, and N. F. Stepanov, Int. J. Quantum Chem., 60, 843 (1996). On the Use of Multipole Expansion of the Coulomb Potential in Quantum Chemistry.
43. V. R. Saunders, in Methods in Computational Molecular Physics, G. H. F. Diercksen and S. Wilson, Eds., NATO ASI Series, Series C: Mathematical and Physical Sciences, Vol. 113, D. Reidel Publishing Company, Dordrecht, The Netherlands, 1983, pp. 1-36. Molecular Integrals for Gaussian Type Functions.
44. T. Helgaker and P. R. Taylor, in Modern Electronic Structure Theory, Vol. 2, D. Yarkony, Ed., World Scientific, Singapore, 1995, pp. 725-856. Gaussian Basis Sets and Molecular Integrals.
45. L. Greengard and V. Rokhlin, J. Comput. Phys., 60, 187 (1990). Rapid Solution of Integral Equations of Classical Potential Theory.
46. R. Beatson and L. Greengard, available at www.math.nyu.edu/faculty/greengar/shortcourse_fmm.pdf. A Short Course on Fast Multipole Methods.
47. L. Greengard, Science, 265, 909 (1994). Fast Algorithms for Classical Physics.
48. C. A. White and M. Head-Gordon, J. Chem. Phys., 101, 6593 (1994). Derivation and Efficient Implementation of the Fast Multipole Method.
49. C. A. White and M. Head-Gordon, J. Chem. Phys., 105, 5061 (1996). Rotating Around the Angular Momentum Barrier in Fast Multipole Method Calculations.
50. J. Barnes and P. Hut, Nature (London), 324, 446 (1986). A Hierarchical O(N log N) Force-Calculation Algorithm.


51. M. Challacombe, E. Schwegler, and J. Almlöf, in Computational Chemistry: Review of Current Trends, Vol. 53, J. Leszczynski, Ed., World Scientific, Singapore, 1996, pp. 4685-4695. Modern Developments in Hartree-Fock Theory: Fast Methods for Computing the Coulomb Matrix.
52. M. Challacombe, E. Schwegler, and J. Almlöf, J. Chem. Phys., 104, 4685 (1996). Fast Assembly of the Coulomb Matrix: A Quantum Chemical Tree Code.
53. J. Cipriani and B. Silvi, Mol. Phys., 45, 259 (1982). Cartesian Expression of Electric Multipole Moments.
54. L. Greengard and J. Strain, J. Sci. Stat. Comp., 12, 79 (1991). The Fast Gauss Transform.
55. H. G. Petersen, D. Soelvason, J. W. Perram, and E. R. Smith, J. Chem. Phys., 101, 8870 (1994). The Very Fast Multipole Method.
56. M. C. Strain, G. E. Scuseria, and M. J. Frisch, Science, 271, 51 (1996). Achieving Linear Scaling for the Electronic Quantum Coulomb Problem.
57. O. Vahtras, J. Almlöf, and M. W. Feyereisen, Chem. Phys. Lett., 213, 514 (1993). Integral Approximations for LCAO-SCF Calculations.
58. R. A. Kendall and H. A. Früchtl, Theor. Chem. Acc., 97, 158 (1997). The Impact of the Resolution of the Identity Approximate Integral Method on Modern Ab Initio Algorithm Development.
59. B. I. Dunlap, J. W. D. Connolly, and J. R. Sabin, J. Chem. Phys., 71, 3396 (1979). On Some Approximations in Applications of Xα Theory.
60. K. Eichkorn, O. Treutler, H. Oehm, M. Häser, and R. Ahlrichs, Chem. Phys. Lett., 240, 283 (1995). Auxiliary Basis Sets to Approximate Coulomb Potentials.
61. F. Weigend, Phys. Chem. Chem. Phys., 4, 4285 (2002). A Fully Direct RI-HF Algorithm: Implementation, Optimised Auxiliary Basis Sets, Demonstration of Accuracy and Efficiency.
62. M. Sierka, A. Hogekamp, and R. Ahlrichs, J. Chem. Phys., 118, 9136 (2003). Fast Evaluation of the Coulomb Potential for Electron Densities Using Multipole Accelerated Resolution of Identity Approximation.
63. L. Füsti-Molnar and P. Pulay, J. Chem. Phys., 117, 7827 (2002). The Fourier Transform Coulomb Method: Efficient and Accurate Calculation of the Coulomb Operator in a Gaussian Basis.
64. L. Füsti-Molnar and P. Pulay, J. Mol. Struct. (THEOCHEM), 666-667, 25 (2003). Gaussian-Based First-Principles Calculations on Large Systems Using the Fourier Transform Coulomb Method.
65. L. Füsti-Molnar and P. Pulay, J. Chem. Phys., 119, 11080 (2003). New Developments in the Fourier Transform Coulomb Method: Efficient and Accurate Localization of the Filtered Core Functions and Implementation of the Coulomb Energy Forces.
66. L. Füsti-Molnar and J. Kong, J. Chem. Phys., 122, 074108 (2005). Fast and Accurate Coulomb Calculation with Gaussian Functions.
67. R. Ahlrichs, M. Hoffmann-Ostenhof, T. Hoffmann-Ostenhof, and J. D. Morgan III, Phys. Rev. A, 23, 2106 (1981). Bounds on Decay of Electron Densities with Screening.
68. C. Ochsenfeld and M. Head-Gordon, Chem. Phys. Lett., 270, 399 (1997). A Reformulation of the Coupled Perturbed Self-Consistent Field Equations Entirely Within a Local Atomic Orbital Density Matrix-Based Scheme.
69. P. E. Maslen, C. Ochsenfeld, C. A. White, M. S. Lee, and M. Head-Gordon, J. Phys. Chem., 102, 2215 (1998). Locality and Sparsity of Ab Initio One-Particle Density Matrices and Localized Orbitals.
70. W. Kohn, Phys. Rev. Lett., 76, 3168 (1996). Density Functional and Density Matrix Method Scaling Linearly with the Number of Atoms.
71. C. Ochsenfeld, C. A. White, and M. Head-Gordon, J. Chem. Phys., 109, 1663 (1998). Linear and Sublinear Scaling Formation of Hartree-Fock-Type Exchange Matrices.


72. E. Schwegler and M. Challacombe, J. Chem. Phys., 105, 2726 (1996). Linear Scaling Computation of the Hartree-Fock Exchange Matrix.
73. J. C. Burant, G. E. Scuseria, and M. J. Frisch, J. Chem. Phys., 105, 8969 (1996). A Linear Scaling Method for Hartree-Fock Exchange Calculations of Large Molecules.
74. E. Schwegler, M. Challacombe, and M. Head-Gordon, J. Chem. Phys., 106, 9708 (1997). Linear Scaling Computation of the Fock Matrix. II. Rigorous Bounds on Exchange Integrals and Incremental Fock Build.
75. E. Schwegler and M. Challacombe, Theoret. Chim. Acta, 104, 344 (2000). Linear Scaling Computation of the Hartree-Fock Exchange Matrix. III. Formation of the Exchange Matrix with Permutational Symmetry.
76. C. Ochsenfeld, Chem. Phys. Lett., 327, 216 (2000). Linear Scaling Exchange Gradients for Hartree-Fock and Hybrid Density Functional Theory.
77. H. Sambe and R. H. Felton, J. Chem. Phys., 62, 1122 (1975). A New Computational Approach to Slater's SCF-Xα Equation.
78. C. Satoko, Chem. Phys. Lett., 82, 111 (1981). Direct Force Calculation in the Xα Method and Its Application to Chemisorption of an Oxygen Atom on the Al(111) Surface.
79. R. Fournier, J. Andzelm, and D. R. Salahub, J. Chem. Phys., 90, 6371 (1989). Analytical Gradient of the Linear Combination of Gaussian-Type Orbitals - Local Spin Density Energy.
80. J. A. Pople, P. M. W. Gill, and B. G. Johnson, Chem. Phys. Lett., 199, 557 (1992). Kohn-Sham Density-Functional Theory within a Finite Basis Set.
81. J. Tao, J. P. Perdew, V. N. Staroverov, and G. E. Scuseria, Phys. Rev. Lett., 91, 146401 (2003). Climbing the Density Functional Ladder: Nonempirical Meta-Generalized Gradient Approximation Designed for Molecules and Solids.
82. J. P. Perdew, A. Ruzsinszky, J. Tao, V. N. Staroverov, G. E. Scuseria, and G. I. Csonka, J. Chem. Phys., 123, 062201 (2005). Prescription for the Design and Selection of Density Functional Approximations: More Constraint Satisfaction with Fewer Fits.
83. B. G. Johnson, C. A. White, Q. Zhang, B. Chen, R. L. Graham, P. M. W. Gill, and M. Head-Gordon, in Recent Developments in Density Functional Theory, J. M. Seminario, Ed., Vol. 4, Elsevier, Amsterdam, The Netherlands, 1996, pp. 441-463. Advances in Methodologies for Linear-Scaling Density Functional Calculations.
84. G. E. Scuseria, J. Phys. Chem. A, 103, 4782 (1999). Linear Scaling Density Functional Calculations with Gaussian Orbitals.
85. A. D. Becke, J. Chem. Phys., 98, 5648 (1993). Density-Functional Thermochemistry. III. The Role of Exact Exchange.
86. P. M. W. Gill, B. G. Johnson, and J. A. Pople, Chem. Phys. Lett., 209, 506 (1993). A Standard Grid for Density Functional Calculations.
87. A. D. Becke, J. Chem. Phys., 88, 2547 (1988). A Multicenter Numerical Integration Scheme for Polyatomic Molecules.
88. X.-P. Li, R. W. Nunes, and D. Vanderbilt, Phys. Rev. B, 47, 10891 (1993). Density-Matrix Electronic-Structure Method with Linear System-Size Scaling.
89. R. W. Nunes and D. Vanderbilt, Phys. Rev. B, 50, 17611 (1994). Generalization of the Density-Matrix Method to a Nonorthogonal Basis.
90. J. M. Millam and G. E. Scuseria, J. Chem. Phys., 106, 5569 (1997). Linear Scaling Conjugate Gradient Density Matrix Search as an Alternative to Diagonalization for First Principles Electronic Structure Calculations.
91. M. Head-Gordon, Y. Shao, C. Saravanan, and C. A. White, Mol. Phys., 101, 37 (2003). Curvy Steps for Density Matrix Based Energy Minimization: Tensor Formulation and Toy Applications.
92. T. Helgaker, H. Larsen, J. Olsen, and P. Jørgensen, Chem. Phys. Lett., 327, 397 (2000). Direct Optimization of the AO Density Matrix in Hartree-Fock and Kohn-Sham Theories.


93. H. Larsen, J. Olsen, P. Jørgensen, and T. Helgaker, J. Chem. Phys., 115, 9685 (2001). Direct Optimization of the Atomic-Orbital Density Matrix Using the Conjugate-Gradient Method with a Multilevel Preconditioner.
94. M. Challacombe, J. Chem. Phys., 110, 2332 (1999). A Simplified Density Matrix Minimization for Linear Scaling Self-Consistent Field Theory.
95. W. Yang, Phys. Rev. Lett., 66, 1438 (1991). Direct Calculation of Electron Density in Density-Functional Theory.
96. W. Yang, J. Chem. Phys., 94, 1208 (1991). A Local Projection Method for the Linear Combination of Atomic Orbital Implementation of Density-Functional Theory.
97. Q. Zhao and W. Yang, J. Chem. Phys., 102, 9598 (1995). Analytical Energy Gradients and Geometry Optimization in the Divide-and-Conquer Method for Large Molecules.
98. W. Yang and T.-S. Lee, J. Chem. Phys., 103, 5674 (1995). A Density-Matrix Divide-and-Conquer Approach for Electronic Structure Calculations of Large Molecules.
99. S. Goedecker and L. Colombo, Phys. Rev. Lett., 73, 122 (1994). Efficient Linear Scaling Algorithm for Tight-Binding Molecular Dynamics.
100. S. Goedecker and M. Teter, Phys. Rev. B, 51, 9455 (1995). Tight-Binding Electronic-Structure Calculations and Tight-Binding Molecular Dynamics with Localized Orbitals.
101. S. Goedecker, J. Comput. Phys., 118, 261 (1995). Low Complexity Algorithms for Electronic Structure Calculations.
102. J. Kim, F. Mauri, and G. Galli, Phys. Rev. B, 52, 1640 (1995). Total-Energy Global Optimizations Using Nonorthogonal Localized Orbitals.
103. F. Mauri and G. Galli, Phys. Rev. B, 50, 4316 (1994). Electronic-Structure Calculations and Molecular-Dynamics Simulations with Linear System-Size Scaling.
104. F. Mauri, G. Galli, and R. Car, Phys. Rev. B, 47, 9973 (1993). Orbital Formulation for Electronic-Structure Calculations with Linear System-Size Scaling.
105. P. Ordejón, Comput. Mater. Sci., 12, 157 (1998). Order-N Tight-Binding Methods for Electronic-Structure and Molecular Dynamics.
106. E. Hernandez and M. Gillan, Phys. Rev. B, 51, 10157 (1995). Self-Consistent First-Principles Technique with Linear Scaling.
107. W. Hierse and E. Stechel, Phys. Rev. B, 50, 17811 (1994). Order-N Methods in Self-Consistent Density-Functional Calculations.
108. S. Goedecker, Rev. Mod. Phys., 71, 1085 (1999). Linear Scaling Electronic Structure Methods.
109. S. Goedecker and G. E. Scuseria, Comput. Sci. Eng., 5, 14 (2003). Linear Scaling Electronic Structure Methods in Chemistry and Physics.
110. A. D. Daniels and G. E. Scuseria, J. Chem. Phys., 110, 1321 (1999). What Is the Best Alternative to Diagonalization of the Hamiltonian in Large Scale Semiempirical Calculations?
111. D. R. Bowler, T. Miyazaki, and M. J. Gillan, J. Phys.: Condens. Matter, 14, 2781 (2002). Recent Progress in Linear Scaling Ab Initio Electronic Structure Techniques.
112. D. R. Bowler, I. J. Bush, and M. J. Gillan, Int. J. Quantum Chem., 77, 831 (2000). Practical Methods for Ab Initio Calculations on Thousands of Atoms.
113. C. Ochsenfeld, J. Kussmann, and F. Koziol, Angew. Chem., 116, 4585 (2004); Angew. Chem. Int. Ed., 43, 4485 (2004). Ab Initio NMR Spectra for Molecular Systems with a Thousand and More Atoms: A Linear-Scaling Method.
114. M. Head-Gordon, M. S. Lee, P. E. Maslen, T. van Voorhis, and S. Gwaltney, in Modern Methods and Algorithms of Quantum Chemistry, Proceedings, 2nd ed., J. Grotendorst, Ed., John von Neumann Institute for Computing, Jülich, Germany, NIC Series, Vol. 3, 2000, pp. 593-638. Tensors in Electronic Structure Theory: Basic Concepts and Applications to Electron Correlation Models.

80

Linear-Scaling Methods in Quantum Chemistry

115. J. A. Schouten, Tensor Analysis for Physicists, 2nd ed., Dover Publications, Mineola, New York, 1988. 116. M. Head-Gordon, P. E. Maslen, and C. A. White, J. Chem. Phys., 108, 616 (1998). A Tensor Formulation of Many-electron Theory in a Nonorthogonal Single-particle Basis. 117. A. Messiah, Quantum Mechanics, Dover Publications, Mineola, New York, 1999. 118. R. McWeeny, Methods of Molecular Quantum Mechanics (Theoretical Chemistry), 2nd ed., Academic Press Limited, London, United Kingdom, 1989. 119. M. S. Daw, Phys. Rev. B, 47, 10895 (1993). Model for Energetics of Solids based on the Density Matrix. 120. R. McWeeny, Phys. Rev., 114, 1528 (1959). Hartree-Fock Theory with Nonorthogonal Basis Functions. 121. R. McWeeny, Rev. Mod. Phys., 32, 335 (1960). Some Recent Advances in Density Matrix Theory. 122. C. A. White, P. E. Maslen, M. S. Lee, and M. Head-Gordon, Chem. Phys. Lett., 276, 133 (1997). The Tensor Properties of Energy Gradients Within a Non-orthogonal Basis. 123. J. J. Sakurai, Modern Quantum Mechanics, Addison Wesley, Reading, Massachusetts, 1993. 124. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in Fortran, 2nd ed., Cambridge University Press, Cambridge, United Kingdom, 1996. 125. P. Pulay, Mol. Phys. 17, 197 (1969). Ab initio Calculation of Force Constants and Equilibrium Geometries. I. Theory. 126. P. Pulay, in Modern Electronic Structure Theory, D. Yarkony, Ed., World Scientific, Singapore, 1995, pp. 1191–1240. Analytical Derivative Techniques and the Calculation of Vibrational Properties, in Modern Electronic Structure Theory. 127. P. Pulay, in Ab Initio Methods in Quantum Chemistry, K. P. Lawley, Ed., Wiley, New York, 1987, pp. 241–286. Analytic Derivative Methods in Quantum Chemistry. 128. M. Frisch, M. Head-Gordon, and J. A. Pople, Chem. Phys., 141, 189 (1990). Direct Analytic SCF Second Derivatives and Electric Field Properties. 129. Y. Shao, C. A. White, and M. Head-Gordon, J. Chem. Phys., 114, 6572 (2001). Efficient Evaluation of the Coulomb Force in Density Functional Theory Calculations. 130. J. Gauss, in Modern Methods and Algorithms of Quantum Chemistry, Proceedings, 2nd ed., J. Grotendorst, Ed., John von Neumann Institute for Computing, Ju¨lich, Germany, NIC Series Vol. 3, 2000, pp. 541–592. Molecular Properties. 131. J. Gerratt and I. M. Mills, J. Chem. Phys., 49, 1968 (1719). Force Constants and DipoleMoment Derivatives of Molecules form Perturbed Hartree-Fock Calculations. 132. C. E. Dykstra and P. G. Jasien, Chem. Phys. Lett., 109, 388 (1984). Derivative Hartree-Fock Theory to all Orders. 133. N. C. Handy, D. J. Tozer, G. J. Laming, C. W. Murray, and R. D. Amos, Isr. J. Chem. 33, 331 (1993). Analytic Second Derivatives of the Potential Energy Surface. 134. B. G. Johnson and M. J. Fisch, J. Chem. Phys., 100, 7429 (1994). An Implementation of Analytic Second Derivatives of the Gradient-corrected Density Functional Energy. 135. J. A. Pople, R. Krishnan, H. B. Schlegel, and J. S. Binkley, Int. J. Quantum Chem. Symp., S13 225 (1979). Derivative Studies in Hartree-Fock and Møller-Plesset Theories. 136. Y. Osamura, Y. Yamaguchi, P. Saxe, D. J. Fox, M. A. Vincent, and H. F. Schafer III., J. Mol. Struct.: THEOCHEM, 103, 183 (1983). Analytic Second Derivative Techniques for Self-Consistent-Field Wave Functions. A new Approach to the Solution of the coupled Perturbed Hartree-Fock Equations. 137. J. Gauss, Ber. Bunsenges. Phys. Chem., 99, 1001 (1995). Accurate Calculation of NMR Chemical Shifts. 138. T. Helgaker, M. 
Jaszunski, and K. Ruud, Chem. Rev., 99, 293 (1999). Ab Initio Methods for the Calculation of NMR Shielding and Indirect Spin-Spin Coupling Constants.

References

81

139. U. Fleischer, W. Kutzelnigg, and C. van Wu¨llen, in Encyclopedia of Computational Chemistry, P. v. R. Schleyer, N. L. Allinger, T. Clark, J. Gasteiger, P. A. Kollman, H. F. Schaefer III, and P. R. Schreiner, Eds., Wiley, Chichester, United Kingdom, 1998, pp. 1827. Ab initio NMR Chemical Shift Computation. 140. T. Helgaker, P. J. Wilson, R. D. Amos, and N. C. Handy, J. Chem. Phys., 113, 2983 (2000). Nuclear Shielding Constants by Density Functional Theory with Gauge Including Atomic Orbitals. 141. G. Schreckenbach and T. Ziegler, J. Phys. Chem., 99, 606 (1995). Calculation of NMR Shielding Tensors Using Gauge-Including Atomic Orbitals and Modern Density Functional Theory. 142. W. Kutzelnigg, Isr. J. Chem., 19, 193 (1980). Theory of Magnetic Susceptibilities and NMR Chemical Shifts in Terms of Localized Quantities. 143. A. E. Hansen and T. D. Bouman, J. Chem. Phys., 82, 5035 (1985). Localized Orbital/Local Origin Method for Calculation and Analysis of NMR Shieldings. Applications to 13C Shielding Tensors. 144. F. London, J. Phys. Radium, 8, 397 (1937). Quantum Theory of Interatomic Currents in Aromatic Compounds. 145. R. Ditchfield, Molecular Physics, 27, 789 (1974). Self-consistent Perturbation Theory of Diamagnetism. I. A Gauge-invariant LCAO Method for N.M.R. Chemical Shifts. 146. K. Wolinski, J. F. Hinton, and P. Pulay, J. Am. Chem. Soc., 112, 8251 (1990). Efficient Implementation of the Gauge-Independent Atomic Orbital Method for NMR Chemical Shift Calculations. 147. J. R. Cheeseman, G. W. Trucks, T. A. Keith, and M. J. Frisch, J. Chem. Phys., 104, 5497 (1996). A Comparison of Models for Calculating Nuclear Magnetic Resonance Shielding Tensors. 148. M. Ha¨ser, R. Ahlrichs, H. P. Baron, P. Weis, and H. Horn, Theoret. Chim. Acta, 83, 455 (1992). Direct Computation of Second-order SCF Properties of Large Molecules on Workstation Computers with an Application to Large Carbon Clusters. 149. C. Ochsenfeld, Phys. Chem. Chem. Phys., 2, 2153 (2000). An Ab Initio Study of the Relation between NMR Chemical Shifts and Solid-State Structures: Hexabenzocoronene Derivatives. 150. C. Ochsenfeld, S. P. Brown, I. Schnell, J. Gauss, and H. W. Spiess, J. Am. Chem. Soc., 123, 2597 (2001). Structure Assignment in the Solid State by the Coupling of Quantum Chemical Calculations with NMR Experiments: A Columnar Hexabenzocoronene Derivative. 151. S. P. Brown, T. Schaller, U. P. Seelbach, F. Koziol, C. Ochsenfeld, F.-G. Kla¨rner, and H. W. Spiess, Angew. Chem. Int. Ed., 40, 717 (2001). Structure and Dynamics of the Host-Guest Complex of a Molecular Tweezer: Coupling Synthesis, Solid-State NMR, and QuantumChemical Calculations. 152. C. Ochsenfeld, F. Koziol, S. P. Brown, T. Schaller, U. P. Seelbach, and F.-G. Kla¨rner, Solid State Nucl. Magn. Reson., 22, 128 (2002). A Study of a Molecular Tweezer Host-Guest System by a Combination of Quantum-Chemical Calculations and Solid-State NMR Experiments. 153. J. Gauss, and J. F. Stanton, Adv. Chem. Phys. 123, 355 (2002). Electron-Correlated Approaches for the Calculation of NMR Chemical Shifts. 154. M. Kaupp, M. Bu¨hl, and V. G. Malkin (Eds.), Calculation of NMR and EPR Parameters, Wiley-VCH Weinheim, 2004. 155. H. Larsen, T. Helgaker, J. Olsen, and P. Jørgensen, J. Chem. Phys., 115, 10344 (2001). Geometrical Derivatives and Magnetic Properties in Atomic-orbital Density-based HartreeFock Theory. 156. V. Weber and M. Challacombe, J. Chem. Phys., 123, 044106 (2005). Higher-Order Response in O (N) by Perturbed Projection.

157. P. Pulay, Chem. Phys. Lett., 100, 151 (1983). Localizability of Dynamic Electron Correlation.

82

Linear-Scaling Methods in Quantum Chemistry

158. S. Saebø and P. Pulay, Chem. Phys. Lett., 113, 13 (1985). Local Configuration Interaction: An Efficient Approach for Larger Molecules. 159. P. Pulay and S. Saebø, Theoret. Chim. Acta, 69, 357 (1985). Orbital-invariant Formulation and Second-order Gradient Evaluation in Møller-Plesset Perturbation Theory. 160. S. Saebø and P. Pulay, J. Chem. Phys., 86, 914 (1987). Fourth-order Møller-Plesset Perturbation Theory in the Local Correlation Treatment. I. Method. 161. C. Hampel and H.-J. Werner, J. Chem. Phys., 104, 6286 (1996). Local Treatment of Electron Correlation in Coupled Cluster Theory. 162. M. Schu¨tz, G. Hetzer, and H.-J. Werner, J. Chem. Phys., 111, 5691 (1999). Low-order Scaling Local Electron Correlation Methods. I. Linear Scaling Local MP2. 163. G. Hetzer, M. Schu¨tz, H. Stoll, and H.-J. Werner, J. Chem. Phys., 113, 9443 (2000). LowOrder Scaling Local Correlation Methods II: Splitting the Coulomb Operator in Linear Scaling Local Second-Order Møller-Plesset Perturbation Theory. 164. P. E. Maslen and M. Head-Gordon, Chem. Phys. Lett., 283, 102 (1998). Non-iterative Local Second Order Møller-Plesset Theory. 165. M. S. Lee, P. E. Maslen, and M. Head-Gordon, J. Chem. Phys., 112, 3592 (2000). Closely Approximating Second-order Møller-Plesset Perturbation Theory with a Local Triatomics in Molecules Model. 166. J. Almlo¨f, Chem. Phys. Lett., 181, 319 (1991). Elimination of Energy Denominators in Møller-Plesset Perturbation Theory by a Laplace Transform Approach. 167. M. Ha¨ser and J. Almlo¨f, J. Chem. Phys., 96, 489 (1992). Laplace Transform Techniques in Møller-Plesset Perturbation Theory. 168. M. Ha¨ser, Theoret. Chim. Acta, 87, 147 (1993). Møller-Plesset (MP2) Perturbation Theory for Large Molecules. 169. P. Y. Ayala and G. E. Scuseria, J. Chem. Phys., 110, 3660 (1999). Linear Scaling Second-order Møller-Plesset Theory in the Atomic Orbital Basis for Large Molecular Systems. 170. G. E. Scuseria and P. Y. Ayala, J. Chem. Phys., 111, 8330 (1999). Linear Scaling Coupled Cluster and Perturbation Theories in the Atomic Orbital Basis. 171. R. Friesner, R. B. Murphy, M. D. Beachy, M. N. Ringnalda, W. T. Pollard, B. D. Dunietz, and Y. Cao, J. Phys. Chem., 103, 1913 (1999). Correlated ab Initio Electronic Structure Calculations for Large Molecules. 172. D. Walter, A. B. Szilva, K. Niedfeldt, and E. A. Carter, J. Chem. Phys., 117, 1982 (2002). Local Weak-pairs Pseudospectral Multireference Configuration Interaction. 173. M. Schu¨tz and H.-J. Werner, J. Chem. Phys., 114, 661 (2001). Low-order Scaling Local Electron Correlation Methods. IV. Linear Scaling Local Coupled-Cluster (LCCSD). 174. M. Schu¨tz, J. Chem. Phys., 116, 8772 (2002). Low-order Scaling Local Electron Correlation Methods. V. Connected Triples beyond (T): Linear Scaling Local CCSDT-1b. 175. G. E. Scuseria and P. Y. Ayala, J. Chem. Phys., 111, 8330 (1999). Linear Scaling Coupled Cluster and Perturbation Theories in the Atomic Orbital Basis. 176. P. Knowles, M. Schu¨tz, and H.-J. Werner, in Modern Methods and Algorithms of Quantum Chemistry, Proceedings, Second ed., J. Grotendorst, Ed., John von Neumann Institute for Computing, Ju¨lich, Germany NIC Series, Vol. 3, 2000, pp. 97–197. Ab Initio Methods for Electron Correlation in Molecules. 177. J. L. Whitten, J. Chem. Phys., 58, 4496 (1973). Coulombic Potential Energy Integrals and Approximations.

CHAPTER 2

Conical Intersections in Molecular Systems

Spiridoula Matsika
Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122

INTRODUCTION

The study of molecular systems using quantum mechanics is based on the Born–Oppenheimer approximation.1 This approximation relies on the fact that the electrons, because of their smaller mass, move much faster than the heavier nuclei; the electrons therefore follow the motion of the nuclei adiabatically, whereas the nuclei move on the average potential of the electrons. The Born–Oppenheimer approximation is sufficient to describe most chemical processes. In fact, our notion of molecular structure rests on the Born–Oppenheimer approximation, because a molecular structure is defined by nuclei placed at fixed positions. There are, however, essential nonadiabatic processes in nature that cannot be described within this approximation. Nonadiabatic processes are ubiquitous in photophysics and photochemistry, and they govern such important phenomena as photosynthesis, vision, and charge-transfer reactions. Based on the Born–Oppenheimer approximation, the behavior of molecules is described by the dynamics of the nuclei moving along a single potential energy surface (PES) generated by the electrons. Nonadiabatic phenomena occur when at least two potential energy surfaces approach each other and the coupling between them becomes important.

The traditional way of studying nonadiabatic phenomena involves the concepts of avoided and intersystem crossings. As two PESs approach each other, the rate of nonadiabatic processes depends on the energy separating those two surfaces. In recent years, ultrafast experimental techniques have allowed the observation of nonadiabatic processes that take place in femtoseconds,2 and these ultrafast rates cannot be explained by the traditional theories. Conical intersections, which are actual crossings of two PESs, can, however, facilitate rapid nonadiabatic transitions. Conical intersections have been known mathematically since the 1930s,3,4 but they were long regarded as mathematical curiosities rather than as a useful concept for explaining photochemistry. The reason for this neglect is stated in a review written by Michl in 1974 that summarized the conception of conical intersections at that time.5 Michl stated that true surface touching "is a relatively uncommon occurrence and along most paths such crossings, even if 'intended', are more or less strongly avoided." The modern era of nonadiabatic studies started in the 1990s, when algorithms were developed that allowed conical intersections to be located without the presence of symmetry.6,7 These algorithms have since revealed that conical intersections occur in the excited states of many molecules and are far from uncommon.8–14 In fact, Mead and Truhlar have shown that, if an avoided crossing is found, it is likely that a true conical intersection lies close by.15 We have progressed a long way since the first theoretical descriptions of conical intersections, and the abstract mathematical formulations of the previous century can now be used to study, even in quantitative terms, systems important in real life. Conical intersections can, and do, affect the photophysics, photochemistry, and spectroscopy of molecular systems, especially when the ground state is one of the intersecting states. In the last few years, this field has become a "hot" area, with a growing appreciation by scientists of the importance of conical intersections in chemical dynamics. Several reviews have been written on the subject,8–14,16 and a book giving the theoretical formulation of conical intersections in structure and dynamics was published recently.17 A recent Faraday Discussion meeting brought together leaders in the field to discuss the current state of nonadiabatic methods, in which conical intersections played a central role.18 In this pedagogically driven chapter, we present a basic introduction to the field and provide some examples that illustrate how conical intersections can explain the mechanisms of nonadiabatic processes. The last section of this review presents recent developments that have extended the computational tools to cases beyond the most common case of two nonrelativistic intersecting states: (1) three nonrelativistic intersecting states, and (2) two intersecting states that incorporate the spin-orbit coupling. This chapter is not intended as a comprehensive review of the field.


GENERAL THEORY

The Born–Oppenheimer Approximation and its Breakdown: Nonadiabatic Processes

The time-independent Schrödinger equation for a molecule with N nuclei and M electrons is

$$H(r;R)\,\Psi(r;R) = E\,\Psi(r;R) \qquad [1]$$

where R denotes all the nuclear coordinates and r all the electronic coordinates. The total nonrelativistic Hamiltonian of the system is given by

$$H(r;R) = T^{\mathrm{nuc}} + H^{e}(r;R) \qquad [2]$$

where $T^{\mathrm{nuc}}$ is the nuclear kinetic energy operator and $H^{e}(r;R)$ is the electronic Hamiltonian, which includes the electronic kinetic energy and the Coulomb interactions between the particles. $H^{e}(r;R)$ depends parametrically on R. Within the Born–Oppenheimer (adiabatic) approximation,1 the coupling between nuclear and electronic degrees of freedom is ignored and the total wavefunction is assumed to be a product of a nuclear wavefunction $\chi(R)$ and an electronic wavefunction $\psi(r;R)$,

$$\Psi(r;R) = \psi(r;R)\,\chi(R) \qquad [3]$$

Thus, the nuclear and electronic parts are separated, and solving the electronic Schrödinger equation provides the electronic eigenfunctions $\psi_I$,

$$H^{e}(r;R)\,\psi_I(r;R) = E^{e}_I(R)\,\psi_I(r;R) \qquad [4]$$

Inserting the electronic solution back into the Schrödinger equation for the whole system, Eq. [1], and neglecting the effect of the nuclear kinetic energy operator on the electronic wavefunction $\psi_I$ gives

$$\left(T^{\mathrm{nuc}} + E^{e}_I\right)\chi_I = E\,\chi_I \qquad [5]$$

which provides the nuclear wavefunction $\chi_I$ and the total energy E. Solving Eqs. [4] and [5] is the task of theoretical chemistry. Electronic structure methods capable of solving the electronic problem have progressed enormously during the past 40 years, and standardized computational models have emerged; John Pople received the Nobel Prize in Chemistry in 1998 for being one of the pioneers of this evolution.19 Solution of the electronic part of the Hamiltonian provides structures, reaction paths, and transition states for the study of chemical reactions, electronic energies for obtaining spectra, and many other static properties. To understand the detailed dynamics of chemical systems, however, the nuclear equation also has to be solved. The solution of this part of the Schrödinger equation has not yet been standardized. Furthermore, the quantum solution of Eq. [5] is so cumbersome that only molecules with a few atoms can be treated fully quantum mechanically; in most other cases, a classical or semiclassical method has to be employed to study the dynamics of the nuclei.

The Born–Oppenheimer approximation assumes that the electronic and nuclear motions are well separated and do not interact, but this assumption is not always true. In a more rigorous treatment, the total wavefunction is not a product of the electronic and nuclear wavefunctions but rather an expansion in terms of the electronic wavefunctions $\psi_I(r;R)$,20

$$\Psi(r;R) = \sum_{I} \psi_I(r;R)\,\chi_I(R) \qquad [6]$$

where $\chi_I(R)$ are expansion coefficients. The electronic wavefunctions are obtained by solving the electronic equation (Eq. [4]), and, because they form a complete set, the above expansion is exact when not truncated.21 The expansion coefficients $\chi_I$ can be obtained by inserting Eq. [6] into Eq. [1], multiplying by $\psi_I$, and integrating over the electronic coordinates. The Schrödinger equation then becomes

$$\left[T^{\mathrm{nuc}} - \frac{1}{2\mu}K_{II}(R) + E^{e}_I(R)\right]\chi_I(R) - \frac{1}{2\mu}\sum_{J\neq I}\left[K_{IJ}(R) + 2\,f^{IJ}(R)\cdot\nabla\right]\chi_J(R) = E\,\chi_I(R) \qquad [7]$$

where the nuclear kinetic energy operator is taken as $T^{\mathrm{nuc}} = -\frac{1}{2\mu}\nabla^2$, $\nabla$ refers to the gradient over the nuclear coordinates R, and $\mu$ is a reduced mass. $K_{IJ}$ and $f^{IJ}$ are coupling terms that were neglected in the Born–Oppenheimer approximation; they are responsible for nonadiabatic transitions between different states I and J. They originate from the nuclear kinetic energy operator operating on the electronic wavefunctions $\psi_I(r;R)$ and are given by

$$f^{IJ}(R) = \langle \psi_I(r;R) \,|\, \nabla \psi_J(r;R) \rangle \qquad [8]$$

and

$$K_{IJ}(R) = \langle \psi_I(r;R) \,|\, \nabla^2 \psi_J(r;R) \rangle \qquad [9]$$

The brackets in Eqs. [8] and [9] denote integration over the electronic coordinates r. The diagonal term $K_{II}$ corresponds to nonadiabatic corrections to a single potential energy surface, which can usually be neglected. $f^{IJ}(R)$ is the derivative coupling, a vector of dimension $N^{\mathrm{int}} = 3N - 6$, where N is the number of atoms in the molecule. The diagonal term $f^{II}$ is zero for real wavefunctions, and $K_{IJ}$ can be expressed in terms of $f^{IJ}$.21 The derivative coupling $f^{IJ}$ is a measure of the variation of the electronic wavefunction with the nuclear coordinates and depends on the energy difference between states I and J (see the section on Derivative Coupling below). When the couplings $K_{IJ}$ and $f^{IJ}$ are neglected, Eq. [7] reduces to that derived from the Born–Oppenheimer approximation. When the states are well separated, the coupling is small and the Born–Oppenheimer approximation is valid. If, however, the electronic eigenvalues are close, a small change in the nuclear coordinates may cause a large change in the electronic wavefunctions; in this situation the coupling becomes important and the more general Eq. [7] has to be used. Usually, only a small number of electronic states are close in energy, so the expansion of the total wavefunction can be truncated to a small number of interacting states, most often two.

Adiabatic-Diabatic Representation

In Eq. [7], the electronic wavefunctions are taken as the eigenfunctions of the electronic Hamiltonian. In this case, all the coupling matrix elements $H_{IJ} = \langle \psi_I | H^e | \psi_J \rangle$ are zero, and the coupling between different electronic states occurs through the nuclear kinetic energy terms; this formulation is called the adiabatic representation.17 Alternatively, a diabatic representation can be used.22–26 In this representation, the electronic wavefunctions used to expand the total wavefunction are not the eigenfunctions of the electronic Hamiltonian; instead, they are chosen so as to eliminate the derivative coupling. The coupling terms therefore do not appear in the Schrödinger equation, but the matrix element $H_{IJ} = \langle \phi_I | H^e | \phi_J \rangle$ is nonzero, and it is this term that is responsible for the coupling of the states:

$$\left[T^{\mathrm{nuc}} + H_{II}\right]\chi_I + \sum_{J(\neq I)} H_{IJ}\,\chi_J = E\,\chi_I \qquad [10]$$

Note that the $\phi_I$ are electronic wavefunctions that are not eigenfunctions of $H^e$. In every realistic case (except for diatomic molecules) where the sum over states J is truncated, the derivative coupling cannot vanish completely for every R,27 but it can become negligibly small. Making $f^{IJ}$ very small corresponds to choosing electronic wavefunctions that are always smooth functions of the nuclear coordinates. Physically, diabatic functions maintain the character of the states. For example, assume that a diabatic state $\phi_1$ corresponds to a covalent configuration and another diabatic state $\phi_2$ corresponds to an ionic configuration. Before a nonadiabatic transition occurs, the adiabatic and diabatic states are the same, $\psi_1 = \phi_1$ and $\psi_2 = \phi_2$. After a nonadiabatic transition, $\phi_1$ and $\phi_2$ remain the covalent and ionic configurations, respectively, but the adiabatic states will have switched, i.e., $\psi_1 = \phi_2$ and $\psi_2 = \phi_1$.

The problem with the diabatic representation is that, as already mentioned, it is not possible to make the nonadiabatic couplings zero for every R, so a strictly diabatic representation does not exist. Furthermore, although the adiabatic representation is unique and well defined (by the diagonalization of the electronic Hamiltonian), the same is not true of the diabatic representation. The diabatic representation nevertheless has advantages that make it the method of choice for studying nuclear dynamics in many cases. In the adiabatic representation, the coupling term is a vector, whereas in the diabatic representation it is only a scalar and hence much easier to use. Moreover, the diabatic representation is smooth, whereas the nonadiabatic couplings in the adiabatic representation have singularities at conical intersections. Many schemes for the construction of diabatic states have been developed, and detailed discussions of their construction can be found in several reviews.25,26

The Noncrossing Rule

Von Neumann and Wigner proved, in their seminal work of 1929,3 that for a molecular system with $N^{\mathrm{int}}$ internal nuclear coordinates ($N^{\mathrm{int}} = 3N - 6$), two electronic surfaces become degenerate in a subspace of dimension $N^{\mathrm{int}} - 2$. To illustrate this dimensionality rule, consider two intersecting adiabatic electronic states, $\psi_1$ and $\psi_2$. These two states are expanded in terms of two diabatic states $\phi_1$ and $\phi_2$, which are diagonal with respect to all the remaining electronic states and to each other,28

$$\psi_1 = c_{11}\phi_1 + c_{21}\phi_2 \qquad [11]$$
$$\psi_2 = c_{12}\phi_1 + c_{22}\phi_2 \qquad [12]$$

The electronic energies are the eigenvalues of the Hamiltonian matrix

$$H^e = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix} \qquad [13]$$

where $H_{ij} = \langle \phi_i | H^e | \phi_j \rangle$. The eigenvalues of $H^e$ are given by

$$E_{1,2} = \bar{H} \pm \sqrt{\Delta H^2 + H_{12}^2} \qquad [14]$$

where $\bar{H} = (H_{11} + H_{22})/2$ and $\Delta H = (H_{11} - H_{22})/2$. The eigenfunctions are

$$\psi_1 = \cos(\alpha/2)\,\phi_1 + \sin(\alpha/2)\,\phi_2 \qquad [15]$$
$$\psi_2 = -\sin(\alpha/2)\,\phi_1 + \cos(\alpha/2)\,\phi_2 \qquad [16]$$


where $\alpha$ satisfies

$$\sin\alpha = \frac{H_{12}}{\sqrt{\Delta H^2 + H_{12}^2}} \qquad [17]$$
$$\cos\alpha = \frac{H_{11} - H_{22}}{2\sqrt{\Delta H^2 + H_{12}^2}} \qquad [18]$$

For the eigenvalues of this matrix to be degenerate, two conditions must be satisfied:

$$H_{11} - H_{22} = 0 \qquad [19]$$
$$H_{12} = 0 \qquad [20]$$

In an $N^{\mathrm{int}}$-dimensional space, the two conditions are satisfied in an $(N^{\mathrm{int}} - 2)$-dimensional subspace. This subspace, where the states are degenerate, is called the seam space. The two-dimensional space orthogonal to it, where the degeneracy is lifted, is called the branching or g–h space.11,28 Conical intersections are thus not isolated points in space; rather, they form an infinite number of connected points that constitute the seam. For a diatomic molecule, which has only one degree of freedom, it is not possible for two electronic states of the same symmetry to become degenerate, and this restriction is often called the noncrossing rule. For polyatomic molecules, in contrast, there exist enough nuclear degrees of freedom that their electronic states can become degenerate,4 although the above rule does not guarantee that this degeneracy will happen, i.e., that there exists a solution of Eqs. [19] and [20].
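Because this two-state model underlies everything that follows, a quick numerical check of Eqs. [14]-[18] can be helpful. The following Python/NumPy sketch (the matrix elements are arbitrary illustrative numbers, not data for any molecule) diagonalizes the Hamiltonian of Eq. [13] and verifies the closed-form eigenvalues and mixing angle:

```python
import numpy as np

# Arbitrary diabatic matrix elements (illustrative values, not molecular data).
H11, H22, H12 = -1.30, -1.10, 0.05

He = np.array([[H11, H12],
               [H12, H22]])

# Closed-form eigenvalues, Eq. [14]: E_{1,2} = Hbar +/- sqrt(dH^2 + H12^2).
Hbar = 0.5 * (H11 + H22)
dH = 0.5 * (H11 - H22)
W = np.hypot(dH, H12)
E1, E2 = Hbar + W, Hbar - W

# Numerical diagonalization (eigh returns eigenvalues in ascending order).
evals, evecs = np.linalg.eigh(He)
assert np.allclose(evals, [E2, E1])

# Mixing angle alpha from Eqs. [17]-[18]: sin(alpha) ~ H12, cos(alpha) ~ dH.
alpha = np.arctan2(H12, dH)

# Eigenvector of Eq. [15]; it belongs to the root E1 = Hbar + W.
psi1 = np.array([np.cos(alpha / 2.0), np.sin(alpha / 2.0)])

# eigh fixes eigenvectors only up to an overall sign, so compare |overlap|.
assert np.isclose(abs(psi1 @ evecs[:, 1]), 1.0)

print("E1, E2 =", E1, E2, "  mixing angle alpha =", alpha)
```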

The Geometric Phase Effect

It was first pointed out by Longuet-Higgins and Herzberg29,30 that a real electronic wavefunction changes sign when traversing a closed path around a conical intersection. Mead and Truhlar31 incorporated this geometric phase effect into the single-electronic-state problem, and Berry generalized the theory; as a result of his work, this effect is often called the Berry phase.32 Equations [15] and [16] illustrate the effect: when $\alpha$ changes from $\alpha$ to $\alpha + 2\pi$, the wavefunctions change sign, i.e., $\psi_1(\alpha + 2\pi) = -\psi_1(\alpha)$ and $\psi_2(\alpha + 2\pi) = -\psi_2(\alpha)$.33 Because the total wavefunction must be single valued, the electronic wavefunction in the adiabatic representation should be multiplied by a phase factor that keeps the total wavefunction single valued. As a consequence, the geometric phase can affect nuclear dynamics even when a single potential energy surface is considered.34–36 The geometric phase effect can be considered a signature of conical intersections, and its presence is proof that a true conical intersection has been found.
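This sign change is easy to demonstrate numerically: if the lower adiabatic eigenvector is followed continuously around a loop encircling the intersection, it returns with its sign reversed. A minimal sketch, using a linear two-state model (the form derived below as Eq. [28], with the tilt set to zero) and assumed slope values:

```python
import numpy as np

# Linear two-state model around a conical intersection; g, h are assumed slopes.
g, h, rho = 0.13, 0.02, 0.1

def lower_state(theta):
    """Lower adiabatic eigenvector on a circle of radius rho around the intersection."""
    x, y = rho * np.cos(theta), rho * np.sin(theta)
    H = np.array([[g * x, h * y],
                  [h * y, -g * x]])
    return np.linalg.eigh(H)[1][:, 0]   # eigenvector of the lower root

thetas = np.linspace(0.0, 2.0 * np.pi, 401)
psi0 = lower_state(thetas[0])
psi = psi0.copy()
for t in thetas[1:]:
    nxt = lower_state(t)
    if np.dot(nxt, psi) < 0.0:          # eigh fixes signs arbitrarily; enforce continuity
        nxt = -nxt
    psi = nxt

print("overlap after one full loop:", np.dot(psi0, psi))   # approximately -1
```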


Conical Intersections and Symmetry

Symmetry-Required Conical Intersections: The Jahn–Teller Effect
Degenerate electronic states may exist when a molecular system has high symmetry. In this case, the requirements for degeneracy (Eqs. [19] and [20]) are satisfied by symmetry alone. The Jahn–Teller effect, which refers to these symmetry-induced degenerate states, has been known and studied for a long time. The Jahn–Teller theorem, published in 1937,37,38 states that a molecule in an orbitally degenerate electronic state is unstable and will distort geometrically to lift the degeneracy. Several books focusing on the Jahn–Teller effect have been written.39–41 Bersuker contributed to many of these volumes, and he recently published a review of the many advances made in this area.42 Common examples of the Jahn–Teller problem include the doubly degenerate $E \otimes e$ problem and the triply degenerate $T \otimes (e + t)$ problem. In a classic Jahn–Teller problem, such as the "Mexican hat" of the $E \otimes e$ problem, analytic expressions are used to model the region around the degeneracy. In this manner, bound vibronic states can be derived, which provide a means for experimental verification and study of conical intersections.43 A linear system does not exhibit the Jahn–Teller effect. Instead, such systems display a similar effect, the Renner–Teller effect, in which the first-order coupling is zero and the degeneracy is lifted not linearly but only at quadratic order.

Symmetry-Allowed Conical Intersections
When two crossing states are of different symmetry, the requirement that $H_{12} = 0$ is satisfied by symmetry. The second requirement (that $\Delta H = 0$) is satisfied in a subspace of dimension $N^{\mathrm{int}} - 1$, where $N^{\mathrm{int}}$ now counts the internal coordinates that retain the symmetry. This crossing is not a conical intersection, but it becomes one when the symmetry-breaking coordinate that can couple the two states is included. For example, BH2 has a crossing between states of $A_1$ and $B_2$ symmetry in $C_{2v}$ symmetry.44 In this symmetry, BH2 has two internal coordinates, the symmetric stretch and the bend, so the crossing occurs in a space of dimension 2 − 1 = 1, i.e., a line. If the asymmetric stretch is included, the molecule has, in general, $C_s$ symmetry, and both crossing states have $A'$ symmetry. The crossing between those two states is then a conical intersection of dimension 3 − 2 = 1 (termed an accidental conical intersection, as discussed below).

Accidental Conical Intersections
Most molecular systems in nature have little or no symmetry, and it is in these systems that accidental conical intersections often exist. Locating accidental points of degeneracy is more difficult than in the previous cases because there is no symmetry to use for guidance. This difficulty, along with the misinterpretation of the noncrossing rule, delayed the appreciation of accidental conical intersections. One of the early cases where accidental conical intersections were found is ozone.45 The global minimum of the ground state of ozone has $C_{2v}$ symmetry, but there exists a second local minimum at higher energy with $D_{3h}$ symmetry. The two ozone minima are separated by a transition state that lies close to a conical intersection between the $1\,^1A_1$ and $2\,^1A_1$ electronic states. In contrast to the previous example of BH2, the intersecting states of ozone belong to the same irreducible representation in $C_{2v}$ symmetry. Because there are two degrees of freedom in this symmetry (the symmetric stretch and the bend), the dimension of the seam is 2 − 2 = 0, so there is only a point of degeneracy. In the lower $C_s$ symmetry, ozone has three degrees of freedom and the seam has dimension 3 − 2 = 1. Thus, the two states cross along a line in $C_s$ symmetry, and this line contains a single point at which the molecule has $C_{2v}$ symmetry.

The Branching Plane

The matrix elements of $H^e$, when expanded in a Taylor series to first order around the point of conical intersection $R_0$, become28,46

$$\bar{H}(R) = \bar{H}(R_0) + \nabla\bar{H}(R_0)\cdot\delta R \qquad [21]$$
$$\Delta H(R) = 0 + \nabla(\Delta H)(R_0)\cdot\delta R \qquad [22]$$
$$H_{12}(R) = 0 + \nabla H_{12}(R_0)\cdot\delta R \qquad [23]$$

The requirements for a conical intersection at R (Eqs. [19] and [20]) then become

$$\nabla(\Delta H)\cdot\delta R = 0 \qquad [24]$$
$$\nabla H_{12}\cdot\delta R = 0 \qquad [25]$$

so that $\delta R$ must be orthogonal to the subspace spanned by the vectors $\nabla(\Delta H)$ and $\nabla H_{12}$ for the degeneracy to remain. The subspace defined by these two vectors, where the degeneracy is lifted linearly, is the branching or g–h space.9,28 The intersection-adapted coordinates are defined by the unit vectors along the energy difference gradient and the coupling gradient, respectively,

$$x = \mathbf{g}/g = \nabla(\Delta H)/g \qquad [26]$$
$$y = \mathbf{h}/h = \nabla H_{12}/h \qquad [27]$$

where Yarkony's notation $\mathbf{g} = \nabla(\Delta H)$ and $\mathbf{h} = \nabla H_{12}$ has been used.11 In Eqs. [26] and [27], g and h are the norms of the corresponding vectors. Although these vectors are defined here for the two-state model under consideration, the description can be generalized to actual ab initio wavefunctions.9 Quasidegenerate perturbation theory can be used to describe the region around a conical intersection, thus providing a way to describe this region formally.47,48 The Hamiltonian matrix of Eq. [13] in the branching plane becomes

$$H^e = (s_x x + s_y y)\,I + \begin{pmatrix} g\,x & h\,y \\ h\,y & -g\,x \end{pmatrix} \qquad [28]$$

where x and y are displacements along the g and h directions, respectively, $s_x$ and $s_y$ are the projections of $\nabla\bar{H}$ onto the branching plane, and $I$ is a $2\times 2$ unit matrix. The energies after diagonalization are

$$E_{1,2}(x,y) = s_x x + s_y y \pm \sqrt{(gx)^2 + (hy)^2} \qquad [29]$$
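Equation [29] is easy to verify against direct diagonalization of Eq. [28]; the following sketch does so at random displacements, using as assumed parameters the cone values quoted later for Figure 2:

```python
import numpy as np

rng = np.random.default_rng(0)
g, h, sx, sy = 0.13, 0.02, 0.20, 0.00   # assumed cone parameters (see Figure 2)

for _ in range(5):
    x, y = rng.uniform(-0.5, 0.5, size=2)    # displacements in the branching plane
    He = (sx * x + sy * y) * np.eye(2) + np.array([[g * x, h * y],
                                                   [h * y, -g * x]])
    root = np.hypot(g * x, h * y)
    # Eq. [29], ordered lowest-to-highest to match eigvalsh:
    E_closed = np.array([sx * x + sy * y - root, sx * x + sy * y + root])
    assert np.allclose(np.linalg.eigvalsh(He), E_closed)

print("closed form of Eq. [29] matches diagonalization of Eq. [28]")
```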

If the energy of the two states is plotted around the conical intersection along these two special coordinates, the potential has the form of a double cone. Figure 1 shows three-dimensional plots of the energy of two intersecting states (a) along the branching plane and (b) along one branching coordinate and one seam coordinate. If the x and y axes are the branching coordinates, the degeneracy is lifted and the double cone is formed. If the x coordinate is a branching coordinate but the other coordinate is a seam coordinate, the degeneracy is lifted only along one coordinate and a double wedge is formed. Using cylindrical polar coordinates $x = \rho\cos\theta$, $y = \rho\sin\theta$, the above Hamiltonian becomes

$$H^e = \rho\,(s_x \cos\theta + s_y \sin\theta)\,I + \rho \begin{pmatrix} g\cos\theta & h\sin\theta \\ h\sin\theta & -g\cos\theta \end{pmatrix} \qquad [30]$$

Figure 1 The energies E1 and E2 of two intersecting states plotted (a) along the branching plane and (b) along a branching coordinate and a seam coordinate.


From Eq. [17],

$$\tan\alpha = \frac{h\sin\theta}{g\cos\theta} = \frac{h}{g}\tan\theta \qquad [31]$$

Thus, the angle $\alpha$, which relates the diabatic representation to the adiabatic representation, is related to the angle $\theta$ defined from the intersection-adapted coordinates.

Characterizing Conical Intersections: Topography

Conical intersections are characterized by their topography.11,28 The topography of the PESs in the vicinity of a conical intersection plays a significant role in the intersection's ability to promote a nonadiabatic transition.11,28,49–53 This topography is described to first order by the expression for the energies in Eq. [29] in terms of the displacements x and y. The topography of the cone in the branching plane is given in terms of the set of parameters g, h, $s_x$, and $s_y$ defined in the previous section: g and h give the slopes of the cone in the two directions x and y, and $s_x$ and $s_y$ give the tilt of the cone. A vertical, or peaked, conical intersection is one in which the $s_x$ and $s_y$ parameters are zero; if one or both of these parameters is nonzero, the conical intersection is sloped. The cone is also characterized by the difference between the slopes g and h: a symmetric cone is one in which the slopes g and h are equal, whereas an asymmetric cone has different slopes. Figure 2 shows a cone that is asymmetric and tilted mostly in one direction. The parameters for this cone, in atomic units, are g = 0.13, h = 0.02, $s_x$ = 0.20, and $s_y$ = 0.00. The three-dimensional plot is shown in panel (a), whereas the y = 0 and x = 0 planes are shown in panels (b) and (c), respectively. The cone along the x direction is steep and tilted, whereas along the y direction it is very flat and vertical because $s_y$ is zero (see Figure 2c). Figure 2d shows the energies around the cone as a function of the polar coordinate $\theta$. The topography of the cone affects the system's dynamics. Simple classical arguments can rationalize the way topography affects a trajectory: vertical cones facilitate transitions from the upper surface to the lower surface, whereas tilted cones are less efficient.28,51 Actual quantum mechanical calculations have confirmed these generalizations.51,54 The efficacy of a conical intersection in promoting a nonadiabatic transition thus reflects the topography in its vicinity.11 The g and h vectors represent nuclear displacements similar to the normal modes of a molecule. Because the wavefunctions of the degenerate states can mix arbitrarily, these vectors are not unique; a unitary transformation can be used to rotate them so that they become orthogonal to each other without changing the form of the Hamiltonian.55
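The role of the tilt parameters can be made concrete with a short numerical scan of Eq. [29] around the intersection, in the spirit of Figure 2d; this is only an illustrative sketch using the cone parameters listed above:

```python
import numpy as np

# Cone parameters of Figure 2, in atomic units (sx tilts the cone along x).
g, h, sx, sy = 0.13, 0.02, 0.20, 0.00
rho = 0.1                      # radius of a circle around the intersection

theta = np.linspace(0.0, 2.0 * np.pi, 361)
tilt = rho * (sx * np.cos(theta) + sy * np.sin(theta))
cone = rho * np.sqrt((g * np.cos(theta))**2 + (h * np.sin(theta))**2)
E_lower, E_upper = tilt - cone, tilt + cone   # Eq. [29] evaluated on the circle

# For a peaked (vertical) cone the upper sheet lies above the intersection
# energy (here 0) in every direction; because |sx| > g for this cone, both
# sheets run downhill along -x, the signature of a sloped intersection.
print("upper sheet dips below the intersection:", bool(E_upper.min() < 0.0))
print("lower sheet rises above the intersection:", bool(E_lower.max() > 0.0))
```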

Figure 2 The energies E1 and E2 of two intersecting states plotted (a) along the branching plane, (b) along the x direction, (c) along the y direction, and (d) along the polar coordinate θ.

The two vectors then span the branching plane and correspond to the molecular motion of the system as it exits the funnel. Examples of these vectors for conical intersections in the OHOH system and in uracil are given in Figure 3.56 If the molecule has symmetry at the point of the conical intersection, the vectors transform as irreducible representations of the corresponding point group. The g vector is always totally symmetric because it represents the energy difference gradient of the two states. The coupling vector h transforms as the direct product of the symmetries of the two states. For example, OHOH has linear symmetry, and a conical intersection between a Σ and a Π state exists.57 The tuning vector g is symmetric (σ), whereas the coupling vector h has π symmetry and distorts the linearity (Figure 3a). Uracil is planar at the conical intersection described by these vectors. The two intersecting states are A′ and A″, so the two vectors transform as a′ and a″; they are shown in Figure 3b.

After a system on the higher surface encounters a conical intersection, it can emerge through the conical intersection onto the lower surface. The conical intersection tends to orient the molecular motion in the directions defined by the branching plane.

Figure 3 The g and h vectors defining the branching plane (a) for OHOH and (b) for uracil (reproduced with permission from Ref. 56).

Accordingly, the outcome of a photochemical reaction and its associated branching ratios, for example, will be affected by the branching vectors and the gradients of the surfaces.14,57,58 As an example, we consider ground state radicals OH(X) reacting with excited radicals OH(A).57 Figure 3a shows the g and h vectors at the Σ–Π conical intersection described above. Inspection of these vectors provides a guess as to what the products will be. Displacement along the positive tuning direction brings the two hydrogens close to one oxygen, suggesting the formation of water, whereas displacement along the negative tuning direction shows the tendency to re-form OH + OH. Displacement along the coupling vector bends the HOH unit, again leading to the formation of water. Figure 4 shows a cartoon of the possible directions the system can take and the suggested products in each case. These speculations were confirmed by calculating ab initio gradient-directed paths.57

Figure 4 Cartoon of the possible outcomes ("routing effect") for the reaction OH(X) + OH(A) after emerging from a conical intersection: quenching to OH(X) + OH(X) or reaction to H2O + O.


Derivative Coupling

The efficiency of a radiationless transition between two states depends not only on the energy difference between those states but also on the derivative coupling $f^{IJ}$ of the states. The derivative coupling appears in the equations that describe nonadiabatic nuclear dynamics (Eq. [7]). In the adiabatic representation, the derivative coupling is needed to carry out nuclear dynamics. The diabatic representation is defined by setting the derivative coupling equal to zero, which means that efficient ways to transform between the diabatic and adiabatic representations use the derivative coupling.26,55,59 By applying the gradient operator $\nabla$ to the electronic Schrödinger equation $H^e\psi_I = E^e_I\psi_I$, multiplying by $\psi_J$, and integrating over the electronic coordinates, Eq. [32] is obtained for the derivative coupling:

$$f^{IJ}(R) = \frac{\langle \psi_I \,|\, \nabla H^e \,|\, \psi_J \rangle}{E^e_J - E^e_I} \qquad [32]$$

This expression shows that the derivative coupling is inversely proportional to the energy difference between the two states, so when the two states approach each other, the derivative coupling becomes large. At the conical intersection, the energy difference is zero and the derivative coupling becomes infinite. By differentiating the orthonormality condition for the wavefunctions $\psi_I$, $\langle \psi_I | \psi_J \rangle = \delta_{IJ}$, one obtains the properties

$$f^{IJ}(R) = -f^{JI}(R) \qquad [33]$$
$$f^{II}(R) = 0 \qquad [34]$$

so the derivative coupling is antihermitian. Transforming the derivative coupling to intersection-adapted coordinates and then to polar coordinates, the part that is singular at the conical intersection is restricted to the $\theta$ component.11 Using these transformations, the derivative coupling is given by

$$f_\theta = \frac{1}{\rho}\left\langle \psi_1 \,\middle|\, \frac{\partial}{\partial\theta}\,\psi_2 \right\rangle = \frac{1}{\rho}\left\langle \cos(\alpha/2)\,\phi_1 + \sin(\alpha/2)\,\phi_2 \,\middle|\, \frac{\partial}{\partial\theta}\bigl(-\sin(\alpha/2)\,\phi_1 + \cos(\alpha/2)\,\phi_2\bigr) \right\rangle \qquad [35]$$

which becomes

$$f_\theta \approx \frac{1}{2\rho}\,\frac{\partial \alpha}{\partial \theta} \qquad [36]$$


if $\phi_1$ and $\phi_2$ are quasidiabatic states, i.e., $\langle \phi_1 | \frac{\partial}{\partial\theta}\,\phi_2 \rangle = 0$. At the conical intersection, $\rho = 0$ and $f_\theta$ becomes infinite. Alternatively, the opposite direction can be pursued: one can define the diabatic states $\phi_{1,2}$ by setting $f_\theta$ equal to zero and finding the necessary transformation angle $\alpha$.
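The divergence expressed by Eq. [36] can be seen numerically: from Eq. [31], $\partial\alpha/\partial\theta = gh/(g^2\cos^2\theta + h^2\sin^2\theta)$ stays finite, while the $1/(2\rho)$ prefactor blows up as $\rho \to 0$. A short sketch with assumed slopes:

```python
import numpy as np

g, h = 0.13, 0.02   # assumed slopes of the linear two-state model

def dalpha_dtheta(theta):
    # Differentiating tan(alpha) = (h/g) tan(theta), Eq. [31]:
    return g * h / ((g * np.cos(theta))**2 + (h * np.sin(theta))**2)

theta = np.pi / 4.0
for rho in (1.0, 0.1, 0.01, 0.001):
    f_theta = dalpha_dtheta(theta) / (2.0 * rho)    # Eq. [36]
    print(f"rho = {rho:7.3f}   f_theta = {f_theta:10.2f}")
# f_theta grows as 1/rho and diverges at the conical intersection (rho -> 0).
```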

ELECTRONIC STRUCTURE METHODS FOR EXCITED STATES

Although nonadiabatic effects imply the breakdown of the Born–Oppenheimer approximation, they are still studied within the framework of that approximation, in which case the electronic Hamiltonian must be solved first. The electronic structure method used to provide the energies and gradients of the states involved is very important for an accurate description of conical intersections. Ab initio electronic structure methods have been used for many years. Treating closed-shell systems in their ground state is a problem that, in many cases, can now be solved routinely by chemists using standardized methods and computer packages such as GAUSSIAN.60 In an ab initio approach, the first step is to solve the Hartree–Fock problem using a suitable basis set. In the Hartree–Fock model, each electron experiences only the average potential created by the other electrons. In reality, however, the instantaneous position of each electron depends on the instantaneous positions of the other electrons; the Hartree–Fock model cannot account for this electron correlation. To obtain quantitative results, electron correlation (also referred to as dynamical correlation) should be included in the model, and many methods based on either variational or perturbation principles are available for accomplishing this task.61 The method that is easiest to understand conceptually is variational configuration interaction (CI).62 In this method, the electronic wavefunction is expanded in terms of configurations that are formed by excitations of electrons from the occupied orbitals of the Hartree–Fock wavefunction to the virtual orbitals. The expansion can be written as

$$\psi_I = \sum_{a=1}^{N^{\mathrm{CSF}}} c_a^I\,\psi_a \qquad [37]$$

The basis functions of the expansion, $\psi_a$, are configuration state functions (CSFs): linear combinations of Slater determinants that are eigenfunctions of the spin operator and have the correct spatial symmetry and total spin of the electronic state under investigation.62 The energies and wavefunctions are provided by solving the equation

$$\left[H^e(R) - E^e_I(R)\right]c^I(R) = 0 \qquad [38]$$


where $H^e(R)$ is the electronic Hamiltonian in the CSF basis. Usually, the expansion includes only single and double excitations and is then referred to as CISD. If excitations from all occupied to all virtual orbitals are included, the method is full CI (FCI) and gives the exact answer for the basis set used. In almost all cases, however, FCI yields an impractically large expansion and has to be truncated. A different approach to including electron correlation is perturbation theory. Perturbation models at various levels exist, denoted MPn, where n is the order at which the perturbation expansion is terminated. The most popular model, MP2, goes up to second order in perturbation theory and is used extensively for accurate geometry optimizations and reaction energies. A third approach to including dynamical electron correlation is the coupled cluster method.63 In this approach, the wavefunction is written as

$$\psi = e^{(T_1 + T_2)}\,\psi_0 \qquad [39]$$

where $T_1$ and $T_2$ are operators specifying single and double excitations, respectively, and $\psi_0$ is the Hartree–Fock wavefunction. If single and double excitations are included, the method is denoted CCSD; if an approximate treatment of triple excitations is added, the method is denoted CCSD(T). Coupled cluster methods are the most sophisticated methods for treating dynamical correlation when a single electronic configuration is a good first-order description of the chemical system. Apart from the above wavefunction-based methods, dynamical correlation can be included through density-based methods, i.e., the density functional approaches.64 Density functional theory (DFT) has gained unprecedented popularity in recent years because of its success in predicting ground state structures with small computational effort. When treating excited states, however, the situation becomes more complicated. The simplest method that can be used to study excited states is CI singles (CIS), which plays the same role for excited states that Hartree–Fock plays for the ground state. This method does not give quantitative results but can be used as a starting point. More sophisticated methods are discussed below.
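Returning to Eq. [38]: in matrix form, the CI problem is an ordinary symmetric eigenvalue problem in the CSF basis. The toy sketch below uses a random symmetric matrix as a stand-in for $H^e$ (real CSF matrix elements would be assembled from one- and two-electron integrals); it only illustrates that the eigenvalues are the state energies $E_I$ and the eigenvector columns are the CI coefficient vectors $c^I$ of Eq. [37]:

```python
import numpy as np

# Toy stand-in for the electronic Hamiltonian in a basis of n CSFs.
n = 200
rng = np.random.default_rng(1)
A = rng.normal(scale=0.05, size=(n, n))
He = (A + A.T) / 2.0 + np.diag(np.linspace(-2.0, 2.0, n))  # symmetric matrix

# Solving Eq. [38]: eigenvalues are the state energies E_I; the eigenvector
# columns are the CI coefficient vectors c^I of Eq. [37].
E, C = np.linalg.eigh(He)
print("three lowest state energies:", E[:3])
print("ground-state CI vector is normalized:", np.isclose(np.linalg.norm(C[:, 0]), 1.0))
```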

Multiconfiguration Self-Consistent Field (MCSCF)

Because studying excited states requires an equivalent treatment of all electronic states, single-reference methods are not well suited to describing excited states. The most straightforward way to treat excited states is to use multireference methods. Multireference methods are extensions of the single-reference Hartree–Fock or CI methods in which many configurations are used instead of a single configuration. Multiconfigurational methods are appropriate not only for excited states but also for ground states with multiconfigurational character, i.e., states for which one configuration is not a sufficient description. These problems require treatment of the nondynamical correlation, which arises from near degeneracies and the rearrangement of electrons within partly filled shells.62


One of the most frequently used methods for studying excited states is MCSCF. In the MCSCF method, the wavefunction is written as a linear combination of CSFs, and the molecular orbitals and the coefficients of the expansion are optimized simultaneously using the variational principle. The choice of which configurations to include is critical and depends on the chemical nature of the problem. This selection process is the most difficult step in setting up an MCSCF calculation, and the flexibility in the choice of the active space and its dependence on the investigator's intuition has been a criticism of these methods. As a result of this flexibility, MCSCF methods cannot be used as part of a "black box" computational procedure. The configurations are usually generated by excitations of electrons within an active space. A very useful approach is the complete active space MCSCF, designated CASSCF.65,66 In this case, the configurations are generated by a full CI within the active space, i.e., all possible excitations of the active electrons within the active orbitals are used. Nevertheless, one must still choose the orbitals to be included in the active space. For small systems, a full valence active space can be used, in which all the valence orbitals of all atoms are included; this is not possible for larger systems, and compromises between rigor and computing time have to be made. In organic aromatic molecules, for example, one can specify the π orbitals as the active space in order to study π → π* states. If lone pairs contribute to the excited states, they too should be part of the active space. Other types of excited states include Rydberg states; to study these efficiently, the active space should include the Rydberg orbitals. If both valence and Rydberg states are being considered, or if the states have mixed character, the active space should include orbitals of both types. Rydberg states require additional considerations for a proper treatment; for example, because of their diffuse character, the basis set has to include diffuse functions of the appropriate type.

Multireference Configuration Interaction (MRCI)

To include dynamical correlation in a quantum calculation, one must go beyond the MCSCF approach. A multireference configuration interaction (MRCI) model is a CI expansion in which many electronic configurations are used as references instead of a single Hartree–Fock reference. The final expansion is a linear combination of all the references and of the configurations generated by single and double excitations out of these references into the virtual orbitals. The first step in an MRCI calculation is to generate the orbitals, and an MCSCF calculation is usually used for this step. When studying excited states, an average-of-states MCSCF is needed to guarantee good orbitals for all the states under consideration. When one is interested simply in excitation energies, each state can be calculated separately by optimizing the orbitals for each state. When conical intersections are involved, however, it is important to use orbitals that are common to all intersecting states.


The second step in an MRCI calculation is to choose the references, which are typically determined using the same logic as in MCSCF. Technically, it is not necessary to choose the same references as in the MCSCF step, but doing so is the most common and safest procedure. The third MRCI step is to generate all of the configurations that will be used. These configurations are generated by single and double excitations into the virtual orbitals, i.e., those unoccupied in the MCSCF calculation. One approach is to include all single excitations out of the active space orbitals (first-order CI, FOCI), or all single and double excitations out of the active space orbitals (second-order CI, SOCI). With FOCI or SOCI, all occupied orbitals that are not in the active space are frozen, meaning that there are no excitations out of those occupied orbitals. In a more general procedure, some of the occupied orbitals are frozen and the rest are allowed to participate in excitations. If the molecule is small, there is no need to freeze any orbitals, but as the size of the system under study increases, one is forced to freeze some orbitals to maintain computational tractability. Typically, the core orbitals can be frozen without significant loss of accuracy, but freezing more orbitals introduces errors that can be substantial. For example, it has been observed in organic molecules with π orbitals that freezing the σ orbitals gives poor results; for such molecules σ–π correlation is important.67

The MRCI method is very accurate provided all the important configurations are included in the expansion. This requirement can be satisfied for small systems, but as the size of the system increases, the expansion becomes prohibitively large and truncations are necessary. If one is interested in excitation energies at a single geometry (most often vertical excitations), different expansions for the different states can be used to reduce the size of the calculation. Alternatively, the configurations can be truncated based on some selection criterion. If, however, one is interested in the excited states over a range of geometries, the relative importance of the configurations changes with geometry and truncation cannot be applied easily. An important development in MRCI involves "direct methods," in which the Hamiltonian matrix to be diagonalized is not stored on disk. Requiring the Hamiltonian matrix to be stored on disk limits the size of the expansions that can be used; today the diagonalization is done directly, enabling expansions of millions to billions of CSFs.68 Analytic gradients have also been developed for MRCI wavefunctions,69–72 which is a huge advantage in computing cost when these wavefunctions are used to study conical intersections. The COLUMBUS suite of programs,73 for example, has algorithms for studying conical intersections and derivative couplings74,75 that rely on analytic gradient techniques.69–71 An efficient, internally contracted MRCI method has been developed by Werner and Knowles76 and is implemented in the MOLPRO suite of programs.77 This program is widely used, but it has the disadvantage of not providing analytic gradients.
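The essence of a "direct" method is that only matrix-vector products σ = Hc are ever formed, so the Hamiltonian matrix itself is never built or stored. The following matrix-free sketch is purely illustrative: the σ routine is a trivial tridiagonal stand-in for a real direct-CI σ build, and SciPy's Lanczos-based eigsh stands in for the Davidson-type solvers actually used in CI codes.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

n = 50_000                          # CI dimension; H is never built explicitly
diag = 2.0 + 0.001 * np.arange(n)   # assumed diagonal elements of H

def sigma(c):
    """Return H @ c without storing H: diagonal plus a weak nearest-neighbor coupling."""
    s = diag * c
    s[1:] += 0.01 * c[:-1]
    s[:-1] += 0.01 * c[1:]
    return s

H = LinearOperator((n, n), matvec=sigma, dtype=np.float64)
E, C = eigsh(H, k=3, which='SA')    # three lowest eigenpairs, found iteratively
print("lowest eigenvalues:", E)
```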


Complete Active Space Second-Order Perturbation Theory (CASPT2)

A very efficient method for calculating excited states based on perturbation theory has been developed by Roos et al.78,79 This method, called CASPT2, has been implemented in the ab initio package MOLCAS.80 Other implementations of perturbation theory for excited states have also been developed81–83 and exist in other computational packages such as GAMESS.84 The CASPT2 method78,79 computes the dynamical correlation perturbatively, through second order, using a single CASSCF reference state and a nondiagonal zeroth-order Hamiltonian $H_0$. This method has been used to study many systems of various sizes, and it reproduces experimental excitation energies with high accuracy.85 It is the method of choice for systems with more than 10 to 15 atoms. The first step in a CASPT2 study is again to obtain the orbitals through an MCSCF method. The active space is then selected, and the second-order perturbation theory calculation is performed using the MCSCF reference wavefunction. The CASPT2 method cannot treat near degeneracies efficiently because, in these cases, the CASSCF wavefunction is an insufficient reference for the perturbation calculation. Multistate perturbative methods have been developed81,86 that avoid this problem and have been found to perform well at avoided crossings and when valence-Rydberg mixing occurs.86 Serrano et al.87 recently investigated the possibility of using CASPT2 and multistate CASPT2 for locating actual conical intersections, relying on numerical derivatives to do so.87 They concluded that these methods can lead to nonphysical results when small active spaces are used. Another major disadvantage of geometry optimizations using CASPT2 is that no analytical derivatives are yet available for the method. Nevertheless, CASPT2 can be used to obtain refined energies at selected points (stationary points or conical intersections) optimized at the CASSCF level. This approach is used extensively at present, and it has been very useful for nonadiabatic problems in systems of moderate size.

Single Reference Methods

A different approach to calculating excited states is based on indirect methods that allow one to compute excitation energies from a single-reference starting point. Starting from a coupled cluster representation of the ground state, the equation-of-motion coupled cluster method (EOM-CCSD)88,89 can be used to provide accurate excitation energies when the reference does not have multiconfigurational character. Variations of the method allow more demanding problems, such as bond breaking, to be treated.90 Alternatively, when the system under consideration cannot be treated at this high level of theory, time-dependent density functional theory (TDDFT)91 provides excitation energies at a cost similar to that of DFT. These methods can predict vertical excitation energies efficiently, but their extension to the description of excited-state properties and PESs is more complicated and is currently under development.

Choosing Electronic Structure Methods for Conical Intersections

In summary, the choice of electronic structure method for studying conical intersections should be guided by the following considerations: (1) the intersecting states must be treated equivalently; (2) analytical derivatives should be available, because any analysis of conical intersections involves evaluating the gradients of the surfaces; and (3) both dynamical and nondynamical correlation should be included. It is not always possible to satisfy these criteria, however, especially when studying large systems. Analytic gradient techniques exist for CASSCF and MRCI wavefunctions, and efficient codes have been developed for locating conical intersections with both types of wavefunctions.6,7 By using the MRCI method, one can, in principle, satisfy all of the above criteria, but the scaling of CPU time with system size limits the applicability of this method to small- or medium-size systems. An alternative procedure, which is very common in current publications on excited state optimizations, is to choose a lower level of theory for the geometry optimization and then use a highly correlated method to obtain accurate energetics. A combination that has been used extensively for medium-size systems is CASSCF for the optimizations followed by CASPT2 for the energies. Although this procedure can be used to locate conical intersections, one must be careful, because the location of degeneracies is more sensitive to the method used than is the location of minima. If the method chosen for optimization does not give even a correct qualitative description of the system, i.e., the correct ordering of states, this approach will lead to wrong conical intersections.92,93 Another problem develops when the dynamical electron correlation of the crossing states is substantially different. Under these conditions, the point of conical intersection found at the lower level becomes an avoided crossing at the higher level of theory, with energy differences that can exceed 0.5 eV.94 In most cases, the true point of conical intersection is not removed; instead, it may be relocated at the higher level of theory to a geometry similar to that found at the lower level.94 This is, however, an empirical observation, and it is not guaranteed to always be the case.

LOCATING CONICAL INTERSECTIONS

The development of analytic gradient techniques95 enables the efficient characterization of PESs by finding optimized structures for molecules, locating transition states, and establishing reaction pathways. Applications of analytical gradients can be carried out routinely and with great accuracy today for ground state surfaces, thus providing the means for understanding the structure of molecules and their mechanisms of reaction. Optimization of extrema on excited states, in contrast, is considerably less advanced because of the limited ab initio methodology available for studying excited states, as discussed in the previous section. To locate conical intersections efficiently, the nonadiabatic couplings are needed in addition to the gradients of the surfaces, although algorithms that do not require the coupling can be used. Analytic gradient techniques can be extended to the calculation of the nonadiabatic coupling.72,74,75 Methods for locating conical intersections have been developed based on Lagrange multiplier6,9,96–99 and projected gradient7,100 techniques. It was shown in the section on General Theory that conical intersections exist in hypersurfaces of dimension $N^{int}-2$, so there is an infinite number of conical intersections. These algorithms seek the minimum energy point on the seam, although the seam can also be mapped out along some coordinate by using geometrical constraints. Methods that implement Lagrange multiplier techniques for constrained minimizations use the Lagrange multipliers to incorporate the constraints for conical intersections, or geometrical constraints.6,9,96–99 In the simplest version of these algorithms, the energy of one of the states is minimized with the constraint that the energy difference between the two states is zero. In more advanced versions, the additional constraint that the coupling $H_{IJ}$ is zero is added. Starting from a point $\mathbf{R}$ not at the conical intersection, the requirements for obtaining a conical intersection are

$$\Delta E(\mathbf{R}) + \mathbf{g}(\mathbf{R})\cdot\delta\mathbf{R} = 0 \qquad [40]$$

$$\mathbf{h}(\mathbf{R})\cdot\delta\mathbf{R} = 0 \qquad [41]$$

where $\Delta E = E^e_I - E^e_J$. When both criteria are used, the following Lagrangian is formed and minimized:6,9

$$L(\mathbf{R},\lambda_1,\lambda_2) = E_1(\mathbf{R}) + \lambda_1\,\Delta E(\mathbf{R}) + \lambda_2\,H_{IJ}(\mathbf{R}) \qquad [42]$$

where $\lambda_1$ and $\lambda_2$ are Lagrange multipliers. Additional geometrical constraints can be imposed by adding them to the Lagrangian. By searching for extrema of the Lagrangian, a Newton–Raphson equation can be set up,

$$\begin{pmatrix} Q & \mathbf{g} & \mathbf{h} \\ \mathbf{g}^{\dagger} & 0 & 0 \\ \mathbf{h}^{\dagger} & 0 & 0 \end{pmatrix} \begin{pmatrix} \delta\mathbf{R} \\ \delta\lambda_1 \\ \delta\lambda_2 \end{pmatrix} = - \begin{pmatrix} \nabla L \\ \partial L/\partial\lambda_1 \\ \partial L/\partial\lambda_2 \end{pmatrix} \qquad [43]$$

which, when solved, provides the solution $\delta\mathbf{R}$. The elements of the Hessian block are given by $Q_{ij} = \partial^2 L/\partial R_i\,\partial R_j$, and the relations $\partial^2 L/\partial R_i\,\partial\lambda_1 = g_i$ and $\partial^2 L/\partial R_i\,\partial\lambda_2 = h_i$ have been used.
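The bookkeeping in Eqs. [40]–[43] can be made concrete on a toy problem. The sketch below is a minimal NumPy implementation on an invented two-coordinate, two-state diabatic model (all parameters are illustrative). To keep the toy smooth at the degeneracy, it minimizes the average of the two diabatic energies subject to the two degeneracy conditions, H11 - H22 = 0 and H12 = 0; at a solution both adiabatic energies coincide with that average, so this is a simplified stand-in for the adiabatic-energy objective used by the production algorithms.6,9

```python
# Lagrange-multiplier Newton-Raphson search for a model conical
# intersection (two coordinates, two diabatic states; invented numbers).
import numpy as np

def model(R):
    """Return (Ebar, dH, H12) for the model diabatic Hamiltonian."""
    x, y = R
    h11 = 0.5 * (x - 1.0) ** 2 + 0.4 * y ** 2
    h22 = 0.6 * (x + 1.0) ** 2 + 0.3 * y ** 2 + 0.1
    h12 = 0.2 * y + 0.05
    return 0.5 * (h11 + h22), h11 - h22, h12

def grad(f, R, eps=1e-5):
    """Central-difference gradient of scalar f at R."""
    g = np.zeros_like(R)
    for i in range(len(R)):
        d = np.zeros_like(R)
        d[i] = eps
        g[i] = (f(R + d) - f(R - d)) / (2 * eps)
    return g

def hess(f, R, eps=1e-4):
    """Finite-difference Hessian of scalar f at R."""
    n = len(R)
    H = np.zeros((n, n))
    for i in range(n):
        d = np.zeros(n)
        d[i] = eps
        H[:, i] = (grad(f, R + d) - grad(f, R - d)) / (2 * eps)
    return H

R, lam = np.array([0.3, 0.8]), np.zeros(2)
for _ in range(30):
    Ebar, dH, H12 = model(R)
    # Lagrangian of Eq. [42] with the current multipliers
    L = lambda r: model(r)[0] + lam[0] * model(r)[1] + lam[1] * model(r)[2]
    gL = grad(L, R)
    g = grad(lambda r: model(r)[1], R)  # gradient of the gap condition
    h = grad(lambda r: model(r)[2], R)  # gradient of the coupling condition
    n = len(R)
    K = np.zeros((n + 2, n + 2))
    K[:n, :n] = hess(L, R)              # Q block of Eq. [43]
    K[:n, n] = K[n, :n] = g
    K[:n, n + 1] = K[n + 1, :n] = h
    step = np.linalg.solve(K, -np.concatenate([gL, [dH, H12]]))
    R = R + step[:n]
    lam = lam + step[n:]
    if np.linalg.norm(step) < 1e-10:
        break

print("intersection at R =", R, " residuals (dH, H12):", model(R)[1:])
```

At convergence, both degeneracy conditions vanish and the point found is a minimum energy point on the (here zero-dimensional) intersection seam.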


This method has been implemented using analytic gradients from MRCI wavefunctions72,74,75,101 and has recently been added to the COLUMBUS suite of programs.73 Bearpark et al. developed a method that does not use Lagrange multipliers but uses projected gradient techniques instead.7 This approach minimizes the energy difference in the plane spanned by $\mathbf{g}$ and $\mathbf{h}$ and minimizes $E_2$ in the remaining $(N^{int}-2)$-dimensional space orthogonal to the $\mathbf{g}$–$\mathbf{h}$ plane.7 This method has been discussed in a previous chapter of this series.14 It uses MCSCF analytic gradients and has been implemented in the GAUSSIAN computer package.60 The derivative coupling can be calculated for CI or for MCSCF wavefunctions using analytic gradients from the expression72

$$\mathbf{f}^{IJ} = \frac{\mathbf{c}^{I}\left[\nabla H^{e}\right]\mathbf{c}^{J}}{\Delta E} + \sum_{a,b} c^{I}_{a}\,\langle \psi_a | \nabla \psi_b \rangle\, c^{J}_{b} \qquad [44]$$

where the $\psi_a$ are CSFs and $\mathbf{c}^I$ and $\mathbf{c}^J$ are the CI coefficient vectors for the adiabatic states $I$ and $J$, respectively (see Eq. [37]). The first term in Eq. [44] corresponds to the CI contribution and is caused by the change of the CI coefficients, whereas the second term corresponds to the CSF contribution.
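The first (CI) term of Eq. [44] is easy to illustrate numerically. The sketch below uses an invented two-state diabatic model along a single coordinate; because the diabatic basis here does not depend on the coordinate, the second (CSF) term of Eq. [44] vanishes identically and only the CI term survives.

```python
# Numerical illustration of the first (CI) term of Eq. [44] on an
# invented 2x2 diabatic model; the basis is coordinate-independent,
# so the CSF term is zero.
import numpy as np

def H(x):
    """2x2 diabatic model Hamiltonian (illustrative parameters)."""
    return np.array([[x, 0.05],
                     [0.05, -x + 0.1 * x ** 2]])

def coupling(x, eps=1e-6):
    E, C = np.linalg.eigh(H(x))
    dH = (H(x + eps) - H(x - eps)) / (2 * eps)  # nabla H^e (finite diff.)
    cI, cJ = C[:, 0], C[:, 1]                   # CI-like coefficient vectors
    return (cI @ dH @ cJ) / (E[1] - E[0])       # first term of Eq. [44]

for x in (0.5, 0.1, 0.02):
    print(f"x = {x:5.2f}   f_IJ = {coupling(x):9.3f}")
# The magnitude of the coupling grows as the energy gap closes.
```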

DYNAMICS

The electronic structure description of conical intersections provides static information about the PESs and the mechanisms for nonadiabatic processes, comparable with the way that transition states provide mechanisms for ground state problems. In many cases, this static picture is not sufficient, so the kinetic energy of the nuclei must be considered. Once the electronic structure problem has been solved, and the PESs and nonadiabatic couplings are available, the nuclear part of the problem can be solved, thereby providing information about the dynamical evolution of the system. For systems with up to four atoms, a dynamical solution can be obtained by solving the quantum mechanical Schrödinger equation.49 Quantum dynamics requires information about the global PES, which is computationally intractable for systems containing many degrees of freedom. For systems where all the degrees of freedom cannot be considered, a reduced-dimensionality model can be invoked, although some aspects of the dynamics may then be lost. Approximate methods, like the multiconfiguration time-dependent Hartree (MCTDH) method, can extend wavepacket propagation to larger systems.102 Systems with conical intersections and up to 24 degrees of freedom have been studied using MCTDH.102 The simplest alternative to quantum mechanics is to use classical mechanics, where many trajectories simulate the wavepacket evolution. Nonadiabatic processes describe transitions between different electronic states and, because these transitions cannot be treated by purely classical methods, semiclassical methods have to be used. Trajectory Surface Hopping (TSH)103–107 and Ehrenfest dynamics108 are very popular methods currently in use for studying dynamics this way, and they have been reviewed and compared by Hack and Truhlar.107 In the surface-hopping models, classical trajectories are propagated on a single PES. When the transition probability to another surface becomes sufficiently large (in practice, when a random number falls below the computed probability), the trajectory hops to the other surface. There are many variants of the hopping criterion, which give rise to many surface-hopping models.103–107 In Ehrenfest dynamics, the force is obtained from an average potential derived from the electronic structure; both the PESs and the nonadiabatic couplings determine the trajectories.108 Another approach, Full Multiple Spawning (FMS), has been developed by Martinez and coworkers.109–111 In their method, the total wavefunction is a sum of products of nuclear and electronic wavefunctions, and the PES is computed on the fly by ab initio methods guided by molecular dynamics. The nuclear wavefunction is expanded in terms of Gaussian basis functions with time-dependent coefficients $C(t)$, whose time evolution is determined from the time-dependent Schrödinger equation. Each Gaussian basis function has a position and a momentum determined by Hamilton's equations of motion. In the nonadiabatic region, new wavefunctions are "spawned" onto the other electronic states. The focus of this chapter is on the electronic structure description of conical intersections, so only a very brief summary of dynamical methods is given here; extensive discussions can be found in other resources.17
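The hop bookkeeping common to the surface-hopping models can be caricatured in a few lines. In the sketch below, the hopping probabilities and state energies are invented numbers; in a real TSH implementation the probability at each step would be derived from the electronic amplitudes and the nonadiabatic coupling along the trajectory (as in Tully's fewest-switches prescription104), and the momentum rescaling would be applied along the coupling vector rather than to a scalar kinetic energy.

```python
# Schematic surface-hopping step with invented probabilities/energies.
import numpy as np

rng = np.random.default_rng(0)

# (p_hop, (E_lower, E_upper)) along a mock trajectory (hartree)
steps = [(0.05, (0.0, 0.08)), (0.6, (0.0, 0.01)), (0.1, (0.0, 0.07))]

state, kinetic = 0, 0.05  # start on the lower surface
for p_hop, energies in steps:
    other = 1 - state
    de = energies[other] - energies[state]
    # Hop when the random number falls below the transition probability
    # and the kinetic energy can absorb the potential energy change.
    if rng.uniform() < p_hop and kinetic >= de:
        state, kinetic = other, kinetic - de
print("final state:", state, " kinetic energy:", round(kinetic, 4))
```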

APPLICATIONS

The list of systems in which conical intersections have been studied is lengthy, and one cannot account for all of them in a chapter like this. Here it is pointed out that most areas of chemistry are affected by conical intersections. Examples related to the author's own research will be described in greater detail later to illustrate how conical intersections can be used to understand mechanisms of photophysical or photochemical processes. The first accidental conical intersections based on ab initio methods were found for the triatomic systems O3,45 LiNaK,112 and CH4+,113 even before the availability of automatic search algorithms. Later, the availability of such algorithms allowed for the study of many small systems. Systems greatly affected by conical intersections are small radicals important in atmospheric and combustion chemistry, and these systems have been studied extensively.9,16,43,114 Experimental spectroscopic studies of conical intersections are possible for Jahn–Teller systems, and typical radicals like C5H5 and C6H6+ have been studied by Miller et al.16,43 A main advantage of small systems is that they are easy to analyze theoretically, and they have served as prototype systems to test and extend the theory. In the area of organic photochemistry, extensive work has been done to examine the role of conical intersections in reaction mechanisms, and several reviews have been written highlighting the importance of conical intersections in photochemical reactions.12–14,115,116 A tutorial in this book series14 discusses how to study mechanisms of photochemical reactions using conical intersections, and other books that focus on photochemistry now include this topic in their discussions.117,118 Conical intersections have been found in most photochemical reactions, such as bond breaking, bond making, charge transfer, photoisomerization, and intramolecular electron transfer in organic radical cations. In inorganic transition metal complexes, conical intersections usually appear in the Jahn–Teller form because the high symmetry of such complexes allows for this symmetry-required type of conical intersection. For example, studies of metal carbonyl complexes revealed that conical intersections facilitate the photodissociation of CO.119 It should be noted, however, that not enough work has been done yet in this area to reveal whether accidental conical intersections exist and what role, if any, they play in photodissociation. As a result of the larger spin-orbit coupling in transition metal systems, there is a higher probability for spin-forbidden transitions (intersystem crossing) than in nontransition metal systems. Matsunaga and Koseki have recently reviewed spin-forbidden reactions in this book series.120

Conical Intersections in Biologically Relevant Systems

One of the emerging areas in which conical intersections are important is biological systems. Nonadiabatic processes are common in photobiology, affecting essential processes in life such as photosynthesis, light harvesting, vision, and charge transfer, as well as the photochemical damage and repair of DNA. Conical intersections are expected to participate actively in these processes, and efforts are currently underway by several groups to study these effects.52,53,121–126 The size of these systems makes accurate quantum mechanical studies prohibitive, so, in many cases, the chromophore responsible for the photochemical behavior (which is usually a smaller molecule) is used as a model. Mixed quantum mechanical/molecular mechanical (QM/MM) methods can be used to incorporate the effect of the biological environment at the classical level. These methods often work well for biological systems where the nonadiabatic process is localized on the chromophore; however, they are of limited use when the effect is delocalized. For example, a problem such as charge transfer through DNA cannot be studied in this way, because the excited states of many chromophores participate. A few examples of these studies follow.


Vision involves the cis-trans photoisomerization of a chromophore,122 and many studies have been done using different models.52,53,121,123 For example, a CASSCF/AMBER procedure has been used to study the nonadiabatic dynamics of retinal in rhodopsin proteins.53 In another study, a simple model of a photosynthetic center was examined by Worth and Cederbaum.126 They proposed that the presence of conical intersections facilitates the long-range intermolecular photo-initiated electron transfer between the protein's porphyrin and a nearby quinone. Semiempirical and QM/MM methods have been developed by Martinez and coworkers124,127,128 to study the cis-trans isomerization dynamics of the Green Fluorescent Protein chromophore in solution, which occurs through conical intersections.124 The chromophore in this protein consists of two rings connected by a double bond and has been studied in vacuo as well.125

DNA/RNA Bases

The effect of UV radiation on DNA is of great importance because it can lead to photochemical damage. A detailed understanding of the properties and dynamics of the excited states of the DNA and RNA bases is most relevant because they are the dominant chromophores in nucleic acids. It has been known for years that the excited states of the nucleobases are short-lived and that the quantum yields for fluorescence are very low.129–131 Recent advances in experimental techniques have enabled the accurate measurement of their excited state lifetimes,132 which were found to be on the order of femtoseconds, suggesting that nonradiative relaxation to the ground state proceeds on an ultrafast time scale with the extra energy being transformed into heat.132 The photophysical behavior of nucleobases in the gas phase and in solution is currently under investigation by many theoretical and experimental groups, with many questions still needing to be addressed. The mechanism of nonradiative decay for cytosine, adenine, and uracil has been investigated with quantum mechanical methods.56,92–94,133–137 Excited states in the nucleobases originate from electron excitations from $\pi$ or lone-pair $n$ orbitals to $\pi^*$ orbitals. Detailed calculations addressing the involvement of conical intersections in the relaxation mechanism have been done for cytosine.92,93 The two lowest excited states are $\pi\pi^*$ and $n_O\pi^*$, with the $n_O\pi^*$ state being slightly lower in energy than the $\pi\pi^*$ state at the CASSCF level.92 At that level, conical intersections92 were located between the $\pi\pi^*$ and $n_O\pi^*$ states, followed by an $n_O\pi^*$–$S_0$ conical intersection. These conical intersections can lead the system to the ground state and, as such, provide an explanation for the ultrashort excited state lifetimes. At the conical intersection with the ground state, cytosine is very distorted, with pyramidalization of a carbon atom and extreme CO stretching. In another study, which used perturbative methods (CASPT2) to calculate the energies,93 it was found that the $\pi\pi^*$ state is lower in energy than the $n_O\pi^*$ state and that only one conical intersection, $\pi\pi^*$–$S_0$, exists in the pathway. Thus, this is a case in which including dynamic correlation changes the details of the relaxation mechanism. Two different relaxation mechanisms have been proposed for adenine.135–137 Adenine has more excited states close in energy than does cytosine, making the theoretical calculations more complicated. One mechanism that has been put forth is similar to the one described above for cytosine and involves ring deformations leading to conical intersections of excited $\pi\pi^*$ states with the ground state.136,137 A very different relaxation mechanism had been proposed earlier, involving conical intersections with a Rydberg $\pi\sigma^*$ state that is dissociative along an NH bond.135 It is possible that either one or both mechanisms can be effective, depending on experimental conditions. The role of conical intersections in the electronic relaxation mechanism of the excited states of uracil has been studied using MRCI ab initio methods.56 The lowest excited states are $S_1(n_O\pi^*)$, $S_2(\pi\pi^*)$, $S_3(n_O\pi^*)$, and $S_4(\pi\pi^*)$, with $S_2$ having the strongest oscillator strength. Absorption of ultraviolet (UV) radiation populates this state, and an efficient relaxation mechanism involves nonadiabatic transitions to the ground state. The vertical excitation energies of the first two excited states are given in Table 1 (column $R_e(S_0)$). The energies marked with an asterisk in each column correspond to the state (or states) that was minimized. MRCI1 is an MRCI expansion involving only single excitations from the reference space; MRCIsp is an MRCI expansion that includes $\sigma$-$\pi$ correlation, as described in the original publication.56 MRCI1 results are shown and, in parentheses, the single-point energies obtained using MRCIsp are reported. Conical intersections have been located that connect $S_2$ with $S_1$ and $S_1$ with the ground state. The energies at the conical intersections are given in Table 1, where the geometry of the conical intersection between states $S_I$ and $S_J$ is designated $R_x(ciIJ)$. The conical intersections between $S_2$ and $S_1$ are easily accessible from the Franck–Condon region, lying ca. 0.88 eV below the vertical excitation energy to $S_2$ at the MRCIsp level. The geometry changes involve mainly bond stretching or contraction. The seam of conical intersections between $S_2$ and $S_1$ contains points with both planar and nonplanar geometries. The geometry of the minimum energy point is given in Figure 5b. In this work, the effect of moving along different directions after emerging through a conical intersection (discussed in the section entitled Characterizing Conical Intersections: Topography) was explored.

Table 1 Energies in eV for the Three Lowest States of Uracil at Optimized Geometries R Obtained at the MRCI1 (MRCIsp) Level (reproduced with permission from Ref. 56); asterisks mark the minimized state(s) at each geometry

        Re(S0)         Re(S1)          Rx(ci21)        Rx(ci10)
S0      0*             1.18            2.15 (1.87)     4.47 (3.96)*
S1      5.44 (4.80)    4.35 (4.12)*    5.37 (4.83)*    4.47 (4.29)*
S2      6.24 (5.79)    5.86            5.37 (4.97)*    7.62


Figure 5 (a) Pathway from a displacement along the g direction of the conical intersection S1–S0. Following the gradient of the S0 surface leads to the S0 minimum. The energies of the S0, S1, and S2 states relative to the minimum of S0 are plotted as a function of a dihedral angle. Reproduced with permission from Ref. 56. (b) Geometries of uracil at the minima of S0 and S1 and at the conical intersections S2–S1 and S1–S0.

The vectors defining the branching plane of the S1–S2 conical intersection are shown in Figure 3b. A gradient-minimized pathway starting along one of those vectors leads to the minimum of the S1 surface. Another pathway, however, leads to a conical intersection between S1 and S0, which is located ca. 4.12 eV above the minimum of the ground state at the MRCIsp level of theory. The geometry is highly distorted, with carbon pyramidalization, as shown in Figure 5b. Figure 5 shows how the S0 minimum and the S0–S1 conical intersection can be connected along nonplanar distortions. Moving from a single base to a base pair of adjacent bases in the DNA strand becomes computationally demanding because of the increased size of the supersystem. Notwithstanding, some groups have started moving their research efforts in this direction.138,139 An ab initio study of guanine-cytosine suggests that, after photoexcitation, a hydrogen-atom transfer reaction occurs involving amino groups as proton donors and ring nitrogen atoms as proton acceptors.138 A conical intersection then facilitates internal conversion to the ground state. Recently, a combined theoretical/experimental study of a model Watson–Crick base pair was published.139 That model, a cluster of 2-aminopyridine molecules, displayed fast decay dynamics only when a near-planar hydrogen-bonded structure was present. The fast relaxation in that system is facilitated first by a conical intersection of a locally excited $\pi\pi^*$ state with a charge-transfer state of biradical character, and then by a conical intersection of the charge-transfer state with the ground state.139


BEYOND THE DOUBLE CONE

Three-State Conical Intersections

The discussion so far has focused on two-state conical intersections, which are the most common conical intersections and which have been studied extensively. Three-state degeneracies imposed by symmetry have been studied in the context of the Jahn–Teller problem for many decades,39–41 but only minor attention had been given to accidental three-state degeneracies in molecules until recently.113 Because most molecular systems in nature have low or no symmetry, these accidental intersections may have a great impact on the photophysics and photochemistry of those molecular systems, as has been found for accidental two-state intersections.11,14,17,117 Three-state degeneracies may provide a more efficient relaxation pathway when more than one interstate transition is needed. Moreover, they introduce more complicated geometric phase effects,140–142 and they can affect the system's dynamics and the pathways available for radiationless transitions.143 Extending the noncrossing rule3 to three degenerate states can best be understood by inspection of a $3\times 3$ electronic Hamiltonian matrix instead of the $2\times 2$ matrix described earlier:

$$H^e = \begin{pmatrix} H_{11} & H_{12} & H_{13} \\ H_{12} & H_{22} & H_{23} \\ H_{13} & H_{23} & H_{33} \end{pmatrix} \qquad [45]$$

To obtain degeneracy between all three states, the following five requirements must be satisfied: (1) all off-diagonal matrix elements must be zero, i.e., $H_{12} = H_{13} = H_{23} = 0$; and (2) the diagonal matrix elements must be equal, i.e., $H_{11} = H_{22} = H_{33}$. In general, for an $N\times N$ matrix, $N$-fold degeneracy is obtained by satisfying $N-1$ diagonal conditions and $N(N-1)/2$ off-diagonal conditions, so the total number of conditions is $(N-1) + N(N-1)/2 = (N-1)(N+2)/2$. For molecules lacking any spatial symmetry and containing four or more atoms, conical intersections of three states are possible. The branching space28 for these conical intersections, the space in which the double cone topography is evinced, is five dimensional140 and connects each electronic state with two other states. The first study of accidental three-state conical intersections was done for the CH4+ cation by Katriel and Davidson.113 In a tetrahedral geometry, the ground state of CH4+ is a T2 state and is therefore triply degenerate, as required by symmetry. Only one degree of freedom exists that preserves Td symmetry, and the dimensionality of the seam is one because all of the requirements for degeneracy are satisfied by symmetry. The authors found additional three-fold degeneracies in this system even when the tetrahedral symmetry was broken. If no symmetry is present, the cation has 9 degrees of freedom and the dimensionality of the seam becomes 9 - 5 = 4.
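The condition counting above is simple enough to encode directly; the following lines reproduce the numbers quoted in this section.

```python
# Degeneracy-condition bookkeeping: an N-fold degeneracy needs (N-1)
# diagonal plus N(N-1)/2 off-diagonal conditions, and the seam has
# dimension N_int minus the number of conditions.
def conditions(n_states):
    return (n_states - 1) + n_states * (n_states - 1) // 2

def seam_dimension(n_int, n_states):
    return n_int - conditions(n_states)

print(conditions(2), conditions(3))   # 2 and 5 conditions
# CH4+ without symmetry: 9 internal coordinates, three-state seam:
print(seam_dimension(9, 3))           # 9 - 5 = 4
```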


Efficient algorithms have recently facilitated the location of three-state conical intersections and have identified the existence of such intersections in many systems.144–146 Three-state conical intersections, like the two-state intersections described above, can affect excited state dynamics and also ground state vibrational spectra if the ground state is involved. Three-state accidental conical intersections were first found between Rydberg excited states in the ethyl and allyl radicals.144,145 They were also found in five-membered ring heterocyclic radicals, such as pyrazolyl, where the ground state and two excited states cross at an energy only ca. 3000 cm-1 above the ground state minimum.146 In this case, the ground state is one of the degenerate states; as a result, complicated vibronic spectra were expected and were observed experimentally.147 More recently, three-state conical intersections have been found in closed-shell systems as well.94,143,148 The involvement of three-state conical intersections in the photophysics and radiationless decay processes of nucleobases has been investigated using MRCI methods.94 Three-state conical intersections have been located for the pyrimidine base uracil and for the purine base adenine. Figure 6 shows the energies of the three-state conical intersections compared with the vertical excitations in these molecules. In uracil, a three-state degeneracy between the S0, S1, and S2 states has been located 6.2 eV above the ground state minimum energy. This energy is 0.4 eV higher than the vertical excitation to S2 and at least 1.3 eV higher than the two-state conical intersections found previously.


Figure 6 Energy levels at the two- and three-state conical intersection points using MRCI for (a) the S0, S1, S2 states of uracil and (b) the S0, S1, S2, S3, S4 states of adenine. Rx(ciIJ) and Rx(ciIJK) denote conical intersections between states I,J or I,J,K, respectively. Reproduced with permission from Ref. 94.


In adenine, two different three-state degeneracies between the S1, S2, and S3 states have been located at energies close to the vertical excitation energies. The energetics of these three-state conical intersections suggest that they can play a role in a radiationless decay pathway in adenine. In summary, these results show that three-state conical intersections are common and that they can complicate the PESs of molecules. The most relevant question then becomes whether they are accessible during a photoinitiated event. In three-dimensional subspaces of the five-dimensional branching space, the three-state degeneracy can be lifted partially so that two of the three states remain degenerate.140,141,145 Such two-state conical intersection seams originating from a three-state conical intersection have been studied in the allyl radical.145 In adenine, different seams of two-state conical intersections originate from each of the three-state conical intersections, leading to a great number of two-state conical intersections at energies lower than the three-state seams. Two-state conical intersections have been well established in many types of molecular systems. In contrast, the study of three-state conical intersections is still in its infancy, and detailed studies are needed to understand the influence of these intersections on the dynamical behavior of molecules.

Spin-Orbit Coupling and Conical Intersections

So far, we have considered only the nonrelativistic electronic Hamiltonian when determining electronic PESs and conical intersections. When spin-orbit coupling is included, the total electronic Hamiltonian becomes $H^{e\prime} = H^{e} + H^{SO}$, where $H^{e}$ is the nonrelativistic Hamiltonian and $H^{SO}$ is the spin-orbit coupling operator. Depending on the magnitude of the spin-orbit coupling, different methods exist for its calculation and incorporation into the electronic structure solution. Several reviews have been published on the methodology for treating spin-orbit coupling, including one that has appeared in this pedagogically driven review series.149–151 For light elements, in which the spin-orbit coupling is small, perturbation theory can be used with $H^{SO}$ treated as a perturbation. For heavier elements, however, the spin-orbit coupling becomes too large to be treated as a perturbation, and it has to be included directly in the electronic Hamiltonian before diagonalization. For heavy elements, scalar relativistic effects also become important, and they too must be included in the Hamiltonian. All-electron methods and relativistic effective core potentials have been developed and used for these cases.149,150,152,153 When spin-orbit coupling is included in the Hamiltonian, new, qualitatively different effects appear in the radiationless behavior of the system. Two effects are particularly important. First, the spin-orbit interaction can couple states of different spin multiplicity whose intersection otherwise would not be


conical. In this case, intersystem crossing and spin-forbidden processes are observed. Spin-forbidden processes are not discussed here, but they have been described extensively in a previous chapter in this book series.120 The second effect involves systems with an odd number of electrons, for which inclusion of the spin-orbit coupling qualitatively changes the characteristics of the conical intersections. The implications for the noncrossing rule were discussed by Mead in a seminal work in 1979,154 whereas the effect of the spin-orbit coupling on the geometric phase effect was discussed by Stone.155 The origin of this change lies in the behavior of the wavefunction under time reversal symmetry. The time reversal operator is an antiunitary operator that commutes with the Hamiltonian but inverts the spin. For odd-electron systems, a wavefunction $\phi$ and its time reversal $T\phi$ are orthogonal and degenerate. If $\phi$ is an eigenfunction of the Hamiltonian, $T\phi$ is a degenerate eigenfunction, so all the eigenvalues are (at least) doubly degenerate. This degeneracy, present in odd-electron systems, is referred to as Kramers degeneracy.156 Therefore, a two-state conical intersection requires four eigenfunctions of the electronic Hamiltonian to become degenerate. Furthermore, the Hamiltonian matrix elements are in general complex because matrix elements of the spin-orbit coupling operator can be complex. Combining these ideas, the two-state Hamiltonian model used earlier to rationalize the noncrossing rule now becomes the four-by-four Hamiltonian matrix given in Eq. [46]:

$$H^{e\prime} = \begin{pmatrix} H_{11} & H_{12} & 0 & H_{1T2} \\ H_{12}^{*} & H_{22} & -H_{1T2} & 0 \\ 0 & -H_{1T2}^{*} & H_{11} & H_{12}^{*} \\ H_{1T2}^{*} & 0 & H_{12} & H_{22} \end{pmatrix} \qquad [46]$$

Mathematical properties of the time reversal operator relate the matrix elements appearing in the Hamiltonian. More specifically, only two unique off-diagonal matrix elements and two unique diagonal matrix elements exist.154 The conditions that must be satisfied for degeneracy are as follows:

$$H_{11} = H_{22} \qquad [47]$$

$$\mathrm{Re}(H_{12}) = \mathrm{Im}(H_{12}) = 0 \qquad [48]$$

$$\mathrm{Re}(H_{1T2}) = \mathrm{Im}(H_{1T2}) = 0 \qquad [49]$$

The off-diagonal matrix elements are complex; Re() and Im() refer to their real and imaginary parts, respectively. If Cs or higher symmetry is present, it can be shown that the number of conditions needed for degeneracy is reduced from five to three.154 Although the theoretical basis for studying conical intersections that include spin-orbit coupling was introduced by Mead in 1979,154 algorithms for the computational study of these conical intersections were derived and implemented only much later.157–159
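The Kramers-degeneracy statement above can be checked numerically. The sketch below builds a 4x4 Hermitian matrix with the time-reversal structure of Eq. [46] (the numerical values, and the exact placement of the minus signs, are illustrative) and verifies that its eigenvalues come out in degenerate pairs.

```python
# Numerical check of Kramers degeneracy for the time-reversal-adapted
# 4x4 Hamiltonian of Eq. [46]: two real diagonal elements (H11, H22)
# and two complex off-diagonal elements (H12, H1T2). All values are
# arbitrary illustrative numbers.
import numpy as np

h11, h22 = 0.3, -0.1
h12, h1t2 = 0.2 + 0.1j, -0.05 + 0.25j

H = np.array([
    [h11,            h12,             0.0,            h1t2],
    [np.conj(h12),   h22,            -h1t2,           0.0],
    [0.0,           -np.conj(h1t2),   h11,            np.conj(h12)],
    [np.conj(h1t2),  0.0,             h12,            h22],
])

evals = np.linalg.eigvalsh(H)   # real eigenvalues, ascending order
print(evals)                    # eigenvalues appear in degenerate pairs
assert np.allclose(evals[0], evals[1]) and np.allclose(evals[2], evals[3])
```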


Using the conditions in Eqs. [47]–[49] and perturbation theory near a conical intersection, algorithms based on the Lagrange multiplier method6 were developed.157–159 These techniques can locate conical intersections when the spin-orbit coupling is included in the Hamiltonian perturbatively.151 A system in which this effect has been studied is the reaction of molecular hydrogen with the electronically excited hydroxyl radical:

$$\mathrm{H_2} + \mathrm{OH}(A\,^2\Sigma^+) \rightarrow \mathrm{H_2} + \mathrm{OH}(X\,^2\Pi) \qquad [50]$$

$$\mathrm{H_2} + \mathrm{OH}(A\,^2\Sigma^+) \rightarrow \mathrm{H_2O} + \mathrm{H}(^2S) \qquad [51]$$

In this reaction, either the OH radical is quenched back to its ground state or a reaction occurs to form water. The nonadiabatic mechanism for these processes is facilitated by a conical intersection between the $\Sigma$ and $\Pi$ states in linear symmetry. When the system has Cs symmetry, the states involved are the two A′ states. The nonrelativistic seam has been studied earlier.160–162 When the system has Cs symmetry, there are five degrees of freedom; therefore, the nonrelativistic seam has dimension 5 - 2 = 3, whereas inclusion of spin-orbit coupling reduces the dimension of the seam to 5 - 3 = 2. In linear symmetry, the two crossing states are $\Sigma$ and $\Pi$. When the spin-orbit coupling is included, the $\Pi$ state splits into two components, a $\Pi_{1/2}$ level and a $\Pi_{3/2}$ level, as shown in Figure 7. A conical intersection can occur either between $\Sigma$ and $\Pi_{1/2}$ or between $\Sigma$ and $\Pi_{3/2}$. Figure 8 shows the energy of the reactants, the products, and the minimum energy point on the seam before and after inclusion of spin-orbit coupling. Whereas the minimum energy point on the nonrelativistic seam is ca. 20,000 cm-1 below the energy of the reactants, the energy of the minimum energy point on the seam after incorporating the spin-orbit coupling is almost the same as that of the reactants, i.e., the seam is ca. 20,000 cm-1 higher than when spin-orbit coupling is neglected. Thus, even for a system void of heavy atoms, like H2 + OH, the qualitative difference is obvious.


Figure 7 Energy level diagram of the intersecting states in H2 + OH: (a) at the nonrelativistic conical intersection point without spin-orbit coupling; (b) at the nonrelativistic conical intersection point with spin-orbit coupling; (c) at the new relativistic conical intersection.



Figure 8 Energy of the minimum energy point on the seam for the Σ–Π conical intersection of H2 + OH with, and without, spin-orbit coupling. The numbers in the diagram adjacent to the molecules are computed bond lengths in Å.

CONCLUSIONS AND FUTURE DIRECTIONS

The study of nonadiabatic processes, and of conical intersections in particular, has gained popularity in recent years. Efficient computational strategies for locating conical intersections, along with modern experimental techniques that probe ultrafast nonadiabatic processes, have contributed to this popularity. Although initial steps in nonadiabatic theory and conical intersections focused on theoretical analyses and involved the study of small prototype systems, significant progress has been made since then, and we are now beginning to address important questions in areas like photobiology and condensed-phase systems.163 A current focus for some groups is to develop methods that incorporate solvent into the study of conical intersections, which is being done by using continuum model techniques164,165 and QM/MM models.128 The list of problems that await study is long, and these problems are so important that many researchers will certainly devote their efforts to improving the methods needed to fully understand the role of conical intersections in chemistry, biology, and materials science. Progress in these areas demands ongoing, seminal developments in electronic structure theory, so that accurate excited state energies and gradients can be obtained for larger systems, along with efficient methods for nuclear dynamics.


ACKNOWLEDGMENTS

The author thanks the National Science Foundation under Grant No. CHE-0449853 and Temple University for financial support. David Yarkony is thanked for introducing the author to the field of conical intersections; several results presented here were obtained in collaboration with him.

REFERENCES

1. M. Born and R. Oppenheimer, Ann. Phys., 84, 457 (1927). Zur Quantentheorie der Molekeln.
2. M. Dantus and A. Zewail, Chem. Rev., 104, 1717 (2004). Introduction: Femtochemistry.
3. J. von Neumann and E. P. Wigner, Physik. Z., 30, 467 (1929). On the Behaviour of Eigenvalues in Adiabatic Processes.
4. E. Teller, J. Phys. Chem., 41, 109 (1937). The Crossing of Potential Surfaces.
5. J. Michl, Top. Curr. Chem., 46, 1 (1974). Physical Basis of Qualitative MO Arguments in Organic Photochemistry.
6. M. R. Manaa and D. R. Yarkony, J. Chem. Phys., 99, 5251 (1993). On the Intersection of Two Potential Energy Surfaces of the Same Symmetry. Systematic Characterization Using a Lagrange Multiplier Constrained Procedure.
7. M. J. Bearpark, M. A. Robb, and H. B. Schlegel, Chem. Phys. Lett., 223, 269 (1994). A Direct Method for the Location of the Lowest Energy Point on a Potential Surface Crossing.
8. D. R. Yarkony, Acc. Chem. Res., 31, 511 (1998). Conical Intersections: Diabolical and Often Misunderstood.
9. D. R. Yarkony, Rev. Mod. Phys., 68, 985 (1996). Diabolical Conical Intersections.
10. D. R. Yarkony, J. Phys. Chem., 100, 18612 (1996). Current Issues in Nonadiabatic Chemistry.
11. D. R. Yarkony, J. Phys. Chem. A, 105, 6277 (2001). Conical Intersections: The New Conventional Wisdom.
12. F. Bernardi, M. Olivucci, and M. A. Robb, Acc. Chem. Res., 23, 405 (1990). Predicting Forbidden and Allowed Cycloaddition Reactions: Potential Surface Topology and its Rationalization.
13. F. Bernardi, M. Olivucci, and M. A. Robb, Chem. Soc. Rev., 25, 321 (1996). Potential Energy Surface Crossings in Organic Photochemistry.
14. M. A. Robb, M. Garavelli, M. Olivucci, and F. Bernardi, in Reviews in Computational Chemistry, Vol. 15, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 2000, pp. 87–146. A Computational Strategy for Organic Photochemistry.
15. C. A. Mead and D. G. Truhlar, Phys. Rev. A, 68, 032501 (2003). Relative Likelihood of Encountering Conical Intersections and Avoided Intersections on the Potential Energy Surfaces of Polyatomic Molecules.
16. T. A. Barckholtz and T. A. Miller, Int. Rev. Phys. Chem., 17, 435 (1998). Quantitative Insights about Molecules Exhibiting Jahn-Teller and Related Effects.
17. W. Domcke, D. R. Yarkony, and H. Köppel, Conical Intersections, World Scientific, Singapore, 2004.
18. A. W. Jasper, C. Zhu, S. Nangia, and D. G. Truhlar, Faraday Discuss., 127, 1 (2004). Introductory Lecture: Nonadiabatic Effects in Chemical Dynamics.
19. J. A. Pople, in Nobel Lectures, Chemistry 1996-2000, I. Grenthe, Ed., World Scientific, Singapore, 2003.


20. M. Born and K. Huang, Dynamical Theory of Crystal Lattices, Oxford University Press, Oxford, UK, 1954.
21. L. S. Cederbaum, in Conical Intersections, W. Domcke, D. R. Yarkony, and H. Köppel, Eds., World Scientific, Singapore, 2004, pp. 3–40. Born-Oppenheimer Approximation and Beyond.
22. W. Lichten, Phys. Rev., 131, 229 (1963). Resonant Charge Exchange in Atomic Collisions.
23. T. F. O'Malley, in Advances in Atomic and Molecular Physics, Vol. 7, D. Bates and I. Esterman, Eds., Academic Press, New York, 1971, pp. 223–249. Diabatic States of Molecules - Quasistationary Electronic States.
24. F. T. Smith, Phys. Rev., 179, 111 (1969). Diabatic and Adiabatic Representations for Atomic Collision Problems.
25. T. Pacher, L. S. Cederbaum, and H. Köppel, Adv. Chem. Phys., 84, 293 (1993). Adiabatic and Quasidiabatic States in a Gauge Theoretical Framework.
26. H. Köppel, in Conical Intersections, W. Domcke, D. R. Yarkony, and H. Köppel, Eds., World Scientific, Singapore, 2004, pp. 175–204. Diabatic Representation: Methods for the Construction of Diabatic Electronic States.
27. D. G. Truhlar and C. A. Mead, J. Chem. Phys., 77, 6090 (1982). Conditions for the Definition of a Strictly Diabatic Electronic Basis for Molecular Systems.
28. G. J. Atchity, S. S. Xantheas, and K. Ruedenberg, J. Chem. Phys., 95, 1862 (1991). Potential Energy Surfaces Near Intersections.
29. H. C. Longuet-Higgins, U. Öpik, M. H. L. Pryce, and R. A. Sack, Proc. R. Soc. London Ser. A, 244, 1 (1958). Studies of the Jahn-Teller Effect. II. The Dynamical Problem.
30. G. Herzberg and H. C. Longuet-Higgins, Discuss. Faraday Soc., 35, 77 (1963). Intersection of Potential Energy Surfaces in Polyatomic Molecules.
31. C. A. Mead and D. G. Truhlar, J. Chem. Phys., 70, 2284 (1979). On the Determination of Born-Oppenheimer Nuclear Motion Wave Functions Including Complications Due to Conical Intersection and Identical Nuclei.
32. M. V. Berry, Proc. R. Soc. London Ser. A, 392, 45 (1984). Quantal Phase Factors Accompanying Adiabatic Changes.
33. G. J. Atchity and K. Ruedenberg, J. Chem. Phys., 110, 4208 (1999). A Local Understanding of the Quantum Chemical Geometric Phase Theorem in Terms of Diabatic States.
34. C. A. Mead, J. Chem. Phys., 72, 3839 (1980). Superposition of Reactive and Nonreactive Scattering Amplitudes in the Presence of a Conical Intersection.
35. A. Kuppermann, in Dynamics of Molecules and Chemical Reactions, R. E. Wyatt and J. Z. Zhang, Eds., Marcel Dekker, New York, 1996, pp. 411–472. The Geometric Phase in Reaction Dynamics.
36. B. K. Kendrick, J. Phys. Chem. A, 107, 6739 (2003). Geometric Phase Effects in Chemical Reaction Dynamics and Molecular Spectra.
37. H. A. Jahn and E. Teller, Proc. R. Soc. London Ser. A, 161, 220 (1937). Stability of Polyatomic Molecules in Degenerate Electronic States. I. Orbital Degeneracy.
38. H. A. Jahn, Proc. R. Soc. London Ser. A, 164, 117 (1938). Stability of Polyatomic Molecules in Degenerate Electronic States. II. Spin Degeneracy.
39. I. B. Bersuker, The Jahn–Teller Effect and Vibronic Interactions in Modern Chemistry, Plenum Press, New York, 1984.
40. R. Englman, The Jahn–Teller Effect in Molecules and Crystals, Wiley-Interscience, New York, 1972.
41. I. B. Bersuker and V. Z. Polinger, Vibronic Interactions in Molecules and Crystals, Vol. 49, Springer-Verlag, Berlin, 1989.
42. I. B. Bersuker, Chem. Rev., 101, 1067 (2001). Modern Aspects of the Jahn–Teller Effect. Theory and Applications to Molecular Problems.


43. B. E. Applegate, T. A. Barckholtz, and T. A. Miller, Chem. Soc. Rev., 32, 38 (2003). Exploration of Conical Intersections and Their Ramifications for Chemistry Through the Jahn–Teller Effect.
44. V.-A. Glezakou, M. S. Gordon, and D. R. Yarkony, J. Chem. Phys., 108, 5657 (1998). Systematic Location of Intersecting Seams of Conical Intersection in Triatomic Molecules: The 1 2A′ - 2 2A′ Conical Intersections in BH2.
45. S. S. Xantheas, G. J. Atchity, S. T. Elbert, and K. Ruedenberg, J. Chem. Phys., 93, 7519 (1990). Potential Energy Surfaces Near Intersections.
46. D. R. Yarkony, in Conical Intersections, W. Domcke, D. R. Yarkony, and H. Köppel, Eds., World Scientific, Singapore, 2004, pp. 41–128. Conical Intersections: Their Description and Consequences.
47. C. A. Mead, J. Chem. Phys., 78, 807 (1983). Electronic Hamiltonian, Wavefunctions and Energies and Derivative Coupling between Born-Oppenheimer States in the Vicinity of a Conical Intersection.
48. D. R. Yarkony, J. Phys. Chem. A, 101, 4263 (1997). Energies and Derivative Couplings in the Vicinity of a Conical Intersection Using Degenerate Perturbation Theory and Analytic Gradient Techniques.
49. H. Köppel, W. Domcke, and L. S. Cederbaum, Adv. Chem. Phys., 57, 59 (1984). Multimode Molecular Dynamics Beyond the Born-Oppenheimer Approximation.
50. W. Domcke and G. Stock, Adv. Chem. Phys., 100, 1–170 (1997). Theory of Ultrafast Nonadiabatic Excited-State Processes and their Spectroscopic Detection in Real Time.
51. D. R. Yarkony, J. Chem. Phys., 114, 2601 (2001). Nuclear Dynamics Near Conical Intersections in the Adiabatic Representation. I. The Effects of Local Topography on Interstate Transition.
52. M. Ben-Nun, F. Molnar, K. Schulten, and T. J. Martinez, Proc. Natl. Acad. Sci. USA, 97, 9379 (2000). The Role of Intersection Topography in Bond Selectivity of Cis-trans Photoisomerization.
53. A. Migani, A. Sinicropi, N. Ferré, A. Cembran, M. Garavelli, and M. Olivucci, Faraday Discuss., 127, 179 (2004). Structure of the Intersection Space Associated with Z/E Photoisomerization of Retinal in Rhodopsin Proteins.
54. A. W. Jasper and D. G. Truhlar, J. Chem. Phys., 122, 044101 (2005). Conical Intersections and Semiclassical Trajectories: Comparison to Accurate Quantum Dynamics and Analyses of the Trajectories.
55. D. R. Yarkony, J. Chem. Phys., 112, 2111 (2000). On the Adiabatic to Diabatic States Transformation Near Intersections of Conical Intersections.
56. S. Matsika, J. Phys. Chem. A, 108, 7584 (2004). Radiationless Decay of Excited States of Uracil Through Conical Intersections.
57. S. Matsika and D. R. Yarkony, J. Chem. Phys., 117, 3733 (2002). Conical Intersections and the Nonadiabatic Reactions H2O + O(3P) ↔ OH(A2Σ+) + OH(X2Π).
58. F. Bernardi, M. Olivucci, I. N. Ragazos, and M. A. Robb, J. Am. Chem. Soc., 114, 8211 (1992). A New Mechanistic Scenario for the Photochemical Transformation of Ergosterol: An MCSCF and MM-VB Study.
59. K. Ruedenberg and G. J. Atchity, J. Chem. Phys., 110, 3799 (1993). A Quantum Mechanical Determination of Diabatic States.
60. M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, J. A. Montgomery, Jr., T. Vreven, K. N. Kudin, J. C. Burant, J. M. Millam, S. S. Iyengar, J. Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G. A. Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, M. Klene, X. Li, J. E. Knox, H. P. Hratchian, J. B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski, P. Y. Ayala, K. Morokuma, G. A. Voth, P. Salvador, J. J. Dannenberg, V. G. Zakrzewski, S. Dapprich, A. D. Daniels, M. C. Strain, O. Farkas, D. K. Malick, A. D. Rabuck, K. Raghavachari, J. B. Foresman, J. V. Ortiz, Q. Cui, A. G. Baboul, S. Clifford, J. Cioslowski, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. L. Martin, D. J. Fox, T. Keith, M. A. Al Laham, C. Y. Peng, A. Nanayakkara, M. Challacombe, P. M. W. Gill, B. G. Johnson, W. Chen, M. W. Wong, C. Gonzalez, and J. A. Pople, Gaussian 03, Revision C.02, 2004.
61. R. J. Bartlett and J. F. Stanton, in Reviews in Computational Chemistry, Vol. 5, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1994, pp. 65–169. Applications of Post-Hartree-Fock Methods: A Tutorial.
62. I. Shavitt, in Methods of Electronic Structure Theory, H. F. Schaefer III, Ed., Plenum Press, New York, 1977, Vol. 4 of Modern Theoretical Chemistry, pp. 189–275. The Method of Configuration Interaction.
63. T. D. Crawford and H. F. Schaefer III, in Reviews in Computational Chemistry, Vol. 14, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1999, pp. 33–136. An Introduction to Coupled Cluster Theory for Computational Chemists.
64. F. M. Bickelhaupt and E. J. Baerends, in Reviews in Computational Chemistry, Vol. 15, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 2000, pp. 1–86. Kohn-Sham Density Functional Theory: Predicting and Understanding Chemistry.
65. B. O. Roos and P. R. Taylor, Chem. Phys., 48, 157 (1980). A Complete Active Space SCF Method (CASSCF) Using a Density-Matrix Formulated Super-CI Approach.
66. B. O. Roos, Adv. Chem. Phys., 69, 399 (1987). The Complete Active Space Self Consistent Field Method and its Applications in Electronic Structure Calculations.
67. W. T. Borden and E. R. Davidson, Acc. Chem. Res., 29, 67 (1996). The Importance of Including Dynamic Electron Correlation in Ab Initio Calculations.
68. H. Dachsel, R. J. Harrison, and D. A. Dixon, J. Phys. Chem. A, 103, 152 (1999). Multireference Configuration Interaction Calculations on Cr2: Passing the One Billion Limit in MRCI/MRACPF Calculations.
69. R. Shepard, Int. J. Quantum Chem., 31, 33 (1987). Geometrical Energy Derivative Evaluation with MRCI Wave Functions.
70. R. Shepard, in Modern Electronic Structure Theory Part I, D. R. Yarkony, Ed., World Scientific, Singapore, 1995, pp. 345–458. The Analytic Gradient Method for Configuration Interaction Wave Functions.
71. H. Lischka, M. Dallos, and R. Shepard, Mol. Phys., 100, 1647 (2002). Analytic MRCI Gradient for Excited States: Formalism and Application to the n-π* Valence- and n-(3s,3p) Rydberg States of Formaldehyde.
72. B. H. Lengsfield and D. R. Yarkony, in State-Selected and State-to-State Ion-Molecule Reaction Dynamics: Part 2 Theory, M. Baer and C. Y. Ng, Eds., John Wiley and Sons, New York, 1992, Vol. 82 of Advances in Chemical Physics, pp. 1–71. Nonadiabatic Interactions between Potential Energy Surfaces: Theory and Applications.
73. H. Lischka, R. Shepard, R. M. Pitzer, I. Shavitt, M. Dallos, Th. Müller, P. G. Szalay, M. Seth, G. S. Kedziora, S. Yabushita, and Z. Zhang, Phys. Chem. Chem. Phys., 3, 664 (2001). High-Level Multireference Methods in the Quantum-Chemistry Program System COLUMBUS: Analytic MR-CISD and MR-AQCC Gradients and MR-AQCC-LRT for Excited States, GUGA Spin-Orbit CI and Parallel CI Density.
74. H. Lischka, M. Dallos, P. G. Szalay, D. R. Yarkony, and R. Shepard, J. Chem. Phys., 120, 7322 (2004). Analytic Evaluation of Nonadiabatic Coupling Terms at the MR-CI Level. I. Formalism.
75. M. Dallos, H. Lischka, R. Shepard, D. R. Yarkony, and P. G. Szalay, J. Chem. Phys., 120, 7330 (2004). Analytic Evaluation of Nonadiabatic Coupling Terms at the MR-CI Level. II. Minima on the Crossing Seam: Formaldehyde and the Photodimerization of Ethylene.
76. H.-J. Werner and P. J. Knowles, J. Chem. Phys., 89, 5803 (1988). An Efficient Internally Contracted Multiconfiguration Reference CI Method.
77. H.-J. Werner, P. J. Knowles, R. Lindh, M. Schütz, P. Celani, T. Korona, F. R. Manby, G. Rauhut, R. D. Amos, A. Bernhardsson, A. Berning, D. L. Cooper, M. J. O. Deegan, A. J. Dobbyn, F. Eckert, C. Hampel, G. Hetzer, A. W. Lloyd, S. J. McNicholas, W. Meyer, M. E. Mura, A. Nicklass, P. Palmieri, R. Pitzer, U. Schumann, H. Stoll, A. J. Stone, R. Tarroni, and T. Thorsteinsson, Molpro, version 2002.6, A Package of Ab Initio Programs, 2003.
78. K. Andersson, P.-Å. Malmqvist, B. O. Roos, A. J. Sadlej, and K. Wolinski, J. Phys. Chem., 94, 5483 (1990). Second-Order Perturbation Theory with a CASSCF Reference Function.
79. K. Andersson, P.-Å. Malmqvist, and B. O. Roos, J. Chem. Phys., 96, 1218 (1992). Second-Order Perturbation Theory with a Complete Active Space Self-Consistent Field Reference Function.
80. G. Karlström, R. Lindh, P.-Å. Malmqvist, B. O. Roos, U. Ryde, V. Veryazov, P.-O. Widmark, M. Cossi, B. Schimmelpfennig, P. Neogrady, and L. Seijo, Computat. Mater. Sci., 28, 222 (2003). Molcas: a Program Package for Computational Chemistry.
81. H. Nakano, J. Chem. Phys., 99, 7983 (1993). Quasidegenerate Perturbation Theory with Multiconfigurational Self-consistent-field Reference Functions.
82. K. Hirao, Chem. Phys. Lett., 190, 374 (1992). Multireference Møller-Plesset Method.
83. K. R. Glaesemann, M. S. Gordon, and H. Nakano, Phys. Chem. Chem. Phys., 1, 967 (1999). A Study of FeCO+ with Correlated Wavefunctions.
84. M. W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert, M. S. Gordon, J. H. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, S. Su, T. L. Windus, M. Dupuis, and J. A. Montgomery Jr., J. Comput. Chem., 14, 1347 (1993). General Atomic and Molecular Electronic Structure System.
85. B. O. Roos, K. Andersson, M. P. Fülscher, P.-Å. Malmqvist, L. Serrano-Andrés, K. Pierloot, and M. Merchán, in New Methods in Computational Quantum Mechanics, I. Prigogine and S. A. Rice, Eds., Wiley, New York, 1996, Vol. 93 of Advances in Chemical Physics, pp. 219–331. Multiconfigurational Perturbation Theory: Applications in Electronic Spectroscopy.
86. J. Finley, P.-Å. Malmqvist, B. O. Roos, and L. Serrano-Andrés, Chem. Phys. Lett., 288, 299 (1998). The Multi-state CASPT2 Method.
87. L. Serrano-Andrés, M. Merchán, and R. Lindh, J. Chem. Phys., 122, 104107 (2005). Computation of Conical Intersections by Using Perturbation Techniques.
88. H. Koch, H. J. A. Jensen, P. Jørgensen, and T. Helgaker, J. Chem. Phys., 93, 3345 (1990). Excitation Energies from the Coupled Cluster Singles and Doubles Linear Response Function (CCSDLR) - Applications to Be, CH+, CO, and H2O.
89. J. F. Stanton and R. J. Bartlett, J. Chem. Phys., 98, 7029 (1993). The Equation of Motion Coupled-Cluster Method - A Systematic Biorthogonal Approach to Molecular Excitation Energies, Transition Probabilities, and Excited-State Properties.
90. A. I. Krylov, Chem. Phys. Lett., 338, 375 (2001). Size-Consistent Wave Functions for Bond-Breaking: The Equation-of-Motion Spin-Flip Model.
91. E. Runge and E. K. U. Gross, Phys. Rev. Lett., 52, 997 (1984). Density-Functional Theory for Time-Dependent Systems.
92. N. Ismail, L. Blancafort, M. Olivucci, B. Kohler, and M. A. Robb, J. Am. Chem. Soc., 124, 6818 (2002). Ultrafast Decay of Electronically Excited Singlet Cytosine via a π,π* to n,π* State Switch.
93. M. Merchán and L. Serrano-Andrés, J. Am. Chem. Soc., 125, 8108 (2003). Ultrafast Internal Conversion of Excited Cytosine via the Lowest ππ* Electronic Singlet State.
94. S. Matsika, J. Phys. Chem. A, 109, 7538 (2005). Three-state Conical Intersections in Nucleic Acid Bases.
95. H. B. Schlegel, in Modern Electronic Structure Theory Part I, D. R. Yarkony, Ed., World Scientific, Singapore, 1995, Advanced Series in Physical Chemistry, pp. 459–500. Geometry Optimization on Potential Energy Surfaces.
96. N. Koga and K. Morokuma, Chem. Phys. Lett., 119, 371 (1985). Determination of the Lowest Energy Point on the Crossing Seam between Two Potential Surfaces Using the Energy Gradient.


97. A. Farazdel and M. Dupuis, J. Comput. Chem., 12, 276 (1991). On the Determination of the Minimum on the Crossing Seam of Two Potential Energy Surfaces.
98. D. R. Yarkony, J. Chem. Phys., 92, 2457 (1990). On the Characterization of Regions of Avoided Surface Crossings Using an Analytic Gradient Based Method.
99. J. M. Anglada and J. M. Bofill, J. Comput. Chem., 18, 992 (1997). A Reduced-Restricted-Quasi-Newton-Raphson Method for Locating and Optimizing Energy Crossing Points between Two Potential Energy Surfaces.
100. I. N. Ragazos, M. A. Robb, F. Bernardi, and M. Olivucci, Chem. Phys. Lett., 119, 217 (1992). Optimization and Characterization of the Lowest Energy Point on a Conical Intersection Using an MC-SCF Lagrangian.
101. D. R. Yarkony, in Conical Intersections, W. Domcke, D. R. Yarkony, and H. Köppel, Eds., World Scientific, Singapore, 2004, pp. 129–174. Determination of Potential Energy Surface Intersections and Derivative Couplings in the Adiabatic Representation.
102. G. A. Worth, H.-D. Meyer, and L. S. Cederbaum, in Conical Intersections, W. Domcke, D. R. Yarkony, and H. Köppel, Eds., World Scientific, Singapore, 2004, pp. 583–617. Multidimensional Dynamics Involving a Conical Intersection: Wavepacket Calculations Using the MCTDH Method.
103. J. C. Tully and R. K. Preston, J. Chem. Phys., 55, 562 (1971). Trajectory Surface Hopping Approach to Nonadiabatic Molecular Collisions: The Reaction of H+ with D2.
104. J. C. Tully, J. Chem. Phys., 93, 1061 (1990). Molecular Dynamics with Electronic Transitions.
105. S. Hammes-Schiffer and J. C. Tully, J. Chem. Phys., 101, 4657 (1994). Proton Transfer in Solution - Molecular Dynamics with Quantum Transitions.
106. J. C. Tully, in Dynamics of Molecular Collisions, W. H. Miller, Ed., Plenum Press, New York, 1975, pp. 217–267. Nonadiabatic Processes in Molecular Collisions.
107. M. D. Hack and D. G. Truhlar, J. Phys. Chem. A, 104, 7917 (2000). Nonadiabatic Trajectories at an Exhibition.
108. S. Klein, M. J. Bearpark, B. R. Smith, M. A. Robb, M. Olivucci, and F. Bernardi, Chem. Phys. Lett., 293, 259 (1998). Mixed State "On the Fly" Nonadiabatic Dynamics: The Role of the Conical Intersection Topology.
109. T. J. Martinez, M. Ben-Nun, and R. D. Levine, J. Phys. Chem., 100, 7884 (1996). Multi-Electronic State Molecular Dynamics - A Wave Function Approach with Applications.
110. M. Ben-Nun and T. J. Martinez, J. Chem. Phys., 108, 7244 (1998). Nonadiabatic Molecular Dynamics: Validation of the Multiple Spawning Method for a Multidimensional Problem.
111. M. Ben-Nun, J. Quenneville, and T. J. Martinez, J. Phys. Chem. A, 104, 5161 (2000). Ab Initio Multiple Spawning: Photochemistry from First Principles Quantum Molecular Dynamics.
112. A. J. C. Varandas, J. Tennyson, and J. N. Murrell, Chem. Phys. Lett., 61, 431 (1979). Chercher le Croisement.
113. J. Katriel and E. R. Davidson, Chem. Phys. Lett., 76, 259 (1980). The Non-crossing Rule: Triply Degenerate Ground-State Geometries of CH4+.
114. D. R. Yarkony, in Modern Electronic Structure Theory Part I, D. R. Yarkony, Ed., World Scientific, Singapore, 1995, Advanced Series in Physical Chemistry, pp. 642–721. Electronic Structure Aspects of Nonadiabatic Processes.
115. A. Migani and M. Olivucci, in Conical Intersections, W. Domcke, D. R. Yarkony, and H. Köppel, Eds., World Scientific, Singapore, 2004, pp. 271–320. Conical Intersections and Organic Reaction Mechanisms.
116. Y. Haas and S. Zilberg, J. Photochem. Photobiol. A: Chem., 144, 221 (2001). Photochemistry by Conical Intersections: A Practical Guide for Experimentalists.
117. M. Klessinger and J. Michl, Excited States and Photochemistry of Organic Molecules, VCH Publishers, Inc., New York, 1995.


118. J. Michl and V. Bonačić-Koutecký, Electronic Aspects of Organic Photochemistry, Wiley Interscience, New York, 1990.
119. W. Fuss, S. A. Trushin, and W. E. Schmid, Res. Chem. Intermed., 27, 447 (2001). Ultrafast Photochemistry of Metal Carbonyls.
120. N. Matsunaga and S. Koseki, in Reviews in Computational Chemistry, Vol. 20, K. B. Lipkowitz, R. Larter, and T. R. Cundari, Eds., Wiley-VCH, New York, 2004, pp. 101–152. Modeling of Spin-Forbidden Reactions.
121. M. Garavelli, P. Celani, F. Bernardi, M. A. Robb, and M. Olivucci, J. Am. Chem. Soc., 119, 6891 (1997). The C5H6NH2+ Protonated Schiff Base: An Ab Initio Minimal Model for Retinal Photoisomerization.
122. T. Kobayashi, T. Saito, and H. Ohtani, Nature, 414, 531 (2001). Real-Time Spectroscopy of Transition States in Bacteriorhodopsin During Retinal Isomerization.
123. A. Warshel and Z. T. Chu, J. Phys. Chem. B, 105, 9857 (2001). Nature of the Surface Crossing Process in Bacteriorhodopsin: Computer Simulations of the Quantum Dynamics of the Primary Photochemical Event.
124. A. Toniolo, S. Olsen, L. Manohar, and T. J. Martinez, Faraday Discuss., 127, 149 (2004). Conical Intersection Dynamics in Solution: The Chromophore of Green Fluorescent Protein.
125. M. E. Martin, F. Negri, and M. Olivucci, J. Am. Chem. Soc., 126, 5452 (2004). Origin, Nature, and Fate of the Fluorescent State of the Green Fluorescent Protein Chromophore at the CASPT2//CASSCF Resolution.
126. G. A. Worth and L. S. Cederbaum, Chem. Phys. Lett., 338, 219 (2001). Mediation of Ultrafast Transfer in Biological Systems by Conical Intersections.
127. A. Toniolo, M. Ben-Nun, and T. J. Martinez, J. Phys. Chem. A, 106, 4679 (2002). Optimization of Conical Intersections with Floating Occupation Semiempirical Configuration Interaction Wave Functions.
128. A. Toniolo, G. Granucci, and T. J. Martinez, J. Phys. Chem. A, 107, 3822 (2003). Conical Intersections in Solution: A QM/MM Study Using Floating Occupation Semiempirical CI Wave Functions.
129. M. Daniels and W. Hauswirth, Science, 171, 675 (1971). Fluorescence of the Purine and Pyrimidine Bases of the Nucleic Acids in Neutral Aqueous Solution at 300 K.
130. M. Daniels, in Photochemistry and Photobiology of Nucleic Acids, Vol. 1, S. Y. Wang, Ed., Academic Press, New York, 1976, pp. 23–108. Excited States of the Nucleic Acids: Bases, Mononucleosides, and Mononucleotides.
131. P. R. Callis, Ann. Rev. Phys. Chem., 34, 329 (1983). Electronic States and Luminescence of Nucleic Acid Systems.
132. C. E. Crespo-Hernandez, B. Cohen, P. M. Hare, and B. Kohler, Chem. Rev., 104, 1977 (2004). Ultrafast Excited-state Dynamics in Nucleic Acids.
133. B. Mennucci, A. Toniolo, and J. Tomasi, J. Phys. Chem. A, 105, 4749 (2001). Theoretical Study of the Photophysics of Adenine in Solution: Tautomerism, Deactivation Mechanisms, and Comparison with the 2-Aminopurine Fluorescent Isomer.
134. A. Broo, J. Phys. Chem. A, 102, 526 (1998). A Theoretical Investigation of the Physical Reason for the Very Different Luminescence Properties of the Two Isomers Adenine and 2-Aminopurine.
135. A. L. Sobolewski and W. Domcke, Eur. Phys. J. D, 20, 369 (2002). On the Mechanism of Nonradiative Decay of DNA Bases: Ab Initio and TDDFT Results for the Excited States of 9H-Adenine.
136. S. Perun, A. L. Sobolewski, and W. Domcke, J. Am. Chem. Soc., 127, 6257 (2005). Ab Initio Studies on the Radiationless Decay Mechanisms of the Lowest Excited Singlet States of 9H-Adenine.
137. C. M. Marian, J. Chem. Phys., 122, 104314 (2005). A New Pathway for the Rapid Decay of Electronically Excited Adenine.

References

123

138. A. L. Sobolewski and W. Domcke, Phys. Chem. Chem. Phys., 6, 2763 (2004). Ab Initio Studies on the Photophysics of the Guanine-Cytosine Base Pair. 139. T. Schultz, E. Samoylova, W. Radloff, I. V. Hertel, A. L. Sobolewski, and W. Domcke, Science, 306, 1765 (2004). Efficient Deactivation of a Model Base Pair via Excited-state Hydrogen Transfer. 140. S. P. Keating and C. A. Mead, J. Chem. Phys., 82, 5102 (1985). Conical Intersections in a System of Four Identical Nuclei. 141. S. Han and D. R. Yarkony, J. Chem. Phys., 119, 11562 (2003). Conical Intersections of Three States. Energies, Derivative Couplings, and the Geometric Phase Effect in the Neighborhood of Degeneracy Subspaces. Application to the Allyl Radical. 142. S. Han and D. R. Yarkony, J. Chem. Phys., 119, 5058 (2003). Nonadiabatic Processes Involving Three Electronic States. I. Branch Cuts and Linked Pairs of Conical Intersections. 143. J. D. Coe and T. J. Martinez, J. Am. Chem. Soc., 127, 4560 (2005). Competitive Decay at Two- and Three-State Conical Intersections in Excited-State Intramolecular Proton Transfer. 144. S. Matsika and D. R. Yarkony, J. Chem. Phys., 117, 6907 (2002). Accidental Conical Intersections of Three States of the Same Symmetry. I. Location and Relevance. 145. S. Matsika and D. R. Yarkony, J. Chem. Soc., 125, 10672 (2003). Beyond Two-state Conical Intersections. Three-state Conical Intersections in Low Symmetry Molecules: The Allyl Radical. 146. S. Matsika and D. R. Yarkony, J. Am. Chem. Soc., 125, 12428 (2003). Conical Intersections of Three Electronic States Affect the Ground State of Radical Species with Little Or No Symmetry: Pyrazolyl. 147. S. Kato, R. Hoenigman, A. Gianola, T. Ichino, V. Bierbaum, and W. C. Lineberger, in Molecular Dynamics and Theoretical Chemistry Contractors Review, M. Berman, Ed., AFOSR, San Diego, CA, 2003, p. 49. 148. L. Blancafort and M. A. Robb, J. Phys. Chem. A, 108, 10609 (2004). Key Role of a Threefold State Crossing in the Ultrafast Decay of Electronically Excited Cytosine. 149. B. A. Heb, C. M. Marian, and S. D. Peyerimhoff, in Modern Electronic Structure Theory Part 1, D. R. Yarkony, Ed., World Scientific, Singapore, 1995, pp. 152–278. Advanced Series in Physical Chemistry, Ab Initio Calculation of Spin-orbit Effects in Molecules Including Electron Correlation. 150. W. C. Ermler, R. B. Ross, and P. A. Christiansen, Adv. Quantum Chem., 19, 139–182 (1988). Spin-Orbit Coupling and Other Relativistic Effects in Atoms and Molecules. 151. C. M. Marian, in Reviews in Computational Chemistry, K. B. Lipkowitz, R. Larter, and T. R. Cundari, Eds., Wiley-VCH, New York, 2001, pp. 99–204. Spin-Orbit Coupling in Molecules. 152. K. Balasubramanian, Relativistic Effects in Chemistry, Part A, Theory and Techniques, Wiley, New York, 1997. 153. G. L. Malli, Ed., Relativistic Effects in Atoms, Molecules, and Solids, Vol. 87 of NATO Advanced Science Institutes, Plenum Press, New York, 1983. 154. C. A. Mead, J. Chem. Phys., 70, 2276 (1979). The Noncrossing Rule for Electronic Potential Energy Surfases: The Role of Time-reversal Invariance. 155. A. J. Stone, Proc. R. Soc. London Ser. A, 351, 141 (1976). Spin-Orbit Coupling and the Interaction of Potential Energy Surfaces in Polyatomic Molecules. 156. H. Kramers, Proc. Acad. Sci. Amsterdam, 33, 959 (1930). 157. S. Matsika and D. R. Yarkony, J. Chem. Phys., 115, 2038 (2001). On the Effects of Spin-Orbit Coupling on Conical Intersection Seams in Molecules with an Odd Number of Electrons. I. Locating the Seam. 158. S. Matsika and D. R. Yarkony, J. 
Chem. Phys., 115, 5066 (2001). On the Effects of Spin-Orbit Coupling on Conical Intersection Seams in Molecules with an Odd Number of Electrons. II. Characterizing the Local Topography of the Seam.

124

Conical Intersections in Molecular Systems

159. S. Matsika and D. R. Yarkony, J. Chem. Phys., 116, 2825 (2002). Spin-Orbit Coupling and Conical Intersections in Molecules with an Odd Number of Electrons. III. A Perturbative Determination of the Electronic Energies, Derivative Couplings and a Rigorous Diabatic Representation Near a Conical Intersection. 160. M. I. Lester, R. A. Loomis, R. L. Schwartz, and S. P. Walch, J. Phys. Chem. A, 101, 9195 (1997). Electronic Quenching of OH A 2 þ ðv0 ¼ 0; 1Þ in Complexes with Hydrogen and Nitrogen. 161. D. R. Yarkony, J. Chem. Phys., 111, 6661 (1999). Substituent Effects and the Noncrossing Rule: The Importance of Reduced Symmetry Subspaces. I. The Quenching of OH(A 2 þ ) by H2 . 162. B. C. Hoffman and D. R. Yarkony, J. Chem. Phys., 113, 10091 (2000). The Role of Conical Intersections in the Nonadiabatic Quenching of OH(A 2 þ ) by Molecular Hydrogen. 163. J. C. Tully, Faraday Discuss., 127, 463 (2004). Concluding Remarks: Non-adiabatic Effects in Chemical Dynamics. 164. D. Laage, I. Burghardt, T. Sommerfeld, and J. T. Hynes, J. Phys. Chem. A, 107, 11271 (2003). On the Dissociation of Aromatic Radical Anions in Solution. 1. Formulation and Application to p-cyanochlorobenzene Radical Anion. 165. I. Burghardt, L. S. Cederbaum, and J. T. Hynes, Faraday Discuss., 127, 395 (2004). Environmental Effects on a Conical Intersection: A Model Study.

CHAPTER 3

Variational Transition State Theory with Multidimensional Tunneling

Antonio Fernandez-Ramos,a Benjamin A. Ellingson,b Bruce C. Garrett,c and Donald G. Truhlarb

a Departamento de Quimica Fisica, Universidade de Santiago de Compostela, Facultade de Quimica, Santiago de Compostela, Spain
b Department of Chemistry and Supercomputing Institute, University of Minnesota, Minneapolis, MN
c Chemical and Materials Sciences Division, Pacific Northwest National Laboratory, Richland, WA

INTRODUCTION

"The rate of chemical reactions is a very complicated subject."
Harold S. Johnston, 1966

"The overall picture is that the validity of the transition state theory has not yet been really proved and its success seems to be mysterious."
Raymond Daudel, Georges Leroy, Daniel Peeters, and Michael Sana, 1983

This review describes the application of variational transition state theory (VTST) to the calculation of chemical reaction rates. In 1985, two of us, together with Alan D. Isaacson, wrote a book chapter on this subject entitled "Generalized Transition State Theory" for the multi-volume series entitled Theory of Chemical Reaction Dynamics.1 Since that time, VTST has undergone


important improvements due mainly to the ability of this theory to adapt to more challenging problems. For instance, the 1985 chapter mainly describes the application of VTST to bimolecular reactions involving 3–6 atoms, which were the state of the art at that time. The study of those reactions by VTST dynamics depended on the construction of an analytical potential energy surface (PES). Nowadays, thanks to the development of more efficient algorithms and more powerful computers, the situation is completely different, and most rate calculations are based on "on the fly" electronic structure calculations, which, together with hybrid approaches like combined quantum mechanical/molecular mechanical (QM/MM) methods, allow researchers to apply VTST to systems with hundreds or even tens of thousands of atoms. Three other major advances since 1985 are that transition state dividing surfaces can now be defined much more realistically, that more accurate methods have been developed to include multidimensional quantum mechanical tunneling in VTST, and that the theory has been extended to reactions in condensed phases.

This review progresses from the simplest VTST treatments applicable to simple systems to more advanced ones applicable to complex systems. The next four sections describe the use of VTST for gas-phase unimolecular or bimolecular reactions for which we can afford to build a global analytical PES or to use a high-level electronic structure method to run the dynamics without invoking special methods or algorithms to reduce the computational cost. In the second part (the subsequent three sections, on pages 190–212), we deal with VTST in complex systems; this often involves the use of interpolative or dual-level methods, implicit solvation models, or potentials of mean force to obtain the potential energy surface. Two sections also discuss the treatment of condensed-phase reactions by VTST.

A fundamental theoretical construct underlying this whole chapter is the Born–Oppenheimer approximation. According to this approximation, which is very accurate for most chemical reactions (the major exceptions being electron transfer and photochemistry), the Born–Oppenheimer energy, which is the energy of the electrons plus the nuclear repulsion, provides a potential energy surface V for nuclear motion. At first we assume that this potential energy surface is known and is available as a potential energy function. Later we provide more details on interfacing electronic structure theory with nuclear dynamics to calculate V by electronic structure calculations "on the fly," which is called direct dynamics. The geometries where ∇V is zero play a special role; these geometries are called stationary points, and they include the equilibrium geometries of the reactants and products, the saddle points, and the geometries of precursor and successor complexes, which are local minima (often due to van der Waals forces) between reactants and the saddle point and between products and the saddle point. In general, V is required at a wide range of geometries, both stationary and nonstationary.

A word on nomenclature is in order here. When we say transition state theory, we refer to the various versions of the theory, with or without including tunneling. When we want to be more specific, we may say conventional


transition state theory, variational transition state theory, canonical variational transition state theory (also called canonical variational theory or CVT), and so forth. For each of the versions of VTST, we can further differentiate, for example, CVT without tunneling, CVT with one-dimensional tunneling, or CVT with multidimensional tunneling, and we can further specify the particular approximation used for tunneling. Sometimes we use the term generalized transition state theory, which refers to any version of transition state theory in which the transition state is not restricted to the saddle point with the reaction coordinate along the imaginary-frequency normal mode.

In this chapter we explain the algorithms used to implement VTST, especially CVT, and multidimensional tunneling approximations in the POLYRATE2–6 computer program. We also include some discussion of the fundamental theory underlying VTST and these algorithms. Readers who want a more complete treatment of theoretical aspects are referred to another review.

The beginning of the next section includes the basic equations of VTST, paying special attention to canonical variational transition state theory (CVT), although other theories are discussed briefly in the third subsection. The reason for centering attention mainly on CVT is that it is very accurate but requires only a limited knowledge of the PES. The basic algorithms needed to run the dynamics calculations are then discussed in detail, including harmonic and anharmonic calculations of partition functions. Multidimensional tunneling corrections to VTST are discussed in the fourth section. Approaches to building the PES information needed in the VTST calculations are then discussed, including direct-dynamics methods with specific reaction parameters, interpolated VTST, and dual-level dynamics. The sixth section is dedicated to reactions in condensed media, including liquid solutions and solids. Then ensemble-averaged VTST is highlighted. The eighth and ninth sections describe some practical examples that show in some detail how VTST works, including a brief discussion of kinetic isotope effects. The last section provides a summary of the review.

VARIATIONAL TRANSITION STATE THEORY FOR GAS-PHASE REACTIONS

Conventional Transition State Theory

Transition state theory (TST), also known as conventional TST, goes back to the papers of Eyring8 and of Evans and Polanyi9 in 1935. For a general gas-phase reaction of the type

$$\mathrm{A} + \mathrm{B} \rightarrow \mathrm{Products} \qquad [1]$$

where A and B may be either atoms or molecules, the theory assumes that there is an activated complex called the transition state that represents the bottleneck in the reaction process. The fundamental assumption of TST (also called the no-recrossing assumption) is only expressible in classical mechanics. It states that (1) this transition state is identified with a dividing hypersurface (or surface, for brevity) that separates the reactant region from the product region in phase space, and (2) all the trajectories that cross this dividing surface in the direction from reactants to products originated as reactants and never return to reactants; that is, they cross the dividing surface only once. For this reason, the TST dividing surface is sometimes called the dynamical bottleneck. Rigorously, we can say that TST makes only four assumptions: (1) that the Born–Oppenheimer approximation is valid, so the reaction is electronically adiabatic; (2) that the reactants are equilibrated in a fixed-temperature (canonical) ensemble or fixed-total-energy (microcanonical) ensemble; (3) that there is no recrossing; and (4) that quantum effects can be included by quantizing vibrations and by a multiplicative transmission coefficient that accounts for tunneling (nonclassical transmission) and nonclassical reflection. In a world where nuclear motion is strictly classical, we need not consider (4), and the TST classical rate constant, k‡_C, for Eq. [1] is given by

$$k^{\ddagger}_{\mathrm{C}} = \frac{1}{\beta h}\,\frac{Q^{\ddagger}_{\mathrm{C}}(T)}{\Phi^{\mathrm{R}}_{\mathrm{C}}(T)}\,\exp\left(-\beta V^{\ddagger}\right) \qquad [2]$$

where β = (k_B T)⁻¹ (k_B is the Boltzmann constant, and T is the temperature), h is the Planck constant, V‡ is the potential energy difference between the reactants and the transition state (the barrier height, also called the classical barrier height), Q‡_C is the classical (C) partition function of the transition state, and Φ^R_C is the classical partition function of the reactants per unit volume. (For a unimolecular reaction, we would replace Φ^R_C by the unitless classical reactant partition function Q^R_C.) Note that the transition state has one less degree of freedom than does the full system; that particular degree of freedom is called the reaction coordinate, and it is missing in Q‡_C. Throughout this chapter, the symbol ‡ is used to denote the conventional transition state, which is a system confined to the vicinity of the saddle point by constraining the coordinate corresponding to the saddle point's imaginary-frequency normal mode to have zero extension. This coordinate is the reaction coordinate in conventional transition state theory. The zero of energy for the potential is taken as the energy of the minimum energy configuration in the reactant region. The partition functions are proportional to configurational integrals of Boltzmann factors of the potential. For the reactant partition function, the zero of energy is the same as that for the potential, whereas for the partition function of the transition state, the zero of energy is taken as the local minimum in the bound vibrational modes at the saddle point, which is V‡. We can establish a connection between Eq. [2] and thermodynamics by starting with the relation between the free energy of reaction ΔG°_T at temperature T and the equilibrium constant K, which is given by

$$K = K^{0} \exp\left(-\Delta G^{\circ}_{T}/RT\right) \qquad [3]$$

where K⁰ is the value of the reaction quotient at the standard state. (For a reaction where the number of moles decreases by one, this is the reciprocal of the standard-state concentration.) Then we rewrite Eq. [2] in quasithermodynamic terms8,10,11 as

$$k^{\ddagger}_{\mathrm{C}} = \frac{1}{\beta h}\, K^{\ddagger}_{\mathrm{C}}(T) \qquad [4]$$

where K‡_C is the quasiequilibrium constant for forming the transition state. (The transition state is not a true thermodynamic species, because it has one degree of freedom missing, and therefore we add the prefix "quasi.") The thermodynamic analog of Eq. [4] is now given by

$$k^{\ddagger}_{\mathrm{C}} = \frac{1}{\beta h}\, K^{\ddagger,\mathrm{o}} \exp\left(-\Delta G^{\ddagger,\mathrm{o}}_{\mathrm{C},T}/RT\right) \qquad [5]$$

where ΔG‡,o_{C,T} represents the classical free energy of activation for the reaction under consideration.

The siren song of TST when it was first proposed was that "all the quantities may be calculated from the appropriate potential surface,"8 and in fact from very restricted regions of that surface. Specifically, one "only" needs to obtain the properties (energies, geometries, moments of inertia, vibrational frequencies, etc.) of the reactants and the transition state from the PES and to be sure that the transition state is unequivocally joined to the reactants by a reaction path. One approach to ensuring this is to define the reaction path as the minimum energy path, which can be computed by steepest-descent algorithms. (These techniques will be discussed in detail in the subsection entitled "The Reaction Path.") The fact that conventional transition state theory needs the potential energy surface only in small regions around the reactant minimum and the saddle point is indeed enticing. We will see that when one adds variational effects, one needs a more extensive region of the potential energy surface that is, nonetheless, still localized in the valley connecting reactants to products. Then, when one adds tunneling, a longer section of the valley is needed, and sometimes the potential for geometries outside the valley, in the so-called tunneling swath, is required. Nevertheless, the method often requires only a manageably small portion of the potential energy surface, and the calculations can be quite efficient.
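To make the working equation concrete, the following minimal numerical sketch (our illustration; it is not code from POLYRATE) evaluates Eq. [2]. The function name and its inputs q_ts, phi_react, and barrier are hypothetical placeholders for the transition state partition function, the classical reactant partition function per unit volume, and V‡:

import numpy as np

KB = 1.380649e-23   # Boltzmann constant (J/K)
H = 6.62607015e-34  # Planck constant (J s)

def tst_rate(T, q_ts, phi_react, barrier):
    """Conventional TST, Eq. [2]: k = (1/(beta*h)) (Q/Phi) exp(-beta*V).
    For a bimolecular reaction, phi_react is per unit volume, so the
    rate constant comes out in units of volume per time."""
    beta = 1.0 / (KB * T)
    return (q_ts / phi_react) * np.exp(-beta * barrier) / (beta * H)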


It is possible to improve the results of Eq. [2] by incorporating a factor γ_C, called the transmission coefficient, that accounts for some of the above approximations. The "exact" classical thermal rate constant is then given as

$$k_{\mathrm{C}} = \gamma_{\mathrm{C}}(T)\, k^{\ddagger}_{\mathrm{C}}(T) \qquad [6]$$

We can factor the transmission coefficient into two approximately independent parts,

$$\gamma_{\mathrm{C}}(T) = \Gamma_{\mathrm{C}}(T)\, g(T) \qquad [7]$$

that account, respectively, for corrections to the fundamental assumption being made and to approximation (2) described earlier. When conventional TST is compared with classical trajectory calculations, one is testing the no-recrossing assumption; i.e., we are assessing how far Γ_C is from unity, with TST being an upper bound to the classical rate constant (Γ_C ≤ 1). Both classical trajectory simulations (also called molecular dynamics simulations) and TST invoke the local-equilibrium approximation,12 in which the microstates of the reactants are in local equilibrium with each other, and it has been shown that for gas-phase bimolecular reactions the deviation of g from unity is usually very small.13–18 In the case of gas-phase unimolecular reactions, the reacting molecules need to be activated, and so there is a competition between energy transfer and reaction. At low pressures, the rate constant is pressure dependent (the "falloff region") and is controlled by the activation and deactivation of the activated species. Only when the pressure is sufficiently high is energy redistribution much faster than the product-forming step, so that TST can be applied. In this context, we can consider TST as giving the high-pressure limit of the unimolecular rate constant.

The justification of variational transition state theory is rigorous only in a classical mechanical world because, when the local-equilibrium assumption is valid, VTST provides an upper bound on the classical mechanical rate constant. One optimizes the definition of the transition state to minimize recrossing, and the calculated rate constant converges to the exact rate constant from above. The derivation of TST involves calculating the flux, i.e., counting the species that pass through the dividing surface located at the transition state. This can be stated with certainty only in the realm of classical mechanics. In other words, formulating classical TST requires that, at a given moment, we know exactly the location in coordinate space of our reactive system, which is passing through the dividing surface, and we know the sign of the momentum, which has to be positive because the molecule is heading toward products. This violates the uncertainty principle. Nevertheless, the classical framework provides a starting point for real systems, which have quantum effects that are incorporated in two ways. First, quantum effects on motion in all degrees of freedom except the reaction coordinate near the dynamical bottleneck are included by replacing classical vibrational partition functions by quantum


mechanical ones. Second, tunneling and nonclassical reflection are included through another temperature-dependent transmission coefficient, κ. In this review we consider reactions for which auxiliary assumption (1), the Born–Oppenheimer approximation, is met or is assumed to be met. Furthermore, we assume that energy transfer processes occur fast enough to replenish the populations of depleted reactant states, so g ≅ 1 for all gas-phase reactions considered here. Therefore, the true quantum mechanical rate constant is given by

$$k = \gamma(T)\, k^{\ddagger}(T) = \Gamma(T)\, \kappa(T)\, k^{\ddagger}(T) \qquad [8]$$

where κ takes into account nonclassical effects on the reaction coordinate, and k‡ is a quantized version of k‡_C. We then need a methodology to evaluate Γ(T) and κ(T); these are discussed in the following sections. In particular, VTST may be considered a way to calculate Γ by finding a better transition state that has less recrossing, and semiclassical tunneling calculations may be used to estimate κ. In practical calculations on real systems, even when we optimize the transition state by VTST, we do not find a transition state that eliminates all recrossing. Thus there is still a non-unit value of Γ(T). As we carry out better optimizations of the transition state, the exact Γ should converge to unity. The essence of transition state theory is that one finally approximates Γ as unity for one's final choice of transition state.

Canonical Variational Transition State Theory

Conventional TST provides only an approximation to the "true" rate constants, in part because we are calculating the one-way flux through a dividing surface that is appropriate only for small, classical vibrations around the saddle point.19 We should instead consider the net flux in a way that accounts for global dynamics, quantization of modes transverse to the reaction coordinate, and tunneling. It is important to note that the "transverse" modes consist of all modes except the reaction coordinate. The first way in which the calculated rate constants can be improved is to change the location of the dividing surface, which in conventional TST8 is located at the saddle point. More generally, we should also consider other dividing surfaces. The conventional transition state dividing surface is a hyperplane perpendicular to the imaginary-frequency normal mode (the reactive normal mode) of the saddle point; it is the hyperplane with displacement along the reactive normal mode set equal to zero (see Figure 1). Any other dividing surface is by definition a "generalized transition state."20 We search for generalized transition state dividing surfaces (even if they are not at saddle points) that are located where the forward flux is a minimum.20–27


Figure 1. Contour plot for the collinear reaction H_a + H_bH_c → H_aH_b + H_c, showing the dividing surface at the transition state and the minimum energy path (MEP). X₁ and X₂ indicate the H_a⋯H_b and H_b⋯H_c distances, respectively. The contour labels are in kcal/mol.

The practical problem involves locating this particular dividing surface S, which in principle is a function of all the coordinates q and momenta p of the system; that is, S = S(p, q). One way of doing this is to consider the surface as being a function of the coordinates only and then simplify this dependency further by considering a few-parameter set of dividing surfaces of restricted shape and orientation (together specified by X) at a distance s along a given reaction path (instead of allowing arbitrary definitions), such that S(p, q) is reduced to S(s, X). We can go further and fix the shape of the dividing surface and use the unit vector n̂ perpendicular to the surface, instead of X, to define the dividing surface S(s, n̂). These two parameters (one scalar and one vector) are optimized variationally until the forward flux through the dividing surface is minimized.

In POLYRATE, the default for the reaction path is the minimum energy path (MEP) in isoinertial coordinates. The minimum energy path is the union of the paths of steepest descent on the potential energy surface down from the saddle point toward reactants and products. The path of steepest descent depends on the coordinate system, and when we refer to the MEP, we always mean the one computed by steepest descents in isoinertial coordinates. Isoinertial coordinates are rectilinear coordinates in which the kinetic energy consists of diagonal square terms (that is, there are no cross terms between different components of momenta), and every coordinate has the same reduced mass. (Rectilinear coordinates are linear functions of Cartesian coordinates.) Some examples of isoinertial coordinates that one encounters are mass-weighted


Cartesians, mass-weighted Cartesian displacements, mass-scaled Cartesians, and mass-scaled Jacobis. In mass-weighted coordinates,28 the mass is unity and unitless, and the "coordinates" have units of length times the square root of mass; in mass-scaled coordinates, the reduced mass for all coordinates is a constant μ (with units of mass), and the coordinates have units of length. We almost always use mass-scaled coordinates; the main exception is in the subsection on curvilinear internal coordinates, where much of the analysis involving internal coordinates is done in terms of unscaled coordinates.

The original choice27 of dividing surface for polyatomic VTST was a hyperplane in rectilinear coordinates orthogonal to the MEP. With this choice of dividing surface, the direction of the gradient along the MEP coincides with the direction along n̂. Therefore, in this case, the dividing surface depends only on s, and the minimum rate is obtained by variationally optimizing the location of the surface along the MEP. The coordinate perpendicular to the dividing surface is the reaction coordinate, and the assumption that systems do not recross the dividing surface may be satisfied if this coordinate is separable from the other 3N − 1 degrees of freedom, where N is the number of atoms. The set of coordinates {u₁(s), …, u_{3N−1}(s), s}, or (u, s), are called natural collision coordinates.29 It can be shown30 that all isoinertial coordinates can be obtained from one another by uniform scaling and an orthogonal transformation. Therefore, the MEP is the same in all such coordinate systems. This MEP is sometimes called the intrinsic reaction coordinate, or IRC.31 It is not necessary to use the MEP as the reaction path; one could alternatively use a path generated by an arbitrarily complicated reaction coordinate,32 and for reactions in the condensed phase, some workers have allowed a collective bath coordinate33 to participate in the definition of the reaction path. The transition state dividing surface is defined by the MEP only on the reaction path itself. In the variational reaction path algorithm,34 the dividing surface is not necessarily perpendicular to the gradient along the MEP. Instead, it is the dividing surface that maximizes the free energy of activation,20 and so, in this case, we also optimize n̂ (discussed above and in the subsection entitled "The Reaction Path"), which allows us to make a better estimate of the net flux through the dividing surface.

It is possible to write an expression for the rate constant similar to Eq. [2] by using generalized transition state dividing surfaces. We start by describing the formulation of VTST for the original choice of dividing surface, a hyperplane in rectilinear coordinates orthogonal to the MEP and intersecting it at s. In this case, the generalized transition state rate constant is given by

$$k^{\mathrm{GT}}_{\mathrm{C}} = \frac{1}{\beta h}\, \frac{Q^{\mathrm{GT}}_{\mathrm{C}}(T,s)}{\Phi^{\mathrm{R}}_{\mathrm{C}}(T)}\, \exp\left[-\beta V_{\mathrm{MEP}}(s)\right] \qquad [9]$$


where by convention s = 0 indicates the location of the saddle point, and s < 0 and s > 0 indicate the reactant and product sides of the reaction path, respectively; V_MEP(s) is the potential evaluated on the MEP at s, and Q^GT_C is the classical generalized transition state partition function. The zero of energy for the generalized transition state partition function is taken as the minimum of the local vibrational modes orthogonal to the reaction path at s, which is equal to V_MEP(s). The value of the rate constant in Eq. [9], when minimized with respect to s, corresponds to canonical variational transition state theory, also simply called canonical variational theory (CVT):20,27,30,35,36

$$k^{\mathrm{CVT}}_{\mathrm{C}} = \min_{s}\, k^{\mathrm{GT}}_{\mathrm{C}}(T,s) = k^{\mathrm{GT}}_{\mathrm{C}}\left[T, s^{\mathrm{CVT}}_{\mathrm{C},*}(T)\right] \qquad [10]$$

where s^CVT_{C,*} indicates the optimum classical position of the dividing surface. (In general, an asterisk subscript on s denotes the value of s at a variational transition state.) The expression for the classical CVT rate constant is then

$$k^{\mathrm{CVT}}_{\mathrm{C}} = \frac{1}{\beta h}\, \frac{Q^{\mathrm{GT}}_{\mathrm{C}}\left[T, s^{\mathrm{CVT}}_{\mathrm{C},*}(T)\right]}{\Phi^{\mathrm{R}}_{\mathrm{C}}(T)}\, \exp\left\{-\beta V_{\mathrm{MEP}}\left[s^{\mathrm{CVT}}_{\mathrm{C},*}(T)\right]\right\} \qquad [11]$$
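In practice, the minimization in Eqs. [10] and [11] is carried out on a grid of points along the reaction path. A minimal sketch of that step (ours, with hypothetical precomputed arrays v_mep and q_gt holding V_MEP(s) and Q^GT(T, s)):

import numpy as np

KB = 1.380649e-23   # Boltzmann constant (J/K)
H = 6.62607015e-34  # Planck constant (J s)

def cvt_rate(T, s_grid, v_mep, q_gt, phi_react):
    """Eqs. [9]-[11]: evaluate k_GT(T, s) on a grid of dividing-surface
    locations s and return its minimum and the variational location s*."""
    beta = 1.0 / (KB * T)
    k_gt = (q_gt / phi_react) * np.exp(-beta * v_mep) / (beta * H)
    i_star = int(np.argmin(k_gt))
    return k_gt[i_star], s_grid[i_star]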

The CVT rate constant can account for most of the recrossing (depending on the reaction) that takes place at the conventional transition state. It should be noted that to minimize the recrossing does not generally mean to eliminate it, and for a particular reaction we may find that even the "best" dividing surface obtained by CVT yields a rate constant larger than the exact classical rate constant, although it can be shown that in a classical world we can always eliminate all recrossing by optimizing the dividing surface in phase space with respect to all coordinates and momenta.37 On the other hand, assuming local equilibrium of reactant states, the CVT rate constant always improves on the result obtained by conventional TST, and therefore the following inequality holds:

$$k^{\mathrm{CVT}}_{\mathrm{C}}(T) \leq k^{\ddagger}_{\mathrm{C}}(T) \qquad [12]$$

Thus, CVT takes into account the effect of the factor Γ‡_C(T) on the thermal rate constant, where the superscript ‡ refers to recrossing of the conventional transition state, and the subscript C reminds us that we are still discussing the classical mechanical rate constant. CVT is considered to be an approximation to the exact classical rate constant:

$$k_{\mathrm{C}} \cong k^{\mathrm{CVT}}_{\mathrm{C}}(T) = \Gamma^{\mathrm{CVT}}_{\mathrm{C}}(T)\, k^{\ddagger}_{\mathrm{C}}(T) \qquad [13]$$


where

$$\Gamma^{\mathrm{CVT}}_{\mathrm{C}} = \frac{k^{\mathrm{CVT}}_{\mathrm{C}}(T)}{k^{\ddagger}_{\mathrm{C}}(T)} \qquad [14]$$

Now we consider how to incorporate quantum effects into the thermal rate constant. For the modes perpendicular to the reaction coordinate, this is done in what is often considered to be an ad hoc way, by quantizing the partition functions.8 Actually, this is not totally ad hoc; it was derived, at least to order ℏ² in Planck's constant, by Wigner38 in 1932. Because the reaction coordinate is missing in the transition state partition functions of Eqs. [2] and [9], the rate constant is still not fully quantized at the transition state. At this point, to denote that we have incorporated quantum effects in all degrees of freedom of the reactants and in all but one degree of freedom of the transition state by using quantum mechanical partition functions instead of classical mechanical ones, we drop the subscript C from all remaining expressions. The CVT rate constant is then given by

$$k^{\mathrm{CVT}} = \frac{1}{\beta h}\, \frac{Q^{\mathrm{GT}}\left[T, s^{\mathrm{CVT}}_{*}(T)\right]}{\Phi^{\mathrm{R}}(T)}\, \exp\left\{-\beta V_{\mathrm{MEP}}\left[s^{\mathrm{CVT}}_{*}(T)\right]\right\} \qquad [15]$$

where Φ^R is the quantized reactant partition function per unit volume, and Q^GT(T, s) is the quantized generalized transition state partition function at s. Note that the value s^CVT_* that minimizes the quantized generalized transition state rate constant at temperature T is not necessarily equal to the value s^CVT_{C,*}(T) that minimizes the classical expression. Another way to write Eq. [9] is to relate it to the free energy of activation profile ΔG^{GT,o}_T by analogy with Eq. [5]:

$$k^{\mathrm{GT}} = \frac{1}{\beta h}\, K^{\ddagger,\mathrm{o}} \exp\left\{-\left[G^{\mathrm{GT,o}}(T,s) - G^{\mathrm{R,o}}_{T}\right]\big/RT\right\} = \frac{1}{\beta h}\, K^{\ddagger,\mathrm{o}} \exp\left[-\Delta G^{\mathrm{GT,o}}_{T}(s)\big/RT\right] \qquad [16]$$

where K‡,o is the reciprocal of the standard-state concentration for bimolecular reactions or unity for unimolecular reactions, G^{GT,o}_T is the standard-state free energy of the system at the dividing surface perpendicular to the MEP, and G^{R,o}_T is the classical standard-state free energy of the reactants at temperature T. The free energy of activation profile is given as

$$\Delta G^{\mathrm{GT,o}}_{T}(s) = V_{\mathrm{MEP}}(s) - RT \ln\left[\frac{Q^{\mathrm{GT}}(T,s)}{K^{\ddagger,\mathrm{o}}\, \Phi^{\mathrm{R}}(T)}\right] \qquad [17]$$


Therefore, the CVT rate constant can be rewritten as

$$k^{\mathrm{CVT}} = \frac{1}{\beta h}\, K^{\ddagger,\mathrm{o}} \exp\left\{-\Delta G^{\mathrm{CVT,o}}_{T}\left[s^{\mathrm{CVT}}_{*}(T)\right]\big/RT\right\} \qquad [18]$$

When comparing Eqs. [16] and [18], it can be seen that the minimum value of k^GT as a function of s is reached when the free energy of activation is a maximum.20,27,39,40 This can be restated in terms of first and second derivatives; that is,

$$\left.\frac{\partial}{\partial s}\, k^{\mathrm{GT}}(T,s)\right|_{s=s^{\mathrm{CVT}}_{*}(T)} = \left.\frac{\partial}{\partial s}\, \Delta G^{\mathrm{GT,o}}_{T}(s)\right|_{s=s^{\mathrm{CVT}}_{*}(T)} = 0 \qquad [19a]$$

with

$$\left.\frac{\partial^{2}}{\partial s^{2}}\, k^{\mathrm{GT}}(T,s)\right|_{s=s^{\mathrm{CVT}}_{*}(T)} > 0 \qquad [19b]$$

and

$$\left.\frac{\partial^{2}}{\partial s^{2}}\, \Delta G^{\mathrm{GT,o}}_{T}(s)\right|_{s=s^{\mathrm{CVT}}_{*}(T)} < 0 \qquad [19c]$$
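The equivalence expressed by Eqs. [19a]–[19c] means that one can locate the variational transition state either by minimizing k^GT or by maximizing the free energy of activation profile. A sketch of the latter route (ours; it assumes V_MEP and Q^GT are already tabulated on a grid, and all names are hypothetical):

import numpy as np

R_GAS = 8.314462618  # gas constant (J mol^-1 K^-1)

def delta_g_profile(T, v_mep, q_gt, k_std, phi_react):
    """Eq. [17]: Delta G(s) = V_MEP(s) - RT ln[Q_GT(T,s)/(K_std * Phi_R)],
    with v_mep in J/mol and q_gt/(k_std*phi_react) dimensionless."""
    return v_mep - R_GAS * T * np.log(q_gt / (k_std * phi_react))

# the CVT dividing surface sits at np.argmax(delta_g_profile(...)),
# the same grid point that minimizes k_GT in the sketch after Eq. [11]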

Initially we have taken the dividing surface to be perpendicular to the MEP. In the reorientation of the dividing surface (RODS) algorithm, the dividing surface is oriented to yield the most physical free energy of activation, which is the dividing surface that maximizes ΔG^{GT,o}_T(S(s_i, n̂)) at a given T and s_i. In this case, the dividing surface is defined by the location s_i where it intersects the MEP and by a unit vector n̂ that is orthogonal to the dividing surface at the MEP. The value of the free energy with the optimum orientation at point s_i is given by

$$\Delta G^{\mathrm{OGT,o}}_{T} = \max_{\hat{n}}\, \Delta G^{\mathrm{GT,o}}_{T}\left(S(s_i, \hat{n})\right) \qquad [20]$$

and the CVT free energy is the maximum of the orientation-optimized free energies:

$$\Delta G^{\mathrm{CVT,o}}_{T} = \max_{s}\, \Delta G^{\mathrm{OGT,o}}_{T}(s) \qquad [21]$$

The algorithm used to evaluate ΔG^{OGT,o}_T is discussed below.

Other Variational Transition State Theories

Canonical variational theory finds the best dividing surface for a canonical ensemble, characterized by a temperature T, to minimize the calculated canonical rate constant. Alternative variational transition state theories can also be


defined. This is done for other ensembles by finding the dividing surfaces that minimize the rate constants for those ensembles. For example, a microcanonical ensemble is characterized by a total energy E, and the generalized transition state theory rate constant for this ensemble is proportional to N^GT_vr(E, s), which is the number of vibrational–rotational states with energy smaller than E at a generalized transition state at s. Microcanonical variational transition state theory (μVT)1 is obtained by finding the dividing surface that minimizes N^GT_vr; i.e.,

$$N^{\mu\mathrm{VT}} = \min_{s}\, N^{\mathrm{GT}}_{\mathrm{vr}}(E,s) \qquad [22]$$

The location of the dividing surface that minimizes Eq. [22] is defined as s^{μVT}_*, which specifies the microcanonical variational transition state; thus,

$$\left.\frac{\partial N^{\mathrm{GT}}_{\mathrm{vr}}(E,s)}{\partial s}\right|_{s=s^{\mu\mathrm{VT}}_{*}(E)} = 0 \qquad [23]$$

Notice that the minimum-number-of-states criterion corresponds correctly to variational transition state theory, whereas an earlier minimum-density-of-states criterion does not.27 The microcanonical rate constant can be written as

$$k^{\mu\mathrm{VT}} = \frac{Q_{\mathrm{el}}(T) \displaystyle\int_{0}^{\infty} N^{\mu\mathrm{VT}}_{\mathrm{vr}}(E)\, \exp(-\beta E)\, dE}{h\, \Phi^{\mathrm{R}}(T)} \qquad [24]$$

where the electronic partition function Q_el is defined below. Evaluating the microcanonical number of states can be very time consuming at high energies for big molecules. To avoid this problem, one can instead optimize the generalized transition states up to the microcanonical variational threshold energy and then use canonical theory for the higher energy contributions. This approach is called improved canonical variational theory (ICVT).1,41 ICVT has the same energy threshold as μVT, but its calculation is much less time consuming. A microcanonical criterion is more flexible than a canonical one, and therefore

$$k^{\ddagger}(T) \geq k^{\mathrm{CVT}}(T) \geq k^{\mathrm{ICVT}}(T) \geq k^{\mu\mathrm{VT}}(T) \qquad [25]$$

As we go to the right in the above sequence, the methods account more accurately for recrossing effects. Sometimes it is found that even the best dividing surface gives rate constants that are too high because another reaction bottleneck exists. Those cases can be handled, at least approximately, by the unified statistical (US) model.42,43 In this model, the thermal rate constant can be written as

$$k^{\mathrm{US}} = \frac{Q_{\mathrm{el}}(T) \displaystyle\int_{0}^{\infty} N^{\mathrm{US}}_{\mathrm{vr}}(E)\, \exp(-\beta E)\, dE}{h\, \Phi^{\mathrm{R}}(T)} \qquad [26]$$


where

$$N^{\mathrm{US}}_{\mathrm{vr}} = N^{\mu\mathrm{VT}}_{\mathrm{vr}}(E)\, \Gamma^{\mathrm{US}}(E) \qquad [27]$$

The recrossing factor Γ^US due to the second bottleneck is defined as

$$\Gamma^{\mathrm{US}} = \left\{1 + \frac{N^{\mu\mathrm{VT}}_{\mathrm{vr}}(E)}{N^{\mathrm{min}}_{\mathrm{vr}}(E)} - \frac{N^{\mu\mathrm{VT}}_{\mathrm{vr}}(E)}{N^{\mathrm{max}}_{\mathrm{vr}}(E)}\right\}^{-1} \qquad [28]$$

where N^min_vr(E) is the second lowest minimum of the accessible number of vibrational–rotational states N^GT_vr(E, s), and N^max_vr(E) is the maximum of N^GT_vr(E, s) located between the two minima in the number of vibrational–rotational states. This approach is nonvariational but always satisfies the relation

$$k^{\mathrm{US}} \leq k^{\mu\mathrm{VT}} \qquad [29]$$
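Numerically, Eq. [26] (and likewise Eq. [24]) is a Boltzmann average of a cumulative number of states over an energy grid. A sketch with trapezoidal quadrature standing in for the integral (our illustration; all array names are hypothetical inputs):

import numpy as np

KB = 1.380649e-23   # Boltzmann constant (J/K)
H = 6.62607015e-34  # Planck constant (J s)

def us_rate(T, e_grid, n_us, q_el, phi_react):
    """Eq. [26]: k_US = Q_el(T) * Int N_US(E) exp(-beta*E) dE / (h * Phi_R)."""
    beta = 1.0 / (KB * T)
    return q_el * np.trapz(n_us * np.exp(-beta * e_grid), e_grid) / (H * phi_react)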

In the case that the same physical approximations are applied to fluxes in a canonical ensemble, we call the theory canonical unified statistical theory (CUS),44 and the recrossing factor Γ^CUS is given by

$$\Gamma^{\mathrm{CUS}} = \left[1 + \frac{q^{\mathrm{CVT}}_{\mathrm{vr}}(T)}{q^{\mathrm{max}}_{\mathrm{vr}}(T)} - \frac{q^{\mathrm{CVT}}_{\mathrm{vr}}(T)}{q^{\mathrm{min}}_{\mathrm{vr}}(T)}\right]^{-1} \qquad [30]$$

where

$$q^{\mathrm{CVT}}_{\mathrm{vr}} = Q^{\mathrm{GT}}_{\mathrm{vr}}(T, s_{*})\, \exp\left[-\beta V_{\mathrm{MEP}}(s_{*})\right] \qquad [31]$$

is the partition function evaluated at the maximum of the free energy of activation profile, q^max_vr(T) is evaluated at the second highest maximum, and q^min_vr(T) is evaluated at the lowest minimum between the two maxima. The CUS rate constant is given by

$$k^{\mathrm{CUS}} = \Gamma^{\mathrm{CUS}}(T)\, k^{\mathrm{CVT}}(T) \qquad [32]$$

In the limit that there are two equivalent maxima in the free energy of activation profile with a deep minimum between them, the statistical result is obtained; i.e., Γ^CUS = 0.5. Note that the signs appear different in Eqs. [28] and [30] because in the former, "max" and "min" are associated with local maxima and minima, respectively, of the flux, whereas in the latter they are associated with maxima and minima, respectively, of the free energy of activation profile, not of the flux.
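A one-line implementation of Eq. [30], together with a numerical check of the Γ^CUS = 0.5 limit just mentioned (a sketch under the stated sign conventions; ours, not from POLYRATE):

def cus_recrossing(q_cvt, q_max, q_min):
    """Eq. [30]: Gamma_CUS = [1 + q_cvt/q_max - q_cvt/q_min]^(-1),
    with q_max at the second-highest free-energy maximum and q_min at
    the lowest minimum between the two maxima (Eq. [31] quantities)."""
    return 1.0 / (1.0 + q_cvt / q_max - q_cvt / q_min)

# two equivalent maxima separated by a deep minimum:
# q_max == q_cvt and q_min -> infinity gives Gamma_CUS -> 0.5
assert abs(cus_recrossing(1.0, 1.0, 1e12) - 0.5) < 1e-6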

Quantum Effects on the Reaction Coordinate

Up to this point we have incorporated quantum mechanics in the F − 1 bound degrees of freedom (where F is the total number of bound and unbound vibrations and equals 3N − 6, where N is the number of atoms, except that it is 3N − 5 for linear species) through the partition functions, and therefore both the TST and the CVT rate constants are quantized. The difference between the two theories is still given by the factor

$$\Gamma^{\mathrm{CVT}}(T) = k^{\mathrm{CVT}}(T)\big/k^{\ddagger}(T) \qquad [33]$$

which takes the recrossing into account. To quantize all degrees of freedom requires the incorporation of quantum effects into the reaction coordinate through a multiplicative transmission coefficient κ(T). For example, for CVT we write

$$k^{\mathrm{CVT}/Y}(T) = \kappa^{\mathrm{CVT}/Y}(T)\, k^{\mathrm{CVT}}(T) \qquad [34]$$

where Y indicates the method used to evaluate the quantum effects. The main quantum effect to account for is tunneling through the reaction barrier. We can classify tunneling calculations into three levels, depending on the level of approximation:45 (1) one-dimensional approximations, (2) multidimensional zero-curvature approximations, and (3) multidimensional corner-cutting approximations. Early models correspond to the first level of approximation and are based on the probability of penetration of a mass point through a one-dimensional barrier,46,47 whose shape was usually given by an analytical function, for example a parabola48–50 or an Eckart barrier,51 fitted to the shape of the potential along the reaction path. The method of Wigner38 actually corresponds to the leading term in an expansion in ℏ; as it depends only on the quadratic force constant along the reaction path at the saddle point, it may be considered an approximation to the one-dimensional parabolic result. These one-dimensional models, although historically important, are not very accurate because they do not take into account the full dimensionality of the system under study. Detailed discussion of multidimensional tunneling methods is provided below.

PRACTICAL METHODS FOR QUANTIZED VTST CALCULATIONS

In this section, we provide details of the methods used to compute the quantities needed in quantized VTST rate constant calculations. We start by discussing methods used to define dividing surfaces. Because the reaction path plays an important role in parameterizing dividing surfaces, we first describe methods for its evaluation. We then discuss calculations of the partition functions and numbers of states needed in the rate constant calculations.


The Reaction Path

This section describes some algorithms used to calculate the reaction path efficiently. The evaluation of CVT rate constants requires knowledge of at least part of a reaction path, which can be calculated by one of the steepest-descent methods briefly described in the first subsection below. The second subsection explains a reaction-path algorithm that, at a given value of the reaction coordinate, finds the orientation of the hyperplanar dividing surface that maximizes the free energy. Later on, more general shapes for the dividing surface are discussed.

The Minimum Energy Path

The minimum energy path is the path of steepest descents in isoinertial coordinates from the saddle point into the reactant and product regions. For the general reaction of Eq. [1], in which the reactive system is composed of N atoms (N = N_A + N_B) and i = 1, 2, …, N labels the atoms, we define the 3N Cartesian coordinates as R. The origin of the coordinate system is arbitrary, although it is often convenient to define it as the center of mass of the system. The saddle point geometry in Cartesian coordinates, denoted R‡, is a stationary point, so the first derivatives of the potential energy V with respect to the coordinates vanish there:

$$\nabla V = \left.\frac{\partial V}{\partial \mathbf{R}}\right|_{\mathbf{R}=\mathbf{R}^{\ddagger}} = 0 \qquad [35]$$

It is useful to change from Cartesian coordinates to a mass-scaled coordinate system defined by

$$x_{i\alpha} = \left(\frac{m_i}{\mu}\right)^{1/2} R_{i\alpha} \qquad [36]$$

where m_i is the mass of nucleus i, μ is an arbitrary mass, and α denotes the Cartesian component (x, y, or z). For bimolecular reactions like Eq. [1], it is common either to use the reduced mass of the reactants,

$$\mu_{\mathrm{rel}} = \frac{m_{\mathrm{A}}\, m_{\mathrm{B}}}{m_{\mathrm{A}} + m_{\mathrm{B}}} \qquad [37]$$

or to use a value of 1 amu for μ. For these isoinertial coordinates, the kinetic energy of the nuclear motion simplifies from

$$T = \frac{1}{2}\sum_{i=1}^{N} \sum_{\alpha=x,y,z} m_i\, \dot{R}_{i\alpha}^{2} \qquad [38]$$

to a diagonal form

$$T = \frac{1}{2}\sum_{i=1}^{N} \sum_{\alpha=x,y,z} \mu\, \dot{x}_{i\alpha}^{2} \qquad [39]$$


where ẋ_{iα} represents the derivative of x_{iα} with respect to time. With the latter choice, the numerical value of a coordinate expressed in Å is identical to the numerical value of a mass-weighted28 Cartesian coordinate in amu^{1/2} Å. The motion of the polyatomic system is reduced to the motion of a point of mass μ on a potential surface V, with the classical equations of motion given by

$$\mu \frac{d}{dt}\dot{x}_{i\alpha} = -\frac{\partial V}{\partial x_{i\alpha}} \qquad [40]$$
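The coordinate transformation of Eq. [36] is trivial to apply in practice. The following sketch (ours) scales an (N, 3) Cartesian geometry, with μ = 1 amu as the default if the masses are given in amu:

import numpy as np

def mass_scale(R, masses, mu=1.0):
    """Eq. [36]: x_{i,alpha} = sqrt(m_i / mu) * R_{i,alpha}."""
    R = np.asarray(R, dtype=float)
    m = np.asarray(masses, dtype=float)
    return np.sqrt(m / mu)[:, None] * R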

A generalized transition state is a tentative dynamical bottleneck, and a tentative reaction coordinate is a nearly separable coordinate in a direction from reactants to products. Thermal rate constants are dominated by near-threshold events, and near the reaction threshold, a nearly separable coordinate in a direction from reactants to products is obtained by following the equations of motion but damping out the velocity along the trajectory. With this damping, the equations of motion can be rewritten for an infinitesimal time interval δt as

$$\mu\, \dot{x}_{i\alpha} = -\frac{\partial V}{\partial x_{i\alpha}}\, \delta t \qquad [41]$$

The integration constant is zero because of the assumption of infinitesimal velocity (ẋ = 0 at t = 0). We can rewrite Eq. [41] in vector form as

$$\mu\, d\mathbf{x} = -\nabla V(\mathbf{x})\, d\tau = -\mathbf{G}(\mathbf{x})\, d\tau \qquad [42]$$

where dτ = δt dt. If we define an infinitesimal mass-scaled distance along the path as ds, then

$$ds = \left[\sum_{i=1}^{N}\sum_{\alpha=x,y,z} dx_{i\alpha}^{2}\right]^{1/2} = \frac{|\mathbf{G}(\mathbf{x})|}{\mu}\, d\tau \qquad [43]$$

with |G| being the modulus of the gradient. Substituting Eq. [43] into Eq. [42], we obtain

$$\frac{d\mathbf{x}}{ds} = -\hat{\mathbf{G}}(\mathbf{x}) = \mathbf{v}(\mathbf{x}) \qquad [44]$$

where Ĝ = G/|G| is the normalized gradient, and v is a vector pointing in the direction opposite to the gradient. The MEP can be followed by solving the above differential equation: the displacement along the MEP is given by the steepest-descent direction along v, where s indicates the progression along the path52–55 and x(s) the geometry.


For a practical evaluation of the MEP, the first stage involves locating the transition state (or first-order saddle point) geometry. By convention we place the transition state at s = 0, and we denote its mass-scaled Cartesian geometry by x‡. The reactant and product sides correspond to values of s < 0 and s > 0, respectively. There are very efficient algorithms for locating transition state geometries,56–58 which are available in many popular electronic structure packages. We cannot use Eq. [44] to take a step from the saddle point along the reaction path because the gradient is zero there. At the saddle point, the direction of the MEP is given by the unbound vibrational mode, which requires evaluation of the normal-mode frequencies and eigenvectors at the saddle point. At stationary points, the vibrational frequencies are calculated by diagonalization of the 3N × 3N matrix of force constants F, the second derivatives of the potential with respect to isoinertial Cartesian coordinates scaled to a mass μ. F is also called the Hessian. For instance, for the conventional transition state geometry x‡, this matrix can be diagonalized by performing the unitary transformation

$$\mathbf{L}(\mathbf{x}^{\ddagger})^{\dagger}\, \mathbf{F}(\mathbf{x}^{\ddagger})\, \mathbf{L}(\mathbf{x}^{\ddagger}) = \boldsymbol{\Lambda}(\mathbf{x}^{\ddagger}) \qquad [45]$$

where † denotes the transpose, Λ is the 3N × 3N diagonal matrix with eigenvalues λ_m on the diagonal (with m = 1, 2, …, 3N), and the eigenvectors are arranged as a matrix L whose columns L_m correspond to the 3N normal-mode directions. The normal-mode frequencies at the saddle point can be obtained from the eigenvalues by the relation

$$\omega_m(s=0) = \left[\lambda_m(\mathbf{x}^{\ddagger})\big/\mu\right]^{1/2} \qquad [46]$$

The saddle point has 6 zero eigenvalues (5 if the system is linear), which correspond to the overall rotation and translation of the molecule. We define F as the number of vibrational modes (F = 3N − 6 for a nonlinear molecule or 3N − 5 for a linear molecule); for a saddle point, the first F − 1 modes are bound, with positive eigenvalues and real frequencies. Mode F is unbound, with an imaginary frequency (ω‡) corresponding to motion parallel to the MEP at the saddle point. The eigenvector associated with this frequency is denoted by L_F(x‡). The first geometry along the MEP toward reactants (− sign) and toward products (+ sign) is given by

$$\mathbf{x}(s_1 = \pm\delta s) = \mathbf{x}^{\ddagger} \pm \delta s\, \mathbf{L}_F(\mathbf{x}^{\ddagger}) \qquad [47]$$

where δs is the step length. The sign of L_F(x‡) is chosen so that the vector points from reactants toward products. For the geometry x(s₁) (x₁ hereafter), the gradient is different from zero, and so for the next geometry x₂, or in general for a geometry x_n with n > 1, we can apply Eq. [44] and follow the direction opposite to the normalized gradient:

$$\mathbf{x}_n = \mathbf{x}_{n-1} - \delta s\, \hat{\mathbf{G}}_{n-1} = \mathbf{x}_{n-1} + \delta s\, \mathbf{v}_{n-1} \qquad [48]$$


where we use the shorthand notation Ĝ_n = Ĝ(x_n) and v_n = v(x_n). The above first-order equation gives the MEP geometries by the so-called Euler steepest-descent (ESD) method.59 For an accurate evaluation of the MEP, the step size has to be small because the error is proportional to (δs)². Some other Euler-type methods try to minimize the error, such as the predictor-corrector algorithm,60,61 the optimized Euler stabilization method,59 and the backward Euler method.62 Of all the Euler-based steepest-descent methods, the optimized Euler stabilization method, version 1 (ES1*), is the one that produces the best-converged paths.59 The ESD method provides an initial geometry

$$\mathbf{x}^{(0)}_{n} = \mathbf{x}_{n-1} + \delta s\, \mathbf{v}_{n-1} \qquad [49]$$
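Before the ES1* correction is introduced, it may help to see plain ESD propagation written out. In this sketch (ours), grad is a user-supplied function returning the gradient in mass-scaled coordinates, and x_start is the first point off the saddle from Eq. [47]:

import numpy as np

def esd_path(x_start, grad, ds=1.0e-3, n_steps=1000):
    """Follow the MEP by the Euler steepest-descent rule of Eq. [48]:
    x_n = x_{n-1} - ds * Ghat(x_{n-1})."""
    path = [np.asarray(x_start, dtype=float)]
    for _ in range(n_steps):
        g = grad(path[-1])
        path.append(path[-1] - ds * g / np.linalg.norm(g))
    return np.array(path)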

Then a corrector step is specified as a point at the minimum of a parabolic fit along a line that goes through x_n^{(0)} and is parallel to a "bisector" vector d_n, which is given by60

$$\mathbf{d}_n = \frac{\mathbf{v}(\mathbf{x}_{n-1}) - \mathbf{v}(\mathbf{x}^{(0)}_{n})}{\left|\mathbf{v}(\mathbf{x}_{n-1}) - \mathbf{v}(\mathbf{x}^{(0)}_{n})\right|} \qquad [50]$$

The new geometry is given by

$$\mathbf{x}_n = \mathbf{x}^{(0)}_{n} + \eta\, \mathbf{d}_n \qquad [51]$$

where η is a step along d_n, with a step size proportional to a user-provided parameter δ₂. The correction is not carried out if |v(x_{n−1}) − v(x_n^{(0)})| < ω, with ω being a small value characteristic of a small angle between gradients. The algorithm is sensitive to the values of δ₂ and ω, and in the ES1* method it is recommended that both values be set according to recommendations59 based on systematic studies of convergence, those values being δ₂ = δs and ω = 0.01.

The above methods are based on a local linear approximation to the energy, with quadratic information being used only at the saddle point. Another possibility is to use algorithms, in general more accurate, that exploit higher order information about the potential energy. Page and McIver63 have presented a successful method that does this. First, a cubic expansion of the potential energy surface around the saddle point is used to take the initial step along the MEP. In this case, the first point along the reaction path is given by

$$\mathbf{x}(s_1 = \pm\delta s) = \mathbf{x}^{\ddagger} \pm \delta s\, \mathbf{L}_F(\mathbf{x}^{\ddagger}) + \frac{1}{2}(\delta s)^{2}\, \mathbf{c}(\mathbf{x}^{\ddagger}) \qquad [52]$$


where the vector c(x‡) is defined by

$$\mathbf{A}\, \mathbf{c}(\mathbf{x}^{\ddagger}) = \mathbf{C}(\mathbf{x}^{\ddagger})\, \mathbf{L}_F(\mathbf{x}^{\ddagger}) - \mathbf{L}^{\dagger}_F(\mathbf{x}^{\ddagger})\, \mathbf{C}(\mathbf{x}^{\ddagger})\, \mathbf{L}_F(\mathbf{x}^{\ddagger})\, \mathbf{L}_F(\mathbf{x}^{\ddagger}) \qquad [53a]$$

where

$$\mathbf{A} = 2\, \mathbf{L}^{\dagger}_F(\mathbf{x}^{\ddagger})\, \mathbf{F}(\mathbf{x}^{\ddagger})\, \mathbf{L}_F(\mathbf{x}^{\ddagger})\, \mathbf{I} + \left[2\, \mathbf{L}_F(\mathbf{x}^{\ddagger})\, \mathbf{L}^{\dagger}_F(\mathbf{x}^{\ddagger}) - \mathbf{I}\right] \mathbf{F}(\mathbf{x}^{\ddagger}) \qquad [53b]$$

with I being the identity matrix, and C(x‡) is given by a finite-difference expansion of the force constant matrix around the saddle point with a preselected step δ₃:

$$\mathbf{C}(\mathbf{x}^{\ddagger}) = \frac{\mathbf{F}\left(\mathbf{x}^{\ddagger} + \delta_3\, \mathbf{L}_F(\mathbf{x}^{\ddagger})\right) - \mathbf{F}\left(\mathbf{x}^{\ddagger} - \delta_3\, \mathbf{L}_F(\mathbf{x}^{\ddagger})\right)}{2\, \delta_3} \qquad [54]$$

Although the algorithm is cubic, it requires calculations of Hessian matrices only near the saddle point. One of the most popular second-order methods for following the steepest-descent path is the local quadratic approximation of Page and McIver,63 which we call the Page–McIver (PM) algorithm and describe next. At a given geometry x_n along the path, we evaluate the Hessian matrix F_n and diagonalize it using

$$\mathbf{a}_n = \mathbf{U}^{\dagger}_n\, \mathbf{F}_n\, \mathbf{U}_n \qquad [55]$$

where U_n is an orthogonal matrix of column eigenvectors and a_n is a diagonal matrix of eigenvalues. The geometry at the next step along the MEP is given by

$$\mathbf{x}_{n+1} = \mathbf{x}_n + \mathbf{D}_n(\tau)\, \mathbf{v}_n \qquad [56]$$

where

$$\mathbf{D}_n(\tau) = \mathbf{U}_n\, \mathbf{M}_n(\tau)\, \mathbf{U}^{\dagger}_n \qquad [57]$$

and M_n is a diagonal matrix with diagonal elements given by

$$M_{ii}(\tau) = \left[\exp\left(-a_{n,ii}\, \tau\right) - 1\right]\big/a_{n,ii} \qquad [58]$$

The variable τ is a progress variable that is zero at x_n and is related to the reaction coordinate s by

$$\frac{ds}{d\tau} = \left[\left(\frac{d\mathbf{x}}{d\tau}\right)^{\dagger}\left(\frac{d\mathbf{x}}{d\tau}\right)\right]^{1/2} \qquad [59]$$


which can be rewritten as

$$\frac{ds}{d\tau} = \left[\sum_{i=1}^{3N} h_i^{2}\, \exp\left(-2\, a_{n,ii}\, \tau\right)\right]^{1/2} \qquad [60]$$

where

$$\mathbf{h}_n = \mathbf{U}^{\dagger}_n\, \mathbf{G}_n \qquad [61]$$

The next value of the reaction path coordinate, s_{n+1} = s_n + δs, is obtained by choosing the value of τ that satisfies the integral equation

$$\delta s = \int_{0}^{\tau}\left[\sum_{i=1}^{3N} h_i^{2}\, \exp\left(-2\, a_{n,ii}\, \tau'\right)\right]^{1/2} d\tau' \qquad [62]$$

which is numerically integrated by the trapezoidal rule. An option is to evaluate a new Hessian only after a given number of steps along the reaction path rather than after each step, in which case we call the method the modified Page–McIver algorithm.59
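The following sketch shows one way to realize a single local-quadratic-approximation step in the spirit of Eqs. [55]–[58]. It is our illustration rather than the POLYRATE implementation: it is written directly in terms of the gradient (of which v_n is the normalized negative), and near-zero Hessian eigenvalues are handled with the series limit of Eq. [58]:

import numpy as np

def lqa_step(x, g, f, tau):
    """One Page-McIver-type step: diagonalize the Hessian f (Eq. [55]),
    transform the gradient g (Eq. [61]), and displace each eigenmode by
    [(exp(-lam*tau) - 1)/lam] * h, the exact steepest-descent solution
    of the local quadratic model; (exp(-lam*tau)-1)/lam -> -tau as lam -> 0."""
    lam, u = np.linalg.eigh(f)
    h = u.T @ g
    small = np.abs(lam) < 1.0e-10
    m = np.where(small, -tau, (np.exp(-lam * tau) - 1.0) / np.where(small, 1.0, lam))
    return x + u @ (m * h)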

Variational Reaction Path Algorithm

The original approach for defining variational dividing surfaces, once the MEP is determined, is to choose them to be hyperplanes in rectilinear coordinates that are constrained to be orthogonal to the MEP. In this case the dividing surfaces are characterized by a single parameter, the location s along the MEP. In the reorientation of the dividing surface (RODS) method, the dividing surface is not constrained to be orthogonal to the MEP, and its orientation is optimized to maximize the free energy for points along the MEP. The previously described algorithms allow calculation of a well-converged MEP by the steepest-descent path from the saddle point to reactants or to products. However, obtaining a well-converged path may be computationally very demanding, and so some alternative strategies have been suggested34,64 for defining optimum dividing surfaces even if the MEP is not well converged. One such approach is the variational reaction path (VRP) algorithm, which is a combination of the ESD and RODS algorithms. The first geometry along the path can be obtained from Eq. [47] or Eq. [52] as discussed above. The geometries along the path, for instance a given geometry x_n, are obtained by first applying the ESD method to obtain a zero-order approximation to the geometry on the MEP:

$$\mathbf{x}^{(0)}_{n} = \mathbf{x}_{n-1} - \delta s\, \hat{\mathbf{G}}_{n-1} \qquad [63]$$

We define the dividing surface as a hyperplane in rectilinear coordinates that is orthogonal to the unit vector n̂ and passes through the geometry x_n^{(0)}. The potential in the hyperplane is approximated through quadratic terms and is most easily expressed in terms of the generalized normal modes for motion in the (F − 1)-dimensional space of the hyperplane (note that conventional normal modes are defined only at stationary points, so this concept must be generalized for use at geometries where the gradient of the potential does not vanish):

$$V(\mathbf{x}) = V\!\left(\mathbf{x}^{(0)}_{n}\right) + \sum_{m=1}^{F-1}\left[G^{E}_{n,m}(\hat{n})\, Q_m + \frac{1}{2}\, \lambda^{E}_{n,m}(\hat{n})\, Q_m^{2}\right] \qquad [64]$$

where Q_m is the displacement from x_n^{(0)} in generalized normal mode m, and the gradient and force constant along mode m are defined as follows. The gradient vector and Hessian matrix evaluated at x_n^{(0)} are denoted G_n^{(0)} and F_n^{(0)}, respectively. Motion along the vector n̂, as well as rotations and translations, is projected out to give

$$\mathbf{G}^{P,(0)}_{n}(\hat{n}) = \left(\mathbf{I} - \hat{n}\hat{n}^{\dagger}\right)\left(\mathbf{I} - \mathbf{P}^{\mathrm{RT}}\right)\mathbf{G}^{(0)}_{n} \qquad [65]$$

and a projected Hessian matrix

$$\mathbf{F}^{P,(0)}_{n}(\hat{n}) = \left(\mathbf{I} - \hat{n}\hat{n}^{\dagger}\right)\left(\mathbf{I} - \mathbf{P}^{\mathrm{RT}}\right)^{\dagger}\mathbf{F}^{(0)}_{n}\left(\mathbf{I} - \mathbf{P}^{\mathrm{RT}}\right)\left(\mathbf{I} - \hat{n}\hat{n}^{\dagger}\right) \qquad [66]$$

where P^RT is the matrix that projects onto the translations and rotations.65 The gradient vector and force constant matrix in the eigenvalue representation are then given by

$$\mathbf{G}^{E}_{n}(\hat{n}) = \left[\mathbf{L}^{P}_{n}(\hat{n})\right]^{\dagger}\, \mathbf{G}^{P,(0)}_{n}(\hat{n}) \qquad [67]$$

and

$$\boldsymbol{\Lambda}^{E}_{n}(\hat{n}) = \left[\mathbf{L}^{P}_{n}(\hat{n})\right]^{\dagger}\, \mathbf{F}^{P,(0)}_{n}(\hat{n})\, \mathbf{L}^{P}_{n}(\hat{n}) \qquad [68]$$

where Λ^E_n is a diagonal matrix with elements λ^E_{n,m} along the diagonal, and L^P_n is the matrix of eigenvectors that diagonalizes the projected Hessian matrix. The eigenvalues and eigenvectors are ordered so that the first F − 1 correspond to the modes in the hyperplane, and modes F, F + 1, …, 3N correspond to the mode along n̂ and the translations and rotations, which have zero eigenvalues. The normal mode coordinates are defined by

$$\mathbf{Q} = \left[\mathbf{L}^{P}_{n}(\hat{n})\right]^{\dagger}\left(\mathbf{x} - \mathbf{x}^{(0)}_{n}\right) \qquad [69]$$

and the elements F, F + 1, …, 3N will be zero for motion constrained to the hyperplane.


The coordinate along the variational reaction path is then defined as the location of the minimum of the local quadratic potential (Eq. [64]) in the hyperplane through the point given by Eq. [63]; it is given by

$$\mathbf{x}_n = \mathbf{x}^{(0)}_{n} + \mathbf{L}^{P}_{n}(\hat{n})\, \mathbf{Q}^{M}(\hat{n}) \qquad [70]$$

where the minimum in the normal mode coordinates is given by

$$Q^{M}_{m}(\hat{n}) = \begin{cases} -\,G^{E}_{n,m}(\hat{n})\big/\lambda^{E}_{n,m}(\hat{n}), & m = 1, \ldots, F-1 \\ 0, & m = F, \ldots, 3N \end{cases} \qquad [71]$$

In the ESD algorithm, for which x_n = x_n^{(0)}, the value of s along the path is simply given by the arc length between adjacent points on the MEP,

$$s_n = s_{n-1} \pm \delta s \qquad [72]$$

where the sign is negative on the reactant side and positive on the product side of the saddle point. Although x_n is not necessarily equal to x_n^{(0)} for the variational reaction path, it has been found that use of Eq. [72] provides a better estimate of computed rate constants than a method that uses the difference between x_n and x_{n−1} in evaluating s.

A complete description of the variational reaction path approach still requires definition of the vector n̂. If n̂ is chosen to be along the gradient vector G_n^{(0)}, then G_n^{P,(0)}(n̂) and G_n^{E}(n̂) are zero [i.e., (I − n̂n̂†)n̂ = 0, Q^M(n̂) = 0, and x = x_n^{(0)}]. In the variational reaction path approach, the RODS algorithm is used to determine the direction of n̂. The free energy of activation of Eq. [17] is generalized to

$$\Delta G^{\mathrm{GT,o}}\left(T, \mathbf{x}^{(0)}_{n}, \hat{n}\right) = V^{M}_{n}(\hat{n}) - RT \ln\left[\frac{Q^{\mathrm{GT}}\left(T, \mathbf{x}^{(0)}_{n}, \hat{n}\right)}{K^{\ddagger,\mathrm{o}}\, \Phi^{\mathrm{R}}(T)}\right] \qquad [73]$$

where V^M_n(n̂) is the minimum value of the local quadratic potential of Eq. [64], which can be expressed as

$$V^{M}_{n}(\hat{n}) = V\!\left(\mathbf{x}^{(0)}_{n}\right) - \sum_{m=1}^{F-1}\left[G^{E}_{n,m}(\hat{n})\right]^{2}\Big/\left[2\,\lambda^{E}_{n,m}(\hat{n})\right] \qquad [74]$$

Calculation of the partition function needed in the evaluation of the free energy of activation is described in the next section. Once

the partition funcð0Þ ^ with respect to n ^ is tion is evaluated, the optimum value of GGT;o T; xn ; n obtained by applying the conjugate gradient algorithm for which the vector of ð0Þ ^Þ=q^ n is needed. These derivatives are obtained by derivatives qGGT;o ðT; xn ; n finite differences. We denote the optimum value of the unit vector for a point s

148

Variational Transition State Theory

^ðsÞ. This algorithm eliminates some along the variational reaction path as n instabilities of the calculated reaction path and of the generalized normal mode frequencies. At the same time, it allows a larger step size than the normal steepest-descent algorithms.34,64

Evaluation of Partition Functions Calculation of the rate constant involves the ratio of partition functions for the generalized transition state and for reactants. The three degrees of freedom corresponding to translation of the center of mass of the system are the same in the reactants and transition state, and they are therefore removed in both the numerator and the denominator of Eq. [15]. The reactant partition function per unit volume for bimolecular reactions is expressed as the product of partition functions for the two reactant species and their relative translational motion A;B ðTÞQA ðTÞQB ðTÞ R ðTÞ ¼ rel

½75

where A;B rel ðTÞ ¼



2pmrel bh2

3=2

½76

and QA and QB include contributions from internal degrees of freedom (vibrational, rotational, and electronic) for each species. For unimolecular reactions, the reactant partition function involves contributions from just one reactant species. For an atomic reactant, QA(T) and QB(T) have contributions only from the electronic degrees of freedom, whereas for polyatomic species, they are approximated as shown for reactant A: A A Q A ¼ QA el ðTÞQvib ðTÞQrot ðTÞ

½77

In this expression, couplings among the electronic, vibrational, and rotational degrees of freedom are neglected. The calculation of partition functions for bound species is standard in many textbooks and is repeated here for completeness. The electronic partition function is given by h i X A A ¼ d exp bE ðaÞ ½78 QA a el el a¼1

where a is the index over electronic states and daA and EA el ðaÞ are the degeneracy and energy of electronic state a, respectively. Note that the energy of the ground state (i.e., a ¼ 1) is zero. Rotational partition functions approximated

Practical Methods for Quantized VTST Calculations

149

for the rotational motion of a rigid molecule have shown that there is little loss of accuracy (not more than about 1%) if the quantum partition function is replaced by the classical one. For a linear reactant, the classical rigid-rotor partition function is given by QA rot ¼

2IA h2 bsA rot

½79

where IA is the moment of inertia, sA rot is the rotational symmetry number, and h ¼ h=2p. If the reactant is nonlinear, the rotational partition function is  approximated by QA rot

2 31=2 !3 1 4 2 ¼ A pI1A I2A I3A 5 srot h2 b

½80

where I1A , I2A , and I3A are the principal moments of inertia of reactant A. The vibrational partition function is treated quantum mechanically, and as a first approximation, it is evaluated within the harmonic approximation as QA vib ðTÞ ¼

FA X Y

m¼1 nm

h i exp bEA vib;m ðnm Þ

½81

where FA ¼ 3NA  5 (linear) or FA ¼ 3NA  6 (nonlinear), NA is the number of atoms in reactant A, and EA vib;m ðnm Þ is the energy of the harmonic vibrational level n in mode m and is given by EA vib;m ðnm Þ

  1 hoA ¼ nm þ m 2

½82

where oA m is the frequency of normal mode m in reactant A. Anharmonic corrections to the vibrational partition functions are discussed below. Generalized Transition State Partition Functions in Rectilinear Coordinates Evaluation of the generalized transition state partition function QGT involves contributions from the 3N  4 internal degrees of freedom in the dividing surface. The three degrees of freedom for overall center-of-mass translation and motion out of the dividing surface are removed. Calculations of generalized transition state partition functions require definition of the dividing surface, which in the most general case described above is specified by a location x(s) along the reaction coordinate and the orientation of the planar divid^ðsÞ. In this section, we describe ing surface given by the unit normal vector n

150

Variational Transition State Theory

calculations for dividing surfaces that are hyperplanes in rectilinear coordinates. Calculations for curvilinear coordinates are described in the next section. As for reactant partition functions, we assume that the coupling among rotation, vibration, and electronic motion may be neglected, so that the generalized partition function can be written as the product of three partition functions: GT GT QGT ðT; sÞ ¼ QGT rot ðT; sÞQvib ðT; sÞQel ðT; sÞ

½83

The electronic partition function is given by QGT el ¼

X a¼1

h i daGT ðsÞ exp bEGT ða; sÞ el

½84

where a ¼ 1; . . . indicates the electronic state, a ¼ 1 denotes the ground electronic state, and daGT ðsÞ and EGT el ða; sÞ are the degeneracy and energy of electronic state a. The electronic energies are measured relative to the energy at the local minimum in the dividing surface with the ground state energy EGT el ða ¼ 1; sÞ ¼ 0. For many molecules, it is sufficient to consider only the electronic ground state, because it is the only one that contributes significantly to the sum. Furthermore, it is usually a very good approximation to make the electronic partition function independent of s in the interaction region. Rotational partition functions are calculated for rigid rotations of the transition state complex and only require knowledge of the geometry x(s). As noted, classical rotational partition functions accurately approximate the quantum mechanical ones. For a linear transition state complex, the classical rotational partition function is given by QGT rot ¼

2IðsÞ h2 bsrot

½85

where I(s) is the moment of inertia and srot is the rotational symmetry number. The rotational partition function for a nonlinear transition state complex is 2 31=2 !3 1 2 4 pI1 ðsÞI2 ðsÞI3 ðsÞ5 ½86 QGT rot ðT; sÞ ¼ srot h2 b where I1 ðsÞ, I2 ðsÞ, and I3 ðsÞ are the principal moments of inertia. Vibrational partition functions are evaluated within the harmonic approximation QGT vib ¼

F 1 Y

m¼1

QGT vib;m ðT; sÞ

½87

Practical Methods for Quantized VTST Calculations

151

Each of the m vibrational partition functions is given by QGT vib;m ¼

X nm

h i exp bEGT vib;m ðnm ; sÞ

½88

where EGT vib;m ðnm ; sÞ is the energy of the harmonic vibrational level nm in mode m, measured relative to VMEP ðsÞ, and is given, analogous to Eq. [82], by EGT vib;m ðnm ; sÞ ¼

  1 nm þ hom ðsÞ 2

½89

where om ðsÞ is the frequency of normal mode m for the dividing surface ^ðsÞ. The sum in Eq. [88] should terminate when the lowest defined by x(s) and n dissociation energy of the system is reached,30 but because, in general, the contribution from high energy levels is negligible, the sum can include all harmonic levels and so we get an analytical expression of the type: 

 1 exp  bhom ðsÞ 2 ðT; sÞ ¼ QGT vib;m f1  exp½bhom ðsÞg

½90

The harmonic frequencies {x1 ðsÞ; . . . ; xF1 ðsÞ} needed for the vibrational partition functions correspond to those obtained by making a quadratic expansion of the potential in the vicinity of the reaction path for motion constrained to stay on the dividing surface. Calculation of harmonic frequencies for planar dividing surfaces in rectilinear coordinates is straightforward and described here. At stationary points, the vibrational frequencies are calculated by diagonalization of the 3N 3N Hessian matrix, F, which are the second derivatives of the potential with respect to isoinertial Cartesian coordinates scaled to a mass m. For instance, for the transition state geometry, xz, this matrix is diagonalized as in Eq. [45] to yield the eigenvalues lm ðxz Þ. The normal-mode frequencies at the saddle point can be obtained from the eigenvalues using Eq. [46]. For a location s along the reaction path that is off the saddle point, we want the set of vibrational frequencies fx1 ðsÞ; . . . ; xF1 ðsÞg for motions that are orthogonal to the dividing surface at s. Diagonalization of F[x(s)] for locations where the gradient is not zero will yield normal modes that mix motion in the dividing surface with those orthogonal to it. In this case, motion parallel ^ðsÞ and the six degrees of freedom corresponding to translations and rotato n tion of the molecule can be projected out of the Hessian. In the case where the ^ðsÞ is parallel to the gradient vector, the dividing surface is a hyperplane and n expression for the projection matrix, P, can be found in the article of Miller,

152

Variational Transition State Theory

^ðsÞ is not parallel to Handy and Adams.65 The generalization to cases where n the gradient vector is given by an expression similar to Eq. [66] ^ðsÞ^ ^ðsÞ^ nðsÞy ÞðI  PRT ÞF½xðsÞðI  PRT ÞðI  n nðsÞy Þ FP ¼ ðI  n

½91

Now FP can be diagonalized using the relation: 

y LGT ðsÞ FP ðsÞLGT ðsÞ ¼ LðsÞ

½92

om ðsÞ ¼ ½lm ðsÞ=m1=2

½93

The resulting m ¼ 1; . . . ; F  1 eigenvalues are given by

with directions given by the corresponding vectors LGT m ðsÞ, whose phases (‘‘signs’’) are discussed below Eq. [167]. Generalized Partition Functions in Curvilinear Internal Coordinates In the previous subsections, the dividing surfaces were hyperplanes in rectilinear coordinates; they were orthogonal to the reaction path at the point where they intersect it, and they were labeled by the location s at which they intersect the reaction path. In this section, we consider more general dividing surfaces defined in terms of curvilinear coordinates such as stretch, bend, and torsion coordinates (which are called valence coordinates or valence force coordinates and which are curvilinear because they are nonlinear functions of atomic Cartesians). In general, defining the reaction path provides the value of the reaction coordinate only for points on the reaction path. Defining the dividing surface assigns a value to the reaction coordinate even when the geometry is off the reaction path because one defines the generalized transition state dividing surface so that s is constant in the dividing surface; this means that defining the reaction coordinate off the reaction path is equivalent to defining the dividing surface and vice versa. Making the dividing surface curvilinear means that the expression for the flux in phase space through the dividing surface no longer matches the expression for a classical partition function.32 Therefore one should introduce an additional term C, in addition to the free energy of activation, in the exponent of equations like Eq. [5]. However, as we only calculate the generalized transition state partition function approximately, we do not include this term (which is expected to be small for dividing surfaces defined in terms of stretch, bend, and torsion coordinates32). Changing the definition of the dividing surface changes the generalized transition state partition function even if one makes the harmonic approximation for transverse coordinates because generalized normal mode frequencies computed with the constraint that s is constant will also change if the definition of s off the reaction path changes.66–68

Practical Methods for Quantized VTST Calculations

153

An example showing why curvilinear coordinates are more physical than rectilinear coordinates is provided by an atom–diatom reaction (A þ BC ! AB þ C) with a collinear reaction path where it is clearly more physical to define the reaction coordinate in terms of the AB and BC distances and the ABC bond angle than to define it as a function of the Cartesian coordinates. Displacements from the linear geometry for fixed values of s produce different effects on the geometry when the reaction coordinate is defined in curvilinear coordinates, in which the bond distances stay fixed, as shown in part (a) of Scheme 1, than when it is defined in rectilinear coordinates, in which atoms move along straight-line paths in Cartesian coordinates, as shown in part (b) of Scheme 1. This effect is illustrated in Figure 2. The difference is important because the evaluation of the second derivatives of the potential with different frozen variables produces different harmonic frequencies. The above example indicates that the choice between rectilinear and curvilinear coordinates for the harmonic treatment is equivalent to choosing between two different definitions of the reaction coordinate, s and s0 , for points that are off the reaction path. These two reaction coordinates are equal for geometries on the

Figure 2 Contour plot that shows the projection over the reaction coordinate of a geometry close to the MEP when curvilinear ðs0 Þ or rectilinear ðsÞ coordinates are used.

154

Variational Transition State Theory

reaction path but differ for general geometries. The relation between them is given by the expression:67 s0 ¼ s þ

F1 X F1 1X bij qi qj þ Oðq3i Þ 2 i¼1 j¼1

½94

where qi represents a curvilinear coordinate that is zero on the reaction path and measures the distortion away from it; bij involves second-order partial derivatives of s0 with respect to qi with s held fixed. The Hessian elements evaluated with the two definitions are related by66,67 !  !     q2 V q2 V qV   bij ½95 ¼    qqi qqj 0 0 qqi qqj qs q  0 s q ¼ð0;::::;0;s Þ

s

q¼ð0;::::;0;sÞ

where q0 ¼ fq1 ; q2 ; . . . ; qF1 ; s0 g and q ¼ fq1 ; q2 ; . . . ; qF1 ; sg. It is clear from the above relation that the Hessian and (therefore) the harmonic frequencies depend on the definition of the reaction coordinate except at stationary points, where qV=qs ¼ 0. As the calculated vibrational frequencies of the generalized normal modes depend on the coordinate system, it is important to make the most physically appropriate choice. It has been shown that the curvilinear coordinates produce more physical harmonic frequencies than do the rectilinear coordinates.67,68 This results because the atoms move along straight lines in rectilinear generalized normal modes,69 whereas motions along paths dictated by valence coordinates28,68–72 are much less strongly coupled. (Valence coordinates, also called valence force coordinates, are stretches, bends, torsions, and improper torsions.) The frequencies in the more physical curvilinear coordinates can be obtained by following a generalization of the scheme described by Pulay and Fogarasi,71 as described next. For the N-atom system, the energy V at a geometry (denoted by x in Cartesian coordinates and by q in internal coordinates) close to a reference geometry (denoted by x0 in Cartesian coordinates and by q0 in internal coordinates) can be obtained by a second-order Taylor expansion. In unscaled Cartesian and curvilinear coordinates, the expansions are given by V ¼ V0 þ

3N X i¼1

GR ðRi  R0i Þ þ i

3N 1X FR ðRi  R0i ÞðRj  R0j Þ 2 i;j ij

½96

Fcurv 1X fij ðqi  q0i Þðqj  q0j Þ 2 i;j

½97

and V ¼ V0 þ

Fcurv X i¼1

gi ðqi  q0i Þ þ

respectively, where Fcurv is the number of curvilinear coordinates that are to be used, gi is a component of the gradient in internal coordinates, and fij is an

Practical Methods for Quantized VTST Calculations

155

element of the Hessian in curvilinear coordinates. However, three problems are related to the use of curvilinear coordinates: (1) They are not mutually orthogonal; (2) for more than four atoms, there are more than 3N  6 valence coordinates; and (3) the transformation to Cartesian coordinates is nonlinear. Specifically, the curvilinear coordinates can be written as a power series of the displacements in Cartesian coordinates:28 qi ¼

3N 3N X X Bij ðRj  R0j Þ þ 1 Cijk ðRj  R0j ÞðRk  R0k Þ þ . . . 2 j;k j

½98

where a superscript zero indicates a reference geometry (a stationary point or a point on the reaction path), Bij is an element of the Fcurv 3N Wilson B matrix,   qqi  ; i ¼ 1; . . . Fcurv ; j ¼ 1; . . . ; 3N ½99 Bij ¼ qRj fRj g¼fR0 g j

and Cijk

is an element of the 3N 3N tensor Ci that represents the quadratic term ! q2 qi  i ; i ¼ 1; . . . Fcurv ; j; k ¼ 1; . . . ; 3N ½100 Cjk ¼ qRj qRk fRk g¼fR0 g k

For reactions involving more than four atoms, it is often not obvious which set of 3N  6 internal coordinates best describes the whole reaction path, and in those cases, it is very useful to define the reactive system in terms of redundant internal coordinates.72 Using redundant internal coordinates circumvents (1) destroying the symmetry of the system for highly symmetric reaction paths by omitting a subset of symmetry related coordinates and (2) using an incomplete set of 3N  6 internal coordinates that does not fully span the vibrational space. Therefore, the recommendation is that, for more than four atoms, one should always use redundant internal coordinates to evaluate the generalized normal mode frequencies. In practice, the following procedure68,72 is carried out to calculate the frequencies and generalized normal mode eigenvectors in redundant internal coordinates, where nonredundant internal coordinates are simply a special case and may be used in the same manner. First, the Wilson B and C matrices28 must be constructed. When using redundant internal coordinates, the formulas for the Wilson B and C matrices given above are used, except the number of internal coordinates, Fcurv , is not restricted to be 3N  6. The formulas given above for these matrices are deceptively simple, and in practice, this is the most difficult step,68 although once computer code is available (as in POLYRATE), the code is very general, and no new

156

Variational Transition State Theory

issues need to be considered for further applications. Once these matrices have been constructed, the Wilson G matrix, called GW , is constructed as GW ¼ BuB

y

½101

where u is a 3N 3N diagonal matrix with the reciprocals of the atomic masses on the diagonal. Next, the matrix GW is created using   y  1 0 K y W 0 ½102 G ¼ ðKK Þ 0 0 ðK 0 Þ where K is defined to consist of the eigenvectors of GW corresponding to nonzero eigenvalues, K0 is defined to consist of the remaining eigenvectors, and is defined to contain the nonzero eigenvalues. The generalized inverse of the Wilson B matrix is71 A ¼ uBy GW

½103

Now, the construction of the gradient and force constant matrices in internal coordinates is possible: g ¼ Ay GR f ¼ Ay FR A 

Fcurv X

½104

gi Ay Ci A

½105

i

Then, the gradient and force constant matrices needed to project out the reaction coordinate are created: P ¼ GW GW ~f ¼ PfP

½106

~ g ¼ Pg

The projected Hessian f P is given by      f P ¼ 1  pðsÞ BuBy ~f ðsÞ 1  ½BuBy pðsÞ

½107 ½108 ½109

where p, the nonorthogonal coordinate projection operator, is given at s by p¼

~ g~ gy  ~ g gy BuBy ~

½110

Now it is possible to evaluate the vibrational frequencies using the Wilson GF matrix method,28,73–75

W

GW FW LW ¼ LW 

P

½111

W

where G is defined above, the projected Hessian f is used for F , LW is the matrix of generalized normal mode eigenvectors, and K is the diagonal

Practical Methods for Quantized VTST Calculations

157

eigenvalue matrix. Vibrational frequencies are given in terms of the eigenvalues by ½112 om ¼ ðmm Þ1=2 Next, the vibrational eigenvectors must be normalized. The normalized eigenvector matrix is given by ^ W ¼ LW W L

½113

where Wij ¼ and

qffiffiffiffiffiffi Cij dij

½114

iy h C ¼ ðLW Þ1 GW ðLW Þ1

½115

The Cartesian displacement normal-mode eigenvectors are ^ W ¼ ALW W ^ W ¼ AL w ¼ uBy ðGW Þ1 L

GT

½116

Finally, the elements of the rectilinear eigenvector matrix, L , which are needed for multidimensional tunneling calculations (see Eqs. [164], [170], and [171]) are given by LGT ij ¼ "

ðmi =mÞ1=2 P k

ðmk =mÞwkj 2

1=2

#1=2 ¼ "

P k

mi wij #1=2

½117

mk wkj 2

Loose Transition States Although the POLYRATE program is very general, the definitions it uses for the generalized transition state dividing surfaces are most appropriate for reactions with non-negligible barriers and tight transition states. For many association–dissociation reactions, the transition state is located at a position where two fragments have nearly free internal rotation; in such cases, one may wish to use even more general definitions of the dividing surfaces;76,77 these are not covered in the current tutorial. We note though that the methods used above have been used successfully to treat the association of hydrogen atoms with ethylene to form the ethyl radical.78–80 In recent years, there has been tremendous progress in the treatment of barrierless association reactions with strictly loose transition states.76,77,81–89 A strictly loose transition state is defined as one in which the conserved vibrational modes are uncoupled to the transition modes and have the same frequencies in the variational transition state as in the associating reagents.81,82,84 (Conserved vibrational modes are modes that occur in both

158

Variational Transition State Theory

the associating fragments and the association complex, whereas transition modes include overall rotation of the complex and vibrations of the complex that transform into fragment rotations and relative translational upon dissociation of the complex.) Progress has included successively refined treatments of the definition of the dividing surface and of the definition of the reaction coordinate that is missing in the transition state76,77,81–88 and elegant derivations of rate expression for these successive improvements.85–88 The recent variational implementation of the multifaceted–dividing-surface variational-reaction-coordinate version of VTST seems to have brought the theory to a flexible enough state that it is suitable for application to a wide variety of practical applications to complex combustion reactions of polyatomic molecules. Although some refinements (e.g., the flexibility of pivot point placement for cylindrical molecules like O288) would still be useful, the dynamical formalism is now very well developed. However, this formalism is not included in POLYRATE, and so it is not reviewed here.

Harmonic and Anharmonic Vibrational Energy Levels The partition functions thus far have been assumed to be calculated using the harmonic approximation. However, real vibrations contain higher-order force constants and cross terms between the harmonic normal modes, and they are coupled to rotations. If the cross terms and couplings are neglected, each of the vibrational degrees of freedom is bound by an anharmonic potential given by 1 Vm ¼ kmm ðsÞQ2m þ kmmm ðsÞQ3m þ kmmmm ðsÞQ4m þ . . . 2

½118

where kmm , kmmm , and kmmmm are the quadratic, cubic, and quartic normal coordinate force constants and Q is the vector of normal mode coordinates. In rectilinear coordinates, the relationship between normal modes is given by iy h ½119 Q ¼ LGT ðsÞ ½x  xðsÞ where the transformation matrix is defined by the diagonalization in Eq. [92]. In curvilinear coordinates, the normal modes are defined by the Wilson GF matrix method as described above. For the harmonic approximation, the series is truncated after the first term, and the frequency o is given by pffiffiffiffiffiffiffiffiffiffiffiffiffiffi ½120 om ¼ kmm =m

The partition function for the harmonic approximation is bEHO ~ HO 0 Q QHO vib vib ¼ e HO

¼ ebE0

F Y

m¼1

~ HO Q m

½121 ½122

Practical Methods for Quantized VTST Calculations

159

is the harmonic ground-state energy, which is calculated by EHO 0 ¼ EHO 0 where ~ HO ¼ Q m

F h X om 2 m¼1

1 1

ebhom =2

½123

½124

and om is the harmonic vibrational frequency of mode m, given by Eq. [120]. For generalized transition states, F is replaced by F1 in Eqs. [122] and [123] such that the imaginary frequency is not included. Hindered Internal Rotations (Torsions) One type of anharmonic motion is a hindered internal rotation, or torsion, which can differ substantially from a harmonic normal mode motion. Unlike many other anharmonic motions, torsions can be readily accounted for even in large systems. It has been shown9295 that a vibrational partition function that includes a torsion can be written as ~ tor Q ~ sb Qvib ¼ ebE0 Q

½125

~ tor is a torsion partition function and Q ~ sb is the stretch-bend partition where Q function that ignores the torsional twist angle. A simple and effective equa~ tor is tion90 for calculating Q  FR  ~ ¼Q ~ HO tanh Q Q tor m QI

½126

where QFR is the free rotor partition function given by 1

QFR ¼

ð2pIkTÞ2 hs 

½127

where I is the effective moment of inertia and s is the effective symmetry number. QI is called the intermediate partition function, which is the high-temperature limit of the harmonic oscillator partition function given by QI ¼

kT 1 h o 

½128

where o is the normal mode frequency relating to the torsion. The method thus far has been defined for a single well or multiple wells that are symmetrically equivalent. For multiple wells that are not symmetrically equivalent, the extended method has been defined by Chuang and Truhlar.90

160

Variational Transition State Theory

The frequency, the effective moment of inertia, and the barrier height W, are related to one another by the expression90,91 o¼



W 2I

12

M

½129

where M is the number of wells as the torsion rotates 360 degrees. Therefore, under the assumptions that the effective potential for the torsion is a single cosine term and that the moment of inertia is a constant, only two of the three variables need to be specified to calculate the torsion partition function. The frequency can be determined from normal mode analysis, the barrier height can be determined from electronic structure methods, and the effective moment of inertia is described next. There are several schemes for calculating the effective moment of inertia for internal rotation: the curvilinear (C) scheme of Pitzer and Gwinn,92,93 which requires the choice of an axis (typically a bond) about which the tops are rotating; the rectilinear (R) scheme of Truhlar,94 which only requires that one identify the generalized normal mode that corresponds to the torsion and divide the molecule into parts that rotate against each other; the ok scheme of Ellingson et al.,95 which requires that one identify a torsion coordinate as well as the generalized normal mode frequency corresponding to the torsion; and the oW scheme of Chuang and Truhlar.90 When the torsion is mixed with stretching, bending, or other torsional motions in the generalized normal modes, the user must pick the generalized normal mode that is most dominated by the torsion under consideration. It is not always clear which scheme is most correct, in part because real torsions are usually coupled strongly to overall rotation and sometimes to other vibrational modes as well. As the tops become significantly asymmetric, the R scheme begins to fail, and one should use one of the other methods. The method of calculating the moment of inertia in the C scheme is described here. Let M be the mass of the entire molecule and mi be the mass of atom i, and let the principal moment of inertia be defined as Ij , where j ¼ 1; 2, or 3. All atoms in the molecule are divided into two groups rotating with respect to one another; each group is called a top, and the lighter top is taken as the rotating top. Let the coordinate system be defined such that the z axis is the chosen axis of rotation and the x axis is perpendicular to the z axis and passes through the center of mass of the rotating top, and let the y axis be perpendicular to both x and z. At this point, there are three sets of axes: the original Cartesian axes, the principal moment of inertia axes (labeled 1, 2, or 3), and the axes for the rotating top (labeled x, y, and z). It is important that these sets of axis are all either right handed or left handed. The direction of cosines between the axes of the top and the principal moment of inertia axis j are defined as ajx , ajy , and ajz . The vector from the molecule’s center of gravity to the origin of coordinates for the rotating top is given by r, with its components r1 , r2 , and r3 on the principal moment of inertia axes.

Practical Methods for Quantized VTST Calculations

161

The moment of inertia for the rotating top about the z axis is given by X mi ðx2i þ y2i Þ ½130 A¼ i

where the sum is over the atoms in the rotating top and xi , yi , and zi (used below) refer to the location of atom i on the newly created x, y, and z axis, respectively. The xz product of inertia is given by X mi xi zi ½131 B¼ i

The yz produce of inertia is given by X m i yi z i C¼ i

½132

The off-balance factor is given by



X

mi xi

i

The reduced moment of inertia for internal rotation is given by ( ) X ðajy UÞ2 ðbj Þ2 I ¼A þ M Ij j

½133

½134

where

bj ¼ ajz A  ajx B  ajy C þ Uðaj1;y rjþ1  ajþ1 rj1 Þ

½135

and the superscripts refer to cyclic shifts of axes, such that j  1 ¼ 3 if j ¼ 1, and j þ 1 ¼ 1 if j ¼ 3. POLYRATE uses the value of I calculated for the lighter of the two tops as the C scheme moment of inertia. The R scheme does not require that the axis of rotation be chosen a priori, but it relies on the generalized normal mode eigenvector of the mode corresponding to the torsion to determine the axis. The equations for I in this scheme are given elsewhere.90,91 The ok scheme simply takes the moment of inertia as95 I¼

q2 V otorsion 2 qj2 1

½136

where otorsion is the frequency of the normal mode that most corresponds to the torsion, j is the torsion angle, and the partial derivative in Eq. [136] must be supplied by the user. The partial derivative may be evaluated with other internal coordinates fixed or along a torsion path where other degrees of freedom are optimized for each value of j. The oW scheme uses the barrier height rather than the second derivative with respect to the torsion angle.90

162

Variational Transition State Theory

Morse Approximation I and Other Corrections for Principal Anharmonicity Many other anharmonic methods can be applied, especially for smaller systems. One way to approximate the accurate anharmonic potential along a stretching vibrational coordinate is to use a Morse function:96  2 VM;m ¼ De ðsÞ exp½bM;m ðsÞQm ðsÞ  1

½137

where De is the dissociation energy and the range parameter bM;m is chosen such that the force constant is correct at the minimum of the Morse potential: bM;m ðsÞ ¼



kmm ðsÞ 2De ðsÞ

12

½138

The energy levels for the Morse approximation I are given by EGT vib;m

    1 1 ¼ hom ðsÞ n þ 1  xM;m ðsÞ n þ 2 2

½139

where n is the level index, om is the harmonic frequency, and xM;m is the Morse anharmonicity constant: xM;m ¼

 om ðsÞ h 4De ðsÞ

½140

The choice of De as the lowest dissociation energy of the system relative to VMEP ðsÞ is referred to as the Morse approximation I.20,30,97 The Morse approximation is not appropriate for modes that have kmmm ¼ 0. These types of modes include bending modes of linear systems, out-of-planes bends, and certain stretching motions. Often such modes are better treated by a quadratic-quartic model, given by 1 Vm ¼ kmm ðsÞ½Qm ðsÞ2 þ kmmmm ðsÞ½Qm ðsÞ4 2

½141

Accurate approximations for this model can be determined using a perturbation–variation method.98,99 Spectroscopists call the force constants that have all indices the same the principal force constants, while the anharmonicity associated with the principal force constants is called principal anharmonicity. The Morse and quadratic–quartic approximations treat only principal anharmonicity. However, as mentioned in Eq. [95], neglecting the cross terms between modes is a much more serious approximation in rectilinear coordinates.70 Explicitly including cross terms in rectilinear coordinates is expensive and cumbersome

Quantum Effects on Reaction Coordinate Motion

163

because of the large number of quartic cross terms. One practical step that can be taken to minimize the importance of cross terms is to use curvilinear internal coordinates.68,72,100,101 Not only are the harmonic frequencies more physical in curvilinear coordinates, but anharmonicity is much better approximated by retaining only principal terms in the potential and neglecting couplings.

Calculations of Generalized Transition State Number of States The generalized transition state number of states needed for microcanoGT in the nical variational theory calculations counts the number of states Nvr transition state dividing surface at s that are energetically accessible below an energy E. Consistent with approximations used in calculations of the partition functions, we assume that rotations and vibrations are separable to give i i h X h GT GT GT ðsÞ  E ðn; sÞ; s H E  VMEP ðsÞ  EGT ðn; sÞ N ¼ E  V Nvr MEP rot vib vib n

½142

where H(x) is the Heaviside step function ½HðxÞ ¼ 0 for x < 0 and HðxÞ ¼ 1 for x > 0] and the rotational number of states are calculated classically.

QUANTUM EFFECTS ON REACTION COORDINATE MOTION In the previous sections, we quantized the F  1 degrees of freedom in the dividing surface, but we still treated the reaction coordinate classically. As discussed, such quantum effects, which are usually dominated by tunneling but also include nonclassical reflection, are incorporated by a multiplicative transmission coefficient k(T). In this section, we provide details about methods used to incorporate quantum mechanical effects on reaction coordinate motion through this multiplicative factor. In practice, we have developed two very useful approaches to the multidimensional tunneling problem. In both of these methods, we estimate the rate constant semiclassically, in which case it involves averaging the tunneling probabilities calculated for a set of tunneling energies and tunneling paths. In a complete semiclassical theory, one would optimize the tunneling paths;102 the optimum tunneling paths minimize semiclassical imaginary action integrals, which in turn maximizes the tunneling probabilities. We have found103 that sufficiently accurate results can be obtained by a simpler criterion91 in which, for each energy, we choose the maximum tunneling probability from two approximate results, one, called small-curvature tunneling3,104 (SCT), calculated by assuming that the curvature of the reaction path is small, and the other, called

164

Variational Transition State Theory

large-curvature tunneling (LCT),1,3,7,91,105–110 calculated by assuming that it is large. The result is called microcanonically optimized multidimensional tunneling (mOMT) or, for short, optimized multidimensional tunneling (OMT). The resulting VTST/OMT rate constants have been carefully tested against accurate quantum dynamics,103,111,112 and the accuracy has been found to be very good. The SCT, LCT, and OMT tunneling calculations differ from onedimensional models of tunneling in two key respects: (1) These approximations include the quantized energy requirements of all vibrational modes along the tunneling path. As the vibrational frequencies are functions of the reaction coordinate, this changes the shape of the effective potential for tunneling. (2) These approximations include corner-cutting tunneling. Corner cutting means that the tunneling path is shorter than the minimum energy path. The wave function decays most slowly if the system tunnels where the effective barrier is lowest; however, the distance over which the decay is operative depends on the tunneling path. Therefore, the optimum tunneling paths involve a compromise between path length and effective potential along the path. As a consequence, the optimum tunneling paths occur on the concave side of the minimum energy path; i.e., they ‘‘cut the corner.’’7,52,102,107,113– 119 For the purpose of analyzing the results, it is sometimes of interest to also compute an intermediate result, called zero-curvature tunneling (ZCT), that includes effect (1) but not (2). The rest of this section will provide the details of the ZCT, SCT, LCT, and OMT tunneling approximations.

Multidimensional Tunneling Corrections Based on the Adiabatic Approximation The adiabatic separation between the reaction coordinate and all other F  1 vibrational degrees of freedom means that quantum states in those modes are conserved through the reaction path. With this approximation, we can label the levels of the generalized transition states in terms of the ‘‘one-dimensional’’ vibrationally and rotationally adiabatic potentials Va ¼ VMEP ðsÞ þ EGT int ða; sÞ

½143

where a is the collection of vibrational and rotational quantum numbers and EGT int ða; sÞ is the vibrational–rotational energy level for quantum state a and generalized, transition state at s. Making the rigid-rotor–harmonic-oscillator approximation, EGT int ða; sÞ for the ground rotational state reduces to the energy level for vibrational state n ¼ fn1 ; . . . nF1 g and is given by   X 1 EGT h  o ðsÞ n þ ðn; sÞ ¼ ½144 m m vib 2 m

Quantum Effects on Reaction Coordinate Motion

165

The ground-state adiabatic potential is defined with a ¼ 0, and only the vibrations contribute to the internal energy through zero-point energies in each mode to give VaG ¼ VMEP ðsÞ þ

X hom ðsÞ 2

m

½145

The transmission coefficient is written in terms of the classical and quantum A probabilities, PA C and PQ , respectively, for transmission through or above the adiabatic potential: VA ða; sÞ:54 R1 P A PQ ða; EÞ 0 dE expðbEÞ a P ½146 kA ¼ R 1 PA C ða; EÞ 0 dE expðbEÞ a

The probabilities for classical motion along the reaction coordinate within the adiabatic approximation are simply zero when the energy E is below the maximum VaA of the vibrationally adiabatic potential for state a, and one for energies above the barrier; i.e., h i A ½147 PA C ða; EÞ ¼ H E  Va ðaÞ where H is the Heaviside unit-step function defined below Eq. [142]. Evaluation of the quantum probabilities PA Q is more difficult, and two approximations are made to facilitate evaluation of the numerator of the transmission coefficient. The first approximation is that excited-state probabilities are approximated by the probabilities for the ground state PAG Q , but for a shifted energy, AG A AG PA Q ¼ PQ ½E  Va ðaÞ þ Va 

½148

where VaAG is the barrier height of the ground-state vibrationally adiabatic potential, VaAG ¼ VaA ða ¼ 0Þ

½149

This approximation assumes that the vibrationally adiabatic potentials of all excited states have the same shape as the ground-state vibrationally adiabatic potential. Although this approximation is not strictly valid, it is adequate for two reasons. First, when tunneling is important, the temperature is usually low enough that the transmission coefficient is dominated by the ground state or excited states close to the ground state. Second, contributions of tunneling to the rate constant become unimportant (i.e., k ! 1) as T becomes high enough that excited states with significantly different vibrationally adiabatic potential curves contribute more to the rate constant.

166

Variational Transition State Theory

The second approximation consists in the replacement of quantum probabilities PAG Q by semiclassical ones PSAG ¼ f1 þ exp½2yðEÞg1

½150

where y is the imaginary action integral: 1

yðEÞ ¼ h

s>ððEÞ

s< ðEÞ

n h io12 ds 2meff ðsÞ VaG ðsÞ  E

½151

where VaG is the ground-state adiabatic potential defined in Eq. [145], and s< and s> are the classical turning points, i.e., locations where VaG equals E. The effective mass meff(s) for motion along the reaction coordinate is discussed in the next section. After these approximations, the semiclassical adiabatic ground-state transmission coefficient takes the simplified form R1 b 0 dE expðbEÞPSAG ðEÞ SAG ½152 k ¼ expðbVaAG Þ which requires evaluation of semiclassical reaction probabilities for the ground state only. The integrals in Eq. [146] extend to infinity, but Eqs. [150] and [151] are only valid for energies below the top of the barrier (i.e., for E  VaAG ), which is the tunneling region. For energies above VaAG , the quantum effects (nonclassical reflection) are incorporated by assuming that close to the top of the barrier the shape of the potential is parabolic, and in that case,47 PSAG ðVaAG þ EÞ ffi 1  PSAG ðVaAG  EÞ

½153

where E ¼ E  VaAG . This equation provides a natural extension to Eq. [150], and therefore, the semiclassical probability in the whole range of energies is given by

SAG

P

8 0; > > < f1 þ exp½2yðEÞg1 ; ¼ > 1  PSAG 2VaAG  E ; > : 1;

E < E0 E0  E  VaAG VaAG  E  2VaAG  E0 2VaAG  E0 < E

½154

where E0 is the lowest energy at which it is possible to have tunneling (also called the quantum threshold energy). For instance, for a bimolecular reaction AþB!CþD h i ½155 E0 ¼ max VaG ðs ¼ 1Þ; VaG ðs ¼ 1Þ

Quantum Effects on Reaction Coordinate Motion

167

and for a unimolecular reaction A ! B h 1 1 Pi G E0 ¼ max VaG ðs ¼ sR Þ þ hoR ; V ðs ¼ s Þ þ ho P 2 F a 2 F

½156

where sR and sP indicate the value of s at the reactant and the product minima, respectively. Accurate Incorporation of Classical Threshold Energies The transmission coefficient described above is appropriate41 for correcting the adiabatic theory or equivalently20,30 the microcanonical variation theory, which can be written kmVT ¼

1 R h ðTÞ

ð1

¼

kB T hR ðTÞ

X

dE expðbEÞ

0

a

X a

h

PA C ða; EÞ

i exp bVaA ðaÞ

½157

Reaction coordinate motion is treated classically in this expression and the lowest energy for reaction [i.e., at which PA C ðE; aÞ is not zero], or the classical threshold energy, is the barrier maximum for the ground-state adiabatic potential VaAG . CVT has a different classical threshold energy, which can be seen by writing the CVT rate constant as kCVT ¼

n io h X 1 CVT exp bV ðTÞ a; s a  bhR ðTÞ a

½158

where sCVT ðTÞ is the value of s that minimizes the quantized generalized tran sition state rate constant at temperature T as defined after Eq. [15] above. The ðTÞ instead of classical threshold energy inherent in this expression is VaG ½sCVT  VaAG . Using the transmission coefficient kSAG to correct CVT instead of mVT (or the adiabatic theory) requires correction for the different classical threshold. The CVT rate constant including multidimensional tunneling (MT) in the reaction coordinate is given by kCVT=MT ¼ kCVT=MT ðTÞ kCVT ðTÞ

½159

where k

CVT=MT

¼

b

R1

dE expðbEÞPSAG ðEÞ ðTÞg expfbVaG ½sCVT 

0

½160

168

Variational Transition State Theory

Similarly, corrections are needed for other theories that have inherent classical thresholds different from VaAG , such as conventional TST, in which VaAG in Eq. [152] is replaced by VazG ¼ VaG ðs ¼ 0Þ

½161

Some of the variables explained here are shown in Figure 3 for more clarity. POLYRATE actually calculates the transmission coefficent as R1 b 0 dE expðbEÞPSAG ðEÞ MT ½162 k ¼ expðbVaAG Þ where VaAG is VaG ðs ¼ sAG Þ. Then, instead of using Eq. [159], one uses Eq. [162] but first multiplies the CVT rate by nh io ðTÞÞ  VaAG ½163 kCVT=CAG ðTÞ ¼ exp b VaG ðsCVT 

In early papers we were careful to distinguish kCVT=MT from kMT , but in recent papers, we often call both of these quantities kMT and let the reader figure out which one is involved from the context.

Figure 3 Graphic illustration of some important quantities that often appear in variational transition state theory. The transition state is indicated by the z symbol.

Quantum Effects on Reaction Coordinate Motion

169

Zero-Curvature and Small-Curvature Multidimensional Tunneling From the relation between Eq. [150] and Eq. [151], at equal barrier heights, tunneling effects are more important if the particle has a small mass or if the barrier is narrower. This is the reason why tunneling is important when a light particle (for instance, a proton) is being transferred between donor and acceptor. The width at the top of the barrier in VMEP is determined by the magnitude of the imaginary frequency at the transition state, and it is sometimes assumed that a large imaginary frequency indicates a narrower barrier and, as a consequence, more tunneling. However, VMEP ðsÞ is not the effective barrier for tunneling, but as described above, the adiabatic barrier should be used. Complete description of the adiabatic tunneling probabilities requires definition of the effective mass in Eq. [151], which we discuss next. The adiabatic prescription presented above may appear to be a onedimensional approach, because the adiabatic potential is a function of the reaction coordinate s only. However, the reaction path is a curvilinear coordinate and the curvature of the path couples motion along the reaction coordinate to local vibrational modes that are perpendicular to it. The coupling enters into the Hamiltonian for the system through the kinetic energy term and leads to a negative internal centrifugal effect that moves the tunneling path to the concave side of the reaction path. In other words, as also concluded above from a different perspective, the coupling causes the system to ‘‘cut the corner’’ and tunnel through a shorter path than the reaction coordinate.7,52,102,107,113–119 The effect of the coupling is to shorten the tunneling path (relative to the reaction path), decreasing the tunneling integral in Eq. [151] and thereby increasing the tunneling probabilities. Neglecting the coupling in evaluating the tunneling is known as the zero-curvature tunneling (ZCT) approximation. In this case, the tunneling path is the reaction path and the effective mass simplifies to meff ðsÞ ¼ m. The ZCT method has the drawback that tunneling is usually seriously underestimated.54 Marcus and Coltrin115 showed that the effect of the reaction path curvature was to give an optimum tunneling path for the collinear H þ H2 reaction that is the path of concave-side turning points for the stretch vibration orthogonal to the reaction coordinate. If we define dx as the arc length along this new tunneling path, then the effective mass in Eq. [151] is given in terms of the s-dependent Jacobian factor dx/ds by meff ¼ m ðdx=dsÞ2 . The small-curvature tunneling (SCT) method was developed to extend this approach to threedimensional polyatomic reactions and to eliminate problems with the Jacobian becoming unphysical.1,116 In this approach, an approximate expression for dx=ds is written in terms of the curvature components coupling the reaction path to the vibrational modes and the vibrational turning points.3,116 The coupling between the reaction coordinate and a mode m perpendicular to it is given by a curvature component defined by65 BmF ¼ ½signðsÞ

3N X d^ ni ðsÞ i¼1

ds

LGT i;m ðsÞ

½164

170

Variational Transition State Theory

^i is component i of the unit vector perpendicular to the generalized where n transition-state dividing surface at s and LGT i;m is component i of the eigenvector ^ at s. If the reaction path is the MEP, for vibrational mode m perpendicular to n then ^ ¼ vðsÞ n

½165

where v(s) is the unit vector tangent to the MEP at s as defined in Eq. [44]. If ^ðsÞ is defined by the procedure in the subthe reaction path is the VRP, then n section ‘‘Variational Reaction Path Algorithm’’. Note that in either case, the sign of the unit vector is chosen to be opposite or approximately opposite the gradient vector. The modulus of these F  1 couplings corresponds to the curvature along the reaction path:



(

F1 X

m¼1

2

½BmF ðsÞ

)1=2

½166

To evaluate the turning points we make the independent normal mode approximation, where the potential Vm ðs; Qm Þ in mode m at s along the reaction coordinate is given by Eq. [118]. The turning point for vibrational state nm in this mode is obtained by solving the equation: Vm ½s; Qm ¼ tm ðnm ; sÞ ¼ EGT vib;m ðnm ; sÞ

½167

The sign of BmF depend on the phase assigned to the vector LGT m . This is not an issue for harmonic calculations because in such calculations it always enters quadratically. However, for calculations of anharmonic turning points, as in ^ chosen as staEq. [167], we must make the physical choice. With the sign of n ted after eq. [165], we choose the turning point so that BmF Qm< 0, which insures that the turning point is on the concave side. In the harmonic approximation, the vibrational turning point of mode m is given by the expression   ð2nm þ 1Þh 1=2 tm ðnm ; sÞ ¼ mom ðsÞ

½168

The latest version of the SCT method is limited to treatment of tunneling for the ground vibrational state with harmonic treatment of vibrations. In this case, we use the shorthand notation tm ðnm ¼ 0; sÞ ¼ tm ðsÞ for the ground-state turning points. In the original SCT method, we assumed that all modes were extended to their turning points along the tunneling path, and this led to unphysically large

Quantum Effects on Reaction Coordinate Motion

171

tunneling correction factors for reactions with many vibrational modes coupled to the reaction coordinate motion. The final version of SCT, called the centrifugal-dominant small-curvature approximation in the original publication,3 assumes that the corner cutting occurs in the direction along the vector of coupling components BF ðsÞ in the space of the local vibrational coordinates Q. We make a local rotation of the vibrational axes so that BF ðsÞ lies along one of the axes, u1 , and by construction, the curvature coupling in all other vibrational coordinates, ui , i ¼ 2 to F  1, are zero in this coordinate system. The effective harmonic potential for the u1 vibrational mode is written as 1 oðsÞ2 u21 V ¼ VMEP ðsÞ þ m½ 2

½169

where the harmonic frequency for this motion is given by ¼ o

2 !12 F1  X Bm;F ðsÞ om ðsÞ kðsÞ m¼1

½170

The turning point t for zero-point motion in this harmonic potential takes the form 1  2 h t ¼ ¼ m oðsÞ 

!14  F1  X Bm;F ðsÞ 2 4 ½tm ðsÞ kðsÞ m¼1

½171

The Jacobian factor dx/ds for the path defined by these turning points is expressed in terms of the curvature and turning points by

where

o12 n dx=ds ¼ ½1   aðsÞ2 þ ðdt=dsÞ2

½172

 a ¼ jkðsÞ tðsÞj

½173

This expression has a singularity when the turning point is equal to the radius of curvature and is unphysical for values that are larger; i.e., t 1=k. The problem can be solved by using an exponential form, in which case the effective mass for the SCT method is written as   n exp 2 aðsÞ  ½ aðsÞ2 þ ðdt=dsÞ2 =m ¼ min mSC eff 1

½174

From the above expression, it is clear that mSC eff  m, and therefore, the transmission coefficients obtained by the small-curvature approximation are always equal to or larger than the zero-curvature transmission factors. As shown, if

172

Variational Transition State Theory

the curvature along the reaction path is small or intermediate, it is possible to treat tunneling, without explicit evaluation of the tunneling path, by using an effective mass, which is a function of the reaction path curvature.

Large Curvature Transmission Coefficient The SCT method is appropriate for use in reactions with small reaction path curvature. For systems with intermediate to large tunneling, the largecurvature tunneling methods1,3,7,91,105–110,120,121 have been developed that build on the adiabatic approach, but they go beyond it to include important features affecting tunneling in large-curvature systems. The first important feature is that the tunneling paths are straight-line paths that connect the reactant and the product valleys of the reaction. A straight-line path is the shortest possible path between turning points in the reactant and product valleys, but the effective potential along this path is no longer the adiabatic potential and it can have a maximum that is larger than the adiabatic barrier maximum. Shortening the path decreases the tunneling integral, thus increasing the tunneling probability, while increasing the potential does the opposite. The optimal tunneling paths for large-curvature systems are often straight-line paths because the effect of shortening the tunneling path dominates for these systems. The second important feature is nonadiabatic tunneling, which is the possibility of tunneling into excited states for exoergic reactions or the possibility of tunneling from excited states for endoergic reactions. Finally, the straight-line tunneling paths go through regions of the PES, which are far from the MEP. We call this region the reaction swath. In this section, we start by describing the large-curvature tunneling method for systems dominated by tunneling from/to the ground vibrational states of reactants/products. We then describe how vibrationally excited states are included in the calculations and the general procedure to evaluate the LCG4 tunneling probabilities.110 Finally, we describe how to carry out these calculations by sampling the reaction swath efficiently.120,121 At this point, it may be helpful to make some comments about how excited states enter the tunneling calculations. First consider the zero-curvature approximation. Here both the transverse vibrational and the rotational quantum numbers are conserved in the tunneling region, and the process is vibrationally adiabatic.54 Next consider the small-curvature approximation. This is not really adiabatic because the tunneling path is affected by reaction path curvature, which is a manifestation of coupling to transverse modes.116 Nevertheless, when we calculate a ground-state-to-ground-state process by the SCT approximation, we do not actually assume that the reactants and products are in the ground states.122 What we assume is that the system tunnels in the ground level of the quantized transition state.123 Outside the tunneling region, the transverse quantum numbers may be vibrationally adiabatic, and probably they are vibrationally nonadiabatic whenever there are

Quantum Effects on Reaction Coordinate Motion

173

low-frequency modes; in addition, the process is probably usually rotationally nonadiabatic.122,123 But in the dynamical bottleneck region where tunneling occurs, the transverse modes conserve their quantum number, or at least they are assumed to do so. Next consider the large-curvature approximation. Here one cannot even assume that the transverse quantum numbers of high-frequency modes are conserved even during the tunneling process itself.124 One cannot describe the wave function in the strong-interaction region, where tunneling occurs, in terms of asymptotic or adiabatic modes; instead one uses a diabatic representation in which all nonadiabaticity is associated with a single diabatic mode, which correlates more than one asymptotic mode of the product. This yields a recipe for calculating a realistic tunneling probability. To explain the algorithm, we will first consider the case where all quantum numbers are considered, even for this one diabatic mode; this case is treated in the Subsection below. Then we consider the case where tunneling proceeds in part into vibrationally excited levels of the product. Large-Curvature Tunneling Without Vibrational Excitations As stated, the large-curvature tunneling (LCT) methods use the groundstate vibrationally adiabatic potential to define classical reaction-coordinate turning points for a total energy E by inverting the equation VaG ðsi Þ ¼ E; i ¼ 0; 1

½175

to obtain s0 ðEÞ and s1 ðEÞ, which are the turning points in the reactant and product valleys, respectively. One major departure from the adiabatic theory is that tunneling at total energy E is not initiated just from the reactant classical turning points at s0 ðEÞ, but it occurs all along the entrance channel up to the turning point. Another departure is that tunneling occurs along straightline tunneling paths connecting the reactant and product valleys, rather than the curvilinear path defined by the reaction path, vibrational turning points, and curvature couplings. Finally, tunneling is assumed to be initiated by vibrational motions perpendicular to the reaction coordinate rather than motion along the reaction coordinate. The end points of the tunneling paths in the reactant and product valleys are defined as ~s0 and ~s1 , and they obey the resonance condition VaG ð~s0 Þ ¼ VaG ð~s1 Þ

½176

This expression provides a relationship between ~s0 and ~s1 so that either one or the other is an independent variable. Unless stated otherwise, we use ~s0 as the independent variable, and when ~s1 appears, its dependence on ~s0 is implicit. The tunneling path is a straight-line path in mass-scaled Cartesian coordinates defined by ^ð~s0 Þ xðx; ~s0 Þ ¼ xRP ð~s0 Þ þ x g

½177

174

Variational Transition State Theory

where x denotes the progress variable along the linear path. The unit vector along the tunneling path is defined by ^ð~s0 Þ ¼ g

xRP ð~s1 Þ  xRP ð~s0 Þ xP

½178

where xRP ð~s0 Þ and xRP ð~s1 Þ are mass-weighted Cartesian coordinates at the termini of the tunneling path, which lie on the reaction path at ~s0 and ~s1 , respectively, and xP is the length of the path xP ¼ jxRP ð~s1 Þ  xRP ð~s0 Þj

½179

so that x equals the distance from xRP ð~s0 Þ along the path. For simplicity of notation, we do not explicitly show the dependence of xP on ~s0 . To avoid confusion with coordinates along the straight-line tunneling paths, xðx; ~s0 Þ, we use the notation xRP ðsÞ to denote mass-weighted Cartesian coordinates along the reaction path. The reaction path can be either the MEP or the variational reaction path. The total tunneling amplitude along the incoming trajectory at energy E includes contributions from all tunneling paths initiated in the reactant valley T0 ðEÞ ¼

ð s0 ðEÞ 1

    ^ð~s0 Þ d~s0 v1 s0 Þs1 ð~s0 ÞTtun ð~s0 Þsin w½~s0 ; g R ðE; ~

½180

The tunneling amplitude Ttun ð~s0 Þ is weighted by the classical probability density d~s0 =vR ðE; ~s0 Þ, which is proportional to the time spent between ~s0 and ~s0 þ d~s0 , by the number of collisions per unit time with the vibrational turning ^ð~s0 Þ point in the tunneling direction, t1 ð~s0 Þ, and by the sine of the angle w½~s0 ; g ^ð~s0 Þ, which is a meabetween the vector tangent to the reaction path at ~s0 and g sure of how effectively the perpendicular vibrations initiate motion along the tunneling path. Tunneling can occur during the incoming and outgoing trajectory, so the total tunneling amplitude should be 2T0 ðEÞ. However, to enforce microscopic reversibility, the total tunneling amplitude is given by TðEÞ ¼ T0 ðEÞ þ T1 ðEÞ

½181

where T1 ðEÞ is the tunneling amplitude for the outgoing trajectory in the product channel. The expression for T1 ðEÞ is similar to Eq. [180] except that we use ~s1 as the independent variable instead of ~s0 and the quantities ^ð~s1 Þ are evaluated at locations along the reaction vR ðE; ~s1 Þ; s1 ð~s1 Þ, and w½~s1 ; g path in the product channel. The integrals in Eq. [180] and the analogous equation for T1 ðEÞ extend out to s ¼ 1, but quantities along the reaction coordinate needed to evaluate the integrand are available on a grid that extends to finite values of s. Calculations of the tunneling amplitudes need to be converged with respect to the limits of the grid.

Quantum Effects on Reaction Coordinate Motion

175

The local velocity for a point ~si in the reactant channel (i ¼ 0) or product channel (i ¼ 1) is given by vR ðE; ~si Þ ¼

 h i12 2 ; i ¼ 0; 1 E  VaG ð~si Þ m

½182

^ð~si Þ ^ð~si Þ between the unit vector g The general expression for the angle w½s; g and the unit vector tangent to the reaction path at s is ^ð~si Þ  ^ð~si Þ ¼ g cos w½s; g

dxRP =ds ; i ¼ 0; 1 jdxRP =dsj

½183

^ð~si Þ, which is needed in the expressions for T0 ðEÞ and T1 ðEÞ, is where w½~si ; g obtained by evaluating this expression at s ¼ ~si . The vibrational period sð~si Þ is evaluated for the effective vibrational potential along the tunneling path. This effective potential is obtained by projecting the tunneling path onto the (F  1) vibrational modes perpendicular to the reaction path at ~s0 and computing the potential along this projected straight-line path. In the harmonic approximation, the vibrational period reduces to sð~si Þ ¼

2p ; i ¼ 0; 1 o? ð~si Þ

½184

where the harmonic frequency is expressed as

o? ð~si Þ ¼

(

F1 X

m¼1

½om ð~si Þqm ð~si Þ

2

)1=2

; i ¼ 0; 1

½185

and the components of unit vector along the projected path are given by qm ð~si Þ ¼ 

F1 P

m0 ¼1

^ð~si Þ  LGT g si Þ m ð~ 

2 ^ð~si Þ  LGT g si Þ m ð~

12 ; i ¼ 0; 1

½186

where the eigenvectors are defined in Eq. [92]. Again, the sign of qm depends on the ‘‘sign’’ of the vector LGT m , but it is not an issue because we use the harmonic approximation. The tunneling amplitude for each straight-line path is approximated using a primitive semiclassical expression Ttun ð~s0 Þ ¼ Ttun ð~s1 Þ ¼ exp½yð~s0 Þ; i ¼ 0; 1

½187

176

Variational Transition State Theory

where the action integral along the linear path is

yð~s0 Þ ¼

1 ð n o12 xI ð2mÞ2 ^ð~s0 Þ dx VaG ½sI ðx; ~s0 Þ  VaG ð~s0 Þ cos w½sI ðx; ~s0 Þ; g h  0 ð xIII  II 1 dx Veff ðx; ~s0 Þ  VaG ð~s0 Þ 2 þ

xI

þ

ð xP

xIII

n o12 h i G G ^ð~s0 Þ dx Va ½sIII ðx; ~s0 Þ  Va ð~s0 Þ cos w sIII ðx; ~s0 Þ; g

½188

where for simplicity the dependence of the integration limits on ~s0 are not explicitly shown. The intervals [0, xI] and [xIII, xP] along the tunneling path indicate the reactants region (labeled as I) and the products region (labeled as III), respectively. Regions I and III are called adiabatic because contributions to the action integral can be constructed from the information along the reaction path and the adiabatic potential. In these adiabatic regions, the system tunnels through the adiabatic barrier and the tunneling direction is along the reaction coordinate. Therefore, the contribution to the action integral in these regions is weighted by projections of the tunneling path along the reaction path, which are given by the cos w factors. In the nonadiabatic region [xI, xIII], the tunneling is along the straight-line tunneling path and uses an effective potential, which is described below, in calculation of the contribution from this region to the action integral. The vibrational adiabatic potential that enters Eq. [188] requires determination of s for geometry xðx; ~s0 Þ along the tunneling path. The value of s is defined such that the vector between the geometry along the reaction path xRP ðsÞ and the geometry along the linear tunneling path xðx; ~s0 Þ is perpendicular to the gradient at that s value:

½xðx; ~s0 Þ  xRP ðsÞ 

dxRP ¼0 ds

½189

However, this equation may have multiple solutions. We are interested in two sets of solutions that make s a continuous function of x. The first solution sI ðx; ~s0 Þ is obtained by starting in reactants with x ¼ 0, where sI ðx ¼ 0; ~s0 Þ ¼ ~s0 , and then performing a root search for s at x, with sI ðx ¼ 0; ~s0 Þ as the initial guess for the root search. The procedure is iterated for x þ x using sI ðx; ~s0 Þ as the initial guess for the root search to construct a single-valued and continuous function sI ðx; ~s0 Þ. A second solution sIII ðx; ~s0 Þ is found by starting in products with x ¼ xP , where sIII ðx ¼ xP ; ~s0 Þ ¼ ~s1 and iteratively decreasing x to find a solution starting from the product channel. Once the value of s is

Quantum Effects on Reaction Coordinate Motion

177

found, it is possible to define the generalized normal mode coordinates Qm ½si ðx; ~s0 Þ; i ¼ I or III, by the relation s0 Þ; i ¼ I or III Qm ½si ðx; ~s0 Þ ¼ fxðx; ~s0 Þ  xRP ½si ðx; ~s0 Þg  LGT m ½si ðx; ~

½190

and therefore, at every point along the linear path located in regions I or III, it is possible to assign a unique set of local normal modes. Next we discuss how the boundaries between the adiabatic and nonadiabatic regions are determined. We begin by defining a zeroth-order estimate of the boundaries on the reactant side, x0I . A given geometry xðx; ~s0 Þ lies within this boundary (i.e., x < x0I ) if all three of the following conditions are met: (1) The value of sI ðx; ~s0 Þ calculated by Eq. [189] has to be smaller than ~s1 : sI ðx; ~s0 Þ < ~s1 for x < x0I

½191

(2) All generalized normal mode coordinates are within their vibrational turning points         ½192 Qm ½sI ðx; ~s0 Þ  tm ½sI ðx; ~s0 Þ for x < x0I

where the turning points are defined in Eq. [167] but taking nm ¼ 0. (3) The geometry xðx; ~s0 Þ lies within a single-valued region of the curvilinear coordinates; i.e., 

F1 X

m¼1

BmF ½sI ðx; ~s0 ÞQm ½sI ðx; ~s0 Þ < 1 for x < x0I

½193

where the curvature components are defined in Eq. [164]. Note that LGT m occurs in the definition of both BmF and Qm so the sign cancels out and we don’t have to worry about it here. Similarly, we define a zeroth-order estimate of boundaries on the product side, x0III , by the conditions: sIII ðx; ~s0 Þ > ~s0 for x > x0III     Qm ½sIII ðx; ~s0 Þ tm ½sIII ðx; ~s0 Þ for x > x0

III



F1 X

m¼1

BmF ½sIII ðx; ~s0 ÞQm ½sIII ðx; ~s0 Þ < 1 for x < x0III

½194

½195

½196

The values of the zeroth-order boundaries are now used to determine the boundaries, xI and xIII, in Eq. [188]. Two cases can arise, x0I < x0III , in which the effective potential in Eq. [188] needs to be specified for the nonadiabatic region, and x0I x0III , in which the adiabatic regions overlap and the

178

Variational Transition State Theory

nonadiabatic region does not exist. We discuss the latter case first. When the adiabatic regions overlap, the adiabatic potential in the interval ½xIII ; xI  is calculated as o n ½197 min VaG ½sI ðx; ~s0 Þ; VaG ½sIII ðx; ~s0 Þ For the case x0I < x0III , we define a zeroth-order effective potential for region II II;0 I ðx; ~s0 Þ ¼ V½xðx; ~s0 Þ þ Vcorr ðx0I ; ~s0 Þ Veff i x  x0I h III 0 0 I ~ ~ ðx ; s Þ  V ðx ; s Þ þ 0 V 0 0 III I corr corr xIII  x0I

½198

where the first term is the actual potential along the straight-line tunneling path. The other terms correct for zero-point energy in modes that are within their turning points at the boundaries. Within the harmonic approximation, they are given by i ðx0i ; ~s0 Þ ¼ Vcorr

F1 h i 1X ; i ¼ I or III hom ðsÞ  mo2m ðsÞQ2m ðsÞ  s¼si ðx0i ;~s0 Þ 2 m¼1

½199

This zeroth-order effective potential is not guaranteed to match up smoothly with the adiabatic potential at the boundaries. To correct for this deficiency, another requirement is added to the three conditions above, namely, (4) the adiabatic potential should be greater than or equal to the zeroth-order effective potential at the boundary. The boundaries xI and xIII of the nonadiabatic region (labeled as II in Figure 4) are thus defined by  II;0 0 ðxi ; ~s0 Þ; i ¼ I or III xi ¼ x0i if VaG si ðx0i ; ~s0 Þ Veff

½200

otherwise the value of xi is defined implicitly by extending the nonadiabatic region until II;0 ðxi ; ~s0 Þ VaG ½si ðxi ; ~s0 Þ ¼ Veff  II;0 0 0 G for Va si ðxi ; ~s0 Þ < Veff ðxi ; ~s0 Þ; i ¼ I or III

½201

For the case where the adiabatic potential is larger than the effective potential, another correction is made to the effective potential. The difference in energy between the boundaries is due to anharmonicity, and therefore, we introduce nonquadratic corrections of the type II;0 i ð~s0 Þ ¼ VaG ½si ðxi ; ~s0 Þ  Veff ðxi ; ~s0 Þ; i ¼ I or III Vanh

½202

Quantum Effects on Reaction Coordinate Motion

179

Figure 4 Effective potential contour plot of a reaction that illustrates some features of the LCG4 method for the evaluation of a linear path at a given tunneling energy. The linear path has a length xP between the two classical turning points ~s0 and ~s1 , and here we consider np ¼ 0. The adiabatic region in the reactant side is labeled as I, the nonadiabatic LCG3 region is labeled as II, the nonadiabatic region that includes the condition given by Eq. [201] is labeled as II*, and the adiabatic region in the product side is labeled as III. The boundaries of the adiabatic region are indicated by a dotted line. The boundaries between the adiabatic and the nonadiabatic regions for the plotted linear path are zoomed in the squares labeled as (a) and (b). In the reactants side, II;0 we consider the case in which VaG ½sI ðx0I ; ~s0 Þ > Veff ðxI ; ~s0 Þ, and in the products side, the II;0 0 opposite case is considered; i.e., VaG ½sIII ðxIII ; ~s0 Þ > Veff ðxIII ; ~s0 Þ.

for the reactant channel (i ¼ I) and for the product channel (i ¼ III). With this correction, the effective potential is given by II I I Veff ðx; ~s0 Þ ¼ V½xðx; ~s0 Þ þ Vcorr ðxI ; ~s0 Þ þ Vanh ð~s0 Þ i h x  xI III I III I þ ðxIII ; ~s0 Þ  Vcorr ðxI ; ~s0 Þ þ Vanh ð~s0 Þ  Vanh ð~s0 Þ Vcorr xIII  xI ½203

Using the original boundaries x0I and x0III and zeroth-order effective potential II;0 Veff ðx; ~s0 Þ and not imposing the addition condition (4) results in the LCG3 method,3 whereas the use of the improved boundaries xI and xIII and effective II ðx; ~s0 Þ results in the LCG4 method.110 potential Veff

180

Variational Transition State Theory

The tunneling amplitude T(E) accounts for tunneling initiated by vibrational motion perpendicular to the reaction coordinate along the incoming and outgoing trajectories. There is also the probability that motion along the reaction coordinate can initiate tunneling at the classical turning point s0 for the reaction coordinate motion. The amplitude for this tunneling contribution is ^½s0 ðEÞg and for the reverse direction is expfy½s0 ðEÞg cos wfs0 ðEÞ; g ^½s0 ðEÞg. The total probability then becomes expfy½s0 ðEÞg cos wfs1 ðEÞ; g 2 PLCG4 prim ðEÞ ¼ jTðEÞj   ^½s0 ðEÞg þ cos wfs1 ðEÞ; g ^½s0 ðEÞg 2 cos wfs0 ðEÞ; g þ 2

expf2y½s0 ðEÞg

½204

This primitive probability can be greater than one because of the integration of the amplitudes over the incoming and outgoing trajectories. Within the uniform semiclassical approximation, the probability should go to 1=2 at the barrier maximum and we enforce this by the uniform expression in Eq. [205] for E  VaAG .109 9 8 h i1 > > LCG4 AG = < P ðV Þ 1 a 1 prim 1 LCG4 ðEÞ PLCG4 ðEÞ ¼ 1 þ P h i1 prim LCG4 AG Þ > > 2 P ðV ; 1 þ PLCG4 ðEÞ : a prim prim ½205

This expression reduces to the primitive probability PLCG4 prim when it is sufficiently small and goes to 1=2 at the barrier maximum, VaAG . We use an expression analogous to Eq. [153] to extend the uniform probabilities to energies above the barrier. Large-Curvature Tunneling with Vibrational Excitations As we mentioned, exoergic reactions can have tunneling into excited states and endoergic reactions can have tunneling from excited states. To simplify the description of the LCG4 tunneling method, we only consider calculations of the tunneling correction factor for the exoergic direction. However, we construct the tunneling correction factor to obey detailed balance, so the tunneling correction factor for the endoergic reaction is the same. Tunneling is assumed to populate excited states of a single receptor mode p in the product channel. The p mode is a linear combination of the generalized transition-state vibrational modes along the reaction coordinate. We provide a description of how this mode is defined below. The primitive probability is obtained by summing over final states with the vibrational quantum number np of the LCG4 receptor vibrational mode PLCG4 nmax ðEÞ ¼

nmax X

np ¼0

PLCG4 prim ðE; np Þ

½206

Quantum Effects on Reaction Coordinate Motion

181

where nmax is the maximum value of np for which the primitive probabilities are included in the sum. PLCG4 nmax ðEÞ is calculated for values of nmax from 0 to Npmax ðEÞ, which is defined below, and used in an expression similar to Eq. [205] to obtain a uniform expression for each nmax. Although PLCG4 nmax ðEÞ increases monotonically with increasing nmax, the uniform expression may not, and so we choose the value of nmax that gives the maximum value: 9 h  AG i1 > LCG4 = 1 1 Pnmax Va LCG4   ðEÞ P PLCG4 ðEÞ ¼ max 1 þ nmax AG nmax > > 2 PLCG4 ; : nmax Va 8 >
) that correspond to a particular tunneling energy, in both the reactant and the product sides, are called classical turning points of reaction-coordinate motion, and they correspond to the limits of integration of Eq. [151]. The longest, but energetically more favorable path, is the MEP [labeled as (a) in Figure 5a], whereas the shortest path, but with the highest energy, corresponds to the straight-line path [labeled as (d) in Figure 5a]. In between there are an infinite number of paths that connect reactants to products at that particular tunneling energy (among them is the SC path, which is labeled as (b) in Figure 5a). Among all the possible paths, we have to find the one that has the largest tunneling probability, which is equivalent to finding the path that, for the correct boundary conditions, minimizes the action [labeled as (c)], i.e., the so-called least-action path (LAP).102,124,126,127 Tunneling calculations based on the LAP are called leastaction tunneling (LAT). Some approximate methods try to find the LAP without its explicit evaluation,128 because the search for the LAP is often unaffordable or not worth the cost for polyatomic systems. One way to circumvent this problem is to evaluate the probability along the straightline path, which is the kind of path3,108,129–131 that dominates in the largecurvature limit and is usually called the large-curvature path (LCP). We can compute both the SCT and the LCT (the T in the acronym stands for tunneling) probabilities, the first being accurate for small-to-intermediate curvature, whereas the second is accurate for intermediate-to-large curvature (and also often reasonably accurate even for small-curvature). As the objective is to find the tunneling mechanism with the largest tunneling probability, an alternative to searching for the LAP is to choose between the maximum of the SCT and LCT probabilities. This new probability is called the microcanonically optimized multidimensional tunneling probability, PmOMT , and it is given by91 P

mOMT

max ¼ E



PSCT ðEÞ PLCT ðEÞ

½238

It has been shown that the mOMT transmission coefficients are comparable in accuracy with the LAT transmission coefficients for atom–diatom reactions.103 Often we just say OMT without including the microcanonical specification in the algorithm (OMT can also mean canonical OMT in which we first thermally average the SCT and LCT probabilities and then choose the larger transmission coefficient). The resulting VTST/OMT rate constants

190

Variational Transition State Theory

have been tested carefully against accurate quantum dynamics,103,111,112 and the accuracy has been found to be very good. Sometimes we just say VTST/MT. The MT acronym (‘‘multi-dimensional tunneling’’) can denote ZCT, SCT, LCT, or OMT, all of which are multidimensional, but we usually use SCT or OMT when we carry out MT calculations.

BUILDING THE PES FROM ELECTRONIC STRUCTURE CALCULATION For the vast majority of chemically interesting systems, a potential energy surface (PES) is not available. When this is the case, there are two options: Create an analytic potential energy function (PEF), or use direct dynamics.62,118,132 The traditional route of creating an analytic high-level PEF requires considerable data (from electronic structure or experiment) and human development time. A new method called multiconfiguration molecular mechanics (MCMM),133–135 which allows more straightforward creation of a PES from limited data, has recently been developed and is described below. For small to moderately sized systems where electronic structure gradients and Hessians are not overly expensive, direct dynamics is typically the method of choice. Direct dynamics has been defined as ‘‘the calculation of rates or other dynamical observables directly from electronic structure information, without the intermediacy of fitting the electronic energies in the form of a potential energy function.’’132 In this method, information about the PES is calculated by electronic structure methods as it is needed, i.e., ‘‘on the fly.’’ For example, consider the calculation of the MEP using the steepest descent method. A Hessian calculation is done by electronic structure theory at the saddle point, and a step is taken in the direction of the imaginary frequency. At this new geometry, a gradient is requested, which is then calculated using electronic structure theory. That information is passed back to the MEP calculation, a step is taken in the direction of the gradient, and once again a gradient is requested at the new geometry. This iterative process continues until the MEP reaches the desired length, and then it is repeated for the other side of the MEP. In a CVT calculation, Hessians must also be calculated at several points along the path to determine the vibrationally adiabatic ground-state potential energy curve and free energy of activation profile for each value of s. Achieving chemical accuracy by electronic structure calculations is computationally expensive, and the time required calculating a rate constant is governed almost entirely by the time spent calculating the gradients and Hessians. In addition, the accuracy of the rate constant depends on the accuracy of the electronic structure method. Therefore, the user must make judicious decisions about the length of the MEP, how often Hessians are calculated, whether to use options like LCT that require extra information about the PES, and which electronic structure method to use.

Building the PES from Electronic Structure Calculation

191

Direct dynamics calculations can be carried out by interfacing an electronic structure package with POLYRATE, and several such interfaces are available, including MORATE,4,109,136 GAUSSRATE,137 GAMESSPLUSRATE,138 MULTILEVELRATE,139 MC-TINKERATE,140 and CHARMMRATE.141 A key point to be emphasized here is that using so-called ‘‘straight direct dynamics’’ may not be the most efficient approach.142 In straight direct dynamics, whenever the dynamical algorithm requires a potential energy, a gradient, or a Hessian, it is calculated by a full electronic structure calculation. Such algorithmic purity provides one extreme on the spectrum that spans the range from straight direct dynamics to fitting a global potential energy function. However, there are several intermediate possibilities in this spectrum, corresponding to more economical ways of combining electronic structure theory and dynamics. As these algorithmic possibilities are fleshed out, it is not always possible to distinguish whether a calculation should be classified as fitting, as local interpolation (a form of direct dynamics), or as direct.143 In fact, such classification is less important than the ability of the algorithm to reduce the cost for given level of accuracy and size of system, to allow for a given level of accuracy to be applied with affordable cost to larger systems, or to allow more complete dynamical treatments such as large-curvature tunneling, a more expensive treatment of anharmonicity, or a trajectory-based estimate of recrossing. This section will consider interpolation schemes as well as straight direct dynamics.

Direct Dynamics with Specific Reaction Parameters Direct dynamics with specific reaction parameters (SRPs)132 involves the use of an electronic structure method that has been adjusted to reproduce important data for a specific reaction, followed by determining the reaction rate using direct dynamics. The adjusted method is typically parameterized to agree with the correct forward barrier height and possibly also with one experimental or high-level energy of reaction, but it may actually be parameterized for any property that is important for the specific reaction, for example, the potential energy profile along the reaction path.144 When using experimental data, the barrier height is sometimes approximated by the activation energy, although this is not recommended because they may differ by several kcal/mol. High-level frequency calculations may be carried out for reactants and products, yielding an approximation to their zero-point energy and heat capacity, and these data may be used in combination with the experimental enthalpy of reaction to calculate a good approximation to estimate the Born–Oppenheimer energy of reaction. Alternatively, the barrier height and energy of reaction may be calculated from high-level electronic structure methods, such as a correlated wave function theory or density functional theory. Unfortunately such calculations, although often affordable for stationary points, may become prohibitively expensive

192

Variational Transition State Theory

for direct dynamics due to the large number of gradients and Hessians required. In the original application,132 the SRP method was applied to the following reaction: 

Cl ðH2 OÞn þ CH3 Cl0 ! CH3 Cl þ Cl0 ðH2 OÞn ;

n ¼ 0; 1; or 2

½239

In this particular example, a neglect of the diatomic differential overlap (NDDO)145,146 method was created based on semiempirical molecular orbital theory, namely AM1.147,148 The resulting method was referred to as NDDOX , SRP. The adjusted parameters were the one-center, one-electron energies, Umm which were adjusted to achieve the correct electron affinity for Cl and the correct barrier height for the n ¼ 0 reaction. The NDDO-SRP rate constants were compared with those calculated using an accurate PES; the errors for the CVT/ SCT rate constants ranged from 39% at 200 K to 30% at 1000 K for the unsolvated complex. When considering the enormous amount of time required to create an accurate PES compared with the relatively fast SRP direct dynamics calculation, these results are very encouraging. The method also gave good results for the solvated reactions of Eq. [239], where n ¼ 1 and n ¼ 2.

Interpolated VTST MCMM Multiconfigurational molecular mechanics (MCMM)133–135 is an algorithm that approximates a global PES by combining molecular mechanics (MM) with a limited number of energies, gradients, and Hessians based on quantum mechanics. (This is a special case of a dual-level strategy in which one combines a lower and a higher level.) MCMM is an extension of conventional MM (which is only applicable to nonreactive systems) to describe reaction potential energy surfaces. It extends the empirical valence bond method149 so that it becomes a systematically improvable fitting scheme. This is accomplished by combining the rectilinear Taylor series method of Chang, Minichino, and Miller150,151 for estimating V12 in the local region around a given geometry with the use of redundant internal coordinates71,72 for the low-order expansion of the PES and the Shepard interpolation method.152,153 The key to MCMM is the limited number of high-level quantum mechanical data required, because, whether or not one uses interpolation or MCMM, the vast majority of time required to calculate a rate constant is consumed by the electronic structure calculations. It has been shown that potential energy surfaces created using 13 or fewer Hessians can yield accurate rate constants.134 Even greater efficiency can be achieved if one is certain that large-curvature tunneling paths need not be explored and/or if one uses partial high-level Hessians.135

Building the PES from Electronic Structure Calculation

193

In MCMM, the Born–Oppenheimer PES is estimated as being the lowest eigenvalue of the 2 2 potential matrix V:    V11  V V12   ¼0 ½240  V12 V22  V 

where V11 corresponds to the molecular mechanics potential function associated with the well on the reactant side, V22 corresponds to the molecular mechanics potential function associated with the well on the product side, and V12 corresponds to resonance energy function or resonance integral. The lowest eigenvalue VðqÞ of the matrix in Eq. [240] at a given geometry, q, is given by  h i1  1 2 2 2 ðV11 ðqÞ þ V22 ðqÞÞ  ðV11 ðqÞ þ V22 ðqÞÞ þ 4V12 ðqÞ ½241 VðqÞ ¼ 2 where V11 and V22 are calculated by molecular mechanics using the connectivity of reactants and products, respectively, and where q denotes either the R or x coordinate set of Eq. [36] or a set of valence internal coordinates,28,68–72 such as stretch, bend, and torsion coordinates. Therefore V12 ðqÞ2 ¼ ½V11 ðqÞ  VðqÞ½V22 ðqÞ  VðqÞ

½242

Using a suitable quantum mechanical electronic structure method, the energy, gradient, and Hessian can be calculated at an arbitrary geometry, qðkÞ , which is called an interpolation point or a Shepard point. Near qðkÞ , V11 ðqÞ, VðqÞ, and V22 ðqÞ may be expanded as a Taylor series, yielding y y 1 Vðq; kÞ ffi V ðkÞ þ gðkÞ  qðkÞ þ qðkÞ  f ðkÞ  qðkÞ 2

½243

qðkÞ ¼ q  qðkÞ

½244

where

In Eq. [243], V ðkÞ , gðkÞ , and f ðkÞ are the energy, gradient, and Hessian, respectively, at reference point qðkÞ . The diagonal elements of Vnn can be expanded around reference point qðkÞ , yielding

where

y 1 Vnn ðq; kÞ ffi VnðkÞ þ gnðkÞ  qðkÞ þ qyðkÞ  f ðkÞ  qðkÞ 2

VnðkÞ ¼ Vnn ðqðkÞ Þ; gðkÞ n ¼



qVnn qq



q¼qðkÞ

; f ðkÞ n

¼

q2 Vnn qqqq

!

q¼qðkÞ

½245

½246

194

Variational Transition State Theory

Substituting these expressions into Eq. [242] yields an analytic expression for V12 ðqÞ in the vicinity of reference point qðkÞ , given by







y ðkÞ ðkÞ ðkÞ ðkÞ V12 ðq;kÞ2 ffi V1 V ðkÞ V2 V ðkÞ þ V2 V ðkÞ g1 gðkÞ qðkÞ



y

1 ðkÞ ðkÞ ðkÞ ðkÞ þ V1 V ðkÞ g2 gðkÞ qðkÞ þ V2 V ðkÞ qyðkÞ f 1 f ðkÞ 





2 y ðkÞ ðkÞ ðkÞ ðkÞ ðkÞ yðkÞ ðkÞ ðkÞ ðkÞ ðkÞ 1 q þ V1 V f 2 f q q q þ g1 g 2 

 y ðkÞ ½247 g2 gðkÞ qðkÞ Now that expressions for V11 ðqÞ, V12 ðqÞ, and V22 ðqÞ are available in the vicinity of qðkÞ , an expression must be derived for VðqÞ that is globally smooth as q approaches different reference points on the PES. In MCMM, this is done using a Shepard interpolation.152,153 Suppose that a collection of M ‘‘Shepard points’’ is available, for which there are ab initio energies V ðkÞ , gradients gðkÞ , and Hessians f ðkÞ . By using the Shepard interpolation method, the resonance energy function is given by M X S 0 ðqÞ ¼ Wk ðqÞV12 ðq; kÞ ½248 V12 k¼1

where the normalized weight is given by Wk ðqÞ ¼

wk ðqÞ wðqÞ

½249

in terms of unnormalized weights wk (discussed below) and in terms of the normalization constant M þ2 X wðqÞ ¼ wl ðqÞ ½250 l¼1

where the upper limit of the sum is now greater than in Eq. [248] because this sum also includes van der Waals minima (for biomolecular reagents) or chemical minima (for unimolecular reagents) corresponding to the two molecular mechanics structures; the resonance integral is zero by definition at these two structures. 0 is a modified quadratic function given by In Eq. [248], V12 0 ðq; kÞ2 ¼ ½V12 ðq; kÞ2 uðq; kÞ ½V12

where u is a modifier given by ! 8 > d < exp ; ½V12 ðq; kÞ2 > 0 2 uðq; kÞ ¼ ½V ðq; kÞ 12 > : 0; ½V12 ðq; kÞ2  0

½251

½252

Building the PES from Electronic Structure Calculation

195

where d is 108 E2h , and Eh is one hartree. In practice, the expression for V12 ðq; kÞ is given by  y



1

ðkÞy 2 ðkÞ ðkÞ ðkÞ ðkÞ ðkÞ ½253 ½V12 ðq; kÞ ¼ D 1 þ b C qq qq qq þ 2 Choosing constants in Eq. [253] such that Eq. [243] is reproduced when Eq. [251] is substituted into Eq. [241] yields ðkÞ

ðkÞ

DðkÞ ¼ V1 V2 y

bðkÞ ¼

ðkÞ g1

g

ðkÞ

ðkÞ V1

þ

ðkÞ g2

g

½254

ðkÞ

½255

ðkÞ

V2

1  ðkÞ ðkÞ ðkÞ ðg1  gðkÞ Þðg2  gðkÞ Þy þ ðg2  gðkÞ Þ DðkÞ ðkÞ f ðkÞ  f ðkÞ f ðkÞ ðkÞ 2 f þ ðg1  gðkÞ Þy þ 1 ðkÞ ðkÞ V1 V2 VnðkÞ ¼ VnðkÞ  V ðkÞ

CðkÞ ¼

½256 ½257

We can recap this procedure as follows: Electronic structure calculations are used to generate the Taylor series V(q;k) of Eqs. [243] and [244] in the vicinity of point q(k). This V(q) and the Taylor series (Eqs. [245] and [246]) of the reactant and product MM potential energy surfaces are substituted into Eq. [242] to yield a Taylor series of V12 in the vicinity of q(k). Next we will interpolate V12, which is much smoother and easier to interpolate than the original V. As discussed further below, the interpolation of V12 is carried out in valence internal coordinates28,68–72 to avoid the necessity of achieving a consistent molecular orientation, which would be required for interpolation in atomic Cartesians. Finally, we must specify the weighting function wk ðqÞ to be used for interpolation via Eqs. [248] through [252]. Several conditions should be met by the weight wk associated with a particular geometry q(k). These conditions involve the behavior of wk near q(k) and near the other interpolation points 0 qðk Þ with k0 6¼ k. The conditions assure that wk is smooth enough (zero first and second derivatives near all interpolation points) that the left-hand side of Eq. [248] has the same Taylor series, through quadratic terms, at q(k) as 0 ðq; kÞ. The conditions are that of V12 wk ðqðkÞ Þ ¼ 1; 0

all k

wk ðqðk Þ Þ > kD , then k ffi kD and the reaction is controlled by diffusion. A typical value for kD is 4 109 M1s1. Conversely, when kr denotes an ensemble average ði ¼ 1; 2; . . . ; IÞ. The resulting rate constant is kEA-VTST ¼

ð2Þ

ðTÞkð1Þ ðTÞ

½323

This stage-2, step-1 rate expression kEA-VTST is the final quasi-classical rate constant of the two-state process. Equation [323] has sometimes been called the static-secondary-zone rate constant without tunneling, but this term is deceptive because the secondary zone changes from one ensemble member to another and, hence, is not really static. At this point one can include optimized multidimensional tunneling in each ði ¼ 1; 2; . . . ; IÞ of the VTST calculations. The tunneling transmission ð2Þ coefficient of stage 2 for ensemble member i is called ki and is evaluated by treating the primary zone in the ‘‘ground-state’’ approximation (see the section titled ‘‘Quantum Effects on Reaction Coordinate Motion’’) and the secondary zone in the zero-order canonical mean shape approximation explained in the section titled ‘‘Reactions in Liquids’’, to give an improved transmission coefficient that includes tunneling: ð2Þ ð2Þ i i

gð2Þ ¼ hki

½324

with the final stage-2 rate constant being kEA-VTST=OMT ¼ gð2Þ ðTÞkð1Þ ðTÞ

½325

The procedure just discussed for stage 2 includes the thermal energy and entropy of secondary-zone atoms in k(1)(T) and in the determination of each s0,i that is used in stage 2, but the s dependence of these contributions is not included in each MEP. Optionally these effects could be included in a third stage. However, when secondary-zone dynamics are slow on the time scale over which s crosses the barrier189 (or on the time scale of a wave packet traversing the tunneling segment of the reaction path), one is in what Hynes has called the ‘‘nonadiabatic solvation limit.’’190–192 In this limit, the transition state passage occurs with an ensemble average of essentially fixed secondary-zone configurations190–192 because the secondary zone cannot respond to the reaction coordinate motion to provide equilibrium solvation; in such a case, allowing the secondary zone to relax could provide less accurate results

212

Variational Transition State Theory

than stopping after stage 2. Contrarily, if the adjustment of the secondary zone is rapid on the time scale of barrier passage, one can improve the result by adding a third stage,175,178 which we call the equilibrium secondary zone approximation. If invoked, this stage uses free energy perturbation theory all along each MEP to calculate the change in secondary-zone free energy as a function of each si . That change is added to the generalized transition state theory free energy of activation profile for the calculation of both the quasiclassical CVT rate constant and the quantum effects on the reaction coordinate.

GAS-PHASE EXAMPLE: H þ CH4 In this section, CVT/mOMT theory is applied to the H þ CH4 ! H2 þ CH3 reaction by using the Jordan–Gilbert193 (JG) potential energy surface. We select this example because it is one of the few polyatomic systems for which accurate quantum dynamics calculations are available.194–196 (By accurate quantum dynamics, we mean that the nuclear quantum dynamics are converged for a given potential energy surface.) All the VTST calculations have been carried out with POLYRATE–version 9.3.1, and the calculations discussed here reproduce the CVT/mOMT rate constants obtained previously by Pu et al.111,112 First, the reactants, products, and saddle point are optimized. The imaginary frequency at the saddle point of this example has a value of 1093i cm1. The energies calculated at these points yield a classical barrier height of V z ¼ 10:92 kcal=mol and an energy of reaction, E, of 2.77 kcal/mol. From the normal mode analyses performed at the stationary points, the vibrationally adiabatic ground-state barrier at the saddle point is calculated to be VazG ¼ 10:11 kcal=mol, where VazG ¼ VazG  VaG ðs ¼ 1Þ

½326

and the reaction, for the assumed potential energy surface, is slightly exothermic, H0o ¼ 0:01 kcal=mol, where H is the enthalpy. Notice that HTo ¼ GoT at T ¼ 0 K. The MEP was followed over the interval 2:50 ao  s 2:50 ao by using the Page–McIver algorithm with a step size of 0.01 ao, and curvilinear Hessian calculations were performed at every step. The scaling mass that transforms mass-weighted coordinates to mass-scaled coordinates has been set equal to 1 amu. The vibrationally adiabatic ground-state barrier is located at sAG  ¼ 0:182 ao, and the vibrationally adiabatic ground-state barrier height is found to be VaAG ¼ 10:44 kcal/mol. The meaning of  is that this is VaG at its maximum (denoted by A) relative to the value of VaG at reactants, whereas VaAG without  refers to VaG relative to the energy at the classical equilibriun

Gas-Phase Example: H þ CH 4

213

Figure 6 Plot of the MEP (dotted line) and the vibrationally adiabatic ground-state potential curve as calculated in curvilinear (solid line) and rectilinear (dashed line) coordinates for the H þ CH4 reaction.

structure of reactants; this is about 38 kcal/mol as shown in Figure 6. In Figure 6, we plot VMEP and the vibrationally adiabatic potential with vibrations orthogonal to the reaction path treated in both curvilinear and rectilinear (Cartesian) coordinates. It should be noticed that both the MEP and the potential VMEP along the MEP are the same in both systems of coordinates; however, the vibrationally adiabatic potential energy curves are different at nonstationary points because the vibrational frequencies at nonstationary points depend on the coordinate system. The values of the vibrational frequencies along the reaction path are more physical in curvilinear coordinates, as discussed. Once the MEP and the frequencies along it have been calculated, one can calculate the generalized-transition-state-theory free energy profiles, as shown in Figure 7 for T ¼ 200; 300, and 500 K. As indicated in Figure 3, the maximum VaAG of the adiabatic potential need not coincide with the maximum GCVT;o ðTÞ of the free energy of activation profile at a given temperature. The ðTÞ are 0.177, 0.171, and 0.152 ao at T ¼ 200; 300, and 500 K, values of sCVT  respectively as shown in Figure 7. Thus, the CVT rate constant is lower than the conventional TST rate constant because the best dividing surface (the bottleneck) is located at s 6¼ 0. For instance, at T ¼ 300 K, the value of the CVT rate constant is 2.2 1020 cm3 molecule1 s1, whereas the conventional

214

Variational Transition State Theory

Figure 7 Generalized-transition-state free energy of activation along the MEP at three different temperatures for the H þ CH4 reaction.

TST rate constant is 3.6 1018 cm3 molecule1 s1. These rate constants include quantum effects in all the F  1 degrees of freedom perpendicular to the reaction coordinate, but the reaction-coordinate motion is classical; thus, we sometimes call these rate constants hybrid (in older papers) or quasi-classical (in more recent papers). The quantum effects on the reaction coordinate are incorporated by a transmission coefficient as described earlier. Because the maximum of the vibrationally adiabatic potential curve and the maximum of the free energy of activation profile at a given temperature do not coincide, one must employ the classical adiabatic ground-state CAG correction of Eq. [163] in the calculation of the CVT rate constant. Tunneling effects are important at low temperatures for this reaction because a light particle is transferred. The curvature of the reaction path was calculated by Eq. [166], and it is plotted in Figure 8. The small-curvature approximation to the effective mass along the reaction path is calculated by Eq. [174], and its ratio to the scaling mass is also plotted in Figure 8, which shows how the effective mass is reduced along the reaction path. This reduction in the effective mass also reduces the imaginary action integral and therefore increases the tunneling probability. The ZCT transmission coefficients use

Gas-Phase Example: H þ CH 4

215

Figure 8 Plot of meff =m and the reaction path curvature k along the MEP for the H þ CH4 reaction.

an effective mass that is always equal to the scaling mass, because the curvature along the reaction path is neglected in ZCT, and therefore, ZCT transmission coefficients always predict less tunneling than SCT transmission coefficients. The LCT transmission factors are calculated using the procedure described in the section entitled Large Curvature Transmission Coefficient. The larger of the SCT and LCT tunneling probabilities at each tunneling energy is the mOMT transmission probability. Thermally averaging these gives the mOMT transmission coefficient, which is is 18.7 at T ¼ 200 K and 1.57 at T ¼ 500 K. The effect of tunneling on the reaction is further analyzed by finding the energy that contributes most to the ground-state transmission coefficient. Making a change of variable, i.e., letting x ¼ E  VaAG , in Eq. [160] and by using Eqs. [162] and [163], then (ð 0 CVT=mOMT CVT CVT=CAG ¼k ðTÞbk ðTÞ PðxÞ expðbxÞdx k þ

ð1 0

)

PðxÞ expðbxÞdx

E0 VaAG

½327

216

Variational Transition State Theory

Figure 9 Plot of the first integrand of Eq. [327] versus x ¼ E  VaAG at three different energies for the H þ CH4 reaction. The maximum of the curves indicates the representative tunneling energy. The top of the barrier is located at x ¼ 0.

The first integral yields the tunneling contribution to the transmission coefficient, and the integrand is plotted in Figure 9. The curves are the product of the tunneling probability multiplied by the Boltzmann factor. The energy at which this product has a maximum is called119 the representative tunneling energy (RTE). At a given temperature, the RTE indicates the energy at which it is most probable for the particle to tunnel. For instance, at T ¼ 200 K and T ¼ 500 K, the RTE is located 2.02 and 0.31 kcal/mol below the barrier top, respectively. The mOMT transmission factor is larger at lower temperatures because the area under the curve is larger. The CVT/mOMT rate constants are 7:1 1021 and 4:1 1015 cm3 molecule1 s1 at T ¼ 200 and 500 K, respectively, whereas the accurate quantum calculations194–196 are 9.0 1021 and 3.8 1015 cm3 molecule1 s1 at those two temperatures. The average absolute deviation between the CVT/mOMT and the accurate rate constants is only 17% in the range 200–500 K. The performance of CVT/mOMT for this reaction is astonishing, considering that the quantum calculations for this system took several

Liquid-Phase Example: Menshutkin Reaction

217

months, whereas the VTST/mOMT results require only a few seconds of computer time. In particular, the calculations were carried out in less than 30 seconds on an old computer, including full LCT calculations without even using the faster spline algorithm. The calculations are so fast that the slowest part is setting up the input file.

LIQUID-PHASE EXAMPLE: MENSHUTKIN REACTION In this section, VTST is applied to the bimolecular Menshutkin reaction in aqueous solution:159 ClCH3 þ NH3 ! Cl þ H3 CNHþ 3

½328

An important difference of this example from that given earlier is that in this case no analytical potential energy surface was provided to the program. Instead, the electronic structure data needed for the dynamics were calculated ‘‘on the fly’’ by the MN-GSM197 program; that is, direct dynamics was used. The gas-phase electronic structure calculations were carried out with the HF/ 6-31G(d) method, and the MEP was followed by using the Page–McIver algorithm with a step size of 0.01 ao with analytical Hessian calculations every nine steps. Generalized normal modes were calculated using redundant curvilinear coordinates. The calculations in solution were performed with the program MN-GSM–version 5.2, which incorporates the SM5.42, SM5.43, and SM6 solvation models into Gaussian 98.198 The dynamics calculations were carried out with GAUSSRATE–version 9.1, which in this case was modified to serve as an interface between the MN-GSM–v5.2 and POLYRATE–version 9.3.1 programs. The SES calculations were carried out along the gas-phase MEP. In an SES calculation, the solvent is not considered when constructing the MEP, and solvent effects are added separately to create the potential of mean force using Eq. [311]. The solvation free energy was evaluated with the SM5.43 model, and therefore, the SES calculations are denoted as SM5.43/HF/ 6-31þG(d)//HF/6-31G(d) or simply as SM5.43/HF/6-31G(d)//g. The ESP calculations, which include solvent effects when determining geometries of stationary points and points on the reaction path, are denoted as SM5.43/HF/6-31þG(d). The stationary points within the ESP approximation are optimized using the potential of mean force, where this potential has a minimum for reactants and products and a maximum for the transition state in solution. The reaction path was obtained by using the Page–McIver algorithm with a step size of 0.01 ao. We evaluated numerical Hessians, including the effect of solvent, by central differences at every ninth step. Vibrational

218

Variational Transition State Theory

Table 1 Bond Lengths of the Stationary Points in A˚ Gas Phase Reactant van der Waals complex Saddle point Ion pair Products

ESP

RNC

RCCl

RNC

RCCl

1 3.419 1.876 1.548 1.507

1.785 1.793 2.482 2.871 1

1 — 2.263 — 1.476

1.805 — 2.312 — 1

frequencies were calculated in redundant curvilinear coordinates. In the ESP approach, we consider the liquid-phase saddle point on the potential of mean force surface of the solute as the dividing surface for the conventional transition state theory calculations. For Reaction [328], the Cl, C, and N atoms are collinear. The bond lengths between these three atoms in the gas phase and in solution are listed in Table 1, and the energetics of the stationary points are listed in Table 2. For this reaction, solvent effects are very large for products. The aqueous solution stabilizes the charged products, as shown in Table 3. The gas-phase VMEP and the SES canonical mean-shape potential UðsjTÞ are plotted in Figure 10. Note that UðsjTÞ ¼ VMEP ðsÞ þ G0S ðRðsÞ; TÞ

½329

In the gas phase, a transition state exists for reaction only because there is a slightly stable ion-pair structure, which disappears when the geometry is optimized in solution. The maximum of UðsjTÞ in the SES approximation is located at s ¼ 1:60 a0. The maximum of UðsjTÞ along the reaction path at the SM5.43//HF/6-31G(d) level is much closer to reactants than in the gas phase, which was expected, because in solution, products are much more stabilized.

Table 2 Zero-Order Mean Shape Potential of the Stationary Points Relative to Reactants (in kcal/mol) van der Waals complex Saddle point Ion pair Products

Gasa

SESa

ESPa

2.0 36.1 30.6 111.7

0.98 2.61 27.5 38.6

— 13.4 — 35.6

a Reactants absolute energy (in hartrees): 555.277509 (gas); 555.285927 (SES); 555.286366 (ESP).

Liquid-Phase Example: Menshutkin Reaction

219

Table 3 Standard-State Free Energies of Solvation of the Stationary Points in kcal/mol Level NH3 CH3Cl ClCH3. . .NH3 Transition state Cl. . .CH3NHþ 3 Cl CH3NHþ 3

SES

ESP

4.6 0.7 4.2 78.8 78.9 72.0 83.6

5.1 1.4 — 20.1 — 72.0 84.3

The potentials along the reaction paths in the SES and ESP approximations are plotted in Figure 11 using a common reaction coordinate consisting of the difference between the breaking and the forming bonds along the path involving the breaking and forming bond distance in the gas-phase transition state (this reaction coordinate is used only for plotting the two cases on a common scale; the actual reaction coordinates are distance along the gas-phase MEP for the SES cases and along the liquid-phase MEP for ESP). The SES and ESP potentials show similar profiles and therefore similar rate constants at room

Figure 10 Zero-order canonical mean shape potential U for reaction [328] calculated at the HF/6-31 G(d) (gas phase) and SM5.43//HF/6-31G(d) (SES) levels as functions of the reaction coordinate s for the Menshutkin reaction.

220

Variational Transition State Theory

Figure 11 Zero-order canonical mean shape potential U for reaction [328] calculated at the HF/6-31G(d) (gas phase), SM5.43//HF/6-31G(d) (SES), and SM5.43/HF/631G(d) (ESP) levels as functions for the Menshutkin reaction.

temperature (see Table 4). The exception is the conventional TST rate constant in the SES approach, which is about six orders of magnitude higher than the CVT rate constant. This is caused by the very different location of the maximum of the potential in liquid-phase solution as compared with the gas phase. As expected, tunneling is not very important for this reaction, and therefore, the SCT approach for tunneling suffices for this case. Although the above reaction is quite simple, the similarity between the SES and the ESP profiles is stunning if we consider the great difference between the gas-phase and liquid-phase potentials. From this example, we can conclude that, although the ESP allows a more reliable description of the reaction in solution, the SES approach is an inexpensive approach that can sometimes provide a reasonably accurate alternative to the ESP method. Table 4 Rate Constants in cm3 molecule1 s1 k TST CVT CVT/SCT

SES 3.7 1018 2.0 1025 2.9 1025

ESP 3.7 1025 1.9 1025 2.6 1025

Concluding Remarks

221

CONCLUDING REMARKS Transition state theory is based on the assumption of a dynamical bottleneck. The dynamical bottleneck assumption would be perfect, at least in classical mechanics, if the reaction coordinate were separable. Then one could find a dividing surface separating reactants from products that is not recrossed by any trajectories in phase space. Conventional transition state theory assumes that the unbound normal mode of the saddle point provides such a separable reaction coordinate, but dividing surfaces defined with this assumption often have significant recrossing corrections. Variational transition state theory corrects this problem, eliminating most of the recrossing. Variational transition state theory has proved itself to be a flexible and practical tool for finding better transition state dividing surfaces in both simple and complex systems. Such dividing surfaces are called generalized transition states, and the optimum or optimized generalized transition states are called variational transition states. Real chemical reactions involve reactants with quantized vibrations, and this feature must be included in realistic rate constant calculations. Much more accurate rate constants are obtained if vibrations are treated as quantized both in the generalized transition state dividing surface and in the reactants. The reaction-coordinate motion, which is unbound for bimolecular reactions and therefore does not have quantized vibrations, also exhibits quantum effects, especially tunneling and nonclassical reflection. For thermal reactions that involve significant tunneling contributions, it is necessary to treat the overbarrier and tunneling processes in a consistent framework because the fraction of reaction that occurs by a tunneling mechanism tends to decrease gradually as the temperature is increased; this consistency can only be achieved in general if a variational criterion is used to optimize the overbarrier contribution; after such optimization is carried out, the ground-state transmission coefficient approximation and the canonical-mean-shape approximation provide ways of consistently incorporating tunneling effects into variational transition state theory for gas-phase and liquid-phase reactions, respectively. For simple reactions, one needs to consider only a single reaction coordinate, and the isoinertial minimum energy path provides a good choice that is often sufficient. Early work took the transition state dividing surfaces to be hyperplanes perpendicular to the isoinertial minimum energy path and optimized the location of such hyperplanes along this path. The next generation of algorithms either optimized the orientation of hyperplanes or used curvilinear coordinates to define more physical dividing surfaces. The most complete algorithms consider an ensemble of reaction paths. In this way one can account, at least in part, for recrossing the dividing surface defined by a single reaction coordinate. It is not sufficient to merely treat tunneling consistently with overbarrier processes; it must be treated accurately. For overbarrier processes,

222

Variational Transition State Theory

the nonseparability of the reaction coordinate shows up as recrossing, and the nonseparability of the reaction coordinate is even more important for tunneling than for overbarrier processes. Two kinds of nonseparability are recognized. First, the effective barrier along the tunneling coordinate depends on all other degrees of freedom. Second, the tunneling paths themselves tend to be shorter than the minimum energy path, and this path shortening, called corner cutting, depends on the multidimensional shape of the potential energy surface. For small curvature of the minimum energy path in isoinertial coordinates, the effective potential may be calculated vibrationally adiabatically, and tunneling-path shortening may be calculated to a good approximation from the reaction-path curvature. For large curvature of the minimum energy path in isoinertial coordinates, the effective potential is vibrationally nonadiabatic, and one must average over a set of nearly straight tunneling paths that usually cannot be represented in coordinate systems based on the minimumenergy path; special procedures called large-curvature tunneling approximations have been worked out to treat such tunneling consistently with variational transition state theory. This chapter has included a discussion of algorithms for treating all these issues, especially as they are incorporated in the POLYRATE computer program. The POLYRATE program requires information about the potential energy surface, and this can be included in a variety of ways. These include global analytical potential energy surfaces and direct dynamics. In direct dynamics, the energies, gradients, and Hessians required by the algorithms are computed ‘‘on the fly’’ by electronic structure calculations whenever the algorithms call for them. This is called direct dynamics. POLYRATE also includes several interpolation schemes in which the needed energies, gradients, and Hessians are locally interpolated from a small dataset of electronic structure calculations; this is a particularly efficient form of direct dynamics.

ACKNOWLEDGMENTS This work was supported in part by the U.S. Department of Energy (DOE), Office of Basic Energy Sciences (BES), under Grant DE-FG02-86ER13579 and by the Air Force Office of Scientific Research by a Small Business Technology Transfer grant to Scientific Applications and Research Assoc., Inc. A.F.R. thanks the Ministerio de Educacio´n y Ciencia of Spain for a Ramo´n y Cajal research contract and for Project #BQU2003-01639. B.C.G. acknowledges BES support at Pacific Northwest National Laboratory (PNNL). Battelle operates PNNL for DOE.

REFERENCES 1. D. G. Truhlar, A. D. Isaacson, and B. C. Garrett, in Theory of Chemical Reaction Dynamics, Vol. 3, M. Baer, ed., CRC Press, Boca Raton, FL, 1985, pp. 65–137. Generalized Transition State Theory.

CHAPTER 4

Coarse-Grain Modeling of Polymers

Roland Faller

Department of Chemical Engineering & Materials Science, University of California, Davis, Davis, California

INTRODUCTION

Polymers are omnipresent in modern life, so it comes as no surprise that a large number of computational studies have been devoted to them. Still, even with modern supercomputers, it remains out of reach for a single molecular modeling study to derive large-scale polymer properties ab initio, in part because of the size and time scales involved. What is needed are modeling techniques adapted to all relevant length scales and, moreover, we have to combine them in a useful way. This review is devoted to polymer coarse-graining and explains how to carry it out. Not all methods that have been developed can be presented here, so a "How-To" approach covering a limited number of coarse-graining techniques is given. An extremely wide variety of approaches exists in the extant literature,1-18 and the reader is encouraged to consult other recent reviews on polymer coarse-graining, Refs. 19-22. Many reasons exist for applying coarse-graining schemes to polymer simulations (cf. Figure 1), the most important being that one can carry out simulations on meaningful time and size scales. With coarse-graining, the overall structure of a polymer in melts or in solution can be reproduced faithfully except for the atomistic detail. Coarse-graining improves the speed and memory requirements of simulations and allows for longer simulation times, larger system sizes, and longer chains. Simulations of long chains with coarse-graining are necessary because the experimentally relevant chain lengths

[Figure: schematic labeled "Coarse Graining," spanning length scales from 1 nm to 0.5 m.]

Figure 1 The challenge of coarse-graining: Atomistic simulations are easily achieved, but many important questions are focused mainly on the macroscopic length scales.

cannot be accommodated by atomistically detailed simulations. The relevant length scales associated with polymer studies span the range of distances beginning from the distance between bonded atoms, which is on the order of Angstroms, to the contour length of the chain (at a minimum), which is on the order of micrometers. Even if computing speeds increase in the future as they did over the last decades, we are still decades away from doing atomistic simulations of hundreds of chains, each consisting of a few thousand monomers, in a melt. Also, the relevant relaxation times increase as N^3.4 for large chains of length N,23 meaning that very long simulation times are required for such polymer systems; doubling the chain length alone, for example, lengthens the relaxation time by a factor of 2^3.4, or roughly 10. This remains impossible at atomic-level detail for the near future. Many questions about large scales have already been answered with simple bead-spring models. These models can reproduce scaling behaviors, thereby contributing to our basic understanding of how complex systems behave. However, to obtain numerical results that can be compared directly with experiments, one needs a meso-scale model that does not represent a simple generic polymer but instead represents the identity of the specific polymer being studied. A combination of atomistic and meso-scale models is needed, and the models have to be mapped onto each other as uniquely as possible. It is now accepted that for a simulation to be termed "multi-scale," a meaningful and well-defined connection between the various length and time scales is necessary. Techniques have been devised in which simulations on more than one scale are combined to achieve a better understanding of the system as a whole.13,19,20 Molecular dynamics as well as Monte Carlo simulations have been applied in polymer coarse-graining, including techniques that combine atomistic single-chain Monte Carlo results with results from molecular dynamics simulations on the meso-scale,1 the automatic simplex mapping technique,2,24 the inverted Boltzmann method,3,25 and others.4,13,26 In addition to these techniques, in which a clearly identified mapping between length scales has been established, there exists a wide variety of models that can be chosen for computational efficiency on larger scales, but where


the connection to the local atomistic scale is not completely defined. Such models are still valuable because of their ability to reproduce intermediate-scale generic features of polymers at moderate simulation cost. The large group of lattice models falls into this category,5,27-30 as do a number of meso-scale models of the bead-spring type.31-33

DEFINING THE SYSTEM

In any simulation, as in any experiment, one must first define the system being studied. In the case of coarse-graining or multi-scale modeling, we must consider the different models that can be used, their connections, and their relationship to experiments, among other issues.

Choice of Model

If we want to combine simulations on a variety of length scales, logic suggests that the first step is to devise a mapping of the different models being used. Mapping is used here in the mathematical sense that a unique "identification function" is devised. As it turns out, however, this mapping constitutes the third step of the coarse-grain modeling process. The first step is to choose what kinds of models are to be used, whereas the second step is to define at which anchoring points the mapping between those models should take place. Only when these prerequisites are fulfilled can we begin the mapping. The first issue, the choice of models, is dictated by the nature of the problems at hand. There is no "one-size-fits-all" solution to many of the problems associated with large-scale, coarse-grained modeling, and this fact is one of the major conclusions of this review. To decide which models to use, we need to ask two fundamental questions. First, what properties do we want to calculate or reproduce? The second question is directly connected to the first: What length scales do these properties represent, and what effects from other length scales are expected to be relevant? Answers to these questions immediately provide the upper and lower limits for the degree of detail. Sometimes it is not possible to immediately answer the question of important influences; in that case, we must begin by considering all length scales that are smaller than the largest relevant length scale. It is obvious that computer simulations are not "reality" in the same sense as experiments, but a technically correct simulation will always represent truthfully the model on which it is based. So we have to devise models that are in agreement with nature. We will focus here on molecular models, i.e., those models that incorporate individual molecules to some degree, the most common of which are the so-called atomistic models. In this kind of modeling, (almost) every atom is represented by a classical interaction site. Classical


mechanics is here used to mean particles obeying Newton's laws. Although nature is based on quantum mechanics, in most problems in soft-condensed matter, including polymers, the direct quantum effects are often negligible and can usually be omitted from a simulation. Only for motions of hydrogen atoms do quantum effects play a key role, but in most "atomistic" models, such hydrogens are not treated explicitly. Instead, those light hydrogen atoms are usually combined with the heavier atom to which they are attached. These united atom (UA) models thus contain only "heavy" atoms in which the hydrogens are subsumed. Creating such fictitious atoms for purposes of modeling is the second step in a hierarchy of coarse-graining, in which the first step was to neglect the influence of quantum mechanics. The prediction of most local properties requires molecular-level modeling because those properties depend on the existence of individual molecules. In contrast, a density-based field model is often sufficient on larger scales. In the case of lattice models, it is obvious that using any length scale smaller than the mesh size is meaningless. In continuous space models, length scales smaller than the particle size or bond lengths are meaningless as well. A general rule of thumb is that one should not analyze a simulation on a given length scale or time scale unless it is at least an order of magnitude greater than the smallest intrinsic time or length scale of the system being modeled. A good example is the time step in a molecular dynamics simulation; only if a motion is described by at least 10 time steps can we refer to it as being reasonably well described. It may be necessary to use two or more models to cover the range of relevant interactions, depending on the problem at hand. To have a meaningful mapping between scales, there must be a significant overlap between the scales described by the models to be mapped onto each other. Typical models used for simulations can cover three orders of magnitude in time or length. For example, atomistic models can treat length scales of a few hundred picometers to tens of nanometers and time periods from picoseconds to tens of nanoseconds, whereas meso-scale models are useful from a few nanometers to a few micrometers in size and from a few hundred picoseconds up to microseconds in time. In this case, the overlap between atomistic and meso-scale models is sufficient. However, if we need to enter the realm of micrometers or even millimeters and beyond in size, a second or third mapping will be necessary. A large part of the computational literature is devoted to the mathematical identification of the mapping that we now use and also to the applicability of effective pair interactions. Technically, coarse-graining is a model reduction. Let $f_{\mathrm{small}}(\{r\},\{p\})$ be a function describing an observable in the atomistic-scale model. It depends on the positions r and momenta p of the particles (in the case of a particle-based model). Similarly, $f_{\mathrm{large}}(\{R\},\{P\})$ is the same function in the nonatomistic model with positions R and momenta P. It is clear that $f_{\mathrm{small}} = f_{\mathrm{large}}$ should be valid, but for which $(\{R\},\{P\})$? The equality


should also hold if one of the models is field-based and the other is particle-based, i.e., $f_{\mathrm{small}}(\{r\},\{p\}) = f_{\mathrm{large}}(\rho)$, where $\rho$ represents the field. The field is often a density field and, without loss of generality, we can use the field description for the larger scale.
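To make the particle-to-particle part of such a model reduction concrete, the following is a minimal Python sketch, not taken from the chapter, assuming a center-of-mass anchoring of each super-atom; the array shapes and the function name are illustrative only.

import numpy as np

def map_to_superatoms(r_atom, masses, groups):
    # Map atomistic positions {r} onto super-atom positions {R}.
    # r_atom: (n_atoms, 3) coordinates; masses: (n_atoms,) atomic masses;
    # groups: one index array per super-atom listing the atoms it subsumes.
    R = np.empty((len(groups), 3))
    for k, idx in enumerate(groups):
        m = masses[idx]
        # Center-of-mass anchoring; another anchor site (e.g., a single
        # backbone carbon) would simply replace this weighted average.
        R[k] = (m[:, None] * r_atom[idx]).sum(axis=0) / m.sum()
    return R

An observable f_large can then be evaluated on the mapped configurations {R} and compared with f_small evaluated on the underlying atomistic configurations {r}.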

Interaction Sites on the Coarse-Grained Scale

In a meso-scale model, a group of atoms is often replaced by a single interaction center. This center is usually the size of a monomer in a polymer, and it is often called a super-atom. Because the super-atoms are the only interaction centers in a meso-scale simulation, they are required to carry the information of the interactions between the real atoms in their local geometrical arrangements that are imposed by the chemistry of the polymer. The choice of which super-atoms to use is arbitrary in principle, but there exist a number of criteria to consider when making this selection. It is beneficial if the distance between super-atoms along the polymer chain is strictly defined. Figure 2 shows three possibilities for placement of super-atoms along polystyrene and the corresponding distributions of meso-scale bond lengths. This distribution is obtained by performing an atomistic simulation and, afterwards, measuring the distances between the chosen super-atom centers. The choice indicated by (a) is clearly advantageous because it yields a single peak in the normalized frequency of distances. With this placement, we see that only a few atomistic torsional degrees of freedom exist between the super-atom

[Figure: polystyrene repeat units showing three candidate super-atom centers, (a), (b), and (c), together with the corresponding normalized distance distributions h(r)/r^2 [arb. units] plotted against r [nm].]

Figure 2 Various anchor sites for the super-atoms of polystyrene. Reprinted from Computers and Chemical Engineering Volume 29, Q. Sun and Roland Faller, Systematic Coarse-Graining of Atomistic Models for Simulation of Polymeric Systems, pp. 2380–2385, Copyright (2005) with permission from Elsevier.


centers; thus, there is little torsional freedom to influence the rigid structure. Single-peaked distributions can be modeled by a single Gaussian curve. This Gaussian is the distribution of a harmonic bond potential, in which the height-to-width ratio defines the bond strength. In contrast to this situation, a multiplicity of peaks, as in (b) or (c), would lead to an interdependence of super-bonds and super-angles, making it difficult to isolate useful potential energy models. The nonbonded interaction potential of super-atoms is related to the shape of the group of atoms being represented by one super-atom. A favorable modeling scenario exists when this interaction is spherically symmetric, thus avoiding the use of anisotropic potentials. Coarse-graining approaches using anisotropic potentials have shown that little is gained from the much higher complexity of the simulation in return for the slight gains in accuracy.14,15 If a single spherical potential is not satisfactory to represent a monomer as, for example, in polycarbonate polymers,1 it is more economical in terms of computing speed to use more than one spherical super-atom than to use a complex, anisotropic potential energy function. Coarse-graining techniques are based on the idea of effective interaction potentials between super-atoms. The larger-scale model is calibrated against the smaller-scale model, which is considered more reliable and is used as a "gold standard." Thus, any deficiencies of the small-scale model are carried over to the large-scale model. Another issue to take into account when selecting super-atoms and their mapping is whether the polymer under study contains a degree of randomness, as in atactic chains. One can select anchor sites that are the same in both tacticities and ignore the local tacticity information, or one can map meso and racemo dyads separately, leading to a more complex description that, in essence, is a finer-grained coarse-graining step. For the case of polystyrene, both approaches have been taken, and it is not yet clear which is more successful.34,35 In general, the choice of super-atoms and anchor sites needed for the mapping of any system must obey the boundary conditions of the problem at hand.
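As a concrete illustration of the single-Gaussian case, a single-peaked distribution of distances between anchor sites measured in an atomistic trajectory converts directly into a harmonic super-bond. The short Python sketch below assumes Gaussian statistics; the unit choice and the function name are illustrative, not part of the original work.

import numpy as np

def harmonic_superbond(bond_lengths, T):
    # For a Gaussian p(r) ~ exp(-k*(r - r0)**2 / (2*kB*T)), the
    # equilibrium length r0 is the mean and the force constant follows
    # from the variance: k = kB*T / var(r).
    kB = 0.0083145  # kJ/(mol K), consistent with kJ/mol potentials
    r0 = bond_lengths.mean()
    k = kB * T / bond_lengths.var()
    return r0, k  # defines V(r) = 0.5 * k * (r - r0)**2

A multi-peaked distribution, as for choices (b) and (c) in Figure 2, admits no such single-Gaussian fit, which is precisely why choice (a) is preferred.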

STATIC MAPPING

Single-Chain Distribution Potentials

One of the earliest approaches to systematic mapping was that of Tschöp et al.1 Their technique begins with a detailed quantum chemical calculation of a few monomers of the system to obtain energetically favorable local conformations and their relative energies. Those quantum chemically determined distributions are then used to perform single-chain Monte Carlo simulations in a vacuum. The corresponding distributions of super-atom bond


lengths, bond angles, and dihedral angles are recorded. To obtain a correct potential, those distributions have to be weighted by the corresponding Jacobians between local and global coordinate systems, which, for bond lengths, is r^2, coming from the transformation from spherical to Cartesian coordinates. These distributions are then Boltzmann-inverted to obtain intramolecular potentials, i.e., a potential is derived from the temperature-weighted logarithm of the distribution. In a vacuum, the free energy difference between conformations equals the difference in potential energy, as expressed in Eq. [1]:

$V(\xi) = -k_{\mathrm{B}} T \ln p(\xi)$    [1]

In this equation, $\xi$ can stand for bond lengths, bond angles, or torsional angles. The distribution of these structural features, $p(\xi)$, is taken after the Jacobian correction. It is noteworthy that this potential is completely numerical. To calculate its derivative so as to obtain the corresponding forces, smoothing techniques like local splines or running averages are used. This static mapping technique can only be used for single chains in a vacuum; otherwise, the identity between the potential and the free energy cannot be justified.
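In code, the Boltzmann inversion of Eq. [1], including the Jacobian correction, reduces to a few lines. The sketch below uses illustrative names, not the authors' implementation, and caps unsampled bins at 50 kB T, anticipating the treatment of empty angle states discussed later in this chapter.

import numpy as np

def boltzmann_invert(x, counts, T, jacobian=None):
    # x: bin centers of the sampled coordinate (bond length, angle, torsion)
    # counts: histogram of that coordinate from the single-chain simulation
    # jacobian: volume element to divide out, e.g., x**2 for bond lengths
    kB = 0.0083145  # kJ/(mol K)
    p = counts / jacobian if jacobian is not None else counts.astype(float)
    dx = x[1] - x[0]                    # assumes uniform bins
    p = p / (p.sum() * dx)              # normalize the distribution
    V = np.full(x.shape, 50.0 * kB * T) # cap for unsampled bins
    ok = p > 0
    V[ok] = -kB * T * np.log(p[ok])     # Eq. [1]
    return V - V[ok].min()              # shift so the minimum is zero

The derivative needed for the forces is then obtained from a smoothed (e.g., spline) representation of this table, as noted above.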

Simplex

A number of direct ways for linking atomistic and meso-scale melt simulations have been proposed more recently. The idea behind these direct methods is to reproduce the structure or thermodynamics of the atomistic simulation on the meso-scale self-consistently. As this is an optimization problem, mathematical optimization techniques are applicable. One of the most robust (but not very efficient) multidimensional optimizers is the simplex optimizer, which has the advantage of not needing derivatives, which are difficult to obtain in a simulation. The simplex method was first applied to optimizing atomistic simulation models against experimental data.36,37 We can formally write any observable, for example the density $\rho$, as a function of the parameters $B_i$ of the simulation model. In Eq. [2], the density is a function of the Lennard-Jones parameters:

$\rho = f(\{B_i\})$    [2]

This mathematical identification means that we interpret our simulation, along with the subsequent analysis of the observable, as the evaluation of a complex function. This function, in multidimensional space, can be optimized like any mathematical function. For an optimizer to be applicable, one must define a single-valued function with a minimum (or maximum) at the desired target, for example, the sum of squared deviations from the target values in Eq. [3], written here in terms of the Lennard-Jones well depths and diameters:

$f = \sum_i \left[ A(\{\varepsilon_i\}, \{\sigma_i\}) - A_{\mathrm{target}} \right]^2$    [3]


Every function evaluation requires a complete equilibration sequence (either molecular dynamics (MD) or Monte Carlo (MC)) for the given parameters, followed by a production run and the subsequent analysis. To ensure equilibration, one must be certain that no drift in the observables exists, for which an automatic detection of equilibration was developed.36 It has been shown that derivatives of observables with respect to simulation parameters can be calculated in some cases, paving the way for more efficient optimizers.38 In the context of polymer mapping, we point out that typical target functions are not experimental observables but, instead, are related to the structure of the system as characterized by structure factors or by radial distribution functions, although optimization against experimental data has been performed as well.16,17 The single-valued function to be minimized is then the integral over the squared difference of, e.g., radial distribution functions.2 If necessary, the radial distribution function can be multiplied by a weighting function w(r), as in Eq. [4].2,3,35 Because the local structure is most difficult to reproduce, an exponential decay $w(r) \propto \exp(-\alpha r)$ with some decay length $\alpha^{-1}$ is often a good choice.

$f = \int dr\, w(r) \left[ g(r) - g_{\mathrm{target}}(r) \right]^2$    [4]
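A minimal version of the merit function of Eq. [4], with the exponential weighting suggested above, might look as follows; the decay constant alpha is an assumed, tunable input, and the radial grid is taken to be uniform.

import numpy as np

def structure_merit(r, g, g_target, alpha=1.0):
    # Weighted squared deviation between the current and target RDFs;
    # w(r) = exp(-alpha * r) emphasizes the hard-to-fit local structure.
    dr = r[1] - r[0]                  # assume a uniform radial grid
    w = np.exp(-alpha * r)
    return np.sum(w * (g - g_target) ** 2) * dr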

A drawback of the simplex and other analytical optimizers is that numerical (tabulated) potentials cannot be used: what is needed is a relatively small set of parameters, $B_i$, defining the entire parameter space. The limit is typically 4-6 independent parameters, because any additional dimension increases the need for computing resources tremendously. A typical choice for such parameters is a Lennard-Jones-like expansion,2,24 Eq. [5],

$V(R) = \sum_i \frac{B_i}{R^i}$    [5]

where i typically spans the even numbers from 6 to 12. The simplex technique has been moderately successful in reproducing monomers of polyisoprene.2
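Putting Eqs. [4] and [5] together, the simplex optimization can be driven by a standard derivative-free minimizer. In the sketch below, run_cg_simulation is a hypothetical stand-in for the expensive equilibrate-produce-analyze cycle that returns the meso-scale g(r) for a given potential; all names are illustrative.

import numpy as np
from scipy.optimize import minimize

POWERS = np.array([6, 8, 10, 12])  # even exponents of the expansion, Eq. [5]

def merit_of_parameters(B, r, g_target, run_cg_simulation, alpha=1.0):
    # V(R) = sum_i B_i / R**i evaluated on a radial grid R
    potential = lambda R: np.sum(B / np.power.outer(R, POWERS), axis=-1)
    g = run_cg_simulation(potential)   # equilibration + production + analysis
    dr = r[1] - r[0]
    w = np.exp(-alpha * r)
    return np.sum(w * (g - g_target) ** 2) * dr  # Eq. [4]

# A Nelder-Mead (downhill simplex) search over the four coefficients:
# result = minimize(merit_of_parameters, B_initial,
#                   args=(r, g_target, run_cg_simulation),
#                   method="Nelder-Mead")

Each call to merit_of_parameters hides a full simulation, which is why the number of parameters must stay small.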

Iterative Structural Coarse-Graining

The Iterative Boltzmann Method (IBM) was developed to circumvent the problems encountered with the simplex technique.3,25,35,39,40 It is designed to optimize coarse-graining parameters against the structure of an atomistic simulation, and it lifts the limitation of needing analytical potentials. In the limit of infinite dilution, one could use the potential of mean force (PMF), obtained by Boltzmann-inverting the pair distribution function from a simulation or an experiment, as the interaction potential between monomers, which is


essentially the nonbonded generalization of the single-chain approach described earlier. Ideas like these have also been used to calculate the PMF of large particles like colloids embedded in matrices of small particles, where the small particles play only the role of a homogeneous background.41,42 However, in concentrated solutions or in melts, the structure is defined by an interplay of the potential and packing effects. Thus, a direct calculation of the PMF is not a suitable way to obtain the potential energy. Nonetheless, free energies can be used iteratively to approach the correct potential. To accomplish this, a melt or a solution of polymers is simulated in atomistic detail to derive a pair distribution function along with all the internal structural functions described earlier. Note that an experimentally obtained structure factor could alternatively be used as the target. In that case, however, it is not clear how to define the super-atoms unless one uses partially deuterated melts (because the experimental super-atoms are not uniquely defined). Moreover, one would have to transform the structure factor to a radial distribution function, because the IBM is local in interaction distance and not local in the wave vector: for every iteration, a one-to-one correspondence is assumed between the effects at a distance $r_0$ and the potential $V(r_0)$ (or the force $-\mathrm{d}V(r)/\mathrm{d}r|_{r=r_0}$) at that distance. Using the radial distribution function, this locality has been proven to be stable for the iterative procedure; we cannot, however, expect this result to hold in wave-vector space. The resulting potential of the IBM is completely numerical, because the potential energy value at every distance is optimized independently. It is possible (and advantageous) to enforce continuity of the potential by using weighted local averages, which is important if the function against which the potential is optimized is relatively noisy. Regrettably, the only correct way to remove noise is a longer atomistic simulation, which can be prohibitive in terms of computing time. Widely used atomistic programs like Gromacs43 often use a numerically tabulated potential for reasons of speed, even if an analytical form exists. Then, to calculate a derivative to obtain the forces, local splines or similar techniques can be used to smooth the function. Cross dependencies of weakly dependent potentials (e.g., bond and angle) can be neglected for computational reasons, but they are normally eliminated by the proper choice of mapping points anyway. In Figure 3, we show the resulting potential for the bond angle on the meso-scale for polystyrene chains.35,44 It turns out that no angle states below 75 degrees are populated, and the corresponding Boltzmann inversion would lead to an infinite potential value (as the logarithm of zero diverges), which, in turn, would cause numerical problems. A practical working approach is to set the value of the corresponding potential energy to $V = 50 k_B T$, thereby preventing such states from being reached. A larger value of V would itself lead to numerical problems: the potential change would be excessively steep, producing huge forces and thus limiting the size of the time step one could use in the coarse-grain simulation.
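Where a tabulated potential must be differentiated to yield forces, a smoothing spline does the job. A brief sketch using SciPy's cubic-spline interface follows; the array names are illustrative.

import numpy as np
from scipy.interpolate import CubicSpline

def tabulated_force(r_grid, V_table):
    # Fit a smooth spline through the numerical potential and return a
    # callable force F(r) = -dV/dr, as needed by the meso-scale integrator.
    spline = CubicSpline(r_grid, V_table)
    dVdr = spline.derivative()
    return lambda r: -dVdr(r)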


[Figure 3 plot: Angle Potential (kJ/mol), from 2 to 10, versus Angle (degree), from 80 to 175.]

Figure 3 The bond angle potential on the meso-scale for polystyrene obtained by the Iterative Boltzmann Method.35 The angle is defined by three consecutive super-atoms along the polymer chain.

When working with iterative structural coarse-graining techniques, we limit ourselves to potentials and distribution functions that depend only on a single coordinate, like radial distribution functions (RDFs) or bond distance, bond angle, and dihedral angle distributions. These distribution functions are convenient for describing the structure of polymers, and they enable the use of the IBM3,35,39 to reproduce the structure. As this procedure is iterative, one has to start with a reasonable initial guess as to what the potential function looks like. For one-component liquid systems, we Boltzmann invert the "target" RDF $g_0(r)$ (from the atomistic simulation) to obtain the PMF, $F(r) = -k_BT \ln g_0(r)$. Note that $F(r)$ is a free energy and not a potential energy. Simulating the system with this initial guess of the potential (the PMF) will yield a new RDF $g_1(r)$, which differs from the atomistic target $g_0(r)$ because it combines both packing and potential effects. The potential therefore needs to be improved, which can be done by adding a correction term, $\Delta F = k_BT \ln(g_1(r)/g_0(r))$, which raises the potential where the simulated RDF overshoots the target. This procedure is iterated until the desired distributions of the coarse-grained model and the atomistic model coincide within a preset tolerance. The whole procedure is shown schematically in Figure 4. Two points are worth stressing here. First, concerning nearest neighbors: the local packing of interaction centers has the greatest influence on the radial distribution function, so the optimization of the potential should focus initially on local interactions. The optimization process should begin with this short-distance region and, only after the meso-scale RDF of this region resembles the atomistic RDF reasonably well, should the tuning of the other regions begin. It is a good idea to perform a few (typically two to


[Figure 4 flowchart: start from an initial potential (Boltzmann inverse of the target RDF); simulate and calculate the radial distribution function; compute the free energy difference ΔF between the calculated and target RDF; if the difference is not below tolerance, add ΔF to the potential and repeat; otherwise, done.]

Figure 4 Scheme of the iterative procedure used in structural coarse-graining based on the inverse Boltzmann method.

three) independent optimizations, in series, that focus on increasingly larger distances. Second, we need to apply different weighting functions $w_i$ to the correction terms during the iteration. The magnitude of the weighting function depends on how far the resulting RDF deviates from the target atomistic RDF; the weighting function is normally set to 1 when the deviation is about 30–40% of the atomistic value. When the deviation is below 30%, a series of parallel runs can be performed with values of $w_i = 1/8$, $1/4$, and $1/2$ to find an optimum starting point for the next step. Running the optimization process in parallel minimizes the time to the next step but requires more computer power.

In a binary polymer melt (A–B), the interactions can be sorted into self-interactions (A–A, B–B) and non-self-interactions (A–B). Because the self-interaction in a polymer melt may not be the same as in the pure polymer, there are actually three target RDFs to be optimized, in addition to any bonded interactions that must be optimized. Although we correspondingly have more target functions, it is a good idea to optimize the pure systems first. Following that step, we start optimizing the mixture, where the A–A and B–B interactions are held fixed at their pure-polymer values and only the A–B interaction is tuned. Only when the non-self-interaction has been dealt with should we come back to optimizing all three target functions at once. The sketch below illustrates the basic Boltzmann inversion and update steps.
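In outline, the inversion and update of the IBM amount to a few lines of array arithmetic. The following is a minimal sketch, not code from any published package; all names are ours, the RDFs are assumed to be tabulated on a common distance grid, and the unit system (kJ/mol) is an assumption:

```python
import numpy as np

KB = 0.0083145  # Boltzmann constant in kJ/(mol K); unit choice is an assumption


def boltzmann_invert(g_target, T):
    """Initial guess for the potential: the PMF, F(r) = -kT ln g_0(r)."""
    g = np.clip(g_target, 1e-10, None)  # guard against log(0) in empty bins
    return -KB * T * np.log(g)


def ibm_update(V, g_sim, g_target, T, w=1.0):
    """One IBM step: V_new(r) = V(r) + w * kT * ln[g_sim(r)/g_target(r)].
    Where the simulated RDF overshoots the target, the potential is raised;
    w is the weighting factor discussed in the text."""
    ratio = np.clip(g_sim, 1e-10, None) / np.clip(g_target, 1e-10, None)
    return V + w * KB * T * np.log(ratio)
```

In practice, the update would be applied region by region with the weights $w_i$ discussed above, and the resulting tabulated potential would be smoothed (e.g., with local splines) before forces are computed.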


It is worth pointing out that, in any system, the bonded and nonbonded parameters can be optimized either together in one combined procedure or separately, because the mutual effects between the two types of interactions are negligible. Because the intrachain optimization can be achieved much more quickly than the interchain optimization, most modelers choose to optimize the two separately.

It has been shown recently, in a comparative study of melts and solutions of polyisoprene,3 that the environment has a strong effect on the coarse-grained model. Because polymers in the melt have a different scaling behavior than in solution,23 we cannot use the same model when we remove the solvent. For polyisoprene, it was possible to calibrate the meso-scale model on chains of length 10 and then to perform simulations for chain lengths up to 120.3 The scaling for both the melt and the solution cases was in good agreement with experiments and with theoretical expectations.

In Figure 5, we show how one can approach the target RDF with this mapping technique for a polystyrene melt. We clearly see an increase in accuracy over the course of the optimization process. The RDF labeled "1st iteration" used only the PMF. Its severe deviations from the target make clear the difference between potential energy and free energy. The entropic component of the free energy corresponds to the multitude of local conformations subsumed into one meso-scale position of a super-atom. The effective size of the monomer, indicated by the point at which the RDF starts to deviate from zero, is largely overestimated. Additionally, the local structure is much too pronounced. Note,

[Figure 5 plot: g(r) versus r (nm), showing the target RDF together with the 1st iteration, 2nd iteration, middle stage, and close-to-convergence curves.]

Figure 5 The approach to the target RDF by the Iterative Boltzmann Method for atactic polystyrene.35 Running averages were applied to the data for clarity. Not all iterations are shown.


too, that the target atomistic RDF rises continuously from zero to one. The initial slope of that curve corresponds to the "hardness" of the potential. Clearly, the first iteration is too hard. After only one iteration, the size of the monomer has been reduced and the potential is much softer. However, the strong overshooting of the first-neighbor peak persists. With a few more iterations, we see that the structure of the atomistic system is reasonably well approximated.

Because these techniques focus only on the structure of the polymeric system, we are not guaranteed that the thermodynamic state is correctly described, as has been pointed out by Reith et al.25 To avoid such problems, thermodynamic properties should be included in the optimization scheme. To treat pressure, for example, the following is done:25 After optimizing the potential energy against the structure, an additional pressure correction potential $V_{pc}$ of the form

$$V_{pc}(r) = A_{pc}\left(1 - \frac{r}{r_{cut}}\right) \qquad [6]$$

is added, in which $A_{pc}$ is negative if the pressure is too high and positive if it is too low. In Eq. [6], r is the distance between interaction sites and $r_{cut}$ is the cutoff up to which the correction potential is applied; it is normally chosen to be the same cutoff used for the nonbonded terms in the simulation. This additional potential contributes a constant force, in addition to the force from the structural potential, leading to a constant shift in pressure. Such a potential has only a weak influence on the RDF, which can be removed by a short reoptimization. Reith et al. showed that this correction can solve the problem of an unphysically high pressure.25

Another coarse-graining technique exists in which the atomistic and coarse-grained simulations are not separated.6 Instead, both fully detailed and mesoscopically modeled particles coexist in the very same simulation. The detailed particles carry two potentials because they also interact with the nondetailed particles as if they themselves were nondetailed. The aim of this method is to produce a homogeneous structure in which detailed and nondetailed particles are fully mixed and the local structure is indistinguishable. It represents an alternative way to obtain the target functions but does not require a different optimization technique.
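Returning to the pressure correction of Eq. [6], the extra term is trivial to tabulate. A minimal sketch (function and variable names are ours):

```python
import numpy as np

def pressure_correction(r, A_pc, r_cut):
    """Linear pressure-correction potential of Eq. [6]:
    V_pc(r) = A_pc * (1 - r/r_cut) inside the cutoff, zero outside.
    A_pc < 0 lowers the pressure, A_pc > 0 raises it; the constant
    force it contributes is -dV_pc/dr = A_pc / r_cut."""
    r = np.asarray(r, dtype=float)
    return np.where(r < r_cut, A_pc * (1.0 - r / r_cut), 0.0)
```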

Mapping Onto Simple Models

In addition to the self-consistent modeling techniques described above, one can use an ad hoc mapping between two independently developed models. In this case, only a few characteristics of the models can be mapped. A good example is the stiffness of the chain. The chain stiffness can be characterized by the persistence length $l_p$, which is derived from an assumed exponential


decay of the directional autocorrelation function along the chain backbone (or the integral over this function):

$$\langle \vec{u}(r+s) \cdot \vec{u}(r) \rangle = e^{-s/l_p} \qquad [7]$$

where r is the curvilinear coordinate along the chain contour, s is the distance along this curvilinear coordinate, and $\vec{u}$ is the unit vector denoting the local chain direction. We can measure the persistence length in two independently developed models and equate the values to obtain a mapping of chain lengths. Of course, any other characteristic length scale, such as the radius of gyration, the end-to-end distance, or the monomer size, can be used as well. These length scales will generally yield different mappings. If the two models are reasonably similar, the differences will be small and the mapping is meaningful; otherwise, the mapping per se is a bad idea. Accordingly, one should not use a model developed for polydimethylsiloxane, probably the most flexible of all polymers, to describe the significantly more rigid actin filaments, whose persistence lengths are several orders of magnitude larger.

In principle, mapping onto a simple model assigns a computationally cheap interaction potential to a set of super-atoms. With respect to optimization, this mapping is just the initial step before any further refinement is carried out. The two polymer models may be similar to one another, in which case we can get a good mapping, or they may be vastly different, in which case the mapping quality is poor and the models have essentially nothing to do with each other.
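As an illustration of Eq. [7], the following sketch estimates $l_p$ (in units of the bond spacing) from a single chain configuration; in practice, one would average the correlation function over many chains and frames. The array layout and all names are our assumptions:

```python
import numpy as np

def persistence_length(positions, s_max=20):
    """Fit <u(r+s).u(r)> = exp(-s/l_p) along one chain; 'positions' is
    an (N, 3) array of super-atom coordinates along the backbone."""
    bonds = np.diff(positions, axis=0)
    u = bonds / np.linalg.norm(bonds, axis=1)[:, None]  # unit tangent vectors
    corr = np.array([1.0 if s == 0 else
                     np.mean(np.sum(u[:-s] * u[s:], axis=1))
                     for s in range(s_max)])
    keep = corr > 0                        # fit only the positive, decaying part
    s = np.arange(s_max)[keep]
    slope = np.polyfit(s, np.log(corr[keep]), 1)[0]
    return -1.0 / slope                    # l_p in units of the bond spacing
```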

DYNAMIC MAPPING

Molecular dynamics simulations in atomistic detail regularly use a 1-femtosecond time step. The time step is required to be about an order of magnitude smaller than the fastest characteristic time which, for many molecules of interest, involves bond vibrations. As the bond lengths are customarily fixed using techniques like Shake,45,46 Rattle,47,48 or Lincs,49 the fastest motions in atomistic molecular dynamics are bond angle vibrations, which are on the order of tens of femtoseconds. With a reasonable use of computer resources, one can reach into the nanosecond time range for a simulation. This time period is sufficient for making comparisons with segmental dynamics in NMR experiments50–52 but not long enough to compare with experiments on longer time scales. Techniques used to map the statics of polymers, described earlier, lead inherently to larger time scales because the fastest degrees of freedom are now motions of super-atoms of the size of monomers. If dynamic investigations are desired, one must find a correct mapping of the time scales involved in the different models.


Mapping by Chain Diffusion

One method for calibrating the time scale is to use the chain diffusion coefficient. At long enough times, any polymer chain in a melt will end up in diffusive motion once all internal degrees of freedom are relaxed. As described earlier, static mapping can be used to determine the length scale; one typically uses the size of the monomer or the distance between super-atoms along the chain to obtain a suitable length scale for the coarse-grained simulation. If both the atomistic and the coarse-grained simulations can be fully equilibrated, in the sense that free diffusion of the whole chain is observed, the two diffusion coefficients can be equated and the time scale is then fixed. Diffusion coefficients in simulations are normally determined from the mean-square displacement through the Einstein relation, Eq. [8]:

$$D = \frac{1}{2d}\lim_{\tau\to\infty}\frac{\langle [x(t+\tau) - x(t)]^2\rangle}{\tau} \qquad [8]$$

where d is the dimensionality of the system and τ is the lag time. In most cases, complete free diffusion of an atomistic chain in the melt or in solution cannot be reached in reasonable computer time; this is exactly the situation in which a coarse-grained simulation should be used to efficiently equilibrate the structure from which atomistic simulations will be started.

One example of mapping by chain diffusion involved 10-mers of polyisoprene at 413 K, for which a dynamic mapping between a fully atomistic and a very simple coarse-grained model was demonstrated.7,50 Only chain stiffness was used to perform the mapping in that study. The local chain reorientation in both simulations was the same after the time scales had been fixed by the diffusion coefficient. The decay times of the Rouse modes, however, were not equal, indicating that mapping by stiffness alone is too simplistic. This mapping, like any dynamic mapping, can become problematic for mixtures, because the degree of coarse-graining of the different constituents is not necessarily the same; the ratio of diffusion coefficients in the atomistic and meso-scale simulations can then differ. It has been found in simulations of coarse-grained phospholipids that one prerequisite for a good dynamic mapping is that the masses of the super-atoms should be similar.53 For example, in lipid simulations, four water molecules were mapped onto one super-atom so as to create a mass similar to the four CH2 groups that formed a super-atom in the lipids.
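A sketch of how Eq. [8] is used in practice: compute the center-of-mass mean-square displacement as a function of lag time and fit its long-time linear part. All names and the fit window below are our choices:

```python
import numpy as np

def diffusion_coefficient(com, dt, fit=(0.5, 0.9)):
    """Einstein relation, Eq. [8]: D = slope of MSD(t) / (2 d).
    'com' is an (n_frames, d) center-of-mass trajectory; dt is the
    time between frames."""
    n, d = com.shape
    lags = np.arange(1, n // 2)            # restrict lags for better statistics
    msd = np.array([np.mean(np.sum((com[m:] - com[:-m]) ** 2, axis=1))
                    for m in lags])
    t = lags * dt
    lo, hi = (int(f * len(t)) for f in fit)  # fit only the late, diffusive regime
    slope = np.polyfit(t[lo:hi], msd[lo:hi], 1)[0]
    return slope / (2 * d)
```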

Mapping through Local Correlation Times

Instead of relying on chain diffusion from lengthy simulations, it is often more convenient to use shorter local time scales to map between


atomistic and coarse-grained simulations; this allows one to carry out a mapping even if the atomistic simulation does not reach free diffusion. Even if free diffusion could be reached, the statistical uncertainty at such long time scales is often so great that a shorter time scale is warranted. Candidates for shorter time scales are the decay times of higher Rouse modes; even if the Rouse model is an imperfect description of the system under study, such a mapping is meaningful because it is well defined. The Rouse modes are the eigenmodes of the Rouse model (see below). The Rouse mode of index p, $X_p$, is defined as

$$X_p = \frac{1}{N}\int_0^N \mathrm{d}s\, \cos\!\left(\frac{p\pi s}{N}\right) R_s \qquad [9]$$

where N is the degree of polymerization, s is the coordinate along the chain contour, and $R_s$ is the position at contour coordinate s. The Rouse mode of index p effectively describes a subchain of length N/p, so the first mode describes the chain as a whole, the second mode the structure and dynamics on the length scale of half a chain, and so on. Every Rouse mode has its own distinct time scale $\tau_p$. For a polymer that follows the Rouse model perfectly, these time scales are related by $\tau_1 = p^2 \tau_p$. In the extreme case where the subchain of a high Rouse mode becomes a single monomer, we end up with the segmental relaxation time, i.e., the reorientation dynamics on the monomer scale. This time scale can almost always be used for mapping, and it can also be used for comparison with, and calibration to, NMR experiments.

If we use the Rouse model for mapping time scales, we should make sure that the Rouse model is a reasonable description of the system under study. The Rouse model and the reptation model are both successful models for describing the dynamics of polymers. The Rouse model54 treats the polymer as a set of noninteracting beads connected by springs; the dynamics of polymer chains in a melt is governed in this model by a viscous force and the stretching forces along the chain. The reptation model confines the motion of the polymer to a hypothetical tube but can nonetheless describe global dynamics. It becomes applicable when a chain becomes longer than the polymer's specific entanglement length $L_e$. Mean-squared displacements of the monomers, $g_1(t)$, are important in characterizing polymer dynamics. Four time regimes are involved: (1) For very short times $t < t_e$, the polymer segment does not feel the constraints of the tube, so the dynamics of the reptation model corresponds to the Rouse model. (2) For $t_e < t \le t_R$, the motion perpendicular to the primitive path, which largely follows the chain, is restricted. However, the motion along the primitive path is free, because it is easier for a polymer to displace itself than its neighbors. (3) For $t_R < t \le t_d$, the internal degrees of freedom are relaxed,


but the chains are still confined in the tube. (4) For $t > t_d$, the dynamics is governed by free diffusion. All these regimes can be summarized by Eq. [10]:23,55

$$g_1(t) = \begin{cases} Nb^2\,(t/t_R)^{1/2} & t < t_e \\ Nb^2\,(t/Z^2 t_R)^{1/4} & t_e < t \le t_R \\ Nb^2\,(t/t_d)^{1/2} & t_R < t \le t_d \\ Nb^2\,(t/t_d) & t > t_d \end{cases} \qquad [10]$$

where $t_e$, $t_R$, and $t_d$ are the entanglement time, Rouse time, and disengagement time, respectively; $Z = L/L_e$ is the ratio of contour length to entanglement length; N is the degree of polymerization; and b is the size of a monomer. In the Rouse model, only mean-squared displacements with exponents of 1/2 and 1 exist, because the only subdiffusive process is the motion of a monomer against the center of mass of the chain at short times. Using coarse graining, it was possible to simulate the full spectrum of Rouse and reptation dynamics for atactic polystyrene.44 Figure 6, however, illustrates that a global dynamic mapping of trans-1,4-polyisoprene to a simple bead-spring model that includes stiffness cannot map the local dynamics completely.7,50
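Before moving on, here is a minimal sketch of how the Rouse modes of Eq. [9] can be evaluated for a discrete bead chain; the relaxation times $\tau_p$ used for mapping then follow from the decay of the mode autocorrelations over a trajectory. The discrete (s + 1/2) offset is a common discretization choice of ours, not prescribed by the text:

```python
import numpy as np

def rouse_modes(R, p_max=8):
    """Discrete Rouse modes of one chain configuration R, an (N, 3) array:
    X_p = (1/N) sum_s cos[p * pi * (s + 1/2) / N] * R_s, for p = 1..p_max.
    The decay time of <X_p(t) . X_p(0)> over a trajectory gives tau_p."""
    N = len(R)
    s = np.arange(N) + 0.5
    return np.array([(np.cos(p * np.pi * s / N)[:, None] * R).sum(axis=0) / N
                     for p in range(1, p_max + 1)])
```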

[Figure 6 plot: normalized mean-squared displacements $g_{1}$ and $g_{3}$ versus time, on a log–log scale spanning roughly four decades.]

Figure 6 Dynamic mapping of polyisoprene at 413 K to a coarse-grained model. Thick lines: atomistic simulation. Thin lines: coarse-grained simulation. The broken lines show the mean-squared displacement of the central monomers, i.e., a local quantity; the solid lines show the mean-squared displacement of the center of mass, i.e., a global property. The center-of-mass displacement is used for the mapping; the local quantity does not perfectly follow it.


Direct Mapping of the Lennard–Jones Time

A different idea, independent of the atomistic simulation, involves mapping the so-called Lennard–Jones time to real time. If one uses standard Lennard–Jones units, where lengths are measured in σ (the particle diameter), energies in ε (the depth of the Lennard–Jones potential), and masses in m (the monomer mass), a natural time scale appears that is conventionally called the Lennard–Jones time:48,56

$$\tau = \sigma\sqrt{\frac{m}{\varepsilon}} \qquad [11]$$

This time scale can be used to perform the mapping to the real time scale.3,57 Such a dynamic mapping runs into a problem when one uses purely numerical potentials because, in that case, σ and ε are not uniquely defined. One could select a characteristic length or energy scale, but the definition of this kind of time mapping then becomes ambiguous as well. Mappings based on diffusion or internal time scales are more closely connected to the true system. Nonetheless, this "Lennard–Jones" mapping can be done a priori, without performing the simulation, and thereby provides an initial estimate of the time scale. Also, if no atomistic simulation is available, this mapping can provide a guideline for estimating experimental times.
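As a numerical illustration of Eq. [11], all parameter values below are assumptions for a generic, roughly monomer-sized bead, not data from this chapter:

```python
import math

sigma = 0.5e-9                 # bead diameter in m (assumed)
kB = 1.380649e-23              # Boltzmann constant, J/K
eps = 2.0 * kB * 450.0         # well depth ~ 2 kT at 450 K (assumed)
m = 104.0 * 1.66054e-27        # mass of one styrene monomer (104 g/mol), in kg

tau_LJ = sigma * math.sqrt(m / eps)   # Eq. [11]
print(f"tau_LJ = {tau_LJ:.2e} s")     # ~2e-12 s, i.e., on the picosecond scale
```

This back-of-the-envelope estimate already shows the gain over atomistic MD: one coarse-grained time unit corresponds to picoseconds rather than femtoseconds.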

COARSE-GRAINED MONTE CARLO SIMULATIONS

The main difference between Monte Carlo (MC) and molecular dynamics (MD) simulations is that with MC we do not need to follow the physical trajectory of the system, which, in turn, enables us to use "unphysical" moves to cover the relevant area of phase space more quickly. Such moves include chain breaking and reattachment,58,59 configurational bias,60 and reptation moves.60 Because we do not have to follow a physical trajectory in an MC simulation, we can also use models that are further removed from the true physical or chemical reality. Such models include lattice models (see, e.g., Refs. 28, 61, and 62). With lattice models, the space of the system is (typically) evenly divided into cells, each of which is represented by one lattice site. Lattices can be simple cubic, or they can be specially adapted, highly connected grids.63–65 Here again we need super-atoms, which, however, can occupy only lattice sites. In most lattice models, every site is either singly occupied or empty, meaning that the interaction sites have an impenetrable hard core. This contrasts with Lattice–Boltzmann models66 used in studies of hydrodynamics, in which every lattice site is occupied by a density; in that case, one deals with a density-based field theory. In lattice models, only a fixed number of distances can be realized. It makes no sense to distinguish between, say, a


Figure 7 Left: Representation of a polymer on a simple cubic lattice; Right: Interaction potentials on such a lattice.

Lennard–Jones or a finite-size well potential; on a lattice the two are essentially identical. The limited number of distances makes optimization relatively straightforward, as illustrated in Figure 7. In most lattice models, a super-atom represents a monomer or a Kuhn segment of the chain.67 Typically, only interactions of very close neighbors (first or second neighbors) are included, so that the calculation of the energy, which is the computationally most expensive part of a Monte Carlo calculation, is a sum that scales linearly with the number of lattice sites. The actual mapping process, if done systematically, is easier than without a lattice: with a lattice model, fewer points in the RDF need to be reproduced. Otherwise, there is no fundamental difference between lattice and off-lattice models.

An often-used coarse-grained Monte Carlo model is the bond-fluctuation model.28 In contrast to most other coarse-grained models, it lacks a fixed or quasi-fixed bond length. Instead, connected monomers can occupy all side and corner sites of an fcc lattice if the monomer to which they are connected is in the center of the face-centered cube, as shown in Figure 8. In this model, as in others, the solvent is typically ignored, so monomers either occupy a site or the site is deemed empty.

Monte Carlo simulations can also be performed with an off-lattice model. In this case, the mapping is the same as described earlier for MD, but no dynamic mapping is involved. Kreer et al.68 showed that the number of Monte Carlo moves can be mapped onto a "pseudo-time." This mapping procedure can be used only if no nonphysical Monte Carlo moves are applied, i.e., only local physical moves are allowed. To accomplish this feat, we need to show that the simulation moves represent the true local dynamics of the model; one includes only the moves that are physically possible, and the relative abundance of


Figure 8 The bond fluctuation model. The possible neighbors in two dimensions of the black monomer are the neighboring gray monomers. In three dimensions, this model leads to relative bond lengths of 1, √2, and √3, where 1 corresponds to the lattice spacing.

the different moves faithfully represents the relative probabilities of the possible local dynamical changes. In most cases, however, we are not interested in a dynamic mapping. In such circumstances, we can exploit all the advances of modern Monte Carlo technology, i.e., we can apply all conceivable physical and nonphysical moves to derive a correct representation of structure and thermodynamics with less computational effort than would be possible with either MD or MC restricted to local moves.
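To make the contrast with MD concrete: at the core of all of these MC schemes, lattice or off-lattice, sits the standard Metropolis acceptance test. A minimal sketch, with an invented pair-energy table standing in for the tabulated lattice potential of Figure 7:

```python
import math
import random

# On a lattice only a few neighbor distances exist, so the pair potential
# reduces to a small lookup table (squared distance -> energy; values invented):
E_PAIR = {1: -1.0, 2: -0.4, 3: 0.0}

def metropolis_accept(dE, kT):
    """Accept a trial move: always if it lowers the energy,
    otherwise with probability exp(-dE/kT)."""
    return dE <= 0.0 or random.random() < math.exp(-dE / kT)
```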

REVERSE MAPPING

In most applications of coarse graining, one can be satisfied once a relaxed, large-scale description of the system has been derived. In some cases, however, we want to reintroduce atomistic detail at the end; for this we use the anchoring points between the different models and reverse the mapping.69 This process is not unique, because any constellation of coarse-grained interaction sites represents a variety of constellations of atomistic sites. In reverse mapping, one could use the (precalculated) energetically most favorable states and carry out a short atomistic MD or MC simulation to represent the full system,8 which can technically be done as follows. For short oligomers (two to four monomers) of the respective polymer, a wide variety of local conformations are produced, and their respective energies are calculated using the atomistic model. For simplicity, this calculation is done in vacuum and only the torsional degrees of freedom are varied. Additionally, the relative positions of the super-atoms in these fragments are stored. From the meso-scale simulation, we have obtained a


melt conformation consisting of super-atoms. We then move along all the chains and, fragment by fragment, select the atomistic configurations whose super-atom constellation fits the coarse-grained chain. If more than one fragment fits at a particular position along the chain, we take the one with the lowest energy. Rather than using only the most favorable and most populated states, we can also use higher-energy states according to their Boltzmann weights (see the sketch at the end of this section). In reverse mapping, we need to consider only the torsional degrees of freedom, primarily because atomistic bonds and angles are more rigid and less deformable than torsions. Moreover, bond and angle distributions equilibrate very quickly, so any subsequent short MD simulation will provide a realistic distribution. After the reintroduction of atomistic detail, the melt configuration will have some overlaps in the atomic positions, leading to a high energy; therefore, an energy minimization should be performed before any MD run. One may also need to start the simulation with a small time step and increase it after a few steps as the system equilibrates.

Another way to do reverse mapping is to rigidly fix the super-atom centers in space and perform a local MC simulation of only the small-scale interactions.9 Here, one selects a distribution of coarse-grained structures and does the local calculation on all structures in that distribution. Figure 9 shows the idea for more than one coarse-graining step. The interaction sites marked in black carry the highest degree of coarse-graining, the gray ones are intermediate, and the white interaction sites are the smallest (atomistic). After performing a simulation of only the black super-atoms, with a coarse-grained potential obtained in any of the ways described earlier, we have an ensemble of system

Figure 9 The method of Brandt,9 in which each length scale is treated independently and an interaction site can move only while its own length scale is being treated. Black, gray, and white circles correspond to coarse-, medium-, and fine-grained monomers, respectively. See text for details.


configurations of these super-atoms. A Monte Carlo simulation at finer resolution is then performed for each member of the ensemble without disturbing the positions of the super-atoms; i.e., in Figure 9 we would move only the gray centers while constraining the black centers, to obtain an ensemble of configurations containing both the black and the gray centers. If more than one level of coarse graining exists, an even finer-grained simulation follows by fixing the black and gray centers and moving only the newly added white centers. The change in resolution between levels in this technique typically involves every third interaction site; thus, there are about three times more gray centers than black centers, and three times more white centers than gray centers.9 A structural optimization is used at each level to obtain its potential. In some instances, the rigid constraint can also be relaxed.
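Returning to the fragment-based reverse mapping described above, selecting a fragment by its Boltzmann weight rather than always taking the lowest-energy one takes only a few lines. A sketch, assuming the fragment library and its vacuum energies have been precomputed (all names are ours):

```python
import numpy as np

def choose_fragment(energies, kT, rng=None):
    """Pick one atomistic fragment from those whose super-atom constellation
    fits at the current position along the chain, weighted by exp(-E/kT)."""
    if rng is None:
        rng = np.random.default_rng()
    e = np.asarray(energies, dtype=float)
    w = np.exp(-(e - e.min()) / kT)   # shift by the minimum for numerical stability
    return rng.choice(len(e), p=w / w.sum())
```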

A LOOK BEYOND POLYMERS

Polymers have traditionally been the focus of multi-scale modeling. Other areas of soft condensed matter, notably biological membranes, have more recently become extremely important, and myriad coarse-graining techniques have been applied to them.53,70–82 As the techniques used in membrane simulations are similar to those used for polymers, we point out here only a few of the main differences. Phospholipids, the main ingredient of biological membranes, can be viewed as essentially two hydrophobic oligomers connected by a hydrophilic head group. The mapping of an atomistic representation of a lipid bilayer to a coarse-grained representation is illustrated in Figure 10. These systems self-assemble into bilayers in which the hydrophobic core is shielded from the surrounding water. Compared with polymers, we must now deal with three new effects. First, the systems are inherently heterogeneous because the biomembrane is in water. Second, lipids are essentially very short heteropolymers, as they contain hydrophilic and hydrophobic parts. Third, the electrostatic interactions of lipid molecules are much more important than in typical polymer systems. A

Figure 10 The mapping of an atomistic representation of a lipid bilayer to a coarse-grained model.


successful meso-scale simulation model for lipid bilayers was proposed by Marrink et al.53 The model was originally parameterized to reproduce the structural, dynamic, and elastic properties of lamellar and nonlamellar states of various phospholipids. In that study, groups of 4–6 heavy atoms (carbon, nitrogen, phosphorus, and oxygen) were united into super-atoms. The lipid head group consisted of four sites: two hydrophilic sites (one representing the choline and one the phosphate group) and two intermediately hydrophilic sites (representing the glycerol moiety). Each of the two tails of dipalmitoyl-phosphatidylcholine (DPPC), an abundant phospholipid, was modeled by four sites. Water was modeled by individual hydrophilic sites, each representing four real water molecules, so as to achieve a mass similar to that of the lipid super-atoms and thereby make the dynamic mapping easier, as described earlier. The sites interact in a pair-wise manner via Lennard–Jones (LJ) potentials. Five different LJ potentials were used, ranging from weak, mimicking hydrophobic interactions, to strong, for hydrophilic interactions (with three levels in between for other types of interactions). In addition to the LJ interactions, a screened Coulomb interaction was used to model the electrostatics between the zwitterionic head groups: the choline group bears a charge of +1, and the phosphate group a charge of −1. Soft springs between bonded pairs held the coarse-grained molecules together, and angle potentials provided the appropriate stiffness. For efficiency, all super-atoms were assigned exactly the same mass of 72 atomic mass units.

The interaction of lipids with small molecules was treated in a similar manner. Alcohols (butanols) were modeled simply as a dimer of a polar and an apolar site;83 the polar site has the same interaction potential as water, whereas the apolar site is the same as the alkane sites in the lipids. This makes the alcohol a symmetric amphiphile (which is not fully realistic). The alcohol concentrations had to be renormalized by Dickey et al. because one coarse-grained water represents four actual water molecules, whereas one coarse-grained butanol represents one real butanol; accordingly, a concentration of 1:100 (butanol:water) in the coarse-grained model actually corresponds to 1:400 in the real system.

Rougher coarse graining has also been used in lipid bilayer modeling, with only generic effects of the chemistry, such as hydrophilic–hydrophobic interactions and the anisotropy of the overall molecule, taken into account. Notwithstanding, important generic properties of membranes have been elucidated. An example is the general pathway of lipid bilayer self-assembly, which is not specific to the individual lipid molecules.70,71,84 Large-scale properties like the bending modulus, and the influence of concentrations on it, have also been elucidated.72 In that study, it was noted that the layer thickness is the most crucial factor for predicting the bending modulus. The phase behavior of lipids has likewise been studied using dissipative particle dynamics.73,74

[Figure 11 sketch: two coupled layers of hard disks, with labels 65 Å (GM1), 45 Å (DPPE), and 40 Å.]

Figure 11 Illustration of the nonadditive hard-disk model for lipid mixtures.

A number of solvent-free models have been proposed that are able to reproduce the liquid phase behavior and domain formation85–88 as well as the general elastic behavior of the membrane.88,89 An even more drastic approach to coarse graining is to model the lipid completely in two dimensions and to use only one interaction site per lipid. A simple example of such a model is the nonadditive hard-disk model90 depicted in Figure 11 and described below. For simulating lipid bilayers on very large scales, Monte Carlo techniques are the method of choice.

To model the interactions of mixed phospholipids, a simplified model was developed by Faller et al.90–92 That model contains the essential interactions between ganglioside lipids with large head groups, other smaller lipids in the membrane, and attacking pathogens. Ganglioside lipids are unusual: they pack well into lipid membranes, but they have a large oligosaccharide head group that extends away from the membrane surface.91,93 Thus, they are dispersed in a layer of other lipids. Such mixtures cannot readily be modeled in a traditional two-dimensional way. To derive a model for the lipid interactions between dipalmitoyl phosphatidylethanolamine (DPPE) and the ganglioside lipid GM1, two coupled layers of hard disks were used (see Figure 11).90 It is well known that hard-disk fluids have a single phase transition, going from a gas phase to a crystalline phase.94,95 Without attractive interactions, no liquid phase can emerge, so one can expect that such a generalized hard-disk fluid will also have two phases. At very high pressures, phase separation may also occur.

The minimum packing area for lipids with two linear hydrocarbon chains, like DPPE and GM1, is 38–40 Ų per molecule. Head-group size, hydration, and steric and entropic interactions may increase this area substantially. In the work of Ref. 90, 45 Ų was used initially for DPPE and 65 Ų for GM1; these values are based on experimental pressure–area isotherms for each lipid.93 However, GM1 molecules mixed with DPPE at low to intermediate densities do not change the overall area per molecule very much. Hence, a minimum packing area of 40 Ų per molecule was used for GM1 in the hydrocarbon plane. The DPPE molecules are therefore modeled as simple disks, whereas the model for the GM1 molecules consists of two concentric disks that act in two layers. Technically, this leads to the peculiar situation of a binary hard-disk fluid whose cross-interaction radius is not the average of the self-interaction radii.
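The nonadditivity is easy to state in code: the contact distance of an unlike pair is an independent parameter rather than the average of the like-pair diameters. A minimal sketch (parameter handling is ours):

```python
def disks_overlap(p1, p2, contact):
    """Hard-disk overlap test; p1 and p2 are (x, y) positions and 'contact'
    is the contact distance for this pair type. In a nonadditive model, the
    DPPE-GM1 cross contact is a free parameter, not (d_DPPE + d_GM1) / 2."""
    dx, dy = p1[0] - p2[0], p1[1] - p2[1]
    return dx * dx + dy * dy < contact * contact
```

In a Monte Carlo sweep, a trial displacement is simply rejected whenever this test fires for any pair in either of the two coupled layers.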


It is known96,97 that when cholera toxin attacks a membrane, it binds to five pentagonally arranged GM1 molecules. Therefore, for some of the simulations, a number of GM1 particles were fixed, in two different ways.90 The first and easiest way is to fix a number of particles randomly in space. The second is to fix them in pentagonal groups so as to model, simplistically, the binding of cholera toxin to a mixed DPPE/GM1 bilayer. With either arrangement, the increase in area per head group upon binding of cholera toxin could be reproduced semi-quantitatively, showing that this increase comes from the disruption of local packing by the fixation of molecules.

CONCLUSIONS

This chapter focused on describing structural properties of polymers and related soft-matter systems using coarse-grained models. We need to point out two major caveats that should be kept in mind, especially by novice modelers. First, every mapping between two systems is carried out at a specific state point. Caution is advised if we want to transfer a coarse-grained model between different state points: changing just the temperature can lead to a severe change in the meso-scale model. It has been shown recently that a coarse-grained model for atactic polystyrene optimized in the melt crystallizes under cooling instead of forming a glass.98 An extreme example of this problem involves crossing the Θ-temperature in polymer solutions, where the system undergoes a significant structural transition between a globular, nonsolvated polymer conformation and a well-solvated, stretched conformation. Note, however, that for polymers like polyisoprene and polystyrene, the modeling results are stable with respect to chain length; it was not necessary to reoptimize the meso-scale model with increasing chain length, which is one of the major strengths of the self-consistent optimization technique described earlier. Second, changing concentrations in polymer mixtures requires reevaluating the mapping; one must optimize all interaction potentials together, at least in the final steps.

Using coarse-grained modeling techniques today is inevitable, because large-scale atomistic, and especially quantum chemical, calculations are impractical and may not be helpful for answering questions involving large size scales or long time scales. Many techniques exist but, regrettably, there is no single answer to the "How to?" question. Coarse graining is still far from being a technique that can be used as broadly as atomistic simulations, because one must always think about the underlying scientific problem. It may never become as easy to use as atomistic MD or MC methods, for which a


manifold of well-evolved and relatively easy-to-use software packages exists. Coarse graining, however, offers much in the way of addressing scientific problems that are intractable at the atomistic level and, from that perspective, should be considered a valuable method for molecular simulations.

ACKNOWLEDGMENTS

The author thanks Alison Dickey and Qi Sun for assistance with the figures. Some of the work described here was financially supported by the U.S. Department of Energy, Office of Advanced Scientific Computing, through an Early Career Grant (DE-FG02-03ER25568).

REFERENCES

1. W. Tschöp, K. Kremer, J. Batoulis, T. Bürger, and O. Hahn, Acta Polymerica, 49, 61 (1998). Simulation of Polymer Melts. I. Coarse-Graining Procedure for Polycarbonates.
2. H. Meyer, O. Biermann, R. Faller, D. Reith, and F. Müller-Plathe, J. Chem. Phys., 113, 6264 (2000). Coarse Graining of Nonbonded Interparticle Potentials Using Automatic Simplex Optimization to Fit Structural Properties.
3. R. Faller and D. Reith, Macromolecules, 36, 5406 (2003). Properties of Polyisoprene – Model Building in the Melt and in Solution.
4. R. L. C. Akkermans and W. J. Briels, J. Chem. Phys., 114, 1020 (2001). A Structure-Based Coarse-Grained Model for Polymer Melts.
5. K. R. Haire, T. J. Carver, and A. H. Windle, Comput. Theor. Polym. Sci., 11, 17 (2001). A Monte Carlo Lattice Model for Chain Diffusion in Dense Polymer Systems and its Interlocking with Molecular Dynamics Simulations.
6. J. D. McCoy and J. G. Curro, Macromolecules, 31, 9362 (1998). Mapping of Explicit Atom onto United Atom Potentials.
7. R. Faller and F. Müller-Plathe, Polymer, 43, 621 (2002). Multi-Scale Modelling of Poly(isoprene) Melts.
8. J. Eilhard, A. Zirkel, W. Tschöp, O. Hahn, K. Kremer, O. Schärpf, D. Richter, and U. Buchenau, J. Chem. Phys., 110, 1819 (1999). Spatial Correlations in Polycarbonates: Neutron Scattering and Simulation.
9. D. Bai and A. Brandt, in Multiscale Computational Methods in Chemistry and Physics, Vol. 177 of NATO Science Series: Computer and System Sciences, A. Brandt, J. Bernholc, and K. Binder, Eds., IOS Press, Amsterdam, 2001, pp. 250–266. Multiscale Computation of Polymer Models.
10. M. Murat and K. Kremer, J. Chem. Phys., 108, 4340 (1998). From Many Monomers to Many Polymers: Soft Ellipsoid Model for Polymer Melts and Mixtures.
11. C. F. Abrams and K. Kremer, J. Chem. Phys., 116, 3162 (2002). Effects of Excluded Volume and Bond Length on the Dynamics of Dense Bead-Spring Polymer Melts.
12. M. Tsige, J. G. Curro, G. S. Grest, and J. D. McCoy, Macromolecules, 36, 2158 (2003). Molecular Dynamics Simulations and Integral Equation Theory of Alkane Chains: Comparison of Explicit and United Atom Models.
13. H. Fukunaga, J. Takimoto, and M. Doi, J. Chem. Phys., 116, 8183 (2002). A Coarse-Graining Procedure for Flexible Polymer Chains with Bonded and Nonbonded Interactions.
14. O. Hahn, L. Delle Site, and K. Kremer, Macromolec. Theory Simul., 10, 288 (2001). Simulation of Polymer Melts: From Spherical to Ellipsoidal Beads.


15. C. F. Abrams and K. Kremer, Macromolecules, 36, 260 (2003). Combined Coarse-Grained and Atomistic Simulation of Liquid Bisphenol A-Polycarbonate: Liquid and Intramolecular Structure.
16. G. C. Rutledge, Phys. Rev. E, 63, 021111 (2001). Modeling Experimental Data in a Monte Carlo Simulation.
17. F. L. Colhoun, R. C. Armstrong, and G. C. Rutledge, Macromolecules, 35, 6032 (2002). Analysis of Experimental Data for Polystyrene Orientation during Stress Relaxation Using Semigrand Canonical Monte Carlo Simulation.
18. P. Doruker and W. L. Mattice, Macromolec. Theory Simul., 8, 463 (1999). A Second Generation of Mapping/Reverse Mapping of Coarse-Grained and Fully Atomistic Models of Polymer Melts.
19. J. Baschnagel, K. Binder, P. Doruker, A. A. Gusev, O. Hahn, K. Kremer, W. L. Mattice, F. Müller-Plathe, M. Murat, W. Paul, S. Santos, U. W. Suter, and V. Tries, in Advances in Polymer Science, Vol. 152, Springer-Verlag, New York, 2000, pp. 41–156. Bridging the Gap Between Atomistic and Coarse-Grained Models of Polymers: Status and Perspectives.
20. F. Müller-Plathe, ChemPhysChem, 3, 754 (2002). Coarse-Graining in Polymer Simulation: From the Atomistic to the Mesoscopic Scale and Back.
21. F. Müller-Plathe, Soft Mater., 1, 1 (2003). Scale-Hopping in Computer Simulations of Polymers.
22. R. Faller, Polymer, 45, 3869 (2004). Automatic Coarse Graining of Polymers.
23. M. Doi and S. F. Edwards, The Theory of Polymer Dynamics, Vol. 73 of International Series of Monographs on Physics, Clarendon Press, Oxford, 1986.
24. D. Reith, H. Meyer, and F. Müller-Plathe, Macromolecules, 34, 2335 (2001). Mapping Atomistic to Coarse-Grained Polymer Models using Automatic Simplex Optimization to Fit Structural Properties.
25. D. Reith, M. Pütz, and F. Müller-Plathe, J. Comput. Chem., 24, 1624 (2003). Deriving Effective Meso-Scale Coarse Graining Potentials from Atomistic Simulations.
26. R. L. C. Akkermans, A Structure-based Coarse-grained Model for Polymer Melts, Ph.D. thesis, University of Twente, 2000.
27. A. Kolinski, J. Skolnick, and R. Yaris, Macromolecules, 19, 2550 (1986). Monte Carlo Study of Local Orientational Order in a Semiflexible Polymer Melt Model.
28. I. Carmesin and K. Kremer, Macromolecules, 21, 2819 (1988). The Bond Fluctuation Method – A New Effective Algorithm for the Dynamics of Polymers in All Spatial Dimensions.
29. K. Binder, Ed., Monte Carlo and Molecular Dynamics Simulation in Polymer Science, Vol. 49, Oxford University Press, Oxford, 1995.
30. K. Binder and G. Ciccotti, Eds., Monte Carlo and Molecular Dynamics of Condensed Matter Systems, Como Conference Proceedings, Società Italiana di Fisica, Bologna, 1996.
31. G. S. Grest and K. Kremer, Phys. Rev. A, 33, R3628 (1986). Molecular Dynamics Simulation for Polymers in the Presence of a Heat Bath.
32. K. Kremer and G. S. Grest, J. Chem. Phys., 92, 5057 (1990). Dynamics of Entangled Linear Polymer Melts: A Molecular-Dynamics Simulation.
33. R. Faller, F. Müller-Plathe, and A. Heuer, Macromolecules, 33, 6602 (2000). Local Reorientation Dynamics of Semiflexible Polymers in the Melt.
34. G. Milano and F. Müller-Plathe, J. Polym. Sci. B, 43, 871 (2005). Gaussian Multicentred Potentials for Coarse-Grained Polymer Simulations: Linking Atomistic and Mesoscopic Scales.
35. Q. Sun and R. Faller, Comp. Chem. Eng., 29, 2380 (2005). Systematic Coarse-Graining of Atomistic Models for Simulation of Polymeric Systems.
36. R. Faller, H. Schmitz, O. Biermann, and F. Müller-Plathe, J. Comput. Chem., 20, 1009 (1999). Automatic Parameterization of Forcefields for Liquids by Simplex Optimization.


37. R. G. Della Valle and D. Gazzillo, Phys. Rev. B, 59, 13699 (1999). Towards an Effective Potential for the Monomer, Dimer, Hexamer, Solid and Liquid Forms of Hydrogen Fluoride.
38. E. Bourasseau, M. Haboudou, A. Boutin, A. H. Fuchs, and P. Ungerer, J. Chem. Phys., 118, 3020 (2003). New Optimization Method for Intermolecular Potentials: Optimization of a New Anisotropic United Atom Potential for Olefins: Prediction of Equilibrium Properties.
39. D. Reith, H. Meyer, and F. Müller-Plathe, Comput. Phys. Commun., 148, 299 (2002). CG-OPT: A Software Package for Automatic Force Field Design.
40. D. Reith, B. Müller, F. Müller-Plathe, and S. Wiegand, J. Chem. Phys., 116, 9100 (2002). How does the Chain Extension of Poly(acrylic acid) Scale in Aqueous Solution? A Combined Study with Light Scattering and Computer Simulation.
41. O. Engkvist and G. Karlström, Chem. Phys., 213, 63 (1996). A Method to Calculate the Probability Distribution for Systems with Large Energy Barriers.
42. E. B. Kim, R. Faller, Q. Yan, N. L. Abbott, and J. J. de Pablo, J. Chem. Phys., 117, 7781 (2002). Potential of Mean Force between a Spherical Particle Suspended in a Nematic Liquid Crystal and a Substrate.
43. E. Lindahl, B. Hess, and D. van der Spoel, J. Mol. Model., 7, 306 (2001). GROMACS 3.0: A Package for Molecular Simulation and Trajectory Analysis.
44. Q. Sun and R. Faller, Macromolecules, 39, 812 (2006). Crossover from Unentangled to Entangled Dynamics in a Systematically Coarse-Grained Polystyrene Melt.
45. J.-P. Ryckaert, G. Ciccotti, and H. J. C. Berendsen, J. Comput. Phys., 23, 327 (1977). Numerical Integration of the Cartesian Equations of Motion of a System with Constraints: Molecular Dynamics of n-Alkanes.
46. F. Müller-Plathe and D. Brown, Comput. Phys. Commun., 64, 7 (1991). Multicolour Algorithms in Molecular Simulation: Vectorisation and Parallelisation of Internal Forces and Constraints.
47. H. C. Andersen, J. Comput. Phys., 52, 24 (1983). Rattle: A 'Velocity' Version of the Shake Algorithm for Molecular Dynamics Simulations.
48. M. P. Allen and D. J. Tildesley, Computer Simulation of Liquids, Clarendon Press, Oxford, 1987.
49. B. Hess, H. Bekker, H. J. C. Berendsen, and J. G. E. M. Fraaije, J. Comput. Chem., 18, 1463 (1997). LINCS: A Linear Constraint Solver for Molecular Simulations.
50. R. Faller, F. Müller-Plathe, M. Doxastakis, and D. Theodorou, Macromolecules, 34, 1436 (2001). Local Structure and Dynamics in trans-Polyisoprene.
51. J. Budzien, C. Raphael, M. D. Ediger, and J. J. de Pablo, J. Chem. Phys., 116, 8209 (2002). Segmental Dynamics in a Blend of Alkanes: Nuclear Magnetic Resonance Experiments and Molecular Dynamics Simulation.
52. M. Doxastakis, D. N. Theodorou, G. Fytas, F. Kremer, R. Faller, F. Müller-Plathe, and N. Hadjichristidis, J. Chem. Phys., 119, 6883 (2003). Chain and Local Dynamics of Polyisoprene as Probed by Experiments and Computer Simulations.
53. S. J. Marrink, A. H. de Vries, and A. Mark, J. Phys. Chem. B, 108, 750 (2004). Coarse Grained Model for Semi-Quantitative Lipid Simulation.
54. P. E. Rouse, J. Chem. Phys., 21, 1272 (1953). A Theory of the Linear Viscoelastic Properties of Dilute Solutions of Coiling Polymers.
55. G. Strobl, The Physics of Polymers, 2nd ed., Springer-Verlag, Berlin, 1997.
56. D. Frenkel and B. Smit, Understanding Molecular Simulation: From Basic Algorithms to Applications, Academic Press, San Diego, CA, 1996.
57. D. Reith, Neue Methoden zur Computersimulation von Polymersystemen auf verschiedenen Längenskalen und ihre Anwendung, Ph.D. thesis, MPI für Polymerforschung and Universität Mainz, 2001. Available: http://archimed.uni-mainz.de/pub/2001/0074.


58. P. V. K. Pant and D. N. Theodorou, Macromolecules, 28, 7224 (1995). Variable Connectivity Method for the Atomistic Monte Carlo Simulation of Polydisperse Polymer Melts.
59. Z. Chen and F. A. Escobedo, J. Chem. Phys., 113, 11382 (2000). A Configurational-Bias Approach for the Simulation of Inner Sections of Linear and Cyclic Molecules.
60. J. J. de Pablo, M. Laso, and U. W. Suter, J. Chem. Phys., 96, 2395 (1992). Simulation of Polyethylene above and below the Melting Point.
61. J. Wittmer, W. Paul, and K. Binder, Macromolecules, 25, 7211 (1992). Rouse and Reptation Dynamics at Finite Temperatures: A Monte Carlo Simulation.
62. M. Müller, Macromolec. Theory Simul., 8, 343 (1999). Miscibility Behavior and Single Chain Properties in Polymer Blends: A Bond Fluctuation Model Study.
63. T. Haliloglu and W. L. Mattice, Rev. Chem. Eng., 15, 293 (1999). Simulation of Rotational Isomeric State Models for Polypropylene Melts on a High Coordination Lattice.
64. T. C. Clancy and W. L. Mattice, J. Chem. Phys., 112, 10049 (2000). Rotational Isomeric State Chains on a High Coordination Lattice: Dynamic Monte Carlo Algorithm Details.
65. R. Ozisik, E. D. von Meerwall, and W. L. Mattice, Polymer, 43, 629 (2001). Comparison of the Diffusion Coefficients of Linear and Cyclic Alkanes.
66. P. Ahlrichs and B. Dünweg, J. Chem. Phys., 111, 8225 (1999). Simulation of a Single Polymer Chain in Solution by Combining Lattice Boltzmann and Molecular Dynamics.
67. W. Kuhn, Kolloid Z., 68, 2 (1934). Über die Gestalt Fadenförmiger Moleküle in Lösungen.
68. T. Kreer, J. Baschnagel, M. Müller, and K. Binder, Macromolecules, 34, 1105 (2001). Monte Carlo Simulation of Long Chain Polymer Melts: Crossover from Rouse to Reptation Dynamics.
69. W. Tschöp, K. Kremer, O. Hahn, J. Batoulis, and T. Bürger, Acta Polymerica, 49, 75 (1998). Simulation of Polymer Melts. II. From Coarse-Grained Models back to Atomistic Description.
70. T. Soddemann, B. Dünweg, and K. Kremer, Eur. Phys. J. E, 6, 409 (2001). A Generic Computer Model for Amphiphilic Systems.
71. J. C. Shelley, M. Y. Shelley, R. C. Reeder, S. Bandyopadhyay, and M. L. Klein, J. Phys. Chem. B, 105, 4464 (2001). A Coarse Grain Model for Phospholipid Simulations.
72. L. Rekvig, B. Hafskjold, and B. Smit, J. Chem. Phys., 120, 4897 (2004). Simulating the Effect of Surfactant Structure on Bending Moduli of Monolayers.
73. M. Kranenburg, M. Venturoli, and B. Smit, J. Phys. Chem. B, 107, 11491 (2003). Phase Behavior and Induced Interdigitation in Bilayers Studied with Dissipative Particle Dynamics.
74. M. Kranenburg and B. Smit, J. Phys. Chem. B, 109, 6553 (2005). Phase Behavior of Model Lipid Bilayers.
75. B. Smit, P. A. J. Hilbers, K. Esselink, L. A. M. Rupert, N. M. van Os, and A. G. Schlijper, Nature, 348, 624 (1990). Computer Simulations of a Water/Oil Interface in the Presence of Micelles.
76. R. Goetz, G. Gompper, and R. Lipowsky, Phys. Rev. Lett., 82, 221 (1999). Mobility and Elasticity of Self-Assembled Membranes.
77. G. Ayton and G. A. Voth, Biophys. J., 83, 3357 (2002). Bridging Microscopic and Mesoscopic Simulations of Lipid Bilayers.
78. H. Guo and K. Kremer, J. Chem. Phys., 118, 7714 (2003). Amphiphilic Lamellar Model Systems under Dilation and Compression: Molecular Dynamics Study.
79. M. Müller, K. Katsov, and M. Schick, J. Polym. Sci. B, 41, 1441 (2003). Coarse Grained Models and Collective Phenomena in Membranes: Computer Simulation of Membrane Fusion.
80. T. Murtola, E. Falck, M. Patra, M. Karttunen, and I. Vattulainen, J. Chem. Phys., 121, 9156 (2004). Coarse-Grained Model for Phospholipid/Cholesterol Bilayer.


81. S. O. Nielsen, C. F. Lopes, I. Ivanov, P. B. Moore, J. C. Shelley, and M. L. Klein, Biophys. J., 87, 2107 (2004). Transmembrane Peptide-Induced Lipid Sorting and Mechanism of Lα-to-Inverted Phase Transition Using Coarse-Grain Molecular Dynamics.
82. O. Lenz and F. Schmid, J. Mol. Liq., 117, 147 (2005). A Simple Computer Model for Liquid Lipid Bilayers.
83. A. N. Dickey and R. Faller, J. Polym. Sci. B, 43, 1025 (2005). Investigating Interactions of Biomembranes and Alcohols: A Multiscale Approach.
84. O. G. Mouritsen, in Advances in the Computer Simulation of Liquid Crystals, P. Pasini and C. Zannoni, Eds., Vol. C 545 of NATO ASI, Kluwer, Dordrecht, The Netherlands, 2000, pp. 139–188. Computer Simulation of Lyotropic Liquid Crystals as Models of Biological Membranes.
85. O. Farago, J. Chem. Phys., 119, 596 (2003). "Water-Free" Computer Model for Fluid Bilayer Membranes.
86. G. Brannigan and F. L. H. Brown, J. Chem. Phys., 120, 1059 (2004). Solvent-Free Simulations of Fluid Membrane Bilayers.
87. G. Brannigan and F. L. H. Brown, J. Chem. Phys., 122, 074905 (2005). Composition Dependence of Bilayer Elasticity.
88. I. R. Cooke, K. Kremer, and M. Deserno, Phys. Rev. E, 72, 011506 (2005). Tunable Generic Model for Fluid Bilayer Membranes.
89. G. Brannigan, A. C. Tamboli, and F. L. H. Brown, J. Chem. Phys., 121, 3259 (2004). The Role of Molecular Shape in Bilayer Elasticity and Phase Behavior.
90. R. Faller and T. L. Kuhl, Soft Mater., 1, 343 (2003). Modeling the Binding of Cholera Toxin to a Lipid Membrane by a Non-Additive Two-Dimensional Hard Disk Model.
91. C. E. Miller, J. Majewski, R. Faller, S. Satija, and T. L. Kuhl, Biophys. J., 86, 3700 (2004). Cholera Toxin Assault on Lipid Monolayers Containing Ganglioside GM1.
92. C. E. Miller, J. Majewski, K. Kjaer, M. Weygand, R. Faller, S. Satija, and T. L. Kuhl, Coll. Surf. B: Biointerfaces, 40, 159 (2005). Neutron and X-Ray Scattering Studies of Cholera Toxin Interactions with Lipid Monolayers at the Air–Liquid Interface.
93. J. Majewski, T. L. Kuhl, K. Kjaer, and G. S. Smith, Biophys. J., 81, 2707 (2001). Packing of Ganglioside–Phospholipid Monolayers: An X-Ray Diffraction and Reflectivity Study.
94. B. J. Alder and T. E. Wainwright, Phys. Rev., 127, 359 (1962). Phase Transition in Elastic Disks.
95. W. W. Wood, J. Chem. Phys., 52, 729 (1970). NpT-Ensemble Monte Carlo Calculations for the Hard-Disk Fluid.
96. R. A. Reed, J. Mattai, and G. G. Shipley, Biochemistry, 26, 824 (1987). Interaction of Cholera Toxin with Ganglioside GM1 Receptors in Supported Lipid Monolayers.
97. H. O. Ribi, D. S. Ludwig, K. L. Mercer, G. K. Schoolnik, and R. D. Kornberg, Science, 239, 1272 (1988). Three-Dimensional Structure of Cholera Toxin Penetrating a Lipid Membrane.
98. J. Ghosh, B. Y. Wong, Q. Sun, F. R. Pon, and R. Faller, Molecular Simulation, 32, 175 (2006). Simulation of Glasses: Multiscale Modeling and Density of States Monte Carlo Simulations.

CHAPTER 5

Analysis of Chemical Information Content Using Shannon Entropy

Jeffrey W. Godden and Jürgen Bajorath*

Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany

INTRODUCTION

The goals of this tutorial are to introduce to the novice molecular modeler the application of information content analysis in chemistry, to present an information theoretic examination of chemical descriptors and provide insights into their relative significance, and to show that an entropy-based information metric provides an undistorted assessment of the diversity of a chemical database. Along the way, the Shannon entropy (SE) concept, a formalism originally developed for the telecommunications industry,1,2 will be introduced and applied. A differential form of the SE metric will be used to compare chemical libraries and to suggest which descriptors are most responsive to the chemical characteristics of different compound collections. Although this chapter focuses on the analysis and comparison of the information content of molecular descriptors in large databases, we need to point out that other applications of information theory in chemistry exist.

Entropy is well known as a quantitative measure of the disorder of a closed system in thermodynamics and statistical mechanics. The equilibrium of a thermodynamic system is associated with the distribution of objects or molecules having the greatest probability of occurring, and this most probable state is the one with the greatest degree of disorder. In statistical mechanics, the increase in entropy to its maximum at equilibrium is rationalized as the


In this context, entropy is interpreted as a function of the number of possible microscopic states that a system can occupy, as determined by external factors such as temperature or pressure. In a similar manner, entropy is also used in information theory as a measure of the information contained in a dataset or transmitted message.

Claude E. Shannon is generally recognized as the founding father of information theory as we understand it today: a mathematical theory, or framework, to quantitatively describe the communication of data. Irrespective of their nature or type, data need to be transmitted over "channels," and a focal point of Shannon's pioneering work was that the channels available for communicating data are generally noisy. Shannon demonstrated that data can be communicated over noisy channels with a small probability of error if it is possible to encode (and subsequently decode) the data in a way that communicates it at a rate below, but close to, channel capacity. The most basic means of conceptualizing entropy in the context of information theory is to associate the information content of a signal with a probability distribution. The amount of apparent randomness, or distribution spread, is then treated as an entropy metric and thereby associated with the information content of the system or message giving rise to the distribution.

A fundamentally important interpretation of Shannon's formalism for the study of molecules was that any structural representation could be understood as a communication carrying a specific amount of information. Consequently, in 1953, the concept of molecular information content was introduced.3 In 1977, graph theory was combined with information theoretic analysis in the design and study of topological indices (graph-based descriptors of molecular topology).4 A year later, the principle was formulated that entropy is transformed into molecular information by the formation of structures from elements (through bonds).5 In 1981, the combination of graph and information theory led to the first quantitative description of molecular complexity.6 Shannon entropy analysis was also applied in quantum mechanics: in 1985, entropy calculations were reported to analyze quantum mechanical basis sets,7 and in 1998, the concept of local Shannon entropy was introduced based on the partitioning of charge densities over atoms or groups.8 More recently, several investigators have focused on adapting the Shannon entropy concept for various uses in theoretical organic chemistry and chemoinformatics. For example, almost simultaneously with our initial studies on Shannon entropy-based descriptor and database profiling,9 the adaptation of this concept for the design of diverse chemical libraries was reported.10 Building directly on our work to adapt9 and extend11,12 the Shannon entropy formalism, Zell et al. further extended the approach for feature and descriptor selection by introducing a Shannon entropy clique algorithm,13,14 and Graham has studied the molecular information content of organic compounds using Shannon entropy calculations.15-17


These publications illustrate very well that the Shannon entropy concept has established itself in computational chemistry and chemoinformatics, regardless of whether applied in the context of molecular graph theory, diversity analysis, descriptor selection, or large-scale database profiling. As we will see, the Shannon entropy formalism is not difficult to grasp even though the underlying concept is much more complex to comprehend than it appears at first glance. For example, although Shannon entropy was related to molecular information content as early as 1953, it took 15 more years until it was rigorously shown that the Shannon formalism is truly a measure of information content when applied to molecular structure via graph representations.18 We will describe below the SE formalism in detail and explain how it can be used to estimate chemical information content based on histogram representations of feature value distributions. Examples from our work and studies by others will be used to illustrate key aspects of chemical information content analysis. Although we focus on the Shannon entropy concept, other measures of information content will also be discussed, albeit briefly. We will also explain why it has been useful to extend the Shannon entropy concept by introducing differential Shannon entropy (DSE)11 to facilitate large-scale analysis and comparison of chemical features. The DSE formalism has ultimately led to the introduction of the SE–DSE metric.12

SHANNON ENTROPY CONCEPT

Claude E. Shannon, in his seminal 1948 paper,1 considered the frequency of symbols sent along transmission channels and formulated a metric of the expectation of aggregations of symbols, which he connected to formulations for entropy found previously in statistical mechanics. Shannon was concerned with the channel capacity needed to transmit a specific amount of information. For Shannon, a channel was a real or theoretical conduit of a signal. For our purposes here, the analog of a channel is a single bin in a histogram, and instead of calculating channel capacity, we will hold our "channels" constant and monitor the degree to which their capacity is filled. The Shannon entropy (or SE value)1,2 is defined as

$$SE = -\sum_i p_i \log_2 p_i \qquad [1]$$

Here $p$ is the estimated probability, or frequency, of the occurrence of a specific channel of data. The $p_i$ corresponds to a particular histogram bin count normalized by the sum of all bin counts, with $c_i$ being the bin count for a particular bin (Eq. [2]):

$$p_i = \frac{c_i}{\sum_j c_j} \qquad [2]$$


Note that the logarithm in Eq. [1] is taken to base 2. Although this amounts to a simple scaling factor, it is a convention adopted in information theory so that entropy can be considered equivalent to the number of bifurcating (binary) choices made in the distribution of the data. In other words, using base 2 allows us to address this question: How many yes/no decisions do we need to make for data counts to fall into specific channels or bins in order to reproduce the observed data distribution? The higher the information content, the more numerous are the decisions required to place each data point. In Figure 1, the number of "decisions" necessary to place the 100 compounds falls between the value that would have resulted if the data distribution had produced four equally populated bins (log2 4 = 2.0) and that of eight equally populated bins (log2 8 = 3.0). This intermediate value arises because the example probabilities are not evenly distributed over the eight bins in our histogram, and therefore our ability to guess which bin a future compound will fall into is better than if they were equally distributed. Another way to look at this is that the information content of the distribution, which is the opposite of our predictive ability (there is no information for us if we already know the outcome with certainty), is less than if every bin was equally probable.

Figure 1 Example of Shannon entropy calculation for a hypothetical distribution of molecular weights. Starting with a histogram of the molecular weights of 100 compounds, with bin counts of 43, 23, 11, 8, 7, 5, 2, and 1 (100 total), each bin count is divided by the total bin count to give the sample probabilities 0.43, 0.23, 0.11, 0.08, 0.07, 0.05, 0.02, and 0.01. Applying Shannon's equation, 0.43 log2 0.43 + 0.23 log2 0.23 + ... + 0.01 log2 0.01 = -2.316; inverting the sign gives SE = 2.316.

The connection this example has to Shannon's original work on the transmission of information is to view the molecular weight frequencies in Figure 1 as the frequencies of the unit of information to be transmitted, e.g., letters in an alphabet. Given this view, from Figure 1, we would conclude that we would need on average 2.316 binary bits to encode the molecular weight bin "message" for this hypothetical distribution.

The extremes of data distributions are depicted in Figure 2 along with an arbitrary midpoint in a calculation of descriptor entropy. When the data are maximally distributed over all of the histogram bins, the SE value is equal to the base-2 logarithm of the number of histogram bins. Therefore, the SE value is dependent on the number of histogram bins used for a particular study. This dependence can be, for the most part, removed by dividing the SE value by the base-2 logarithm of the number of histogram bins chosen ("N" in Eq. [3]), which gives rise to a scaled SE or SSE value:

$$SSE = \frac{SE}{\log_2 N} \qquad [3]$$
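To make the bookkeeping concrete, here is a short Python sketch (our illustration, not code from the original study) that computes SE and SSE from a list of bin counts; applied to the Figure 1 distribution it reproduces the entropy value discussed above:

    import math

    def shannon_entropy(bin_counts):
        """SE per Eq. [1]: -sum of p_i * log2(p_i) over populated bins."""
        total = sum(bin_counts)
        # Empty bins contribute nothing (p * log2 p -> 0 as p -> 0).
        return -sum((c / total) * math.log2(c / total)
                    for c in bin_counts if c > 0)

    def scaled_shannon_entropy(bin_counts):
        """SSE per Eq. [3]: SE divided by log2 of the number of bins."""
        return shannon_entropy(bin_counts) / math.log2(len(bin_counts))

    counts = [43, 23, 11, 8, 7, 5, 2, 1]  # Figure 1: 100 compounds, 8 bins
    print(shannon_entropy(counts))         # 2.317 (Figure 1 reports 2.316)
    print(scaled_shannon_entropy(counts))  # 2.317 / 3 = 0.772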

The SSE has an absolute minimum of 0, associated with a data distribution where all the values fall into a single bin, and a maximum of 1.0, where each bin is occupied by an equal number of data counts. As we shall see, SSE is not independent of boundary effects (described later) underlying the data, and there is an asymptotic relationship associated with the number of bins, which can be ignored for most practical comparisons. In addition to this asymptotic relationship, the treatment of data outliers affects SSE calculations, just as it would influence the analysis of any histogram. A large body of literature is associated with both of these topics (see Refs. 19-21). Surprisingly, there is no known optimum value for the number of bins chosen for a histogram,22 but commonly accepted rules exist. For example, one postulate is that the bin width should be proportional to both the standard deviation of the data and the cube root of the number of available data points.23

Figure 2 Data distribution extremes and corresponding SE values. Depicted are three hypothetical data distributions that correspond to no information content, intermediate information content, and maximal information content (from left to right).


An important point when calculating SSE is that the number of histogram bins, however chosen, should remain constant throughout any comparison made, even though the SSE values are normalized with respect to bin numbers.

Outliers are a significant problem for the distribution of chemical descriptor values. Many descriptors were designed with a relatively narrow range of chemical compounds in mind, and using them indiscriminately on a large, diverse chemical database will produce descriptor value outliers (and occasionally even undefined numbers such as infinity). An outlier can distort a standard histogram by forcing other values to be concentrated into fewer bins, as is shown in Figure 3. The fact that one common use of histograms is the discovery of outliers should not lead one into a kind of circular reasoning whereby outliers are removed until a histogram "looks good." A more unbiased approach to removing outliers is to ask the question: How many vacant internal bins does a histogram have? If this number exceeds some preestablished threshold (e.g., greater than 10% of the total number of bins), a percentage trimming of the extreme values should be employed to tag them as outliers and omit them from subsequent SE or SSE calculations. Although more formalized tests for outliers exist,24,25 many of them depend on the presence of (approximately) normal data distributions26 and therefore must be discounted for reasons already mentioned. Once a descriptor value of a compound is declared to be an outlier, all other values associated with that compound must also be removed for any consistent comparison. It is statistically biased to remove only those descriptor values that are outliers and then carry out SE-based comparisons between descriptors; outlier removal must be made consistent at the data level of the compound set. Simply put, an outlier must be excluded from all subsequent calculations.
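As a sketch of the vacant-bin heuristic just described (the 10% threshold is taken from the text, but the 1% trimming fraction is an illustrative assumption; the chapter does not prescribe a trimming percentage):

    def vacant_internal_bins(bin_counts):
        """Count empty bins lying between the first and last populated bins."""
        populated = [i for i, c in enumerate(bin_counts) if c > 0]
        first, last = populated[0], populated[-1]
        return sum(1 for c in bin_counts[first:last + 1] if c == 0)

    def trim_extremes(values, fraction=0.01):
        """Tag the most extreme values (here, 1% at each end) as outliers."""
        ordered = sorted(values)
        k = max(1, int(len(ordered) * fraction))
        low, high = ordered[k], ordered[-k - 1]
        return [v for v in values if low <= v <= high]

    # If vacant_internal_bins(counts) > 0.10 * len(counts), trim, and drop the
    # flagged compounds from *all* descriptors before recomputing SE or SSE.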


Figure 3 Effect of an outlier on histograms with constant binning schemes. On the left is a histogram of 1000 normally distributed data points, and on the right is a histogram of the same data with a single outlier of value 2000 added. Because the binning scheme maintains a particular number of equal width bins, this one outlier forces all previous data counts into the single lowest valued bin. The SE for the left histogram is 2.301 (SSE: 0.726), and the right histogram SE is 0.011 (SSE: 0.004).


Although the Shannon entropy formalism appears to be "easy" and is straightforward to implement, entropy calculations are intimately connected with, and critically influenced by, the data representation over "channels" or histogram bins. Those channels can be severely affected by statistical problems associated with outliers in datasets. For all practical purposes, consistency of data representation and rigorous outlier treatment are key considerations when evaluating Shannon entropy.

There are other metrics of information content, and several of them are based on the Shannon entropy.27 About 10 years after introduction of the Shannon entropy concept, Jaynes formulated the "maximum entropy" approach,28 which is often referred to as Jaynes entropy and is closely related to Shannon's work. Jaynes' notion of maximum entropy has become an important approach in any study of statistical inference where all or part of a model system's probability distribution remains unknown. The Jaynes entropy relations, which guide parameterization toward a model of minimum bias, are built on the Kullback-Leibler (KL) function,29 sometimes referred to as the cross-entropy or "relative entropy" function, defined (with $p$ and $q$ representing two probability distributions indexed by $k$) as

$$KL = \sum_k p_k \log_2\left(\frac{p_k}{q_k}\right) \qquad [4]$$

The Kullback-Leibler formulation evaluates the relative entropy between two data distributions. However, it is not symmetrical with respect to the two distributions under comparison; that is, one must declare one distribution to be the base set, or reference, from which the other is assumed to depart. Concerning the connection between Jaynes entropy and the Kullback-Leibler function, maximum entropy is achieved when $q_k$ is replaced with a distribution about which there is "prior knowledge" and $p_k$ is adjusted so as to maximize KL. Prior knowledge could, for example, be the mean or expectation value of a data distribution. Importantly, because of the quotient involved (Eq. [4]), the Kullback-Leibler function becomes undefined if any bin is unpopulated. This renders the function inappropriate for estimating information content in chemical descriptor sets, as discussed below.
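A minimal sketch (assuming two equal-length histogram probability vectors) shows both the asymmetry and the empty-bin failure just noted:

    import math

    def kl_divergence(p, q):
        """KL per Eq. [4]; undefined when the reference q has an empty bin."""
        return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

    p = [0.7, 0.2, 0.1]
    q = [0.4, 0.4, 0.2]
    print(kl_divergence(p, q))  # 0.265...
    print(kl_divergence(q, p))  # 0.277...; KL is not symmetric
    # kl_divergence(p, [0.5, 0.5, 0.0]) raises ZeroDivisionError:
    # an unpopulated reference bin makes the quotient undefined.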

DESCRIPTOR COMPARISON

Chemical descriptors are used widely in chemoinformatics research to map the chemical features of compounds into the domain of numerical and statistical analysis.30 Once molecular features are expressed numerically, or as enumerated factor sets (e.g., structural keys), the tools for numerical and statistical analysis can be applied to analyze and compare the molecular similarity or diversity of compound collections.


Many chemical descriptors exist and are readily available in the literature, or they can be easily calculated,31 but discerning which ones are most useful to a particular study can be a daunting task.30 When selecting chemical descriptors, the researcher should consider which set best encodes the features that are important to the study in question. Even without focusing on a particular problem, however, one can estimate the information a descriptor may contain by considering the details of its numerical construction. For example, a descriptor can produce an integer value for a molecule, such as an enumeration of a chemical feature (e.g., the number of triple bonds), or it may possess a span of real values (e.g., logP(o/w), the logarithm of the octanol/water partition coefficient), or it may fall somewhere between the two, like the descriptor "molecular weight," which is quantized in that not all real values are attainable. One can easily understand the fundamentally quantized nature of the molecular weight descriptor by considering that a hydrogen atom, whose atomic mass is approximately 1.00794, is the smallest possible unit one can add to a molecule. How quantized a descriptor is (its "granularity") can be estimated quickly from the number of unique values it attains in a large database. Another question to be considered is: What is the theoretical range of the descriptor value? When one uses more than one descriptor, as is typically the case, the relative ranges of each individual descriptor must be considered.

Once the general numeric behavior of a descriptor has been considered, some understanding of the descriptor's statistical distribution in a dataset must be obtained. Specifically, the modeler must ask: Can we sensibly apply analysis tools that depend on approximately normal (or "Gaussian") distributions? Even the most common estimator of the central tendency, the average (or mean) value, is more sensitive to departures from a normal distribution than is often realized. A proper estimator of central tendency (e.g., mean, median, or mode) is a single value chosen to be an accurate representative of the behavior of the whole population. Values of descriptors often display long tails in their distributions, and chemical libraries frequently contain compounds that are likely outliers, as discussed earlier. In such cases, the mean value may not be the best representation of the central tendency. When possible, the descriptor's distribution should be viewed as a histogram; a quick glance at that histogram, of even a relatively small random sample of the data population, can readily suggest the proper treatment of the data or explain why some analysis has generated unexpected results.

Figure 4 shows a few descriptor histograms of a chemical library of a size that is typical today, i.e., containing more than a million compounds.32 It is immediately apparent that any statistical technique that depends on a normal distribution cannot be generally applied; it would be inappropriate, for example, to use the average value of the number of aromatic bonds to characterize a representative for a set of compounds. Any metric of descriptor variability based on a normal distribution, such as a standard deviation, is also generally unreliable. Clearly, a nonparametric estimator of descriptor information content is needed, and the histograms suggest a method.


Figure 4 Examples of histograms of molecular descriptors. Shown are database distributions for three descriptors: "b_ar" stands for the "number of aromatic bonds," and "weinerPath" and "petitjean" are both molecular distance matrix descriptors. These distributions are representative of those seen for many different types of molecular descriptors. It should be noted that in the right graph the bins on either side of the peak are not empty but graphically insignificant relative to the central bin.

The intimate connection of histogram analysis and information content estimation based on the Shannon entropy makes this type of analysis very attractive for the systematic study and comparison of descriptor value distributions. To provide some specific examples, let us consider the calculation of SSE for four molecular descriptors in two well-known databases: the Available Chemical Directory (ACD)33 and the Molecular Drug Data Report (MDDR).34 These two databases contain different types of molecules. The ACD contains many organic compounds and reagents, whereas the MDDR consists exclusively of biologically active molecules, many of which have originated from drug discovery programs. Thus, MDDR compounds are much more "lead-like" or "drug-like" than are the synthetic organic ACD molecules. The descriptors displayed in Figure 5 are "molecular weight," the "number of rotatable bonds" in a molecule (a measure of molecular flexibility), "logP(o/w)," the logarithm of the octanol/water partition coefficient (a measure of hydrophobic character), and the "number of hydrogen bond donors" in a molecule. It should be noted that these descriptors almost constitute the "rule-of-five" set of descriptors used to estimate the oral availability of pharmaceutically relevant molecules.35

The database values of these four descriptors were calculated using the software platform Molecular Operating Environment (MOE).36 Histograms for 231,187 ACD compounds and 155,814 MDDR compounds were constructed by keeping the number of histogram bins constant, removing any compounds judged to be outliers for any of the descriptors under study, and establishing overall minimum and maximum descriptor values. The number of bins was fixed at 19 according to the Sturges rule, which sets the number of bins to the base-2 logarithm of the number of data points plus one.37
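As a quick arithmetic check of that binning choice (our calculation, for illustration), the Sturges rule gives

$$k = \lceil \log_2 n \rceil + 1, \qquad \log_2 231{,}187 \approx 17.8 \;\Rightarrow\; k = 18 + 1 = 19$$

and the MDDR set yields the same value ($\log_2 155{,}814 \approx 17.2$, so again $k = 19$).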


Figure 5 Descriptor histograms and SE and SSE values from two different compound databases. Compared are distributions of descriptor values in a chemical (ACD) and pharmaceutical (MDDR) database. The top number in the upper part of each chart reports the SE value, and the number beneath is the SSE value. Descriptor abbreviations: ‘‘MW,’’ molecular weight; ‘‘a_don,’’ number of hydrogen bond donor atoms; ‘‘b_rotN,’’ number of rotatable bonds in a molecule; ‘‘logP(o/w),’’ logarithm of the octanol/water partition coefficient.

In terms of outliers, a single ACD compound had a sufficiently high molecular weight that it left three interior histogram bins empty. Consequently, this compound was removed from further consideration for the remainder of the study. None of the MDDR compounds was considered an outlier. Global minimum and maximum descriptor values were identified, and the histogram bin counts were accumulated. These counts were then converted to frequencies according to Eq. [2], and entropy values were calculated via Eqs. [1] and [3]. Figure 5 shows the resulting histograms and reports the associated descriptor entropy values.

The entropy values reflect the data distributions captured in the histograms. For every descriptor, the ACD database produces histograms with more populated bins appearing toward the limits of the chart, which is reflected by the calculated entropy values. These findings are consistent with the fact that the ACD contains a variety of synthetic compounds whose descriptor and property values are not restricted to those typically seen in more drug-like compounds. Comparing the entropy values of the same descriptor between two compound sets can therefore provide insights into the relative information content of the databases, at least in light of the monitored feature(s). This use of entropy analysis will be covered in detail below when the DSE formalism is introduced.

Because SE is a nonparametric distribution metric, one essential feature of an entropic approach to descriptor information content analysis is that descriptors with different units, numerical ranges, and variability can be compared directly, a task that would otherwise not be possible. This allows us to ask questions such as: Which descriptors carry high levels of information for a specific compound set, and which carry very little? To answer this question, we have systematically studied "1-D descriptors" and "2-D descriptors" contained in various databases.9,12


These descriptor designations mean that their values are calculated from molecular composition (1-D) formulas and from two-dimensional (graph) representations of molecular structure (2-D), respectively. Among others, they include categories such as bulk property descriptors, physicochemical parameters, atom and bond counts, and topological or shape indices. What we have generally found is that descriptors belonging to all of these categories can carry significant information, even those consisting of atom and bond counts. Thus, there is no strict correlation between the complexity of a descriptor and its information content; the information content depends strongly on the compound database under investigation. However, as one would expect, there is a tendency for complex descriptors, whose definition can be understood to consist of several simple descriptors, to carry more information. An example of this complexity is given by those descriptors that depend on divided atomic surface area distributions of other property descriptors and, accordingly, represent higher order statistical combinations of other descriptors.38 On the other hand, simple counts of atomic or bond properties typically have discrete values and thus often occur at the lower end of an information content spectrum. Perhaps unexpectedly, we have made similar observations for Kier and Hall connectivity and shape indices.39 These indices are calculated in a hierarchical order where the higher orders consider an increasing span of neighbor atoms. Consequently, even though those connectivity indices are found at the lower end of the descriptor information content spectrum, it is consistently the higher order connectivity descriptors whose values tend to go toward zero. Thus, upon closer inspection, our unexpected findings can be rationalized.

The utility of comparing entropic information content between descriptors is particularly evident when one is attempting to construct an efficient "aggregate of chemical information," an example of which is a "fingerprint," a bit string representation of molecular structure and properties composed of various descriptor contributions. For such an endeavor, one would like to have a set of information-rich descriptors that are, however, not overdetermined with regard to a specific compound feature (otherwise, this exercise often becomes a deceptive form of a single property or substructure search). Another situation in which one would be interested in descriptors that are particularly information-rich is the study of compound class sensitivity, where one would like to know which descriptors are most sensitive to chemical features encoded in a class of compounds that display, for example, specific activity against a target protein. An entropy-based metric designed to answer questions related to such issues will be discussed below.

INFLUENCE OF BOUNDARY EFFECTS

As stated, the reason for formulating the SSE is to remove the dependence of the entropy metric on the number of histogram bins.
However, boundary effects are common to all histogram-based analyses, especially when the number of bins is small. For example, if a descriptor has an intrinsic "preference" for certain numerical values (factors of six for the number of aromatic bonds, for example), and if adding just one more bin leads to dividing the data close to the center of this preferred value, a change will then appear in the resulting SSE value that does not relate directly to the descriptor's intrinsic information content. As mentioned, this phenomenon is generally not an issue if the number of bins is held constant over the entire analysis, and when it is, it remains a relatively small effect provided that the number of bins is initially chosen to be sufficiently large for the number of data points involved. Nevertheless, altering the number of bins can affect the assessment of the underlying information content. In Figure 6, we illustrate how the value of SSE changes with the number of bins selected for analysis.

As Figure 6 demonstrates, boundary oscillations can be seen, particularly with discrete valued descriptors, followed by a slow fall-off reflecting the finite amount of data. A peak (occurring near 225 bins for "b_rotN" in Figure 6, for example) occurs at the point where the data cannot be sampled on a finer grid without exposing the underlying granularity of the data; that is, the data have become spread out as much as possible. It would be tempting to define another (and more nearly bin-number-independent) metric as the peak SSE value. However, that peak occurs at different values depending on both the dataset and the design of the chemical descriptor. For example, an information-rich but sufficiently narrow valued descriptor might require the number of bins to be on the order of half the number of data points before that peak is reached. Therefore, the factors of dependence would actually increase.


Figure 6 Changes in values of SSE for histograms with increasing numbers of bins. Descriptors are abbreviated as in Figures 4 and 5.


Although SSE is not truly a bin-number-independent metric, its values can always be compared for a constant number of bins, and its values can be approximated for studies using different numbers of bins.
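The kind of scan underlying Figure 6 can be sketched as follows (an illustrative reconstruction using NumPy, not the original study's code):

    import numpy as np

    def sse_curve(values, max_bins=400, step=5):
        """SSE as a function of the number of histogram bins (cf. Figure 6)."""
        curve = []
        for n_bins in range(step, max_bins + 1, step):
            counts, _ = np.histogram(values, bins=n_bins)
            p = counts[counts > 0] / counts.sum()
            se = -(p * np.log2(p)).sum()
            curve.append((n_bins, se / np.log2(n_bins)))  # Eq. [3]
        return curve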

EXTENSION OF SE ANALYSIS FOR PROFILING OF CHEMICAL LIBRARIES

One concern of informaticians who assemble compound libraries is whether one database represents a more diverse set than another. For example, one might ask: How much additional chemical diversity could be expected if the size of the current database were doubled by adding compounds from other sources? The SE metric, as a nonparametric indicator of variability or value spread for a particular compound set, is not suitable for addressing such questions. So, although it is reasonable to note that one compound set has a higher SE or SSE for a specific descriptor than another, no statement can be made about the overlap between the two sets. Indeed, the two compound sets considered together might not produce a greater spread of values, and, perhaps surprisingly, it is even possible to lower the aggregate SE, or global information content, of a chemical library by adding compounds, because the probability histograms providing the basis for SE calculations are implicitly renormalized as more compounds are included in the set. It therefore becomes necessary to introduce a new metric for assembling new collections of compounds or for comparing two preexisting collections. This new metric is referred to as the "differential Shannon entropy" (DSE), defined in Eq. [5]. The DSE metric has a form that often occurs in statistics and asks the question: Is the aggregate more than the combination of its parts?

$$DSE = SE_{AB} - \frac{SE_A + SE_B}{2} \qquad [5]$$

In Eq. [5], $SE_{AB}$ is the Shannon entropy calculated from the aggregate of compound sets A and B, whereas $SE_A$ and $SE_B$ are the SE values for each of the two databases considered individually (of course, SSE values are typically used instead of SE). DSE can therefore be viewed as the increase or decrease in overall descriptor variability due to complementary or synergistic information content of the individual databases involved. Figure 7 depicts hypothetical distributions to underscore the situations where DSE values become significant. A negative DSE will occur whenever the spread of one distribution is enveloped by the other. In general, DSE reflects the growth of the resulting renormalized distribution envelope of descriptor values. Importantly, DSE analysis permits the identification of descriptors that are sensitive to systematic differences in the properties of various compound databases or classes.

Figure 7 Model DSE calculations. SSE and scaled DSE (SDSE) values are reported; SDSE is analogous to SSE and is produced by dividing the DSE by the base-2 logarithm of the number of bins. In the upper example, two distributions with SSE values of 0.66 and 0.81 combine to give an aggregate SSE of 0.88 (SDSE = 0.15); in the lower example, distributions with the same SSE values combine to give an aggregate SSE of only 0.72 (SDSE = -0.02).

Consider a sample calculation. The SE value for the calculated logP(o/w) descriptor from the ACD collection is 1.329 (SSE: 0.308), that from the MDDR is 1.345 (SSE: 0.311), and when the two compound sets are combined into one histogram, the value becomes 1.343 (SSE: 0.311). The resulting DSE value between the ACD and MDDR databases is therefore 0.006 [scaled DSE (SDSE): 0.002]. From this we can conclude that there is a small information gain with respect to logP(o/w) when combining the two compound collections; that is, there is a detectable difference between the two compound sets in the overall logP distribution at the given bin resolution. The same calculation using the ZINC compound database (containing over 2 million lead- and drug-like compounds at this writing)40 gives an SE of 1.180 (SSE: 0.273), whereas ZINC and MDDR together give 1.243 (SSE: 0.288). This provides a ZINC and MDDR DSE of -0.238 (SDSE: -0.055). These results are again not unexpected. Because the ZINC database is assembled from the catalogs of pharmaceutical compound vendors, we would expect to find in it the majority of compounds from the smaller MDDR dataset. Consequently, the MDDR logP(o/w) distribution is duplicated by the set of ZINC compounds, leading to an overall reduction in the per compound information content.
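A minimal sketch of such a comparison (our illustration; it assumes NumPy arrays of descriptor values and a shared binning grid spanning both sets):

    import numpy as np

    def shannon_entropy_np(values, edges):
        counts, _ = np.histogram(values, bins=edges)
        p = counts[counts > 0] / counts.sum()
        return -(p * np.log2(p)).sum()

    def dse(values_a, values_b, n_bins=19):
        """DSE per Eq. [5]: SE of the pooled set minus the mean individual SE."""
        lo = min(values_a.min(), values_b.min())
        hi = max(values_a.max(), values_b.max())
        edges = np.linspace(lo, hi, n_bins + 1)
        se_ab = shannon_entropy_np(np.concatenate([values_a, values_b]), edges)
        se_a = shannon_entropy_np(values_a, edges)
        se_b = shannon_entropy_np(values_b, edges)
        return se_ab - 0.5 * (se_a + se_b)

    # With the ACD/MDDR logP(o/w) values above: 1.343 - (1.329 + 1.345)/2 = 0.006.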


What sort of DSE values would be associated with a significant difference between two compound sets? This question was answered by systematically comparing 143 descriptors among four databases representing pharmaceutical, general synthetic organic, and natural product compounds. It was concluded that SDSE values in excess of 0.026 represent a large difference between the distributions, i.e., they are "high-DSE."12 A combination of SE and DSE analysis can be used to separate descriptors having little information content in one or both databases from those that are variable but have different value ranges in the compared databases. Thus, DSE calculations extend SE analysis by accounting for the range-dependence of descriptor values. Combining SE and DSE calculations has led to the SE-DSE metric,12 which can classify descriptors based on database comparisons into four SE-DSE categories: "high-high," "high-low," "low-high," and "low-low," where, e.g., "high-high" means high SE and high DSE.

Systematic analysis has revealed that descriptors belonging to the high-high SE-DSE category are relatively rare. Mainly complex descriptors, such as the previously mentioned descriptors developed by Labute,38 were found to belong to this category. The relative scarcity of the high-high category is intuitive, because it requires information-rich descriptors (which by definition occupy a broad distribution in the histogram) to also have their values be more or less "self-avoiding" between the two databases being compared. Nonetheless, such descriptors can be found when profiling and comparing different databases, and they are sought after by scientists for many applications, because they have consistently high information content and because they respond to systematic property differences between databases.

Descriptors belonging to the low-high category are more intuitive to comprehend, as they are characterized by narrow ranges of descriptor values in each database combined with a significant difference in the ranges they adopt between the databases. The low-high situation could also indicate general differences between compound collections, although these differences are in principle more difficult to relate to statistical significance using descriptors having low information content. For example, when comparing differences between the ACD and the Chapman and Hall (CH)41 natural products database, several simple atomic count descriptors (most notably for halogens and sulphur) were found to have generally low information content but high SDSE values (>0.03).12 This observation can be rationalized because elements like halogens and sulphur rarely occur in natural molecules.

Furthermore, although descriptors of the low-low category are clearly not useful for comparing databases, descriptors belonging to the high-low category could be of value because they have high information content but do not measurably respond to compound class-specific features. High-low descriptors are also found frequently when comparing various databases.12 Descriptors in this category span different levels of complexity, ranging from very simple constructions such as the "number of hydrogen atoms" or "number of rotatable bonds" to complex formulations, as mentioned above. The high-low category of SE-DSE descriptors is preferred for applications like similarity searching across different databases.

In summary, the classification of property descriptors based on information content, taking into account value range differences, helps greatly to prioritize descriptors for specific applications. For example, such descriptor selection schemes have proved to be useful when systematically comparing compounds from synthetic versus natural sources and for modeling physical properties, as described below. Establishing a metric that provides an intuitive measure of the graphical separation between value distributions of two databases being compared is also useful. For this purpose, the "Entropic Separation" (ES)11 was defined (Eq. [6]).

Figure 8 Model ES calculation and comparison with DSE. Two hypothetical data distributions (with modes separated by |M_A - M_B|) are used to calculate ES and DSE values; for the distributions shown, ES = 13.13 and DSE = 2.73. The ES value can be thought of as a peak separation given in SE units.

The ES is the bin distance between the most populated bins, or statistical modes ("M" in Eq. [6]), of the comparison histograms, divided by half of the average of the two individual SE values. For example, if for one database the molecular weight histogram had its most populated bin at bin number 13, and the database to which the first was being compared had its most populated bin at bin number 27 (with all histogram parameters held constant), the intermode bin distance, or $|M_A - M_B|$, would be 14.

$$ES = \frac{|M_A - M_B|}{\frac{1}{2}\left(\frac{SE_A + SE_B}{2}\right)} \qquad [6]$$
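A minimal sketch of Eq. [6] (assuming bin counts are plain Python lists and that SE values have already been computed, e.g., with the shannon_entropy helper above):

    def entropic_separation(counts_a, counts_b, se_a, se_b):
        """ES per Eq. [6]: intermode bin distance over half the average SE."""
        mode_a = counts_a.index(max(counts_a))  # most populated bin of set A
        mode_b = counts_b.index(max(counts_b))  # most populated bin of set B
        return abs(mode_a - mode_b) / (0.25 * (se_a + se_b))

    # Modes at bins 13 and 27 with SE_A = SE_B = 2.0 give ES = 14.0.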

Because the ES is scaled to the information content of the descriptor in the compared databases, a descriptor with a broader distribution (higher average SE) must have a greater peak separation in order to achieve the same level of ES as another descriptor. The ES is therefore an entropic (and nonparametric) analog of the classical statistical phrase "to be separated by so many sigma." This measure is related to, yet distinct from, DSE. Figure 8 illustrates the application of the ES metric to a pair of hypothetical data distributions.

INFORMATION CONTENT OF ORGANIC MOLECULES

Among the Shannon entropy-related investigations referred to in the introductory sections, the studies by Graham et al. on the information content of organic molecules are interesting to consider relative to our own work. This is because, although we have focused on the analysis of descriptors to estimate chemical information content, Graham et al. have chosen to study the information content associated with organic molecules more or less "directly."


In qualitative terms, alkanes and aromatic molecules, which consist of only carbon and hydrogen atoms, have less information content than do, for example, halogen-substituted forms of these molecules.15 A key feature of the approach of Graham et al. is that conventional molecular graphs are directly examined for information content, not through descriptors as in our more chemoinformatics-oriented approach. Possible applications of Graham's approach include, for example, the study of interactions between organic compounds and solvent molecules, the comparison of different tautomeric or ionized forms of organic molecules, or the correlation between information content within a compound series and relative potencies.

Graph-centric Shannon entropy-based information content analysis has been elegantly facilitated by Graham et al. through implementation of a Brownian processing model that corresponds to a random (yet systematic) walk through a molecular graph representation.17 This Brownian processing approach has recently been further extended to incorporate molecule aggregation and solvation effects,42 thereby linking molecular information to communication between molecules. Brownian processing, as applied to information content analysis, is based on extracting three-component atom-bond-atom units from conventional molecular graphs. For organic molecules, typical examples would be (C-C), (C-H), (C-O), (C=O), etc. In serial Brownian processing, a molecular graph is accessed, a "code unit" is selected, one of its nearest neighbors is randomly chosen, followed by a neighbor of the latter unit, then a neighbor of that neighbor, and so forth. In parallel processing, several nearest neighbors are selected randomly for each unit. Selected units are then used to generate strings, an example of which for benzene is (C-C) (C-H) (C=C) (C-H) (C-C) (C-H) (C=C) (C-H) . . . These strings contain multiple copies of code units and create a serial message (or "tape recording") for a given molecule. To process molecular aggregates, such tape recordings can be combined for interacting groups of molecules, either sequentially or in parallel.42 Recording the code units and their relative frequency of occurrence (in a serial message) in a histogram or table format permits application of the Shannon entropy formula, where the relative frequencies of code units become their probabilities. The calculated entropy is then equivalent to the number of bits required to encode the observed distribution of units; the larger the number of bits, the higher the information content. For tape recordings of similar size, the information content of molecules or aggregate states can then be compared.
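For a rough feel of such a calculation (our arithmetic on an idealized, uniformly sampled tape, not a result reported by Graham et al.): a Kekulé representation of benzene contains three (C-C), three (C=C), and six (C-H) code units, so a long unbiased tape would approach unit probabilities of 0.25, 0.25, and 0.5, giving

$$SE = -(0.25 \log_2 0.25 + 0.25 \log_2 0.25 + 0.5 \log_2 0.5) = 1.5 \text{ bits per code unit}$$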

SHANNON ENTROPY IN QUANTUM MECHANICS, MOLECULAR DYNAMICS, AND MODELING

Calculated electron density distributions can be conveniently studied using SE analysis, which has led to applications in quantum mechanics.7,8


For example, when electron densities are recorded along a reaction coordinate, regions where densities peak correspond to low entropy areas, whereas intermediate regions are characterized by high entropy. For the study of charge densities, electron distributions around functional groups, or the interpretation of ab initio wave functions, SE analysis and the concept of "local" Shannon entropy are relevant.8 Local SE values, based on the partitioning of charge densities of functional groups, have been used as a measure of group similarity,8 and SE values calculated for various groups from Hartree-Fock wave functions have been correlated with changes in molecular geometry.43 Moreover, orbital models representing probabilities of electron distributions over restricted spaces are well suited for SE analysis. A formulation of orbital Shannon entropy has been achieved in which electron density is normalized with respect to orbital occupation numbers.44 In this context, the Jaynes and Shannon entropy formalisms were compared, and the Jaynes entropy was rationalized as representing the difference between the mean orbital SE per electron and the mean orbital SE of a particular electron.44 Just like calculated electron densities, experimental densities have been subjected to entropy calculations in order to aid in the refinement of crystallographic phases.45 Using the maximum entropy concept, the entropy of the electron density in a binned unit cell was calculated relative to the average electron density.45

In addition to its use in quantum mechanics over the past 20 years, SE analysis has more recently been applied to molecular dynamics simulations and conformational analysis. An algorithm has been developed to calculate SE values from dynamics trajectories, and it was shown that entropies of conformational energies of test molecules correlated linearly with their experimental thermodynamic entropies.46 Using 2-D lattices and simplified (two-state, i.e., hydrophilic-hydrophobic) protein chain representations, SE values for energy distributions produced by different pair-wise interactions were calculated, and potentials leading to their discrimination on the basis of differences in information content were developed.47

EXAMPLES OF SE AND DSE ANALYSIS

A key question is: Can SE and DSE, as an information theoretic approach to descriptor comparison and selection, be applied to accurately classify compounds or to model physicochemical properties? To answer this question, two conceptually different applications of SE and DSE analysis will be discussed here and related to other studies. The first application explores systematic differences between compound sets from synthetic and natural sources.48 The second addresses the problem of rational descriptor selection to predict the aqueous solubility of synthetic compounds.49 For these purposes, SE or DSE analyses were carried out, and in both cases, selected descriptors were used to build binary QSAR-like classification models.50


A common assertion among medicinal chemists and library designers is that natural product compounds are difficult to work with. When asked what exactly complicates working with natural products, a typical response involves the complexity of the molecules or features that make synthesis difficult. Studies systematically comparing natural and synthetic compounds are rare. Only fairly recently has a direct statistical analysis been carried out on structural and property differences between natural and synthetic molecules.51 One example is distribution differences in nitrogen- or oxygen-containing groups, as well as differences in distributions of halogen atoms. Halogen atoms and amide groups occur more frequently in synthetic molecules, whereas natural compounds typically have higher oxygen abundance (e.g., in ester or alcohol groups).

SE was used to compare two sizable collections of compounds, one of synthetic origin and the other of natural origin. The following question was asked: Which chemical descriptors are most likely to contain the information needed to systematically distinguish between natural and synthetic molecules? It should be emphasized that making use of a variety of chemical descriptors for such an investigation allows any level of abstraction of chemical information to enter the analysis and is therefore more general than a statistical fragment or substructure comparison. To answer this question, the ACD33 database for synthetic compounds and the CH41 database for natural products were chosen. The MOE software platform36 was used to calculate 98 chemical 2-D descriptors for 199,240 ACD compounds and for 116,364 CH compounds. Also included were several implicit 3-D descriptors that map properties onto molecular surface areas approximated from 2-D representations of structures.38

SE values were calculated for all 98 descriptors and both databases. The ACD SE for each descriptor was plotted along one axis and the CH SE for each descriptor along the other, as shown in Figure 9. SE points were seen to fall into three broad classes: (1) those with low SE values in both databases, (2) those with high SE in both, and (3) "off-diagonal" points with intermediate SE values in either or both databases. The entropic separations (ES) of all 98 descriptors were calculated, and it was found that the ten descriptors (Table 1) in the off-diagonal and high SE regions produced the highest ES values.48 The highest ES descriptors reflect some known differences between synthetic and natural molecules, including, for example, the degree of saturation or aromatic character. It is also interesting to note that the descriptor with the highest ES value, "a_ICM," is itself calculated using entropic principles: it accounts for the entropy of the distribution of the elemental composition of the compound.

Based on this SE and ES analysis, four sets of descriptors were tested in binary QSAR models. The four sets consisted of (1) 7 descriptors with intermediate SE values in both databases, (2) 11 descriptors with low SE values in both databases, (3) 8 descriptors with high SE values in both databases, and (4) 8 descriptors with the highest ES values in Table 1.


Figure 9 Shannon entropy comparison. SE values of descriptors calculated for ACD compounds are plotted against corresponding values for the CH database. Region (A) includes descriptors with the highest SE, and region (B) those with the lowest SE. "Off-diagonal" descriptors have the greatest difference in variability between the two databases.

Table 1 Entropic Separation of Descriptors in Two Databases

Descriptor     Entropic Separation (ES)    SE (CH/ACD)
a_ICM(a)       8.14                        5.4/5.3
bpol(b)        5.08                        4.8/3.1
chi0v_C(c)     4.68                        4.9/3.6
b_double(d)    4.52                        2.9/2.5
chi1v(e)       4.42                        4.8/2.4
a_nH(f)        4.08                        4.7/3.2
b_single(g)    3.93                        4.9/3.2
b_ar(h)        3.86                        2.2/3.0
vsa_hyd(i)     3.84                        4.8/3.5
apol(j)        3.83                        4.8/3.6

(a) "a_ICM," compositional entropy descriptor; (b) "bpol," normalized atomic polarizability; (c) "chi0v_C," carbon valence connectivity index (order 0); (d) "b_double," number of double bonds in a molecule; (e) "chi1v," atomic valence connectivity index (order 1); (f) "a_nH," number of hydrogen atoms; (g) "b_single," number of single bonds; (h) "b_ar," number of aromatic bonds; (i) "vsa_hyd," approximate hydrophobic van der Waals surface area; (j) "apol," atomic polarizability.


Binary QSAR employs Bayesian statistics to correlate a selected set of properties with a probability for each molecule to belong to one of two states. As originally conceived,50 these states are assigned as "active" = 1 and "inactive" = 0. For our purposes, they instead acquired the meaning of "natural product" and "synthetic compound." The binary QSAR method is designed to make use of a particular set of descriptors as input. Their calculated values are subjected to principal component analysis and processed to produce a probability density function to which a cut-off value is assigned in order to place each result into one of the two result states. A random set of 500 compounds (composed of equal numbers from ACD and CH) was used as a training set, and the results were tested against a same-sized test set. The prediction accuracy was assessed with a simple formula: the number of correctly identified natural products plus the number of correctly identified synthetic compounds, divided by the total number of compounds, expressed as a percentage.
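In symbols (our notation, not the authors'), with $N$ the total number of test-set compounds,

$$\text{accuracy} = \frac{N_\text{natural}^\text{correct} + N_\text{synthetic}^\text{correct}}{N} \times 100\%$$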


Applying the above protocol to six different random training and test sets, the results are unequivocal. Tests done with the low SE descriptor set (group 2) performed the worst, returning nearly random results in the range of 53% (random performance being 50%). Tests using the highest ES descriptors (group 4) performed the best, at 91% prediction accuracy. Group 3, consisting of the high SE descriptors (without considering ES), returned a favorable, but not the best, prediction accuracy of 85%. The descriptor set composed of intermediate valued SE descriptors (group 1) had an intermediate prediction accuracy of 68%.

Two conclusions can be derived from these results. First, it is feasible to use entropy-based information theory to select fewer than 10 chemical descriptors that can systematically distinguish between compounds from different sources. Second, when selecting descriptors to distinguish between compounds, it is important that these descriptors have high information content that can support separability and differentiate compounds between the datasets. The power of the entropic separation revealed in this analysis gave rise to the development of the DSE and, ultimately, the SE-DSE metric, as described earlier.

Another example that focuses on the use of DSE analysis is the modeling of chemical properties such as the aqueous solubility of compounds.49 Aqueous solubility provides an example of a physicochemical property that can be addressed at the level of structurally derived chemical descriptors. Because the aqueous solubility of many compounds is known, an accurate and sufficiently large dataset can be accumulated for constructing and evaluating predictive models. In addition, problems surrounding solubility remain a significant issue for lead identification and optimization in pharmaceutical research.52,53 An important goal of studying chemical descriptors for their ability to predict aqueous solubility was to provide a rational alternative to the intuitive bias that has tended to dominate descriptor selection in this area of research.53-55 Many scientists had included in their studies descriptors that are based on chemical intuition, such as logP(o/w) and related descriptors that address, e.g., hydrogen bonding and hydrophobic or solvent-accessible surface areas. However, further studies have shown that the addition of descriptor-based topological and electronic molecular information is as important as these intuitive sets.56,57

We now ask whether an entropy-based approach can be used to identify descriptors that accurately predict aqueous solubility (as an example of a relevant physicochemical property). To address different solubility threshold values from an experimental dataset, compounds were divided into "soluble" and "insoluble" subsets. The descriptors chosen as the information source input for a binary QSAR model were selected exclusively by DSE analysis performed with the number of histogram bins consistently held at 25. An experimental database of 650 molecules with known solubility (expressed as logS values, where S is the aqueous solubility in mol/L) was gleaned from the literature54-57,59 and confirmed in the PHYSPROP database;60 all values selected were for a constant temperature (25 ± 1 °C). These 650 compounds were divided into a training set of 550 molecules and a test set of the remaining 100 molecules so as to cover equivalent solubility ranges. Five solubility threshold levels were established: 1 mM, 5 mM, 10 mM, 50 mM, and 100 mM. These levels were chosen based on the ranges seen for many drugs60 and because the middle threshold (10 mM) is a minimal solubility acceptable in most screening assays.52

DSE values were calculated independently for each of the five paired datasets corresponding to the five threshold values, for a total of 148 2-D descriptors.36 Six binary QSAR models, using the DSE-sorted (highest to lowest, Table 2) top 5, 10, 15, 20, 25, and 30 descriptors, were generated for each of the five threshold dataset pairs. Prediction accuracy was monitored as the number of correctly identified soluble molecules plus the number of correctly identified insoluble molecules, divided by the total number of molecules, expressed as a percentage. With the exception of the 100-mM threshold set, the best prediction accuracy was achieved (at an average of 88%) when using only the five highest valued DSE descriptors. The 100-mM set did better (92%) with the 20 highest valued DSE descriptors. The descriptors producing the highest accuracy are logP, hydrophobic van der Waals surfaces, hydrophobic atom counts, and three complex descriptors approximating polar, charged, and hydrophobic surface areas. Note that descriptors providing information about hydrogen bonding or partial charges were not needed to produce the best results.

One of the most significant findings of this study is that only very few descriptors are required to predict aqueous solubility with high accuracy. This is consistent with the findings of Jorgensen and Duffy,61 whose Monte Carlo simulations identified 11 descriptors and, with only five terms in a subsequent QSPR, achieved high prediction accuracy. This DSE analysis of aqueous solubility confirms that information theoretic analysis can be used to successfully select features for modeling of physicochemical properties.


Table 2 Molecular Descriptors with Highest DSE Values in Solubility Predictions

Av DSE   Descriptor
0.558    SlogP(a)
0.554    a_hyd(b)
0.542    logP(o/w)(c)
0.542    PEOE_VSA_NEG(d)
0.526    PEOE_VSA-1(e)
0.494    SMR(f)
0.492    chi1v(g)
0.492    vsa_hyd(h)
0.482    mr(i)
0.472    chi0v(j)

NOTE: Reported are average DSE values ("Av DSE") for the top 10 descriptors that were found to be most responsive to differences between "soluble" and "insoluble" compounds. DSE values were averaged over all five solubility threshold ranges. (a) "SlogP", atomic contribution model of the logarithm of the octanol/water partition coefficient; (b) "a_hyd", number of hydrophobic atoms based on pharmacophore atom typing; (c) "logP(o/w)", logarithm of the octanol/water partition coefficient based on a linear atom model; (d) "PEOE_VSA_NEG", approximate electronegative van der Waals surface area; (e) "PEOE_VSA-1", sum of van der Waals surface area for a partial charge range; (f) "SMR", molecular refractivity parameterized model; (g) "chi1v", atomic valence connectivity index (order 1); (h) "vsa_hyd", approximate hydrophobic van der Waals surface area; (i) "mr", molecular refractivity linear model; (j) "chi0v", atomic valence connectivity index (order 0). Data were taken from Ref. 49.

A genetic algorithm implementation of the SE and DSE formalisms by Wegner and Zell has also been applied to select descriptors for neural network prediction of aqueous solubility and logP(o/w) values.13 Significant correlation coefficients of 0.9 were obtained. In these neural network studies, too, only a small number of information-rich descriptors was necessary for successful modeling. Shannon entropy-based analysis of Brownian processing of molecular graphs, as discussed above, has also been applied successfully to relate information content parameters of nicotinic receptor antagonists and beta-lactamase inhibitors to their potencies.42

Taken together, all these studies have confirmed that the Shannon entropy approach derived from digital communication theory can be adapted and extended for solving problems that have traditionally been treated using QSAR-type or machine learning methods. When applied to descriptor selection, information content analysis is complementary to both QSAR modeling and molecular similarity analysis. Finally, in addition to descriptor selection, the Shannon entropy concept has also been employed by Clark in descriptor design.62,63 In these studies, complex molecular shape descriptors have been generated that capture four local properties: electrostatic potential, ionization energy, electron affinity, and polarizability.62 Clark calculated local SE to quantify the distributions of these properties in different regions of the molecular surface, leading to the conclusion that low SE regions are preferred for mediating specific interactions.63
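To make the descriptor selection procedure concrete, a minimal NumPy sketch of the SE and DSE calculations for a single descriptor is given below. It assumes the DSE form introduced earlier in the chapter, SE(A∪B) − [SE(A) + SE(B)]/2, with histograms of 25 bins spanning the combined value range; it is an illustration rather than a reimplementation of the published protocol, and the function names and example data are ours.

```python
import numpy as np

def shannon_entropy(values, bins=25, value_range=None):
    """Shannon entropy (SE) of a descriptor value distribution:
    SE = -sum_i p_i log2(p_i) over the histogram bin probabilities p_i."""
    counts, _ = np.histogram(values, bins=bins, range=value_range)
    p = counts / counts.sum()
    p = p[p > 0]                      # by convention, 0 * log2(0) = 0
    return -np.sum(p * np.log2(p))

def dse(values_a, values_b, bins=25):
    """Differential Shannon entropy (DSE) between two compound sets A and B,
    computed over the combined value range so the histogram bins match."""
    pooled = np.concatenate([values_a, values_b])
    rng = (pooled.min(), pooled.max())
    se_ab = shannon_entropy(pooled, bins, rng)
    se_a = shannon_entropy(values_a, bins, rng)
    se_b = shannon_entropy(values_b, bins, rng)
    return se_ab - 0.5 * (se_a + se_b)

# Example: a hypothetical descriptor whose distribution shifts between sets
gen = np.random.default_rng(0)
soluble = gen.normal(2.0, 1.0, 500)
insoluble = gen.normal(4.0, 1.0, 500)
print(dse(soluble, insoluble))        # larger DSE = more discriminatory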

CONCLUSIONS

The Shannon entropy concept has been adapted and extended for different types of applications in chemoinformatics and computational chemistry. This information-theoretic concept evaluates the information content of data distributions and thereby, within the chemoinformatics framework, provides a basis for estimating the information-carrying capacity of chemical descriptors or the relative diversity of a compound library. It can also be applied to extract information from molecular graph representations. Extending the SE formalism to follow changes in distributions of values, and introducing a value-range dependence, gave rise to a differential form, called DSE, which identifies those chemical descriptors whose shifts most distinguish one compound set from another. The ensuing SE–DSE metric makes it possible to identify descriptors that have consistently high information content in databases and that are responsive to database- or compound class-specific features.

Importantly, such metrics permit large-scale property profiling of compound databases. Using these techniques, it is often possible to discern property differences that are too subtle, and too deeply buried in the morass of data associated with large compound sets, to be detected by other means. For descriptor selection, SE–DSE analysis provides a rational alternative to the more intuitive selection schemes that have long dominated many applications in the QSAR arena. Results available thus far indicate that if descriptor selection can be rationalized, relatively few descriptors having high information content (SE) and suitable sensitivity (DSE) are usually sufficient for developing a successful application, for example, as a parameter set for QSAR. As a future perspective, SE–DSE analysis can also be expected to aid in the discovery and generation of new chemical descriptors, by identifying efficacious combinations of commonly used descriptors or by elucidating gaps where new types of chemical descriptors need to be advanced.

A general feature of information content analysis, as described herein, is that it has low computational complexity and memory requirements. Thus, the approach can easily handle very large databases that nowadays often contain millions of compounds and are expected to grow further. For applications in chemistry, information content analysis should have significant scientific growth potential in a variety of areas, including theoretical organic and medicinal chemistry, chemoinformatics, quantum mechanics, and molecular dynamics simulations.


REFERENCES

1. C. E. Shannon, Bell Syst. Tech. J., 27, 379 (1948). A Mathematical Theory of Communication.
2. C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, Illinois, 1963.
3. S. M. Dancoff and H. Quastler, in Essays on the Use of Information Theory in Biology, H. Quastler, Ed., University of Illinois Press, Urbana, Illinois, 1953, pp. 263–273. The Information Content and Error Rate of Living Things.
4. D. Bonchev and N. Trinajstić, J. Chem. Phys., 67, 4517 (1977). Information Theory, Distance Matrix, and Molecular Branching.
5. D. Bonchev, Commun. Math. Chem., 7, 65 (1979). Information Indices for Atoms and Molecules.
6. S. H. Bertz, J. Am. Chem. Soc., 103, 3599 (1981). The First General Index of Molecular Complexity.
7. S. R. Gadre, S. B. Sears, S. J. Chakravorty, and R. D. Bendale, Phys. Rev. A, 32, 2602 (1985). Some Novel Characteristics of Atomic Information Entropies.
8. M. Ho, V. H. Smith, Jr., D. F. Weaver, C. Gatti, R. P. Sagar, and R. O. Esquivel, J. Chem. Phys., 108, 5469 (1998). Molecular Similarity Based on Information Entropies and Distances.
9. J. W. Godden, F. L. Stahura, and J. Bajorath, J. Chem. Inf. Comput. Sci., 40, 796 (2000). Variability of Molecular Descriptors in Compound Databases Revealed by Shannon Entropy Calculations.
10. G. M. Maggiora and V. Shanmugasundaram, 219th American Chemical Society National Meeting, Division of Computers in Chemistry, Abstract No. 119 (2000). Similarity-Based Shannon-Like Diversity Measure.
11. J. W. Godden and J. Bajorath, J. Chem. Inf. Comput. Sci., 41, 1060 (2001). Differential Shannon Entropy as a Sensitive Measure of Differences in Database Variability of Molecular Descriptors.
12. J. W. Godden and J. Bajorath, J. Chem. Inf. Comput. Sci., 42, 87 (2002). Chemical Descriptors with Distinct Levels of Information Content and Varying Sensitivity to Differences Between Selected Compound Databases Identified by SE-DSE Analysis.
13. J. K. Wegner and A. Zell, J. Chem. Inf. Comput. Sci., 43, 1077 (2003). Prediction of Aqueous Solubility and Partition Coefficient Optimized by a Genetic Algorithm Based Descriptor Selection Method.
14. J. K. Wegner, H. Fröhlich, and A. Zell, J. Chem. Inf. Comput. Sci., 44, 921 (2004). Feature Selection for Descriptor Based Classification Models.
15. D. J. Graham and D. Schacht, J. Chem. Inf. Comput. Sci., 40, 942 (2000). Base Information Content in Organic Molecular Formulae.
16. D. J. Graham, J. Chem. Inf. Comput. Sci., 42, 215 (2002). Information Content in Organic Molecules: Structure Considerations Based on Integer Statistics.
17. D. J. Graham, C. Malarkey, and M. V. Schulmerich, J. Chem. Inf. Comput. Sci., 44, 1601 (2004). Information Content in Organic Molecules: Quantification and Statistical Structure via Brownian Processing.
18. A. Mowshowitz, Bull. Math. Biophys., 30, 175 (1968). Entropy and the Complexity of Graphs: I. An Index of the Relative Complexity of a Graph.
19. J. Daly, Commun. Stat., Theory Methods, 17, 2921 (1988). The Construction of Optimal Histograms.
20. K. He and G. Meeden, J. Stat. Planning and Inference, 61, 49 (1997). Selecting the Number of Bins in a Histogram: A Decision Theoretic Approach.
21. M. P. Wand, J. Am. Stat. Assoc., 85, 59 (1997). Data-Based Choice of Histogram Bin Width.


22. L. Birgé and Y. Rozenholc (2002). How Many Bins Should Be Put in a Regular Histogram? Available: http://www.proba.jussieu.fr/mathdoc/textes/PMA-721.pdf.
23. D. W. Scott, Biometrika, 66, 605 (1979). On Optimal and Data-Based Histograms.
24. Analytical Methods Committee, Analyst, 114 (Part 1), 1693–1697 (1989). Robust Statistics - How Not to Reject Outliers. Part 1: Basic Concepts.
25. D. B. Rorabacher, Anal. Chem., 63, 139 (1991). Statistical Treatment for Rejection of Deviant Values: Critical Values of Dixon Q Parameter and Related Subrange Ratios at the 95 Percent Confidence Level.
26. F. E. Grubbs, Technometrics, 11, 1–21 (1969). Procedures for Detecting Outlying Observations in Samples.
27. T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
28. E. T. Jaynes, Phys. Rev., 106, 620 (1957). Information Theory and Statistical Mechanics.
29. S. Kullback, Information Theory and Statistics, Dover Publications, Mineola, New York, 1997.
30. L. Xue and J. Bajorath, Combin. Chem. High Throughput Screen., 3, 363 (2000). Molecular Descriptors in Chemoinformatics, Computational Combinatorial Chemistry, and Virtual Screening.
31. R. Todeschini and V. Consonni, Handbook of Molecular Descriptors, Vol. 11 of Methods and Principles in Medicinal Chemistry, R. Mannhold, H. Kubinyi, and H. Timmerman, Eds., Wiley, New York, 2000.
32. J. W. Godden, L. Xue, D. B. Kitchen, F. L. Stahura, E. J. Schermerhorn, and J. Bajorath, J. Chem. Inf. Comput. Sci., 42, 885 (2002). Median Partitioning: A Novel Method for the Selection of Representative Subsets from Large Compound Pools.
33. Available Chemicals Directory (ACD), 2005, MDL Information Systems Inc., San Leandro, California. Available: www.mdl.com.
34. Molecular Drug Data Report (MDDR), 2005, MDL Information Systems Inc., San Leandro, California. Available: www.mdl.com.
35. C. A. Lipinski, F. Lombardo, B. W. Dominy, and P. J. Feeney, Adv. Drug Delivery Rev., 23, 3 (1997). Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings.
36. Molecular Operating Environment (MOE), 2005, Chemical Computing Group Inc., Montreal, Quebec, Canada. Available: www.chemcomp.com.
37. H. A. Sturges, J. Am. Stat. Assoc., 21, 65 (1926). The Choice of a Class Interval.
38. P. A. Labute, J. Mol. Graph. Model., 18, 464 (2000). A Widely Applicable Set of Descriptors.
39. L. H. Hall and L. B. Kier, in Reviews in Computational Chemistry, Vol. 2, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1991, pp. 367–422. The Molecular Connectivity Chi Indices and Kappa Shape Indices in Structure–Property Modeling.
40. J. J. Irwin and B. K. Shoichet, J. Chem. Inf. Model., 45, 177 (2005). ZINC – A Free Database of Commercially Available Compounds for Virtual Screening.
41. Chapman & Hall Dictionary of Natural Products (2005), CRC Press LLC, Boca Raton, Florida. Available: www.crcpress.com.
42. D. J. Graham, J. Chem. Inf. Model., 45, 1223 (2005). Information Content in Organic Molecules: Aggregation States and Solvent Effects.
43. M. Ho, R. P. Sagar, D. F. Weaver, and V. H. Smith, Jr., Int. J. Quantum Chem., 56, 109 (1995). An Investigation of the Dependence of Shannon Information Entropies and Distance Measures on Molecular Geometry.
44. R. P. Sagar, J. C. Ramirez, R. O. Esquivel, M. Ho, and V. H. Smith, Jr., J. Chem. Phys., 116, 9213 (2002). Relationships Between Jaynes Entropy of the One-Particle Density Matrix and Shannon Entropy of the Electron Densities.
45. T. Sato, Acta Cryst. A, 48, 842 (1992). Maximum Entropy Method: Phase Refinement.
46. L. Lorenzo and R. A. Mosquera, J. Comput. Chem., 24, 707 (2003). A Box-Counting-Based Algorithm for Computing Shannon Entropy in Molecular Dynamics Simulations.


47. T. Aynechi and I. D. Kuntz, Biophys. J., 89, 3008 (2005). An Information Theoretic Approach to Macromolecular Modeling. II. Force Fields.
48. F. L. Stahura, J. W. Godden, and J. Bajorath, J. Chem. Inf. Comput. Sci., 40, 1245 (2000). Distinguishing Between Natural Products and Synthetic Molecules by Descriptor Shannon Entropy Analysis and Binary QSAR Calculations.
49. F. L. Stahura, J. W. Godden, and J. Bajorath, J. Chem. Inf. Comput. Sci., 42, 550 (2002). Differential Shannon Entropy Analysis Identifies Molecular Property Descriptors that Predict Aqueous Solubility of Synthetic Compounds with High Accuracy in Binary QSAR Calculations.
50. P. Labute, Pac. Symp. Biocomput., 7, 444 (1999). Binary QSAR: A New Method for the Determination of Quantitative Structure Activity Relationships.
51. T. Henkel, R. M. Brunne, H. Müller, and R. Reichel, Angew. Chem. Int. Ed., 38, 643 (1999). Statistical Investigation into the Structural Complementarity of Natural Products and Synthetic Compounds.
52. C. A. Lipinski, Current Drug Discovery, 1, 17 (2001). Avoiding Investments in Doomed Drugs.
53. J. Taskinen, Curr. Opin. Drug Discov. Dev., 3, 102 (2000). Prediction of Aqueous Solubility in Drug Design.
54. J. M. Sutter and P. C. Jurs, J. Chem. Inf. Comput. Sci., 36, 100 (1996). Prediction of Aqueous Solubility for a Diverse Set of Heteroatom-Containing Organic Compounds Using a Quantitative Structure–Property Relationship.
55. B. E. Mitchell and P. C. Jurs, J. Chem. Inf. Comput. Sci., 38, 489 (1998). Prediction of Aqueous Solubility of Organic Compounds from Molecular Structure.
56. N. R. McElroy and P. C. Jurs, J. Chem. Inf. Comput. Sci., 41, 1237 (2001). Prediction of Aqueous Solubility of Heteroatom-Containing Organic Compounds from Molecular Structure.
57. I. V. Tetko, V. Y. Tanchuk, T. N. Kasheva, and A. E. P. Villa, J. Chem. Inf. Comput. Sci., 41, 1488 (2001). Estimation of Aqueous Solubility of Chemical Compounds Using E-State Indices.
58. J. Huuskonen, J. Chem. Inf. Comput. Sci., 40, 773 (2000). Estimation of Aqueous Solubility for a Diverse Set of Organic Compounds Based on Molecular Topology.
59. J. Huuskonen, M. Salo, and J. Taskinen, J. Chem. Inf. Comput. Sci., 38, 450 (1998). Aqueous Solubility Prediction of Drugs Based on Molecular Topology and Neural Network Modeling.
60. Physical/Chemical Property Database (PHYSPROP), 1994, Syracuse Research Corporation, SRC Environmental Science Center, Syracuse, New York. Available: www.syrres.com.
61. W. L. Jorgensen and E. M. Duffy, Bioorg. Med. Chem. Lett., 10, 1155 (2000). Prediction of Drug Solubility from Monte Carlo Simulations.
62. J.-H. Lin and T. Clark, J. Chem. Inf. Model., 45, 1010 (2005). An Analytical, Variable Resolution, Complete Description of Static Molecules and Their Intermolecular Binding Properties.
63. T. Clark, 229th American Chemical Society National Meeting, Division of Computers in Chemistry, Abstract No. 267 (2005). Shannon Entropy as a Local Surface Property.

CHAPTER 6

Applications of Support Vector Machines in Chemistry

Ovidiu Ivanciuc
Sealy Center for Structural Biology, Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, Texas

INTRODUCTION

Kernel-based techniques (such as support vector machines, Bayes point machines, kernel principal component analysis, and Gaussian processes) represent a major development in machine learning algorithms. Support vector machines (SVM) are a group of supervised learning methods that can be applied to classification or regression. In a short period of time, SVM found numerous applications in chemistry, such as in drug design (discriminating between ligands and nonligands, inhibitors and noninhibitors, etc.), quantitative structure–activity relationships (QSAR, where SVM regression is used to predict various physical, chemical, or biological properties), chemometrics (e.g., optimization of chromatographic separation or prediction of compound concentration from spectral data), sensors (for qualitative and quantitative prediction from sensor data), chemical engineering (fault detection and modeling of industrial processes), and text mining (automatic recognition of scientific information).

Support vector machines represent an extension to nonlinear models of the generalized portrait algorithm developed by Vapnik and Lerner.1


The SVM algorithm is based on statistical learning theory and the Vapnik–Chervonenkis (VC) dimension.2 Statistical learning theory, which describes the properties of learning machines that allow them to give reliable predictions, was reviewed by Vapnik in three books: Estimation of Dependencies Based on Empirical Data,3 The Nature of Statistical Learning Theory,4 and Statistical Learning Theory.5 In its current formulation, the SVM algorithm was developed at AT&T Bell Laboratories by Vapnik et al.6–12 SVM developed into a very active research area, and numerous books are available for an in-depth overview of the theoretical basis of these algorithms, including Advances in Kernel Methods: Support Vector Learning by Schölkopf et al.,13 An Introduction to Support Vector Machines by Cristianini and Shawe-Taylor,14 Advances in Large Margin Classifiers by Smola et al.,15 Learning and Soft Computing by Kecman,16 Learning with Kernels by Schölkopf and Smola,17 Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms by Joachims,18 Learning Kernel Classifiers by Herbrich,19 Least Squares Support Vector Machines by Suykens et al.,20 and Kernel Methods for Pattern Analysis by Shawe-Taylor and Cristianini.21 Several authoritative reviews and tutorials are highly recommended, namely those authored by Schölkopf et al.,7 Smola and Schölkopf,22 Burges,23 Schölkopf et al.,24 Suykens,25 Schölkopf et al.,26 Campbell,27 Schölkopf and Smola,28 and Sanchez.29

In this chapter, we present an overview of SVM applications in chemistry. We start with a nonmathematical introduction to SVM, which will give a flavor of the basic principles of the method and its possible applications in chemistry. Next we introduce the field of pattern recognition, followed by a brief overview of statistical learning theory and of the Vapnik–Chervonenkis dimension. A presentation of linear SVM followed by its extension to nonlinear SVM and SVM regression is then provided to give the basic mathematical details of the theory, accompanied by numerous examples. Several detailed examples of SVM classification (SVMC) and SVM regression (SVMR) are then presented for various structure–activity relationship (SAR) and quantitative structure–activity relationship (QSAR) problems. Chemical applications of SVM are reviewed, with examples from drug design, QSAR, chemometrics, chemical engineering, and automatic recognition of scientific information in text. Finally, SVM resources on the Web and free SVM software are reviewed.

A NONMATHEMATICAL INTRODUCTION TO SVM

The principal characteristics of SVM models are presented here in a nonmathematical way, and examples of SVM applications to classification and regression problems are given in this section. The mathematical basis of SVM will be presented in subsequent sections of this tutorial/review chapter.

SVM models were originally defined for the classification of linearly separable classes of objects. Such an example is presented in Figure 1.

Figure 1 Maximum separation hyperplane.

For these two-dimensional objects, which belong to two classes (class +1 and class −1), it is easy to find a line that separates them perfectly. For any particular set of two-class objects, an SVM finds the unique hyperplane having the maximum margin (denoted δ in Figure 1). The hyperplane H1 defines the border with the class +1 objects, whereas the hyperplane H2 defines the border with the class −1 objects. Two objects from class +1 define the hyperplane H1, and three objects from class −1 define the hyperplane H2. These objects, represented inside circles in Figure 1, are called support vectors. A special characteristic of SVM is that the solution to a classification problem is represented by the support vectors that determine the maximum margin hyperplane.

SVM can also be used to separate classes that cannot be separated with a linear classifier (Figure 2, left). In such cases, the coordinates of the objects are mapped into a feature space using nonlinear functions called feature functions φ. The feature space is a high-dimensional space in which the two classes can be separated with a linear classifier (Figure 2, right). As presented in Figures 2 and 3, the nonlinear feature function φ maps the input space (the original coordinates of the objects) into the feature space, which can even have an infinite dimension.

Figure 2 Linear separation in feature space.

Figure 3 Support vector machines map the input space into a high-dimensional feature space.

Because the feature space is high dimensional, it is not practical to use the feature functions φ directly in computing the classification hyperplane. Instead, the nonlinear mapping induced by the feature functions is computed with special nonlinear functions called kernels. Kernels have the advantage of operating in the input space, where the solution of the classification problem is a weighted sum of kernel functions evaluated at the support vectors.

To illustrate the SVM capability of training nonlinear classifiers, consider the patterns from Table 1. This is a synthetic dataset of two-dimensional patterns, designed to investigate the properties of the SVM classification algorithm. All figures from this chapter presenting SVM models for various datasets were prepared with a slightly modified version of Gunn's MATLAB toolbox, http://www.isis.ecs.soton.ac.uk/resources/svminfo/. In all figures, class +1 patterns are represented by +, whereas class −1 patterns are represented by black dots. The SVM hyperplane is drawn with a continuous line, whereas the margins of the SVM hyperplane are represented by dotted lines. Support vectors from class +1 are represented as a + inside a circle, whereas support vectors from class −1 are represented as a black dot inside a circle.
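The statement that kernels compute the feature-space dot product while operating only in the input space can be checked directly. The Python sketch below (illustrative only; NumPy is the sole dependency, and the data points are arbitrary) shows that a degree 2 polynomial kernel evaluated on two-dimensional inputs equals an explicit dot product in the corresponding six-dimensional feature space:

```python
import numpy as np

def poly_kernel(x, z, degree=2):
    """Polynomial kernel: a feature-space dot product evaluated in input space."""
    return (np.dot(x, z) + 1.0) ** degree

x = np.array([1.0, 2.0])
z = np.array([0.5, 1.5])

# For degree 2, the implicit feature map of (x.z + 1)^2 is
# phi(x) = (x1^2, x2^2, sqrt(2) x1 x2, sqrt(2) x1, sqrt(2) x2, 1)
phi = lambda v: np.array([v[0]**2, v[1]**2,
                          np.sqrt(2) * v[0] * v[1],
                          np.sqrt(2) * v[0], np.sqrt(2) * v[1], 1.0])

print(poly_kernel(x, z), np.dot(phi(x), phi(z)))  # both print 20.25
```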

Table 1 Linearly Nonseparable Patterns Used for the SVM Classification Models in Figures 4–6

Pattern   x1     x2     Class
1         2      4.5     +1
2         2.5    2.9     +1
3         3      1.5     +1
4         3.6    0.5     +1
5         4.2    2       +1
6         3.9    4       +1
7         5      1       +1
8         0.6    1       −1
9         1      4.2     −1
10        1.5    2.5     −1
11        1.75   0.6     −1
12        3      5.6     −1
13        4.5    5       −1
14        5      4       −1
15        5.5    2       −1


Figure 4 SVM classification models for the dataset from Table 1: (a) dot kernel (linear), Eq. [64]; (b) polynomial kernel, degree 2, Eq. [65].

Partitioning of the dataset from Table 1 with a linear kernel is shown in Figure 4a. It is obvious that a linear function is not adequate for this dataset, because the classifier is not able to discriminate between the two types of patterns; all patterns are support vectors. A perfect separation of the two classes can be achieved with a degree 2 polynomial kernel (Figure 4b). This SVM model has six support vectors, namely three from class +1 and three from class −1. These six patterns define the SVM model and can be used to predict the class membership for new patterns. The four patterns from class +1 situated in the space region bordered by the +1 margin and the five patterns from class −1 situated in the space region delimited by the −1 margin are not important in defining the SVM model, and they can be eliminated from the training set without changing the SVM solution.

The use of nonlinear kernels provides the SVM with the ability to model complicated separation hyperplanes, as in this example. However, because there is no theoretical tool to predict which kernel will give the best results for a given dataset, experimenting with different kernels is the only way to identify the best function. An alternative solution to discriminate the patterns from Table 1 is offered by a degree 3 polynomial kernel (Figure 5a), which has seven support vectors, namely three from class +1 and four from class −1. The separation hyperplane becomes even more convoluted when a degree 10 polynomial kernel is used (Figure 5b). It is clear that this SVM model, with 10 support vectors (4 from class +1 and 6 from class −1), is not an optimal model for the dataset from Table 1. The next two experiments were performed with the B spline kernel (Figure 6a) and the exponential radial basis function (RBF) kernel (Figure 6b). Both SVM models define elaborate hyperplanes, with a large number of support vectors (11 for spline, 14 for RBF). The SVM model obtained with the exponential RBF kernel acts almost like a look-up table, with all but one pattern used as support vectors.

Figure 5 SVM classification models obtained with the polynomial kernel (Eq. [65]) for the dataset from Table 1: (a) polynomial of degree 3; (b) polynomial of degree 10.

By comparing the SVM models from Figures 4–6, it is clear that the best one is obtained with the degree 2 polynomial kernel, the simplest function that separates the two classes with the lowest number of support vectors. This principle of minimum complexity of the kernel function should serve as a guide for the comparative evaluation and selection of the best kernel. Like all other multivariate algorithms, SVM can overfit the data used in training, a problem that is more likely to happen when complex kernels are used to generate the SVM model.
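These kernel experiments are easy to reproduce with any modern SVM library. The sketch below, assuming scikit-learn (rather than the MATLAB toolbox used to prepare the figures), fits classifiers with several kernels to the Table 1 patterns and reports the number of support vectors; because kernel parameterizations differ slightly between implementations, the counts need not match the figures exactly:

```python
import numpy as np
from sklearn.svm import SVC

# Patterns from Table 1 (class +1: patterns 1-7; class -1: patterns 8-15)
X = np.array([[2, 4.5], [2.5, 2.9], [3, 1.5], [3.6, 0.5], [4.2, 2],
              [3.9, 4], [5, 1], [0.6, 1], [1, 4.2], [1.5, 2.5],
              [1.75, 0.6], [3, 5.6], [4.5, 5], [5, 4], [5.5, 2]])
y = np.array([1] * 7 + [-1] * 8)

# A very large C approximates the hard-margin classifiers used in the text
for kernel, params in [("linear", {}),
                       ("poly", {"degree": 2, "gamma": 1, "coef0": 1}),
                       ("poly", {"degree": 10, "gamma": 1, "coef0": 1}),
                       ("rbf", {"gamma": 1})]:
    clf = SVC(kernel=kernel, C=1e6, **params).fit(X, y)
    print(f"{kernel} {params}: {clf.n_support_.sum()} support vectors")
```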

Figure 6 SVM classification models for the dataset from Table 1: (a) B spline kernel, degree 1, Eq. [72]; (b) exponential radial basis function kernel, σ = 1, Eq. [67].


Figure 7 Support vector machines regression determines a tube with radius ε fitted to the data.

Support vector machines were extended by Vapnik for regression4 by using an ε-insensitive loss function (Figure 7). The learning set of patterns is used to obtain a regression model that can be represented as a tube with radius ε fitted to the data. In the ideal case, SVM regression finds a function that maps all input data with a maximum deviation ε from the target (experimental) values. In this case, all training points are located inside the regression tube. However, for datasets affected by errors, it is not possible to fit all patterns inside the tube and still have a meaningful model. For the general case, SVM regression considers that the error for patterns inside the tube is zero, whereas patterns situated outside the regression tube have an error that increases as the distance to the tube margin increases (Figure 7).30

The SVM regression approach is illustrated with a QSAR for angiotensin II antagonists (Table 2) from a review by Hansch et al.31 This QSAR, modeling the IC50 for angiotensin II determined in rabbit aorta rings, is a nonlinear equation based on the hydrophobicity parameter ClogP:

log 1/IC50 = 5.27(±1.0) + 0.50(±0.19) ClogP − 3.0(±0.83) log(β·10^ClogP + 1)

n = 16, r²(cal) = 0.849, s(cal) = 0.178, q²(LOO) = 0.793, optimum ClogP = 6.42

We will use this dataset later to demonstrate the kernel influence on the SVM regression, as well as the effect of modifying the tube radius ε. However, we will not present QSAR statistics for the SVM model; comparative QSAR models are shown in the section on SVM applications in chemistry. A linear function is clearly inadequate for the dataset from Table 2, so we will not present the SVMR model for the linear kernel. All SVM regression figures were prepared with Gunn's MATLAB toolbox. Patterns are represented by +, and support vectors are represented as a + inside a circle. The SVM hyperplane is drawn with a continuous line, whereas the margins of the SVM regression tube are represented by dotted lines.

Several experiments with different kernels showed that the degree 2 polynomial kernel offers a good model for this dataset, and we decided to demonstrate the influence of the tube radius ε for this kernel (Figures 8 and 9). When the ε parameter is too small, the diameter of the tube is also small, forcing all patterns to be situated outside the SVMR tube. In this case, all patterns are penalized with a value that increases as the distance from the tube's margin increases.

Table 2 Data for the Angiotensin II Antagonists QSAR31 and for the SVM Regression Models from Figures 8–11 (the 2-D structure of the common scaffold bearing substituent X could not be reproduced here)

No   Substituent X       ClogP   log 1/IC50
1    H                   4.50    7.38
2    C2H5                4.69    7.66
3    (CH2)2CH3           5.22    7.82
4    (CH2)3CH3           5.74    8.29
5    (CH2)4CH3           6.27    8.25
6    (CH2)5CH3           6.80    8.06
7    (CH2)7CH3           7.86    6.77
8    CHMe2               5.00    7.70
9    CHMeCH2CH3          5.52    8.00
10   CH2CHMeCH2CMe3      7.47    7.46
11   CH2-cy-C3H5         5.13    7.82
12   CH2CH2-cy-C6H11     7.34    7.75
13   CH2COOCH2CH3        4.90    8.05
14   CH2CO2CMe3          5.83    7.80
15   (CH2)5COOCH2CH3     5.76    8.01
16   CH2CH2C6H5          6.25    8.51

This situation is demonstrated in Figure 8a, generated with ε = 0.05, when all patterns are support vectors. As ε increases to 0.1, the diameter of the tube increases and the number of support vectors decreases to 12 (Figure 8b), whereas the remaining patterns are situated inside the tube and have zero error. A further increase of ε to 0.3 results in a dramatic change in the number of support vectors, which decreases to 4 (Figure 9a), whereas an ε of 0.5, with two support vectors, gives an SVMR model with a decreased curvature (Figure 9b).

Figure 8 SVM regression models with a degree 2 polynomial kernel (Eq. [65]) for the dataset from Table 2: (a) ε = 0.05; (b) ε = 0.1.


Figure 9 SVM regression models with a degree 2 polynomial kernel (Eq. [65]) for the dataset from Table 2: (a) ε = 0.3; (b) ε = 0.5.

These experiments illustrate the importance of the ε parameter for the SVMR model. The optimum value of ε should be determined by comparing prediction statistics in cross-validation, and it depends on the experimental errors of the modeled property. A low ε should be used for low levels of noise, whereas higher values of ε are appropriate for large experimental errors. Note that a low ε results in SVMR models with a large number of support vectors, whereas sparse models are obtained with higher values of ε.

We will explore the possibility of overfitting in SVM regression when complex kernels are used to model the data, but first we must consider the limitations of the dataset in Table 2. This is important because those data might prevent us from obtaining a high-quality QSAR. First, the biological data are affected by experimental errors, and we want to avoid modeling those errors (overfitting the model). Second, the influence of the substituent X is characterized with only its hydrophobicity parameter ClogP. Although hydrophobicity is important, as demonstrated in the QSAR model, it might be that other structural descriptors (electronic or steric) actually control the biological activity of this series of compounds. However, the small number of compounds and the limited diversity of the substituents in this dataset might not reveal the importance of those structural descriptors. Nonetheless, it follows that a predictive model should capture the nonlinear dependence between ClogP and log 1/IC50, and it should have a low degree of complexity to avoid modeling the errors.

The next two experiments were performed with the degree 10 polynomial kernel (Figure 10a; 12 support vectors) and the exponential RBF kernel with σ = 1 (Figure 10b; 11 support vectors). Both SVMR models, obtained with ε = 0.1, follow the data too closely and fail to recognize the general relationship between ClogP and log 1/IC50. The overfitting is more pronounced for the exponential RBF kernel, which therefore is not a good choice for this QSAR dataset. Interesting results are also obtained with the spline kernel (Figure 11a) and the degree 1 B spline kernel (Figure 11b).


Figure 10 SVM regression models with ε = 0.1 for the dataset of Table 2: (a) polynomial kernel, degree 10, Eq. [65]; (b) exponential radial basis function kernel, σ = 1, Eq. [67].

The spline kernel offers an interesting alternative to the SVMR model obtained with the degree 2 polynomial kernel. The tube is smooth, with a noticeable asymmetry that might be supported by the experimental data, as one can deduce from a visual inspection. Together with the degree 2 polynomial kernel model, this spline kernel represents a viable QSAR model for this dataset. Of course, only detailed cross-validation and parameter tuning can decide which kernel is best. In contrast with the spline kernel, the degree 1 B spline kernel displays clear signs of overfitting, indicated by the complex regression tube. The hyperplane closely follows every pattern and is not able to extract a broad and simple relationship between ClogP and log 1/IC50.

The SVMR experiments that we have just carried out with the QSAR dataset from Table 2 offer convincing proof of the ability of SVM to model nonlinear relationships, but also of its capacity to overfit. This dataset was presented only for demonstrative purposes, and we do not recommend the use of SVM for QSAR models with such a low number of compounds and descriptors.

Figure 11 SVM regression models with ε = 0.1 for the dataset of Table 2: (a) spline kernel, Eq. [71]; (b) B spline kernel, degree 1, Eq. [72].
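The influence of the tube radius is easy to explore numerically. The sketch below, assuming scikit-learn's SVR (a different implementation from the MATLAB toolbox used for Figures 8–11, so the support vector counts will not match the figures exactly), fits a degree 2 polynomial kernel to the Table 2 data for several values of ε:

```python
import numpy as np
from sklearn.svm import SVR

# ClogP and log 1/IC50 values from Table 2
clogp = np.array([4.50, 4.69, 5.22, 5.74, 6.27, 6.80, 7.86, 5.00,
                  5.52, 7.47, 5.13, 7.34, 4.90, 5.83, 5.76, 6.25])
log_ic50 = np.array([7.38, 7.66, 7.82, 8.29, 8.25, 8.06, 6.77, 7.70,
                     8.00, 7.46, 7.82, 7.75, 8.05, 7.80, 8.01, 8.51])
X = clogp.reshape(-1, 1)

# Widening the epsilon tube leaves more patterns inside it (zero error)
# and therefore reduces the number of support vectors
for eps in (0.05, 0.1, 0.3, 0.5):
    model = SVR(kernel="poly", degree=2, gamma=1, coef0=1,
                C=100.0, epsilon=eps).fit(X, log_ic50)
    print(f"epsilon = {eps}: {len(model.support_)} support vectors")
```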


PATTERN CLASSIFICATION

Research in pattern recognition involves the development and application of algorithms that can recognize patterns in data.32 These techniques have important applications in character recognition, speech analysis, image analysis, clinical diagnostics, person identification, machine diagnostics, and industrial process supervision, as examples. Many chemistry problems can also be solved with pattern recognition techniques, such as recognizing the provenance of agricultural products (olive oil, wine, potatoes, honey, etc.) based on composition or spectra, structural elucidation from spectra, identifying mutagens or carcinogens from molecular structure, classification of aqueous pollutants based on their mechanism of action, discriminating chemical compounds based on their odor, and classification of chemicals into inhibitors and noninhibitors of a certain drug target.

We now introduce some basic notions of pattern recognition. A pattern (object) is any item (chemical compound, material, spectrum, physical object, chemical reaction, industrial process) whose important characteristics form a set of descriptors. A descriptor is a variable (usually numerical) that characterizes an object. Note that in pattern recognition, descriptors are usually called "features", but in SVM, "features" have another meaning, so we must make a clear distinction here between "descriptors" and "features". A descriptor can be any experimentally measured or theoretically computed quantity that describes the structure of a pattern, including, for example, spectra and composition for chemicals, agricultural products, materials, and biological samples; graph descriptors33 and topological indices;34 indices derived from the molecular geometry and quantum calculations;35,36 industrial process parameters; chemical reaction variables; microarray gene expression data; and mass spectrometry data for proteomics.

Each pattern (object) has a property value associated with it. A property is an attribute of a pattern that is difficult, expensive, or time-consuming to measure, or not even directly measurable. Examples of such properties include the concentration of a compound in a biological sample, material, or agricultural product; various physical, chemical, or biological properties of chemical compounds; biological toxicity, mutagenicity, or carcinogenicity; ligand/nonligand status for different biological receptors; and fault identification in industrial processes. The major hypothesis used in pattern recognition is that the descriptors capture some important characteristics of the pattern, so that a mathematical function (e.g., a machine learning algorithm) can generate a mapping (relationship) between the descriptor space and the property. Another hypothesis is that similar objects (objects that are close in the descriptor space) have similar properties.

A wide range of pattern recognition algorithms are currently being used to solve chemical problems.


Figure 12 Example of a classification problem.

These methods include linear discriminant analysis, principal component analysis, partial least squares (PLS),37 artificial neural networks,38 multiple linear regression (MLR), principal component regression, k-nearest neighbors (k-NN), evolutionary algorithms embedded into machine learning procedures,39 and large margin classifiers including, of course, support vector machines.

A simple example of a classification problem is presented in Figure 12. The learning set consists of 24 patterns, 10 in class +1 and 14 in class −1. In the learning (training) phase, the algorithm extracts classification rules using the information available in the learning set. In the prediction phase, the classification rules are applied to new patterns, with unknown class membership, and each new pattern is assigned to a class, either +1 or −1. In Figure 12, the prediction pattern is indicated with "?".

We consider first a k-NN classifier, with k = 1. This algorithm computes the distance between the new pattern and all patterns in the training set, and then it identifies the k patterns closest to the new pattern. The new pattern is assigned to the majority class of the k nearest neighbors. Obviously, k should be odd to avoid ties. The k-NN classifier assigns the new pattern to class +1 (Figure 13) because its closest pattern belongs to this class. The predicted class of a new pattern can change when the parameter k changes; the optimal value of k is usually determined by cross-validation.

The second classifier considered here is a hyperplane H that defines two regions, one for class +1 patterns and the other for class −1 patterns. New patterns are assigned to class +1 if they are situated in the space region corresponding to class +1, but to class −1 if they are situated in the region corresponding to class −1. For example, the hyperplane H in Figure 14 assigns the new pattern to class −1. The approaches of these two algorithms are very different: whereas the k-NN classifier memorizes all patterns, the hyperplane classifier is defined by the equation of a plane in the pattern space. The hyperplane can be used only for linearly separable classes, whereas k-NN is a nonlinear classifier and can be used for classes that cannot be separated with a linear hypersurface.
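The k-NN decision rule just described fits in a few lines of Python. The sketch below (illustrative, with NumPy as the only dependency and classes encoded as −1/+1) computes all distances, picks the k closest training patterns, and takes a majority vote:

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=1):
    """Assign x_new to the majority class of its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                  # indices of the k closest patterns
    return 1 if y_train[nearest].sum() > 0 else -1   # majority vote for labels in {-1, +1}

# Toy usage: three training patterns, one query point
X = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
y = np.array([-1, -1, 1])
print(knn_predict(X, y, np.array([3.5, 3.5]), k=1))  # -> 1
```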


Figure 13 Using the k-NN classifier (k = 1), the new pattern is predicted to belong to class +1.

An n-dimensional pattern (object) x has n coordinates, x = (x_1, x_2, ..., x_n), where each x_i is a real number, x_i ∈ ℝ for i = 1, 2, ..., n. Each pattern x_j belongs to a class y_j ∈ {−1, +1}. Consider a training set T of m patterns together with their classes, T = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}. Consider a dot product space S in which the patterns x are embedded, x_1, x_2, ..., x_m ∈ S. Any hyperplane in the space S can be written as

{x ∈ S | w · x + b = 0},  w ∈ S,  b ∈ ℝ    [1]

The dot product w · x is defined by

w · x = Σ_{i=1}^{n} w_i x_i    [2]


Figure 14 Using the linear classifier defined by the hyperplane H, the new pattern is predicted to belong to class −1.


Figure 15 The classification hyperplane defines a region for class +1 and another region for class −1.

A hyperplane w · x + b = 0 can be denoted as the pair (w, b). A training set of patterns is linearly separable if there exists at least one linear classifier, defined by a pair (w, b), that correctly classifies all training patterns (see Figure 15). All patterns from class +1 are located in the space region defined by w · x + b > 0, and all patterns from class −1 are located in the space region defined by w · x + b < 0. Using the linear classifier defined by the pair (w, b), the class of a pattern x_k is determined with

class(x_k) = +1 if w · x_k + b > 0;  −1 if w · x_k + b < 0    [3]

The distance from a point x to the hyperplane defined by (w, b) is

d(x; w, b) = |w · x + b| / ‖w‖    [4]

where ‖w‖ is the norm of the vector w. Of all the points on the hyperplane, one has the minimum distance d_min to the origin (Figure 16):

d_min = |b| / ‖w‖    [5]

In Figure 16, we show a linear classifier (hyperplane H defined by w · x + b = 0), the space region for class +1 patterns (defined by w · x + b > 0), the space region for class −1 patterns (defined by w · x + b < 0), and the distance between the origin and the hyperplane H (|b|/‖w‖). Consider a group of linear classifiers (hyperplanes) defined by a set of pairs (w, b) that satisfy the following inequalities for any pattern x_i in the training set:

w · x_i + b > 0 if y_i = +1;  w · x_i + b < 0 if y_i = −1    [6]


Figure 16 A linear classifier: hyperplane H (w · x + b = 0), the class +1 region (w · x + b > 0), and the class −1 region (w · x + b < 0).

Some of these hyperplanes pass arbitrarily close to the training patterns (w · x_i + b → 0). It is clear that such classifiers have little prediction success.


Figure 17 Several hyperplanes that correctly classify the two classes of patterns.


Figure 18 Examples of margin hyperplane classifiers.

This led to the idea of wide margin classifiers, i.e., hyperplanes with a buffer toward the +1 and −1 space regions (Figure 18). For some linearly separable classification problems having a finite number of patterns, it is generally possible to define a large number of wide margin classifiers (Figure 18). Chemometrics and pattern recognition applications suggest that an optimum prediction could be obtained with a linear classifier that has a maximum margin (separation between the two classes), with the separation hyperplane being equidistant from the two classes. In the next section, we introduce elements of statistical learning theory that form the basis of support vector machines, followed by a section on linear support vector machines in which the mathematical basis for computing a maximum margin classifier with SVM is presented.

THE VAPNIK–CHERVONENKIS DIMENSION

Support vector machines are based on structural risk minimization (SRM), derived from statistical learning theory.4,5,10 This theory is the basis for finding bounds on the classification performance of machine learning algorithms. Another important result from statistical learning theory is the performance estimation of classifiers trained on finite sets and the convergence of their classification performance toward that of a classifier trained with an infinite number of learning samples.

Consider a learning set of m patterns. Each pattern consists of a vector of characteristics x_i ∈ ℝⁿ and an associated class membership y_i. The task of the machine learning algorithm is to find the rules of the mapping x_i → y_i. The machine model is a possible mapping x_i → f(x_i; p), where each model is defined by a set of parameters p. Training a machine learning algorithm results in finding an optimum set of parameters p. The machine algorithm is considered to be deterministic; i.e., for a given input vector x_i and a set of parameters p, the output will always be f(x_i; p).


The expected value of the test error of a machine trained with an infinite number of samples is denoted by e(p) (called the expected risk or expected error). The empirical risk e_emp(p) is the measured error for a finite number of patterns in the training set:

e_emp(p) = (1/2m) Σ_{i=1}^{m} |y_i − f(x_i; p)|    [7]

The quantity (1/2)|y_i − f(x_i; p)| is called the loss, and for a two-class classification it can take only the values 0 and 1. Choose a value η such that 0 ≤ η ≤ 1. For losses taking these values, with probability 1 − η, the following bound exists for the expected risk:

e(p) ≤ e_emp(p) + √{[d_VC (log(2m/d_VC) + 1) − log(η/4)] / m}    [8]

where d_VC is a non-negative integer, called the Vapnik–Chervonenkis (VC) dimension of a classifier, that measures the capacity of the classifier. The right-hand side of this equation defines the risk bound, and its second term is called the VC confidence.

We consider the case of two-class pattern recognition, when the function f(x_i; p) can take only two values, e.g., +1 and −1. Consider a set of m points and all of their 2^m two-class labelings. If for each of the 2^m labelings one can find a classifier f(p) that correctly separates the class +1 points from the class −1 points, then that set of points is separated (shattered) by that set of functions. The VC dimension of a set of functions {f(p)} is defined as the maximum number of points that can be separated by {f(p)}. In two dimensions, three samples can be separated with a line for each of the six possible combinations (Figure 19, top panels). In the case of four training points in a plane, there are two cases that cannot be separated with a line (Figure 19, bottom panels). These two cases require a classifier of higher complexity, with a higher VC dimension. The example from Figure 19 shows that the VC dimension of a set of lines in ℝ² is three.


Figure 19 In a plane, all combinations of three points from two classes can be separated with a line. Four points cannot be separated with a linear classifier.


Figure 20 Nested subsets of functions, ordered by VC dimension: d_VC,1 < d_VC,2 < d_VC,3.

A family of classifiers has an infinite VC dimension if it can separate m points, with m arbitrarily large. The VC confidence term in Eq. [8] depends on the chosen class of functions, whereas the empirical risk and the actual risk depend on the particular function obtained from the training algorithm.23 It is important to find a subset of the selected set of functions such that the risk bound for that subset is minimized. To this end, a structure is introduced by dividing the whole class of functions into nested subsets (Figure 20), with the property d_VC,1 < d_VC,2 < d_VC,3. For each subset of functions, it is either possible to compute d_VC or to get a bound on the VC dimension. Structural risk minimization consists of finding the subset of functions that minimizes the bound on the actual risk. This is done by training a machine model for each subset and minimizing the empirical risk for each model; one then selects the machine model whose sum of empirical risk and VC confidence is minimal.

PATTERN CLASSIFICATION WITH LINEAR SUPPORT VECTOR MACHINES

To apply the results of statistical learning theory to pattern classification, one has to (1) choose a classifier with the smallest empirical risk and (2) choose a classifier from a family that has the smallest VC dimension. For a linearly separable case, condition (1) is satisfied by selecting any classifier that completely separates both classes (for example, any classifier from Figure 17), whereas condition (2) is satisfied by the classifier with the largest margin.

SVM Classification for Linearly Separable Data

The optimum separation hyperplane (OSH) is the hyperplane with the maximum margin for a given finite set of learning patterns. The computation of the OSH with a linear support vector machine is presented in this section.

The Optimization Problem

Based on the notation from Figure 21, we now establish the conditions necessary to determine the maximum separation hyperplane.


Figure 21 The separating hyperplane.

Consider a linear classifier characterized by the set of pairs (w, b) that satisfy the following inequalities for any pattern x_i in the training set:

w · x_i + b > +1 if y_i = +1;  w · x_i + b < −1 if y_i = −1    [9]

These equations can be expressed in compact form as

y_i (w · x_i + b) ≥ +1    [10]

or

y_i (w · x_i + b) − 1 ≥ 0    [11]

Because we have considered the case of linearly separable classes, each such hyperplane (w, b) is a classifier that correctly separates all patterns from the training set:

class(x_i) = +1 if w · x_i + b > 0;  −1 if w · x_i + b < 0    [12]

For the hyperplane H that defines the linear classifier (i.e., where w · x + b = 0), the distance between the origin and the hyperplane H is |b|/‖w‖. The patterns from class −1 that satisfy the equality w · x + b = −1 determine the hyperplane H1; the distance between the origin and the hyperplane H1 is equal to |−1 − b|/‖w‖. Similarly, the patterns from class +1 that satisfy the equality w · x + b = +1 determine
the hyperplane H2; the distance between the origin and the hyperplane H2 is equal to |+1 − b|/‖w‖. Of course, hyperplanes H, H1, and H2 are parallel, and no training patterns are located between hyperplanes H1 and H2. Based on the above considerations, the margin of the linear classifier H (the distance between hyperplanes H1 and H2) is 2/‖w‖.

We now present an alternative method to determine the distance between hyperplanes H1 and H2. Consider a point x_0 located on the hyperplane H and a point x_1 located on the hyperplane H1, selected in such a way that (x_0 − x_1) is orthogonal to the two hyperplanes. These points satisfy the following two equalities:

w · x_0 + b = 0;  w · x_1 + b = −1    [13]

By subtracting the second equality from the first, we obtain

w · (x_0 − x_1) = 1    [14]

Because (x_0 − x_1) is orthogonal to the hyperplane H, and w is also orthogonal to H, (x_0 − x_1) and w are parallel, and the dot product satisfies

|w · (x_0 − x_1)| = ‖w‖ ‖x_0 − x_1‖    [15]

From Eqs. [14] and [15], we obtain the distance between hyperplanes H and H1:

‖x_0 − x_1‖ = 1/‖w‖    [16]

Similarly, a point x_0 located on the hyperplane H and a point x_2 located on the hyperplane H2, selected in such a way that (x_0 − x_2) is orthogonal to the two hyperplanes, will satisfy the equalities

w · x_0 + b = 0;  w · x_2 + b = +1    [17]

Consequently, the distance between hyperplanes H and H2 is

‖x_0 − x_2‖ = 1/‖w‖    [18]

Therefore, the margin of the linear classifier defined by (w, b) is 2/‖w‖. The wider the margin, the smaller is d_VC, the VC dimension of the classifier.


From these considerations, it follows that the optimum separation hyperplane is obtained by maximizing 2/‖w‖, which is equivalent to minimizing ‖w‖²/2. The problem of finding the optimum separation hyperplane is therefore the identification of the linear classifier (w, b) that satisfies

w · x_i + b ≥ +1 if y_i = +1;  w · x_i + b ≤ −1 if y_i = −1    [19]

and for which ‖w‖ has the minimum value.

Computing the Optimum Separation Hyperplane

Based on the considerations presented above, the OSH conditions from Eq. [19] can be formulated as the following expression that represents a linear SVM:

minimize f(x) = ‖w‖²/2
with the constraints g_i(x) = y_i (w · x_i + b) − 1 ≥ 0, i = 1, ..., m    [20]

The optimization problem from Eq. [20] represents the minimization of a quadratic function under linear constraints (quadratic programming), a problem studied extensively in optimization theory. Details on quadratic programming can be found in almost any textbook on numerical optimization, and efficient implementations exist in many software libraries. However, Eq. [20] does not represent the actual optimization problem that is solved to determine the OSH. Based on the use of a Lagrange function, Eq. [20] is transformed into its dual formulation. All SVM models (linear and nonlinear, classification and regression) are solved in the dual formulation, which has important advantages over the primal formulation (Eq. [20]): the dual problem can be easily generalized to linearly nonseparable learning data and to nonlinear support vector machines.

A convenient way to solve constrained minimization problems is by using a Lagrangian function of the problem defined in Eq. [20]:

L_P(w, b, Λ) = f(x) − Σ_{i=1}^{m} λ_i g_i(x)
             = (1/2)‖w‖² − Σ_{i=1}^{m} λ_i [y_i (w · x_i + b) − 1]
             = (1/2)‖w‖² − Σ_{i=1}^{m} λ_i y_i (w · x_i + b) + Σ_{i=1}^{m} λ_i
             = (1/2)‖w‖² − Σ_{i=1}^{m} λ_i y_i w · x_i − Σ_{i=1}^{m} λ_i y_i b + Σ_{i=1}^{m} λ_i    [21]

Here Λ = (λ_1, λ_2, ..., λ_m) is the set of Lagrange multipliers of the training (calibration) patterns, with λ_i ≥ 0, and the subscript P in L_P indicates the primal
formulation of the problem. The Lagrangian function L_P must be minimized with respect to w and b, and maximized with respect to λ_i, subject to the constraints λ_i ≥ 0. This is equivalent to solving the Wolfe dual problem,40 namely to maximize L_P subject to the constraints that the gradient of L_P with respect to w and b is zero, and subject to the constraints λ_i ≥ 0. The Karush–Kuhn–Tucker (KKT)40 conditions for the primal problem are as follows:

Gradient conditions:

∂L_P(w, b, Λ)/∂w = w − Σ_{i=1}^{m} λ_i y_i x_i = 0, where ∂L/∂w = (∂L/∂w_1, ∂L/∂w_2, ..., ∂L/∂w_n)    [22]

∂L_P(w, b, Λ)/∂b = −Σ_{i=1}^{m} λ_i y_i = 0    [23]

∂L_P(w, b, Λ)/∂λ_i = g_i(x) = 0    [24]

Orthogonality condition:

λ_i g_i(x) = λ_i [y_i (w · x_i + b) − 1] = 0, i = 1, ..., m    [25]

Feasibility condition:

y_i (w · x_i + b) − 1 ≥ 0, i = 1, ..., m    [26]

Non-negativity condition:

λ_i ≥ 0, i = 1, ..., m    [27]

Solving the SVM problem is equivalent to finding a solution to the KKT conditions. We are now ready to formulate the dual problem L_D:

maximize L_D(w, b, Λ) = Σ_{i=1}^{m} λ_i − (1/2) Σ_{i=1}^{m} Σ_{j=1}^{m} λ_i λ_j y_i y_j x_i · x_j
subject to λ_i ≥ 0, i = 1, ..., m, and Σ_{i=1}^{m} λ_i y_i = 0    [28]

Both the primal (L_P) and the dual (L_D) Lagrangian functions are derived from the same objective function but with different constraints, and the solution is found by minimizing L_P or by maximizing L_D. The most popular algorithm for solving this optimization problem is the sequential minimal optimization (SMO) proposed by Platt.41

When we introduced the Lagrange function, we assigned a Lagrange multiplier λ_i to each training pattern via the constraints g_i(x) (see Eq. [20]). The training patterns from the SVM solution that have λ_i > 0 represent the support vectors. The training patterns that have λ_i = 0 are not important in obtaining the SVM model, and they can be removed from training without any effect on the SVM solution. As we will see below, any SVM model is completely defined by the set of support vectors and the corresponding Lagrange multipliers. The vector w that defines the OSH is obtained by using Eq. [22]:

w = Σ_{i=1}^{m} λ_i y_i x_i    [29]

To compute the threshold $b$ of the OSH, we consider the KKT condition of Eq. [25] coupled with the expression for $w$ from Eq. [29] and the condition $\lambda_j > 0$, which leads to

$$\sum_{i=1}^{m} \lambda_i y_i \, x_i \cdot x_j + b = y_j \qquad [30]$$

Therefore, the threshold $b$ can be obtained by averaging the $b$ values obtained for all support vector patterns, i.e., the patterns with $\lambda_j > 0$:

$$b = y_j - \sum_{i=1}^{m} \lambda_i y_i \, x_i \cdot x_j \qquad [31]$$
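As a concrete illustration of Eqs. [29] and [31], the NumPy sketch below recovers $w$ and $b$ from a set of trained Lagrange multipliers. This is only a minimal sketch under the assumption that the multipliers have already been produced by a quadratic programming solver; the function and variable names are ours, not those of any particular SVM package.

```python
import numpy as np

def hyperplane_from_multipliers(X, y, lam, tol=1e-8):
    """Recover the OSH parameters (w, b) from Lagrange multipliers.

    X   : (m, n) array of training patterns
    y   : (m,)  array of class labels, +1 or -1
    lam : (m,)  array of Lagrange multipliers from a QP solver
    """
    w = (lam * y) @ X                   # Eq. [29]: weighted sum of patterns
    sv = lam > tol                      # support vectors have lambda_i > 0
    b = np.mean(y[sv] - X[sv] @ w)      # Eq. [31], averaged over the SVs
    return w, b
```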

Prediction for New Patterns

In the previous section, we presented the SVM algorithm for training a linear classifier. The result of this training is an optimum separation hyperplane defined by $(w, b)$ (Eqs. [29] and [31]). After training, the classifier is ready to predict the class membership for new patterns, different from those used in training. The class of a pattern $x_k$ is determined with

$$\text{class}(x_k) = \begin{cases} +1 & \text{if } w \cdot x_k + b > 0 \\ -1 & \text{if } w \cdot x_k + b < 0 \end{cases} \qquad [32]$$

Therefore, the classification of new patterns depends only on the sign of the expression $w \cdot x + b$. However, Eq. [29] offers the possibility to predict new patterns without computing the vector $w$ explicitly. In this case, we will use for classification the support vectors from the training set and the corresponding values of the Lagrange multipliers $\lambda_i$:

$$\text{class}(x_k) = \text{sign}\left( \sum_{i=1}^{m} \lambda_i y_i \, x_i \cdot x_k + b \right) \qquad [33]$$

Patterns that are not support vectors ($\lambda_i = 0$) do not influence the classification of new patterns. The use of Eq. [33] has an important advantage over using Eq. [32]: to classify a new pattern $x_k$, it is only necessary to compute the dot product between $x_k$ and every support vector. This results in a significant saving of computational time whenever the number of support vectors is small compared with the total number of patterns in the training set. Also, Eq. [33] can be easily adapted for nonlinear classifiers that use kernels, as we will show later. For a particular SVM problem (training set, kernel, kernel parameters), the optimum separation hyperplane is determined only by the support vectors (Figure 22a). By eliminating from training those patterns that are not support vectors ($\lambda_i = 0$), the SVM solution does not change (Figure 22b). This property suggests a possible approach for accelerating the SVM learning phase, in which patterns that cannot be support vectors are eliminated from learning.

Example of SVM Classification for Linearly Separable Data

We now present several SVM classification experiments for a dataset that is linearly separable (Table 3). This exercise is meant to compare the linear kernel with nonlinear kernels and to compare different topologies for the separating hyperplanes. All models used an infinite value for the capacity parameter C (no tolerance for misclassified patterns; see Eq. [39]).


Figure 22 The optimal hyperplane classifier obtained with all training patterns (a) is identical with the one computed with only the support vector patterns (b).


Table 3 Linearly Separable Patterns Used for the SVM Classification Models in Figures 23–25

Pattern   x1     x2     Class
1         1      5.5    +1
2         2.25   5      +1
3         3.25   4.25   +1
4         4      5.2    +1
5         5.25   2.25   +1
6         5.5    4      +1
7         0.5    3.5    −1
8         1      2      −1
9         1.5    1      −1
10        2.25   2.7    −1
11        3      0.8    −1
12        3.75   1.25   −1
13        5      0.6    −1

As expected, a linear kernel offers a complete separation of the two classes (Figure 23a), with only three support vectors, namely one from class +1 and two from class −1. The hyperplane has the maximum width and provides both a sparse solution and a good prediction model for new patterns. Note that, according to the constraints imposed in generating this SVMC model, no patterns are allowed inside the margins of the classifier (margins defined by the two bordering hyperplanes represented with dotted lines). To predict the class attribution for new patterns, one uses Eq. [33] applied to the three support vectors. The next experiment uses a degree 2 polynomial kernel (Figure 23b), which gives a solution with five support vectors, namely two from class +1 and three from class −1. The model is not optimal for this dataset, but it still provides an acceptable hyperplane topology. Note that the margin width varies, decreasing from left to right.

Figure 23 SVM classification models for the dataset from Table 3: (a) dot kernel (linear), Eq. [64]; (b) polynomial kernel, degree 2, Eq. [65].

By increasing the polynomial degree to 10, we obtain an SVM model that has a wide margin in the center of the separating hyperplane and a very small margin toward the two ends (Figure 24a). Four patterns are selected as support vectors, two from each class. This is not a suitable classifier for the dataset from Table 3, mainly because the topology of the separating hypersurface is too complicated. An even more complex discriminating hyperplane is produced by the exponential RBF kernel (Figure 24b).

Figure 24 SVM classification models for the dataset from Table 3: (a) polynomial kernel, degree 10, Eq. [65]; (b) exponential radial basis function kernel, σ = 1, Eq. [67].

The last two experiments for the linearly separable dataset are performed with the Gaussian RBF kernel (σ = 1; Figure 25a) and the B spline kernel (degree 1; Figure 25b). Although not optimal, the classification hyperplane for the Gaussian RBF kernel is much better than those obtained with the exponential RBF kernel and the degree 10 polynomial kernel.

Figure 25 SVM classification models for the dataset from Table 3: (a) Gaussian radial basis function kernel, σ = 1, Eq. [66]; (b) B spline kernel, degree 1, Eq. [72].


On the other hand, the SVM with the B spline kernel is clearly overfitted, with a total of nine support vectors (four from class +1 and five from class −1). The margins of the SVM classifier define two "islands" that surround each cluster of patterns. Noticeable are the support vectors situated far away from the central hyperplane. The SVM classification models depicted in Figures 23–25 convey an important message for scientists who want to use SVM applications in cheminformatics: SVM models obtained with complex, nonlinear kernels must always be compared with those obtained with a linear kernel. The separation hypersurface is often almost linear, and choosing the simpler model avoids overfitting the data.
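The comparison recommended above is easy to script. The sketch below, written with the scikit-learn library (our choice for illustration; it is not the package used to produce Figures 23–25), fits a linear and an RBF kernel to the Table 3 data and reports the number of support vectors — a quick indication of whether a nonlinear kernel adds anything. The class assignment follows the table as reconstructed here.

```python
import numpy as np
from sklearn.svm import SVC

# Patterns of Table 3 (class split as reconstructed above)
X = np.array([[1, 5.5], [2.25, 5], [3.25, 4.25], [4, 5.2], [5.25, 2.25],
              [5.5, 4], [0.5, 3.5], [1, 2], [1.5, 1], [2.25, 2.7],
              [3, 0.8], [3.75, 1.25], [5, 0.6]])
y = np.array([1] * 6 + [-1] * 7)

for kernel in ("linear", "rbf"):
    model = SVC(kernel=kernel, C=1e6).fit(X, y)  # very large C ~ hard margin
    print(kernel, "support vectors per class:", model.n_support_)
```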

Linear SVM for the Classification of Linearly Non-Separable Data

In the previous section, we presented the SVMC model for the case when the training set is linearly separable, and an optimum separation hyperplane correctly classifies all patterns from that training set. The linear separability of two classes of patterns might not be a valid assumption for real-life applications, however, and in these cases the algorithm presented earlier will not find a solution. There are many reasons why a training set is linearly nonseparable. The identification of input variables ($x_1, x_2, \ldots, x_n$) that can separate the two classes linearly is not a trivial task. When descriptors are used for SAR models, the selection of those descriptors can be made from thousands of descriptors from the extant literature, or they can be computed with available software. Although several procedures have been developed to select the optimum set of structural descriptors, these methods are often time-consuming and may require special algorithms that are not implemented in, e.g., currently available SVM packages. In chemometrics applications, when measured quantities (e.g., spectra, physico-chemical properties, chemical reaction variables, or industrial process variables) are used to separate two classes of patterns, difficulties exist not only in identifying the relevant properties, but also in cost and instrument availability, which may limit the number of possible measurements. Also, all experimental input data are affected by measurement errors and noise, which can make the patterns linearly nonseparable. Finally, the classes might not be separable with a linear classifier because of a nonlinear mapping between the input space and the two classes. In Figure 26, we present a classification problem that, for the majority of patterns, can be solved with a linear classifier. However, the region corresponding to the +1 patterns contains two −1 patterns (shown in square boxes), whereas two +1 patterns are embedded in the region corresponding to the −1 patterns. Of course, no linear classifier can be computed for this learning set, but several hyperplanes can be calculated in such a way as to minimize the number of classification errors, e.g., hyperplane H in Figure 26.


Figure 26 Linearly nonseparable data. The patterns that cannot be linearly separated with a hyperplane are represented inside a square.

In this section, we consider a training set T of m patterns together with their classes, $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, that can be separated linearly except for a small number of objects. Obviously, computing the optimum separation hyperplane according to Eqs. [21] and [28] will fail to produce any viable solution. We will show below how the SVMC for linearly separable patterns can be adapted to accommodate classification errors in the training set. The resulting SVMC will still be linear, but it will compute an optimum separation hyperplane even for cases like that in Figure 26, which cannot be completely separated with a linear classifier. In the previous section, we found that the OSH defined by a pair $(w, b)$ is a buffer between class +1 and class −1 of patterns, with the property that it has the largest margin. The border toward the class +1 is defined by the hyperplane $w \cdot x + b = +1$, whereas the border toward the class −1 is defined by the hyperplane $w \cdot x + b = -1$. For the OSH, all class +1 patterns satisfy $w \cdot x + b \geq +1$, whereas all class −1 patterns satisfy $w \cdot x + b \leq -1$, and the learning set is classified without errors. To obtain an optimum linear classifier for nonseparable data (Figure 27), a penalty is introduced for misclassified data, denoted by $\xi$ and called a slack variable.


Figure 27 Linear separating hyperplanes for nonseparable data. The patterns that cannot be linearly separated with a hyperplane are represented inside a square.


This penalty, associated with each pattern in the training set, is zero for patterns classified correctly and has a positive value that increases with the distance from the corresponding hyperplane for patterns that are not situated on the correct side of the classifier. For a pattern $(x_i, y_i)$ from the class +1, the slack variable is defined as

$$\xi_i(w, b) = \begin{cases} 0 & \text{if } w \cdot x_i + b \geq +1 \\ 1 - (w \cdot x_i + b) & \text{if } w \cdot x_i + b < +1 \end{cases} \qquad [34]$$

Similarly, for a pattern $(x_i, y_i)$ from the class −1, the slack variable is defined as

$$\xi_i(w, b) = \begin{cases} 0 & \text{if } w \cdot x_i + b \leq -1 \\ 1 + (w \cdot x_i + b) & \text{if } w \cdot x_i + b > -1 \end{cases} \qquad [35]$$

From Eqs. [34] and [35] and Figure 27, one can see that the slack variable $\xi_i(w, b)$ is zero for +1 patterns that are classified correctly by hyperplane H2 ($w \cdot x + b \geq +1$) and for −1 patterns that are classified correctly by hyperplane H1 ($w \cdot x + b \leq -1$). Otherwise, the slack variable has a positive value that measures the distance between a pattern $x_i$ and the corresponding hyperplane $w \cdot x + b = y_i$. For +1 patterns situated in the buffer zone between H and H2, and for −1 patterns situated in the buffer zone between H and H1, the slack variable takes values between 0 and 1. Such patterns are not considered to be misclassified, but they have a penalty added to the objective function. If a pattern $x_i$ is located in the "forbidden" region of the classifier, then $\xi_i(w, b) > 1$ (see the patterns in square boxes in Figure 27), and the pattern is considered to be misclassified. We can combine Eqs. [34] and [35] for slack variables of +1 and −1 patterns into Eq. [36]:

$$\xi_i(w, b) = \begin{cases} 0 & \text{if } y_i(w \cdot x_i + b) \geq +1 \\ 1 - y_i(w \cdot x_i + b) & \text{if } y_i(w \cdot x_i + b) < +1 \end{cases} \qquad [36]$$

When slack variables are introduced to penalize misclassified patterns, or patterns situated in the buffer region between H and the corresponding border hyperplanes (H1 or H2), the constraints imposed on the objective function become

$$\begin{cases} w \cdot x_i + b \geq +1 - \xi_i & \text{if } y_i = +1 \\ w \cdot x_i + b \leq -1 + \xi_i & \text{if } y_i = -1 \\ \xi_i \geq 0, \; \forall i \end{cases} \qquad [37]$$

The identification of an OSH is much more difficult when slack variables are used, because the optimum classifier is a compromise between two opposing conditions. On the one hand, a good SVMC corresponds to a hyperplane $(w, b)$ with a margin as large as possible in order to guarantee good prediction performance, which translates into minimizing $\|w\|^2/2$. On the other hand, the optimum hyperplane should minimize the number of classification errors as well as the error of misclassified patterns, which translates into minimizing the number of positive slack variables and simultaneously minimizing the value of each slack variable. The latter condition has the tendency of decreasing the width of the SVMC hyperplane, which contradicts the former condition. A simple way to combine these two conditions and to assign a penalty for classification errors is to change the objective function to be minimized from $\|w\|^2/2$ to

$$\frac{\|w\|^2}{2} + C \left( \sum_{i=1}^{m} \xi_i \right)^k \qquad [38]$$

where C is a parameter that can be adjusted by the user and that can either increase or decrease the penalty for classification errors. A large C assigns a higher penalty to classification errors, thus minimizing the number of misclassified patterns. A small C maximizes the margin, so that the OSH is less sensitive to the errors from the learning set. Equation [38] is a convex programming problem for any positive integer k, which for k = 1 and k = 2 is also a quadratic programming problem. The formula with k = 1 has the advantage that neither the $\xi_i$ nor their Lagrange multipliers appear in the Wolfe dual problem.40 Based on the above considerations, we are now ready to state the form of the optimization problem for SVMC with a linear classifier and classification errors:

$$\begin{aligned} \text{minimize } & \frac{\|w\|^2}{2} + C \sum_{i=1}^{m} \xi_i \\ \text{with the constraints } & y_i(w \cdot x_i + b) \geq +1 - \xi_i, \; i = 1, \ldots, m \\ & \xi_i \geq 0, \; i = 1, \ldots, m \end{aligned} \qquad [39]$$

To solve the above constrained quadratic optimization problem, we follow the approach based on Lagrange multipliers (Eq. [21]). We define the Lagrange multipliers $\Lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m)$ for the constraints $y_i(w \cdot x_i + b) \geq +1 - \xi_i$ and the Lagrange multipliers $M = (\mu_1, \mu_2, \ldots, \mu_m)$ for the constraints $\xi_i \geq 0$, $\forall i = 1, \ldots, m$. With these notations, the primal Lagrangian function of this problem is

$$L_P(w, b, \Lambda, M) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i - \sum_{i=1}^{m} \lambda_i \big[ y_i(w \cdot x_i + b) - 1 + \xi_i \big] - \sum_{i=1}^{m} \mu_i \xi_i \qquad [40]$$


where $\Lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m)$ is the set of Lagrange multipliers of the training (calibration) patterns. The Karush–Kuhn–Tucker conditions40 for the primal problem are as follows:

Gradient Conditions

$$\frac{\partial L_P(w, b, \Lambda, M)}{\partial w} = w - \sum_{i=1}^{m} \lambda_i y_i x_i = 0, \quad \text{where} \quad \frac{\partial L}{\partial w} = \left( \frac{\partial L}{\partial w_1}, \frac{\partial L}{\partial w_2}, \ldots, \frac{\partial L}{\partial w_n} \right) \qquad [41]$$

$$\frac{\partial L_P(w, b, \Lambda, M)}{\partial b} = \sum_{i=1}^{m} \lambda_i y_i = 0 \qquad [42]$$

$$\frac{\partial L_P(w, b, \Lambda, M)}{\partial \xi_i} = C - \lambda_i - \mu_i = 0 \qquad [43]$$

Orthogonality Condition

$$\lambda_i \big[ y_i(w \cdot x_i + b) - 1 + \xi_i \big] = 0, \quad i = 1, \ldots, m \qquad [44]$$

Feasibility Condition

$$y_i(w \cdot x_i + b) - 1 + \xi_i \geq 0, \quad i = 1, \ldots, m \qquad [45]$$

Non-negativity Condition

$$\xi_i \geq 0, \quad \lambda_i \geq 0, \quad \mu_i \geq 0, \quad \mu_i \xi_i = 0, \quad i = 1, \ldots, m \qquad [46]$$

We now substitute Eqs. [41] and [42] into the right side of the Lagrangian function, obtaining the dual problem

$$\begin{aligned} \text{maximize } & L_D(w, b, \Lambda) = \sum_{i=1}^{m} \lambda_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \lambda_i \lambda_j y_i y_j \, x_i \cdot x_j \\ \text{subject to } & 0 \leq \lambda_i \leq C, \; i = 1, \ldots, m \quad \text{and} \quad \sum_{i=1}^{m} \lambda_i y_i = 0 \end{aligned} \qquad [47]$$


The solution for the vector $w$ is obtained from Eq. [41], which represents one of the KKT conditions:

$$w = \sum_{i=1}^{m} \lambda_i y_i x_i \qquad [48]$$

The value of $b$ can be computed as an average of the $b$ values obtained from all training patterns with the following KKT conditions:

$$\lambda_i \big[ y_i(w \cdot x_i + b) - 1 + \xi_i \big] = 0, \qquad (C - \lambda_i)\,\xi_i = 0 \qquad [49]$$

From the above equations, we also have that $\xi_i = 0$ if $\lambda_i < C$. Therefore, $b$ can be averaged only over those patterns that have $0 < \lambda_i < C$. We will now examine the relationships between the position of a pattern $x_i$ and the corresponding values of $\lambda_i$, $\xi_i$, and C. The following situations can be distinguished (see also the sketch after this list):

1. ($\lambda_i = 0$, $\xi_i = 0$): The pattern is inside the +1 region ($w \cdot x_i + b > +1$) if $y_i = +1$ or inside the −1 region ($w \cdot x_i + b < -1$) if $y_i = -1$, i.e., it is correctly classified, and its distance from the separating hyperplane is larger than $1/\|w\|$. Such patterns are not important in defining the SVMC model, and they do not influence the solution. Hence, they can be deleted from the learning set without affecting the model.

2. ($0 < \lambda_i < C$, $\xi_i = 0$): This situation corresponds to correctly classified patterns situated on the hyperplanes that border the SVMC OSH, i.e., +1 patterns are situated on the hyperplane $w \cdot x_i + b = +1$, whereas −1 patterns are situated on the hyperplane $w \cdot x_i + b = -1$. The distance between these patterns and the separating hyperplane is $1/\|w\|$. Such a pattern is called a margin support vector.

3. ($\lambda_i = C$, $0 < \xi_i \leq 1$): These patterns, correctly classified, are called bound support vectors, and their distance to the separating hyperplane is smaller than $1/\|w\|$. Patterns from the class +1 are situated in the buffer zone between the separating hyperplane ($w \cdot x_i + b = 0$) and the border hyperplane toward the +1 region ($w \cdot x_i + b = +1$). Patterns from the class −1 are situated in the buffer zone between the separating hyperplane ($w \cdot x_i + b = 0$) and the border hyperplane toward the −1 region ($w \cdot x_i + b = -1$).

4. ($\lambda_i = C$, $\xi_i > 1$): These patterns are incorrectly classified. Patterns from the class +1 are situated in the −1 region defined by the separating hyperplane ($w \cdot x_i + b < 0$), whereas patterns from the class −1 are situated in the +1 region of the separating hyperplane ($w \cdot x_i + b > 0$).
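A minimal sketch of the four cases, assuming the multipliers λ and slack variables ξ of a trained model are available as arrays (real SVM packages expose these quantities in different ways):

```python
import numpy as np

def categorize_patterns(lam, xi, C, tol=1e-8):
    """Label each training pattern with one of the four situations above."""
    labels = np.empty(len(lam), dtype=object)
    labels[lam <= tol] = "1: not a support vector"
    labels[(lam > tol) & (lam < C - tol)] = "2: margin support vector"
    labels[(lam >= C - tol) & (xi <= 1)] = "3: bound support vector"
    labels[(lam >= C - tol) & (xi > 1)] = "4: misclassified"
    return labels
```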


The classification of new patterns uses the optimum values of $w$ (Eq. [48]) and $b$ (Eq. [49]):

$$\text{class}(x_k) = \text{sign}\left( \sum_{i=1}^{m} \lambda_i y_i \, x_i \cdot x_k + b \right) \qquad [50]$$

Equation [50] depends only on the support vectors and their Lagrange multipliers, and the optimum value for b, showing that one does not need to compute w explicitly in order to predict the classification of new patterns.
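Equation [50] translates directly into a few lines of code. The sketch below (NumPy; the array names are hypothetical) predicts new patterns from the stored support vectors alone, without ever forming $w$:

```python
import numpy as np

def predict(X_new, sv_X, sv_y, sv_lam, b):
    """Eq. [50]: sign of sum_i lambda_i y_i (x_i . x_k) + b over the SVs."""
    G = X_new @ sv_X.T                  # dot products with support vectors
    return np.sign(G @ (sv_lam * sv_y) + b)
```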

NONLINEAR SUPPORT VECTOR MACHINES

In previous sections, we introduced the linear SVM classification algorithm, which uses the training patterns to generate an optimum separation hyperplane. Such classifiers are not adequate for cases when complex relationships exist between input parameters and the class of a pattern. To discriminate linearly nonseparable classes of patterns, the SVM model can be fitted with nonlinear functions to provide efficient classifiers for hard-to-separate classes of patterns.

Mapping Patterns to a Feature Space

The separation surface may be nonlinear in many classification problems, but support vector machines can be extended to handle nonlinear separation surfaces by using feature functions $\phi(x)$. The SVM extension to nonlinear datasets is based on mapping the input variables into a feature space of a higher dimension (a Hilbert space of finite or infinite dimension) and then performing a linear classification in that higher dimensional space. For example, consider the set of nonlinearly separable patterns in Figure 28 (left).


Figure 28 Linear separation of patterns in feature space.


It is clear that a linear classifier, even with slack variables, is not appropriate for this type of separation surface, which is obviously nonlinear. The nonlinear feature functions $\phi$ transform and combine the original coordinates of the patterns and map them into a high-dimensional space (Figure 28, right) where the two classes can be separated with a linear classifier. This property is of value because linear classifiers are easy to compute, and we can use the results obtained for linear SVM classification from the previous sections. The only difficulty is to identify, for a particular dataset, the correct set of nonlinear functions that can perform such a mapping. Consider a training set T of m patterns together with their classes, $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x$ is an n-dimensional pattern, $x = (x_1, x_2, \ldots, x_n)$. Define the set of feature functions as $\phi_1, \phi_2, \ldots, \phi_h$. Any pattern $x$ is mapped to a real vector $\phi(x)$:

$$x = (x_1, x_2, \ldots, x_n) \rightarrow \phi(x) = \big( \phi_1(x), \phi_2(x), \ldots, \phi_h(x) \big) \qquad [51]$$

After mapping all patterns from the learning set into the feature space, we obtain a set of points in the feature space $R^h$:

$$\phi(T) = \{(\phi(x_1), y_1), (\phi(x_2), y_2), \ldots, (\phi(x_m), y_m)\} \qquad [52]$$

The important property of the feature space is that the learning set $\phi(T)$ might be linearly separable in the feature space if the appropriate feature functions are used, even when the learning set is not linearly separable in the original space. We consider a soft margin SVM in which the variables $x$ are substituted with the feature vector $\phi(x)$, which represents an optimization problem similar to that from Eq. [39]. Using this nonlinear SVM, the class of a pattern $x_k$ is determined with Eq. [53]:

$$\text{class}(x_k) = \text{sign}\big[ w \cdot \phi(x_k) + b \big] = \text{sign}\left( \sum_{i=1}^{m} \lambda_i y_i \, \phi(x_i) \cdot \phi(x_k) + b \right) \qquad [53]$$

The nonlinear classifier defined by Eq. [53] shows that to predict a pattern $x_k$, it is necessary to compute the dot product $\phi(x_i) \cdot \phi(x_k)$ for all support vectors $x_i$. This property of the nonlinear classifier is very important, because it shows that we do not need to know the actual expression of the feature function $\phi$. Moreover, a special class of functions, called kernels, allows the computation of the dot product $\phi(x_i) \cdot \phi(x_k)$ in the original space defined by the training patterns. We present now a simple example of linearly nonseparable classes that can become linearly separable in feature space. Consider the dataset from Table 4 and Figure 29.


Table 4 Linearly Nonseparable Patterns that Can Be Separated in a Feature Space

Pattern   x1    x2    x1²   Class
1         −1    −1    +1    −1
2         −1     0    +1    −1
3         −1    +1    +1    −1
4          0    −1     0    +1
5          0     0     0    +1
6          0    +1     0    +1
7         +1    −1    +1    −1
8         +1     0    +1    −1
9         +1    +1    +1    −1

This two-dimensional dataset, with dimensions $x_1$ and $x_2$, consists of three patterns in class +1 and six patterns in class −1. From Figure 29, it is easy to see that no straight line can separate these two classes. On the other hand, one can imagine a higher dimensional feature space in which these classes become linearly separable. The features are combinations of the input data, and for this example we add $x_1^2$ as a new dimension (Table 4, column 4). After this transformation, the dataset is represented in a three-dimensional feature space. The surface $\phi(x_1, x_2) = x_1^2$ is represented in Figure 30. By adding this simple feature, we have mapped the patterns onto a nonlinear surface, as is easily seen when we plot (Figure 31) the feature space points $(x_1, x_2, x_1^2)$ that are located on the surface from Figure 30. The feature $x_1^2$ has an interesting property, as one can see by inspecting Table 4: all patterns from class +1 have $x_1^2 = 0$, whereas all patterns from class −1 have $x_1^2 = +1$. By mapping the patterns into the feature space, we are now able to separate the two classes with a linear classifier, i.e., a plane (Figure 32). Of course, this plane is not unique; in fact, there are infinitely many planes that can now discriminate the two classes.


Figure 29 Linearly nonseparable two-dimensional patterns.


Figure 30 Surface $f(x, y) = x^2$.

The intersection between the feature space and the classifier defines the decision boundaries, which, when projected back onto the original space, look like Figure 33. Thus, transforming the input data into a nonlinear feature space makes the patterns linearly separable. Unfortunately, for a given dataset, one cannot predict which feature functions will make the patterns linearly separable; finding good feature functions is thus a trial-and-error process.
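The $x_1^2$ lifting of Table 4 can be reproduced in a few lines. The sketch below (scikit-learn, our illustration choice) confirms that a linear SVM separates the lifted patterns perfectly, in agreement with Figure 32:

```python
import numpy as np
from sklearn.svm import SVC

# Table 4 patterns
X = np.array([[-1, -1], [-1, 0], [-1, 1], [0, -1], [0, 0], [0, 1],
              [1, -1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1, -1, -1, -1])

# Lift into the feature space (x1, x2, x1**2), as in Table 4, column 4
Z = np.column_stack([X, X[:, 0] ** 2])

clf = SVC(kernel="linear", C=1e6).fit(Z, y)
print(clf.score(Z, y))  # 1.0: the lifted patterns are linearly separable
```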

Feature Functions and Kernels

The idea of transforming the input space into a feature space of a higher dimension by using feature functions $\phi(x)$ and then performing a linear classification in that higher dimensional space is central to support vector machines.


Figure 31 Feature space points $(x, y, x^2)$.



Figure 32 A separation plane for +1 patterns (below the plane) and −1 patterns (above the plane).

However, the feature space may have a very high dimensionality, even infinite. An obvious consequence is that we want to avoid computing the inner product of feature functions $\phi(x)$ that appears in Eq. [53]. Fortunately, a method was developed to generate the mapping into a high-dimensional feature space with kernels.


Figure 33 Projection of the separation plane.


The rationale that prompted the use of kernel functions is to enable computations to be performed in the original input space rather than in the high-dimensional (even infinite) feature space. Using this approach, the SVM algorithm avoids the evaluation of the inner product of the feature functions. Under certain conditions, an inner product in feature space has an equivalent kernel in input space:

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) \qquad [54]$$

If the kernel K is a symmetric positive definite function that satisfies Mercer's conditions,4,42

$$K(x_i, x_j) = \sum_{k=1}^{\infty} a_k \phi_k(x_i) \phi_k(x_j), \quad a_k \geq 0 \qquad [55]$$

and

$$\iint K(x_i, x_j)\, g(x_i)\, g(x_j)\, dx_i\, dx_j > 0 \qquad [56]$$

then the kernel represents an inner product in feature space. Consider the two-dimensional pattern $x = (x_1, x_2)$ and the feature function defined for a two-dimensional pattern $x$:

$$\phi(x) = \left( 1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, x_2^2, \sqrt{2}x_1 x_2 \right) \qquad [57]$$

From the expression of this feature function, it is easy to obtain the corresponding kernel function:

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) = (1 + x_i \cdot x_j)^2 \qquad [58]$$

This example can be easily extended to a three-dimensional pattern $x = (x_1, x_2, x_3)$, when the feature function has the expression

$$\phi(x) = \left( 1, \sqrt{2}x_1, \sqrt{2}x_2, \sqrt{2}x_3, x_1^2, x_2^2, x_3^2, \sqrt{2}x_1 x_2, \sqrt{2}x_1 x_3, \sqrt{2}x_2 x_3 \right) \qquad [59]$$

which corresponds to the polynomial of degree two kernel from Eq. [58]. In a similar way, a two-dimensional pattern $x = (x_1, x_2)$ and a feature function

$$\phi(x) = \left( 1, \sqrt{3}x_1, \sqrt{3}x_2, \sqrt{3}x_1^2, \sqrt{3}x_2^2, \sqrt{6}x_1 x_2, \sqrt{3}x_1^2 x_2, \sqrt{3}x_1 x_2^2, x_1^3, x_2^3 \right) \qquad [60]$$

is equivalent to a polynomial kernel of degree three:

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) = (1 + x_i \cdot x_j)^3 \qquad [61]$$
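A short numerical check makes the kernel–feature function correspondence concrete: the explicit six-dimensional map of Eq. [57] and the degree 2 polynomial kernel of Eq. [58] return identical values (the test points below are arbitrary).

```python
import numpy as np

def phi(x):
    """Feature map of Eq. [57] for a two-dimensional pattern x."""
    s = np.sqrt(2.0)
    return np.array([1.0, s * x[0], s * x[1], x[0]**2, x[1]**2, s * x[0] * x[1]])

xi, xj = np.array([0.3, -1.2]), np.array([2.0, 0.5])
print(phi(xi) @ phi(xj))        # dot product in feature space
print((1.0 + xi @ xj) ** 2)     # kernel in input space -- same number
```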


We will now present an example of an infinite-dimensional feature function with the expression

$$\phi(x) = \left( \sin(x), \frac{1}{\sqrt{2}}\sin(2x), \frac{1}{\sqrt{3}}\sin(3x), \frac{1}{\sqrt{4}}\sin(4x), \ldots, \frac{1}{\sqrt{n}}\sin(nx), \ldots \right) \qquad [62]$$

where $x \in [1, \pi]$. The kernel corresponding to this infinite series has a very simple expression, which can be easily calculated as follows:

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) = \sum_{n=1}^{\infty} \frac{1}{n} \sin(nx_i)\sin(nx_j) = \frac{1}{2}\log \left| \frac{\sin\frac{x_i + x_j}{2}}{\sin\frac{x_i - x_j}{2}} \right| \qquad [63]$$

Kernel Functions for SVM

In this section, we present the most used SVM kernels. Because these functions are usually computed in a high-dimensional space and have a nonlinear character, it is not easy to get an impression of the shape of the classification hyperplane generated by these kernels. Therefore, we will present several plots for SVM models obtained for the dataset shown in Table 5. This dataset is not separable with a linear classifier, but the two clusters can be clearly distinguished.

Linear (Dot) Kernel

The inner product of $x_i$ and $x_j$ defines the linear (dot) kernel:

$$K(x_i, x_j) = x_i \cdot x_j \qquad [64]$$

This is a linear classifier, and it should be used as a test of the nonlinearity in the training set, as well as a reference for the eventual classification improvement obtained with nonlinear kernels.

Table 5 Linearly Nonseparable Patterns Used for the SVM Classification Models in Figures 34–38

Pattern   x1     x2     Class        Pattern   x1     x2     Class
1         2      4      +1           9         0.6    4.5    −1
2         2.5    2.75   +1           10        1      3      −1
3         3      5      +1           11        1.5    1      −1
4         3.5    2      +1           12        2      5.7    −1
5         4.5    4.75   +1           13        3.5    5.5    −1
6         5      3.75   +1           14        4      0.6    −1
7         3.25   4      +1           15        5      1.5    −1
8         4      3.25   +1           16        5.3    5.4    −1
                                     17        5.75   3      −1


Figure 34 SVM classification models obtained with the polynomial kernel (Eq. [65]) for the dataset from Table 5: (a) polynomial of degree 2; (b) polynomial of degree 3.

Polynomial Kernel

The polynomial kernel is a simple and efficient method for modeling nonlinear relationships:

$$K(x_i, x_j) = (1 + x_i \cdot x_j)^d \qquad [65]$$

The dataset from Table 5 can be separated easily with a polynomial kernel (Figure 34a, polynomial of degree 2). The downside of using polynomial kernels is the overfitting that might appear when the degree increases (Figure 34b, degree 3; Figure 35a, degree 5; Figure 35b, degree 10). As the degree of the polynomial increases, the classification surface becomes more complex. For the degree 10 polynomial, one can see that the border hypersurface defines two regions for the cluster of +1 patterns.

Figure 35 SVM classification models obtained with the polynomial kernel (Eq. [65]) for the dataset from Table 5: (a) polynomial of degree 5; (b) polynomial of degree 10.


Figure 36 SVM classification models obtained with the Gaussian radial basis function kernel (Eq. [66]) for the dataset from Table 5: (a) σ = 1; (b) σ = 10.

Gaussian Radial Basis Function Kernel

Radial basis functions (RBF) are widely used kernels, usually in the Gaussian form:

$$K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right) \qquad [66]$$

The parameter σ controls the shape of the separating hyperplane, as one can see from the two SVM models in Figure 36, both obtained with a Gaussian RBF kernel (a, σ = 1; b, σ = 10). The number of support vectors increases from 6 to 17, showing that the second setting does not generalize well. In practical applications, the parameter σ should be optimized with a suitable cross-validation procedure.

Exponential Radial Basis Function Kernel

If discontinuities in the hyperplane are acceptable, an exponential RBF kernel is worth trying:

$$K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|}{2\sigma^2} \right) \qquad [67]$$

The form of the OSH obtained for this kernel is apparent in Figure 37, where two values of the parameter σ are exemplified (a, σ = 0.5; b, σ = 2). For the particular dataset used here, this kernel is not a good choice, because it requires too many support vectors.
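The growth in the number of support vectors with σ, for either RBF form, is easy to monitor, because trained SVM objects expose their support vectors. Note that scikit-learn parametrizes the Gaussian RBF kernel as exp(−γ‖x_i − x_j‖²) with γ = 1/(2σ²), so the sketch below converts σ accordingly; X and y are assumed to hold the Table 5 patterns.

```python
from sklearn.svm import SVC

# X, y: the Table 5 patterns (+1 for patterns 1-8, -1 for patterns 9-17)
for sigma in (1.0, 10.0):
    gamma = 1.0 / (2.0 * sigma**2)     # scikit-learn's gamma for this sigma
    model = SVC(kernel="rbf", gamma=gamma, C=1e6).fit(X, y)
    print(f"sigma = {sigma}: {len(model.support_)} support vectors")
```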


Figure 37 SVM classification models obtained with the exponential radial basis function kernel (Eq. [67]) for the dataset from Table 5: (a) σ = 0.5; (b) σ = 2.

Neural (Sigmoid, Tanh) Kernel

The hyperbolic tangent (tanh) function, with a sigmoid shape, is the most used transfer function for artificial neural networks. The corresponding kernel has the formula

$$K(x_i, x_j) = \tanh(a \, x_i \cdot x_j + b) \qquad [68]$$

Anova Kernel

A useful function is the anova kernel, whose shape is controlled by the parameters γ and d (the sum runs over the components of the two patterns):

$$K(x_i, x_j) = \left( \sum_{k} \exp\big( -\gamma (x_{ik} - x_{jk}) \big) \right)^d \qquad [69]$$

Fourier Series Kernel

A Fourier series kernel, on the interval $[-\pi/2, +\pi/2]$, is defined by

$$K(x_i, x_j) = \frac{\sin\big( (N + \frac{1}{2})(x_i - x_j) \big)}{\sin\big( \frac{1}{2}(x_i - x_j) \big)} \qquad [70]$$

Spline Kernel

The spline kernel of order k with N knots located at $t_s$ is defined by

$$K(x_i, x_j) = \sum_{r=0}^{k} x_i^r x_j^r + \sum_{s=1}^{N} (x_i - t_s)_+^k (x_j - t_s)_+^k \qquad [71]$$


Figure 38 SVM classification models for the dataset from Table 5: (a) spline kernel, Eq. [71]; (b) B spline kernel, degree 1, Eq. [72].

B Spline Kernel

The B spline kernel is defined on the interval [−1, 1] by the formula

$$K(x_i, x_j) = B_{2N+1}(x_i - x_j) \qquad [72]$$

Both spline kernels have a remarkable flexibility in modeling difficult data. This characteristic is not always useful, especially when the classes can be separated with simple nonlinear functions. The SVM models from Figure 38 (a, spline; b, B spline, degree 1) show that the B spline kernel overfits the data and generates a border hyperplane that has three disjoint regions.

Additive Kernel

An interesting property of kernels is that one can combine several kernels by summing them. The result of this summation is a valid kernel function:

$$K(x_i, x_j) = \sum_{k} K_k(x_i, x_j) \qquad [73]$$

Tensor Product Kernel

The tensor product of two or more kernels is also a kernel function:

$$K(x_i, x_j) = \prod_{k} K_k(x_i, x_j) \qquad [74]$$

In many SVM packages, these properties presented in Eqs. [73] and [74] allow the user to combine different kernels in order to generate custom kernels more suitable for particular applications.
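Exploiting Eqs. [73] and [74] is straightforward in packages that accept user-defined kernels; in scikit-learn, for instance, a callable returning the Gram matrix can be passed directly. A minimal sketch of an additive custom kernel (the component kernels and parameter values are arbitrary choices):

```python
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

def additive_kernel(A, B):
    """Sum of a degree 2 polynomial and a Gaussian RBF kernel (Eq. [73])."""
    return (polynomial_kernel(A, B, degree=2, gamma=1.0, coef0=1.0)
            + rbf_kernel(A, B, gamma=0.5))

clf = SVC(kernel=additive_kernel)   # fit/predict as with any built-in kernel
```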


Hard Margin Nonlinear SVM Classification

In Figure 39, we present the network structure of a support vector machine classifier. The input layer is represented by the support vectors $x_1, \ldots, x_n$ and the test (prediction) pattern $x_t$, which are transformed by the feature function $\phi$ and mapped into the feature space. The next layer performs the dot product between the test pattern $\phi(x_t)$ and each support vector $\phi(x_i)$. The dot product of feature functions is then multiplied with the Lagrange multipliers, and the output is the nonlinear classifier from Eq. [53], in which the dot product of feature functions is substituted with a kernel function. The mathematical formulation of the hard margin nonlinear SVM classification is similar to that presented for the SVM classification of linearly separable datasets, only now the input patterns $x$ are replaced with feature functions, $x \rightarrow \phi(x)$, and the dot product of two feature functions $\phi(x_i) \cdot \phi(x_j)$ is replaced with a kernel function $K(x_i, x_j)$, Eq. [54]. Analogously with Eq. [28], the dual problem is

$$\begin{aligned} \text{maximize } & L_D(w, b, \Lambda) = \sum_{i=1}^{m} \lambda_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \lambda_i \lambda_j y_i y_j \, \phi(x_i) \cdot \phi(x_j) \\ \text{subject to } & \lambda_i \geq 0, \; i = 1, \ldots, m \quad \text{and} \quad \sum_{i=1}^{m} \lambda_i y_i = 0 \end{aligned} \qquad [75]$$

The vector $w$ that determines the optimum separation hyperplane is

$$w = \sum_{i=1}^{m} \lambda_i y_i \phi(x_i) \qquad [76]$$


Figure 39 Structure of support vector machines. The test pattern $x_t$ and the support vectors $x_1, \ldots, x_n$ are mapped into a feature space with the nonlinear function $\phi$, and the dot products are computed.


As with the derivation of $b$ in Eq. [30], we have

$$\sum_{i=1}^{m} \lambda_i y_i K(x_i, x_j) + b = y_j \qquad [77]$$

Therefore, the threshold $b$ can be obtained by averaging the $b$ values obtained for all support vector patterns, i.e., the patterns with $\lambda_j > 0$:

$$b = y_j - \sum_{i=1}^{m} \lambda_i y_i K(x_i, x_j) \qquad [78]$$

The SVM classifier obtained with a kernel K is defined by the support vectors from the training set ($\lambda_i > 0$) and the corresponding values of the Lagrange multipliers $\lambda_i$:

$$\text{class}(x_k) = \text{sign}\left( \sum_{i=1}^{m} \lambda_i y_i K(x_i, x_k) + b \right) \qquad [79]$$

Soft Margin Nonlinear SVM Classification

A soft margin nonlinear SVM classifier is obtained by introducing slack variables $\xi_i$ and the capacity C. As with Eq. [47], the dual problem is

$$\begin{aligned} \text{maximize } & L_D(w, b, \Lambda) = \sum_{i=1}^{m} \lambda_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \lambda_i \lambda_j y_i y_j \, \phi(x_i) \cdot \phi(x_j) \\ \text{subject to } & 0 \leq \lambda_i \leq C, \; i = 1, \ldots, m \quad \text{and} \quad \sum_{i=1}^{m} \lambda_i y_i = 0 \end{aligned} \qquad [80]$$

which defines a classifier identical to the one from Eq. [79]. The capacity parameter C is very important in balancing the penalty for classification errors. It is usually adjusted by the user, or it can be optimized automatically by some SVM packages. The penalty for classification errors increases when the capacity C increases, with the consequence that the number of erroneously classified patterns decreases as C increases. On the other hand, the margin decreases when C increases, making the classifier more sensitive to noise or errors in the training set. Between these divergent requirements (small C for a large margin classifier; large C for a small number of classification errors), an optimum value should be determined, usually by trying to maximize the cross-validation prediction.
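In practice, the cross-validation search over C is usually automated; a minimal sketch with scikit-learn, assuming the training arrays X and y are already in memory (the kernel parameters are arbitrary choices):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

search = GridSearchCV(SVC(kernel="rbf", gamma=0.5),
                      param_grid={"C": [0.1, 1, 10, 100, 1000]}, cv=5)
search.fit(X, y)            # X, y: training patterns and class labels
print(search.best_params_)  # the C with the best cross-validated accuracy
```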


Figure 40 Influence of the C parameter on the class separation. SVM classification models obtained with the polynomial kernel of degree 2 for the dataset from Table 5: (a) C = 100; (b) C = 10.

To illustrate the influence of the capacity parameter C on the separation hyperplane with the dataset from Table 5 and a polynomial kernel of degree 2, consider Figures 40 (a, C = 100; b, C = 10) and 41 (a, C = 1; b, C = 0.1). This example shows that a bad choice for the capacity C can ruin the performance of an otherwise very good classifier. Empirical observations suggest that C = 100 is a good value for a wide range of SVM classification problems, but the optimum value should be determined for each particular case. A similar trend is presented for the SVM models obtained with the spline kernel, presented in Figure 38a (C infinite) and Figure 42 (a, C = 100; b, C = 10). The classifier from Figure 38a does not allow classification errors, whereas by decreasing the capacity C to 100 (Figure 42a), one −1 pattern is misclassified (indicated with an arrow).

Figure 41 Influence of the C parameter on the class separation. SVM classification models obtained with the polynomial kernel of degree 2 for the dataset from Table 5: (a) C = 1; (b) C = 0.1.


Figure 42 Influence of the C parameter on the class separation. SVM classification models obtained with the spline kernel for the dataset from Table 5: (a) C = 100; (b) C = 10.

A further decrease of C to 10 increases the number of classification errors: one for class +1 and three for class −1.

ν-SVM Classification

Another formulation of support vector machines is the ν-SVM, in which the parameter C is replaced by a parameter ν ∈ [0, 1] that is a lower bound on the fraction of training patterns that are support vectors and an upper bound on the fraction of training patterns situated on the wrong side of the hyperplane. ν-SVM can be used for both classification and regression, as presented in detail in several reviews, by Schölkopf et al.,43 Chang and Lin,44,45 Steinwart,46 and Chen, Lin, and Schölkopf.47 The optimization problem for the ν-SVM classification is

$$\begin{aligned} \text{minimize } & \frac{\|w\|^2}{2} - \nu\rho + \frac{1}{m}\sum_{i=1}^{m} \xi_i \\ \text{with the constraints } & y_i(w \cdot x_i + b) \geq \rho - \xi_i, \; i = 1, \ldots, m \\ & \xi_i \geq 0, \; i = 1, \ldots, m \end{aligned} \qquad [81]$$

With these notations, the primal Lagrangian function of this problem is

$$\begin{aligned} L_P(w, b, \Lambda, \xi, B, \rho, \delta) = \; & \frac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{m}\sum_{i=1}^{m} \xi_i \\ & - \sum_{i=1}^{m} \Big\{ \lambda_i \big[ y_i(w \cdot x_i + b) - \rho + \xi_i \big] + \beta_i \xi_i \Big\} - \delta\rho \end{aligned} \qquad [82]$$

with the Lagrange multipliers $\lambda_i, \beta_i, \delta \geq 0$. This function must be minimized with respect to $w$, $b$, $\xi$, $\rho$ and maximized with respect to $\Lambda$, $B$, $\delta$. Following the same derivation as in the case of C-SVM, we compute the corresponding partial derivatives and set them equal to zero, which leads to the following conditions:

$$w = \sum_{i=1}^{m} \lambda_i y_i x_i \qquad [83]$$

$$\lambda_i + \beta_i = 1/m \qquad [84]$$

$$\sum_{i=1}^{m} \lambda_i y_i = 0 \qquad [85]$$

$$\sum_{i=1}^{m} \lambda_i - \delta = \nu \qquad [86]$$

We substitute Eqs. [83] and [84] into Eq. [82], using $\lambda_i, \beta_i, \delta \geq 0$, and then we substitute the dot products with kernels, to obtain the following quadratic optimization problem:

$$\begin{aligned} \text{maximize } & L_D(\Lambda) = -\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \lambda_i \lambda_j y_i y_j K(x_i, x_j) \\ \text{subject to } & 0 \leq \lambda_i \leq 1/m, \; i = 1, \ldots, m \\ & \sum_{i=1}^{m} \lambda_i y_i = 0 \\ & \sum_{i=1}^{m} \lambda_i \geq \nu \end{aligned} \qquad [87]$$

From these equations, it follows that the ν-SVM classifier is

$$\text{class}(x_k) = \text{sign}\left( \sum_{i=1}^{m} \lambda_i y_i K(x_i, x_k) + b \right) \qquad [88]$$

Schölkopf et al. showed that if a ν-SVM classifier leads to $\rho > 0$, then the C-SVM classifier with $C = 1/(m\rho)$ has the same decision function.43
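Most SVM libraries implement this formulation directly; scikit-learn, for example, exposes it as NuSVC. A minimal sketch (the parameter values are arbitrary, and X and y are assumed to be in memory):

```python
from sklearn.svm import NuSVC

# nu = 0.2: at least ~20% of the patterns become support vectors, and
# at most ~20% may fall on the wrong side of the margin
model = NuSVC(nu=0.2, kernel="rbf", gamma=0.5).fit(X, y)
```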

Weighted SVM for Imbalanced Classification

In many practical applications, the ratio between the number of +1 and −1 patterns is very different from 1, i.e., one class is in excess and can dominate the SVM classifier. In other cases, the classification error for one class may be more unfavorable or expensive than an error for the other class


(e.g., a clinical diagnostic error). In both cases, it is advantageous to use a variant of the SVM classifier, the weighted SVM, that uses different penalties ($C_+$ and $C_-$) for the two classes. The most unfavorable type of error receives the higher penalty, which translates into an SVM classifier that minimizes that type of error. By analogy with Eq. [39], the primal problem is

$$\begin{aligned} \text{minimize } & \frac{\|w\|^2}{2} + C_+ \sum_{\substack{i=1 \\ y_i = +1}}^{m} \xi_i + C_- \sum_{\substack{i=1 \\ y_i = -1}}^{m} \xi_i \\ \text{with the constraints } & y_i(w \cdot x_i + b) \geq +1 - \xi_i, \; i = 1, \ldots, m \\ & \xi_i \geq 0, \; i = 1, \ldots, m \end{aligned} \qquad [89]$$

which is equivalent to the dual problem

$$\begin{aligned} \text{maximize } & L_D(w, b, \Lambda) = \sum_{i=1}^{m} \lambda_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \lambda_i \lambda_j y_i y_j \, x_i \cdot x_j \\ \text{subject to } & 0 \leq \lambda_i \leq C_+, \; i = 1, \ldots, m, \; \text{for } y_i = +1 \\ & 0 \leq \lambda_i \leq C_-, \; i = 1, \ldots, m, \; \text{for } y_i = -1 \\ & \sum_{i=1}^{m} \lambda_i y_i = 0 \end{aligned} \qquad [90]$$

The final solution is obtained by introducing the feature functions, $x \rightarrow \phi(x)$, and substituting the dot product $\phi(x_i) \cdot \phi(x_j)$ with a kernel function $K(x_i, x_j)$.
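In scikit-learn, the two penalties of Eq. [89] are specified through per-class weights that multiply C; a hedged sketch for a problem where errors on the +1 class are considered ten times more expensive (the factor of 10 is an arbitrary illustration):

```python
from sklearn.svm import SVC

# Effectively C_plus = 10 * C and C_minus = C in the notation of Eq. [89]
clf = SVC(kernel="rbf", C=10.0, class_weight={+1: 10.0, -1: 1.0})
clf.fit(X, y)   # X, y: training patterns and labels, assumed in memory
```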

Multi-class SVM Classification

Support vector machine classification was originally defined for two-class problems. This is a limitation in cases when three or more classes of patterns are present in the training set, for example, when classifying chemical compounds as inhibitors of several targets. Many multiclass SVM classification approaches decompose the training set into several two-class problems. The one-versus-one approach trains a two-class SVM model for each pair of classes from the training set, which for a k-class problem results in $k(k-1)/2$ SVM models. In the prediction phase, a voting procedure assigns the class of the prediction pattern to be the class with the maximum number of votes. A variant of the one-versus-one approach is DAGSVM (directed acyclic graph SVM), which has an identical training procedure but uses for prediction a rooted binary directed acyclic graph in which


each vertex is a two-class SVM model. Debnath, Takahide, and Takahashi proposed an optimized one-versus-one multiclass SVM in which only a minimum number of SVM classifiers are trained for each class.48 The one-versus-all procedure requires a much smaller number of models; namely, for a k-class problem, only k SVM classifiers are needed. The ith SVM classifier is trained with all patterns from the ith class labeled +1 and all other patterns labeled −1. Although it is easier to implement than the one-versus-one approach, the training sets may be imbalanced because of the large number of −1 patterns. In a comparative evaluation of one-versus-one, one-versus-all, and DAGSVM methods for 10 classification problems, Hsu and Lin found that one-versus-all is less suitable than the other methods.49 However, not all literature reports agree with this finding. Based on a critical review of the existing literature on multiclass SVM and experiments with many datasets, Rifkin and Klautau concluded that the one-versus-all SVM classification is as accurate as any other multiclass approach.50 Angulo, Parra, and Català proposed the K-SVCR (K-class support vector classification-regression) algorithm for k-class classification.51 This algorithm has ternary outputs, {−1, 0, +1}, and in the learning phase it evaluates all patterns in a one-versus-one-versus-rest procedure by using a mixed classification and regression SVM. The prediction phase implements a voting scheme that makes the algorithm fault-tolerant. Guermeur applied a new multiclass SVM, called M-SVM, to the prediction of protein secondary structure.52,53 Multiclass SVM classification is particularly relevant for the classification of microarray gene expression data, with particular importance for disease recognition and classification.54–58
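Both decompositions are available off the shelf; for example, scikit-learn's SVC applies one-versus-one internally for multiclass labels, and a one-versus-all wrapper is provided separately. A minimal sketch for a k-class training set (the arrays are assumed to be in memory):

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

ovo = SVC(kernel="rbf", gamma=0.5).fit(X, y)                       # k(k-1)/2 models
ovr = OneVsRestClassifier(SVC(kernel="rbf", gamma=0.5)).fit(X, y)  # k models
```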

SVM REGRESSION

Initially developed for pattern classification, the SVM algorithm was extended by Vapnik4 for regression by using an ε-insensitive loss function (Figure 7). The goal of SVM regression (SVMR) is to identify a function f(x) that for all training patterns x has a maximum deviation ε from the target (experimental) values y and has a maximum margin. Using the training patterns, SVMR generates a model representing a tube with radius ε fitted to the data. For the hard margin SVMR, the error for patterns inside the tube is zero, whereas no patterns are allowed outside the tube. For real-case datasets, this condition cannot account for outliers, an incomplete set of input variables x, or experimental errors in measuring y. Analogously with SVM classification, a soft margin SVMR was introduced by using slack variables. Several reviews on SVM regression should be consulted for more mathematical details, especially those by Mangasarian and Musicant,59,60 Gao, Gunn, and Harris,61,62 and Smola and Schölkopf.30


Consider a training set T of m patterns together with their target (experimental) values, $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, with $x \in R^n$ and $y \in R$. The linear regression case with a hard margin is represented by the function $f(x) = w \cdot x + b$, with $w \in R^n$ and $b \in R$. For this simple case, the SVMR is represented by

$$\begin{aligned} \text{minimize } & \frac{\|w\|^2}{2} \\ \text{with the constraints } & w \cdot x_i + b - y_i \leq \varepsilon, \; i = 1, \ldots, m \\ & y_i - w \cdot x_i - b \leq \varepsilon, \; i = 1, \ldots, m \end{aligned} \qquad [91]$$

The above conditions can be easily extended to the soft margin SVM regression:

$$\begin{aligned} \text{minimize } & \frac{\|w\|^2}{2} + C \sum_{i=1}^{m} \left( \xi_i^+ + \xi_i^- \right) \\ \text{with the constraints } & w \cdot x_i + b - y_i \leq \varepsilon + \xi_i^+, \; i = 1, \ldots, m \\ & y_i - w \cdot x_i - b \leq \varepsilon + \xi_i^-, \; i = 1, \ldots, m \\ & \xi_i^+ \geq 0, \; \xi_i^- \geq 0, \; i = 1, \ldots, m \end{aligned} \qquad [92]$$

where $\xi_i^+$ is the slack variable associated with an overestimate of the calculated response for the input vector $x_i$, $\xi_i^-$ is the slack variable associated with an underestimate of the calculated response for the input vector $x_i$, ε determines the limits of the approximation tube, and C > 0 controls the penalty associated with deviations larger than ε. In the case of the ε-insensitive loss function, the deviations are penalized with a linear function:

$$|\xi|_\varepsilon = \begin{cases} 0 & \text{if } |\xi| \leq \varepsilon \\ |\xi| - \varepsilon & \text{otherwise} \end{cases} \qquad [93]$$

The SVM regression is depicted in Figure 43. The regression tube is bordered by the hyperplanes $y = w \cdot x + b + \varepsilon$ and $y = w \cdot x + b - \varepsilon$. Patterns situated between these hyperplanes have a residual (absolute value of the difference between calculated and experimental y) less than ε, and in SVM regression the error of these patterns is considered zero; thus, they do not contribute to the penalty. Only patterns situated outside the regression tube have a residual larger than ε and thus a nonzero penalty that, for the ε-insensitive loss function, is proportional to their distance from the SVM regression border (Figure 43, right).


Figure 43 Linear SVM regression with soft margin and ε-insensitive loss function.

The primal objective function is represented by the Lagrange function

$$\begin{aligned} L_P(w, b, \Lambda, M) = \; & \frac{\|w\|^2}{2} + C \sum_{i=1}^{m} \left( \xi_i^+ + \xi_i^- \right) - \sum_{i=1}^{m} \left( \mu_i^+ \xi_i^+ + \mu_i^- \xi_i^- \right) \\ & - \sum_{i=1}^{m} \lambda_i^+ \left( \xi_i^+ + \varepsilon + y_i - w \cdot x_i - b \right) - \sum_{i=1}^{m} \lambda_i^- \left( \xi_i^- + \varepsilon - y_i + w \cdot x_i + b \right) \end{aligned} \qquad [94]$$

where $\lambda_i^+$, $\lambda_i^-$, $\mu_i^+$, and $\mu_i^-$ are the Lagrange multipliers. The KKT conditions for the primal problem are as follows:

Gradient Conditions

$$\frac{\partial L_P(w, b, \Lambda, M)}{\partial b} = \sum_{i=1}^{m} \left( \lambda_i^+ - \lambda_i^- \right) = 0 \qquad [95]$$

$$\frac{\partial L_P(w, b, \Lambda, M)}{\partial w} = w - \sum_{i=1}^{m} \left( \lambda_i^- - \lambda_i^+ \right) x_i = 0 \qquad [96]$$

$$\frac{\partial L_P(w, b, \Lambda, M)}{\partial \xi_i^+} = C - \mu_i^+ - \lambda_i^+ = 0 \qquad [97]$$

$$\frac{\partial L_P(w, b, \Lambda, M)}{\partial \xi_i^-} = C - \mu_i^- - \lambda_i^- = 0 \qquad [98]$$

Non-negativity Conditions

$$\xi_i^+, \xi_i^- \geq 0; \quad \lambda_i^+, \lambda_i^- \geq 0; \quad \mu_i^+, \mu_i^- \geq 0; \quad i = 1, \ldots, m \qquad [99]$$


The dual optimization problem is obtained by substituting Eqs. [95]–[98] into Eq. [94]:

$$\begin{aligned} \text{maximize } & L_D(w, b, \Lambda, M) = -\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \left( \lambda_i^- - \lambda_i^+ \right)\left( \lambda_j^- - \lambda_j^+ \right) x_i \cdot x_j \\ & \qquad - \varepsilon \sum_{i=1}^{m} \left( \lambda_i^- + \lambda_i^+ \right) + \sum_{i=1}^{m} y_i \left( \lambda_i^- - \lambda_i^+ \right) \\ \text{subject to } & \sum_{i=1}^{m} \left( \lambda_i^- - \lambda_i^+ \right) = 0 \quad \text{and} \quad \lambda_i^-, \lambda_i^+ \in [0, C] \end{aligned} \qquad [100]$$

The vector $w$ is obtained from Eq. [96]:

$$w = \sum_{i=1}^{m} \left( \lambda_i^- - \lambda_i^+ \right) x_i \qquad [101]$$

which leads to the final expression for $f(x_k)$, the computed value for a pattern $x_k$:

$$f(x_k) = \sum_{i=1}^{m} \left( \lambda_i^- - \lambda_i^+ \right) x_i \cdot x_k + b \qquad [102]$$

Nonlinear SVM regression is obtained by introducing feature functions $\phi$ that map the input patterns into a higher dimensional space, $x \rightarrow \phi(x)$. By replacing the dot product $\phi(x_i) \cdot \phi(x_j)$ with a kernel function $K(x_i, x_j)$, we obtain from Eq. [100] the following optimization problem:

$$\begin{aligned} \text{maximize } & L_D(w, b, \Lambda, M) = -\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \left( \lambda_i^- - \lambda_i^+ \right)\left( \lambda_j^- - \lambda_j^+ \right) K(x_i, x_j) \\ & \qquad - \varepsilon \sum_{i=1}^{m} \left( \lambda_i^- + \lambda_i^+ \right) + \sum_{i=1}^{m} y_i \left( \lambda_i^- - \lambda_i^+ \right) \\ \text{subject to } & \sum_{i=1}^{m} \left( \lambda_i^- - \lambda_i^+ \right) = 0 \quad \text{and} \quad \lambda_i^-, \lambda_i^+ \in [0, C] \end{aligned} \qquad [103]$$

Similarly with Eq. [101], the kernel SVM regression model has $w$ given by

$$w = \sum_{i=1}^{m} \left( \lambda_i^- - \lambda_i^+ \right) \phi(x_i) \qquad [104]$$

The modeled property for a pattern $x_k$ is obtained with the formula

$$f(x_k) = \sum_{i=1}^{m} \left( \lambda_i^- - \lambda_i^+ \right) K(x_i, x_k) + b \qquad [105]$$



Figure 44 Loss functions for support vector machines regression: (a) quadratic; (b) Laplace; (c) Huber; (d) ε-insensitive.

The ε-insensitive loss function used in SVM regression adds a new parameter ε that significantly influences the model and its prediction capacity. Besides the ε-insensitive loss, other loss functions can be used with SVM regression, such as the quadratic, Laplace, or Huber loss functions (Figure 44). We now present an illustrative example of a one-dimensional nonlinear SVM regression using the dataset in Table 6. This dataset has two spikes, which makes it difficult to model with the common kernels.

Table 6 Patterns Used for the SVM Regression Models in Figures 45–48

Pattern   x      y          Pattern   x      y
1         0.2    1.2        18        1.5    1.18
2         0.3    1.22       19        1.6    1.17
3         0.4    1.23       20        1.7    1.16
4         0.5    1.24       21        1.8    1.12
5         0.6    1.25       22        1.85   0.85
6         0.7    1.28       23        1.9    0.65
7         0.8    1.38       24        1.95   0.32
8         0.85   1.6        25        2.0    0.4
9         0.9    1.92       26        2.05   0.5
10        0.95   2.1        27        2.1    0.6
11        1.0    2.3        28        2.15   0.8
12        1.05   2.2        29        2.2    0.95
13        1.1    1.85       30        2.3    1.18
14        1.15   1.6        31        2.4    1.2
15        1.2    1.4        32        2.5    1.21
16        1.3    1.19       33        2.6    1.22
17        1.4    1.18


Figure 45 SVM regression models for the dataset from Table 6, with ε = 0.1: (a) degree 10 polynomial kernel; (b) spline kernel.

In Figure 45, we present two SVM regression models, the first obtained with a degree 10 polynomial kernel and the second computed with a spline kernel. The polynomial kernel shows oscillations at both ends of the curve, whereas the spline kernel is inadequate for modeling the two spikes. The RBF kernel was also unable to offer an acceptable solution for this regression dataset (data not shown). The degree 1 B spline kernel (Figure 46a), with the parameters C = 100 and ε = 0.1, gives a surprisingly good SVM regression model, with a regression tube that closely follows the details of the input data. We will use this kernel to explore the influence of the ε-insensitivity and capacity C on the regression tube. Keeping C at 100 and increasing ε to 0.3, we obtain a less sensitive solution (Figure 46b) that does not model well the three regions having almost constant y values. This is because the diameter of the tube is significantly larger, and the patterns inside the tube do not influence the SVMR model (they have zero error). By further increasing ε to 0.5 (Figure 47a), the shape of the SVM regression model becomes even less similar to the dataset.

Figure 46 SVM regression models with a B spline kernel, degree 1, for the dataset from Table 6, with C = 100: (a) ε = 0.1; (b) ε = 0.3.


Figure 47 SVM regression models with a B spline kernel, degree 1, for the dataset from Table 6: (a) ε = 0.5, C = 100; (b) ε = 0.1, C = 10.

The regression tube is now defined by a small number of support vectors, but they are not representative of the overall shape of the curve. It is now apparent that the ε-insensitivity parameter should be tailored to each specific problem, because small variations in that parameter have significant effects on the regression model. We now consider the influence of the capacity C when ε is held constant at 0.1. The reference here is the SVMR model from Figure 46a, obtained for C = 100. When C decreases to 10 (Figure 47b), the penalty for errors decreases, and the solution is incapable of accurately modeling the points with extreme y values in the two spikes. By further decreasing the capacity parameter C to 1 (Figure 48a) and then to 0.1 (Figure 48b), the SVMR model progressively loses the capacity to model the two spikes. The C values shown here are not representative of normal working values; they are presented only to illustrate the influence of C on the shape of the regression hyperplane.

Figure 48 SVM regression models with a B spline kernel, degree 1, for the dataset from Table 6: (a) ε = 0.1, C = 1; (b) ε = 0.1, C = 0.1.
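The ε-tube trends of Figures 46–48 can be reproduced with any SVR implementation. The sketch below uses scikit-learn's SVR on the Table 6 data with a Gaussian RBF kernel (the B spline kernel of the figures is not built into that library, and the gamma value is an arbitrary choice), and reports how the support vector count shrinks as the tube widens:

```python
import numpy as np
from sklearn.svm import SVR

# Table 6: one-dimensional dataset with two spikes
x = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 1.0,
              1.05, 1.1, 1.15, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.85,
              1.9, 1.95, 2.0, 2.05, 2.1, 2.15, 2.2, 2.3, 2.4, 2.5, 2.6])
y = np.array([1.2, 1.22, 1.23, 1.24, 1.25, 1.28, 1.38, 1.6, 1.92, 2.1,
              2.3, 2.2, 1.85, 1.6, 1.4, 1.19, 1.18, 1.18, 1.17, 1.16,
              1.12, 0.85, 0.65, 0.32, 0.4, 0.5, 0.6, 0.8, 0.95, 1.18,
              1.2, 1.21, 1.22])

for eps in (0.1, 0.3, 0.5):
    model = SVR(kernel="rbf", C=100.0, gamma=10.0, epsilon=eps)
    model.fit(x.reshape(-1, 1), y)
    print(f"epsilon = {eps}: {len(model.support_)} support vectors")
```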


OPTIMIZING THE SVM MODEL

Finding an SVM model with good prediction statistics is a trial-and-error task. The objective is to maximize the prediction statistics while keeping the model simple in terms of the number of input descriptors, number of support vectors, patterns used for training, and kernel complexity. In this section, we present an overview of the techniques used in SVM model optimization.

Descriptor Selection

Selecting relevant input parameters is both important and difficult for any machine learning method. For example, in QSAR, one can compute thousands of structural descriptors with software like CODESSA or Dragon, or with various molecular field methods. Many procedures have been developed in QSAR to identify a set of structural descriptors that retain the important characteristics of the chemical compounds.63,64 These methods can be extended to SVM models. Another source of inspiration is represented by the algorithms proposed in the machine learning literature, which can be readily applied to cheminformatics problems. We present here several literature pointers for algorithms on descriptor selection. A variable selection method via sparse SVM was proposed by Bi et al.:65 in a first step, this method uses a linear SVM for descriptor selection, followed by a second step in which nonlinear kernels are introduced. Recursive saliency analysis for descriptor selection was investigated by Cao et al.66 Fung and Mangasarian proposed a feature selection Newton method for SVM.67 Kumar et al. introduced a new method for descriptor selection, locally linear embedding, which can be used for reducing the nonlinear dimensions in QSPR and QSAR.68 Xue et al. investigated the application of recursive feature elimination for three classification tests, namely P-glycoprotein substrates, human intestinal absorption, and compounds that cause torsade de pointes.69 Fröhlich, Wegner, and Zell introduced the incremental regularized risk minimization procedure for SVM classification and regression and compared it with recursive feature elimination and with the mutual information procedure.70 Five methods of feature selection (information gain, mutual information, χ²-test, odds ratio, and GSS coefficient) were compared by Liu for their ability to discriminate between thrombin inhibitors and noninhibitors.71 Byvatov and Schneider compared the SVM-based and the Kolmogorov–Smirnov feature selection methods to characterize ligand–receptor interactions in focused compound libraries.72 A genetic algorithm for descriptor selection was combined with SVM regression by Nandi et al. to model and optimize the benzene isopropylation on Hbeta catalyst.73 Finally, gene selection from microarray data is a necessary step for disease classification55,74–81 with support vector machines.
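Among the methods cited above, recursive feature elimination with a linear SVM is the most widely reimplemented and gives a flavor of the whole family. A minimal sketch with scikit-learn, assuming a descriptor matrix X and activity classes y (hypothetical arrays, and the target of 10 descriptors is arbitrary):

```python
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Recursively drop the descriptor with the smallest weight in a linear SVM
selector = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=10, step=1)
selector.fit(X, y)
print(selector.support_)   # boolean mask over the retained descriptors
```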


Support Vectors Selection

The time needed to predict a pattern with an SVM model is proportional to the number of support vectors, which makes prediction slow when the SVM has a large number of support vectors. Downs, Gates, and Masters showed that the SMO algorithm,41 usually used for SVM training, can produce solutions with more support vectors than are needed for an optimum model.82 They found that some support vectors are linearly dependent on other support vectors, and that these linearly dependent support vectors can be identified and then removed from the SVM model with an efficient algorithm. Besides reducing the number of support vectors, the new solution gives predictions identical to those of the full SVM model. Their model reduction algorithm was tested on several classification and regression problems, and in most cases it led to a reduction in the number of support vectors, as high as 90% in one example. In some cases, the SVM solution did not contain any linearly dependent support vectors, so it was not possible to simplify the model. Zhan and Shen proposed a four-step algorithm to simplify the SVM solution by removing unnecessary support vectors.83 In the first step, the learning set is used to train the SVM and identify the support vectors. In the second step, the support vectors that make the surface convoluted (i.e., those whose projections onto the hypersurface have the largest curvatures) are excluded from the learning set. In the third step, the SVM is retrained with the reduced learning set. In the fourth step, the complexity of the SVM model is further reduced by approximating the separation hypersurface with a subset of the support vectors. The algorithm was tested for tissue classification of 3-D prostate ultrasound images, demonstrating that the number of support vectors can be reduced without degrading the prediction of the SVM model.

Jury SVM

Starting from current machine learning algorithms (e.g., PCA, PLS, ANN, SVM, and k-NN), one can derive new classification or regression systems by combining the predictions of two or more models. Such a prediction meta-algorithm (called a jury, committee, or ensemble) can use a wide variety of mathematical procedures to combine the individual predictions into a final prediction. Empirical studies have shown that jury methods can increase the prediction performance over the individual models that are aggregated in the ensemble. Their disadvantages include the increased complexity of the model and longer computing times. In practical applications, the use of jury methods is justified if a statistically significant increase in prediction power is obtained. Several examples of using jury SVM follow. Drug-like compound identification with a jury of k-NN, SVM, and ridge regression was investigated by Merkwirth et al.84 Jury predictions with several machine learning methods were compared by Briem and Günther for the


discrimination of kinase inhibitors from noninhibitors.85 Yap and Chen compared two jury SVM procedures for classifying inhibitors and substrates of cytochromes P450 3A4, 2D6, and 2C9.86 Jerebko et al. used jury SVM classifiers based on bagging (bootstrap aggregation) for polyp detection in CT colonography.78 Valentini, Muselli, and Ruffino used bagged jury SVM on DNA microarray gene expression data to classify normal and malignant tissues.87 As a final example, we point out the work of Guermeur et al., who used a multiclass SVM to aggregate the best protein secondary structure prediction methods, thus improving their performance.52,53
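As a sketch of the jury idea, the fragment below aggregates an SVM, a k-NN model, and a logistic regression by majority vote and compares the ensemble with a single SVM under cross-validation. The library (scikit-learn), the member models, and all parameters are illustrative assumptions, not those of the cited studies.

```python
# A minimal jury (ensemble) classifier: three dissimilar models vote on each
# pattern, and the majority label is returned.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

jury = VotingClassifier(estimators=[
    ("svm", SVC(kernel="rbf", gamma=0.5, C=10.0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("lr", LogisticRegression(max_iter=1000)),
], voting="hard")

for name, clf in [("jury", jury),
                  ("single SVM", SVC(kernel="rbf", gamma=0.5, C=10.0))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean cross-validated accuracy {scores.mean():.3f}")
```

Whether the jury actually beats its best member must be checked case by case, as the studies cited above emphasize.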

Kernels for Biosequences

The kernel function measures the similarity between pairs of patterns, typically as a dot product between numerical vectors. The usual numerical encoding for protein sequences is based on a 20-digit binary vector that encodes the presence or absence of each amino acid at a given position. To explore new ways of encoding the structural information in biosequences, various kernels have been proposed for predicting biochemical properties directly from a given sequence. Saigo et al. defined alignment kernels that compute the similarity between two sequences by summing up scores obtained from local alignments with gaps.88 The new kernels could recognize SCOP superfamilies and outperformed standard methods for remote homology detection. The mismatch kernel introduced by Leslie et al. measures sequence similarity based on shared occurrences of fixed-length patterns in the data, thus allowing for mutations between patterns.89 This type of partial string matching kernel successfully predicts protein classification into families and superfamilies. Vert used a tree kernel to measure the similarity between phylogenetic profiles so as to predict the functional class of a gene from its phylogenetic profile.90 The tree kernel can predict functional characteristics from evolutionary information. Yang and Chou defined a class of kernels that compute protein sequence similarity based on amino acid similarity matrices such as the Dayhoff matrix.91 String kernels computed from subsite coupling models of protein sequences were used by Wang, Yang, and Chou to predict the signal peptide cleavage site.92 Teramoto et al. showed that the design of small interfering RNA (siRNA) is greatly improved by using string kernels.93 The siRNA sequence was decomposed into 1-, 2-, and 3-mer subsequences that were fed into the string kernel to compute the similarity between two sequences. Leslie and Kuang defined three new classes of k-mer string kernels, namely restricted gappy kernels, substitution kernels, and wildcard kernels, based on feature spaces defined by k-length subsequences of the protein sequence.94 The new kernels were used for homology detection and protein classification. Tsuda and Noble used the diffusion kernel to predict protein functional classification from metabolic and protein–protein interaction networks.95 The diffusion kernel is a


method of computing pair-wise distances between all nodes in a graph based on the sum of weighted paths between each pair of vertices.
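Among the kernels reviewed above, the simplest string kernels compare sequences through shared k-mers. The sketch below implements such a spectrum-type kernel in plain Python; the sequences and the choice k = 3 are arbitrary, and the published mismatch, gappy, and wildcard kernels add refinements not shown here.

```python
# A simple k-mer (spectrum) string kernel: the similarity of two sequences is
# the dot product of their k-mer count vectors.
from collections import Counter

def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(s1: str, s2: str, k: int = 3) -> int:
    """Sum over shared k-mers of the product of their counts."""
    c1, c2 = kmer_counts(s1, k), kmer_counts(s2, k)
    return sum(n * c2[kmer] for kmer, n in c1.items())

# Two toy peptide fragments (hypothetical)
print(spectrum_kernel("MKVLAAGIV", "MKVLSAGIV", k=3))
```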

Kernels for Molecular Structures

The common approach in SVM applications for property prediction based on molecular structure involves the computation of various classes of structural descriptors, which are then used with various kernels to compute the structural similarity between two chemical compounds. Obviously, this approach reflects chemical bonding only indirectly, through descriptors. The molecular structure can also be used directly in computing the pair-wise similarity of chemicals, with tree and graph kernels, as reviewed below. Micheli, Portera, and Sperduti used acyclic molecular subgraphs and tree kernels to predict the ligand affinity for the benzodiazepine receptor.96 Mahé et al. defined a series of graph kernels that can predict various properties from only the molecular graph and various atomic descriptors.97 Jain et al. defined a new graph kernel based on the Schur–Hadamard inner product for a pair of molecular graphs, and they tested it by predicting the mutagenicity of aromatic and hetero-aromatic nitro compounds.98 Finally, Lind and Maltseva used molecular fingerprints to compute the Tanimoto similarity kernel, which was incorporated into an SVM regression to predict the aqueous solubility of organic compounds.99
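As an illustration of kernels defined directly on molecular similarity, the sketch below computes the Tanimoto kernel on binary fingerprints, the similarity measure used in the solubility study just cited. The bit vectors here are toy examples; a real application would generate fingerprints with a cheminformatics toolkit.

```python
# Tanimoto similarity kernel on binary molecular fingerprints:
# T(a, b) = a.b / (a.a + b.b - a.b)
import numpy as np

def tanimoto_kernel(a: np.ndarray, b: np.ndarray) -> float:
    ab = np.dot(a, b)
    return ab / (np.dot(a, a) + np.dot(b, b) - ab)

fp1 = np.array([1, 0, 1, 1, 0, 1, 0, 0], dtype=float)  # toy fingerprints
fp2 = np.array([1, 0, 1, 0, 0, 1, 1, 0], dtype=float)
print(f"Tanimoto similarity: {tanimoto_kernel(fp1, fp2):.3f}")
```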

PRACTICAL ASPECTS OF SVM CLASSIFICATION

Up to this point we have given a mostly theoretical presentation of SVM classification and regression; it is now appropriate to show some practical applications of support vector machines, together with practical guidelines for their application in cheminformatics and QSAR. In this section, we present several case studies in SVM classification; the next section is dedicated to applications of SVM regression. Studies investigating the universal approximation capabilities of support vector machines have demonstrated that SVM with the usual kernels (such as polynomial, Gaussian RBF, or dot product kernels) can approximate any measurable or continuous function up to any desired accuracy.100,101 Any set of patterns can therefore be modeled perfectly if the appropriate kernel and parameters are used. The ability to approximate any measurable function is indeed required of a good nonlinear multivariate pattern recognition algorithm (artificial neural networks are also universal approximators), but from a practical point of view, more is required. Good QSAR or cheminformatics models must have optimum predictivity (limited by the number of data, data distribution, noise, errors, selection of structural descriptors, etc.), not only a good mapping capability. For SVM classification problems, highly nonlinear


kernels can eventually separate the classes of patterns perfectly with intricate hyperplanes. This is what the universal approximation capabilities of SVM guarantee. These capabilities cannot promise, however, that the resulting SVM will be optimally predictive. In fact, only empirical comparison with other classification algorithms (k-NN, linear discriminant analysis, PLS, artificial neural networks, etc.) can demonstrate, for a particular problem, whether SVM is better or worse than other classification methods. Indeed, the literature is replete with comparative studies showing that SVM can often, but not always, predict better than other methods. In many cases, the statistical difference between methods is not significant, and because of the limited number of samples used in those studies, one cannot prefer one method over the others. An instructive example to consider is HIV-1 protease cleavage site prediction. This problem was investigated with neural networks,102 self-organizing maps,103 and support vector machines.91,104 After an in-depth examination of this problem, Rögnvaldsson and You concluded that linear classifiers are at least as good predictors as nonlinear algorithms.105 The choice of complex, nonlinear classifiers delivered no new insight into HIV-1 protease cleavage site prediction. The message of this story is simple and valuable: always compare nonlinear SVM models with linear models and, if possible, with other pattern recognition algorithms. A common belief is that because SVM is based on structural risk minimization, its predictions are better than those of other algorithms that are based on empirical risk minimization. Many published examples show, however, that for real applications such beliefs do not carry much weight and that sometimes other multivariate algorithms can deliver better predictions. An important question to ask is the following: Do SVMs overfit? Some reports claim that, owing to their derivation from structural risk minimization, SVMs do not overfit. In this chapter, however, we have already presented numerous examples where the SVM solution is overfitted for simple datasets, and more examples will follow. In real applications, one must carefully select the nonlinear kernel function needed to generate a classification hyperplane that is topologically appropriate and has optimum predictive power. It is sometimes claimed that SVMs are better than artificial neural networks. This assertion stems from the facts that SVMs have a unique solution, whereas artificial neural networks can become stuck in local minima, and that finding the optimum number of hidden neurons of an ANN requires time-consuming calculations. It is true that multilayer feed-forward neural networks can offer models that represent local minima, but they also give consistently good (although suboptimal) solutions, which is not the case with SVM (see the examples in this section). Undeniably, for a given kernel and set of parameters, the SVM solution is unique. But an infinite number of combinations of kernels and SVM parameters exists, resulting in an infinite set of unique SVM models. The unique SVM solution therefore brings little comfort to the researcher, because the theory cannot foresee which kernel and set of parameters are optimal for a


particular problem. And yes, artificial neural networks easily overfit the training data, but so do support vector machines. Frequently, the exclusive use of the RBF kernel is rationalized by claiming that it is the best possible kernel for SVM models. The simple tests presented in this chapter (datasets from Tables 1–6) suggest that other kernels might be more useful for particular problems. For a comparative evaluation, we review below several SVM classification models obtained with five important kernels (linear, polynomial, Gaussian radial basis function, neural, and anova) and show that the SVM prediction capability varies significantly with the kernel type and parameter values and that, in many cases, a simple linear model is more predictive than nonlinear kernels. For all SVM classification models described later in this chapter, we have used the following kernels: dot (linear); polynomial (degree d = 2, 3, 4, 5); radial basis function, K(xi, xj) = exp(−γ‖xi − xj‖²) (γ = 0.5, 1.0, 2.0); neural (tanh), Eq. [68] (a = 0.5, 1.0, 2.0 and b = 0, 1, 2); and anova, Eq. [69] (γ = 0.5, 1.0, 2.0 and d = 1, 2, 3). All SVM models were computed with mySVM, by Rüping (http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/).
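For reference, the five kernel functions are sketched below as plain Python functions of two descriptor vectors. The tanh and anova expressions follow the forms commonly associated with Eqs. [68] and [69], and the +1 offset in the polynomial kernel is one common convention; the default parameter values mirror those used in the text.

```python
# The five kernel functions compared in this section, written out explicitly.
import numpy as np

def dot_kernel(x, y):
    return np.dot(x, y)

def polynomial_kernel(x, y, d=2):
    return (np.dot(x, y) + 1.0) ** d            # one common convention

def rbf_kernel(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def neural_kernel(x, y, a=0.5, b=0.0):          # tanh kernel, cf. Eq. [68]
    return np.tanh(a * np.dot(x, y) + b)

def anova_kernel(x, y, gamma=0.5, d=1):         # cf. Eq. [69]
    return np.sum(np.exp(-gamma * (x - y) ** 2)) ** d

x = np.array([0.2, 0.7, 1.5])
y = np.array([0.1, 0.9, 1.2])
for f in (dot_kernel, polynomial_kernel, rbf_kernel, neural_kernel, anova_kernel):
    print(f.__name__, float(f(x, y)))
```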

Predicting the Mechanism of Action for Polar and Nonpolar Narcotic Compounds

Because numerous organic chemicals can be environmental pollutants, considerable effort has been directed toward the study of the relationships between the structure of a chemical compound and its toxicity. Significant progress has been made in classifying chemical compounds according to their mechanism of toxicity and in screening them for their environmental risk. Predicting the mechanism of action (MOA) from structural descriptors has major applications in selecting an appropriate quantitative structure–activity relationship (QSAR) model, in identifying chemicals with a similar toxicity mechanism, and in extrapolating toxic effects between different species and exposure regimes.106–109 Organic compounds that act as narcotic pollutants are considered to disrupt the functioning of cell membranes. Narcotic pollutants are represented by two classes of compounds, namely nonpolar (MOA 1) and polar (MOA 2) compounds. The toxicity of both polar and nonpolar narcotic pollutants depends on the octanol–water partition coefficient, but the toxicity of polar compounds depends also on their propensity to form hydrogen bonds. Ren used five structural descriptors to discriminate between 76 polar and 114 nonpolar pollutants.107 These were the octanol–water partition coefficient log Kow, the energy of the highest occupied molecular orbital EHOMO, the energy of the lowest unoccupied molecular orbital ELUMO, the most negative partial charge on any non-hydrogen atom in the molecule Q−, and the most positive partial charge on a hydrogen atom Q+. All quantum descriptors were computed with the AM1 method.


Table 7 Chemical Compounds, Theoretical Descriptors (EHOMO, ELUMO, and Q−), and Mechanism of Toxic Action (nonpolar, class +1; polar, class −1)

No  Compound                   EHOMO     ELUMO    Q−        MOA  Class
 1  tetrachloroethene           −9.902   0.4367   −0.0372    1    +1
 2  1,2-dichloroethane         −11.417   0.6838   −0.1151    1    +1
 3  1,3-dichloropropane        −11.372   1.0193   −0.1625    1    +1
 4  dichloromethane            −11.390   0.5946   −0.1854    1    +1
 5  1,2,4-trimethylbenzene      −8.972   0.5030   −0.2105    1    +1
 6  1,1,2,2-tetrachloroethane  −11.655   0.0738   −0.2785    1    +1
 7  2,4-dichloroacetophenone    −9.890   0.5146   −0.4423    1    +1
 8  4-methyl-2-pentanone       −10.493   0.8962   −0.4713    1    +1
 9  ethyl acetate              −11.006   1.1370   −0.5045    1    +1
10  cyclohexanone              −10.616   3.3960   −0.5584    1    +1
11  2,4,6-trimethylphenol       −8.691   0.4322   −0.4750    2    −1
12  3-chloronitrobenzene       −10.367   1.2855   −0.4842    2    −1
13  4-ethylphenol               −8.912   0.4334   −0.4931    2    −1
14  2,4-dimethylphenol          −8.784   0.3979   −0.4980    2    −1
15  4-nitrotoluene             −10.305   1.0449   −0.5017    2    −1
16  2-chloro-4-nitroaniline     −9.256   0.9066   −0.6434    2    −1
17  2-chloroaniline             −8.376   0.3928   −0.6743    2    −1
18  pentafluoroaniline          −9.272   1.0127   −0.8360    2    −1
19  4-methylaniline             −8.356   0.6156   −0.9429    2    −1
20  4-ethylaniline              −8.379   0.6219   −0.9589    2    −1

Using a descriptor selection procedure, we found that only three descriptors (EHOMO, ELUMO, and Q−) are essential for the SVM model. To exemplify the shape of the classification hyperplane for polar and nonpolar narcotic pollutants, we selected 20 compounds (Table 7) as a test set (nonpolar compounds, class +1; polar compounds, class −1). The first two experiments were performed with a linear kernel for C = 100 (Figure 49a) and C = 1 (Figure 49b). The first plot shows that this

Figure 49 SVM classification models with a dot (linear) kernel for the dataset from Table 7: (a) C = 100; (b) C = 1.


Figure 50 SVM classification models with a degree 2 polynomial kernel for the dataset from Table 7: (a) C = 100; (b) C = 1.

dataset can be separated with a linear classifier if some errors are accepted. Note that several +1 compounds cannot be classified correctly. Decreasing the capacity C produces a larger margin, with a border close to the bulk of the class +1 compounds. A similar analysis was performed for the degree 2 polynomial kernel with C = 100 (Figure 50a) and C = 1 (Figure 50b). The classification hyperplane is significantly different from that of the linear classifier, but with little success, because three +1 compounds cannot be classified correctly. By decreasing the penalty for classification errors (Figure 50b), the margin increases and major changes appear in the shape of the classification hyperplane. We will now show two SVM classification models that are clearly overfitted. The first one was obtained with a degree 10 polynomial kernel (Figure 51a), whereas for the second, we used a B spline kernel (Figure 51b). The two classification

Figure 51 SVM classification models for the dataset from Table 7, with C = 100: (a) polynomial kernel, degree 10; (b) B spline kernel, degree 1.


hyperplanes are very complex, with a topology that clearly does not resemble that of the real data. The statistics for all SVM models considered for this example are presented in Table 8. The calibration of the SVM models was performed with the whole set of 190 compounds, whereas the prediction was tested with a leave-20%-out cross-validation method. All notations are explained in the footnote of Table 8. Table 8 shows that the SVMs with a linear kernel give very good results. The prediction accuracy of experiment 3 (ACp = 0.97) is used as the reference for the performance of the other kernels. The polynomial kernel (experiments 4–15) has ACp between 0.93 and 0.96, results that do not equal those of the linear kernel. The overfitting of SVM models is clearly detected in several cases. For example, as the degree of the polynomial kernel increases from 2 to 5, ACc increases from 0.97 to 1, whereas ACp decreases from 0.96 to 0.93. The SVM models with perfect classification in training have the lowest prediction statistics. The RBF kernel (experiments 16–24), with ACp between 0.96 and 0.97, has better calibration statistics than the linear kernel, but its performance in prediction only equals that of the linear SVM. Although many tests were performed for the neural kernel (experiments 25–51), the prediction statistics are low, with ACp between 0.64 and 0.88. This result is surprising, because the tanh function gives very good results in neural networks. Even the training statistics are low for the neural kernel, with ACc between 0.68 and 0.89. The last set of SVM models was obtained with the anova kernel (experiments 52–78), with ACp between 0.94 and 0.98. In fact, only experiment 58 has a better prediction accuracy (ACp = 0.98) than the linear SVM model of experiment 3. The linear SVM has six errors in prediction (all nonpolar compounds predicted to be polar), whereas the anova SVM has four prediction errors, also for nonpolar compounds. Our experiments with various kernels show that the performance of the SVM classifier depends strongly on the kernel shape. Taking the results of the linear SVM as a reference, many nonlinear SVM models have lower prediction statistics. It is also true that the linear classifier does a good job, leaving little room for improvement. Of the 75 nonlinear SVM models, only one, with the anova kernel, has slightly higher prediction statistics than the linear SVM.
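The kernel-and-parameter scan summarized in Table 8 can be reproduced in outline with a grid search in which each combination is scored by leave-20%-out (5-fold) cross-validation. The sketch below uses scikit-learn and synthetic data in place of mySVM and the three quantum-chemical descriptors; it illustrates the workflow, not the Table 8 numbers.

```python
# Scan of kernels and parameters scored by leave-20%-out cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

# 190 synthetic "compounds" with 3 descriptors mimic the dataset size
X, y = make_classification(n_samples=190, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

param_grid = [
    {"kernel": ["linear"], "C": [10, 100, 1000]},
    {"kernel": ["poly"], "degree": [2, 3, 4, 5], "C": [10, 100, 1000]},
    {"kernel": ["rbf"], "gamma": [0.5, 1.0, 2.0], "C": [10, 100, 1000]},
]
search = GridSearchCV(SVC(), param_grid,
                      cv=KFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X, y)
print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```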

Predicting the Mechanism of Action for Narcotic and Reactive Compounds

The second experiment we present in this tutorial for classifying compounds according to their mechanism of action involves the classification of 88 chemicals. The chemicals are either narcotics (nonpolar and polar narcotics) or reactive compounds (respiratory uncouplers, soft electrophiles, and proelectrophiles).110 The dataset, consisting of 48 narcotic compounds


Table 8 Results for SVM Classification of Polar and Nonpolar Pollutants Using EHOMO, ELUMO, and Q− a

Exp  K  C     Parameters        TPc  FNc  TNc  FPc  SVc  ACc    TPp  FNp  TNp  FPp  SVp   ACp
  1  L  10                      105    9   76    0   27  0.95   104   10   76    0  22.2  0.95
  2  L  100                     106    8   76    0   25  0.96   104   10   76    0  20.2  0.95
  3  L  1000                    106    8   76    0   25  0.96   108    6   76    0  19.6  0.97
  4  P  10    d = 2             109    5   75    1   21  0.97   108    6   75    1  18.0  0.96
  5  P  100   d = 2             109    5   76    0   20  0.97   108    6   74    2  15.2  0.96
  6  P  1000  d = 2             109    5   76    0   19  0.97   108    6   72    4  14.8  0.95
  7  P  10    d = 3             112    2   76    0   21  0.99   108    6   73    3  15.2  0.95
  8  P  100   d = 3             113    1   76    0   19  0.99   107    7   73    3  15.2  0.95
  9  P  1000  d = 3             114    0   76    0   18  1.00   106    8   73    3  14.4  0.94
 10  P  10    d = 4             112    2   76    0   22  0.99   106    8   73    3  17.0  0.94
 11  P  100   d = 4             114    0   76    0   20  1.00   106    8   72    4  15.8  0.94
 12  P  1000  d = 4             114    0   76    0   20  1.00   106    8   72    4  15.8  0.94
 13  P  10    d = 5             114    0   76    0   19  1.00   107    7   70    6  15.0  0.93
 14  P  100   d = 5             114    0   76    0   20  1.00   107    7   70    6  15.0  0.93
 15  P  1000  d = 5             114    0   76    0   20  1.00   107    7   70    6  15.0  0.93
 16  R  10    γ = 0.5           109    5   76    0   26  0.97   107    7   75    1  23.6  0.96
 17  R  100   γ = 0.5           112    2   76    0   20  0.99   108    6   74    2  17.0  0.96
 18  R  1000  γ = 0.5           113    1   76    0   19  0.99   108    6   74    2  15.8  0.96
 19  R  10    γ = 1.0           112    2   76    0   35  0.99   109    5   75    1  34.0  0.97
 20  R  100   γ = 1.0           113    1   76    0   28  0.99   109    5   75    1  26.4  0.97
 21  R  1000  γ = 1.0           114    0   76    0   21  1.00   109    5   75    1  21.8  0.97
 22  R  10    γ = 2.0           113    1   76    0   45  0.99   109    5   74    2  44.8  0.96
 23  R  100   γ = 2.0           114    0   76    0   43  1.00   109    5   75    1  40.8  0.97
 24  R  1000  γ = 2.0           114    0   76    0   43  1.00   109    5   75    1  40.8  0.97
 25  N  10    a = 0.5, b = 0    102   12   68    8   26  0.89   102   12   66   10  24.2  0.88
 26  N  100   a = 0.5, b = 0    102   12   64   12   28  0.87   104   10   63   13  23.4  0.88
 27  N  1000  a = 0.5, b = 0    102   12   64   12   28  0.87   103   11   62   14  22.0  0.87
 28  N  10    a = 1.0, b = 0     98   16   60   16   34  0.83    95   19   61   15  30.6  0.82
 29  N  100   a = 1.0, b = 0     98   16   60   16   34  0.83   100   14   56   20  31.4  0.82
 30  N  1000  a = 1.0, b = 0     98   16   60   16   34  0.83    95   19   60   16  29.6  0.82
 31  N  10    a = 2.0, b = 0     85   29   48   28   60  0.70    80   34   55   21  45.2  0.71
 32  N  100   a = 2.0, b = 0     87   27   48   28   58  0.71    80   34   55   21  45.2  0.71
 33  N  1000  a = 2.0, b = 0     85   29   47   29   60  0.69    86   28   48   28  47.6  0.71
 34  N  10    a = 0.5, b = 1     95   19   53   23   53  0.78    92   22   52   24  41.4  0.76
 35  N  100   a = 0.5, b = 1     92   22   53   23   49  0.76    89   25   51   25  39.4  0.74
 36  N  1000  a = 0.5, b = 1     92   22   53   23   49  0.76    89   25   50   26  39.2  0.73
 37  N  10    a = 1.0, b = 1     85   29   47   29   61  0.69    87   27   50   26  44.6  0.72
 38  N  100   a = 1.0, b = 1     98   16   59   17   35  0.83    83   31   52   24  43.8  0.71
 39  N  1000  a = 1.0, b = 1     98   16   59   17   35  0.83    84   30   46   30  48.0  0.68
 40  N  10    a = 2.0, b = 1     86   28   43   33   64  0.68    86   28   50   26  35.6  0.72
 41  N  100   a = 2.0, b = 1     86   28   43   33   64  0.68    94   20   55   21  26.6  0.78
 42  N  1000  a = 2.0, b = 1     86   28   43   33   64  0.68    97   17   46   30  34.0  0.75
 43  N  10    a = 0.5, b = 2     87   27   46   30   67  0.70    90   24   44   32  54.2  0.71
 44  N  100   a = 0.5, b = 2     84   30   46   30   63  0.68    85   29   44   32  51.0  0.68
 45  N  1000  a = 0.5, b = 2     84   30   46   30   62  0.68    84   30   44   32  50.2  0.67
 46  N  10    a = 1.0, b = 2     83   31   45   31   64  0.67    71   43   50   26  52.0  0.64
 47  N  100   a = 1.0, b = 2     83   31   45   31   64  0.67    82   32   45   31  51.6  0.67
 48  N  1000  a = 1.0, b = 2     83   31   45   31   64  0.67    82   32   45   31  51.6  0.67
 49  N  10    a = 2.0, b = 2     85   29   46   30   63  0.69    75   39   65   11  46.0  0.74
 50  N  100   a = 2.0, b = 2     97   17   58   18   37  0.82    79   35   68    8  42.0  0.77
 51  N  1000  a = 2.0, b = 2     97   17   58   18   37  0.82    82   32   65   11  38.2  0.77
 52  A  10    γ = 0.5, d = 1    110    4   76    0   26  0.98   106    8   75    1  22.0  0.95
 53  A  100   γ = 0.5, d = 1    111    3   76    0   17  0.98   108    6   74    2  15.4  0.96
 54  A  1000  γ = 0.5, d = 1    112    2   76    0   14  0.99   109    5   73    3  13.2  0.96
 55  A  10    γ = 1.0, d = 1    111    3   76    0   26  0.98   109    5   75    1  20.4  0.97
 56  A  100   γ = 1.0, d = 1    111    3   76    0   18  0.98   110    4   74    2  16.0  0.97
 57  A  1000  γ = 1.0, d = 1    113    1   76    0   17  0.99   110    4   72    4  14.6  0.96
 58  A  10    γ = 2.0, d = 1    111    3   76    0   24  0.98   110    4   76    0  20.6  0.98
 59  A  100   γ = 2.0, d = 1    113    1   76    0   18  0.99   109    5   73    3  17.8  0.96
 60  A  1000  γ = 2.0, d = 1    114    0   76    0   14  1.00   109    5   70    6  15.2  0.94
 61  A  10    γ = 0.5, d = 2    112    2   76    0   24  0.99   107    7   75    1  18.4  0.96
 62  A  100   γ = 0.5, d = 2    112    2   76    0   20  0.99   108    6   74    2  16.8  0.96
 63  A  1000  γ = 0.5, d = 2    114    0   76    0   15  1.00   107    7   74    2  14.2  0.95
 64  A  10    γ = 1.0, d = 2    112    2   76    0   21  0.99   108    6   75    1  18.8  0.96
 65  A  100   γ = 1.0, d = 2    114    0   76    0   20  1.00   107    7   73    3  16.6  0.95
 66  A  1000  γ = 1.0, d = 2    114    0   76    0   20  1.00   107    7   73    3  16.6  0.95
 67  A  10    γ = 2.0, d = 2    114    0   76    0   24  1.00   108    6   73    3  24.6  0.95
 68  A  100   γ = 2.0, d = 2    114    0   76    0   22  1.00   108    6   73    3  23.0  0.95
 69  A  1000  γ = 2.0, d = 2    114    0   76    0   22  1.00   108    6   73    3  23.0  0.95
 70  A  10    γ = 0.5, d = 3    112    2   76    0   21  0.99   108    6   74    2  17.0  0.96
 71  A  100   γ = 0.5, d = 3    114    0   76    0   17  1.00   107    7   73    3  15.4  0.95
 72  A  1000  γ = 0.5, d = 3    114    0   76    0   17  1.00   107    7   73    3  15.4  0.95
 73  A  10    γ = 1.0, d = 3    114    0   76    0   20  1.00   107    7   74    2  20.4  0.95
 74  A  100   γ = 1.0, d = 3    114    0   76    0   20  1.00   107    7   74    2  20.4  0.95
 75  A  1000  γ = 1.0, d = 3    114    0   76    0   20  1.00   107    7   74    2  20.4  0.95
 76  A  10    γ = 2.0, d = 3    114    0   76    0   38  1.00   108    6   74    2  37.2  0.96
 77  A  100   γ = 2.0, d = 3    114    0   76    0   38  1.00   108    6   74    2  37.2  0.96
 78  A  1000  γ = 2.0, d = 3    114    0   76    0   38  1.00   108    6   74    2  37.2  0.96

a The table reports the experiment number Exp, the capacity parameter C, the kernel type K (linear L; polynomial P; radial basis function R; neural N; anova A) with its corresponding parameters, the calibration results (TPc, true positives in calibration; FNc, false negatives in calibration; TNc, true negatives in calibration; FPc, false positives in calibration; SVc, number of support vectors in calibration; ACc, calibration accuracy), and the L20%O prediction results (TPp, true positives in prediction; FNp, false negatives in prediction; TNp, true negatives in prediction; FPp, false positives in prediction; SVp, average number of support vectors in prediction; ACp, prediction accuracy).


(class +1) and 40 reactive compounds (class −1), was taken from two recent studies.108,111 Four theoretical descriptors were used to discriminate between the mechanisms of action, namely the octanol–water partition coefficient log Kow, the energy of the highest occupied molecular orbital EHOMO, the energy of the lowest unoccupied molecular orbital ELUMO, and the average acceptor superdelocalizability SNav. The prediction power of each SVM model was evaluated with a leave-10%-out cross-validation procedure. The best prediction statistics for each kernel type are as follows: linear, C = 1000, ACp = 0.86; polynomial, degree 2, C = 10, ACp = 0.92; RBF, C = 100, γ = 0.5, ACp = 0.83; neural, C = 10, a = 0.5, b = 0, ACp = 0.78; and anova, C = 10, γ = 0.5, d = 1, ACp = 0.87. These results indicate that a degree 2 polynomial gives a good separation hyperplane between narcotic and reactive compounds. The neural and RBF kernels have worse predictions than the linear SVM model, whereas the anova kernel has an ACp similar to that of the linear model.

Predicting the Mechanism of Action from Hydrophobicity and Experimental Toxicity

This exercise for classifying compounds according to their mechanism of action uses as input data the molecule's hydrophobicity and its experimental toxicity against Pimephales promelas and Tetrahymena pyriformis.112 SVM classification was applied to a set of 337 organic compounds from eight MOA classes (126 nonpolar narcotics, 79 polar narcotics, 23 ester narcotics, 13 amine narcotics, 13 weak acid respiratory uncouplers, 69 electrophiles, 8 proelectrophiles, and 6 nucleophiles).113 The MOA classification was based on three indices taken from a QSAR study by Ren, Frymier, and Schultz,113 namely: log Kow, the octanol–water partition coefficient; log 1/IGC50, the 50% inhibitory growth concentration against Tetrahymena pyriformis; and log 1/LC50, the 50% lethal concentration against Pimephales promelas. The prediction power of each SVM model was evaluated with a leave-5%-out (L5%O) cross-validation procedure. In the first test, we used SVM models to discriminate between nonpolar narcotic compounds (chemicals that have baseline toxicity) and compounds having excess toxicity (representing the following MOAs: polar narcotics, ester narcotics, amine narcotics, weak acid respiratory uncouplers, electrophiles, proelectrophiles, and nucleophiles). From the total set of 337 compounds, 126 represent the SVM class +1 (nonpolar narcotic) and 211 represent the SVM class −1 (all other MOA classes). The best cross-validation results for each kernel type are presented in Table 9. The linear, polynomial, RBF, and anova kernels have similar results of reasonable quality, whereas the neural kernel has very bad statistics; the slight classification improvement obtained for the RBF and anova kernels is not statistically significant.


Table 9 SVM Classification of Nonpolar Narcotic Compounds (SVM Class +1) from Other Compounds (SVM Class −1) Using as Descriptors log Kow, log 1/IGC50, and log 1/LC50

Kernel                 TPc  FNc  TNc  FPc  SVc  ACc    TPp  FNp  TNp  FPp  SVp    ACp
L                       78   48  186   25  195  0.78    79   47  186   25  185.8  0.79
P, d = 2                81   45  185   26  176  0.79    82   44  184   27  165.8  0.79
R, γ = 1.0              97   29  190   21  172  0.85    89   37  180   31  165.1  0.80
N, a = 0.5, b = 0       75   51   98  113  158  0.51    49   77  130   81  152.1  0.53
A, γ = 0.5, d = 2       95   31  190   21  169  0.85    87   39  182   29  119.2  0.80

The chemicals exhibiting excess toxicity belong to seven MOA classes, and their toxicity varies over a wide range. For these molecules, it is useful to separate them further into less-reactive and more-reactive compounds. In the second test, we developed SVM models that discriminate between less-reactive compounds (SVM class +1, formed by polar narcotics, ester narcotics, and amine narcotics) and more-reactive compounds (SVM class −1, formed by weak acid respiratory uncouplers, electrophiles, proelectrophiles, and nucleophiles). Of the 211 compounds with excess toxicity, 115 are less-reactive and 96 are more-reactive compounds. In Table 10, we show the best cross-validation results for each kernel type. The radial kernel has the best predictions, followed by the linear SVM model. The remaining kernels have worse predictions than the linear model.
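The accuracies reported in Tables 9 and 10 follow directly from the confusion-matrix counts. The helper below shows the arithmetic, including the Matthews correlation coefficient used later in this chapter; the worked example takes the linear-kernel prediction counts from Table 9.

```python
# Classification statistics from the confusion matrix counts TP, FN, TN, FP.
import math

def classification_stats(tp, fn, tn, fp):
    ac = (tp + tn) / (tp + fn + tn + fp)             # accuracy (AC)
    se = tp / (tp + fn)                              # sensitivity
    sp = tn / (tn + fp)                              # specificity
    mcc = ((tp * tn - fp * fn) /
           math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"AC": ac, "SE": se, "SP": sp, "MCC": mcc}

# Linear kernel, prediction counts from Table 9: AC comes out as 0.79
print(classification_stats(tp=79, fn=47, tn=186, fp=25))
```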

Table 10 SVM Classification of Less-Reactive Compounds (SVM Class +1) and More-Reactive Compounds (SVM Class −1)

Kernel                 TPc  FNc  TNc  FPc  SVc  ACc    TPp  FNp  TNp  FPp  SVp    ACp
L                       97   18   50   46  151  0.70    97   18   46   50  144.2  0.68
P, d = 2               101   14   38   58  154  0.66    97   18   38   58  144.0  0.64
R, γ = 2.0             107    8   77   19  141  0.87    94   21   55   41  133.2  0.71
N, a = 2, b = 1         71   44   51   45   91  0.58    64   51   59   37   90.5  0.58
A, γ = 0.5, d = 2      109    6   77   19  112  0.88    85   30   57   39  105.8  0.67

Classifying the Carcinogenic Activity of Polycyclic Aromatic Hydrocarbons

Structure–activity relationships are valuable statistical models that can be used for predicting the carcinogenic potential of new chemicals and for interpreting short-term tests of genotoxicity, long-term tests of carcinogenicity in rodents, and epidemiological data. We show here an SVM application for identifying the carcinogenic activity of a group of methylated and nonmethylated polycyclic aromatic hydrocarbons (PAHs).114 The dataset


consists of 32 PAHs and 46 methylated PAHs taken from the literature.115–118 Of this set of 78 aromatic hydrocarbons, 34 are carcinogenic and 44 are noncarcinogenic. The carcinogenic activity was predicted by using the following four theoretical descriptors computed with the PM3 semiempirical method: the energy of the highest occupied molecular orbital EHOMO; the energy of the lowest unoccupied molecular orbital ELUMO; the hardness HD, where HD = (ELUMO − EHOMO)/2; and the difference between EHOMO and EHOMO−1, denoted ΔH.117 The prediction power of each SVM model was evaluated with a leave-10%-out cross-validation procedure. The best prediction statistics for each kernel type are as follows: linear, ACp = 0.76; polynomial, degree 2, ACp = 0.82; RBF, γ = 0.5, ACp = 0.86; neural, a = 2, b = 0, ACp = 0.66; and anova, γ = 0.5, d = 1, ACp = 0.84 (C = 10 for these SVM models). The relationship between the quantum indices and PAH carcinogenicity is nonlinear, as evidenced by the increase in prediction power when going from a linear to an RBF kernel.

Structure-Odor Relationships for Pyrazines

Various techniques of molecular design can significantly help fragrance researchers find relationships between the chemical structure and the odor of organic compounds.119–124 A wide variety of structural descriptors (molecular fragments, topological indices, geometric descriptors, or quantum indices) and a broad selection of qualitative or quantitative statistical methods have been used to model and predict the aroma (and its intensity) for various classes of organic compounds. Besides providing an important guide for the synthesis of new fragrances, structure-odor relationships (SOR) offer a better understanding of the mechanism of odor perception. We illustrate the application of support vector machines to aroma classification using as our example 98 tetra-substituted pyrazines (Figure 52) representing three odor classes, namely 32 green, 23 nutty, and 43 bell-pepper.125

Figure 52 General structure of pyrazines (a pyrazine ring bearing the substituents R1–R4).

The prediction power of each SVM model was evaluated with a leave-10%-out cross-validation procedure.126 This multiclass dataset was modeled with a one-versus-all approach. In the first classification test, class +1 contained the green aroma compounds and class −1 contained the compounds with nutty or bell-pepper aroma. The best prediction statistics for each kernel type are: linear, C = 10, ACp = 0.80; polynomial, degree 2, C = 1000, ACp = 0.86; RBF, C = 10, γ = 0.5, ACp = 0.79; neural, C = 10, a = 0.5, b = 0, ACp = 0.73; and anova, C = 100, γ = 0.5,


d = 1, ACp = 0.84. The degree 2 polynomial kernel has the best prediction, followed by the anova kernel and the linear model. In the second test, class +1 contained the compounds with nutty aroma, whereas the remaining pyrazines formed class −1. The prediction statistics show a slight advantage for the anova kernel, whereas the linear, polynomial, and RBF kernels have identical results: linear, C = 10, ACp = 0.89; polynomial, degree 2, C = 10, ACp = 0.89; RBF, C = 10, γ = 0.5, ACp = 0.89; neural, C = 100, a = 0.5, b = 0, ACp = 0.79; and anova, C = 100, γ = 0.5, d = 1, ACp = 0.92. Finally, the compounds with bell-pepper aroma were considered to be class +1, whereas the green and nutty pyrazines formed class −1. Three kernels (RBF, polynomial, and anova) give much better predictions than the linear SVM classifier: linear, C = 10, ACp = 0.74; polynomial, degree 2, C = 10, ACp = 0.88; RBF, C = 10, γ = 0.5, ACp = 0.89; neural, C = 100, a = 2, b = 1, ACp = 0.68; and anova, C = 10, γ = 0.5, d = 1, ACp = 0.87. Note that the number of support vectors depends on the kernel type (linear, SV = 27; RBF, SV = 43; anova, SV = 31; all for training with all compounds), so for this structure-odor model one might prefer the SVM model with a polynomial kernel, which is more compact, i.e., contains a lower number of support vectors. In this section, we compared the prediction capabilities of five kernels, namely linear, polynomial, Gaussian radial basis function, neural, and anova. Several guidelines that might help the modeler obtain a predictive SVM model can be extracted from these results: (1) It is important to compare the predictions of a large number of kernels and combinations of parameters; (2) the linear kernel should be used as a reference to compare the results from nonlinear kernels; (3) some datasets can be separated with a linear hyperplane; in such instances, the use of a nonlinear kernel should be avoided; and (4) when the relationships between the input data and the class attribution are nonlinear, RBF kernels do not necessarily give the optimum SVM classifier.
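The one-versus-all strategy used above for the three pyrazine odor classes can be sketched as follows: one binary SVM is trained per class, and a compound is assigned to the class whose classifier returns the largest decision value. The implementation below uses scikit-learn on synthetic data; the descriptors, kernel, and parameters are illustrative assumptions.

```python
# One-versus-all multiclass classification with binary SVMs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# 98 synthetic "pyrazines" in 3 odor classes
X, y = make_classification(n_samples=98, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)

classifiers = {}
for cls in np.unique(y):
    y_bin = np.where(y == cls, 1, -1)      # current class = +1, rest = -1
    classifiers[cls] = SVC(kernel="poly", degree=2, C=10.0).fit(X, y_bin)

# Assign each compound to the class with the largest decision value
scores = np.column_stack([clf.decision_function(X)
                          for clf in classifiers.values()])
labels = np.array(list(classifiers))
y_pred = labels[np.argmax(scores, axis=1)]
print("training accuracy:", float(np.mean(y_pred == y)))
```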

PRACTICAL ASPECTS OF SVM REGRESSION

Support vector machines were initially developed for class discrimination, and most of their applications have been for pattern classification. SVM classification is especially relevant for important cheminformatics problems, such as recognizing drug-like compounds or discriminating between toxic and nontoxic compounds, and many such applications have been published. The QSAR applications of SVM regression, however, are rare, which is unfortunate because SVM regression represents a viable alternative to multiple linear regression, PLS, or neural networks. In this section, we present several SVMR applications to QSAR datasets, and we compare the performance of several kernels. The SVM regression models presented below implement the following kernels: linear; polynomial (degree d = 2, 3, 4, 5); radial basis function, K(xi, xj) = exp(−γ‖xi − xj‖²) (γ = 0.25, 0.5, 1.0, 1.5, 2.0); neural (tanh), Eq. [68] (a = 0.5, 1.0, 2.0 and b = 0, 1, 2); and anova, Eq. [69] (γ = 0.25,


0.5, 1.0, 1.5, 2.0 and d = 1, 2, 3). All SVM models were computed with mySVM, by Rüping (http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/).
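Before turning to the case studies, a compact sketch of an SVM regression workflow may help: descriptors are scaled to zero mean and unit variance (as done below for the phenol dataset), and an ε-SVR model with an RBF kernel is scored by cross-validation. The sketch uses scikit-learn in place of mySVM, with synthetic data and arbitrary parameter values.

```python
# SVM regression (epsilon-SVR) with descriptor scaling and a cross-validated
# q2 estimate computed from out-of-fold predictions.
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# 250 synthetic "compounds" with 7 descriptors mimic the phenol dataset size
X, y = make_regression(n_samples=250, n_features=7, noise=10.0, random_state=0)

model = make_pipeline(StandardScaler(),
                      SVR(kernel="rbf", gamma=0.5, C=10.0, epsilon=0.1))
y_cv = cross_val_predict(model, X, y, cv=5)      # leave-20%-out predictions
print("cross-validated q2:", round(r2_score(y, y_cv), 3))
```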

SVM Regression QSAR for the Phenol Toxicity to Tetrahymena pyriformis

Aptula et al. used multiple linear regression to investigate the toxicity of 200 phenols to the ciliated protozoan Tetrahymena pyriformis.127 Using their MLR model, they then predicted the toxicity of another 50 phenols. Here we present a comparative study of the entire set of 250 phenols, using multiple linear regression, artificial neural networks, and SVM regression.128 Before computing the SVM model, the input vectors were scaled to zero mean and unit variance. The prediction power of the QSAR models was tested with complete cross-validation: leave-5%-out (L5%O), leave-10%-out (L10%O), leave-20%-out (L20%O), and leave-25%-out (L25%O). The capacity parameter C was optimized for each SVM model. Seven structural descriptors were used to model the 50% growth inhibition concentration, IGC50. These descriptors are log D, where D is the distribution coefficient (i.e., the octanol–water partition coefficient corrected for ionization); ELUMO, the energy of the lowest unoccupied molecular orbital; MW, the molecular weight; PNEG, the negatively charged molecular surface area; ABSQon, the sum of absolute charges on nitrogen and oxygen atoms; MaxHp, the largest positive charge on a hydrogen atom; and SsOH, the electrotopological state index for the hydroxy group. The MLR model has a calibration correlation coefficient of 0.806 and is stable in cross-validation experiments:

pIGC50 = 0.154(±0.080) + 0.296(±0.154) log D − 0.352(±0.183) ELUMO + 0.00361(±0.00188) MW − 0.0218(±0.0113) PNEG − 0.446(±0.232) ABSQon + 1.993(±1.037) MaxHp + 0.0265(±0.0138) SsOH   [106]

n = 250, rcal = 0.806, scal = 0.50, RMSEcal = 0.49, Fcal = 64
rLOO = 0.789, q²LOO = 0.622, RMSELOO = 0.51
rL5%O = 0.785, q²L5%O = 0.615, RMSEL5%O = 0.51
rL10%O = 0.786, q²L10%O = 0.617, RMSEL10%O = 0.51
rL20%O = 0.775, q²L20%O = 0.596, RMSEL20%O = 0.53
rL25%O = 0.788, q²L25%O = 0.620, RMSEL25%O = 0.51
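The cross-validation figures quoted here are all derived from the same two quantities, the predictive residual sum of squares and the total sum of squares; the short helper below makes the arithmetic explicit. The numerical values in the example are invented for illustration.

```python
# q2 = 1 - PRESS/SS and RMSE from observed and cross-validation-predicted values.
import numpy as np

def q2_and_rmse(y_obs, y_pred):
    y_obs, y_pred = np.asarray(y_obs), np.asarray(y_pred)
    press = np.sum((y_obs - y_pred) ** 2)     # predictive residual sum of squares
    ss = np.sum((y_obs - y_obs.mean()) ** 2)  # total sum of squares
    return 1.0 - press / ss, np.sqrt(press / len(y_obs))

y_obs = [3.2, 2.8, 4.1, 3.9, 2.5]             # hypothetical pIGC50 values
y_pred = [3.0, 2.9, 4.3, 3.6, 2.7]
q2, rmse = q2_and_rmse(y_obs, y_pred)
print(f"q2 = {q2:.3f}, RMSE = {rmse:.3f}")
```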

Based on the cross-validation statistics, the best ANN model has tanh transfer functions for both the hidden and the output neurons, and it has only one hidden neuron. The statistics for this ANN are rcal = 0.824, RMSEcal = 0.47; rL5%O = 0.804, q²L5%O = 0.645, RMSEL5%O = 0.49; rL10%O = 0.805, q²L10%O = 0.647, RMSEL10%O = 0.49; rL20%O = 0.802, q²L20%O = 0.642, RMSEL20%O = 0.50;


and rL25%O = 0.811, q²L25%O = 0.657, RMSEL25%O = 0.48. On the one hand, the ANN statistics are better than those obtained with MLR, indicating that there is some nonlinearity between pIGC50 and the seven structural descriptors. On the other hand, the prediction statistics for ANN models with two or more hidden neurons decrease, indicating that the dataset has a high level of noise or error. The SVM regression results for the prediction of phenol toxicity to Tetrahymena pyriformis are presented in Tables 11 and 12. In calibration …

Table 11 Kernel Type and Corresponding Parameters for Each SVM Model a

Exp  Kernel              Copt    SVcal  rcal   RMSEcal  rL5%O   q²L5%O  RMSEL5%O
  1  L                   64.593   250   0.803   0.51    0.789   0.593     …
  2  P, d = 2            88.198   250   0.853   0.44    0.787   0.591     …
  3  P, d = 3            64.593   243   0.853   0.45    0.326  −3.921     …
  4  P, d = 4            73.609   248   0.993   0.09    0.047   <−10      …
  5  P, d = 5            88.198   250   0.999   0.04    0.137     …       …
  6  R, γ = 0.25         88.198   250   0.983   0.15    0.694     …      0.68
  7  R, γ = 0.5          88.198   250   0.996   0.08    0.660     …      0.69
  8  R, γ = 1.0          88.198   250   1.000   0.01    0.668     …      0.63
  9  R, γ = 1.5          88.198   250   1.000   0.00    0.659     …      0.62
 10  R, γ = 2.0          64.593   250   1.000   0.00    0.636     …      0.64
 11  N, a = 0.5, b = 0    0.024   250   0.748   0.56    0.743     …      0.56
 12  N, a = 1.0, b = 0    0.016   250   0.714   0.59    0.722     …      0.58
 13  N, a = 2.0, b = 0    0.016   250   0.673   0.61    0.696     …      0.60
 14  N, a = 0.5, b = 1    0.020   250   0.691   0.60    0.709     …      0.59
 15  N, a = 1.0, b = 1    0.012   250   0.723   0.61    0.706     …      0.60
 16  N, a = 2.0, b = 1    0.015   248   0.688   0.61    0.678     …      0.62
 17  N, a = 0.5, b = 2    0.020   250   0.642   0.64    0.614     …      0.65
 18  N, a = 1.0, b = 2    0.015   250   0.703   0.62    0.670     …      0.62
 19  N, a = 2.0, b = 2    0.012   250   0.695   0.62    0.586     …      0.67
 20  A, γ = 0.25, d = 1  88.198   250   0.842   0.46    0.718     …      0.62
 21  A, γ = 0.5, d = 1   88.198   250   0.857   0.43    0.708     …      0.63
 22  A, γ = 1.0, d = 1   88.198   250   0.868   0.42    0.680     …      0.67
 23  A, γ = 1.5, d = 1   88.198   250   0.880   0.40    0.674     …      0.68
 24  A, γ = 2.0, d = 1   88.198   250   0.884   0.40    0.688     …      0.66
 25  A, γ = 0.25, d = 2  88.198   250   0.977   0.18    0.531     …      1.10
 26  A, γ = 0.5, d = 2   88.198   250   0.994   0.09    0.406     …      1.33
 27  A, γ = 1.0, d = 2   88.198   250   1.000   0.01    0.436     …      1.22
 28  A, γ = 1.5, d = 2   88.198   250   1.000   0.00    0.492     …      1.02
 29  A, γ = 2.0, d = 2   64.593   250   1.000   0.00    0.542     …      0.88
 30  A, γ = 0.25, d = 3  88.198   250   0.999   0.04    0.312     …      1.89
 31  A, γ = 0.5, d = 3   64.593   250   1.000   0.00    0.506     …      1.10
 32  A, γ = 1.0, d = 3   64.593   250   1.000   0.00    0.625     …      0.77
 33  A, γ = 1.5, d = 3   64.593   250   1.000   0.00    0.682     …      0.65
 34  A, γ = 2.0, d = 3   64.593   250   1.000   0.00    0.708     …      0.61

a Notations: Exp, experiment number; Copt, optimized capacity parameter; SVcal, number of support vectors in calibration; rcal, calibration correlation coefficient; RMSEcal, calibration root-mean-square error; rL5%O, leave-5%-out correlation coefficient; q²L5%O, leave-5%-out q²; RMSEL5%O, leave-5%-out root-mean-square error; L, linear kernel; P, polynomial kernel (parameter: degree d); R, radial basis function kernel (parameter: γ); N, neural kernel (parameters: a and b); A, anova kernel (parameters: γ and d).


Table 12 Support Vector Regression Statistics for Leave-10%-out, Leave-20%-out, and Leave-25%-out Cross-validation Tests

Exp  Kernel              rL10%O  q²L10%O  RMSEL10%O
  1  L                   0.789    0.593     0.53
  2  P, d = 2            0.784    0.586     0.53
  3  P, d = 3            0.316   −3.915     1.83
  4  P, d = 4            0.008    <−10      >10
  5  P, d = 5            0.035      …       >10
  6  R, γ = 0.25         0.676      …       0.69
  7  R, γ = 0.5          0.663      …       0.67
  8  R, γ = 1.0          0.662      …       0.63
  9  R, γ = 1.5          0.650      …       0.63
 10  R, γ = 2.0          0.628      …       0.65
 11  N, a = 0.5, b = 0   0.737      …       0.57
 12  N, a = 1.0, b = 0   0.719      …       0.58
 13  N, a = 2.0, b = 0   0.685      …       0.61
 14  N, a = 0.5, b = 1   0.714      …       0.59
 15  N, a = 1.0, b = 1   0.689      …       0.61
 16  N, a = 2.0, b = 1   0.684      …       0.62
 17  N, a = 0.5, b = 2   0.610      …       0.66
 18  N, a = 1.0, b = 2   0.678      …       0.62
 19  N, a = 2.0, b = 2   0.682      …       0.62
 20  A, γ = 0.25, d = 1  0.725      …       0.61
 21  A, γ = 0.5, d = 1   0.723      …       0.61
 22  A, γ = 1.0, d = 1   0.684      …       0.66
 23  A, γ = 1.5, d = 1   0.694      …       0.65
 24  A, γ = 2.0, d = 1   0.703      …       0.64
 25  A, γ = 0.25, d = 2  0.493      …       1.13
 26  A, γ = 0.5, d = 2   0.351      …       1.40
 27  A, γ = 1.0, d = 2   0.349      …       1.28
 28  A, γ = 1.5, d = 2   0.471      …       0.99
 29  A, γ = 2.0, d = 2   0.549      …       0.85
 30  A, γ = 0.25, d = 3  0.282      …       1.90
 31  A, γ = 0.5, d = 3   0.449      …       1.18
 32  A, γ = 1.0, d = 3   0.597      …       0.79
 33  A, γ = 1.5, d = 3   0.671      …       0.66
 34  A, γ = 2.0, d = 3   0.703      …       0.61

… > neural > RBF > anova. The MLR and SVMR linear models are very similar, and both are significantly better than the SVM models obtained with nonlinear kernels. The inability of nonlinear models to outperform the linear ones can be attributed to the large experimental errors in determining BCF. SVM regression is a relatively novel addition to the field of QSAR, and its potential has not yet been sufficiently explored. In this pedagogically driven chapter, we have presented four QSAR applications in which we compared the performances of five kernels with models obtained with MLR and ANN. In general, SVM regression cannot surpass the predictive ability of either MLR or ANN, and the prediction of nonlinear kernels is lower than that obtained with the linear kernel. Several levels of cross-validation are necessary to confirm the prediction stability; in particular, the QSAR for Chlorella vulgaris toxicity shows a different ranking for these methods depending on the cross-validation test. The statistics of the QSAR models depend on the kernel type and parameters, and SVM regression gives in some cases unexpectedly low prediction statistics. Another problem with nonlinear kernels is overfitting, which was found in all four QSAR experiments. For this tutorial, we also experimented with ANNs having different output transfer functions (linear, symlog, and tanh; data not shown). When the number of hidden neurons was kept low, all ANN results were consistently good, unlike SVM regression, which shows a wide and unpredictable variation with the kernel type and parameters.


The fact that the linear kernel gives better results than nonlinear kernels for certain QSAR problems is documented in the literature. Yang et al. compared linear, polynomial, and RBF kernels for the following properties of alkyl benzenes: boiling point, enthalpy of vaporization at the boiling point, critical temperature, critical pressure, and critical volume.146 A LOO test showed that the first four properties were predicted best with a linear kernel, whereas critical volume was predicted best with a polynomial kernel.

REVIEW OF SVM APPLICATIONS IN CHEMISTRY

A rich literature exists on the topic of chemical applications of support vector machines. These publications usually describe classification problems, but some interesting results have also been obtained with SVM regression. An important issue is the evaluation of the capabilities of SVM, and accordingly many papers contain comparisons with other pattern recognition algorithms. Equally important is the assessment of the various parameters and kernels that can give rise to the best SVM model for a particular problem. In this section, we present a selection of published SVM applications in chemistry, focusing on drug design and the classification of chemical compounds, SAR and QSAR, genotoxicity of chemical compounds, chemometrics, sensors, chemical engineering, and text mining for scientific information.

Recognition of Chemical Classes and Drug Design

A test in which kinase inhibitors were discriminated from noninhibitors was used by Briem and Günther to compare the prediction performances of several machine learning methods.85 The learning set consisted of 565 kinase inhibitors and 7194 inactive compounds, and the validation was performed with a test set consisting of 204 kinase inhibitors and 300 inactive compounds. The structures of the chemical compounds were encoded into numerical form by using Ghose–Crippen atom type counts. Four classification methods were used: SVM with a Gaussian RBF kernel, artificial neural networks, k-nearest neighbors with genetic algorithm parameter optimization, and recursive partitioning (RP). All four methods were able to separate kinase inhibitors from noninhibitors, but with slight differences in the predictive power of the models. The average test accuracies for 13 experiments indicate that SVM gives the best predictions for the test set: SVM 0.88, k-NN 0.84, ANN 0.80, and RP 0.79. The results for a majority vote of a jury of the 13 experiments show that SVM and ANN had identical test accuracy: SVM 0.88, k-NN 0.87, ANN 0.88, and RP 0.83. Müller et al. investigated several machine learning algorithms for their ability to identify drug-like compounds based on a set of atom type counts.147 Five machine learning procedures were investigated: SVM with polynomial


and Gaussian RBF kernels, linear programming machines, linear discriminant analysis, bagged k-nearest neighbors, and bagged C4.5 decision trees. Drug-like compounds were selected from the World Drug Index (WDI), whereas non-drug compounds were selected from the Available Chemicals Directory (ACD), giving a total of 207,001 compounds. The chemical structure was represented by the counts of Ghose–Crippen atom types. The test errors for discriminating drug-like from non-drug compounds show that the two SVM models give the best results: SVM RBF 6.8% error, SVM linear 7.0% error, C4.5 8.2% error, and k-NN 8.2% error. Jorissen and Gilson applied SVM to the in silico screening of molecular databases for compounds possessing a desired activity.148 Structural descriptors were computed with Dragon, and the parameters of the SVM (with a Gaussian RBF kernel) were optimized through a cross-validation procedure. Five sets of 50 diverse inhibitors were collected from the literature. The active compounds are antagonists of the α1A adrenoceptor and reversible inhibitors of cyclin-dependent kinase, cyclooxygenase-2, factor Xa, and phosphodiesterase-5. The nonactive group of compounds was selected from the National Cancer Institute diversity set of chemical compounds. Based on the enrichment factors computed for the five sets of active compounds, it was found that SVM can successfully identify active compounds and discriminate them from nonactive chemicals. Yap and Chen developed a jury SVM method for the classification of inhibitors and substrates of cytochromes P450 3A4 (CYP3A4, 241 inhibitors and 368 substrates), 2D6 (CYP2D6, 180 inhibitors and 198 substrates), and 2C9 (CYP2C9, 167 inhibitors and 144 substrates).86 Structural descriptors computed with Dragon were selected with a genetic algorithm procedure and an L10%O or L20%O SVM cross-validation. Two jury SVM algorithms were applied. The first is the positive majority consensus SVM (PM-CSVM), and the second is the positive probability consensus SVM (PP-CSVM). PM-CSVM classifies a compound based on the vote of the majority of its SVM models, whereas PP-CSVM explicitly computes the probability of a compound being in a certain class. Several tests performed by Yap and Chen showed that at least 81 SVM models are necessary in each ensemble. Both PM-CSVM and PP-CSVM were shown to be superior to a single SVM model (Matthews correlation coefficient for CYP2D6: MCC = 0.742 for single SVM, MCC = 0.802 for PM-CSVM, and MCC = 0.821 for PP-CSVM). Because PP-CSVM appears to outperform PM-CSVM, the final classification results were generated with PP-CSVM: MCC = 0.899 for CYP3A4, MCC = 0.884 for CYP2D6, and MCC = 0.872 for CYP2C9. Arimoto, Prasad, and Gifford compared five machine learning methods (recursive partitioning, naïve Bayesian classifier, logistic regression, k-nearest neighbors, and SVM) for their ability to discriminate between inhibitors (IC50 < 3 μM) and noninhibitors (IC50 > 3 μM) of cytochrome P450 3A4.149 The dataset of 4470 compounds was characterized with four sets of


descriptors: MAKEBITS BCI fingerprints (4096 descriptors), MACCS fingerprints (166 descriptors), MOE TGT (typed graph triangle) fingerprints (13,608 descriptors), and MolconnZ topological indices (156 descriptors). The models were evaluated with L10%O cross-validation and with a test set of 470 compounds (179 inhibitors and 291 noninhibitors). The most predictive models are the BCI fingerprints/SVM model, which correctly classified 135 inhibitors and 249 noninhibitors; the MACCS fingerprints/SVM model, which correctly classified 137 inhibitors and 248 noninhibitors; and the topological indices/recursive partitioning model, which correctly classified 147 inhibitors and 236 noninhibitors. A consensus of these three models slightly increased the accuracy, to 83%, compared with the individual classification models.

Svetnik et al. performed a large-scale evaluation of the stochastic gradient boosting method (SGB), which implements a jury of classification and regression trees.150 SGB was compared with a single decision tree, a random forest, partial least squares, k-nearest neighbors, naïve Bayes, and SVM with linear and RBF kernels. For the 10 QSAR datasets used in these tests, we indicate here the best two methods for each QSAR, as determined by the prediction statistics (mean over 10 cross-validation experiments): blood–brain barrier (180 active compounds, 145 non-active compounds), random forest AC = 0.806 and SGB AC = 0.789; estrogen receptor binding activity (131 binding compounds, 101 non-binding compounds), random forest AC = 0.827 and SGB AC = 0.824; P-glycoprotein (P-gp) transport activity (108 P-gp substrates, 78 P-gp non-substrates), random forest AC = 0.804 and PLS AC = 0.798; multidrug resistance reversal agents (298 active compounds, 230 non-active compounds), random forest AC = 0.831 and SGB AC = 0.826; cyclin-dependent kinase 2 antagonists (361 active compounds, 10,579 inactive compounds), random forest AC = 0.747 and SVM RBF AC = 0.723; binding affinity for the dopamine D2 receptor (116 disubstituted piperidines), random forest q² = 0.477 and PLS q² = 0.454; log D (11,260 compounds), SGB q² = 0.860 and SVM linear q² = 0.841; binding to an unspecified channel protein (23,102 compounds), SVM linear q² = 0.843 and SVM RBF q² = 0.525; and cyclooxygenase-2 inhibitors (314 compounds for regression; 153 active and 161 non-active compounds for classification), regression: random forest q² = 0.434 and SGB q² = 0.390, classification: SGB AC = 0.789 and SVM linear AC = 0.774. The study shows that jury methods are generally superior to single models.

An important adverse drug reaction is the induction of torsade de pointes (TdP). TdP accounts for almost one third of all drug failures during drug development and has resulted in several drugs being withdrawn from the market. Yap et al. developed an SVM classification model to predict the TdP potential of drug candidates.151 The drugs that induce TdP were collected from the literature (204 for training and 39 for prediction), whereas drugs with no reported cases of TdP in humans were selected as the non-active compounds (204 for training and 39 for prediction). The molecular structure of


each molecule was characterized with linear solvation energy relationship (LSER) descriptors. The prediction accuracies are 91.0% for SVM with a Gaussian RBF kernel, 88.5% for k-NN, 78.2% for probabilistic neural networks, and 65.4% for the C4.5 decision tree, illustrating the good results of support vector machine classification. HERG (human ether-a-go-go) potassium channel inhibitors can lead to a prolongation of the QT interval that can trigger TdP, an atypical ventricular tachycardia. Tobita, Nishikawa, and Nagashima developed an SVM classifier that can discriminate between high and low HERG potassium channel inhibitors.152 The IC50 values of 73 drugs were collected from the literature, and two thresholds were used by those authors to separate high and low inhibitors, namely pIC50 = 4.4 (58 active and 15 non-active compounds) and pIC50 = 6.0 (28 active and 45 non-active compounds). The chemical structure of each molecule was represented by 57 2-D MOE descriptors and 51 molecular fragments representing a subset of the public 166-bit MACCS key set. The classification accuracy in L10%O cross-validation was 95% for pIC50 = 4.4 and 90% for pIC50 = 6.0, again showing the utility of SVM for classification. Xue et al. investigated the application of recursive feature elimination for the following three classification tests: P-glycoprotein substrates (116 substrates and 85 non-substrates), human intestinal absorption (131 absorbable compounds and 65 non-absorbable compounds), and compounds that cause torsade de pointes (85 TdP-inducing compounds and 276 non-TdP-inducing compounds).69 With the exception of the TdP compounds, recursive feature elimination significantly increases the prediction power of SVM classifiers with a Gaussian RBF kernel. The accuracy (AC) and Matthews correlation coefficient (MCC) for SVM alone and for SVM plus recursive feature elimination (SVM + RFE) in an L20%O cross-validation test demonstrate the importance of eliminating ineffective descriptors: P-glycoprotein substrates, SVM AC = 68.3% and MCC = 0.37, SVM + RFE AC = 79.4% and MCC = 0.59; human intestinal absorption, SVM AC = 77.0% and MCC = 0.48, SVM + RFE AC = 86.7% and MCC = 0.70; torsade de pointes inducing compounds, SVM AC = 82.0% and MCC = 0.48, SVM + RFE AC = 83.9% and MCC = 0.56. Selecting an optimum group of descriptors is both an important and a time-consuming phase in developing a predictive QSAR model. Fröhlich, Wegner, and Zell introduced the incremental regularized risk minimization procedure for SVM classification and regression models, and they compared it with recursive feature elimination and with the mutual information procedure.70 Their first experiment considered 164 compounds tested for human intestinal absorption, whereas the second experiment modeled aqueous solubility prediction for 1297 compounds. Structural descriptors were computed with JOELib and MOE, and full cross-validation was performed to compare the descriptor selection methods. The incremental


regularized risk minimization procedure gave slightly better results than did recursive feature elimination. Sorich et al. proposed in silico models to predict chemical glucuronidation based on three global descriptors (equalized electronegativity, molecular hardness, and molecular softness) and three local descriptors (atomic charge, Fukui function, and atomic softness), all based on the electronegativity equalization method (EEM).153 The metabolism of chemical compounds by 12 human UDP-glucuronosyltransferase (UGT) isoforms was modeled with a combined approach referred to as cluster–GA–PLSDA (cluster analysis–genetic algorithm–partial least-squares discriminant analysis) and with ν-SVM with an RBF kernel. Groups containing between 50 and 250 substrates and nonsubstrates for each of the 12 UGT isoforms were used to generate the classification models. The average percentage of correctly predicted chemicals over all isoforms is 78% for SVM and 73% for cluster–GA–PLSDA. By combining the EEM descriptors with 2-D descriptors, the SVM average percentage of correctly predicted chemicals increases to 84%. Jury methods can increase the prediction performance of the individual models that are aggregated in the ensemble. Merkwirth et al. investigated the use of k-NN, SVM, and ridge regression for drug-like compound identification.84 The first test of their jury approach involved 902 compounds from high-throughput screening experiments that were classified as "frequent hitters" (479 compounds) and "non-frequent hitters" (423 compounds), each characterized by 1814 structural descriptors. The second test consisted of inhibitors of cytochrome P450 3A4 (CYP3A4), which were divided into a group of low inhibitors (186 compounds with IC50 < 1 μM) and another group of high inhibitors (224 compounds with IC50 > 1 μM). Their cross-validation statistics show that the SVM models (single and in a jury of 15 models) are the best classifiers, as can be seen from the values of the Matthews correlation coefficient: for frequent hitters, SVM 0.91 and jury-SVM 0.92; for CYP3A4, SVM and jury-SVM 0.88. Both the SVM and jury-SVM classifiers were obtained by using all structural descriptors, which gave better results than models obtained using only selected input descriptors. Overall, this approach to jury prediction does not provide any significant advantage over single SVM classifiers. Five methods of feature selection (information gain, mutual information, χ²-test, odds ratio, and GSS coefficient) were compared by Liu for their ability to discriminate between thrombin inhibitors and noninhibitors.71 The chemical compounds were provided by DuPont Pharmaceutical Research Laboratories as a learning set of 1909 compounds containing 42 inhibitors and 1867 noninhibitors and a test set of 634 compounds containing 150 inhibitors and 484 noninhibitors. Each compound was characterized by 139,351 binary features describing its 3-D structure. In this comparison of naïve Bayesian and SVM classifiers, all compounds were considered together, and an L10%O cross-validation procedure was applied. Based on information gain descriptor selection,

Based on information gain descriptor selection, SVM was robust to a 99% reduction of the descriptor space, with only a small decrease in sensitivity (from 58.7% to 52.5%) and specificity (from 98.4% to 97.2%).

Byvatov and Schneider compared SVM-based and Kolmogorov–Smirnov feature selection methods to characterize ligand–receptor interactions in focused compound libraries.72 Three datasets were used to compare the feature selection algorithms: 226 kinase inhibitors and 4479 noninhibitors; 227 factor Xa inhibitors and 4478 noninhibitors; and 227 factor Xa inhibitors and 195 thrombin inhibitors. SVM classifiers with a degree 5 polynomial kernel were used for all computations, and the molecular structure was encoded into 182 MOE descriptors and 225 topological pharmacophores. In one test, both feature selection algorithms produced comparable results, whereas in all other cases, SVM-based feature selection gave better predictions.

Finally, we highlight the work of Zernov et al., who tested the ability of SVM to discriminate between active and inactive compounds from three libraries.154 The first test evaluated the discrimination between drug-like and non-drug compounds. The learning set contained 15,000 compounds (7465 drugs and 7535 non-drugs), and the test set had 7500 compounds (3751 drugs and 3749 non-drugs). The test set accuracy for SVM with an RBF kernel (75.15%) was slightly lower than that of an ANN (75.52%). The second experiment evaluated the discrimination between agrochemical and non-agrochemical compounds, and the third evaluated the discrimination between low and high carbonic anhydrase II inhibitors. In both of these tests, SVM classifiers had the lowest number of errors.
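
The screening experiments summarized above share a simple computational core: train a kernel classifier on labeled actives and inactives, then score a held-out test set. A minimal sketch of that workflow is given below; the synthetic data and all parameter values are ours, chosen only for illustration.

```python
# Minimal two-class SVM screening workflow on synthetic data; the
# descriptor matrix, labels, and hyperparameters are all illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Surrogate library: 1000 "compounds", 50 descriptors, binary activity.
X, y = make_classification(n_samples=1000, n_features=50,
                           n_informative=12, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=1)

clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(X_train, y_train)
print("test set accuracy: %.3f" % clf.score(X_test, y_test))
```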

QSAR

SVM classification and regression were used to model the potency of diverse drug-like compounds to inhibit the human cytochromes P450 3A4 (CYP3A4) and 2D6 (CYP2D6).155 The dataset consisted of 1345 CYP3A4 and 1187 CYP2D6 compounds tested for 50% inhibition (IC50) of the corresponding enzyme. The SVM models were trained with the Gaussian RBF kernel, and the one-versus-one technique was used for multiclass classification. For SVM classification, the datasets were partitioned into three groups: strong inhibitors, consisting of compounds with IC50 < 2 μM (243 CYP3A4 inhibitors and 182 CYP2D6 inhibitors); medium inhibitors, consisting of compounds with IC50 between 2 and 20 μM (559 CYP3A4 inhibitors and 397 CYP2D6 inhibitors); and weak inhibitors, consisting of compounds with IC50 > 20 μM (543 CYP3A4 inhibitors and 608 CYP2D6 inhibitors). Four sets of structural descriptors were used to train the SVM models: in-house 2-D descriptors, such as atom and ring counts; MOE 2-D descriptors, such as topological indices and pharmacophores; VolSurf descriptors, based on molecular interaction fields; and a set of 68 AM1 quantum indices. Leave–10%–out cross-validation was used to validate the SVM models.

The best SVM classification predictions were obtained with the MOE 2-D descriptors. For CYP3A4, the test set accuracy is 72% (76% for strong inhibitors, 67% for medium inhibitors, and 77% for weak inhibitors), whereas for CYP2D6, the test set accuracy is 69% (84% for strong inhibitors, 53% for medium inhibitors, and 74% for weak inhibitors). The same group of descriptors gave the best SVM regression predictions: CYP3A4, q2 = 0.51 vs. q2 = 0.39 for PLS, and CYP2D6, q2 = 0.52 vs. q2 = 0.30 for PLS. In these QSAR models, SVM regression gave much better predictions than did PLS.

Aires-de-Sousa and Gasteiger used four regression techniques [multiple linear regression, perceptron (an MLF ANN with no hidden layer), MLF ANN, and ν-SVM regression] to obtain a quantitative structure–enantioselectivity relationship (QSER).156 The QSER models the enantiomeric excess in the addition of diethyl zinc to benzaldehyde in the presence of a racemic catalyst and an enantiopure chiral additive. A total of 65 reactions constituted the dataset. Using 11 chiral codes as model input and a three-fold cross-validation procedure, a neural network with two hidden neurons gave the best predictions: ANN with 2 hidden neurons, R2pred = 0.923; ANN with 1 hidden neuron, R2pred = 0.906; perceptron, R2pred = 0.845; MLR, R2pred = 0.776; and ν-SVM regression with an RBF kernel, R2pred = 0.748.

A molecular similarity kernel, the Tanimoto similarity kernel, was used by Lind and Maltseva in SVM regression to predict the aqueous solubility of three sets of organic compounds.99 The Tanimoto similarity kernel was computed from molecular fingerprints. The RMSE and q2 cross-validation statistics for the three sets show the good performance of SVMR with the Tanimoto kernel: set 1 (883 compounds), RMSE = 0.62 and q2 = 0.88; set 2 (412 compounds), RMSE = 0.77 and q2 = 0.86; and set 3 (411 compounds), RMSE = 0.57 and q2 = 0.88. An SVMR model was trained on set 1 and then tested on set 2 with good results, i.e., RMSE = 0.68 and q2 = 0.89.
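
For binary fingerprints x and y, the Tanimoto kernel is K(x, y) = x·y/(x·x + y·y - x·y). The sketch below (our illustration with random fingerprints; Lind and Maltseva's data and settings are not reproduced) plugs this similarity directly into SVM regression as a custom kernel.

```python
# Sketch: SVM regression with a Tanimoto kernel over binary fingerprints.
# The fingerprints and the "solubility" target below are synthetic.
import numpy as np
from sklearn.svm import SVR

def tanimoto_kernel(A, B):
    """Pairwise Tanimoto similarity between rows of two 0/1 matrices."""
    dot = A @ B.T
    a = (A ** 2).sum(axis=1)[:, None]   # bit counts of the rows of A
    b = (B ** 2).sum(axis=1)[None, :]   # bit counts of the rows of B
    return dot / (a + b - dot)

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 166)).astype(float)        # MACCS-like keys
y = X[:, :12].sum(axis=1) + rng.normal(scale=0.3, size=200)  # toy property

model = SVR(kernel=tanimoto_kernel, C=10.0, epsilon=0.1)
model.fit(X[:150], y[:150])
print("test R2: %.3f" % model.score(X[150:], y[150:]))
```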

Yang et al. developed quantitative structure–property relationships (QSPRs) for several properties of 47 alkyl benzenes, namely the boiling point, enthalpy of vaporization at the boiling point, critical temperature, critical pressure, and critical volume.146 The molecular structure of the alkyl benzenes was encoded with Randić–Kier–Hall connectivity indices, electrotopological state indices, and Kappa indices. For each property, the optimum set of topological indices, kernel (linear, polynomial, or Gaussian RBF), C, and ε were determined with successive LOO cross-validations. The LOO RMSE statistics for SVM regression, PLS, and ANN (three hidden neurons) show that the SVM model is the best: boiling point, SVMR 2.108, PLS 2.436, ANN 5.063; enthalpy of vaporization at the boiling point, SVMR 0.758, PLS 0.817, ANN 1.046; critical temperature, SVMR 5.523, PLS 7.163, ANN 9.704; critical pressure, SVMR 0.075, PLS 0.075, ANN 0.114; and critical volume, SVMR 4.692, PLS 5.914, ANN 9.452. The first four properties were best predicted with a linear kernel, whereas a polynomial kernel was used to model the critical volume.

Kumar et al. introduced a new method for descriptor selection, locally linear embedding, which can be used to reduce nonlinear dimensions in QSPR and QSAR.68 SVM regression (Gaussian RBF kernel) was used to test the new descriptor selection algorithm on three datasets: the boiling points of 150 alkanes characterized by 12 topological indices; the Selwood dataset of 31 chemical compounds characterized by 53 descriptors; and the Steroids dataset of 31 steroids with 1248 descriptors (autocorrelation of molecular surface indices). The statistics obtained with locally linear embedding were better than those obtained with all descriptors or with PCA descriptor reduction.
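
As a rough illustration of this idea (not the authors' implementation; the neighborhood size, embedding dimension, and synthetic data are arbitrary choices), locally linear embedding can be chained to SVM regression as a nonlinear dimensionality-reduction step:

```python
# Sketch: locally linear embedding (LLE) as a preprocessing step for SVM
# regression; the dataset and all parameter values are illustrative only.
from sklearn.datasets import make_regression
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR

X, y = make_regression(n_samples=150, n_features=50, n_informative=8,
                       noise=0.1, random_state=0)

# LLE maps the 50 descriptors to 5 coordinates that preserve local
# neighborhoods; the SVR then works in this low-dimensional space.
model = make_pipeline(
    LocallyLinearEmbedding(n_neighbors=12, n_components=5, random_state=0),
    SVR(kernel="rbf", C=10.0, epsilon=0.1),
)
print("cross-validated R2: %.3f"
      % cross_val_score(model, X, y, cv=5, scoring="r2").mean())
```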

Genotoxicity of Chemical Compounds

During the process of drug discovery, the genotoxicity of drug candidates must be monitored closely. Genotoxicity mechanisms include DNA methylation, DNA intercalation, unscheduled DNA synthesis, DNA adduct formation, and strand breaking. Li et al. compared the ability of several machine learning algorithms to classify a set of 860 compounds that were tested for genotoxicity (229 genotoxic and 631 non-genotoxic).157 Four methods were compared: SVM, probabilistic neural networks, k-nearest neighbors, and the C4.5 decision tree. An initial set of 199 structural descriptors (143 topological indices, 31 quantum indices, and 25 geometrical descriptors) was reduced to 39 descriptors using an SVM descriptor selection procedure. A L20%O cross-validation test showed that SVM has the highest prediction accuracy: 89.4% for SVM with an RBF kernel, 82.9% for k-NN, 78.9% for probabilistic neural networks, and 70.7% for C4.5.

Typically, an SVM application that predicts properties from molecular structure uses structural descriptors as input to the SVM model. These descriptors enter nonlinear functions, such as the polynomial or RBF kernels, used to compute the SVM solution. Mahé et al. instead defined a series of graph kernels that can predict various properties directly from the molecular graph.97 Atoms (graph vertices) are characterized by their chemical nature or by their connectivity through the Morgan index. Their first test of the molecular graph kernels considered the classification of 230 aromatic and hetero-aromatic nitro compounds that were tested for mutagenicity on Salmonella typhimurium. This dataset was further divided into a regression-easy set of 188 compounds (125 mutagens and 63 nonmutagens) and a regression-difficult set of 42 compounds. In a comparative test of leave–10%–out cross-validation accuracy, the molecular graph kernel ranked third: feature construction 95.7%, stochastic matching 93.3%, graph kernel 91.2%, neural network 89.4%, linear regression 89.3%, and decision tree 88.3%. For the group of 42 compounds, the literature has fewer comparative tests. In a LOO test, the accuracy of the new kernel was higher than that of other methods: graph kernel 88.1%, inductive logic programming 85.7%, and decision tree 83.3%.

Their second test of the graph kernel used a dataset of 684 non-congeneric compounds classified as mutagens (341 compounds) or nonmutagens (343 compounds) in a Salmonella/microsome assay. Previous models for this dataset, based on molecular fragments, have a L10%O accuracy of 78.5%, whereas the graph kernel has an accuracy between 76% and 79%.

The mutagenicity dataset of 230 aromatic and hetero-aromatic nitro compounds was also used as a test case for a molecular graph kernel by Jain, Geibel, and Wysotzki.98 Their kernel is based on the Schur–Hadamard inner product for a pair of molecular graphs. The leave–10%–out cross-validation accuracy is 92% for the set of 188 compounds and 90% for the set of 42 compounds. Computing the Schur–Hadamard inner product for a pair of graphs is an NP-complete problem, and in this paper it was approximated with a recurrent neural network. However, these approximations are not, in general, a kernel. Moreover, for some values of the parameters that control the kernel, an SVM solution could not be computed.

Helma et al. used the MOLFEA program to generate molecular substructures that discriminate between mutagenic and nonmutagenic compounds.158 A group of 684 compounds (341 mutagenic and 343 nonmutagenic) evaluated with the Ames test (Salmonella/microsome assay) was used to compare the C4.5 decision tree algorithm, the PART rule learning algorithm, and SVM with linear and degree 2 polynomial kernels. The L10%O accuracy of 78.5% for the SVM classifier is higher than that of the C4.5 (75.0% accuracy) and PART (74.7% accuracy) algorithms.
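
Vertex labeling by the Morgan index, used above by Mahé et al. to characterize atoms in their graph kernels, is easy to state concretely: start from the atom degrees and iteratively replace each atom's value by the sum of its neighbors' values, stopping when the number of distinct values no longer grows. A small sketch follows; the molecule and the plain-dictionary graph representation are our own illustration.

```python
# Sketch of the Morgan algorithm for extended-connectivity (EC) values
# used to label graph vertices; the hexane graph below is illustrative.
def morgan_indices(adjacency):
    """adjacency maps each atom to the list of atoms bonded to it."""
    ec = {atom: len(nbrs) for atom, nbrs in adjacency.items()}  # degrees
    n_classes = len(set(ec.values()))
    while True:
        new_ec = {atom: sum(ec[nbr] for nbr in nbrs)
                  for atom, nbrs in adjacency.items()}
        new_n = len(set(new_ec.values()))
        if new_n <= n_classes:          # refinement has converged
            return ec
        ec, n_classes = new_ec, new_n

# n-hexane carbon skeleton: C1-C2-C3-C4-C5-C6
hexane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
print(morgan_indices(hexane))   # {1: 2, 2: 3, 3: 4, 4: 4, 5: 3, 6: 2}
```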

Chemometrics

Several forms of transmissible spongiform encephalopathy are known today, such as scrapie, fatal familial insomnia, kuru, chronic wasting disease, feline spongiform encephalopathy, Creutzfeldt-Jakob disease in humans, and bovine spongiform encephalopathy (BSE). The main pathological characteristic of these diseases is a sponge-like modification of brain tissue. Martin et al. developed a serum-based pattern recognition method for BSE diagnosis.159 Mid-infrared spectroscopy of 641 serum samples was performed, and four classification algorithms (linear discriminant analysis, LDA; robust linear discriminant analysis, RLDA; ANN; and SVM) were used to characterize the samples as BSE-positive or BSE-negative. The four classifiers were then tested on a supplementary set of 160 samples (84 BSE-positive and 76 BSE-negative). For this test set, ANN had the highest sensitivity (ANN 93%, SVM 88%, LDA 82%, RLDA 80%), whereas SVM had the highest specificity (SVM 99%, LDA 93%, ANN 93%, RLDA 88%).

After the emergence of mad cow disease, the European Union regulatory agencies banned processed animal proteins (meat and bone meal, MBM) from feedstuffs destined for farm animals that are kept, fattened, or bred for the production of food. A near-infrared SVM (NIR–SVM) system based on plane-array near-infrared imaging spectroscopy was proposed by Pierna et al. to detect MBM.160 Learning was based on NIR spectra from 26 pure animal meals and 59 pure vegetal meals, with a total of 5521 spectra collected (2233 animal and 3288 vegetal). An LOO comparative evaluation of PLS, ANN, and SVM showed that support vector machines have the lowest RMSE: SVM with RBF kernel 0.102, ANN 0.139, and PLS 0.397.

Wet granulation and direct compression are two methods used to manufacture tablets in the pharmaceutical industry. Zomer et al. used pyrolysis–gas chromatography–mass spectrometry coupled with SVM classification to discriminate between the two tablet production methods.161 Mass spectra data were submitted to a PCA analysis, and the first principal components were used as input for SVM models having linear, polynomial, and Gaussian RBF kernels. SVM classifiers with polynomial and RBF kernels performed better in prediction than discriminant analysis.

The pathological condition induced by exposing an organism to a toxic substance depends on the mode of administration, the quantity, and the type of dosage (acute or chronic). Urine profiling by β-cyclodextrin-modified micellar electrokinetic capillary electrophoresis was used by Zomer et al. to identify the type of cadmium intoxication (acute or chronic).162 Their dataset of 96 samples was split into a learning set of 60 samples and a test set of 36 samples. Discriminant analysis applied to the first six principal components had better results on the test set (96.97% correctly classified) than did SVM trained on the original measured data (75.76% correctly classified).

NIR spectroscopy is often used for the nondestructive measurement of chemicals in various materials. The application of least-squares SVM regression (LS–SVMR) to predicting mixture composition from NIR spectra was investigated by Thissen et al.163 NIR spectra of ternary mixtures of ethanol, water, and 2-propanol were measured at 30, 40, 50, 60, and 70 °C. The learning set consisted of 13 mixtures per temperature, whereas the test set consisted of 6 mixtures per temperature. For the test set, the least-squares SVM approach had an RMSE 2.6 times lower than that from a PLS analysis.

Chauchard et al. investigated the ability of least-squares SVM regression to predict the acidity of different grape varieties from NIR spectra.164 NIR scans between 680 and 1100 nm were collected for 371 grape samples from three varieties: carignan (188 samples), mourvèdre (84 samples), and ugni blanc (99 samples). The total acidity (malic and tartaric acid concentrations) was measured with an HPLC assay. The PLS model selected 68 wavelengths from the NIR spectra, and with eight principal factors it gave a prediction q2 = 0.76 and a test correlation coefficient R2 = 0.77. Using 10 principal factors, LS–SVMR models were more predictive than PLS, with q2 = 0.83 and R2 = 0.86. A comparison between an MLR with eight wavelengths (q2 = 0.69 and R2 = 0.68) and an LS–SVMR obtained for the same wavelengths (q2 = 0.77 and R2 = 0.78) showed a significant improvement for the support vector machines model.
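
Least-squares SVM regression, used in the two studies above, replaces the quadratic program of standard SVMR by a single linear system: in the usual formulation (Suykens et al.20), the bias b and the coefficients alpha are obtained from [0, 1^T; 1, K + I/gamma][b; alpha] = [0; y], where K is the kernel matrix and gamma a regularization parameter. The numpy sketch below illustrates this on synthetic data; the RBF width and gamma are arbitrary choices.

```python
# Minimal LS-SVM regression: solve one linear KKT system instead of a QP.
# Data, kernel width, and the regularization parameter gamma are toy values.
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=60)

gamma = 100.0
K = rbf_kernel(X, X)
n = len(y)
# Block system: [[0, 1^T], [1, K + I/gamma]] @ [b, alpha] = [0, y]
A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
              [np.ones((n, 1)),  K + np.eye(n) / gamma]])
sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
b, alpha = sol[0], sol[1:]

X_new = np.linspace(-3, 3, 7)[:, None]
print(rbf_kernel(X_new, X) @ alpha + b)   # LS-SVM predictions
```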

PLS and SVM regression were compared in their ability to predict, from Raman spectra, the monomer masses for the copolymerization of methyl methacrylate and butyl acrylate in toluene.165 The high- and low-resolution Raman spectra of 37 training samples were used to compute the regression models, which were subsequently tested on 41 test samples. For the high-resolution spectra, the mean relative errors were 3.9% for SVMR and 10.1% for PLS. For the low-resolution spectra, these errors were 22.8% for SVMR and 68.0% for PLS. In general, SVMR with a degree 1 polynomial kernel gave the best predictions, which shows that a linear SVM model predicts better than a linear PLS model for this type of analysis.

Active learning support vector machines (AL–SVMs) were used by Zomer et al. to identify beach sand polluted with either gasoline or crude oil.166 A total of 220 samples were split into 106 learning samples and 114 test samples. Each sample was analyzed using HS–MS (a head-space sampler coupled to a mass spectrometer), with the mass spectra recorded in the range m/z 49–160. The results obtained by Zomer et al. show that the active learning procedure is effective in selecting a small subset of training samples, thus greatly reducing the number of experiments necessary to obtain a predictive model.

Chemometrics techniques are usually applied in capillary electrophoresis to obtain optimum resolution of the peaks, lower detection limits, shorter migration times, good peak shapes, higher precision, and better signal-to-noise ratios. Optimum separation conditions in capillary electrophoresis were determined by Zhai et al. by combining a genetic algorithm with least-squares support vector machines.167 The optimization targets of the genetic algorithm were to increase the peak resolution, symmetry, and height, and to decrease the migration time. The study involved the identification of four compounds with anti-tumor activity. The optimizable parameters were the voltage and the electrophoresis buffer composition, whereas the measured output parameters were the migration time, height, and width of each of the four peaks. The correlation coefficient for LS–SVM LOO cross-validation was 0.978. By combining the simulation results of the LS–SVM with a fitness function, the genetic algorithm finds an optimum combination of experimental conditions for the capillary electrophoresis separation.

Sensors

Heat treatment of milk ensures its microbial safety and increases its shelf life. Different heat treatments (UHT, pasteurized, sterilized) can be distinguished by analyzing the volatile compounds with an electronic nose. A hybrid system that combines an electronic nose with an SVM classifier was tested by Brudzewski, Osowski, and Markiewicz for milk recognition and classification.168 The electronic nose was composed of seven tin oxide-based gas sensors, and the SVM model was tested with linear and RBF kernels.

In the first experiment, four brands (classes) of milk were discriminated, with each class containing 180 samples. In the second experiment, the UHT milk from one producer was classified according to fat content, again with 180 samples for each of the four brands. For each brand, 90 samples were used for learning and 90 samples for testing the SVM classifier. The prediction was perfect for both experiments and all brands of milk.

Measurements collected from an electronic nose were used by Sadik et al. in an SVM classification system to identify several organophosphates.169 The following organophosphates were tested: parathion, malathion, dichlorvos, trichlorfon, paraoxon, and diazinon. The electronic nose contained 32 conducting polymer sensors whose output signal was processed and fed into the SVM classifier for one-versus-one and one-versus-all classification. A total of 250 measurements were recorded for each of the six organophosphates, and a L20%O cross-validation procedure was implemented. Four kernels were tested, namely linear, Gaussian RBF, polynomial, and the S2000 kernel, K(x1, x2) = ||x1 - x2||^2. In all experiments, the SVM performed better than a neural network.

An electronic nose and an SVM classifier were evaluated by Distante, Ancona, and Siciliano for the recognition of pentanone, hexanal, water, acetone, and three mixtures of pentanone and hexanal in different concentrations.170 In a LOO test, the SVM classifier with a degree 2 polynomial kernel gave the best predictions: SVM 4.5% error, RBF neural network 15% error, and multilayer feed-forward ANN 40% error.

Seven types of espresso coffee were classified by Pardo and Sberveglieri with a system composed of an electronic nose and an SVM with polynomial and Gaussian RBF kernels.171 For each coffee type, 36 measurements were performed with an electronic nose equipped with five thin-film semiconductor sensors based on SnO2 and Ti-Fe. The output signal from the sensors was submitted to a PCA analysis, whose principal components (between 2 and 5) represented the input data for the SVM classifier. The error surface corresponding to various kernel parameters and numbers of input principal components was investigated.

Gasoline supplemented with alcohol or ethers has an enhanced octane number. Adding 10 vol% ethanol, for example, increases the octane number by 2.5 or more units. The most popular ether additive is methyl tertiary butyl ether (MTBE), followed by ethyl tertiary butyl ether (ETBE) and tertiary amyl methyl ether. MTBE adds 2.5–3.0 octane numbers to gasoline and is used in 25% of all U.S. gasoline. Brudzewski et al. used an electronic nose and support vector machines to identify gasoline supplemented with ethanol, MTBE, ETBE, and benzene.172 The electronic nose was composed of seven tin oxide-based gas sensors. Twelve gasoline blend types were prepared, and a total of 432 measurements were performed with the electronic nose. In a six-fold cross-validation experiment, it was found that SVM with linear, degree 2 polynomial, and Gaussian RBF kernels achieved perfect classification.
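
The one-versus-one and one-versus-all decompositions used by Sadik et al. are generic recipes for building a multiclass classifier from binary SVMs: the former trains a classifier for every pair of classes and lets them vote, whereas the latter trains one classifier per class against all the rest. A compact sketch on synthetic six-class data (our illustration; the kernel and data are arbitrary):

```python
# Sketch contrasting one-versus-one and one-versus-all multiclass SVMs
# on synthetic data standing in for six-analyte sensor measurements.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=32, n_classes=6,
                           n_informative=10, random_state=0)

for name, clf in [("one-vs-one", OneVsOneClassifier(SVC(kernel="rbf"))),
                  ("one-vs-all", OneVsRestClassifier(SVC(kernel="rbf")))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print("%s accuracy: %.3f" % (name, acc))
```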

Bicego used a similarity-based representation of electronic nose measurements for odor classification with the SVM method.173 In the similarity-based representation, the raw data from the sensors are transformed into pairwise (dis)similarities, i.e., distances between objects in the dataset. The electronic nose is an array of eight carbon black–polymer detectors. The system was tested on the recognition of 2-propanol, acetone, and ethanol, with 34 experiments for each compound. Two series of 102 experiments were performed, the first with data recorded after 10 minutes of exposure and the second with data recorded after 1 second of exposure. The one-versus-one cross-validation accuracy for the first group of experiments was 99% for similarity computed using the Euclidean metric. For the second group of experiments, the accuracy was 79% for the Euclidean metric and 80% for the Manhattan metric.

Chemical Engineering

Hybrid systems (ANN–GA and SVMR–GA) were compared by Nandi et al. for their ability to model and optimize the isopropylation of benzene on Hbeta catalyst.73 The input parameters used to model the reaction were the temperature, pressure, benzene-to-isopropyl alcohol ratio, and weight hourly space velocity. The output parameters were the yield of isopropylbenzene and the selectivity S, where S = 100 × (weight of isopropylbenzene formed per unit time)/(weight of total aromatics formed per unit time). Based on 42 experiments, the genetic algorithm component was used to select the optimum set of input parameters that maximize both yield and selectivity. The GA-optimized solutions were then verified experimentally, showing that the two hybrid methods can be used to optimize industrial processes.

The SVM classification of vertical and horizontal two-phase flow regimes in pipes was investigated by Trafalis, Oladunni, and Papavassiliou.174 The vertical flow dataset, with 424 cases, had three classes, whereas the horizontal flow dataset, with 2272 cases, had five classes. One-versus-one multiclass SVM models were developed with polynomial kernels (degrees 1 to 4). The transition region is determined with respect to the pipe diameter, superficial gas velocity, and superficial liquid velocity. Compared with experimental observations, the predictions of the SVM model were, in most cases, superior to those obtained from other types of theoretical models.

Locally weighted regression was extended by Lee et al. to support vector machines and tested on the synthesis of polyvinyl butyrate (PVB).175 Weighted SVM regression has a variable capacity C that depends on a weight computed for each data point. The weighted SVM regression was computed with ε = 0.001, and three kernels were tested: polynomial, Gaussian RBF, and neural (tanh). A dataset of 120 patterns was divided into 80 training patterns, 20 validation patterns, and 20 test patterns. Each pattern consisted of 12 measurements of controlled variables (such as the viscosity and concentration of PVB, the quantities of the first and second catalyst, the reaction time, and the temperature) and one product property, the PVB viscosity. A comparative test showed that the weighted SVM regression has the lowest error, with RMSE = 23.9, compared with RMSE = 34.9 for standard SVM regression and RMSE = 109.4 for a neural network.

Chu, Qin, and Han applied an SVM classification model to fault detection and identification of the operation mode in processes with multimode operations.176 They studied rapid thermal annealing, a critical semiconductor process used to stabilize the structure of silicon wafers and to make the physical properties of the whole wafer uniform after ion implantation. A dataset of 1848 batch data was divided into 1000 learning data and 848 test data. Input data for the SVM model were selected with an entropy-based algorithm, and 62 input parameters were used to train three SVM classification models. The system based on SVM is superior to the conventional PCA fault detection method.

The melt index of thermoplastic polymers like polypropylene (PP) and polystyrene is defined as the mass rate of extrusion flow through a specified capillary under prescribed conditions of temperature and pressure. The melt indices of polypropylene and styrene-acrylonitrile (SAN) polymerization were modeled by Han, Han, and Chung with PLS, ANN, and SVM regression having a Gaussian RBF kernel.177 For the SAN polymerization, 33 process variables were measured for 1024 training data and 100 testing data. The test set RMSE shows that the best predictions were obtained with SVM regression: SVMR 0.97, ANN 1.09, and PLS 3.15. For the PP synthesis, 78 process variables were measured for 467 training data and 50 testing data. The melt index of PP is best predicted by SVMR, as shown by the corresponding RMSE values: SVMR 1.51, PLS 2.08, and ANN 3.07.
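
The variable-capacity idea behind the locally weighted SVMR of Lee et al. can be mimicked with per-sample weights that scale the penalty C pattern by pattern. A hedged sketch follows; the synthetic process data, the Gaussian weighting scheme, and all parameter values are our own illustrative choices, not the published model.

```python
# Sketch of locally weighted SVM regression: each training pattern gets a
# weight that scales its capacity C, so patterns near the query dominate.
# The process data, weighting bandwidth, and SVR settings are toy values.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(120, 3))        # controlled variables
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.3, size=120)

query = np.array([5.0, 5.0, 5.0])                # current operating point
weights = np.exp(-np.linalg.norm(X - query, axis=1) ** 2 / 10.0)

model = SVR(kernel="rbf", C=10.0, epsilon=0.001)
model.fit(X, y, sample_weight=weights)           # per-pattern capacity
print("local prediction at query:", model.predict(query[None, :]))
```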

Text Mining for Scientific Information

Automatic text data mining is an important source of knowledge, with many applications in generating databases from the scientific literature, such as protein–disease associations, gene expression patterns, subcellular localization, and protein–protein interactions. The NLProt system developed by Mika and Rost combines four support vector machines, trained individually for distinct tasks.178,179 The first SVM is trained to recognize protein names, whereas the second learns the environment in which a protein name appears. The third SVM is trained on both protein names and their environments. The output of these three SVMs and a score from a protein dictionary are fed into the fourth SVM, which provides as output the protein whose name was identified in the text. A dictionary of protein names was generated from SwissProt and TrEMBL, whereas the Merriam-Webster Dictionary was used as a source of common words.

Other terms were added to the dictionary, such as medical terms, species names, and tissue types. The system has a 75% accuracy, and in a test on recent abstracts from Cell and EMBO Journal, NLProt reached 70% accuracy.

An SVM approach to name recognition in text was used by Shi and Campagne to develop a protein dictionary.180 A database of 80,528 full-text articles from the Journal of Biological Chemistry, EMBO Journal, and Proceedings of the National Academy of Sciences was used as input to the SVM system, and a dictionary of 59,990 protein names was produced. Three support vector machines were trained to discriminate among protein names and cell names, process names, and interaction keywords, respectively. The processing time is half a second for a new full-text paper, and the method can recognize name variants not found in SwissProt.

Using PubMed abstracts, the PreBIND system can identify protein–protein interactions with an SVM system.181 The protein–protein interactions identified by the automated PreBIND system are then combined and scrutinized manually to produce the BIND database (http://bind.ca). Based on a L10%O cross-validation of a dataset of 1094 abstracts, the SVM approach had a precision and recall of 92%, whereas a naïve Bayes classifier had a precision and recall of 87%.

Bio-medical terms can be recognized and annotated with SVM-based automated systems, as shown by Takeuchi and Collier.182 The training was performed with 100 Medline abstracts in which bio-medical terms were marked up manually in XML by an expert. The SVM system recognized approximately 3400 terms and showed good prediction capability for each class of terms (proteins, DNA, RNA, source, etc.).

Bunescu et al. compared the ability of several machine learning systems to extract information regarding protein names and their interactions from Medline abstracts.183 The text recognition systems compared are dictionary based, the rule learning system Rapier, boosted wrapper induction, SVM, maximum entropy, k-nearest neighbors, and two systems for protein name identification, KEX and Abgene. Based on the F-measure (the harmonic mean of precision and recall) in L10%O cross-validation, the best systems for protein name recognition are maximum entropy with dictionary (F = 57.86%) followed by SVM with dictionary (F = 54.42%).

SVM RESOURCES ON THE WEB

The Internet is a vast source of information on support vector machines. The interested reader can find tutorials, reviews, theoretical and application papers, as well as a wide range of SVM software. In this section, we present several starting points for retrieving relevant SVM information from the Web.

http://support-vector-machines.org/. This Web portal is dedicated to support vector machines and their applications. It provides exhaustive lists of books, tutorials, publications (with special sections for applications in cheminformatics, bioinformatics, and computational biology), software for various platforms, and links to datasets that can be used for SVM classification and regression. Very useful are the links to open-access SVM papers. The site also offers a list of SVM-related conferences.

http://www.kernel-machines.org/. This portal contains links to websites related to kernel methods. Included are tutorials, publications, books, software, datasets used to compare algorithms, and conference announcements. A list of major scientists in kernel methods is also available from this site.

http://www.support-vector.net/. This website is a companion to the book An Introduction to Support Vector Machines by Cristianini and Shawe-Taylor,14 and it has a useful list of SVM software.

http://www.kernel-methods.net/. This website is a companion to the book Kernel Methods for Pattern Analysis by Shawe-Taylor and Cristianini.21 The MATLAB scripts from the book can be downloaded from this site. A tutorial on kernel methods is also available.

http://www.learning-with-kernels.org/. Several chapters on SVM from the book Learning with Kernels by Schölkopf and Smola17 are available from this site.

http://www.boosting.org/. This is a portal for boosting and related ensemble learning methods, such as arcing and bagging, with applications to model selection and connections to mathematical programming and large margin classifiers. The site provides links to software, papers, datasets, and upcoming events.

Journal of Machine Learning Research, http://jmlr.csail.mit.edu/. The Journal of Machine Learning Research is an open-access journal that contains many papers on SVM, including new algorithms and SVM model optimization. All papers can be downloaded and printed for free. In the current context of widespread progress toward open access to scientific publications, this journal has a remarkable story and is an undisputed success.

http://citeseer.ist.psu.edu/burges98tutorial.html. This is an online reprint of Burges's SVM tutorial "A Tutorial on Support Vector Machines for Pattern Recognition."23 The CiteSeer repository has many useful SVM manuscripts.

PubMed, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed. This is a comprehensive database of abstracts for the chemistry, biochemistry, biology, and medicine-related literature. PubMed is a free service of the National Library of Medicine and is a great place to start a search for SVM-related papers. PubMed has direct links to many online journals, which are particularly useful for open-access journals, such as Bioinformatics or Nucleic Acids Research. All SVM applications in cheminformatics from major journals are indexed here, but unfortunately, the relevant chemistry journals are not open access. On the other hand, PubMed is the main hub for open access to important SVM applications, such as gene arrays, proteomics, or toxicogenomics.


PubMed Central, http://www.pubmedcentral.nih.gov/. PubMed Central (PMC) is the U.S. National Institutes of Health (NIH) free digital archive of biomedical and life sciences journal literature. It represents the main public repository for journals that publish open-access papers. The site contains information regarding the NIH initiative for open-access publication of NIH-funded research. Numerous papers can be found on SVM applications in bioinformatics and computational biology.

SVM SOFTWARE

Fortunately, scientists interested in SVM applications in cheminformatics and computational chemistry can choose from a wide variety of free software available for download from the Internet. The selection criteria for a useful package are the problem type (classification or regression); the platform (Windows, Linux/UNIX, Java, MATLAB, R); the available kernels (the more the better); flexibility in adding new kernels; and the possibility of performing cross-validation or descriptor selection. Collected here is relevant information for the most popular SVM packages. All are free for nonprofit use, but they come with little or no support. On the other hand, they are straightforward to use, are accompanied by extensive documentation, and almost all are available as source code. For users wanting to avoid compilation-related problems, many packages are available as Windows binaries. A popular option is the use of SVM scripts in computing environments such as MATLAB, R, Scilab, Torch, YaLE, or Weka (the last five are free). For small problems, the Gist server is a viable option. The list of SVM software presented below is ordered in approximately decreasing frequency of use.

SVMlight, http://svmlight.joachims.org/. SVMlight, by Joachims,184 is one of the most widely used SVM classification and regression packages. It has a fast optimization algorithm, can be applied to very large datasets, and has a very efficient implementation of the leave–one–out cross-validation. It is distributed as C++ source and binaries for Linux, Windows, Cygwin, and Solaris. Kernels available include polynomial, radial basis function, and neural (tanh).

SVMstruct, http://svmlight.joachims.org/svm_struct.html. SVMstruct, by Joachims, is an SVM implementation that can model complex (multivariate) output data y, such as trees, sequences, or sets. These complex output SVM models can be applied to natural language parsing, sequence alignment in protein homology detection, and Markov models for part-of-speech tagging. Several implementations exist: SVMmulticlass, for multiclass classification; SVMcfg, which learns a weighted context-free grammar from examples; SVMalign, which learns to align protein sequences from training alignments; and SVMhmm, which learns a Markov model from examples. These modules have straightforward applications in bioinformatics, but one can imagine significant implementations for cheminformatics, especially when the chemical structure is represented as trees or sequences.

mySVM, http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/index.html. mySVM, by Rüping, is a C++ implementation of SVM classification and regression. It is available as C++ source code and Windows binaries. Kernels available include linear, polynomial, radial basis function, neural (tanh), and anova. All SVM models presented in this chapter were computed with mySVM.

JmySVM. A Java version of mySVM is part of the YaLE (Yet Another Learning Environment, http://www-ai.cs.uni-dortmund.de/SOFTWARE/YALE/index.html) learning environment under the name JmySVM.

mySVM/db, http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVMDB/index.html. mySVM/db is an efficient extension of mySVM that is designed to run directly inside a relational database using an internal Java engine. It was tested with an Oracle database, but with small modifications it should also run on any database offering a JDBC interface. It is especially useful for large datasets available as relational databases.

LIBSVM, http://www.csie.ntu.edu.tw/~cjlin/libsvm/. LIBSVM (Library for Support Vector Machines) was developed by Chang and Lin and contains C-classification, ν-classification, ε-regression, and ν-regression. Developed in C++ and Java, it also supports multiclass classification, weighted SVMs for unbalanced data, cross-validation, and automatic model selection. It has interfaces for Python, R, Splus, MATLAB, Perl, Ruby, and LabVIEW. Kernels available include linear, polynomial, radial basis function, and neural (tanh).

looms, http://www.csie.ntu.edu.tw/~cjlin/looms/. looms, by Lee and Lin, is a very efficient leave–one–out model selection tool for SVM two-class classification. Although LOO cross-validation is usually too time consuming for large datasets, looms implements numerical procedures that make LOO accessible. Given a range of parameters, looms automatically returns the parameter and model with the best LOO statistics. It is available as C source code and Windows binaries.

BSVM, http://www.csie.ntu.edu.tw/~cjlin/bsvm/. BSVM, authored by Hsu and Lin, provides two implementations of multiclass classification, together with SVM regression. It is available as source code for UNIX/Linux and as binaries for Windows.

OSU SVM Classifier Matlab Toolbox, http://www.ece.osu.edu/~maj/osu_svm/. This MATLAB toolbox is based on LIBSVM.

SVMTorch, http://www.idiap.ch/learning/SVMTorch.html. SVMTorch, by Collobert and Bengio,185 is part of the Torch machine learning library (http://www.torch.ch/) and implements SVM classification and regression. It is distributed as C++ source code or binaries for Linux and Solaris.
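
Several of the packages above are also reachable from general-purpose environments; for example, the Python library scikit-learn builds its SVC, NuSVC, and SVR classes on LIBSVM. As a hedged illustration of typical usage (the dataset and the parameter grid below are arbitrary choices of ours), a ν-classification run with grid-searched parameters looks like this:

```python
# Illustrative LIBSVM usage through the scikit-learn wrapper (NuSVC);
# the dataset and the (nu, gamma) grid below are arbitrary choices.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import NuSVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

search = GridSearchCV(NuSVC(kernel="rbf"),
                      param_grid={"nu": [0.25, 0.50, 0.75],
                                  "gamma": [0.01, 0.1, 1.0]},
                      cv=5)
search.fit(X, y)
print(search.best_params_, "CV accuracy: %.3f" % search.best_score_)
```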

Weka, http://www.cs.waikato.ac.nz/ml/weka/. Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from Java code. It contains an SVM implementation.

SVM in R, http://cran.r-project.org/src/contrib/Descriptions/e1071.html. This SVM implementation in R (http://www.r-project.org/) contains C-classification, ν-classification, ε-regression, and ν-regression. Kernels available include linear, polynomial, radial basis, and neural (tanh).

M-SVM, http://www.loria.fr/~guermeur/. This is a multiclass SVM implementation in C by Guermeur.52,53

Gist, http://microarray.cpmc.columbia.edu/gist/. Gist is a C implementation of support vector machine classification and kernel principal components analysis. The SVM part of Gist is available as an interactive Web server at http://svm.sdsc.edu. It is a very convenient server for users who want to experiment with small datasets (hundreds of patterns). Kernels available include linear, polynomial, and radial.

MATLAB SVM Toolbox, http://www.isis.ecs.soton.ac.uk/resources/svminfo/. This SVM toolbox, by Gunn, implements SVM classification and regression with various kernels, including linear, polynomial, Gaussian radial basis function, exponential radial basis function, neural (tanh), Fourier series, spline, and B-spline. All figures in this chapter presenting SVM models for various datasets were prepared with a slightly modified version of this MATLAB toolbox.

TinySVM, http://chasen.org/~taku/software/TinySVM/. TinySVM is a C++ implementation of C-classification and C-regression that uses a sparse vector representation. It can handle several thousand training examples and feature dimensions. TinySVM is distributed as binary/source for Linux and binary for Windows.

SmartLab, http://www.smartlab.dibe.unige.it/. SmartLab provides several support vector machine implementations, including cSVM, a Windows and Linux implementation of two-class classification; mcSVM, a Windows and Linux implementation of multiclass classification; rSVM, a Windows and Linux implementation of regression; and javaSVM1 and javaSVM2, which are Java applets for SVM classification.

Gini-SVM, http://bach.ece.jhu.edu/svm/ginisvm/. Gini-SVM, by Chakrabartty and Cauwenberghs, is a multiclass probability regression engine that generates conditional probability distributions as a solution. It is available as source code.

GPDT, http://dm.unife.it/gpdt/. GPDT, by Serafini et al., is a C++ implementation for large-scale SVM classification in both scalar and distributed-memory parallel environments. It is available as C++ source code and Windows binaries.

HeroSvm, http://www.cenparmi.concordia.ca/people/jdong/HeroSvm.html. HeroSvm, by Dong, is developed in C++, implements SVM classification, and is distributed as a dynamic link library for Windows. Kernels available include linear, polynomial, and radial basis function.

Spider, http://www.kyb.tuebingen.mpg.de/bs/people/spider/. Spider is an object-oriented environment for machine learning in MATLAB. It handles unsupervised, supervised, or semi-supervised machine learning problems and includes training, testing, model selection, cross-validation, and statistical tests. Spider implements SVM multiclass classification and regression.

Java applets, http://svm.dcs.rhbnc.ac.uk/. These SVM classification and regression Java applets were developed by members of Royal Holloway, University of London, and the AT&T Speech and Image Processing Services Research Laboratory. SVM classification is available at http://svm.dcs.rhbnc.ac.uk/pagesnew/GPat.shtml, and SVM regression at http://svm.dcs.rhbnc.ac.uk/pagesnew/1D-Reg.shtml.

LEARNSC, http://www.support-vector.ws/html/downloads.html. This site contains MATLAB scripts for the book Learning and Soft Computing by Kecman.16 LEARNSC implements SVM classification and regression.

Tree Kernels, http://ai-nlp.info.uniroma2.it/moschitti/Tree-Kernel.htm. Tree Kernels, by Moschitti, is an extension of SVMlight obtained by encoding tree kernels. It is available as binaries for Windows, Linux, Mac OS X, and Solaris. Tree kernels are suitable for encoding chemical structures, and thus this package brings significant capabilities for cheminformatics applications.

LS-SVMlab, http://www.esat.kuleuven.ac.be/sista/lssvmlab/. LS-SVMlab, by Suykens, is a MATLAB implementation of least-squares support vector machines (LS–SVMs), a reformulation of the standard SVM that leads to solving linear KKT systems. LS–SVM primal–dual formulations have been developed for kernel PCA, kernel CCA, and kernel PLS, thereby extending the class of primal–dual kernel machines. Links between kernel versions of classic pattern recognition algorithms, such as kernel Fisher discriminant analysis, and extensions to unsupervised learning, recurrent networks, and control are available.

MATLAB SVM Toolbox, http://www.igi.tugraz.at/aschwaig/software.html. This is a MATLAB SVM classification implementation that can handle 1-norm and 2-norm SVM (linear or quadratic loss function) problems.

SVM/LOO, http://bach.ece.jhu.edu/pub/gert/svm/incremental/. SVM/LOO, by Cauwenberghs, has a very efficient MATLAB implementation of the leave–one–out cross-validation.

SVMsequel, http://www.isi.edu/~hdaume/SVMsequel/. SVMsequel, by Daumé III, is an SVM multiclass classification package, distributed as C source or as binaries for Linux or Solaris. Kernels available include linear, polynomial, radial basis function, sigmoid, string, tree, and information diffusion on discrete manifolds.

LSVM, http://www.cs.wisc.edu/dmi/lsvm/. LSVM (Lagrangian Support Vector Machine) is a very fast SVM implementation in MATLAB by Mangasarian and Musicant. It can classify datasets containing several million patterns.

ASVM, http://www.cs.wisc.edu/dmi/asvm/. ASVM (Active Support Vector Machine) is a very fast linear SVM script for MATLAB, by Musicant and Mangasarian, developed for large datasets.

PSVM, http://www.cs.wisc.edu/dmi/svm/psvm/. PSVM (Proximal Support Vector Machine) is a MATLAB script by Fung and Mangasarian that classifies patterns by assigning them to the closest of two parallel planes.

SimpleSVM Toolbox, http://asi.insa-rouen.fr/~gloosli/simpleSVM.html. SimpleSVM Toolbox is a MATLAB implementation of the SimpleSVM algorithm.

SVM Toolbox, http://asi.insa-rouen.fr/%7Earakotom/toolbox/index. This fairly complex MATLAB toolbox contains many algorithms, including classification using linear and quadratic penalization, multiclass classification, ε-regression, ν-regression, wavelet kernels, and SVM feature selection.

MATLAB SVM Toolbox, http://theoval.sys.uea.ac.uk/~gcc/svm/toolbox/. Developed by Cawley, this software has the standard SVM features, together with multiclass classification and leave–one–out cross-validation.

R-SVM, http://www.biostat.harvard.edu/~xzhang/R-SVM/R-SVM.html. R-SVM, by Zhang and Wong, is based on SVMTorch and is designed especially for the classification of microarray gene expression data. R-SVM uses SVM for classification and for selecting a subset of relevant genes according to their relative contribution to the classification. This process is done recursively, so that a series of gene subsets and classification models is obtained at different levels of gene selection. The performance of the classification can be evaluated either on an independent test dataset or by cross-validation on the same dataset. R-SVM is distributed as a Linux binary.

JSVM, http://www-cad.eecs.berkeley.edu/~hwawen/research/projects/jsvm/doc/manual/index.html. JSVM is a Java wrapper for SVMlight.

SvmFu, http://five-percent-nation.mit.edu/SvmFu/. SvmFu, by Rifkin, is a C++ package for SVM classification. Kernels available include linear, polynomial, and Gaussian radial basis function.

CONCLUSIONS

Kernel learning algorithms have received considerable attention in data modeling and prediction because kernels can straightforwardly perform a nonlinear mapping of the data into a high-dimensional feature space. As a consequence, linear models can be transformed easily into nonlinear algorithms that in turn can explore complex relationships between the input data and the predicted property. Kernel algorithms have applications in classification, clustering, and regression. From the diversity of kernel methods (support vector machines, Gaussian processes, kernel recursive least squares, kernel principal component analysis, kernel perceptron learning, relevance vector machines, kernel Fisher discriminants, Bayes point machines, and kernel Gram–Schmidt), only SVM has been readily adopted for QSAR and cheminformatics applications. Support vector machines represent the most important development in chemometrics after (chronologically) partial least squares and artificial neural networks.

We have presented numerous SAR and QSAR examples in this chapter that demonstrate the SVM capabilities for both classification and regression. These examples showed that the nonlinear features of SVM should be used with caution, because this added flexibility in modeling the data brings with it the danger of overfitting. The literature results reviewed here show that support vector machines already have numerous applications in computational chemistry and cheminformatics. Future developments are expected to improve the performance of SVM regression and to explore the use of SVM in jury ensembles as an effective way to increase their prediction power.

REFERENCES

1. V. Vapnik and A. Lerner, Automat. Remote Contr., 24, 774–780 (1963). Pattern Recognition Using Generalized Portrait Method.
2. V. Vapnik and A. Chervonenkis, Theory of Pattern Recognition, Nauka, Moscow, Russia, 1974.
3. V. Vapnik, Estimation of Dependencies Based on Empirical Data, Nauka, Moscow, Russia, 1979.
4. V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
5. V. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, 1998.
6. C. Cortes and V. Vapnik, Mach. Learn., 20, 273–297 (1995). Support-Vector Networks.
7. B. Schölkopf, K. K. Sung, C. J. C. Burges, F. Girosi, P. Niyogi, T. Poggio, and V. Vapnik, IEEE Trans. Signal Process., 45, 2758–2765 (1997). Comparing Support Vector Machines with Gaussian Kernels to Radial Basis Function Classifiers.
8. O. Chapelle, P. Haffner, and V. N. Vapnik, IEEE Trans. Neural Netw., 10, 1055–1064 (1999). Support Vector Machines for Histogram-based Image Classification.
9. H. Drucker, D. H. Wu, and V. N. Vapnik, IEEE Trans. Neural Netw., 10, 1048–1054 (1999). Support Vector Machines for Spam Categorization.
10. V. N. Vapnik, IEEE Trans. Neural Netw., 10, 988–999 (1999). An Overview of Statistical Learning Theory.
11. V. Vapnik and O. Chapelle, Neural Comput., 12, 2013–2036 (2000). Bounds on Error Expectation for Support Vector Machines.
12. I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, Mach. Learn., 46, 389–422 (2002). Gene Selection for Cancer Classification Using Support Vector Machines.
13. B. Schölkopf, C. J. C. Burges, and A. J. Smola, Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, Massachusetts, 1999.
14. N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press, Cambridge, United Kingdom, 2000.
15. A. J. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, Advances in Large Margin Classifiers, MIT Press, Cambridge, Massachusetts, 2000.
16. V. Kecman, Learning and Soft Computing, MIT Press, Cambridge, Massachusetts, 2001.
17. B. Schölkopf and A. J. Smola, Learning with Kernels, MIT Press, Cambridge, Massachusetts, 2002.
18. T. Joachims, Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms, Kluwer, Norwell, Massachusetts, 2002.
19. R. Herbrich, Learning Kernel Classifiers, MIT Press, Cambridge, Massachusetts, 2002.
20. J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle, Least Squares Support Vector Machines, World Scientific, Singapore, 2002.
21. J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge, United Kingdom, 2004.
22. A. J. Smola and B. Schölkopf, Algorithmica, 22, 211–231 (1998). On a Kernel-based Method for Pattern Recognition, Regression, Approximation, and Operator Inversion.

23. C. J. C. Burges, Data Min. Knowl. Discov., 2, 121–167 (1998). A Tutorial on Support Vector Machines for Pattern Recognition.
24. B. Schölkopf, S. Mika, C. J. C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, and A. J. Smola, IEEE Trans. Neural Netw., 10, 1000–1017 (1999). Input Space Versus Feature Space in Kernel-based Methods.
25. J. A. K. Suykens, Eur. J. Control, 7, 311–327 (2001). Support Vector Machines: A Nonlinear Modelling and Control Perspective.
26. K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, IEEE Trans. Neural Netw., 12, 181–201 (2001). An Introduction to Kernel-based Learning Algorithms.
27. C. Campbell, Neurocomputing, 48, 63–84 (2002). Kernel Methods: A Survey of Current Techniques.
28. B. Schölkopf and A. J. Smola, in Advanced Lectures on Machine Learning, Vol. 2600, Springer, New York, 2002, pp. 41–64. A Short Introduction to Learning with Kernels.
29. V. D. Sanchez, Neurocomputing, 55, 5–20 (2003). Advanced Support Vector Machines and Kernel Methods.
30. A. J. Smola and B. Schölkopf, Stat. Comput., 14, 199–222 (2004). A Tutorial on Support Vector Regression.
31. A. Kurup, R. Garg, D. J. Carini, and C. Hansch, Chem. Rev., 101, 2727–2750 (2001). Comparative QSAR: Angiotensin II Antagonists.
32. K. Varmuza, in Handbook of Chemoinformatics, J. Gasteiger, Ed., Vol. 3, Wiley-VCH, Weinheim, Germany, 2003, pp. 1098–1133. Multivariate Data Analysis in Chemistry.
33. O. Ivanciuc, in Handbook of Chemoinformatics, J. Gasteiger, Ed., Vol. 1, Wiley-VCH, Weinheim, Germany, 2003, pp. 103–138. Graph Theory in Chemistry.
34. O. Ivanciuc, in Handbook of Chemoinformatics, J. Gasteiger, Ed., Vol. 3, Wiley-VCH, Weinheim, Germany, 2003, pp. 981–1003. Topological Indices.
35. R. Todeschini and V. Consonni, in Handbook of Chemoinformatics, J. Gasteiger, Ed., Vol. 3, Wiley-VCH, Weinheim, Germany, 2003, pp. 1004–1033. Descriptors from Molecular Geometry.
36. P. Jurs, in Handbook of Chemoinformatics, J. Gasteiger, Ed., Vol. 3, Wiley-VCH, Weinheim, Germany, 2003, pp. 1314–1335. Quantitative Structure-Property Relationships.
37. L. Eriksson, H. Antti, E. Holmes, E. Johansson, T. Lundstedt, J. Shockcor, and S. Wold, in Handbook of Chemoinformatics, J. Gasteiger, Ed., Vol. 3, Wiley-VCH, Weinheim, Germany, 2003, pp. 1134–1166. Partial Least Squares (PLS) in Cheminformatics.
38. J. Zupan, in Handbook of Chemoinformatics, J. Gasteiger, Ed., Vol. 3, Wiley-VCH, Weinheim, Germany, 2003, pp. 1167–1215. Neural Networks.
39. A. von Homeyer, in Handbook of Chemoinformatics, J. Gasteiger, Ed., Vol. 3, Wiley-VCH, Weinheim, Germany, 2003, pp. 1239–1280. Evolutionary Algorithms and Their Applications in Chemistry.
40. R. Fletcher, Practical Methods of Optimization, 2nd ed., John Wiley and Sons, New York, 1987.
41. J. Platt, in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds., MIT Press, Cambridge, Massachusetts, 1999, pp. 185–208. Fast Training of Support Vector Machines Using Sequential Minimal Optimization.
42. J. Mercer, Phil. Trans. Roy. Soc. London A, 209, 415–446 (1909). Functions of Positive and Negative Type and Their Connection with the Theory of Integral Equations.
43. B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett, Neural Comput., 12, 1207–1245 (2000). New Support Vector Algorithms.
44. C. C. Chang and C. J. Lin, Neural Comput., 13, 2119–2147 (2001). Training ν-Support Vector Classifiers: Theory and Algorithms.
45. C. C. Chang and C. J. Lin, Neural Comput., 14, 1959–1977 (2002). Training ν-Support Vector Regression: Theory and Algorithms.

46. I. Steinwart, IEEE Trans. Pattern Anal. Mach. Intell., 25, 1274–1284 (2003). On the Optimal Parameter Choice for ν-Support Vector Machines.
47. P. H. Chen, C. J. Lin, and B. Schölkopf, Appl. Stoch. Models Bus. Ind., 21, 111–136 (2005). A Tutorial on ν-Support Vector Machines.
48. R. Debnath, N. Takahide, and H. Takahashi, Pattern Anal. Appl., 7, 164–175 (2004). A Decision-based One-against-one Method for Multi-class Support Vector Machine.
49. C. W. Hsu and C. J. Lin, IEEE Trans. Neural Netw., 13, 415–425 (2002). A Comparison of Methods for Multiclass Support Vector Machines.
50. R. Rifkin and A. Klautau, J. Mach. Learn. Res., 5, 101–141 (2004). In Defense of One-vs-all Classification.
51. C. Angulo, X. Parra, and A. Català, Neurocomputing, 55, 57–77 (2003). K-SVCR. A Support Vector Machine for Multi-class Classification.
52. Y. Guermeur, Pattern Anal. Appl., 5, 168–179 (2002). Combining Discriminant Models with New Multi-class SVMs.
53. Y. Guermeur, G. Pollastri, A. Elisseeff, D. Zelus, H. Paugam-Moisy, and P. Baldi, Neurocomputing, 56, 305–327 (2004). Combining Protein Secondary Structure Prediction Models with Ensemble Methods of Optimal Complexity.
54. A. Statnikov, C. F. Aliferis, I. Tsamardinos, D. Hardin, and S. Levy, Bioinformatics, 21, 631–643 (2005). A Comprehensive Evaluation of Multicategory Classification Methods for Microarray Gene Expression Cancer Diagnosis.
55. T. Li, C. L. Zhang, and M. Ogihara, Bioinformatics, 20, 2429–2437 (2004). A Comparative Study of Feature Selection and Multiclass Classification Methods for Tissue Classification Based on Gene Expression.
56. Y. Lee and C. K. Lee, Bioinformatics, 19, 1132–1139 (2003). Classification of Multiple Cancer Types by Multicategory Support Vector Machines Using Gene Expression Data.
57. S. H. Peng, Q. H. Xu, X. B. Ling, X. N. Peng, W. Du, and L. B. Chen, FEBS Lett., 555, 358–362 (2003). Molecular Classification of Cancer Types from Microarray Data Using the Combination of Genetic Algorithms and Support Vector Machines.
58. S. Ramaswamy, P. Tamayo, R. Rifkin, S. Mukherjee, C. H. Yeang, M. Angelo, C. Ladd, M. Reich, E. Latulippe, J. P. Mesirov, T. Poggio, W. Gerald, M. Loda, E. S. Lander, and T. R. Golub, Proc. Natl. Acad. Sci. U. S. A., 98, 15149–15154 (2001). Multiclass Cancer Diagnosis Using Tumor Gene Expression Signatures.
59. O. L. Mangasarian and D. R. Musicant, IEEE Trans. Pattern Anal. Mach. Intell., 22, 950–955 (2000). Robust Linear and Support Vector Regression.
60. O. L. Mangasarian and D. R. Musicant, Mach. Learn., 46, 255–269 (2002). Large Scale Kernel Regression via Linear Programming.
61. J. B. Gao, S. R. Gunn, and C. J. Harris, Neurocomputing, 55, 151–167 (2003). SVM Regression Through Variational Methods and its Sequential Implementation.
62. J. B. Gao, S. R. Gunn, and C. J. Harris, Neurocomputing, 50, 391–405 (2003). Mean Field Method for the Support Vector Machine Regression.
63. W. P. Walters and B. B. Goldman, Curr. Opin. Drug Discov. Dev., 8, 329–333 (2005). Feature Selection in Quantitative Structure-Activity Relationships.
64. D. J. Livingstone and D. W. Salt, in Reviews in Computational Chemistry, K. B. Lipkowitz, R. Larter, and T. R. Cundari, Eds., Vol. 21, Wiley-VCH, New York, 2005, pp. 287–348. Variable Selection - Spoilt for Choice?
65. J. Bi, K. P. Bennett, M. Embrechts, C. M. Breneman, and M. Song, J. Mach. Learn. Res., 3, 1229–1243 (2003). Dimensionality Reduction via Sparse Support Vector Machines.
66. L. Cao, C. K. Seng, Q. Gu, and H. P. Lee, Neural Comput. Appl., 11, 244–249 (2003). Saliency Analysis of Support Vector Machines for Gene Selection in Tissue Classification.
67. G. M. Fung and O. L. Mangasarian, Comput. Optim. Appl., 28, 185–202 (2004). A Feature Selection Newton Method for Support Vector Machine Classification.

References

395

68. R. Kumar, A. Kulkarni, V. K. Jayaraman, and B. D. Kulkarni, Internet Electron. J. Mol. Des., 3, 118–133 (2004). Structure–Activity Relationships Using Locally Linear Embedding Assisted by Support Vector and Lazy Learning Regressors. 69. Y. Xue, Z. R. Li, C. W. Yap, L. Z. Sun, X. Chen, and Y. Z. Chen, J. Chem. Inf. Comput. Sci., 44, 1630–1638 (2004). Effect of Molecular Descriptor Feature Selection in Support Vector Machine Classification of Pharmacokinetic and Toxicological Properties of Chemical Agents. 70. H. Fro¨hlich, J. K. Wegner, and A. Zell, QSAR Comb. Sci., 23, 311–318 (2004). Towards Optimal Descriptor Subset Selection with Support Vector Machines in Classification and Regression. 71. Y. Liu, J. Chem. Inf. Comput. Sci., 44, 1823–1828 (2004). A Comparative Study on Feature Selection Methods for Drug Discovery. 72. E. Byvatov and G. Schneider, J. Chem. Inf. Comput. Sci., 44, 993–999 (2004). SVM-based Feature Selection for Characterization of Focused Compound Collections. 73. S. Nandi, Y. Badhe, J. Lonari, U. Sridevi, B. S. Rao, S. S. Tambe, and B. D. Kulkarni, Chem. Eng. J., 97, 115–129 (2004). Hybrid Process Modeling and Optimization Strategies Integrating Neural Networks/Support Vector Regression and Genetic Algorithms: Study of Benzene Isopropylation on Hbeta Catalyst. 74. Y. Wang, I. V. Tetko, M. A. Hall, E. Frank, A. Facius, K. F. X. Mayer, and H. W. Mewes, Comput. Biol. Chem., 29, 37–46 (2005). Gene Selection from Microarray Data for Cancer Classification - A Machine Learning Approach. 75. N. Pochet, F. De Smet, J. A. K. Suykens, and B. L. R. De Moor, Bioinformatics, 20, 3185–3195 (2004). Systematic Benchmarking of Microarray Data Classification: Assessing the Role of Non-linearity and Dimensionality Reduction. 76. G. Natsoulis, L. El Ghaoui, G. R. G. Lanckriet, A. M. Tolley, F. Leroy, S. Dunlea, B. P. Eynon, C. I. Pearson, S. Tugendreich, and K. Jarnagin, Genome Res., 15, 724–736 (2005). Classification of a Large Microarray Data Set: Algorithm Comparison and Analysis of Drug Signatures. 77. X. Zhou and K. Z. Mao, Bioinformatics, 21, 1559–1564 (2005). LS Bound Based Gene Selection for DNA Microarray Data. 78. A. K. Jerebko, J. D. Malley, M. Franaszek, and R. M. Summers, Acad. Radiol., 12, 479–486 (2005). Support Vector Machines Committee Classification Method for Computer-aided Polyp Detection in CT Colonography. 79. K. Faceli, A. de Carvalho, and W. A. Silva, Genet. Mol. Biol., 27, 651–657 (2004). Evaluation of Gene Selection Metrics for Tumor Cell Classification. 80. L. B. Li, W. Jiang, X. Li, K. L. Moser, Z. Guo, L. Du, Q. J. Wang, E. J. Topol, Q. Wang, and S. Rao, Genomics, 85, 16–23 (2005). A Robust Hybrid between Genetic Algorithm and Support Vector Machine for Extracting an Optimal Feature Gene Subset. 81. C. A. Tsai, C. H. Chen, T. C. Lee, I. C. Ho, U. C. Yang, and J. J. Chen, DNA Cell Biol., 23, 607–614 (2004). Gene Selection for Sample Classifications in Microarray Experiments. 82. T. Downs, K. E. Gates, and A. Masters, J. Mach. Learn. Res., 2, 293–297 (2001). Exact Simplification of Support Vector Solutions. 83. Y. Q. Zhan and D. G. Shen, Pattern Recognit., 38, 157–161 (2005). Design Efficient Support Vector Machine for Fast Classification. 84. C. Merkwirth, H. A. Mauser, T. Schulz-Gasch, O. Roche, M. Stahl, and T. Lengauer, J. Chem. Inf. Comput. Sci., 44, 1971–1978 (2004). Ensemble Methods for Classification in Cheminformatics. 85. H. Briem and J. Gu¨nther, ChemBioChem, 6, 558–566 (2005). Classifying "Kinase Inhibitorlikeness" by Using Machine-learning Methods. 
86. C. W. Yap and Y. Z. Chen, J. Chem Inf. Model., 45, 982–992 (2005). Prediction of Cytochrome P450 3A4, 2D6, and 2C9 Inhibitors and Substrates by Using Support Vector Machines. 87. G. Valentini, M. Muselli, and F. Ruffino, Neurocomputing, 56, 461–466 (2004). Cancer Recognition with Bagged Ensembles of Support Vector Machines. 88. H. Saigo, J.-P. Vert, N. Ueda, and T. Akutsu, Bioinformatics, 20, 1682–1689 (2004). Protein Homology Detection Using String Alignment Kernels.

396

Applications of Support Vector Machines in Chemistry

89. C. S. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, Bioinformatics, 20, 467–476 (2004). Mismatch String Kernels for Discriminative Protein Classification. 90. J.-P. Vert, Bioinformatics, 18, S276–S284 (2002). A Tree Kernel to Analyse Phylogenetic Profiles. 91. Z. R. Yang and K. C. Chou, Bioinformatics, 20, 735–741 (2004). Bio-support Vector Machines for Computational Proteomics. 92. M. Wang, J. Yang, and K. C. Chou, Amino Acids, 28, 395–402 (2005). Using String Kernel to Predict Peptide Cleavage Site Based on Subsite Coupling Model. 93. R. Teramoto, M. Aoki, T. Kimura, and M. Kanaoka, FEBS Lett., 579, 2878–2882 (2005). Prediction of siRNA Functionality Using Generalized String Kernel and Support Vector Machine. 94. C. Leslie and R. Kuang, J. Mach. Learn. Res., 5, 1435–1455 (2004). Fast String Kernels Using Inexact Matching for Protein Sequences. 95. K. Tsuda and W. S. Noble, Bioinformatics, 20, i326–i333 (2004). Learning Kernels from Biological Networks by Maximizing Entropy. 96. A. Micheli, F. Portera, and A. Sperduti, Neurocomputing, 64, 73–92 (2005). A Preliminary Empirical Comparison of Recursive Neural Networks and Tree Kernel Methods on Regression Tasks for Tree Structured Domains. 97. P. Mahe´, N. Ueda, T. Akutsu, J.-L. Perret, and J.-P. Vert, J. Chem Inf. Model., 45, 939–951 (2005). Graph Kernels for Molecular Structure-Activity Relationship Analysis with Support Vector Machines. 98. B. J. Jain, P. Geibel, and F. Wysotzki, Neurocomputing, 64, 93–105 (2005). SVM Learning with the Schur-Hadamard Inner Product for Graphs. 99. P. Lind and T. Maltseva, J. Chem. Inf. Comput. Sci., 43, 1855–1859 (2003). Support Vector Machines for the Estimation of Aqueous Solubility. 100. B. Hammer and K. Gersmann, Neural Process. Lett., 17, 43–53 (2003). A Note on the Universal Approximation Capability of Support Vector Machines. 101. J. P. Wang, Q. S. Chen, and Y. Chen, in Advances in Neural Networks, F. Yin, J. Wang, and C. Guo, Eds., Vol. 3173, Springer, New York, 2004, pp. 512–517. RBF Kernel Based Support Vector Machine with Universal Approximation and its Application. 102. T. B. Thompson, K. C. Chou, and C. Zheng, J. Theor. Biol., 177, 369–379 (1995). Neural Network Prediction of the HIV-1 Protease Cleavage Sites. 103. Z. R. Yang and K. C. Chou, J. Chem. Inf. Comput. Sci., 43, 1748–1753 (2003). Mining Biological Data Using Self-organizing Map. 104. Y. D. Cai, X. J. Liu, X. B. Xu, and K. C. Chou, J. Comput. Chem., 23, 267–274 (2002). Support Vector Machines for Predicting HIV Protease Cleavage Sites in Protein. 105. T. Ro¨gnvaldsson and L. W. You, Bioinformatics, 20, 1702–1709 (2004). Why Neural Networks Should not be Used for HIV-1 Protease Cleavage Site Prediction. 106. E. Urrestarazu Ramos, W. H. J. Vaes, H. J. M. Verhaar, and J. L. M. Hermens, J. Chem. Inf. Comput. Sci., 38, 845–852 (1998). Quantitative Structure-Activity Relationships for the Aquatic Toxicity of Polar and Nonpolar Narcotic Pollutants. 107. S. Ren, Environ. Toxicol., 17, 415–423 (2002). Classifying Class I and Class II Compounds by Hydrophobicity and Hydrogen Bonding Descriptors. 108. S. Ren and T. W. Schultz, Toxicol. Lett., 129, 151–160 (2002). Identifying the Mechanism of Aquatic Toxicity of Selected Compounds by Hydrophobicity and Electrophilicity Descriptors. 109. O. Ivanciuc, Internet Electron. J. Mol. Des., 2, 195–208 (2003). Aquatic Toxicity Prediction for Polar and Nonpolar Narcotic Pollutants with Support Vector Machines. 110. O. Ivanciuc, Internet Electron. J. Mol. Des., 1, 157–172 (2002). 
Support Vector Machine Identification of the Aquatic Toxicity Mechanism of Organic Compounds. 111. A. P. Bearden and T. W. Schultz, Environ. Toxicol. Chem., 16, 1311–1317 (1997). StructureActivity Relationships for Pimephales and Tetrahymena: A Mechanism of Action Approach.

References

397

112. O. Ivanciuc, Internet Electron. J. Mol. Des., 3, 802–821 (2004). Support Vector Machines Prediction of the Mechanism of Toxic Action from Hydrophobicity and Experimental Toxicity Against Pimephales promelas and Tetrahymena pyriformis. 113. S. Ren, P. D. Frymier, and T. W. Schultz, Ecotox. Environ. Safety, 55, 86–97 (2003). An Exploratory Study of the use of Multivariate Techniques to Determine Mechanisms of Toxic Action. 114. O. Ivanciuc, Internet Electron. J. Mol. Des., 1, 203–218 (2002). Support Vector Machine Classification of the Carcinogenic Activity of Polycyclic Aromatic Hydrocarbons. 115. R. S. Braga, P. M. V. B. Barone, and D. S. Galva˜o, J. Mol. Struct. (THEOCHEM), 464, 257– 266 (1999). Identifying Carcinogenic Activity of Methylated Polycyclic Aromatic Hydrocarbons (PAHs). 116. P. M. V. B. Barone, R. S. Braga, A. Camilo Jr., and D. S. Galva˜o, J. Mol. Struct. (THEOCHEM), 505, 55–66 (2000). Electronic Indices from Semi-empirical Calculations to Identify Carcinogenic Activity of Polycyclic Aromatic Hydrocarbons. 117. R. Vendrame, R. S. Braga, Y. Takahata, and D. S. Galva˜o, J. Mol. Struct. (THEOCHEM), 539, 253–265 (2001). Structure-Carcinogenic Activity Relationship Studies of Polycyclic Aromatic Hydrocarbons (PAHs) with Pattern-Recognition Methods. 118. D. J. G. Marino, P. J. Peruzzo, E. A. Castro, and A. A. Toropov, Internet Electron. J. Mol. Des., 1, 115–133 (2002). QSAR Carcinogenic Study of Methylated Polycyclic Aromatic Hydrocarbons Based on Topological Descriptors Derived from Distance Matrices and Correlation Weights of Local Graph Invariants. 119. M. Chastrette and J. Y. D. Laumer, Eur. J. Med. Chem., 26, 829–833 (1991). Structure Odor Relationships Using Neural Networks. 120. M. Chastrette, C. El Aı¨di, and J. F. Peyraud, Eur. J. Med. Chem., 30, 679–686 (1995). Tetralin, Indan and Nitrobenzene Compound Structure-musk Odor Relationship Using Neural Networks. 121. K. J. Rossiter, Chem. Rev., 96, 3201–3240 (1996). Structure-Odor Relationships. 122. D. Zakarya, M. Chastrette, M. Tollabi, and S. Fkih-Tetouani, Chemometrics Intell. Lab. Syst., 48, 35–46 (1999). Structure-Camphor Odour Relationships using the Generation and Selection of Pertinent Descriptors Approach. 123. R. D. M. C. Amboni, B. S. Junkes, R. A. Yunes, and V. E. F. Heinzen, J. Agric. Food Chem., 48, 3517–3521 (2000). Quantitative Structure-Odor Relationships of Aliphatic Esters Using Topological Indices. 124. G. Buchbauer, C. T. Klein, B. Wailzer, and P. Wolschann, J. Agric. Food Chem., 48, 4273–4278 (2000). Threshold-Based Structure-Activity Relationships of Pyrazines with Bell-Pepper Flavor. 125. B. Wailzer, J. Klocker, G. Buchbauer, G. Ecker, and P. Wolschann, J. Med. Chem., 44, 2805–2813 (2001). Prediction of the Aroma Quality and the Threshold Values of Some Pyrazines Using Artificial Neural Networks. 126. O. Ivanciuc, Internet Electron. J. Mol. Des., 1, 269–284 (2002). Structure–Odor Relationships for Pyrazines with Support Vector Machines. 127. A. O. Aptula, N. G. Jeliazkova, T. W. Schultz, and M. T. D. Cronin, QSAR Comb. Sci., 24, 385–396 (2005). The Better Predictive Model: High q2 for the Training Set or Low Root Mean Square Error of Prediction for the Test Set? 128. O. Ivanciuc, Internet Electron. J. Mol. Des., 4, 928–947 (2005). QSAR for Phenols Toxicity to Tetrahymena pyriformis with Support Vector Regression and Artificial Neural Networks. 129. A. Carotti, C. Altornare, L. Savini, L. Chlasserini, C. Pellerano, M. P. Mascia, E. Maciocco, F. Busonero, M. Mameli, G. Biggio, and E. Sanna, Bioorg. 
Med. Chem., 11, 5259–5272 (2003). High Affinity Central Benzodiazepine Receptor Ligands. Part 3: Insights into the Pharmacophore and Pattern Recognition Study of Intrinsic Activities of Pyrazolo[4,3-c] quinolin-3-ones.

398

Applications of Support Vector Machines in Chemistry

130. D. Hadjipavlou-Litina, R. Garg, and C. Hansch, Chem. Rev., 104, 3751–3793 (2004). Comparative Quantitative Structure-Activity Relationship Studies (QSAR) on Nonbenzodiazepine Compounds Binding to Benzodiazepine Receptor (BzR). 131. L. Savini, P. Massarelli, C. Nencini, C. Pellerano, G. Biggio, A. Maciocco, G. Tuligi, A. Carrieri, N. Cinone, and A. Carotti, Bioorg. Med. Chem., 6, 389–399 (1998). High Affinity Central Benzodiazepine Receptor Ligands: Synthesis and Structure-Activity Relationship Studies of a New Series of Pyrazolo[4,3-c]quinolin-3-ones. 132. O. Ivanciuc, Internet Electron. J. Mol. Des., 4, 181–193 (2005). Support Vector Regression Quantitative Structure-Activity Relationships (QSAR) for Benzodiazepine Receptor Ligands. 133. T. I. Netzeva, J. C. Dearden, R. Edwards, A. D. P. Worgan, and M. T. D. Cronin, J. Chem. Inf. Comput. Sci., 44, 258–265 (2004). QSAR Analysis of the Toxicity of Aromatic Compounds to Chlorella vulgaris in a Novel Short-term Assay. 134. T. I. Netzeva, J. C. Dearden, R. Edwards, A. D. P. Worgan, and M. T. D. Cronin, Bull. Environ. Contam. Toxicol., 73, 385–391 (2004). Toxicological Evaluation and QSAR Modelling of Aromatic Amines to Chlorella vulgaris. 135. M. T. D. Cronin, T. I. Netzeva, J. C. Dearden, R. Edwards, and A. D. P. Worgan, Chem. Res. Toxicol., 17, 545–554 (2004). Assessment and Modeling of the Toxicity of Organic Chemicals to Chlorella vulgaris: Development of a Novel Database. 136. A. D. P. Worgan, J. C. Dearden, R. Edwards, T. I. Netzeva, and M. T. D. Cronin, QSAR Comb. Sci., 22, 204–209 (2003). Evaluation of a Novel Short-term Algal Toxicity Assay by the Development of QSARs and Inter-species Relationships for Narcotic Chemicals. 137. O. Ivanciuc, Internet Electron. J. Mol. Des., 4, 911–927 (2005). Artificial Neural Networks and Support Vector Regression Quantitative Structure-Activity Relationships (QSAR) for the Toxicity of Aromatic Compounds to Chlorella vulgaris. 138. O. Ivanciuc, Rev. Roum. Chim., 43, 347–354 (1998). Artificial Neural Networks Applications. Part 7 - Estimation of Bioconcentration Factors in Fish Using Solvatochromic Parameters. 139. X. X. Lu, S. Tao, J. Cao, and R. W. Dawson, Chemosphere, 39, 987–999 (1999). Prediction of Fish Bioconcentration Factors of Nonpolar Organic Pollutants Based on Molecular Connectivity Indices. 140. S. Tao, H. Y. Hu, X. X. Lu, R. W. Dawson, and F. L. Xu, Chemosphere, 41, 1563–1568 (2000). Fragment Constant Method for Prediction of Fish Bioconcentration Factors of Nonpolar Chemicals. 141. S. D. Dimitrov, N. C. Dimitrova, J. D. Walker, G. D. Veith, and O. G. Mekenyan, Pure Appl. Chem., 74, 1823–1830 (2002). Predicting Bioconcentration Factors of Highly Hydrophobic Chemicals. Effects of Molecular Size. 142. S. D. Dimitrov, N. C. Dimitrova, J. D. Walker, G. D. Veith, and O. G. Mekenyan, QSAR Comb. Sci., 22, 58–68 (2003). Bioconcentration Potential Predictions Based on Molecular Attributes - An Early Warning Approach for Chemicals Found in Humans, Birds, Fish and Wildlife. 143. M. H. Fatemi, M. Jalali-Heravi, and E. Konuze, Anal. Chim. Acta, 486, 101–108 (2003). Prediction of Bioconcentration Factor Using Genetic Algorithm and Artificial Neural Network. 144. P. Gramatica and E. Papa, QSAR Comb. Sci., 22, 374–385 (2003). QSAR Modeling of Bioconcentration Factor by Theoretical Molecular Descriptors. 145. O. Ivanciuc, Internet Electron. J. Mol. Des., 4, 813–834 (2005). Bioconcentration Factor QSAR with Support Vector Regression and Artificial Neural Networks. 146. S. S. Yang, W. C. Lu, N. Y. 
Chen, and Q. N. Hu, J. Mol. Struct. (THEOCHEM), 719, 119– 127 (2005). Support Vector Regression Based QSPR for the Prediction of Some Physicochemical Properties of Alkyl Benzenes. 147. K.-R. Mu¨ller, G. Ra¨tsch, S. Sonnenburg, S. Mika, M. Grimm, and N. Heinrich, J. Chem Inf. Model., 45, 249–253 (2005). Classifying ‘Drug-likeness’ with Kernel-based Learning Methods.

References

399

148. R. N. Jorissen and M. K. Gilson, J. Chem Inf. Model., 45, 549–561 (2005). Virtual Screening of Molecular Databases Using a Support Vector Machine. 149. R. Arimoto, M. A. Prasad, and E. M. Gifford, J. Biomol. Screen, 10, 197–205 (2005). Development of CYP3A4 Inhibition Models: Comparisons of Machine-learning Techniques and Molecular Descriptors. 150. V. Svetnik, T. Wang, C. Tong, A. Liaw, R. P. Sheridan, and Q. Song, J. Chem Inf. Model., 45, 786–799 (2005). Boosting: An Ensemble Learning Tool for Compound Classification and QSAR Modeling. 151. C. W. Yap, C. Z. Cai, Y. Xue, and Y. Z. Chen, Toxicol. Sci., 79, 170–177 (2004). Prediction of Torsade-causing Potential of Drugs by Support Vector Machine Approach. 152. M. Tobita, T. Nishikawa, and R. Nagashima, Bioorg. Med. Chem. Lett., 15, 2886–2890 (2005). A Discriminant Model Constructed by the Support Vector Machine Method for HERG Potassium Channel Inhibitors. 153. M. J. Sorich, R. A. McKinnon, J. O. Miners, D. A. Winkler, and P. A. Smith, J. Med. Chem., 47, 5311–5317 (2004). Rapid Prediction of Chemical Metabolism by Human UDP-glucuronosyltransferase Isoforms Using Quantum Chemical Descriptors Derived with the Electronegativity Equalization Method. 154. V. V. Zernov, K. V. Balakin, A. A. Ivaschenko, N. P. Savchuk, and I. V. Pletnev, J. Chem. Inf. Comput. Sci., 43, 2048–2056 (2003). Drug Discovery Using Support Vector Machines. The Case Studies of Drug-likeness, Agrochemical-likeness, and Enzyme Inhibition Predictions. 155. J. M. Kriegl, T. Arnhold, B. Beck, and T. Fox, QSAR Comb. Sci., 24, 491–502 (2005). Prediction of Human Cytochrome P450 Inhibition Using Support Vector Machines. 156. J. Aires-de-Sousa and J. Gasteiger, J. Comb. Chem., 7, 298–301 (2005). Prediction of Enantiomeric Excess in a Combinatorial Library of Catalytic Enantioselective Reactions. 157. H. Li, C. Y. Ung, C. W. Yap, Y. Xue, Z. R. Li, Z. W. Cao, and Y. Z. Chen, Chem. Res. Toxicol., 18, 1071–1080 (2005). Prediction of Genotoxicity of Chemical Compounds by Statistical Learning Methods. 158. C. Helma, T. Cramer, S. Kramer, and L. De Raedt, J. Chem. Inf. Comput. Sci., 44, 1402–1411 (2004). Data Mining and Machine Learning Techniques for the Identification of Mutagenicity Inducing Substructures and Structure Activity Relationships of Noncongeneric Compounds. 159. T. C. Martin, J. Moecks, A. Belooussov, S. Cawthraw, B. Dolenko, M. Eiden, J. von Frese, W. Ko¨hler, J. Schmitt, R. Somorjai, T. Udelhoven, S. Verzakov, and W. Petrich, Analyst, 129, 897–901 (2004). Classification of Signatures of Bovine Spongiform Encephalopathy in Serum Using Infrared Spectroscopy. 160. J. A. F. Pierna, V. Baeten, A. M. Renier, R. P. Cogdill, and P. Dardenne, J. Chemometr., 18, 341–349 (2004). Combination of Support Vector Machines (SVM) and Near-infrared (NIR) Imaging Spectroscopy for the Detection of Meat and Bone Meal (MBM) in Compound Feeds. 161. S. Zomer, R. G. Brereton, J. F. Carter, and C. Eckers, Analyst, 129, 175–181 (2004). Support Vector Machines for the Discrimination of Analytical Chemical Data: Application to the Determination of Tablet Production by Pyrolysis-gas Chromatography-mass Spectrometry. 162. S. Zomer, C. Guillo, R. G. Brereton, and M. Hanna-Brown, Anal. Bioanal. Chem., 378, 2008–2020 (2004). Toxicological Classification of Urine Samples Using Pattern Recognition Techniques and Capillary Electrophoresis. 163. U. Thissen, B. U¨stu¨n, W. J. Melssen, and L. M. C. Buydens, Anal. Chem., 76, 3099–3105 (2004). 
Multivariate Calibration with Least-Squares Support Vector Machines. 164. F. Chauchard, R. Cogdill, S. Roussel, J. M. Roger, and V. Bellon-Maurel, Chemometrics Intell. Lab. Syst., 71, 141–150 (2004). Application of LS-SVM to Non-linear Phenomena in NIR Spectroscopy: Development of a Robust and Portable Sensor for Acidity Prediction in Grapes. 165. U. Thissen, M. Pepers, B. U¨stu¨n, W. J. Melssen, and L. M. C. Buydens, Chemometrics Intell. Lab. Syst., 73, 169–179 (2004). Comparing Support Vector Machines to PLS for Spectral Regression Applications.

400

Applications of Support Vector Machines in Chemistry

166. S. Zomer, M. D. N. Sa´nchez, R. G. Brereton, and J. L. P. Pavo´n, J. Chemometr., 18, 294–305 (2004). Active Learning Support Vector Machines for Optimal Sample Selection in Classification. 167. H. L. Zhai, H. Gao, X. G. Chen, and Z. D. Hu, Anal. Chim. Acta, 546, 112–118 (2005). An Assisted Approach of the Global Optimization for the Experimental Conditions in Capillary Electrophoresis. 168. K. Brudzewski, S. Osowski, and T. Markiewicz, Sens. Actuators B, 98, 291–298 (2004). Classification of Milk by Means of an Electronic Nose and SVM Neural Network. 169. O. Sadik, W. H. Land, A. K. Wanekaya, M. Uematsu, M. J. Embrechts, L. Wong, D. Leibensperger, and A. Volykin, J. Chem. Inf. Comput. Sci., 44, 499–507 (2004). Detection and Classification of Organophosphate Nerve Agent Simulants Using Support Vector Machines with Multiarray Sensors. 170. C. Distante, N. Ancona, and P. Siciliano, Sens. Actuators B, 88, 30–39 (2003). Support Vector Machines for Olfactory Signals Recognition. 171. M. Pardo and G. Sberveglieri, Sens. Actuators B, 107, 730–737 (2005). Classification of Electronic Nose Data with Support Vector Machines. 172. K. Brudzewski, S. Osowski, T. Markiewicz, and J. Ulaczyk, Sens. Actuators B, 113, 135–141 (2006). Classification of Gasoline with Supplement of Bio-products by Means of an Electronic Nose and SVM Neural Network. 173. M. Bicego, Sens. Actuators B, 110, 225–230 (2005). Odor Classification Using Similaritybased Representation. 174. T. B. Trafalis, O. Oladunni, and D. V. Papavassiliou, Ind. Eng. Chem. Res., 44, 4414–4426 (2005). Two-phase Flow Regime Identification with a Multiclassification Support Vector Machine (SVM) Model. 175. D. E. Lee, J. H. Song, S. O. Song, and E. S. Yoon, Ind. Eng. Chem. Res., 44, 2101–2105 (2005). Weighted Support Vector Machine for Quality Estimation in the Polymerization Process. 176. Y. H. Chu, S. J. Qin, and C. H. Han, Ind. Eng. Chem. Res., 43, 1701–1710 (2004). Fault Detection and Operation Mode Identification Based on Pattern Classification with Variable Selection. 177. I. S. Han, C. H. Han, and C. B. Chung, J. Appl. Polym. Sci., 95, 967–974 (2005). Melt Index Modeling with Support Vector Machines, Partial Least Squares, and Artificial Neural Networks. 178. S. Mika and B. Rost, Nucleic Acids Res., 32, W634–W637 (2004). NLProt: Extracting Protein Names and Sequences from Papers. 179. S. Mika and B. Rost, Bioinformatics, 20, i241–i247 (2004). Protein Names Precisely Peeled off Free Text. 180. L. Shi and F. Campagne, BMC Bioinformatics, 6, 88 (2005). Building a Protein Name Dictionary from Full Text: A Machine Learning Term Extraction Approach. 181. I. Donaldson, J. Martin, B. de Bruijn, C. Wolting, V. Lay, B. Tuekam, S. D. Zhang, B. Baskin, G. D. Bader, K. Michalickova, T. Pawson, and C. W. V. Hogue, BMC Bioinformatics, 4, (2003). PreBIND and Textomy - Mining the Biomedical Literature for Protein-protein Interactions Using a Support Vector Machine. 182. K. Takeuchi and N. Collier, Artif. Intell. Med., 33, 125–137 (2005). Bio-medical Entity Extraction Using Support Vector Machines. 183. R. Bunescu, R. F. Ge, R. J. Kate, E. M. Marcotte, R. J. Mooney, A. K. Ramani, and Y. W. Wong, Artif. Intell. Med., 33, 139–155 (2005). Comparative Experiments on Learning Information Extractors for Proteins and Their Interactions. 184. T. Joachims, in Advances in Kernel Methods — Support Vector Learning, B. Scho¨lkopf, C. J. C. Burges, and A. J. Smola, Eds., MIT Press, Cambridge, Massachusetts, 1999. Making Large-scale SVM Learning Practical. 185. R. 
Collobert and S. Bengio, J. Mach. Learn. Res., 1, 143–160 (2001). SVMTorch: Support Vector Machines for Large-scale Regression Problems.

CHAPTER 7

How Computational Chemistry Became Important in the Pharmaceutical Industry

Donald B. Boyd
Department of Chemistry and Chemical Biology, Indiana University-Purdue University at Indianapolis, 402 North Blackford Street, Indianapolis, Indiana 46202-3274

INTRODUCTION

The aim of this chapter is to give a brief account of the historical development of computational chemistry in the pharmaceutical industry. Starting in the 1960s, scientists entering the field had to cope with and overcome a number of significant obstacles. Better methods had to be conceived and developed. Easier ways for users to perform computational experiments had to be engineered into the software. Computers had to become faster, and memory capacity had to be increased significantly. The minds of scientists who were used to doing research one way had to be convinced that there were other productive ways that could, in certain circumstances, help them reach their research goals.

By overcoming these hurdles, successes were achieved. Some of the successes were scientific advances, and some were new pharmaceutical products reaching the marketplace. The accumulating successes helped propel the field forward, creating opportunities for additional computational chemists to establish careers in the pharmaceutical industry. In the spirit of objectivity, however, the chapter also mentions some research projects that
did not work out so well. Successes and failures are part of any research or technical undertaking; scientific breakthroughs rarely come easily.

As a result of the author's personal interest and experience (25 years at Eli Lilly and Company),1 the emphasis in this retrospective is on using computers for drug discovery. But the use of computers in laboratory instruments and for the analysis of experimental and clinical data is no less important. The history reviewed here was written with young scientists in mind. One of the main goals of this book series is to educate. We feel it is important that the new investigator have an appreciation of how the field evolved to its current circumstance, if for no other reason than to help steer toward a better future for those scientists using or planning to use computational chemistry in the pharmaceutical industry. In addition, this chapter may bring back some memories, fond and otherwise, for elder participants in the field.

Discovering a molecule with a useful therapeutic effect had long been exclusively an experimental art and science. Several scientific and technical advances made a computational approach to pharmaceutical progress possible. One early, fundamental advance was the development of the concept that chemical structure is related to molecular properties, including biological activity. This concept, depicted in Figure 1, underlies all of medicinal chemistry and is so fundamental that it is often taken for granted and not even mentioned in many books and review articles. Given this relationship, it is easy to conceive that if one could predict properties by calculations, one might be able to predict which structures should be investigated in the laboratory. Another advance was recognizing that a drug typically exerts its biological activity by binding to and/or inhibiting some biomolecule in the body. This concept stems from Fischer's famous lock-and-key hypothesis (Schlüssel-Schloss-Prinzip).2,3 Yet another advance was the development in the 1920s of the theory of quantum mechanics,4 which connected the distribution of electrons in molecules with observable molecular properties. Pioneering research in the 1950s then forged links between the electronic structure of molecules and their biological activity. Part of that work was collected in the 1963 book by Bernard and Alberte Pullman (Paris, France), sparking the imagination of many young scientists about what might be possible with calculations on biomolecules.5

The earliest papers that attempted to relate chemical structure and biological activity mathematically were published in Scotland as far back as the middle of the 19th century.6,7 These and a couple of other papers8,9 were forerunners of modern quantitative structure-activity relationships (QSAR), even though they were not widely known publications. In 1964, the role of molecular descriptors in describing biological activity was reduced to a simplified mathematical form, and the field of QSAR was propelled toward its modern visage.10,11 (A descriptor is any calculated or experimental numerical property related to a compound's chemical structure.)

Of course, the engineering development of computers was requisite for their use at pharmaceutical companies.

[Figure 1: flow diagram with boxes linked in sequence: Molecular Structure → Chemical and Physical Properties → Transport to Target Receptor → Drug-Receptor Interaction → Biochemical Events → Biological Response]

Figure 1 From the chemical structure of the molecule arises its other properties such as size, shape, lipophilicity, polarity, and so forth. These properties in turn determine how a molecule will be transported in the body and how it will interact with its intended receptor. These interactions result in biochemical events, which in turn evoke a biological response.

The early computers were designed for military and accounting applications, but gradually it became apparent that computing machinery would have a vast number of uses. Computers were first deployed at pharmaceutical companies as early as the 1940s. These early computers were used for payroll and accounting, not for science. The power and number of computers gradually increased, so that around 1960 a few pioneering industrial scientists started to think about how computers might aid them in their drug discovery efforts. In addition, access to computers was gained through contractual agreements with nearby educational institutions or companies in other industries. Although payroll and accounting were still the main uses of computers in the 1960s, a few courageous innovators were allowed to use spare time on the mainframes or to acquire smaller machines specifically for science.

This chapter reviews events, trends, hurdles, progress, people, hardware, and software. Although the chapter attempts to paint a picture of happenings as historically correct as possible, it is inevitably colored by the author's experiences and memories. The timeline used in this chapter is divided by decade, beginning with the 1960s and running through the 1990s. The conclusion gives an overview of how the field has grown; keys to success are identified.

For some topics mentioned in this chapter, hundreds of books12 and thousands of articles demonstrating the growing importance of computational chemistry in the pharmaceutical industry could be cited, but it is impractical to include them all. We hope that the reader will tolerate us citing only a few examples; the work of all the many brilliant scientists who made landmark contributions cannot be covered in a single chapter. The author is less familiar with events at European and Japanese companies than with events in the United States. For an excellent history of the general development of computational chemistry in the United States, not just in industry, the reader is referred to an earlier chapter in this book series.13 Also, histories have been written on the development of computational chemistry in the United Kingdom,14 Canada,15 France,16 and Germany,17 but they touch only lightly on the subject of industrial research.

GERMINATION: THE 1960s

We can state confidently that in 1960 essentially 100% of the computational chemists were in academic or government laboratories, not industry. Of course, back then they were not called computational chemists because that term had not yet entered the language. The scientists who worked with computers to learn about molecules were called theoretical chemists or quantum chemists. The students coming from those academic laboratories constituted the main pool of candidates that industry could hire for its initial ventures into using computers for drug discovery. Another pool of chemists educated in the use of computers was the X-ray crystallographers. Some of these young theoreticians and crystallographers were interested in helping solve human health challenges and steered their careers toward pharmaceutical work.

Although a marvel at the time, the workplace of the 1960s looks archaic in hindsight. Online computer files and graphical user interfaces were still futuristic concepts. Computers generally resided in computer centers, where a small army of administrators, engineers, programming consultants, and support people tended the mainframe computers then in use. The computers were kept in locked, air-conditioned rooms inaccessible to the ordinary users. One of the largest computers then in use by theoretical chemists and crystallographers was the IBM 7094. Support staff operated the tape readers, card readers, and printers. The users' room at the computer centers echoed with the clunk-clunk-clunk of card punches that encoded data as little rectangular holes in the so-called IBM cards.12 The cards were manufactured in different colors so that users could conveniently differentiate their many card decks. As a by-product, the card punches produced piles of colorful rectangular confetti. There were no Delete or Backspace keys; if any mistake was made in keying in data, the user would need to begin again with a fresh blank card.

The programs used by chemists in the 1960s were usually written in FORTRAN II. Programs used by the chemists typically ranged from half a
box to several boxes long (each box contained 2000 cards; each line of code corresponded to one card). Input decks containing the data needed by the programs were generally smaller, consisting of tens of cards, and were sandwiched between JCL (job control language for IBM machines) cards and bound by rubber bands. Carrying several boxes of cards to the computer center was good for physical fitness. If a box was dropped or if a card reader mangled some of the cards, the tedious task of restoring the deck and replacing the torn cards ensued.

Computer output usually came in the form of the ubiquitous pale-green-and-white striped paper (measuring 11 by 14 7/8 inches per page). (A replica of that computer paper was used as the cover design for the Journal of Computational Chemistry during the early years of its publication.) Special cardboard covers and long nylon needles were used to hold and organize stacks of printouts. The user rooms resounded with the jagged squeal of stacks of fresh computer printouts being ripped apart into individual jobs. These were put in the pigeonholes of each user or group. The abundance of cards and printouts in the users' room scented the air with a characteristic paper smell.

Mathematical algorithms for common operations, such as matrix diagonalization, had been written and could be inserted as subroutines in a larger molecular orbital program, for instance. Specialized programs for chemistry were generally developed by academic groups, with the graduate students doing most or all of the programming. This was standard practice in part because the professors at different universities (or maybe at the same university) were in competition with each other and wanted better programs than their competitors had access to. (Better meant running faster, handling larger matrices, and doing more.) It was also standard practice so that the graduate students would learn by doing. Obviously, this situation led to much duplication of effort: the proverbial reinventing of the wheel. To improve the situation, Prof. Harrison Shull and colleagues at Indiana University, Bloomington, conceived and sold the concept of an international repository of software that could be shared. Thus was born in 1962 the Quantum Chemistry Program Exchange (QCPE). Competitive scientists were initially slow to give away programs they had worked so hard to write, but gradually the depositions to QCPE increased. We do not have room here to give a full recounting of the history of QCPE,18 but suffice it to say that QCPE proved instrumental in advancing the field of computational chemistry, including that at pharmaceutical companies. Back in the 1960s and 1970s, there were no software companies catering to the computational chemistry market, so QCPE was the main resource for the entire community. As the name implies, QCPE was initially used for exchanging subroutines and programs for ab initio and approximate electronic structure calculations. But QCPE evolved to encompass programs for molecular mechanics, kinetics, spectroscopy, and a wide range of other calculations on molecules. The quarterly QCPE Newsletter (later renamed the QCPE Bulletin), which was edited by Mr. Richard W. Counts, was for a
long time the main vehicle for computational chemists to announce programs and other news of interest. Industrial computational chemists were among the members of QCPE and, with permission from their corporate management, even contributed programs for use by others.

In regard to software, we note one program that came from the realm of crystallography. ORTEP (Oak Ridge Thermal Ellipsoid Program) was the first widely used program for (noninteractive) molecular graphics.19 Output from the program was inked onto long scrolls of paper run through expensive, flatbed plotters. The ball-and-stick ORTEP drawings were fine for publication, but for routine laboratory work graph paper, ruler, protractor, and pencil were the tools for plotting the Cartesian coordinates of a molecule the chemist wanted to study. Such handmade drawings quantified and visualized molecular geometry. Experimental bond lengths and bond angles needed for such structure generation were found in a heavily used British compilation.20

To help the chemist think about molecular shape, handheld molecular models were also widely used by experimentalists and theoreticians alike. There were two main types of mechanical models. One was analogous to modern stick representations: metal or plastic rods represented the bonds between atoms, and the atoms were represented by balls or joints that held the rods at specific angles. Dreiding models, which were made of solid and hollow metal wires, were among the most accurate and expensive at that time. (Less-expensive stick models made of plastic are still used in the teaching of organic chemistry.) The other type of molecular model was the space-filling variety. The expensive, well-known CPK (Corey-Pauling-Koltun) models21,22 consisted of three-dimensional spherical segments made of plastic that was color-coded by element (white for hydrogen, blue for nitrogen, red for oxygen, etc.). From this convention came the color molecular graphics we are familiar with today.

Before proceeding further, it is worthwhile to describe briefly the milieu of pharmaceutical research 40 years ago. In the 1960s (and before), drug discovery was done by trial and error. Progress depended on the intuition and knowledge of medicinal chemists and biologists, as well as on serendipitous discoveries, not on computational predictions. Interesting compounds flowed from two main sources in that period. The smaller pipeline consisted of natural products, such as soil microbes that produce biologically active components or plants with medicinal properties. The larger pipeline involved classical medicinal chemistry. A lead compound would be discovered by biological screening or by reading the patent and scientific literature published by competitors at other pharmaceutical companies. From the lead, the medicinal chemists would use their ingenuity, creativity, and synthetic expertise to construct new compounds that would be tested by the appropriate in-house pharmacologists, microbiologists, and so forth. Those compounds would often be submitted to a battery of other bioactivity screens being run at the company so that leads for other drug targets could be discovered besides the intended
biological target. The most potent compounds found would then become the basis for another round of analog design and synthesis. Thus, from many iterations, would evolve a structure-activity relationship (SAR), which when summarized would consist of a table of compounds and their activities. In fortuitous circumstances, one of the medicinal chemists would make a compound with sufficient potency that a project team consisting of scientists from drug discovery and drug development would be assembled to oversee further experiments on the compound to learn whether it had the appropriate characteristics to become a pharmaceutical product. The formula for career success for a medicinal chemist was simple: invent or claim authorship of a project team compound. Management would then bestow kudos on the chemist (as well as the biologists) involved in the project.

What happened when a theoretical chemist was thrown into this milieu? Well, initially not much, because the only theoretical methods of the 1960s that could treat drug-sized (200–500 Da) molecules were limited in what they could predict, and often those predictions were inaccurate. The molecular orbital methods used were extended Hückel theory23,24 and soon thereafter CNDO/2 (complete neglect of differential overlap/second parameterization).25,26 These approximate methods involved determining molecular orbitals from a highly approximated Fock matrix. Although crude by today's standards and incapable of giving accurate, energy-minimized ("optimized") three-dimensional molecular geometries (bond lengths, bond angles, and torsional angles), the methods were more practical than other methods available at the time. One of these other methods was Hartree-Fock27,28,29,30 (also called self-consistent field or nonempirical in the early literature, or ab initio in recent decades). Although Hartree-Fock calculations did fairly well at predicting molecular geometries, the computers of the era limited treatment to molecules not much larger than ethane. Simpler methods such as Hückel theory31,32,33 and Pariser-Parr-Pople (PPP) theory34 could treat large molecules, but they treated only the pi electrons. Hence, they were formally limited to planar molecules, and not many pharmaceuticals are planar.

In addition to the quantum chemistry programs in use in the 1960s, an alternative and independent approach was QSAR, in which the activity of a compound is assumed to be a linear (or quadratic or higher) function of certain molecular descriptors. One commonly used descriptor was the contribution of an atom or a functional group to the lipophilicity of a molecule; this descriptor was designated pi (π). Other famous descriptors included the Hammett sigma (σ) values for aromatic systems and the Taft sigma (σ*) values for aliphatic systems. Both parameters came from the realm of physical organic chemistry35,36,37 and are measures of the tendency of a substituent to withdraw or donate electron density relative to a hydrogen atom.
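
To make the form of such a model concrete, a classic Hansch-type equation can be written as

\[ \log(1/C) = a\,\pi + b\,\pi^{2} + \rho\,\sigma + c \]

where C is the molar concentration of a compound producing a standard biological response, π and σ are the substituent descriptors just described, and a, b, ρ, and c are coefficients fitted by least-squares regression over a series of analogs. (This is a generic, illustrative equation rather than any particular published model; the quadratic term in π reflects the common finding that activity peaks at an intermediate lipophilicity.)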

Let us close this section with some final comments about the situation in the 1960s. Abbott, Schering-Plough, and Upjohn were among the first companies, besides Lilly, to venture into using computers for attempts at drug discovery. Dow Chemical, which had pharmaceutical interests, also initiated a very early effort. At these companies, a person with theoretical and computer expertise was hired, or one of the company's existing research scientists was allowed to turn attention to learning about this new methodology. Because the science was so new, much effort was expended by those early pioneers in learning about the scope of applicability of the available methods. Attempts to actually design a drug were neither numerous nor particularly successful. This generalization does not imply that there were no scientific successes. There were a few successes in finding correlations and in gaining a better understanding of what was responsible for biological activity at the molecular and atomic level. For example, early work at Lilly revealed the glimmer of a relationship between the calculated electronic structure of the beta-lactam ring of cephalosporins and antibacterial activity. The work was performed in the 1960s but was not published38 until 1973 because of delays by cautious research management and patent attorneys at the company. (The relationship was elaborated in subsequent years,39,40 but no new pharmaceutical product resulted.41)
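
As a miniature of the kind of pi-electron calculation that was feasible in that era, the sketch below applies simple Hückel theory to 1,3-butadiene by diagonalizing a 4 × 4 topology matrix. It is, of course, a present-day Python/NumPy rendering rather than the FORTRAN II card decks actually used, and butadiene is chosen purely for brevity; the orbital energies emerge in the conventional form E = α + xβ.

import numpy as np

# Simple Hückel matrix for 1,3-butadiene (4 pi centers in a chain).
# Diagonal elements: Coulomb integral alpha (set to 0 here).
# Off-diagonal elements: resonance integral beta (set to 1) for
# each pair of directly bonded atoms.
H = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])

# Eigenvalues x give orbital energies E = alpha + x*beta; the
# eigenvectors are the pi molecular orbital coefficients.
x, coeffs = np.linalg.eigh(H)

for xi, c in zip(x, coeffs.T):
    print(f"E = alpha {xi:+.3f} beta   coefficients: {np.round(c, 3)}")

Because β is negative, the occupied orbitals are those with x = +1.618 and +0.618. Diagonalizing a 4 × 4 matrix is trivial today, but on the mainframes of the 1960s the same operation for molecules of meaningful size was a major expense, which is one reason the shared diagonalization subroutines mentioned earlier (and QCPE itself) mattered so much.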

GAINING A FOOTHOLD: THE 1970s

Some of the tiny number of companies that first got into this game dropped out after a few years (but returned later), either for lack of management support or because the technology was not intellectually satisfying to the scientist involved. Other companies, like Lilly, persisted. Lilly's pioneering effort paid off in establishing a base of expertise. Also, quite a few papers were published, much as would be done in an academic setting. In hindsight, however, Lilly may have entered the field too early because the initial efforts were so limited by the then-existing science, hardware, and software. First impressions can be lasting, and Lilly management rejected further permanent growth for more than 20 years. A series of managers at Lilly at least sustained the computational chemistry effort until near the end of the 1980s, when the computational chemistry group was enlarged to catch up to the size of the groups at the other large pharmaceutical companies. Companies such as Merck and Smith Kline and French (using the old name) entered the field a few years after Lilly. Unlike Lilly, they hired chemists trained in both organic chemistry and computers, with a pedigree traceable back to Prof. E. J. Corey at Harvard and his attempts at computer-aided synthesis planning.42,43,44

Regarding hardware of the 1970s, pharmaceutical companies invested money from the sale of their products to buy better and better mainframes. Widely used models included members of the IBM 360 and 370 series. Placing these more powerful machines in-house made it easier and more secure to submit jobs and to retrieve output. But output was still in the form of long printouts. Input had advanced to the point where punch cards were no longer needed.

Figure 2 Laboratories used by computational chemists in the 1970s and early 1980s were characterized by computer card files, key punches, and stacks of computer printouts. The terminal in the foreground, being used by a promising assistant, is a Decwriter II connected to a DEC 10 computer at the corporate computer center. The terminal in the background is an IBM 3278 that was hardwired to an IBM mainframe in the corporate computer center. Neither terminal had graphical capability. Computer printouts were saved because the computational chemistry calculations were usually lengthy (some running for weeks on a CPU) and were therefore expensive to reproduce. This photograph was taken on the day before Christmas 1982, but the appearance of the environs had not changed much since the mid-1970s.

So-called dumb terminals, i.e., terminals with no local processing capability, could be used to set up input jobs for batch running. For instance, at Lilly an IBM 3278 and a Decwriter II (connected to a DEC-10 computer) were used by the computational chemistry group. The statistics program MINITAB was one of the programs that ran on the interactive Digital Equipment Corporation (DEC) machine. Card punches were not yet totally obsolete, but they received less and less usage. The appearance of a typical computational chemistry laboratory is shown in Figure 2. The spread of technology at pharmaceutical companies also meant that secretaries were given word processors (such as the Wang machines) to use in addition to typewriters, which were still needed for filling out forms. Keyboarding was the domain of secretaries, data entry technicians, and computational chemists. Only a few managers and scientists would type their own memos and articles in those days.

Software was still written primarily in FORTRAN, but now mainly FORTRAN IV. The holdings of QCPE expanded. Among the important acquisitions was Gaussian 70, an ab initio quantum chemistry program written by Prof. John A. Pople's group at Carnegie-Mellon University. Pople made the program available in 1973.45,46 (He later submitted Gaussian 76 and Gaussian
80 to QCPE, but they were withdrawn when the Gaussian program was commercialized by Pople and some of his students in 1987.) Nevertheless, ab initio calculations, despite all the élan associated with them, were still not very practical or helpful for pharmaceutically interesting molecules. Semiempirical molecular orbital methods such as EHT, CNDO/2, and MINDO/3 were the mainstays of quantum chemical applications. MINDO/3 was Prof. Michael J. S. Dewar's third refinement of a modified intermediate neglect of differential overlap method.47

The prominent position of quantum mechanics led a coterie of academic theoreticians to think their approach could solve research problems facing the pharmaceutical industry. These theoreticians, who met annually in Europe and on Sanibel Island in Florida, coined the terms quantum biology48 and quantum pharmacology,49 names that may seem curious to the uninitiated. They were not meant to imply that some observable aspect of biology or pharmacology stems from the wave-particle duality observed in the physics of electrons. Rather, the names conveyed to cognoscenti that they were applying their trusty old quantum mechanical methods to compounds discussed by biologists and pharmacologists.50 However, doing a calculation on a system of pharmacological interest is not the same as designing a drug. Calculating the molecular orbitals of serotonin, for instance, is a far cry from designing a new serotonin reuptake inhibitor that could become a pharmaceutical product.

Nonetheless, something even more useful came on the software scene in the 1970s. That was Prof. N. L. Allinger's MMI/MMPI program51,52 for molecular mechanics. Classical methods for calculating conformational energies date to the 1940s and early 1960s.53,54 Copies of Allinger's program could be purchased at a nominal fee from QCPE. Molecular mechanics has the advantage of being much faster than quantum mechanics and of generating common organic chemical structures approaching "chemical accuracy" (bond lengths correctly predicted to within about 0.01 Å). Because of the empirical manner in which force fields were derived, molecular mechanics was anathema to the quantum purists, never mind that Allinger himself also used quantum chemistry. Molecular mechanics nonetheless became an important technique in the armamentarium of industrial researchers.
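
Part of the appeal of molecular mechanics was its transparent functional form. A force field writes the steric energy of a molecule as a sum of simple classical terms over bonds, angles, torsions, and nonbonded atom pairs; schematically (a generic textbook expression, with the actual terms and parameters of MMI and other force fields differing in detail):

\[ E = \sum_{\mathrm{bonds}} k_{b}\,(r - r_{0})^{2} + \sum_{\mathrm{angles}} k_{\theta}\,(\theta - \theta_{0})^{2} + \sum_{\mathrm{torsions}} \frac{V_{n}}{2}\,[1 + \cos(n\phi - \gamma)] + \sum_{i<j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right) \]

Every term is a simple algebraic function of the atomic coordinates, so the energy and its derivatives can be evaluated in a tiny fraction of the time required by a molecular orbital calculation. The price is that the reference values (r0, θ0) and the force constants must be fitted to experimental structures and energies, the empirical manner of derivation that so bothered the quantum purists.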

Meanwhile, a surprising number of academic theoreticians were slow to notice that the science was transitioning55,56 from quantum chemistry to multifaceted computational chemistry. Computational chemists in the pharmaceutical industry also branched out from their academic upbringing by acquiring an interest in force field methods, QSAR, and statistics. Computational chemists working to discover pharmaceuticals came to appreciate the fact that it was too limiting to confine one's work to just one approach to a problem. To solve research problems in industry, one had to use the best available technique in the limited time available, and this did not necessarily mean going to a larger basis set or doing a higher-level (and therefore longer-running) quantum chemistry calculation. It meant using molecular mechanics or QSAR or whatever worked. It meant not being hemmed in by a purely quantum theoretical perspective.

Unfortunately, the tension between the computational chemists and the medicinal chemists at pharmaceutical companies did not ease in the 1970s. Medicinal chemists were at the top of the pecking order in the corporate research laboratories. On the basis of conversations at scientific meetings where computational chemists from industry (all of them together could fit in a small room in this period) could informally exchange their experiences and challenges, this was an industry-wide situation. (Readers should not get the impression that the tension between theoreticians and experimentalists existed solely in the business world; it also existed in academic chemistry departments.) The situation was that, as medicinal chemists pursued an SAR, the computational chemists might suggest a structure worthy of synthesis because calculations indicated that it had the potential to be more active. But the computational chemist was totally dependent on the medicinal chemist to test the hypothesis. Suddenly, the medicinal chemist saw himself going from being the wellspring of design ideas to being a technician who was implementing someone else's idea. Although never intended as a threat to the prestige and hegemony of the organic chemistry hierarchy, design proposals from outside that hierarchy were often perceived as such.

Another problem was that it was easy to change a carbon to a nitrogen or any other element on a computer. Likewise, it was easy to attach a substituent at any position in whatever stereochemistry seemed best for enhancing activity. On a computer it was easy to change a six-membered ring to a five-membered ring or vice versa. Such computer designs were frequently beyond the possibilities of the synthetic organic chemists, or at least beyond the fast-paced chemistry practiced in industry. This situation contributed to the disconnect between computational chemists and medicinal chemists. What good is a computer design if the molecule is impossible to make? If the computational chemist needed a less active compound to be synthesized so as to help establish a computational hypothesis, such as for a pharmacophore, that synthesis was totally out of the question. No self-respecting medicinal chemist would want to admit to his management that he had purposely spent valuable time making a less active compound.

Thus, the 1970s remained a period in which the relationship between computational chemists and medicinal chemists was still being worked out. People in management, who generally rose from the ranks of medicinal chemists, were often unable to perceive a system for the effective use of data and ideas from computational approaches. The managers had to think constantly about their own careers and did not want to get caught on the wrong side of a risky issue. Many managers of that time were far from convinced that computational input was worth anything.

The computational chemists at Lilly tackled this collaboration gap in several ways. One was to keep the communication channels open and constantly explain what was being done, what might be doable, and what was beyond the capabilities of the then-current state of the art. For organic chemists who had never used a computer, it was necessary to gently dispel the notion that one could push a button on a large box with blinking lights and the chemical structure of the next $200 million drug would tumble into the output tray of the machine. (Annual sales of $200 million were the mark of a blockbuster drug in those days.) The limited capability to predict molecular properties accurately was stressed by the computational chemists to the synthetic chemists. Relative numbers might be predictable, but not absolute values. Moreover, it was up to the human, not the machine, to use chemical intuition to capitalize on relationships found between calculated physical properties and sought-after biological activities. Also, it was important for the computational chemist to avoid technical jargon and theory when talking with medicinal chemists. The computational chemists, to the best of their ability, had to speak the language of the organic chemists, not vice versa.

In an outreach to the medicinal chemists at Lilly, a one-week workshop was created and taught in the research building where the organic chemists were located. (The computational chemists were initially assigned office space with the analytical chemists and later with the biologists.) The workshop covered the basic and practical aspects of performing calculations on molecules. The input requirements (which included the format of the data fields on the punch cards) were taught for several programs. One program was used to generate Cartesian atomic coordinates. Output from that program was then used as input for the molecular orbital and molecular mechanics programs. Several of the adventurous young PhD organic chemists took the course. The outreach was successful in that it empowered a few medicinal chemists to do their own calculations for testing molecular design ideas; it was a foot in the door. These young medicinal chemists could set an example for the older ones. An analogous strategy was used at some other pharmaceutical companies. For instance, Merck conducted a workshop on synthesis planning for its chemists.57

Despite these efforts, medicinal chemists were slow to accept what computers could provide. Medicinal chemists would bring a research problem to the computational chemists, sometimes out of curiosity about what computing could provide, sometimes as a last resort after the question had proved unsolvable by other approaches. The question might be to explain why adding a certain substituent unexpectedly decreased activity in a series of compounds. Or the problem might involve finding a QSAR for a small set of compounds. If the subsequent calculations were unable to provide a satisfactory answer, there was a tendency on the part of some medicinal chemists to generalize from that one try and to dismiss the whole field of computational chemistry.


This facet of human nature, especially of scientifically educated people, was difficult to fathom. A perspective that we tried to instill in our colleagues was that a computer should be viewed as just another piece of research apparatus. Experiments could be done on a computer just as experiments could be run on a spectrometer or in an autoclave. Sometimes the instrument would give the results the scientist was looking for; other times, the computational experiment would fail. Not every experiment – at the bench or in the computer – works every time. If a reaction failed, a medicinal chemist would not dismiss all of synthetic chemistry; instead, another synthetic route would be attempted. However, the same patience did not seem to extend to computational experiments. Finally, in regard to the collaboration gap, the importance of a knowledgeable (and wise) mentor – an advocate – cannot be overstated. For a nascent effort to take root in a business setting, younger scientists working in exploratory areas had to be shielded from excessive critiquing by powerful medicinal chemists and management.

The computational chemists were able to engage in collaborations with their fellow physical chemists. Some research questions dealt with molecular conformation and spectroscopy. The 1970s were full of small successes such as finding relationships between calculated and experimental properties. Some of these correlations were published. Even something so grand as the de novo design of a pharmaceutical was attempted but was not within reach.

Two new computer-based resources were launched in the 1970s. One was the Cambridge Structural Database58 (CSD; based in Cambridge, England), and the other was the Protein Data Bank59 (PDB; then based at Brookhaven National Laboratory in New York). Computational chemists recognized that these compilations of 3-D molecular structures would prove very useful, especially as more pharmaceutically relevant compounds were deposited. The CSD was supported by subscribers, including pharmaceutical companies. On the other hand, the PDB was supported by American taxpayers.

We have not discussed QSAR very much, but two influential books of the 1970s can be mentioned. Dr. Yvonne Martin began her scientific career as an experimentalist in a pharmaceutical laboratory, but after becoming interested in the potential of QSAR, she spent time learning the techniques at the side of Prof. Corwin Hansch and also Prof. Al Leo of Pomona College in California. As mentioned in her book,60 she encountered initial resistance to a QSAR approach at Abbott Laboratories. Another significant book that was published in the late 1970s was a compilation of substituent constants.61 These parameters were heavily relied on in QSAR investigations.

As the field of computer-aided drug design began to catch on in the 1970s, leaders in the field recognized the need to set standards for publications. For instance, a group of American chemists involved in QSAR research published a paper recommending minimal requirements for reporting the
results of QSAR studies.62 Also, a number of scientists recognized that the following problem could develop when CADD is juxtaposed to experimental drug discovery. Some CADD studies would eventually lead to hypotheses for more active chemical structures. The natural impulse on the part of a scientist is to publish the prediction. In fact, some computational chemists have been rather boastful about correctly predicting properties prior to experiment. A prediction and any subsequent experiments to synthesize a designed compound and test its biological activity are at the heart of the scientific method: hypothesis testing. But suppose a compound is made and it is active. What are the chances that the compound would be developed by a pharmaceutical company and would eventually reach the patients who could benefit from it? Unfortunately, the chances were not very good. Once an idea is publicly disclosed, there is a limit on obtaining patent rights to the compound. Because of the high expense of developing a pharmaceutical product, pharmaceutical companies would be reluctant to become involved with the designed compound. A committee formed by the International Union of Pure and Applied Chemistry (IUPAC) attempted to preclude this problem from becoming widespread. They proposed that any designed compound should be made and tested prior to its structure being disclosed.63

GROWTH: THE 1980s

If the 1960s were the Dark Ages and the 1970s were the Middle Ages of computational chemistry, the 1980s were the Renaissance, the Baroque Period, and the Enlightenment all rolled into one. The decade of the 1980s was when the various approaches of quantum chemistry, molecular mechanics, molecular simulations, QSAR, and molecular graphics coalesced into modern computational chemistry.

In the world of scientific publishing, a seminal event occurred in 1980. Professor Allinger launched his Journal of Computational Chemistry. This helped stamp a name on the field. Before the journal began publishing, the field was variously called theoretical chemistry, calculational chemistry, modeling, and so on. Interestingly, Allinger first took his journal proposal to the business managers in charge of publications of the American Chemical Society (ACS). When they rejected the concept, Allinger turned to publisher John Wiley & Sons, which went on to become the premier producer of journals and books in the field. (Sadly, it was not until 2005 that the ACS finally recognized the need to improve its journal offerings in the field of computational chemistry/molecular modeling. It also took the ACS bureaucracy a long time to recognize computational chemistry as an independent subdiscipline of chemistry.)

Several exciting technical advances fostered the improved environment for computer use at pharmaceutical companies in the 1980s. The first was
the development of the VAX 11/780 computer by Digital Equipment Corporation (DEC) in 1979. The machine was departmental size, i.e., the price, dimensions, and easy care of the machine allowed each department or group to have its own superminicomputer. This was a start toward non-centralized control over computing resources. At Lilly, the small molecule X-ray crystallographers were the first to gain approval for the purchase of a VAX around 1980. Fortunately, the computational chemists and a few other scientists were allowed to use it too. The machine was a delight to use and far better than the batch-job-oriented mainframes of International Business Machines (IBM) and other hardware manufacturers. The VAX could be run interactively. Users communicated with the VAX through interactive graphical terminals, the first of which were monochrome. The first VAX at Lilly was fine for one or two users, but it would get bogged down and response times would slow to a crawl if more than five users were logged on simultaneously. Lilly soon started building an ever more powerful cluster of VAXes (also called VAXen in deference to the plural of ox). Several other hardware companies that manufactured superminicomputers in the same class as the VAX sprang up. But DEC proved to be a good, relatively long-lasting vendor to deal with, and many pharmaceutical companies acquired VAXes for research. (Today, DEC and those other hardware companies no longer exist.)

The development of personal computers (PCs) in the 1980s started to change the landscape of computing. The pharmaceutical companies certainly noticed the development of the IBM PC, but its DOS (disk operating system) made learning to use it difficult. Some scientists nonetheless bought these machines. The Apple Macintosh appeared on the scene in 1984. With its cute little, lightweight, all-in-one box including a monochrome screen, the Mac brought interactive computing to a new standard of user friendliness. Soon after becoming aware of these machines, nearly every medicinal chemist wanted one at work. The machines were great at word processing, graphing, and managing small (laboratory-sized) databases. The early floppy disks formatted for the Macs had a storage capacity of only 400 KB, but by 1988 double-sided, double-density disks could hold 1400 KB, which seemed plenty in those days. In contrast to today's huge applications requiring a compact disk (> 500 MB) for storage, a typical program of the 1980s could be "stuffed" (compressed) on one or maybe two floppy disks.

On the software front, three advances changed the minds of the medicinal chemists from being diehard skeptics to almost enthusiastic users. One advance was the development of electronic mail. As the Macs and terminals to the VAX spread to all the chemists in drug discovery and development, the desirability of being connected became obvious. The chemists could communicate with each other and with management and could tap into databases and other computer resources. As electronic traffic increased, research buildings had to be periodically retrofitted with each new generation of cabling to the computers. A side effect of the spread of computer terminals to the desktop
of every scientist was that management could cut back on secretarial help for scientists, who then had to do their own keyboarding to write reports and papers.

The second important software advance was ChemDraw,64,65,66,67 which was first released for the Mac in 1986. This program gave chemists the ability to quickly create two-dimensional chemical diagrams. Every medicinal chemist could appreciate the aesthetics of a neat ChemDraw diagram. The diagrams could be cut and pasted into reports, articles, and patents. The old plastic ring templates for drawing chemical diagrams by hand were suddenly unnecessary.

The third software advance also had an aesthetic element. This was the technology of computer graphics or, as it is called when 3-D structures are displayed on computer screens, molecular graphics. Whereas medicinal chemists might have trouble understanding the significance of the highest occupied molecular orbital or the octanol-water partition coefficient of a structure, they could readily appreciate the stick, ball-and-stick, tube, and space-filling representations of 3-D molecular structures.68,69,70 The graphics could be shown in color and, on more sophisticated terminals, in stereo. These images were so stunning that one director of drug discovery at Lilly decreed that terms like theoretical chemistry, molecular modeling, and computational chemistry were out. The whole field was henceforth to be called molecular graphics as far as he was concerned. A picture was something that could be understood! Independently, the Journal of Molecular Graphics sprang up in 1983.

Naturally, with the flood of new computer technology came the need to train the research scientists in its use. Whereas the Mac was so easy that medicinal chemists could master it and ChemDraw in less than a day of training, the VAX was a little more formidable. The author designed, organized, and taught VAX classes offered to the medicinal chemists and process chemists at Lilly. Computer programs that the computational chemists had been running on the arcane IBM mainframes were ported to the VAXes. This step made the programs more accessible because all the chemists were given VAX accounts. So, although the other programs (e.g., email and ChemDraw) enticed medicinal chemists to sit down in front of the computer screen, they were now more likely to experiment with molecular modeling calculations. (As discussed elsewhere,71 the terms computational chemistry and molecular modeling were used more or less interchangeably at pharmaceutical companies, whereas other scientists in the field tried to distinguish the terms.) Besides the classes and workshops, one-on-one training was offered to help the medicinal chemists run the computational chemistry programs. This was generally fruitful but occasionally led to amusing results, such as when one medicinal chemist burst out of his lab to happily announce his discovery that he could obtain a correct-looking 3-D structure from MM2 optimization even if he did not bother to
attach hydrogens to the carbons. However, he had not bothered to check the bond lengths and bond angles for his molecule.

On a broader front, large and small pharmaceutical companies became aware of the potential for computer-aided drug design. Although pharmaceutical companies were understandably reticent to discuss what compounds they were pursuing, they were quite free in disclosing their computational chemistry infrastructure. For instance, Merck, which had grown its modeling group to be one of the largest in the world, published its system72 in 1980. Lilly's infrastructure73 was described at a national meeting of the American Chemical Society in 1982. Whereas corporate managers generally are selected for displaying leadership skills, they often just follow what they see managers doing at other companies. Hence, a strategy used by scientists to obtain new equipment or other resources was to make their managers aware of what other companies were able to do. New R&D investments are often spurred by the desire of management to keep up with competing companies.

In the mid-1980s, the author initiated a survey of 48 pharmaceutical and chemical companies that were using computer-aided molecular design methods and were operating in the United States.74 The aim of the survey was to collect data that would convince management that Lilly's computational chemistry effort needed to grow. We summarize here some highlights of the data because they give a window on the situation existing in the mid-1980s. Between 1975 and 1985, the number of computational chemists employed at the 48 companies increased from fewer than 30 to about 150, more than doubling every five years. Thus, more companies were jumping on the bandwagon, and many companies that were already in this area were expanding their efforts. Hiring of computational chemists accelerated through the decade.75 Aware of the polarization that could exist between theoretical and medicinal chemists, some companies tried to circumvent this problem by hiring organic chemistry PhDs who had spent a year or two doing postdoctoral research in molecular modeling. This trend was so pervasive that by 1985, only about a fifth of the computational chemists working at pharmaceutical companies came from a quantum mechanical background. Students too became aware of the fact that if their PhD experience was in quantum chemistry, it would enhance their job prospects if they spent a year or two in some other area, such as performing molecular dynamics simulations of proteins.

The computational chemistry techniques used most frequently at that time were molecular graphics and molecular mechanics. Ab initio quantum programs were in use at 21 of the 48 companies. Over 80% of the companies were using commercially produced software. Two-thirds of the companies were using software sold by Molecular Design Ltd. (MDL). A quarter were using SYBYL from Tripos Associates, and 15% were using the molecular modeling program CHEMGRAF by Chemical Design Ltd.


The following companies had five or more scientists working full-time as computational chemists in 1985: Abbott, DuPont, Lederle (part of American Cyanamid), Merck, Rohm and Haas, Searle, SmithKline Beecham, and Upjohn. Some of these companies had as many as 12 scientists working on computer-aided molecular design applications and software development. For the 48 companies, the mean ratio of the number of synthetic chemists to computational chemists was 29:1. This ratio reflects not only what percentage of a company's research effort was computer-based, but also the number of synthetic chemists that each computational chemist might be expected to serve. Hence, a small ratio indicates more emphasis on computing or a small staff of synthetic chemists. Pharmaceutical companies with low ratios (less than 15:1) included Abbott, Alcon, Allergan, Norwich Eaton (part of Procter & Gamble), and Searle. The most common organizational arrangement (at 40% of the 48 companies) was for the computational chemists to be integrated in the same department or division as the synthetic chemists. The other companies tried placing their computational chemists in a physical/analytical group, in a computer science group, or in their own unit.

About three-quarters of the 48 companies were using a VAX 11/780, 785, or 730 as their primary computing platform for research. The IBM 3033, 3083, 4341, and so on were being used for molecular modeling at about a third of the companies. (The percentages add up to more than 100% because larger companies had several types of machines.) The most commonly used graphics terminal was the Evans and Sutherland PS300 (E&S PS300) (40%), followed by Tektronix, Envision, and Retrographics VT640 at about one-third of the companies each, and IMLAC (25%). The most used brands of plotter in 1985 were the Hewlett-Packard and Versatec.

As mentioned above, the most widely used graphics terminal in 1985 was the E&S PS300. This machine was popular because of its very high resolution, color, speed, and stereo capabilities. (It is stunning to think that a company so in fashion and dominant during one decade could totally disappear from the market a decade later. Such are the foibles of computer technology.) At Lilly, the E&S PS300 was set up in a large lighted room with black curtains enshrouding the cubicle with the machine. All Lilly scientists were free to use the software running on the machine. The terminal also served as a showcase of Lilly's research prowess that was displayed to visiting Lilly sales representatives and visiting dignitaries. No doubt a similar situation occurred at other companies. The ability to see molecular models or other three-dimensional data on a computer screen was a novelty that further widened interest in computer graphics. Most users required special stereo glasses to see the images in stereo, but some chemists delighted themselves by mastering the relaxed-eye or crossed-eye technique of looking at the pairs of images.

The 1980s saw an important change in the way software was handled. In the 1970s, most of the programs used by computational chemists were
distributed essentially freely through QCPE, exchanged person to person, or developed in-house. But in the 1980s, many of the most popular programs – and some less popular ones – were commercialized. The number of software vendors mushroomed. For example, Pople's programs for ab initio calculations were withdrawn from QCPE; marketing rights were turned over to a company he helped found, Gaussian Inc. (Pittsburgh, Pennsylvania). This company also took responsibility for continued development of the software.

In the molecular modeling arena, Tripos Associates (St. Louis, Missouri) was dominant by the mid-1980s. Their program SYBYL originally came from academic laboratories at Washington University (St. Louis).76 In the arena of chemical structure management, MDL (then in Hayward, California) was dominant. This company, which was founded in 1978 by Prof. Todd Wipke and others, marketed a program called MACCS for management of databases of compounds synthesized at or acquired by pharmaceutical companies. The software stored chemical structures (in two-dimensional representation) and allowed substructure searching and later similarity searching.77,78 The software was vastly better than the manual systems that pharmaceutical companies had been using for recording compounds on file cards that were stored in filing cabinets. Except for some companies, such as Upjohn, which had their own home-grown software for management of their corporate compounds, many companies bought MACCS and became dependent on it. As happens in a free market where there is little competition, MACCS was very expensive. Few, if any, academic groups could afford it. A serious competing software product for compound management did not reach the market until 1987, when Daylight Chemical Information Systems was founded. By then, pharmaceutical companies were so wedded to MACCS that there was great inertia against switching their databases to another platform, even if it was cheaper and better suited for some tasks.

In 1982, MDL started selling REACCS, a database management system for chemical reactions. Medicinal chemists liked both MACCS and REACCS. The former could be used to check whether a compound had been synthesized in-house and, if so, how much material was left in inventory. The latter program could be used to retrieve information about synthetic transformations and reaction conditions that had been published in the literature.

Some other momentous advances occurred on the software front. One was the writing of MOPAC, a semiempirical molecular orbital program, by Dr. James J. P. Stewart, a postdoctoral associate in Prof. Michael Dewar's group at the University of Texas at Austin.79,80,81 MOPAC was written in FORTRAN 77, a language that became popular among computational chemists in the 1980s. MOPAC was the first widely used program capable of automatically optimizing the geometry of molecules. This was a huge improvement over prior programs that could only perform calculations on fixed geometries. Formerly, a user would have to vary a bond length or a bond angle in increments, doing a separate calculation for each; then fit a
parabola to the data points and try to guess where the minimum was. Hence, MOPAC made the determination of 3-D structures much simpler and more efficient. The program could handle molecules large enough to be of pharmaceutical interest. With VAXes, a geometry optimization calculation could run as long as two or three weeks of wall clock time. An interruption of a run caused by a machine shutdown meant rerunning the calculation from the start. For the most part, however, the VAXes were quite stable. MOPAC was initially applicable to any molecule parameterized for Dewar's MINDO/3 or MNDO molecular orbital methods (i.e., common elements of the first and second rows of the periodic table). The optimized geometries were not in perfect agreement with experimental numbers but were better than what could have been obtained by prior molecular orbital programs for large molecules (those beyond the capability of ab initio calculations). Stewart made his program available through QCPE in 1984, and it quickly became (and long remained) the most requested program from QCPE's library of several hundred.82 Unlike commercialized software, programs from QCPE were attractive because they were distributed as source code and cost very little.

In the arena of molecular mechanics, Prof. Allinger's ongoing, meticulous refinement of an experimentally based force field for organic compounds was welcomed by chemists interested in molecular modeling at pharmaceutical companies. The MM2 force field83,84 gave better results than MMI. To fund his research, Allinger sold distribution rights for the program initially to Molecular Design Ltd. (At the time, MDL also marketed several simple molecular graphics and modeling programs. Later, distribution rights for Allinger's programs were transferred to Tripos.)

A program of special interest to the pharmaceutical industry was CLOGP. This program was developed by Prof. Al Leo (Pomona College) in the 1980s.85,86,87 It was initially marketed through Daylight Chemical Information Systems (then of New Orleans and California). CLOGP could predict the lipophilicity of organic molecules. The algorithm was based on summing the contribution from each fragment (set of atoms) within a structure. The fragment contributions were parameterized to reproduce experimental octanol-water partition coefficients, log Po/w. There was some discussion among scientists about whether octanol was the best organic solvent to mimic biological tissues, but this solvent proved to be satisfactory for most purposes and eventually became the standard. To varying degrees, lipophilicity is related to many molecular properties, including molecular volume, molecular surface area, transport through membranes, binding to receptor surfaces, and hence to many different bioactivities. The calculated log Po/w values were widely used as a descriptor in QSAR studies in both industry and academia.

Yet another program was Dr. Kurt Enslein's TOPKAT.88,89 It was sold through his company, Health Designs (Rochester, New York). The software was based on statistics and was trained to predict the toxicity of a molecule from its structural fragments. Hence, compounds with fragments such as nitro or hydrazine would score poorly, basically confirming what an experienced medicinal chemist already knew.
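
To make the fragment-additive scheme behind CLOGP concrete, here is a minimal Python sketch that estimates log P by summing tabulated fragment contributions. It is an illustration only: the fragment values and the hand-made fragment decomposition are invented, not Leo's actual CLOGP parameters, and a real implementation also applies correction factors for interacting fragments.

    # Minimal sketch of a fragment-additive log P estimate.
    # All fragment values are hypothetical, for illustration only.
    FRAGMENT_LOGP = {
        "phenyl": 1.90,     # hypothetical contribution of a C6H5 fragment
        "hydroxyl": -1.12,  # hypothetical contribution of an OH fragment
        "chloro": 0.71,     # hypothetical contribution of a Cl substituent
        "methyl": 0.56,     # hypothetical contribution of a CH3 fragment
    }

    def estimate_logp(fragment_counts):
        """Approximate log P(octanol/water) as a sum of fragment contributions."""
        return sum(FRAGMENT_LOGP[fragment] * count
                   for fragment, count in fragment_counts.items())

    # A p-chlorophenol-like structure, decomposed by hand into fragments:
    print(round(estimate_logp({"phenyl": 1, "hydroxyl": 1, "chloro": 1}), 2))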


The toxicological endpoints included carcinogenicity, mutagenicity, teratogenicity, skin and eye irritation, and so forth. Today, pharmaceutical companies routinely try to predict toxicity, metabolism, bioavailability, and other factors that determine whether a highly potent ligand has what it takes to become a medicine. But back in the 1980s, the science was just beginning to be tackled computationally. The main market for the program was probably government laboratories and regulators. Pharmaceutical laboratories were aware of the existence of the program but were leery of using it much. Companies trying to develop drugs were afraid that if the program, which was of unknown reliability for any specific compound, erroneously predicted danger for a structure, it could kill a project even though a multitude of laboratory experiments might give the compound a clean bill of health. There was also the worry about litigious lawyers. A compound could pass all the difficult hurdles of becoming a pharmaceutical, yet some undesirable, unexpected side effect might show up in some small percentage of patients taking it. If lawyers and lay juries (who frequently had — and have — trouble comprehending complex topics such as science, the relative merits of different experiments, and the benefit-risk ratio associated with any pharmaceutical product) learned that a computer program had once put up a red flag for the compound, the pharmaceutical company could be alleged to be at fault.

We briefly mention one other commercially produced program, SAS. This comprehensive data management and statistics program was used mainly for handling clinical data, which was analyzed by the statisticians at each company. Computational chemists also used SAS and other programs when statistical analyses were needed. SAS also had then-unique capabilities for producing graphical representations of multidimensional numerical data.90 (This was in the days prior to Spotfire.)

With the widespread commercialization of molecular modeling software in the 1980s came both a boon and a bane to the computational chemist and pharmaceutical companies. The boon was that the software vendors sent marketing people to individual companies as well as to scientific meetings. The marketeers would extol the virtues of the programs they were pushing. Great advances in drug discovery were promised if only the vendor's software systems were put in the hands of the scientists. Impressive demonstrations of molecular graphics, overlaying molecules, and so forth convinced company managers and medicinal chemists that here was the key to increasing research productivity. As a result of this marketing, most pharmaceutical companies purchased the software packages. The bane was that computer-aided drug design (CADD) was oversold, thereby setting up unrealistic expectations of what could be achieved by the software. Unrealistic expectations were also set for what bench chemists could accomplish with the software. Bench chemists tend to be intolerant of problematic molecular modeling software.
Whereas experienced computational chemists are used to tolerating complex, limited, jury-rigged, or tedious software solutions, bench chemists generally do not have the time or patience to work with software that is difficult to use. Unless the experimentalists devoted a good deal of time to learning the methods and limitations, the software was best left in the hands of computational chemistry experts.

Also in the 1980s, structure-based drug design (SBDD) underwent a similar cycle. Early proponents oversold what could be achieved through SBDD, thereby causing pharmaceutical companies to reconsider their investments when they discovered that SBDD too was no panacea for filling the drug discovery cornucopia with choice molecules for development. Nevertheless, SBDD was an important advance. All through the 1970s, computational chemists were often rhetorically quizzed by critics about what, if any, pharmaceutical product had ever been designed by computer. Industrial computational chemists had a solid number of scientific accomplishments but were basically on the defensive when challenged with this question. Evidence to rebut the critics strengthened in the 1980s. The fact is that only a few computer-designed structures had ever been synthesized. (See our earlier discussion on who gets credit for a design idea.) The fact is that only a very tiny percentage of molecules – from any source – ever makes it as far as being a clinical candidate. The stringent criteria set for pharmaceutical products to be used in humans winnow out almost all molecules. The odds were not good for any computational chemist achieving the ultimate success: discovering a drug solely with the aid of the computer. In fact, many medicinal chemists would toil diligently at their benches and fume hoods for a whole career and never have one of their compounds selected as a candidate for clinical development.

Another factor impeding computational chemistry from reaching its full usefulness was that only a few drug targets had had their 3-D structures solved prior to the advancing methods for protein crystallography of the 1980s. One such early protein target was dihydrofolate reductase (DHFR), the 3-D structures of which became known in the late 1970s.91,92 This protein became a favorite target of molecular modeling/drug design efforts in industry and elsewhere in the 1980s. Many resources were expended trying to find better inhibitors than the marketed pharmaceuticals, the antineoplastic methotrexate and the antibacterial trimethoprim. Innumerable papers and lectures sprang from those efforts. Scientists do not like to report negative results, but one brave author of a 1988 review article quietly alluded to the fact that none of the computer-based efforts at his company or disclosed by others in the literature had yielded better drugs.93 Although this first major, widespread effort at SBDD was a disappointment, the situation looked better on the QSAR front. In Japan, Koga94,95,96 employed classical (Hansch-type) QSAR while discovering the antibacterial agent norfloxacin around 1982. Norfloxacin was the first of the third-generation analogs of nalidixic
acid to reach the marketplace. This early success may not have received the notice it deserved, perhaps because the field of computer-aided drug design continued to focus heavily on computer graphics, molecular dynamics, X-ray crystallography, and nuclear magnetic resonance spectroscopy.97 Another factor obscuring this success may have been that medicinal chemists and microbiologists at other pharmaceutical companies capitalized upon the discovery of norfloxacin to elaborate even better quinolone antibacterials that eventually dominated the market.

As computers and software improved, SBDD became a more popular approach to drug discovery. One company, Agouron in San Diego, California, set a new paradigm for discovery based on iterations between crystallography and medicinal chemistry. As new compounds were made, some of them could be co-crystallized with the target protein. The 3-D structures of the complexes were solved by rapid computer techniques. Molecular modeling observations of how the compounds fit into the receptor suggested ways to improve affinity, leading to another round of synthesis and crystallography. Although considered by its practitioners and most others as an experimental science, protein crystallography (now popularly called structural biology) often employed a step whereby the diffraction data were refined in conjunction with constrained molecular dynamics (MD) simulations. Dr. Axel Brünger's program X-PLOR98 met this important need. The force field in the program had its origin in CHARMM, developed by Prof. Martin Karplus's group at Harvard.99 Pharmaceutical companies that set up protein crystallography groups acquired X-PLOR to run on their computers. The SBDD approach affected computational chemists positively. The increased number of 3-D structures of therapeutically relevant targets opened new opportunities for molecular modeling of the receptor sites. Computational chemists assisted the medicinal chemists in interpreting the fruits of crystallography for design of new ligands.

Molecular dynamics simulations can consume prodigious amounts of computer time. Not only are proteins very large structures, but also the MD results are regarded as better the longer they are run because more of conformational space is assumed to be sampled by the jiggling molecules. Even more demand for computer power seemed necessary when free energy perturbation (FEP) theory appeared on the scene. Some of the brightest luminaries in academic computational chemistry proclaimed that here was a powerful new method for designing drugs.100,101 Pharmaceutical companies were influenced by these claims.102 On the other hand, computational chemists closer to the frontline of working with medicinal chemists generally recognized that whereas FEP was a powerful method for accurately calculating the binding energy between ligands and macromolecular targets, it was too slow for extensive use in actual drug discovery. The molecular modifications that could be simulated with FEP treatment, such as changing one substituent to another, were relatively minor. Because the FEP simulations had to be run so long to obtain good results, it was often possible for a medicinal chemist to synthesize
the new modification in less time than it took to do the calculations! And, in those cases where a synthesis would take longer than the calculations, not many industrial medicinal chemists would rate the modification predicted from theory to be worth investing that much of their time. Researchers in industry are under a great deal of pressure to tackle problems quickly and not spend too much time on them.

As we near the end of our coverage of the 1980s, we mention one unusual organizational structure. Whereas it was common practice in pharmaceutical companies for a medicinal chemist or other organic chemist to manage the computational chemistry group, one small company, Searle in Chicago, experimented in the mid-1980s with the arrangement of having the medicinal chemistry group report to a computational chemist. A potential advantage of this arrangement was that molecular structures designed on the computer would more likely be synthesized. Also, collaboration between the computational and the medicinal chemists could be mandated by a manager who wanted CADD to have a chance to succeed. However, the experiment lasted only two years. A publication in 1991 revealed that Searle experienced some of the same frictions in trying to maximize the contributions of computational chemistry that plagued other companies.103 (Searle was eventually subsumed by Pharmacia, which was swallowed by Pfizer.)

The insatiable need for more computing resources in the 1980s sensitized the pharmaceutical companies to investigate supercomputing.104 Some pharmaceutical companies opted to acquire specialized machines such as array processors. By the mid-1980s, for example, several pharmaceutical companies had acquired the Floating Point Systems (FPS) 164. Other pharmaceutical companies sought to meet their needs by buying time and/or forming partnerships with one of the state or national supercomputing centers that had been set up in the United States, Europe, and Japan. For instance, in 1988, Lilly partnered with the National Center for Supercomputing Applications (NCSA) in Urbana-Champaign, Illinois. Meanwhile, supercomputer manufacturers such as Cray Research and ETA Systems, both in Minnesota, courted scientists and managers at the pharmaceutical companies. A phrase occasionally heard in this period was that computations were the "third way" of science. The other two traditional ways to advance science were experiment and theory. The concept behind the new phrase was that computing could be used to develop and test theories and to stimulate ideas for new experiments.

GEMS DISCOVERED: THE 1990s

The 1990s was a decade of fruition because the computer-based drug discovery work of the 1980s yielded an impressive number of new chemical entities reaching the pharmaceutical marketplace. We elaborate on this
statement later in this section, but first we complete the story about supercomputers in the pharmaceutical industry.

Pharmaceutical companies were accustomed to supporting their own research and making large investments in it. In fact, the pharmaceutical industry has long maintained the largest self-supporting research enterprise in the world. However, the price tag on a supercomputer was daunting. To help open the pharmaceutical industry as customers for supercomputers, the chief executive officer (CEO) of Cray Research took the bold step of paying a visit to the CEO of Lilly in Indianapolis. Apparently, Cray's strategy was to entice a major pharmaceutical company to purchase a supercomputer, and then additional pharmaceutical companies might follow suit in the usual attempt to keep their research competitive. Lilly was offered a Cray-2 at an irresistible price. Not only did Lilly buy a machine, but other pharmaceutical companies either bought or leased a Cray. Merck, Bristol-Myers Squibb, Marion Merrell Dow (then a large company in Cincinnati, Ohio), Johnson & Johnson, and Bayer were among the companies that chose a Cray. Some of these machines were the older X-MP or the smaller J90 machine, the latter being less expensive to maintain.

After Lilly's purchase of the Cray 2S-2/128, line managers were given the responsibility to make sure the purchase decision had a favorable outcome. This was a welcome opportunity because line management was fully confident that supercomputing would revolutionize research and development (R&D).105 The Lilly managers believed that a supercomputer would enable their scientists to test more ideas than would be practical with older computers. Management was optimistic that a supercomputer would foster collaborations and information sharing among employees in different disciplines at the company. The managers hoped that both scientific and business uses of the machine would materialize. Ultimately then, supercomputing would speed the identification of promising new drug candidates. Scientists closer to the task of using the supercomputer saw the machine primarily as a tool for performing longer molecular dynamics simulations and quantum mechanical calculations on large molecules. However, if some other computational technique such as QSAR or data mining was more effective at discovering and optimizing new lead compounds, then the supercomputer might not fulfill the dreams envisioned for it. A VAX cluster remained an essential part of the technological infrastructure best suited for management of the corporate library of compounds (see more about this later).

Lilly management organized special workshops to train potential users of the Cray. This pool of potential users included as many willing medicinal chemists and other personnel as could be rounded up. In-house computational chemists and other experts were assigned the responsibility of conducting the off-site, week-long workshops. The workshops covered not only how to submit and retrieve jobs, but also the general methods of molecular modeling, molecular dynamics, quantum chemistry, and QSAR.
The latter, as mentioned, did not require supercomputing resources, except perhaps occasionally to generate quantum mechanical descriptors. Mainly, however, the training had the concomitant benefit of exposing more medicinal chemists, including younger ones, to what could be achieved with the current state of the art of computational chemistry applied to molecular design.

As the role of the computational chemists became more important, attitudes toward them became more accepting. At some large, old pharmaceutical houses, and at many smaller, newer companies, it was normal practice to allow computational chemists to be co-inventors on patents if the computational chemists contributed to a discovery. Other companies, including Lilly, had long maintained a company-wide policy that computational chemists could not be on drug patents. The policy was changed at Lilly as the 1990s dawned. Computational chemists were becoming nearly equal partners in the effort to discover drugs. This was good both for the computational chemists and for the company because modern pharmaceutical R&D requires a team effort.

Lilly's Cray also served as an impressive public relations showcase. The machine was housed in a special, climate-controlled room. One side of the darkened room had a wall of large glass windows treated with a layer of polymer-dispersed liquid crystals. The thousands of visitors who came to Lilly each year were escorted into a uniquely designed observation room where an excellent video was shown that described the supercomputer and how it would be used for drug discovery. The observation room was automatically darkened at the start of the video. At the dramatic finish of the video, the translucent glass wall was turned clear and bright lights were turned on inside the computer room, revealing the Cray-2 and its cooling tower for the heat transfer liquid. The visitors enjoyed the spectacle. To the disappointment of Lilly's guest relations department, Lilly's Cray-2 was later replaced with a Cray J90, a mundane-looking machine. But the J90 was more economical, especially because it was leased. The supercomputers were almost always busy with molecular dynamics and quantum mechanical calculations.106 Of the personnel at the company, the computational chemists were the main beneficiaries of supercomputing.

At the same time supercomputers were creating excitement at a small number of pharmaceutical companies, another hardware development was attracting attention at just about every company interested in designing drugs. Workstations from Silicon Graphics Inc. (SGI) were becoming increasingly popular for molecular research. These high-performance, UNIX-based machines were attractive because of their ability to handle large calculations quickly and because of their high-resolution, interactive computer graphics. Although a supercomputer was fine for CPU-intensive jobs, the workstations were better suited for interactive molecular modeling software being used for drug research. The workstations became so popular that some medicinal
chemists wanted them for their offices, not so much for extensive use, but rather as a status symbol. Another pivotal event affecting the hardware situation of the early 1990s merits mention. As already stated, the Apple Macintoshes were well liked by scientists. However, in 1994, Apple lost its lawsuit against Microsoft regarding the similarities of the Windows graphical user interface (GUI) to Apple’s desktop design. Adding to Apple Corporation’s problems, the price of Windows-based PCs dropped significantly below that of Macs. The tables tilted in favor of PCs. More scientists began to use PCs. At Lilly, and maybe other companies, the chief information officer (a position that did not even exist until the 1990s when computer technology became so critical to corporate success) decreed that the company scientists would have to switch to PCs whether they wanted to or not. The reasons for this switch were several-fold. The PCs were more economical. With PCs being so cheap, it was likely more people would use them, and hence, there was a worry that software for Macs would become less plentiful. Also, the problem of incompatible files would be eliminated if all employees used the same type of computer and software. On the software front, the early 1990s witnessed a continued trend toward commercially produced programs being used in pharmaceutical companies. Programs such as SYBYL (marketed by Tripos), Insight/Discover (BIOSYM), and Quanta/CHARMm (Polygen, and later Molecular Simulations Inc., and now called Accelrys) were popular around the world for molecular modeling and simulations. Some pharmaceutical companies bought licenses to all three of these well-known packages. Use of commercial software freed the in-house computational chemists from the laborious task of code development, documentation, and maintenance, so that they would have more time to work on actual drug design. Another attraction of using commercial software was that the larger vendors would have a help desk that users could telephone for assistance when software problems arose, as they often did. The availability of the help desk meant that the in-house computational chemists would have fewer interruptions from medicinal chemists who were having difficulty getting the software to work. On the other hand, some companies, particularly Merck and Upjohn, preferred to develop software in-house because it was thought to be better than what the vendors could provide. Increasing use of commercial software for computational chemistry meant a declining role for software from QCPE. QCPE had passed its zenith by ca. 1992, when it had almost 1900 members and over 600 programs in its catalog. This catalog included about 15 molecular modeling programs written at pharmaceutical companies and contributed for the good of the community of computational chemists. Among the companies contributing software were Merck, DuPont, Lilly, Abbott, and Novartis. When distribution rights for MOPAC were acquired by Fujitsu in 1992, it was a severe blow to QCPE. After a period of decline, the operations of QCPE changed in 1998. Today only a Web-based operation continues at Indiana University Bloomington.


The 1990s witnessed changes for the software vendors also. The California company that started out as BioDesign became Molecular Simulations Inc. (MSI). Management at MSI went on a buying spree starting in 1991. The company acquired other small software companies competing in the same drug design market, including Polygen, BIOSYM, BioCAD, and Oxford Molecular (which had already acquired several small companies including Chemical Design Ltd. in 1998).107 Pharmaceutical companies worried about this accretion because it could mean less competition and it could mean that their favorite molecular dynamics (MD) program might no longer be supported in the future. This latter possibility has not come to pass because there was sufficient loyalty and demand for each MD package to remain on the market.

Researchers from pharmaceutical companies participated in user groups set up by the software vendors. Pharmaceutical companies also bought into consortia created by the software vendors. These consortia, some of which dated back to the 1980s, aimed at developing new software tools or improving existing software. The pharmaceutical companies hoped to get something for their investments. Sometimes the net effect of these investments was that it enabled the software vendors to hire several postdoctoral research associates who worked on things that were of common interest to the investors. Although the pharmaceutical companies received some benefit from the consortia, other needs, such as more and better force field parameters, remained underserved. Inspired by the slow progress in one force field development consortium, Merck single-handedly undertook the de novo development of a force field it called the Merck Molecular Force Field (MMFF94). This force field, which aimed to model pharmaceutically interesting molecules well, was published,108–114 and several software vendors subsequently incorporated it in their molecular modeling programs. The accolades of fellow computational chemists led to the developer being elected in 1992 to become chairman of one of the Gordon Research Conferences on Computational Chemistry. (The latter well-respected conference series originated in 1986.115)

On the subject of molecular modeling and force fields, a general molecular modeling package was developed in an organic chemistry laboratory at Columbia University in New York City.116 Perhaps because MacroModel was written with organic chemists in mind, it proved popular with industrial medicinal chemists, among others. The program was designed so that versions of widely used, good force fields, including those developed by Allinger and by Kollman, could easily be invoked for any energy minimization or molecular simulation.

The 1990s witnessed other exciting technological developments. In 1991, Dr. Jan K. Labanowski, then an employee of the Ohio Supercomputer Center (Columbus, Ohio), launched an electronic bulletin board called the Computational Chemistry List (CCL). Computational chemists rapidly joined because it was free and an effective forum for informal exchange of information. Computational chemists at pharmaceutical companies were among the 2000 or so members who joined in the 1990s. Often these employees would
take the time to answer questions from beginners, helping them learn about the field of computer-aided drug design. The CCL was a place where the relative merits of different methodologies and computers, and the pros and cons of various programming languages, could be debated, sometimes passionately.

In 1991, MDL came out with a new embodiment of their compound management software called ISIS (Integrated Scientific Information System). Pharmaceutical companies upgraded to the new system, having become so dependent on MDL. In general, managers of information technology at pharmaceutical companies preferred one-stop solutions. On the other hand, computational chemists found Daylight Chemical Information Systems software more useful for developing new research applications.

MACCS and then ISIS gave researchers exceptional new tools for drug discovery when similarity searching came along. Chemical structures were stored in the database as connectivity tables (describing the atoms and which ones are connected by bonds). In addition, chemical structures could be stored as a series of on-off flags ("keys") indicating the presence or absence of specific atoms or combinations of atoms and/or bonds. The similarity of compounds could be quantitated by the computer in terms of the percentage of keys that the compounds shared. Thus, if a researcher was aware of a lead structure from in-house work or the literature, it was possible to find compounds in the corporate database that were similar and then get these compounds assayed for biological activities. In this way, the technique of data mining became important. It was fairly easy to find compounds with low levels of activity by this method, depending on how large the database was. Some of these active compounds might have a skeleton different from the lead structure. The new skeleton could form the basis for subsequent lead optimization. As Dr. Yvonne C. Martin (Abbott) has wryly commented in her lectures at scientific meetings, one approach to drug discovery is to find a compound that the target receptor sees as the same as an established ligand but that a patent examiner sees as a different compound (thereby satisfying the novelty requirement for patentability).

Many or most of the results from data mining in industry went unpublished because the leads generated were potentially useful knowledge and because of the never-ending rush of high-priority work. When a few academic researchers gained access to commercial data mining software and a compound database, the weakly active compounds that they found were excitedly published. This difference between industry and academia in handling similar kinds of results is a matter of priorities. In industry, the first priority is to find marketable products and get them out the door. In academia, the priority is to publish (especially in high-impact journals). Contrary to a common misconception, however, scientists in industry do publish, a point we return to later.

Software use for drug discovery and development can be classified in various ways. One way is technique based. Examples would be programs based on force fields or on statistical fitting (the latter including log P prediction and toxicity prediction).
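
The key-based similarity searching described above comes down to simple set arithmetic, as the minimal Python sketch below illustrates. Each compound is represented by the set of substructure keys it turns on, and similarity is scored as the fraction of keys shared (the Tanimoto coefficient is one common formulation). The key numbers, compound names, and threshold are invented for illustration.

    # Minimal sketch of similarity searching over on-off substructure keys.
    # Key identifiers and compounds are hypothetical.
    database = {
        "compound_A": {1, 4, 7, 9, 15},
        "compound_B": {1, 4, 9, 22},
        "compound_C": {3, 5, 11},
    }

    def tanimoto(keys1, keys2):
        """Fraction of keys shared: |intersection| / |union|."""
        return len(keys1 & keys2) / len(keys1 | keys2)

    def similarity_search(query_keys, db, threshold=0.5):
        """Return database compounds at least `threshold` similar to the query."""
        hits = [(name, tanimoto(query_keys, keys)) for name, keys in db.items()]
        return sorted((h for h in hits if h[1] >= threshold),
                      key=lambda h: h[1], reverse=True)

    lead_keys = {1, 4, 7, 22}  # keys for a hypothetical lead structure
    print(similarity_search(lead_keys, database))

Ranking a corporate database against the keys of a lead structure in this way is precisely the kind of query that turned up weakly active compounds built on new skeletons.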


Another way to classify software is based on whether the algorithm can be applied to cases where the 3-D structure of the target receptor is known or not. An example of software useful when the receptor structure is not known is Catalyst.117 This program, which became available in the early 1990s, tried to generate a 3-D model of a pharmacophore based on a small set of compounds with a range of activities against a given target. The pharmacophore model, if determinable, could be used as a query to search databases of 3-D structures in an effort to find new potential ligands.

In fortuitous circumstances where the 3-D structure of the target receptor was known, three computational chemistry methodologies came into increased usage. One was docking, i.e., letting an algorithm try to fit a ligand structure into a receptor. Docking methodology dates back to the 1980s, but the 1990s saw more crystal structures of pharmaceutically relevant proteins being solved and used for ligand design.118 A second technique of the 1990s involved designing a computer algorithm to construct a ligand de novo inside a receptor structure. The program would assemble small molecular fragments or "grow" a chemical structure such that the electrostatic and steric attributes of the ligand would complement those of the receptor.119,120,121 The third technique of the 1990s was virtual screening.122,123 The computer would screen hypothetical ligand structures, not necessarily compounds actually in bottles, against the 3-D structure of a receptor in order to find those most likely to fit and therefore worthy of synthesis and experimentation. Technologies for protein crystallography continued to improve. Using computational chemistry software to refine "experimental" protein structures advanced. A paper by Brünger et al. went on to become one of the most highly cited papers in the 10-year period starting in 1995.124

A new approach to drug discovery came to prominence around 1993. The arrival of this approach was heralded with optimism reminiscent of earlier waves of new technologies. The proponents of this innovation – combinatorial chemistry – were organic chemists. Although rarely explicitly stated, the thinking behind combinatorial chemistry seemed to be as follows. The chance of finding a molecule with therapeutic value was extremely low (one in 5000 or one in 10,000 were rough estimates that were often bandied about). Attempts at rational drug design had not significantly improved the odds of finding those rare molecules that could become a pharmaceutical product. Because the low odds could not be beaten, make tens of thousands . . . no, hundreds of thousands . . . no, millions of compounds! Then, figuratively fire a massive number of these molecular bullets at biological targets and hope that some might stick. New computer-controlled robotic machinery would permit synthesis of all these compounds much more economically than the traditional one-compound-at-a-time process of medicinal chemistry. Likewise, computer-controlled robotic machinery would automate the biological testing and reduce the cost per assay. Thus were introduced high-throughput screening (HTS) and ultra-HTS.
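
At its core, the virtual screening technique described above is a ranking loop: score every hypothetical ligand against the receptor structure and keep the top-ranked candidates for synthesis and testing. The Python skeleton below shows just that loop; score_fit is a toy stand-in for a real docking/scoring function, and the receptor descriptor and ligand strings are placeholders.

    # Bare-bones skeleton of virtual screening: rank hypothetical ligands
    # by a fit score against a receptor and keep the best candidates.
    def score_fit(ligand, receptor):
        """Toy stand-in for a docking/scoring function (higher = better fit).
        A real program would evaluate the steric and electrostatic
        complementarity of a docked pose."""
        return -abs(len(ligand) - receptor["pocket_size"])

    def virtual_screen(ligand_library, receptor, keep=3):
        ranked = sorted(ligand_library,
                        key=lambda ligand: score_fit(ligand, receptor),
                        reverse=True)
        return ranked[:keep]  # candidates worth making and assaying

    receptor = {"pocket_size": 24}                     # toy receptor descriptor
    library = ["C" * n for n in (10, 22, 24, 30, 40)]  # toy ligand placeholders
    print(virtual_screen(library, receptor))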

Technologies for protein crystallography continued to improve, and the use of computational chemistry software to refine "experimental" protein structures advanced. A paper by Brünger et al. went on to become one of the most highly cited papers in the 10-year period starting in 1995.124

A new approach to drug discovery came to prominence around 1993. The arrival of this approach was heralded with optimism reminiscent of earlier waves of new technologies. The proponents of this innovation – combinatorial chemistry – were organic chemists. Although rarely explicitly stated, the thinking behind combinatorial chemistry seemed to be as follows. The chance of finding a molecule with therapeutic value was extremely low (one in 5000 or one in 10,000 were rough estimates that were often bandied about). Attempts at rational drug design had not significantly improved the odds of finding those rare molecules that could become a pharmaceutical product. Because the low odds could not be beaten, make tens of thousands ... no, hundreds of thousands ... no, millions of compounds! Then, figuratively, fire a massive number of these molecular bullets at biological targets and hope that some might stick. New computer-controlled robotic machinery would permit synthesis of all these compounds much more economically than the traditional one-compound-at-a-time process of medicinal chemistry. Likewise, computer-controlled robotic machinery would automate the biological testing and reduce the cost per assay. Thus were introduced high-throughput screening (HTS) and ultra-HTS.

Proponents promised that use of combinatorial chemistry (combi-chem) and HTS was the way to fill the drug discovery pipeline with future pharmaceutical products. Pharmaceutical companies, encouraged by the advice of highly paid consultants from academia, made massive investments in people and infrastructure to set up the necessary equipment in the 1990s. The computers needed to run the equipment had to be programmed, and this work was done by instrument engineers, although chemists helped set up the systems that controlled the synthesis.

Combinatorial chemistry increased the rate of output of new compounds by three orders of magnitude. Before combi-chem came on the scene, a typical SAR at a pharmaceutical company might have consisted of fewer than a couple hundred compounds, and a massive effort involving 10–20 medicinal chemistry laboratories might have produced two or three thousand compounds over a number of years. In 1993, with traditional one-compound-at-a-time chemistry, it took one organic chemist on average one week to make one compound for biological testing. Some years later, with combi-chem, a chemist could easily produce 2000 compounds per week.

With the arrival of combi-chem, computational chemists had a new task in addition to what they had been doing. Computational chemistry was needed so that the combinatorial chemistry was not mindlessly driven by whatever reagents were available in chemical catalogs or from other sources. Several needs were involved in library design.125 At the beginning of a research project, the need would be to cover as much of "compound space" as possible, i.e., to produce a variety of structures to increase the likelihood that at least one of the compounds might stick to the target. (Although the terms chemical space and compound space have been in use for a couple of years, formal definitions in the literature are hard to find. We regard compound space as the universe of chemically reasonable (energetically stable) combinations of atoms and bonds. A reaction involves the crossing of paths going from one set of points in chemical space to another set. A combinatorial library of compounds would be a subset of chemical space.) After the drug discovery researchers had gained a general idea of what structure(s) would bind to the target receptor, a second need arose: to design compounds similar to the lead(s), in other words, to pepper compound space around the lead in an effort to find a structure that would optimize biological activity. A third need was to assess the value of libraries being offered for sale by various outside intermediaries. Computational chemists could help determine whether these commercial libraries complemented or duplicated a company's existing compound collection, could gauge the degree of variety in the compounds being offered, and could determine whether a proposed library would overlap a library previously synthesized.
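The first of these needs, covering as much of compound space as possible, was often met by greedy diversity selection over the same kind of key-based fingerprints. The sketch below (compound names and key sets hypothetical) uses the MaxMin idea: repeatedly pick the candidate most dissimilar to everything already chosen. This is one common approach, not the only one that was used for library design.

```python
# Minimal sketch of greedy MaxMin diversity selection (illustrative only).

def tanimoto(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0

def maxmin_select(pool: dict, n: int) -> list:
    """Greedily pick n compounds, each maximally dissimilar to those already picked."""
    names = list(pool)
    chosen = [names.pop(0)]  # seed with an arbitrary first compound
    while len(chosen) < n and names:
        # A candidate's worth = distance (1 - similarity) to its nearest chosen neighbor.
        def nearest_distance(cand):
            return min(1.0 - tanimoto(pool[cand], pool[c]) for c in chosen)
        best = max(names, key=nearest_distance)
        names.remove(best)
        chosen.append(best)
    return chosen

# Hypothetical candidate library members described by structural keys.
pool = {
    "A": {"phenyl", "amide"},
    "B": {"phenyl", "amide", "Cl"},  # nearly a duplicate of A
    "C": {"pyridine", "ester"},
    "D": {"furan", "sulfonamide"},
}
print(maxmin_select(pool, 3))  # ['A', 'C', 'D']; the redundant B is skipped
```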

How does one describe chemical space and molecular similarity? Computational chemists had already developed the methodologies of molecular descriptors and substructure keys, which we mentioned earlier. With these tools, the computational chemist could discern where structures were situated in multidimensional compound or property space and provide advice to the medicinal chemists. (Each dimension of that multidimensional space can be thought of as corresponding to a different descriptor or property.)

Along with all the data generated by combi-chem and HTS came the need to manage and analyze the data. Hence, computers and the science of informatics became increasingly vital. The need to visualize and learn from the massive quantities of data arising from both experimental and computational approaches to drug discovery led to the development of specialized graphical analysis tools. The computational chemist was now becoming more important to drug discovery research than ever before. Hence, by 1993–1994, these technological changes helped save the jobs of many computational chemists at a time when pharmaceutical companies in the United States were downsizing, as we now explain.

The industry has been a favorite whipping boy of politicians for at least 40 years. In 1992–1993, an especially negative political force threatened the pharmaceutical industry in the United States. That force was the healthcare reform plan proposed by Hillary and Bill Clinton. Their vision of America was one that required more lawyers and regulators. Readers who are well versed in the history of the 1930s will be aware of the economic system handed down from the fascist governments of pre-World War II Europe. Under that system, the means of production (industry) remains in private ownership, but the prices that the companies can ask for their products are regulated by government. That was the scheme underlying the Clintons' healthcare reform proposal. Pharmaceutical companies in the United States generally favored any proposal that would increase access to their products but feared this specific proposal because of the great uncertainty it cast over the status quo and future growth prospects. As a result, thousands of pharmaceutical workers – including research scientists – were laid off or encouraged to retire. Rumors swirled around inside each pharmaceutical company about who would be let go and who would retain their jobs. When word came down about the corporate decisions, the computational chemists were generally retained, but the ranks of the older medicinal chemists were thinned. A new generation of managers at pharmaceutical companies now realized that computer-assisted molecular design and library design were critical components of their company's success. One is reminded of the observation of the Nobel laureate physicist Max Planck: "An important scientific innovation rarely makes its way by gradually winning over and converting its opponents. ... What does happen is that its opponents gradually die out and the growing generation is familiarized with the idea from the beginning."

Nevertheless, the Clintons' healthcare reform scheme had a deleterious effect on the hiring of new computational chemists. The job market for computational chemists in the United States fell126 from a then-record high in 1990 to a depression in 1992–1994.

This happened because pharmaceutical companies were afraid to expand until they were sure that the business climate was once again hospitable for growth. The healthcare reform proposal was defeated in the United States Congress, but it took a year or two before pharmaceutical companies regained their confidence and started rebuilding their workforces.

Whereas, in the past, some companies had hired employees with the intention of keeping them for an entire career, the employment situation became more fluid after the downsizing episode. Replacements were sometimes hired as contract workers rather than as full employees. This was especially true for information technology (IT) support people. Pharmaceutical companies hired these temporary computer scientists to maintain the networks, PCs, and workstations used by the permanent employees. (Even more troubling to scientists at pharmaceutical companies was the outsourcing of jobs as a way to control R&D costs.)

Toward the mid-1990s, a new mode of delivering online content came to the fore: the Web browser. Information technology engineers and computational chemists helped set up intranets at pharmaceutical companies. This allowed easy distribution of management memos and other information to the employees. In addition, biological screening data could be posted on the intranet so that medicinal chemists could quickly access it electronically. Computational chemists made their applications (programs) Web-enabled so that medicinal chemists and others could perform calculations from their desktops.

The hardware situation continued to evolve. Personal computers became ever more powerful in terms of processor power and hard drive capacity. The price of PCs continued to fall. Clusters of PCs were built. Use of the open-source Linux operating system spread in the 1990s. Distributed processing was developed so that a long calculation could be farmed out to separate machines. Massively parallel processing was tried. All these changes meant that the days of the supercomputers were numbered. Whereas the trend in the 1980s was toward dispersal of computing power to the departments and the individual user, IT administrators started bringing the PCs under their centralized control in the 1990s. Software to monitor each machine was installed so that what each user did could be tracked. Gradually, computational chemists and other workers lost control over what could and could not be installed on their office machines. The same was true for another kind of hardware: the SGI workstations. These UNIX machines became more powerful and remained popular for molecular modeling through the 1990s. Silicon Graphics Inc. acquired the expiring Cray technology, but this did not seem to have much effect on its workstation business.

Traditionally, in pursuit of their structure-activity relationships, medicinal chemists had focused almost exclusively on finding compounds with greater and greater potency.

However, these SARs often ended up with compounds that were unsuitable for development as pharmaceutical products. These compounds were not soluble enough in water, were not orally bioavailable, or were eliminated too quickly or too slowly from mammalian bodies. Pharmacologists and pharmaceutical development scientists had for years tried to preach the need for medicinal chemists to think about these other factors that determine whether a compound could be a medicine. As has been enumerated elsewhere, there are many factors that determine whether a potent compound has what it takes to become a drug.127 Experimentally, it took a great deal of time to determine these other factors. Often, the necessary research resources would not be allocated to a compound until it had already been selected for project team status.

At the beginning of the 1990s, the factors beyond potency that are essential for a compound to become a pharmaceutical product – properties such as absorption, distribution, metabolism, elimination, and toxicity (ADME/Tox) – were generally beyond the capability of computational chemistry methods to predict reliably. However, as the decade unfolded, computational chemists and other scientists created new and better methodologies for helping the medicinal chemists and biologists select compounds having more of the characteristics necessary to become a drug. In 1997, Lipinski's now-famous "Rule of Five" was published.128 These simple rules were easily encoded in database mining operations at every company, so that compounds with low prospects of becoming an orally active, small-molecule drug (e.g., those having a molecular weight greater than 500 Da) could be weeded out by computer. Software vendors also incorporated these and other rules into their programs for sale to the pharmaceutical companies.
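As an illustration of how easily such rules could be encoded, here is a minimal sketch of a Rule-of-Five filter. The record layout and property values are hypothetical; a real system would compute the properties from the structures stored in the corporate database.

```python
# Minimal sketch of a Rule-of-Five database filter (illustrative only).
# Lipinski's rules flag poor prospects for an orally active small-molecule
# drug when a compound violates more than one of the four criteria below.

def rule_of_five_violations(mw, logp, h_donors, h_acceptors):
    """Count Lipinski violations for one compound."""
    return sum([
        mw > 500,          # molecular weight over 500 Da
        logp > 5,          # calculated logP over 5
        h_donors > 5,      # more than 5 hydrogen-bond donors
        h_acceptors > 10,  # more than 10 hydrogen-bond acceptors
    ])

# Hypothetical records: (name, MW, logP, H-bond donors, H-bond acceptors).
corporate_db = [
    ("cmpd_A", 342.4, 2.1, 2, 5),
    ("cmpd_B", 712.9, 6.3, 4, 12),  # too large and too lipophilic
]

# Weed out compounds with more than one violation.
drug_like = [name for name, *props in corporate_db
             if rule_of_five_violations(*props) <= 1]
print(drug_like)  # ['cmpd_A']
```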

The computational methods used in the 1980s focused, like medicinal chemistry, on finding compounds with ever-higher affinity between the ligand and its target receptor. That is why in the past we have advocated use of the term computer-aided ligand design (CALD) rather than CADD.126,129 However, with increased attention to factors other than potency, the field was finally becoming more literally correct in calling itself CADD.

Another important change started in the mid-1990s. Traditionally, a QSAR determined at a pharmaceutical company might have involved only 5–30 compounds. The number depended on how many compounds the medicinal chemist had synthesized and submitted for testing by the biologists. Sometimes a data set of this size sufficed to reveal useful trends. In other cases, the QSARs were not very robust in terms of predictability. As large libraries of compounds were produced, the data sets available for QSAR analysis became larger. With all that consistently produced (although not necessarily very accurate) biological data and a plethora of molecular descriptors, it was possible to find correlations with better predictability. In fact, QSAR proved to be one of the best approaches to providing assistance to the medicinal chemist in the 1990s. Computational chemists were inventive in creating new molecular descriptors; thousands have been described in the literature.130,131,132,133

As stated in the opening of this section, the 1990s witnessed the fruition of a number of drug design efforts. Making a new pharmaceutical product available to patients is a long, arduous, and costly enterprise. It takes 10–15 years from the time a compound is discovered in the laboratory until it is approved for physicians to prescribe. Hence, a molecule that reached the pharmacies in the 1990s was probably first synthesized at a pharmaceutical company well back in the 1980s. (Most of today's widely prescribed medicines come from the pharmaceutical industry rather than from government or academic laboratories.) The improved methodologies of computational chemistry that became available in the 1980s should therefore have started to show their full impact in the 1990s. (Likewise, the improved experimental and computational methodologies of the 1990s, if they were as good as claimed, should be bearing fruit now in the early part of the 21st century.)

Table 1 lists medicines whose discovery was aided in some way by computer-based methods. Those compounds marked "CADD" were publicized in a series of earlier publications.134–140 The CADD successes were compiled in 1997 when we undertook a survey of the corresponding authors of papers published after 1993 in the prestigious Journal of Medicinal Chemistry. Corresponding authors were asked whether calculations were crucial to the discovery of any compounds from their laboratory. Of the hundreds of replies, we culled out all cases where calculations had not led to drug discovery or had been done post hoc on a clinical candidate or pharmaceutical product. We have always felt strongly that the term "computer-aided drug design" should mean more than just doing a calculation; it should mean providing information or ideas that directly help with the conception of a useful new structure. For the survey, we retained only those cases where the senior author of a paper (usually a medicinal chemist) vouched that computational chemistry had actually been critically important in the research that led to the discovery of a compound that had reached the market. As seen in Table 1, there were seven compounds meeting this criterion in the period 1994–1997. The computational techniques used to find these seven compounds included QSAR, ab initio molecular orbital calculations, molecular modeling, molecular shape analysis,141 docking, the active analog approach,142 molecular mechanics, and SBDD.

More recently, a group in England led by a structural biologist compiled a list of marketed medicines that came from SBDD.143 These are labeled "SBDD" in Table 1. It can be seen that only a little overlap exists between the two compilations (CADD and SBDD). It can also be seen that the number of pharmaceuticals from SBDD is very impressive. Computer-based technologies are clearly making a difference in helping bring new medicines to patients. Often computational chemists had a role to play in fully exploiting the X-ray data. Looking at the success stories, we see that it has often been a team of researchers working closely together that led to the success. It took quite a while for other members of the drug discovery research community to appreciate what computational chemistry could provide.

Table 1 Marketed Pharmaceuticals Whose Discovery Was Aided by Computers

Brand name   Generic name   Marketed by       Year approved in United States   Discovery assisted by   Activity
Noroxin      Norfloxacin    Merck             1983                             QSAR                    Antibacterial
Cozaar       Losartan       Merck             1994                             CADD                    Antihypertensive
Trusopt      Dorzolamide    Merck             1995                             CADD/SBDD               Antiglaucoma
Norvir       Ritonavir      Abbott            1996                             CADD                    Antiviral
Crixivan     Indinavir      Merck             1996                             CADD                    Antiviral
Aricept      Donepezil      Eisai             1997                             QSAR                    Anti-Alzheimer's
Zomig        Zolmitriptan   AstraZeneca       1997                             CADD                    Antimigraine
Viracept     Nelfinavir     Pfizer            1997                             SBDD                    Antiviral
Agenerase    Amprenavir     GlaxoSmithKline   1999                             SBDD                    Antiviral
Relenza      Zanamivir      GlaxoSmithKline   1999                             SBDD                    Antiviral
Tamiflu      Oseltamivir    Roche             1999                             SBDD                    Antiviral
Aluviran     Lopinavir      Abbott            2000                             SBDD                    Antiviral
Gleevec      Imatinib       Novartis          2001                             SBDD                    Antineoplastic
Tarceva      Erlotinib      OSI               2004                             SBDD                    Antineoplastic
Even today there remains room for further improvement in this regard. Computational chemistry is probably most effective when researchers work in an environment where credit is shared.144 Research management must try to balance, usually by trial and error, the opposing styles of encouraging competition among co-workers and encouraging cooperation, in order to find what produces the best results from a given set of team members working on a given project. If management adopts a system whereby company scientists compete with each other, some scientists may strive harder to succeed, but collaborations become tempered and information flows less freely. On the other hand, if all members of an interdisciplinary team of scientists will benefit when the team succeeds, then collaboration increases, synergies can occur, and the team is more likely to succeed. Sometimes it helps to put the computational chemistry techniques in the hands of the medicinal chemists, but it seems that only some of these chemists have the time and inclination to use the techniques to best advantage. Therefore, computational chemistry experts play an important role in maximizing drug discovery efforts.

FINAL OBSERVATIONS

Computers are so ubiquitous in pharmaceutical research and development today that it may be hard to imagine a time when they were not available to assist the medicinal chemist or biologist. The notion of a computer on the desk of every scientist and company manager was rarely contemplated a quarter century ago. Now, computers are essential for generating, managing, and transmitting information.

Over the last four decades, we have witnessed waves of new technologies sweep over the pharmaceutical industry. Sometimes these technologies were oversold at the beginning and turned out not to be a panacea for meeting the quota of new chemical entities that each company would like to launch each year. Computer hardware has been constantly improving. Experience has shown that computer technology so pervasive at one point in time can almost disappear 10 years later. In terms of software, the early crude methods for studying molecular properties have given way to much more powerful methods better suited to discovering drugs.

Figure 3 attempts to summarize and illustrate what we have tried to describe about the history of computing at pharmaceutical companies over the last four decades. We plot the annual number of papers published [and abstracted by Chemical Abstracts Service (CAS)] for each year from 1964 through 2005. These are papers that were indexed by CAS as pertaining to "computer or calculation" and that came from pharmaceutical companies. Initially, we had wanted to structure our SciFinder Scholar145 search for all papers using terms pertaining to computational chemistry, molecular modeling, computer-aided drug design, quantitative structure-activity relationships, and so forth. However, CAS classifies these terms as overlapping concepts, and so SciFinder Scholar was unable to do the searches as desired. Searching on "computer or calculation" yields many relevant hits but also a nontrivial number of papers that are of questionable relevance. This contamination stems from the subjective way abstractors at CAS have indexed articles over the years. The irrelevant papers introduce noise in the data, so we focus on the qualitative trend of the top curve, which represents the sum of papers by all 64 companies covered in the survey.

Figure 3 Annual number of papers published by researchers at pharmaceutical companies during a 42-year period. The data were obtained by searching the CAPLUS and MEDLINE databases for papers related to computer or calculation. Then these hits were refined with SciFinder Scholar (v. 2004) by searching separately for 64 different company names. Well-known companies from around the world were included. Many of these companies are members of the Pharmaceutical Research and Manufacturing Association (PhRMA). The companies are headquartered in the United States, Switzerland, Germany, and Japan. The names of companies with more than 150 total hits in the period 1964–2004 are shown in the box. The indexing by CAS is such that a search on SmithKline Beecham gave the same number of hits as for GlaxoSmithKline (GSK) but much more than for Smith Kline and French. Searches on Parke-Davis and Warner Lambert gave the same total number of hits. The top 10 companies for producing research publications relevant to computers and calculations are, in rank order, GlaxoSmithKline, Bayer, Merck, BASF, Lilly, Upjohn, Pfizer, Hoffmann-La Roche, Hoechst, and Ciba-Geigy. Some companies in the plot ceased having papers to publish simply because they were acquired by other pharmaceutical companies, and hence the affiliation of the authors changed. In the category marked "Other" are 40 other pharmaceutical and biopharmaceutical companies. These mostly smaller companies had fewer than 150 papers in the 42-year period. The CAPLUS database covers 1500 journal titles. This plot is easier to see in color, but for this grayscale reproduction we note that the order of companies in the legend is the same as the order of layers in the chart.

Figure 3 shows that industrial scientists do publish quite a bit. The total number of publications started off very low and increased slowly and erratically from 1964 until 1982. From 1982 to 1993, the annual number of papers grew dramatically and monotonically. This period is when the superminicomputers, supercomputers, and workstations appeared on the scene. After 1993, the total number of papers published each year by all companies in our analysis continued growing but more slowly. The number peaked at more than 600 papers per year in 2001. Curiously, the last few years show a slight decline in the number of papers published, although the number is still very high. Perhaps in recent years more proprietary research has been occupying the attention of computational chemists in the pharmaceutical industry. Although we have no way of knowing the total number of computational chemists employed in the pharmaceutical industry during each year for the last 40 years, it is possible that this number follows a curve similar to that for the total number of papers plotted in Figure 3.

As the twentieth century came to a close, the job market for computational chemists had recovered from the 1992–1994 debacle. In fact, demand for computational chemists leaped to new highs each year in the second half of the 1990s.146 Most of the new jobs were in industry, and most of these industrial jobs were at pharmaceutical or biopharmaceutical companies. As we noted at the beginning of this chapter, in 1960 there were essentially no computational chemists in industry. But 40 years later, perhaps well over half of all computational chemists are working in pharmaceutical laboratories. The outlook for computational chemistry is therefore very much linked to the health of the pharmaceutical industry. Forces that adversely affect pharmaceutical companies will have a negative effect on the scientists who work there as well as at auxiliary companies such as software vendors that develop programs and databases for use in drug discovery and development.

Discovering new medicines is a serious, extremely difficult, and expensive undertaking. Tens of thousands of scientists are employed in this activity. Back in 1980, pharmaceutical and biotechnology companies operating in the United States invested in aggregate about US$2 × 10⁹ in R&D. The sum has steadily increased (although there was the slight pause in 1994 that we mentioned earlier). By 2003, investment in pharmaceutical R&D had grown to $34.5 × 10⁹, and it increased further to $38 × 10⁹ in 2004 and $39.4 × 10⁹ in 2005.147

Drug discovery is risky in the sense that there is no guarantee that millions of dollars invested in a project will pay off. Currently, it may cost as much as $1.2 × 10⁹ on average to discover a new pharmaceutical product and bring it to the market.

Prior to the 1990s, the majority of new chemical entities (NCEs) were coming from pharmaceutical companies in Europe. However, as the European governmental units over-regulated the industry and discouraged innovation, pharmaceutical research activity slowed in Europe and moved to the United States.148 Many of the outstanding computational chemists in Europe immigrated to the United States, where opportunities for pharmaceutical discovery were more exciting. Today the United States pharmaceutical industry invests far more in discovering new and better therapies than the pharmaceutical industry in any other country or any government in the world. Because of its capitalistic business climate, the United States has the most productive pharmaceutical industry in the world.

Despite the ever-increasing investment in R&D each year, the annual number of NCEs approved for marketing in the United States (or elsewhere) has not shown any overall increase in the last 25 years. In the last two decades, the number of NCEs has fluctuated between 20 and 60 per year. The annual number of NCEs peaked in the late 1990s and was only 20 in 2005. Before the late 1990s, this very uncomfortable fact134 was not widely discussed by either research scientists or corporate executives but is now mentioned frequently. Scientists did not want to bring attention to their low success rate; executives did not want to alarm their stakeholders. The dearth of NCEs is depicted in Figures 4 and 5, which serve to illustrate that combinatorial chemistry sends many more compounds into the drug discovery pipeline, but the number of new drugs coming out of the funnel has not improved. This comparison demonstrates how difficult it has become to discover useful new pharmaceutical products. Also, it demonstrates that combi-chem, like other new technologies, was probably oversold by the organic chemists.

A recent study of companies in various businesses found that there is no direct relationship between R&D spending and common measures of corporate success such as growth or profitability.149 The number of patents also does not correlate with sales growth. Superior research productivity was ascribed to the quality of an organization's innovative capabilities. In other words, the ability and luck to identify worthwhile areas of research is more important than the number of dollars spent on research. An analysis of NCE data attempted to reach the conclusion that innovation is bringing to market drugs with substantial advantage over existing treatments.150 However, deciding whether R&D is becoming more productive depends on how the NCE data are handled. Generally, as recognized by most people in the field, the NCEs are not as numerous as one would like.151,152

Figure 4 Before the arrival of combinatorial chemistry and high-throughput screening, pharmaceutical scientists had to investigate on average 10,000 compounds to find one compound that was good enough to become a pharmaceutical product. [Funnel diagram: 10,000 compounds synthesized and screened; 1,000 compounds advance to extended biological testing; 100 compounds advance to toxicological testing; 10 compounds advance to clinic; 1 compound advances to market.]
Figure 5 With combinatorial chemistry and high-throughput screening deployed, pharmaceutical scientists investigate many more compounds but still find on average only one compound that is good enough to become a pharmaceutical product. [Funnel diagram: 100,000–1,000,000 compounds synthesized and screened; 1,000 compounds advance to extended biological testing; 100 compounds advance to toxicological testing; 10 compounds advance to clinic; 1 compound advances to market.]

In an attempt to boost NCE output, executives at pharmaceutical companies have put their researchers under extreme pressure to focus and produce. Since the early 1990s, this pressure has moved in only one direction: up. Those pharmaceutical companies with scientists who are best at creating and using tools will be able to innovate their way to the future. With combinatorial chemistry, high-throughput screening, genomics, structural biology, and informatics firmly embedded in modern drug discovery efforts, computational chemistry is indispensable.

Modern drug discovery involves inter- and intra-disciplinary teamwork. To succeed, highly specialized chemists and biologists must collaborate synergistically. For a computational chemistry group to succeed, it needs to be led by knowledgeable proponents who understand the need to align the group's expertise with corporate goals. Groups that have been led by people too academically oriented, who rate success mainly in academic terms, have not helped their companies remain viable.

All musical composers work with the same set of notes, but the geniuses put the notes together in an extraordinarily beautiful way. Synthetic chemists all have available to them the same elements. The successful medicinal chemist will combine atoms such that an amazing therapeutic effect is achieved with the resulting molecule. Computational chemistry has become important in the pharmaceutical industry because it can provide a harmonious accompaniment to medicinal chemistry. The computational chemist's goal should be to help the medicinal chemist by providing information about the structural and electronic requirements that enhance activity, for example, information about which regions of compound space are most propitious for exploration.

Fortunately, the effort that goes into pharmaceutical R&D does benefit society. In nations where modern medicines are available, life expectancy has increased and disability rates among the elderly have declined. Considering all of the things that can go wrong with the human body, many challenges remain for the pharmaceutical researcher. Hopefully, this review will inspire some
young readers to take up the challenge and join the noble quest to apply science to help find cures to improve people’s lives.

ACKNOWLEDGMENTS

We are grateful for the privilege to contribute this historical perspective. We thank Dr. Kenneth B. Lipkowitz for his usual vigorous editorial improvements to this chapter. We thank our many colleagues over the years for what they taught us or tried to teach us. In particular, we acknowledge Dr. Max M. Marsh, who was one of the very first people in the pharmaceutical industry to recognize the future potential of computer-aided drug design. Dr. Marsh, who started out as an analytical chemist, served 42 years at Eli Lilly and Company before retiring in 1986. It was generally recognized that Lilly was a family-oriented company committed to doing what was right in all phases of its business (the company's early motto was "If it bears a red Lilly, it's right"), and there was great mutual loyalty between the company and the employees. Dr. Marsh epitomized these traditions of company culture. A better mentor and more gentlemanly person is hard to imagine. The research effort that he initiated in the early 1960s is still in operation today and of course is now much larger. During part of a 25-year career at the Lilly Research Laboratories of Eli Lilly and Company, the author had the privilege to work with Dr. Marsh. The author would also like to mention Dr. Roger G. Harrison, who started out as an organic chemist working for Eli Lilly and Company in England. He was one of the new generation of managers who appreciated what computational chemistry could contribute to drug discovery research and set in place a climate to maximize its potential. The author also thanks Prof. Norman L. Allinger, Mr. Douglas M. Boyd, Mrs. Joanne H. Boyd, Dr. David K. Clawson, Dr. Richard D. Cramer III, Dr. David A. Demeter, Mr. Gregory L. Durst, Mrs. Susanne B. Esquivel, Dr. Richard W. Harper, Dr. Robert B. Hermann, Dr. Anton J. Hopfinger, Dr. Stephen W. Kaldor, Mrs. Cynthia B. Leber, Dr. Yvonne C. Martin, Dr. Samuel A. F. Milosevich, Dr. James J. P. Stewart, and Dr. Terry R. Stouch for aid as this review was being written. Creation of this review was also assisted by the computer resources of SciFinder Scholar, Google, and Wikipedia.

REFERENCES

1. The organic, inorganic, and physical chemistry courses that the author took in graduate school at Harvard University were so permeated with quantum mechanics that he chose a research project in this field. The project involved molecular orbital calculations on some well-known biomolecules. This interest was further developed in a postdoctoral position at Cornell University, so it was natural that his career path led to the pharmaceutical industry. The author joined the drug discovery efforts at the Lilly Research Laboratories of Eli Lilly and Company in Indianapolis in 1968. After a satisfying career of 25 years at the company, he became a research professor at his present affiliation.
2. E. Fischer, Ber. Dtsch. Chem. Ges., 27, 2985–2993 (1894). Einfluß der Konfiguration auf die Wirkung der Enzyme.
3. R. B. Silverman, The Organic Chemistry of Drug Design and Drug Action, Academic Press, San Diego, CA, 1992.
4. A. Messiah, Quantum Mechanics, Vol. I (Translated from the French by G. M. Temmer), Wiley, New York, 1966.
5. B. Pullman and A. Pullman, Quantum Biochemistry, Interscience Publishers, Wiley, New York, 1963.
6. A. Crum Brown and T. R. Fraser, Trans. Roy. Soc. Edinburgh, 25, 151–203 (1869). On the Connection between Chemical Constitution and Physiological Action. Part I. On the Physiological Action of the Salts of the Ammonium Bases, Derived from Strychnia, Brucia, Thebaia, Codeia, Morphia, and Nicotia.

7. A. Crum Brown and T. R. Fraser, Trans. Roy. Soc. Edinburgh, 25, 693–739 (1869). On the Connection between Chemical Constitution and Physiological Action. Part II. On the Physiological Action of the Ammonium Bases Derived from Atropia and Conia.
8. T. C. Bruice, N. Kharasch, and R. J. Winzler, Arch. Biochem. Biophys., 62, 305–317 (1956). A Correlation of Thyroxine-like Activity and Chemical Structure.
9. R. Zahradnik, Experientia, 18, 534–536 (1962). Correlation of the Biological Activity of Organic Compounds by Means of the Linear Free Energy Relations.
10. C. Hansch and T. Fujita, J. Am. Chem. Soc., 86, 1616–1626 (1964). ρ-σ-π Analysis. A Method for the Correlation of Biological Activity and Chemical Structure.
11. S. M. Free Jr. and J. W. Wilson, J. Med. Chem., 7, 395–399 (1964). A Mathematical Contribution to Structure-Activity Studies.
12. K. B. Lipkowitz and D. B. Boyd, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 2001, Vol. 17, pp. 255–357. Books Published on the Topics of Computational Chemistry.
13. J. D. Bolcer and R. B. Hermann, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1994, Vol. 5, pp. 1–63. The Development of Computational Chemistry in the United States.
14. S. J. Smith and B. T. Sutcliffe, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1997, Vol. 10, pp. 271–316. The Development of Computational Chemistry in the United Kingdom.
15. R. J. Boyd, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 2000, Vol. 15, pp. 213–299. The Development of Computational Chemistry in Canada.
16. J.-L. Rivail and B. Maigret, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1998, Vol. 12, pp. 367–380. Computational Chemistry in France: A Historical Survey.
17. S. D. Peyerimhoff, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, Hoboken, NJ, 2002, Vol. 18, pp. 257–291. The Development of Computational Chemistry in Germany.
18. K. B. Lipkowitz and D. B. Boyd, Eds., in Reviews in Computational Chemistry, Wiley-VCH, New York, 2000, Vol. 15, pp. v–xi. A Tribute to the Halcyon Days of QCPE.
19. C. K. Johnson, in Crystallographic Computing, Proceedings of the International Summer School on Crystallographic Computing, Carleton University, Ottawa, Canada, Aug. 1969, F. R. Ahmed, Ed., Munksgaard, Copenhagen, Denmark, 1970, pp. 227–230. Drawing Crystal Structures by Computer.
20. L. E. Sutton, D. G. Jenkin, A. D. Mitchell, and L. C. Cross, Eds., Tables of Interatomic Distances and Configuration in Molecules and Ions, Special Publ. No. 11, The Chemical Society, London, 1958.
21. W. L. Koltun, Biopolymers, 3, 665–679 (1965). Precision Space-Filling Atomic Models.
22. D. B. Boyd, J. Chem. Educ., 53, 483–488 (1976). Space-Filling Molecular Models of Four-Membered Rings. Three-Dimensional Aspects in the Design of Penicillin and Cephalosporin Antibiotics.
23. R. Hoffmann and W. N. Lipscomb, J. Chem. Phys., 36, 2179–2189 (1962). Theory of Polyhedral Molecules. I. Physical Factorizations of the Secular Equation.
24. R. Hoffmann, J. Chem. Phys., 39, 1397–1412 (1963). An Extended Hückel Theory. I. Hydrocarbons.
25. J. A. Pople and G. A. Segal, J. Chem. Phys., 43, S136–S149 (1965). Approximate Self-Consistent Molecular Orbital Theory. II. Calculations with Complete Neglect of Differential Overlap.
26. J. A. Pople and D. L. Beveridge, Approximate Molecular Orbital Theory, McGraw-Hill, New York, 1970.

27. D. R. Hartree, Proc. Cambridge Phil. Soc., 24, 89–110 (1928). The Wave Mechanics of an Atom with a Non-Coulomb Central Field. I. Theory and Methods.
28. D. R. Hartree, Proc. Cambridge Phil. Soc., 24, 111–132 (1928). The Wave Mechanics of an Atom with a Non-Coulomb Central Field. II. Some Results and Discussion.
29. D. R. Hartree, Proc. Cambridge Phil. Soc., 24 (Pt. 3), 426–437 (1928). Wave Mechanics of an Atom with a Non-Coulomb Central Field. III. Term Values and Intensities in Series in Optical Spectra.
30. V. Fock, Zeitschrift für Physik, 62, 795–805 (1930). "Self-Consistent Field" with Interchange for Sodium.
31. J. D. Roberts, Notes on Molecular Orbital Calculations, Benjamin, New York, 1962.
32. W. B. Neely, Mol. Pharmacol., 1, 137–144 (1965). The Use of Molecular Orbital Calculations as an Aid to Correlate the Structure and Activity of Cholinesterase Inhibitors.
33. R. S. Schnaare and A. N. Martin, J. Pharmaceut. Sci., 54, 1707–1713 (1965). Quantum Chemistry in Drug Design.
34. R. G. Parr, Quantum Theory of Molecular Electronic Structure, Benjamin, New York, 1963.
35. L. P. Hammett, Physical Organic Chemistry; Reaction Rates, Equilibria, and Mechanisms, 2nd ed., McGraw-Hill, New York, 1970.
36. R. W. Taft Jr., J. Am. Chem. Soc., 74, 2729–2732 (1952). Linear Free-Energy Relationships from Rates of Esterification and Hydrolysis of Aliphatic and Ortho-Substituted Benzoate Esters.
37. E. S. Gould, Mechanism and Structure in Organic Chemistry, Holt, Rinehart and Winston, New York, 1959.
38. R. B. Hermann, J. Antibiot., 26, 223–227 (1973). Structure-Activity Correlations in the Cephalosporin C Series Using Extended Hückel Theory and CNDO/2.
39. D. B. Boyd, R. B. Hermann, D. E. Presti, and M. M. Marsh, J. Med. Chem., 18, 408–417 (1975). Electronic Structures of Cephalosporins and Penicillins. 4. Modeling Acylation by the Beta-Lactam Ring.
40. D. B. Boyd, D. K. Herron, W. H. W. Lunn, and W. A. Spitzer, J. Am. Chem. Soc., 102, 1812–1814 (1980). Parabolic Relationships between Antibacterial Activity of Cephalosporins and Beta-Lactam Reactivity Predicted from Molecular Orbital Calculations.
41. D. B. Boyd, in The Amide Linkage: Structural Significance in Chemistry, Biochemistry, and Materials Science, A. Greenberg, C. M. Breneman, and J. F. Liebman, Eds., Wiley, New York, 2000, pp. 337–375. Beta-Lactam Antibacterial Agents: Computational Chemistry Investigations.
42. E. J. Corey, W. T. Wipke, R. D. Cramer III, and W. J. Howe, J. Am. Chem. Soc., 94, 421–430 (1972). Computer-Assisted Synthetic Analysis. Facile Man-Machine Communication of Chemical Structure by Interactive Computer Graphics.
43. E. J. Corey, W. T. Wipke, R. D. Cramer III, and W. J. Howe, J. Am. Chem. Soc., 94, 431–439 (1972). Techniques for Perception by a Computer of Synthetically Significant Structural Features in Complex Molecules.
44. W. T. Wipke and P. Gund, J. Am. Chem. Soc., 96, 299–301 (1974). Congestion. Conformation-Dependent Measure of Steric Environment. Derivation and Application in Stereoselective Addition to Unsaturated Carbon.
45. W. J. Hehre, W. A. Lathan, R. Ditchfield, M. D. Newton, and J. A. Pople, QCPE, 11, 236 (1973). GAUSSIAN 70: Ab Initio SCF-MO Calculations on Organic Molecules.
46. W. J. Hehre, L. Radom, P. v. R. Schleyer, and J. A. Pople, Ab Initio Molecular Orbital Theory, Wiley-Interscience, New York, 1986, p. 44.
47. R. C. Bingham, M. J. S. Dewar, and D. H. Lo, J. Am. Chem. Soc., 97, 1285–1293 (1975). Ground States of Molecules. XXV. MINDO/3. Improved Version of the MINDO Semiempirical SCF-MO Method.

48. P. O. Löwdin, Ed., Int. J. Quantum Chem., Quantum Biol. Symp. No. 1, Proceedings of the International Symposium on Quantum Biology and Quantum Pharmacology, Held at Sanibel Island, Florida, January 17–19, 1974, Wiley, New York, 1974.
49. W. G. Richards, Quantum Pharmacology, Butterworths, London, UK, 1977.
50. E. C. Olson and R. E. Christoffersen, Eds., Computer-Assisted Drug Design, Based on a Symposium Sponsored by the Divisions of Computers in Chemistry and Medicinal Chemistry at the ACS/CSJ Chemical Congress, Honolulu, Hawaii, April 2–6, 1979, ACS Symposium Series 112, American Chemical Society, Washington, DC, 1979.
51. N. L. Allinger, M. A. Miller, F. A. Van Catledge, and J. A. Hirsch, J. Am. Chem. Soc., 89, 4345–4357 (1967). Conformational Analysis. LVII. The Calculation of the Conformational Structures of Hydrocarbons by the Westheimer-Hendrickson-Wiberg Method.
52. R. Gygax, J. Wirz, J. T. Sprague, and N. L. Allinger, Helv. Chim. Acta, 60, 2522–2529 (1977). Electronic Structure and Photophysical Properties of Planar Conjugated Hydrocarbons with a 4n-Membered Ring. Part III. Conjugative Stabilization in an "Antiaromatic" System: The Conformational Mobility of 1,5-Bisdehydro[12]annulene.
53. F. H. Westheimer and J. E. Mayer, J. Chem. Phys., 14, 733–738 (1946). The Theory of the Racemization of Optically Active Derivatives of Biphenyl.
54. J. B. Hendrickson, J. Am. Chem. Soc., 83, 4537–4547 (1961). Molecular Geometry. I. Machine Computation of the Common Rings.
55. D. B. Boyd and K. B. Lipkowitz, J. Chem. Educ., 59, 269–274 (1982). Molecular Mechanics. The Method and Its Underlying Philosophy.
56. D. B. Boyd and K. B. Lipkowitz, Eds., in Reviews in Computational Chemistry, VCH, New York, 1990, Vol. 1, pp. vii–xii. Preface on the Meaning and Scope of Computational Chemistry.
57. P. Gund, E. J. J. Grabowski, G. M. Smith, J. D. Andose, J. B. Rhodes, and W. T. Wipke, in Computer-Assisted Drug Design, E. C. Olson and R. E. Christoffersen, Eds., ACS Symposium Series 112, American Chemical Society, Washington, DC, 1979, pp. 526–551.
58. F. H. Allen, S. Bellard, M. D. Brice, B. A. Cartwright, A. Doubleday, H. Higgs, T. Hummelink, B. G. Hummelink-Peter, O. Kennard, W. D. S. Motherwell, J. R. Rodgers, and D. G. Watson, Acta Crystallogr., Sect. B, B35, 2331–2339 (1979). The Cambridge Crystallographic Data Centre: Computer-based Search, Retrieval, Analysis and Display of Information.
59. F. C. Bernstein, T. F. Koetzle, G. J. B. Williams, E. F. Meyer Jr., M. D. Brice, J. R. Rodgers, O. Kennard, T. Shimanouchi, and M. Tasumi, J. Mol. Biol., 112, 535–542 (1977). The Protein Data Bank: A Computer-Based Archival File for Macromolecular Structures.
60. Y. C. Martin, Quantitative Drug Design. A Critical Introduction, Dekker, New York, 1978.
61. C. Hansch and A. Leo, Substituent Constants for Correlation Analysis in Chemistry and Biology, Wiley, New York, 1979.
62. P. N. Craig, C. H. Hansch, J. W. McFarland, Y. C. Martin, W. P. Purcell, and R. Zahradnik, J. Med. Chem., 14, 447 (1971). Minimal Statistical Data for Structure-Function Correlations.
63. L. G. Humber, A. Albert, E. Campaigne, J. F. Cavalla, N. Anand, M. Provita, A. I. Rachlin, and P. Sensi, Information Bulletin Number 49, International Union of Pure and Applied Chemistry, Oxford, UK, March 1975. "Predicted" Compounds with "Alleged" Biological Activities from Analyses of Structure-Activity Relationships: Implications for Medicinal Chemists.
64. E. J. Corey, A. K. Long, and S. D. Rubenstein, Science, 228, 408–418 (1985). Computer-Assisted Analysis in Organic Synthesis.
65. S. D. Rubenstein, Abstracts of Papers, 228th ACS National Meeting, Philadelphia, PA, August 22–26, 2004, CINF-054. Electronic Documents in Chemistry, from ChemDraw 1.0 to Present.
66. H. E. Helson, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1999, Vol. 13, pp. 313–398. Structure Diagram Generation.

67. T. Monmaney, Smithsonian, 36 (8), 48 (2005). Robert Langridge: His Quest to Peer into the Essence of Life No Longer Seems So Strange.
68. R. Langridge, Chemistry & Industry (London), 12, 475–477 (1980). Computer Graphics in Studies of Molecular Interactions.
69. R. Langridge, T. E. Ferrin, I. D. Kuntz, and M. L. Connolly, Science, 211, 661–666 (1981). Real-Time Color Graphics in Studies of Molecular Interactions.
70. J. G. Vinter, Chemistry in Britain, 21 (1), 32, 33–35, 37–38 (1985). Molecular Graphics for the Medicinal Chemist.
71. D. B. Boyd, in Ullmann's Encyclopedia of Industrial Chemistry, 7th edition, Wiley-VCH, Weinheim, Germany, 2006. Molecular Modeling - Industrial Relevance and Applications.
72. P. Gund, J. D. Andose, J. B. Rhodes, and G. M. Smith, Science, 208, 1425–1431 (1980). Three-Dimensional Molecular Modeling and Drug Design.
73. D. B. Boyd and M. M. Marsh, Abstracts of Papers, 183rd National Meeting of the American Chemical Society, Las Vegas, Nevada, March 28 – April 2, 1982. Computational Chemistry in the Design of Biologically Active Molecules at Lilly.
74. D. B. Boyd, Quantum Chemistry Program Exchange (QCPE) Bulletin, 5, 85–91 (1985). Profile of Computer-Assisted Molecular Design in Industry.
75. K. B. Lipkowitz and D. B. Boyd, Eds., in Reviews in Computational Chemistry, Wiley-VCH, New York, 1998, Vol. 12, pp. v–xiii. Improved Job Market for Computational Chemists.
76. G. R. Marshall, C. D. Barry, H. E. Bosshard, R. A. Dammkoehler, and D. A. Dunn, in Computer-Assisted Drug Design, E. C. Olson and R. E. Christoffersen, Eds., ACS Symposium Series 112, American Chemical Society, Washington, DC, 1979, pp. 205–226. The Conformational Parameter in Drug Design: The Active Analog Approach.
77. Y. C. Martin, M. G. Bures, and P. Willett, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH, New York, 1990, Vol. 1, pp. 213–263. Searching Databases of Three-Dimensional Structures.
78. G. Grethe and T. E. Moock, J. Chem. Inf. Comput. Sci., 30, 511–520 (1990). Similarity Searching in REACCS. A New Tool for the Synthetic Chemist.
79. M. J. S. Dewar, E. F. Healy, and J. J. P. Stewart, J. Chem. Soc., Faraday Trans. 2: Mol. Chem. Phys., 80, 227–233 (1984). Location of Transition States in Reaction Mechanisms.
80. J. J. P. Stewart, J. Computer-Aided Mol. Des., 4, 1–105 (1990). MOPAC: A Semiempirical Molecular Orbital Program.
81. J. J. P. Stewart, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH, New York, 1990, Vol. 1, pp. 45–81. Semiempirical Molecular Orbital Methods.
82. J. J. P. Stewart, QCPE, 11, 455 (1983). MOPAC: A Semiempirical Molecular Orbital Program.
83. N. L. Allinger, J. Am. Chem. Soc., 99, 8127–8134 (1977). Conformational Analysis. 130. MM2. A Hydrocarbon Force Field Utilizing V1 and V2 Torsional Terms.
84. U. Burkert and N. L. Allinger, Molecular Mechanics, ACS Monograph 177, American Chemical Society, Washington, DC, 1982.
85. A. J. Leo, J. Pharmaceut. Sci., 76, 166–168 (1987). Some Advantages of Calculating Octanol-Water Partition Coefficients.
86. A. J. Leo, Methods Enzymol., 202, 544–591 (1991). Hydrophobic Parameter: Measurement and Calculation.
87. A. J. Leo, Chem. Rev., 93, 1281–1306 (1993). Calculating log Poct from Structures.
88. K. Enslein, Pharmacol. Rev., 36 (2, Suppl.), 131–135 (1984). Estimation of Toxicological Endpoints by Structure-Activity Relationships.
89. K. Enslein, Toxicol. Industrial Health, 4, 479–498 (1988). An Overview of Structure-Activity Relationships as an Alternative to Testing in Animals for Carcinogenicity, Mutagenicity, Dermal and Eye Irritation, and Acute Oral Toxicity.

90. D. B. Boyd, J. Med. Chem., 36, 1443–1449 (1993). Application of the Hypersurface Iterative Projection Method to Bicyclic Pyrazolidinone Antibacterial Agents.
91. D. A. Matthews, R. A. Alden, J. T. Bolin, D. J. Filman, S. T. Freer, R. Hamlin, W. G. Hol, R. L. Kisliuk, E. J. Pastore, L. T. Plante, N. Xuong, and J. Kraut, J. Biol. Chem., 253, 6946–6954 (1978). Dihydrofolate Reductase from Lactobacillus casei. X-Ray Structure of the Enzyme Methotrexate-NADPH Complex.
92. D. A. Matthews, R. A. Alden, S. T. Freer, H. X. Nguyen, and J. Kraut, J. Biol. Chem., 254, 4144–4151 (1979). Dihydrofolate Reductase from Lactobacillus casei. Stereochemistry of NADPH Binding.
93. A. J. Everett, in Topics in Medicinal Chemistry, P. R. Leeming, Ed., Proceedings of the 4th SCI-RSC Medicinal Chemistry Symposium, Cambridge, UK, Sept. 6–9, 1987, Special Publication 65, Royal Society of Chemistry, London, 1988, pp. 314–331. Computing and Trial and Error in Chemotherapeutic Research.
94. A. Ito, K. Hirai, M. Inoue, H. Koga, S. Suzue, T. Irikura, and S. Mitsuhashi, Antimicrob. Agents Chemother., 17, 103–108 (1980). In vitro Antibacterial Activity of AM-715, a New Nalidixic Acid Analog.
95. H. Koga, A. Itoh, S. Murayama, S. Suzue, and T. Irikura, J. Med. Chem., 23, 1358–1363 (1980). Structure-Activity Relationships of Antibacterial 6,7- and 7,8-Disubstituted 1-Alkyl-1,4-dihydro-4-oxoquinoline-3-carboxylic Acids.
96. H. Koga, Kagaku no Ryoiki, Zokan, 136, 177–202 (1982). Structure-Activity Relationships and Drug Design of Pyridonecarboxylic Acid Type (Nalidixic Acid Type) Synthetic Antibacterial Agents.
97. T. J. Perun and C. L. Propst, Eds., Computer-Aided Drug Design: Methods and Applications, Dekker, New York, 1989.
98. A. Brünger, M. Karplus, and G. A. Petsko, Acta Crystallogr., Sect. A, A45, 50–61 (1989). Crystallographic Refinement by Simulated Annealing: Application to Crambin.
99. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus, J. Comput. Chem., 4, 187–217 (1983). CHARMM: A Program for Macromolecular Energy, Minimization, and Dynamics Calculations.
100. P. Kollman, Annu. Rev. Phys. Chem., 38, 303–316 (1987). Molecular Modeling.
101. J. A. McCammon, Science, 238, 486–491 (1987). Computer-Aided Molecular Design.
102. M. R. Reddy, M. D. Erion, and A. Agarwal, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 2000, Vol. 16, pp. 217–304. Free Energy Calculations: Use and Limitations in Predicting Ligand Binding Affinities.
103. J. P. Snyder, Med. Res. Rev., 11, 641–662 (1991). Computer-Assisted Drug Design. Part I. Conditions in the 1980s.
104. S. Karin and N. P. Smith, The Supercomputer Era, Harcourt Brace Jovanovich, Boston, 1987.
105. J. S. Wold, testimony before the U.S. Senate; Commerce, Science and Transportation Committee; Science, Technology and Space Subcommittee. Available at http://www.funet.fi/pub/sci/molbio/historical/biodocs/wold.txt. Supercomputing Network: A Key to U.S. Competitiveness in Industries Based on Life-Sciences Excellence.
106. S. A. F. Milosevich and D. B. Boyd, Perspectives in Drug Discovery and Design, 1, 345–358 (1993). Supercomputing and Drug Discovery Research.
107. A. B. Richon, Network Science, 1996. Available at http://www.netsci.org/Science/Compchem/feature17a.html. A History of Computational Chemistry.
108. T. A. Halgren, J. Am. Chem. Soc., 114, 7827–7843 (1992). The Representation of van der Waals (vdW) Interactions in Molecular Mechanics Force Fields: Potential Form, Combination Rules, and vdW Parameters.
109. T. A. Halgren, J. Comput. Chem., 17, 490–519 (1996). Merck Molecular Force Field. I. Basis, Form, Scope, Parameterization and Performance of MMFF94.

110. T. A. Halgren, J. Comput. Chem., 17, 520–552 (1996). Merck Molecular Force Field. II. MMFF94 van der Waals and Electrostatic Parameters for Intermolecular Interactions.
111. T. A. Halgren, J. Comput. Chem., 17, 553–586 (1996). Merck Molecular Force Field. III. Molecular Geometries and Vibrational Frequencies for MMFF94.
112. T. A. Halgren and R. B. Nachbar, J. Comput. Chem., 17, 587–615 (1996). Merck Molecular Force Field. IV. Conformational Energies and Geometries.
113. T. A. Halgren, J. Comput. Chem., 17, 616–641 (1996). Merck Molecular Force Field. V. Extension of MMFF94 Using Experimental Data, Additional Computational Data and Empirical Rules.
114. T. A. Halgren, J. Comput. Chem., 20, 720–729 (1999). MMFF. VI. MMFF94s Option for Energy Minimization Studies.
115. D. B. Boyd and K. B. Lipkowitz, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 2000, Vol. 14, pp. 399–439. History of the Gordon Research Conferences on Computational Chemistry.
116. F. Mohamadi, N. G. J. Richards, W. C. Guida, R. Liskamp, M. Lipton, C. Caufield, G. Chang, T. Hendrickson, and W. C. Still, J. Comput. Chem., 11, 440–467 (1990). MacroModel - An Integrated Software System for Modeling Organic and Bioorganic Molecules Using Molecular Mechanics.
117. P. W. Sprague, in Recent Advances in Chemical Information, Special Publication 100, Royal Society of Chemistry, 1992, pp. 107–111. Catalyst: A Computer Aided Drug Design System Specifically Designed for Medicinal Chemists.
118. J. M. Blaney and J. S. Dixon, Perspectives in Drug Discovery and Design, 1, 301–319 (1993). A Good Ligand is Hard to Find: Automated Docking Methods.
119. H.-J. Boehm, Proceedings of the Alfred Benzon Symposium, No. 39, Munksgaard, Copenhagen, 1996, pp. 402–413. Fragment-Based de novo Ligand Design.
120. M. A. Murcko, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 1–66. Recent Advances in Ligand Design Methods.
121. D. E. Clark, C. W. Murray, and J. Li, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 67–125. Current Issues in de novo Molecular Design.
122. G. Lauri and P. A. Bartlett, J. Comput.-Aided Mol. Des., 8, 51–66 (1994). CAVEAT: A Program to Facilitate the Design of Organic Molecules.
123. W. P. Walters, M. T. Stahl, and M. A. Murcko, Drug Discovery Today, 3, 160–178 (1998). Virtual Screening - An Overview.
124. A. T. Brunger, P. D. Adams, G. M. Clore, W. L. DeLano, P. Gros, R. W. Grosse-Kunstleve, J.-S. Jiang, J. Kuszewski, M. Nilges, N. S. Pannu, R. J. Read, L. M. Rice, T. Simonson, and G. L. Warren, Acta Crystallogr., Sect. D: Biol. Crystallogr., D54, 905–921 (1998). Crystallography & NMR System: A New Software Suite for Macromolecular Structure Determination.
125. R. A. Lewis, S. D. Pickett, and D. E. Clark, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 2000, Vol. 16, pp. 1–51. Computer-Aided Molecular Diversity Analysis and Combinatorial Library Design.
126. K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, VCH, New York, 1996, Vol. 7, pp. v–xi. Trends in the Job Market for Computational Chemists.
127. K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, Wiley-VCH, New York, 1997, Vol. 11, pp. v–x. Preface on Computer Aided Ligand Design.
128. C. A. Lipinski, F. Lombardo, B. W. Dominy, and P. J. Feeney, Adv. Drug Deliv. Rev., 23, 3–25 (1997). Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings.

129. D. B. Boyd, Abstracts of Papers, Symposium on Connecting Molecular Level Calculational Tools with Experiment, 206th National Meeting of the American Chemical Society, Chicago, Illinois, August 22–26, 1993, PHYS 256. Computer-Aided Molecular Design Applications.
130. C. Hansch and A. Leo, Exploring QSAR: Fundamentals and Applications in Chemistry and Biology, American Chemical Society, Washington, DC, 1995.
131. C. Hansch, A. Leo, and D. Hoekman, Exploring QSAR: Hydrophobic, Electronic, and Steric Constants, American Chemical Society, Washington, DC, 1995.
132. R. Todeschini and V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, Berlin, 2000.
133. M. Karelson, Molecular Descriptors in QSAR/QSPR, Wiley, New York, 2000.
134. D. B. Boyd, in Rational Molecular Design in Drug Research, T. Liljefors, F. S. Jørgensen, and P. Krogsgaard-Larsen, Eds., Proceedings of the Alfred Benzon Symposium No. 42, Munksgaard, Copenhagen, 1998, pp. 15–23. Progress in Rational Design of Therapeutically Interesting Compounds.
135. D. B. Boyd, CHEMTECH, 28 (5), 19–23 (1998). Innovation and the Rational Design of Drugs.
136. D. B. Boyd, Modern Drug Discovery, November/December, 1 (2), pp. 41–48 (1998). Rational Drug Design: Controlling the Size of the Haystack.
137. D. B. Boyd, in Encyclopedia of Computational Chemistry, P. v. R. Schleyer, N. L. Allinger, T. Clark, J. Gasteiger, P. Kollman, and H. F. Schaefer III, Eds., Wiley, Chichester, 1998, Vol. 1, pp. 795–804. Drug Design.
138. D. B. Boyd, in Rational Drug Design: Novel Methodology and Practical Applications, A. L. Parrill and M. R. Reddy, Eds., ACS Symp. Series 719, American Chemical Society, Washington, DC, 1999, pp. 346–356. Is Rational Design Good for Anything?
139. P. Zurer, Chem. Eng. News, June 20, 2005, p. 54. Crixivan.
140. J. C. Dyason, J. C. Wilson, and M. Von Itzstein, in Computational Medicinal Chemistry for Drug Discovery, P. Bultinck, H. De Winter, W. Langenaeker, and J. P. Tollenaere, Eds., Dekker, New York, 2004. Sialidases: Targets for Rational Drug Design.
141. D. E. Walters and A. J. Hopfinger, THEOCHEM, 27, 317–323 (1986). Case Studies of the Application of Molecular Shape Analysis to Elucidate Drug Action.
142. S. A. DePriest, D. Mayer, C. B. Naylor, and G. R. Marshall, J. Am. Chem. Soc., 115, 5372–5384 (1993). 3-D-QSAR of Angiotensin-Converting Enzyme and Thermolysin Inhibitors: A Comparison of CoMFA Models Based on Deduced and Experimentally Determined Active Site Geometries.
143. M. Congreve, C. W. Murray, and T. L. Blundell, Drug Discovery Today, 10, 895–907 (2005). Structural Biology and Drug Discovery.
144. D. B. Boyd, A. D. Palkowitz, K. J. Thrasher, K. L. Hauser, C. A. Whitesitt, J. K. Reel, R. L. Simon, W. Pfeifer, S. L. Lifer, K. Takeuchi, V. Vasudevan, A. D. Kossoy, J. B. Deeter, M. I. Steinberg, K. M. Zimmerman, S. A. Wiest, and W. S. Marshall, in Computer-Aided Molecular Design: Applications in Agrochemicals, Materials, and Pharmaceuticals, C. H. Reynolds, M. K. Holloway, and H. K. Cox, Eds., ACS Symp. Series 589, American Chemical Society, Washington, DC, 1995, pp. 14–35. Molecular Modeling and Quantitative Structure-Activity Relationship Studies in Pursuit of Highly Potent Substituted Octanoamide Angiotensin II Receptor Antagonists.
145. See, for example, A. B. Wagner, J. Chem. Inf. Model., 46, 767–774 (2006). SciFinder Scholar 2006: An Empirical Analysis of Research Topic Query Processing. And references therein.
146. D. B. Boyd and K. B. Lipkowitz, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 2002, Vol. 18, pp. 293–319. Examination of the Employment Environment for Computational Chemistry.
147. Pharmaceutical Research and Manufacturers of America, Washington, DC. www.phrma.org.

148. European Federation of Pharmaceutical Industries and Associations, Brussels, Belgium. www.efpia.org/6_publ/infigure2004d.pdf. The Pharmaceutical Industry in Figures, 2000 Edition.
149. M. McCoy, Chem. Eng. News, Oct. 17, 2005, p. 9. Study Finds R&D Money Doesn’t Buy Results.
150. E. F. Schmid and D. A. Smith, Drug Discovery Today, 10, 1031–1039 (2005). Is Declining Innovation in the Pharmaceutical Industry a Myth?
151. S. Class, Chem. Eng. News, Dec. 5, 2005, pp. 15–32. Pharma Reformulates.
152. R. Mullin, Chem. Eng. News, Jan. 23, 2006, p. 9. Tufts Report Anticipates Upturn.

Author Index

Abbott, N. L., 260 Abrams, C. F., 258 Abusalbi, N., 228 Adamo, C., 118, 232 Adams, J. E., 225 Adams, J. T., 228 Adams, P. D., 449 Agarwal, A., 448 Ahlrichs, P., 261 Ahlrichs, R., 76, 78, 81 Ahmed, F. R., 444 Aires-de-Sousa, J., 399 Akkermans, R. L. C., 258, 259 Akutsu, T., 395, 396 Albert, A., 446 Albu, T. V., 223, 229 Alden, R. A., 448 Alder, B. J., 262 Alhambra, C., 229, 231, 232 Aliferis, C. F., 394 Al-Laham, M. A., 119, 232 Allen, F. H., 446 Allen, M. P., 260 Allinger, N. L., 81, 446, 447, 450 Allison, T. C., 227 Almlöf, J. E., 76, 77, 78, 82 Altomare, C., 397 Amboni, R. D. M. C., 397 Amos, R. D., 81, 119 Anand, N., 446 Ancona, N., 400 Andersen, H. C., 260 Andersson, K., 120 Andose, J. D., 446, 447 Andres, J. L., 232

Angelo, M., 394 Anglada, J. M., 121 Angulo, C., 394 Antti, H., 393 Aoki, M., 396 Applegate, B. E., 118 Aptula, A. O., 397 Arfken, G. B., 77 Arimoto, R., 399 Armstrong, R. C., 259 Arnhold, T., 399 Atchity, G. J., 117, 118 Austin, A. J., 118 Ayala, P. Y., 82, 118, 225, 232 Aynechi, T., 289 Ayton, G., 261 Babamov, V. K., 229 Baboul, A. G., 119 Bader, G. D., 400 Badhe, Y., 395 Baer, M., 119, 223 Baerends, E. J., 119 Baeten, V., 399 Bai, D., 258 Bajorath, J., 287, 288, 289 Bakken, V., 118 Balakin, K. V., 399 Balasubramanian, K., 123 Baldi, P., 394 Baldridge, K. K., 120, 225 Bandyopadhyay, S., 261 Barckholtz, T. A., 116, 117 Barnhill, S., 392 Barns, J., 77

Baron, H. P., 81 Barone, P. M. V. B., 397 Barone, V., 118, 232 Barry, C. D., 447 Bartlett, P., 392 Bartlett, P. A., 449 Bartlett, P. L., 393 Bartlett, R. J., 75, 119, 120 Bartol, D., 225 Baschnagel, J., 259 Baskin, B., 400 Bates, D., 117 Batoulis, J., 258, 261 Beachy, M. D., 82 Bearden, A. P., 396 Bearpark, M. J., 116, 121 Beatson, R., 77 Beck, B., 399 Becke, A. D., 79 Bekker, H., 260 Bell, R. P., 225 Bellard, S., 446 Bellon-Maurel, V., 399 Belooussov, A., 399 Ben-Nun, M., 118, 121, 122 Bendale, R. D., 287 Bengio, S., 400 Bennett, K. P., 394 Berendsen, H. J. C., 260 Bergsma, J. P., 232 Berman, M., 123 Bernardi, F., 116, 118, 121, 122 Bernasconi, C. F., 228 Bernhardsson, A., 119 Bernholc, J., 258 Berning, A., 119 Bernstein, F. C., 446 Berry, M. V., 123 Bersuker, I. B., 117 Bertrán, J., 223, 230, 232 Bertz, S. H., 287 Beveridge, D. J., 230, 444 Bi, J., 394 Bicego, M., 400 Bickelhaupt, F. M., 119 Bierbaum, V., 123 Biermann, O., 258, 259 Biggio, G., 397, 398 Binder, K., 258, 259, 261 Bingham, R. C., 445 Binkley, J. S., 81 Birge, L., 288

Blancafort, L., 120, 123 Blaney, J. M., 449 Blundell, T. L., 450 Boatz, J. A., 120 Böhm, H.-J., 449 Bofill, J. M., 121 Bolcer, J. D., 444 Bolin, J. T., 448 Bonačić Koutecký, V., 122 Bonchev, D., 287 Bondi, D. K., 229 Borden, W. T., 119 Born, M., 76, 116, 117 Bosshard, H. E., 447 Bouman, T. D., 81 Bourasseau, E., 260 Boutin, A., 260 Bowes, C., 223 Bowler, D. R., 80 Bowman, J. M., 232 Boyd, D. B., 75, 77, 116, 119, 230, 288, 444, 445, 446, 447, 448, 449, 450 Boyd, R. J., 444 Boyd, R. K., 223 Braga, R. S., 397 Brandt, A., 258 Brannigan, G., 262 Breneman, C. M., 394, 445 Brereton, R. G., 399, 400 Brice, M. D., 446 Briels, W. J., 258 Briem, H., 395 Broo, A., 122 Brooks, B. R., 232, 448 Brown, D., 260 Brown, F. L. H., 262 Brown, S. P., 81 Bruccoleri, R. E., 232, 448 Brudzewski, K., 400 Bruice, T. C., 444 Brünger, A., 448, 449 Brunne, R. M., 289 Buchbauer, G., 397 Buchenau, U., 258 Buckingham, A. D., 76 Budzien, J., 260 Bultinck, P., 450 Bunescu, R., 400 Burant, J. C., 78, 118, 232 Bures, M. G., 447 Bürger, T., 258, 261 Burges, C. J. C., 392, 393, 400

Burghardt, I., 124 Burkert, U., 447 Bush, I. J., 80 Busonero, F., 397 Buydens, L. M. C., 399 Byron, R. B., 76 Byvatov, E., 395 Cai, Y. D., 396, 399 Callis, P. R., 122 Camilo Jr., A., 397 Cammi, R., 118, 232 Campagne, F., 400 Campaigne, E., 446 Campbell, C., 393 Cao, J., 398 Cao, L., 394 Cao, Y., 82 Cao, Z. W., 399 Car, R., 79 Carini, D. J., 393 Carmesin, I., 259 Carotti, A., 397, 398 Carrieri, A., 398 Carter, E. A., 82 Carter, J. F., 399 Cartwright, B. A., 446 Carver, T. J., 258 Castro, E. A., 397 Català, A., 394 Caufield, C., 449 Cavalla, J. F., 446 Cawthraw, S., 399 Cederbaum, L. S., 117, 118, 121, 122, 124 Celani, P., 119 Cembran, A., 118 Chakravorty, S. J., 287 Challacombe, M., 77, 78, 79, 82, 119, 232 Chambers, C. C., 231 Chang, C. C., 393 Chang, G., 449 Chang, Y. T., 230 Chapelle, O., 392 Chastrette, M., 397 Chatfield, D. C., 228, 232 Chauchard, F., 399 Cheeseman, J. R., 81, 118, 232 Chen, B., 79 Chen, C. H., 395 Chen, J. J., 395 Chen, L. B., 394 Chen, N. Y., 398

Chen, P. H., 394 Chen, Q. S., 396 Chen, W., 119, 232 Chen, X., 395 Chen, X. G., 400 Chen, Y., 396 Chen, Y. Z., 395, 399 Chen, Z., 260 Chervonenkis, A., 392 Chiasserini, L., 397 Chou, K. C., 396 Christiansen, P. A., 123 Christoffersen, R. E., 446, 447 Chu, Y. H., 400 Chu, Z. T., 122 Chuang, Y.-Y., 223, 226, 227, 229, 230, 231 Chung, C. B., 400 Ciccotti, G., 259, 260 Cinone, N., 398 Cioslowski, J., 119, 232 Cipriani, J., 77 Clancy, T. C., 261 Clark, D. E., 449 Clark, T., 81, 289, 450 Class, S., 451 Clifford, S., 119, 232 Clore, G. M., 449 Coe, J. D., 123 Cogdill, R. P., 399 Cohen, A., 396 Cohen, B., 122 Coitiño, E. L., 223, 229, 230 Colhoun, F. L., 259 Collier, N., 400 Collins, M. A., 230 Collobert, R., 400 Colombo, L., 79 Coltrin, M. E., 228 Congreve, M., 450 Connolly, M. L., 447 Connor, J. N. L., 229 Connors, K. A., 230 Consonni, V., 288, 393, 450 Cooke, I. R., 262 Cooper, D. L., 119 Corchado, J. C., 223, 226, 228, 229, 230, 231 Corey, E. J., 445, 446 Cortes, C., 392 Cossi, M., 118, 120, 232 Cover, T. M., 288

Cox, H. K., 450 Craig, P. N., 446 Cramer, C. J., 230, 231, 232 Cramer III, R. D., 445 Cramer, T., 399 Crawford, T. D., 119 Cremer, D., 76 Crespo Hernandez, C. E., 122 Cristianini, N., 392 Cronin, M. T. D., 397, 398 Cross, J. B., 118 Cross, L. C., 444 Cross, P. C., 224 Crum Brown, A., 443, 444 Csizmadia, I. G., 223 Csonka, G. I., 79 Cubic, B. C., 228 Cui, Q., 119, 232 Cundari, T. R., 122, 123, 394 Curro, J. G., 258 Curtiss, C. F., 76 Dachsel, H., 119 Dallos, M., 119 Daly, J., 287 Dammkoehler, R. A., 447 Dancoff, S. M., 287 Daniels, A. D., 80, 118, 232 Daniels, M., 122 Dannenberg, J. J., 118 Dantus, M., 116 Dapprich, S., 118, 232 Dardenne, P., 399 Daudel, R., 224 Davidson, E. R., 77, 119, 121 Daw, M. S., 80 Dawson, R. W., 398 De Brabanter, J., 392 de Bruijn, B., 400 de Carvalho, A., 395 De Moor, B. L. R., 392, 395 de Pablo, J. J., 260, 261 De Raedt, L., 399 De Smet, F., 395 de Vries, A. H., 260 De Winter, H., 450 Dearden, J. C., 398 Debnath, R., 394 Decius, J. C., 224 Deegan, M. J. O., 119 Deeter, J. B., 450 DeLano, W. L., 449

Della Valle, R. G., 259 Delle Site, L., 258 DePriest, S. A., 450 Deserno, M., 262 Dewar, M. J. S., 230, 445, 447 Dickey, A. N., 262 Diercksen, G. H. F., 76, 77 Dimitrov, S. D., 398 Dimitrova, N. C., 398 Distante, C., 400 Ditchfield, R., 81, 445 Dixon, D. A., 119 Dixon, J. S., 449 Dobbyn, A. J., 120 Doi, M., 258, 259 Dolenko, B., 399 Domcke, W., 116, 117, 118, 121, 122, 123 Dominy, B. W., 288, 449 Donaldson, I., 400 Doruker, P., 259 Doser, B., 76 Doubleday, A., 446 Downs, T., 395 Doxastakis, M., 260 Drucker, H., 392 Du, L., 395 Du, W., 394 Duffy, E. M., 289 Dunietz, D. B., 82 Dunlea, S., 395 Dunn, D. A., 447 Dünweg, B., 261 Dupuis, M., 120, 121, 225 Dyason, J. C., 450 Dyczmons, V., 76

Author Index Embrechts, M., 394, 400 Engkvist, O., 260 Englman, R., 117 Enslein, K., 447 Eriksson, L., 393 Erion, M. D., 231, 448 Ermler, W. C., 123 Escobedo, F. A., 260 Eskin, E., 396 Espinosa-Garcia, J., 228 Esquivel, R. O., 287, 288 Esselink, K., 261 Esterman, I., 117 Evans, M. G., 223 Everett, A. J., 448 Ewig, C. S., 232 Eynon, B. P., 395 Eyring, H., 76, 223 Faceli, K., 395 Facius, A., 395 Faegri, K., 76 Falck, E., 261 Faller, R., 258, 259, 260, 261, 262 Farago, O., 262 Farazdel, A., 121 Farkas, O., 118, 232 Fast, P. L., 223, 224, 229, 230 Fatemi, M. H., 398 Feeney, P. J., 288, 449 Fernandez-Ramos, A., 223, 228, 229 Ferr, N., 118 Ferrin, T. E., 447 Feshbach, H., 77 Feyereisen, M. W., 78 Filman, D. J., 448 Finley, J., 120 Fischer, E., 443 Fkih-Tetouani, S., 397 Flanigan, M. C., 225 Flannery, B. P., 80 Fleischer, U., 81 Fletcher, R., 393 Fock, V., 75, 445 Fogarasi, G., 226 Foresman, J. B., 119, 232 Fowler, R. H., 223 Fox, D. J., 119, 232 Fox, T., 399 Fraaije, J. G. E. M., 260 Franaszek, M., 395 Frank, E., 395

Frazer, T. R., 443, 444 Free, Jr., S. M., 444 Freer, S. T., 448 Frenkel, D., 260 Friedman, R. S., 228, 232 Friesner, R., 82 Frisch, M. J., 78, 80, 81, 118, 225, 232 Fro¨hlich, H., 287, 395 Fru¨chtl, H. A., 78 Frurip, D. J., 231 Frymier, P. D., 397 Fuchs, A. H., 260 Fujimoto, H., 225 Fujita, T., 444 Fukuda, R., 118 Fukui, K., 224, 225 Fukunaga, H., 258 Fung, G. M., 394 Fuss, W., 122 Fu¨sti-Molnar, L., 78 Fytas, G., 260 Gadre, S. R., 287 Galli, G., 79 Galva˜o, D. S., 397 Gamow, G., 225 Gao, H., 400 Gao, J., 229, 231, 232 Gao, J. B., 394 Garavelli, M., 116, 118, 122 Garcia-Viloca, M., 229, 231, 232 Garg, R., 393, 398 Garrett, B. C., 223, 224, 225, 226, 227, 228, 229, 231, 232 Gasteiger, J., 81, 393, 399, 450 Gates, K. E., 395 Gatti, C., 287 Gauss, J., 76, 81 Gazzillo, D., 259 Ge, R. F., 400 Geibel, P., 396 Gelani, P., 122 Georgievskii, Y., 227 Gerald, W., 394 Gersmann, K., 396 Gertner, B. J., 232 Ghosh, J., 262 Gianola, A., 123 Giesen, D. J., 231 Gifford, E. M., 399 Gilbert, R. G., 232 Gill, P. M. W., 76, 77, 78, 79, 119, 232

Gillan, M. J., 80 Gilson, M. K., 399 Girosi, F., 392 Glaesemann, K. R., 120 Glezakou, V.-A., 118 Godden, J. W., 287, 288, 289 Goedecker, S., 79, 80 Goetz, R., 261 Goldman, B. B., 394 Golub, T. R., 394 Gomperts, R., 118, 232 Gompper, G., 261 Gonzalez, C., 119, 232 Gonza´lez-Lafont, A., 226, 227, 228, 229 Gordon, M. S., 118, 120, 225, 228 Gould, E. S., 445 Grabowski, E. J. J., 446 Graham, D. J., 287, 288 Graham, R. L., 79 Gramatica, P., 398 Granucci, G., 122 Greenberg, A., 445 Greengard, L., 77 Grenthe, I., 116 Grest, G. S., 258, 259 Grethe, G., 447 Grev, R. S., 224 Grimm, M., 398 Gros, P., 449 Gross, E. K. U., 120 Grosse-Kunstleve, R. W., 449 Grotendorst, J., 80, 81, 82 Grubbs, F., 288 Gu, Q., 394 Gu, Z., 226 Guermeur, Y., 394 Guida, W. C., 449 Guillo, C., 399 Gund, P., 445, 446, 447 Gunn, S. R., 394 Gu¨nther, J., 395 Guo, C., 396 Guo, H., 261 Guo, Z., 395 Gusev, A. A., 259 Guyon, I., 392 Gwaltney, S., 80 Gwinn, W. D., 227 Gygax, R., 446 Haboudou, M., 260 Hack, M. D., 121

Hada, M., 118 Hadjichristidis, N., 260 Hadjipavlou-Litina, D., 398 Haffner, P., 392 Hafskjold, B., 261 Hahn, O., 258, 259, 261 Haire, K. R., 258 Halgren, T. A., 448, 449 Haliloglu, T., 261 Hall, G. G., 76 Hall, L. H., 288 Hall, M. A., 395 Halvick, P., 228 Hamlin, R., 448 Hammer, B., 396 Hammes Schiffer, S., 121 Hammett, L. P., 445 Hampel, C., 82, 120 Han, C. H., 400 Han, I. S., 400 Han, S., 123 Hancock, G. C., 223, 228 Handy, N. C., 81, 225 Hanna-Brown, M., 399 Hansch, C., 393, 398, 444, 446, 450 Hansen, A. E., 81 Hardin, D., 394 Hare, P. M., 122 Harris, C. J., 394 Harrison, R. J., 119 Hartree, D. R., 75, 445 Hasegawa, J., 118 Ha¨ser, M., 76, 81, 82 Hass, Y., 121 Hauser, K. L., 450 Hauswirth, W., 122 Hawkins, G. D., 231, 232 He, K., 287 Head-Gordon, M., 76, 77, 78, 79, 80, 81, 82, 232 Healy, E. F., 230, 447 Hehre, W. J., 445 Heidrick, D., 229, 232 Heinrich, N., 398 Heinzen, V. E. F., 397 Helgaker, T., 76, 77, 79, 81, 82, 120 Helma, C., 399 Helson, H. E., 446 Hendrickson, J. B., 446 Hendrickson, T., 449 Henkel, T., 289 Herbrich, R., 392

Author Index Hermann, R. B., 444, 445 Hermens, J. L. M., 396 Hernandez, E., 80 Herron, D. K., 445 Hertel, I. V., 123 Herzberg, G., 117, 226, 227 Hess, B., 260 Heß, B. A., 123 Hetzer, G., 82, 120 Heuer, A., 259 Hierse, W., 80 Higgs, H., 446 Hilbers, P. A. J., 261 Hinton, J. F., 81 Hirai, K., 448 Hirao, K., 120 Hirsch, J. A., 446 Hirschfelder, J. O., 76, 224, 225 Ho, I. C., 395 Ho, M., 287, 288 Hoekman, D., 450 Hoenigman, R., 123 Hoffman, B. C., 124 Hoffmann, R., 444 Hoffmann-Ostenhof, M., 78 Hoffmann-Ostenhof, T., 78 Hogekamp, A., 78 Hogue, C. W. V., 400 Hohenberg, P., 75 Hol, W. G., 448 Holloway, M. K., 450 Holmes, E., 393 Honda, Y., 118 Hopfinger, A. J., 450 Horiuti, J., 224 Horn, H., 81 Howe, W. J., 445 Hratchian, H. P., 118 Hsu, C. W., 394 Hu, H. Y., 398 Hu, Q. N., 398 Hu, W.-P., 223, 229, 230 Hu, Z. D., 400 Huang, K., 117 Huang, X., 232 Huarte-Larran˜aga, F., 232 Humber, L. G., 446 Hummelink, T., 446 Hummelink-Peter, B. G., 446 Hut, P., 77 Huuskonen, J., 289 Hynes, J. T., 124, 232

459

Ichino, T., 123 Inoue, M., 448 Irikura, K., 231 Irikura, T., 448 Irwin, J. J., 288 Isaacson, A. D., 223, 226, 227, 228, 229 Ischtwan, J., 230 Ishida, M., 118 Ismail, N., 120 Itoh, A., 448 Ivanciuc, O., 393, 396, 397, 398 Ivanov, I., 261 Ivaschenko, A. A., 399 Iyengar, S. S., 118 Jackels, C. F., 223, 226, 227 Jahn, J. A., 117 Jain, B. J., 396 Jalali-Heravi, M., 398 Jaramillo, J., 118 Jarnagin, K., 395 Jasper, A. W., 116, 118 Jaszunski, M., 81 Jayaraman, V. K., 395 Jaynes, E. T., 288 Jeliazkova, N. G., 397 Jenkin, D. G., 444 Jensen, H. J. A., 120 Jensen, J. H., 120 Jerebko, A. K., 395 Jiang, J.-S., 449 Jiang, W., 395 Joachims, T., 392, 400 Johansson, E., 393 Johnson, B. G., 76, 77, 78, 79, 119, 232 Johnson, C. K., 444 Jordan, M. J. T., 232 Jørgensen, F. S., 450 Jørgensen, P., 79, 82, 120 Jorgensen, W. L., 289 Jørgenson, P., 76 Jorissen, R. N., 399 Joseph, T., 223, 225, 228 Joy, A. T., 288 Junkes, B. S., 397 Jurs, P. C., 289, 399 Kanaoka, M., 396 Karelson, M., 450 Karin, S., 448 Karlstro¨m, G., 120, 260 Karplus, M., 232, 448

Karttunen, M., 261 Kasheva, T. N., 289 Kate, R. J., 400 Kato, S., 123, 225 Katriel, J., 121 Katsov, K., 261 Keating, S. P., 123 Keck, J. C., 224 Kecman, V., 392 Kedziora, G. S., 119 Keith, T., 119, 232 Keith, T. A., 81 Kelly, C. P., 231, 232 Kemble, E. C., 225 Kendall, R. A., 78 Kendrick, B. K., 117 Kennard, O., 446 Kharasch, N., 444 Kier, L. B., 288 Kierstad, W. P., 232 Kim, E. B., 260 Kim, J., 79 Kim, Y., 228, 229 Kimball, G. E., 76 Kimura, T., 396 Kisliuk, R. L., 448 Kitao, O., 118 Kitchen, D. B., 288 Kjaer, K., 262 Kla¨rner, F.-G., 81 Klautau, A., 394 Klein, C. T., 397 Klein, M. L., 261 Klein, S., 121 Klene, M., 118 Klessinger, M., 121 Klippenstein, S. J., 226, 227 Klocker, J., 397 Knirsch, P., 393 Knowles, P., 82 Knowles, P. J., 119 Knox, J. E., 118 Kobayashi, T., 122 Koch, H., 120 Koetzle, T. F., 446 Koga, H., 448 Koga, N., 120 Kohen, A., 231 Kohler, B., 120, 122 Ko¨hler, W., 399 Kohn, W., 75, 76, 78 Kolinski, A., 259

Kollman, P. A., 81, 448, 450 Koltun, W. L., 444 Komaromi, I., 119, 232 Komornicki, A., 225 Kong, J., 78 Konuze, E., 398 Ko¨ppel, H., 116, 117, 118, 121 Kornberg, R. D., 262 Korona, T., 119 Korsell, K., 76 Koseki, S., 120, 122 Kossoy, A. D., 450 Kouri, D. J., 228 Koziol, F., 80, 81 Kramer, S., 399 Kramers, H., 123 Kranenburg, M., 261 Kraut, J., 448 Kreer, T., 261 Kreevoy, M. M., 228 Kremer, K., 258, 259, 260, 261, 262 Kriegl, J. M., 399 Krishnan, R., 81 Krogsgaard-Larsen, P., 450 Krylov, A. I., 120 Kuang, R., 396 Kubinyi, H., 288 Kudin, K. N., 118, 232 Kuhl, T. L., 262 Kuhn, W., 261 Kulkarni, A., 395 Kulkarni, B. D., 395 Kullback, S., 288 Kumar, R., 395 Kuntz, I. D., 289, 447 Kuppermann, A., 117, 225, 228 Kurepa, M. V., 228 Kurup, A., 393 Kussmann, J., 80 Kuszewski, J., 449 Kutzelnigg, W., 76, 81 Laage, D., 124 Labute, P. A., 288, 289 Ladd, C., 394 Laidler, K. J., 224 Lambrecht, D. S., 76 Lanckriet, G. R. G., 395 Land, W. H., 400 Lander, E. S., 394 Langenaeker, W., 450 Langridge, R., 447

Author Index Larsen, H., 79, 82 Larter, R., 122, 123, 394 Laso, M., 261 Lathan, W. A., 445 Latulippe, E., 394 Lauderdale, J. G., 223 Laumer, J. Y. D., 397 Lauri, G., 449 Lawley, K. P., 80 Lay, V., 400 Lee, C. K., 394 Lee, D. E., 400 Lee, H. P., 394 Lee, M., 80 Lee, M. S., 78, 80, 82 Lee, S., 232 Lee, T. C., 395 Lee, T.-S., 79 Lee, Y., 394 Leeming, P. R., 448 Leibensperger, D., 400 Lengauer, T., 395 Lengsfield, B. H., 119 Lenz, O., 261 Leo, A. J., 446, 447, 450 Lerner, A., 392 Leroy, F., 395 Leszczynski, J., 77 Leslie, C., 396 Leslie, C. S., 396 Lester, M. I., 124 Levine, I. N., 76 Levine, R. D., 121 Levy, S., 394 Lewis, R. A., 449 Li, H., 399 Li, J., 231, 232, 449 Li, L. B., 395 Li, T., 394 Li, X., 118, 395 Li, X.-P., 79 Li, Z. R., 395, 399 Liashenko, A., 119, 232 Liaw, A., 399 Lichten, W., 117 Liebman, J. F., 445 Lifer, S. L., 450 Liljefors, T., 450 Lloyd, A. W., 120 Lim, C., 223 Limbach, H.-H., 231 Lin, C. J., 393, 394

461

Lin, H., 229 Lin, J.-H., 289 Lind, P., 396 Lindahl, E., 260 Lindh, R., 119, 120 Lineberger, W. C., 123 Ling, X. B., 394 Liotard, D. A., 231, 232 Lipinski, C. A., 288, 289, 449 Lipkowitz, K. B., 75, 77, 116, 119, 122, 123, 230, 288, 394, 444, 446, 447, 448, 449, 450 Lipowsky, R., 261 Lipscomb, W. N., 444 Lipton, M., 449 Lischka, H., 119 Liskamp, R., 449 Liu, G., 119, 232 Liu, X. J., 396 Liu, Y., 395 Liu, Y.-P., 223, 227, 228, 229, 230, 231 Livingstone, D. J., 394 Lluch, J. M., 226 Lo, D. H., 445 Loda, M., 394 Lohr, Jr., L. L., 232 Lombardo, F., 288, 449 Lonari, J., 395 London, F., 81 Long, A. K., 446 Longuet-Higgins, H. C., 117 Loomis, R. A., 124 Lopes, C. F., 261 Lorenzo, L., 288 Lo¨wdin, P. O., 446 Lu, D.-h., 223, 227, 227, 228 Lu, W. C., 398 Lu, X. X., 398 Ludwig, D. S., 262 Lundstedt, T., 393 Lunn, W. H. W., 445 Luz Sa´nchez, M., 229 Lynch, B. J., 223 Lynch, G. C., 223, 227, 228, 229 Lynch, V. A., 227 Maciocco, E., 397, 398 Maggiora, G. M., 287 Magnuson, A. W., 224 Mahe´, P., 396 Maigret, B., 444 Majewski, J., 262 Malarkey, C., 287

Malick, D. K., 118, 232 Malley, J. D., 395 Malli, G. L., 123 Malmqvist, P. A˚., 120 Maltseva, T., 396 Mameli, M., 397 Manaa, M. R., 116 Manby, F. R., 119 Mangasarian, O. L., 394 Mannhold, R., 288 Manohar, L., 122 Manthe, U., 232 Mao, K. Z., 395 Marcotte, E. M., 400 Marcus, R. A., 224, 225, 226, 228, 229 Marian, C. M., 122, 123 Marino, D. J. G., 397 Mark, A., 260 Markiewicz, T., 400 Marrink, S. J., 260 Marsh, M. M., 445, 447 Marshall, G. R., 447, 450 Marshall, W. S., 450 Martin, A. N., 445 Martin, J., 400 Martin, M. E., 122 Martin, R. L., 119, 232 Martin, T. C., 399 Martin, Y. C., 446, 447 Martinez, T. J., 118, 121, 122, 123 Mascia, M. P., 397 Maslen, P. E., 78, 80, 82 Massarelli, P., 398 Masters, A., 395 Matsika, S., 118, 120, 123, 124 Matsunaga, N., 120, 122 Mattai, J., 262 Matthews, D. A., 447 Mattice, W. L., 259, 261 Mauri, F., 79 Mauser, H. A., 395 Mayer, D., 450 Mayer, J. E., 446 Mayer, K. F. X., 395 McCammon, J. A., 448 McCoy, J. D., 258 McCoy, M., 451 McElroy, N. R., 289 McFarland, J. W., 446 McIntosh, D. F., 226 McIver Jr., J. W., 225 McKinnon, R. A., 399

McMurchie, L. E., 77 McNicholas, S. J., 120 McQuarrie, D. A., 231 McWeeny, R., 80 Mead, C. A., 116, 117, 118, 123 Meeden, G., 287 Mekenyan, O. G., 398 Melissas, V. S., 223, 225, 228 Melssen, W. J., 399 Mennucci, B., 118, 122, 232 Mercer, J., 393 Mercer, K. L., 262 Mercha´n, M., 120 Merkwirth, C., 395 Mesirov, J. P., 394 Messiah, A., 80, 443 Mewes, H. W., 395 Meyer, H., 258, 259, 260 Meyer, H.-D., 121 Meyer, W., 120 Meyer, Jr., E. F., 446 Michalickova, K., 400 Micheli, A., 396 Michelian, K. H., 226 Michl, J., 116, 121, 122 Mielke, S. L., 227 Migani, A., 118, 121 Mika, S., 393, 398, 400 Milano, G., 259 Millam, J. M., 79, 232 Miller, C. E., 262 Miller, M. A., 446 Miller, T. A., 116, 117 Miller, W. H., 121, 224, 225, 230 Milliam, J. M., 118 Milosevich, S. A. F., 448 Mina, N., 223 Miners, J. O., 399 Minichino, C., 230 Mitchell, A. D., 444 Mitchell, B. E., 289 Mitsuhashi, S., 448 Miyazaki, T., 80 Moecks, J., 399 Mohamadi, F., 449 Møller, C., 75 Molnar, F., 118 Monmaney, T., 447 Montgomery Jr, J. A., 118, 120, 232 Moock, T. E., 447 Mooney, R. J., 400 Moore, G. E., 76

Author Index Moore, P. B., 261 Morgan III, J. D., 78 Morokuma, K., 118, 120, 232 Morse, P. M., 77 Moser, K. L., 395 Mosquera, R. A., 288 Motherwell, W. D. S., 446 Mouritsen, O. G., 262 Mowshowitz, A., 287 Mukherjee, S., 394 Mu¨ller, B., 260 Mu¨ller, H., 289 Mu¨ller, K.-R., 393, 398 Mu¨ller, M., 261 Mu¨ller, Th., 119 Mu¨ller-Plathe, F., 258, 259, 260 Mullin, R., 451 Mura, M. E., 120 Murat, M., 258, 259 Murayama, S., 448 Murcko, M. A., 449 Murphy, R. B., 82 Murray, C. W., 449, 450 Murrell, J. N., 121 Murtola, T., 261 Muselli, M., 395 Musicant, D. R., 394 Nachbar, R. B., 449 Nagashima, R., 399 Nakai, H., 118 Nakajima, T., 118 Nakano, H., 120 Nakatsuji, H., 118 Nanayakkara, A., 119, 232 Nandi, S., 395 Nangia, S., 116 Natanson, G. A., 226 Natsoulis, G., 395 Naylor, C. B., 450 Neely, W. B., 445 Negri, F., 122 Nencini, C., 398 Neogrady, P., 120 Netzeva, T. I., 398 Newton, M. D., 445 Ng, C.-Y., 76, 119 Nguyen, H. X., 448 Nguyen, K. A., 120, 223, 227, 230 Nicklass, A., 120 Niedfeldt, K., 82 Nielsen, S. O., 261

Nilges, M., 449 Nishikawa, T., 399 Niyogi, P., 392 Noble, W. S., 396 Nunes, R. W., 79 Obara, S., 77 Ochsenfeld, C., 76, 78, 80, 81 Ochterski, J. W., 118, 232 Ogihara, M., 394 Ohtani, H., 122 Oladunni, O., 400 Olafson, B. D., 232, 448 Olivucci, M., 116, 118, 120, 121, 122 Olsen, J., 76, 79, 82 Olsen, S., 122 Olson, E. C., 446, 447 O’Malley, T. F., 117 Opik, U., 117 Oppenheimer, R. A., 76, 116 Ordejon, P., 80 Ortiz, J. V., 119, 232 Osowski, S., 400 Ostlund, N. S., 75 Ostovic, D., 228 Ovchinnikova, M. Y., 229 Overend, J., 226 Ozisik, R., 261 Pacher, T., 117 Page, M., 225 Palkowitz, A. D., 450 Palmieri, P., 120 Pannu, N. S., 449 Pant, P. V. K., 260 Papa, E., 398 Papavassiliou, D. V., 400 Pardo, M., 400 Parr, R. G., 76, 445 Parra, X., 394 Parrill, A. L., 450 Pasini, P., 262 Pastore, E. J., 448 Patey, G. N., 232 Patra, M., 261 Paugam-Moisy, H., 394 Paul, W., 259, 261 Pavo´n, J. L. P., 400 Pawson, T., 400 Pearson, C. I., 395 Pechukas, P., 224 Pellerano, C., 397, 398

Peng, C., 225 Peng, C. Y., 119, 232 Peng, S. H., 394 Peng, X. N., 394 Pepers, M., 399 Perdew, J. P., 78, 79 Perram, J. W., 78 Perret, J.-L., 396 Perun, S., 122 Perun, T. J., 448 Peruzzo, P. J., 397 Pesa, M., 226 Petersen, H. G., 78 Petersson, G. A., 118, 232 Petrich, W., 399 Petsko, G. A., 448 Peyerimhoff, S. D., 123, 444 Peyraud, J. F., 397 Pfeifer, W., 450 Pickett, S. D., 449 Pierloot, K., 120 Pierna, J. A. F., 399 Pilling, M. J., 226, 230 Piskorz, P., 119, 232 Pitzer, K. S., 227 Pitzer, R. M., 119, 120 Plante, L. T., 448 Platt, J., 393 Plesset, M. S., 75 Pletnev, I. V., 399 Pochet, N., 395 Poggio, T., 392, 394 Polanyi, M., 223 Polinger, V. Z., 117 Pollard, W. T., 82 Pollastri, G., 394 Pomelli, C., 118, 232 Pon, F. R, 262 Pople, J. A., 76, 77, 78, 79, 80, 81, 116, 119, 230, 232, 444, 445 Portera, F., 396 Poulsen, T. D., 231 Prasad, M. A., 399 Press, W. H., 80 Presti, D. E., 445 Preston, R. K., 121 Prigogine, I., 120 Propst, C. L., 448 Provita, M., 446 Pryce, H. L., 117 Pu, J., 223, 228, 229 Pulay, P., 78, 80, 81, 82, 226

Pullman, A., 443 Pullman, B., 76, 224, 443 Pupyshev, V. I., 77 Purcell, W. P., 446 Pu¨tz, M., 259 Qin, S. J., 400 Quastler, H., 287 Quenneville, J., 121 Rabinovitch, B. S., 227 Rabuck, A. D., 119, 232 Rachlin, A. I., 446 Radloff, W., 123 Radom, L., 445 Ragazos, I. N., 118, 121 Raghavachari, K., 119, 232 Rai, S. N., 223, 228 Ramani, A. K., 400 Ramaswamy, S., 394 Ramirez, J. C., 288 Rao, B. S., 395 Rao, S., 395 Raphael, C., 260 Ra¨tsch, G., 393, 398 Rauhut, G., 119 Read, R. J., 449 Reddy, M. R., 231, 448, 450 Redmon, M. J., 223, 225 Reed, A. E., 225 Reed, R. A., 262 Reeder, R. C., 261 Reel, J. K., 450 Rega, N., 118 Reich, M., 394 Reichel, R., 289 Reith, D., 258, 259, 260 Rekvig, L., 261 Ren, S., 396, 397 Renier, A. M., 399 Replogle, E. S., 232 Reynolds, C. H., 450 Rhodes, J. B., 446, 447 Ribi, H. O., 262 Rice, R. A., 449 Rice, S. A., 120 Richards, N. G. J., 449 Richards, W. G., 446 Richon, A. B., 448 Richter, D., 258 Rifkin, R., 394 Rinaldi, D., 232

Author Index Ringnalda, M. N., 82 Rios, M. A., 229 Rivail, J.-L., 444 Robb, M. A., 116, 118, 120, 121, 122, 123, 232 Roberts, J. D., 445 Robertson, S., 227 Robertson, S. H., 226 Roche, O., 395 Rodgers, J. R., 446 Roger, J. M., 399 Ro¨gnvaldsson, T., 396 Rokhlin, V., 77 Roos, B. O., 119, 120 Roothaan, C. C. J., 76 Rorabacher, D. B., 288 Ross, R. B., 123 Rossi, I., 223, 230 Rossiter, K. J., 397 Rost, B., 400 Rothman, M. J., 232 Rouse, P. E., 260 Roussel, S., 399 Rozenholc, Y., 288 Rubenstein, S. D., 446 Ruedenberg, K., 117, 118 Ruffino, F., 395 Runge, E., 120 Rupert, L. A. M., 261 Rutledge, G. C., 259 Ruud, K., 81 Ruzsinszky, A., 79 Ryckaert, J.-P., 260 Ryde, U., 120 Sack, R. A., 117 Sadik, O., 400 Sadlej, A. J., 120 Saebø, S., 82 Sagar, R. P., 287, 288 Saigo, H., 395 Saika, A., 77 Saito, T., 122 Sakurai, J. J., 80 Salo, M., 289 Salt, D. W., 394 Salvador, P., 118 Samoylova, E., 123 Sa´nchez, M. D. N., 400 Sa´nchez, M. L., 231 Sanchez, V. D., 393 Sanna, E., 397

465

Santos, S., 259 Santry, D. P., 230 Saravanan, C., 79 Satija, S., 262 Sato, T., 288 Saunders, V. R., 77 Savchuk, N. P., 399 Savini, L., 397, 398 Sberveglieri, G., 400 Scalmani, G., 118 Scanlon, K., 226 Schacht, D., 287 Schaefer III, H. F., 81, 119, 450 Schaller, T., 81 Scha¨rpf, O., 258 Schenter, G. K., 224, 231, 232 Scherbinin, A. V., 77 Schermerhorn, E. J., 288 Schick, M., 261 Schimmelpfennig, B., 120 Schlegel, H. B., 81, 116, 118, 120, 225, 232 Schleyer, P. v. R., 81, 445, 450 Schlijper, A. G., 261 Schmid, E. F., 451 Schmid, F., 261 Schmid, W. E., 122 Schmidt, M. W., 120, 225 Schmitt, J., 399 Schmitz, H., 259 Schnaare, R. S., 445 Schneider, G., 395 Schnell, I., 81 Scho¨lkopf, B., 392, 393, 394, 400 Schoolnik, G. K., 262 Schouten, J. A., 80 Schreckenbach, G., 81 Schreiner, P. R., 81 Schro¨dinger, E., 75, Schulmerich, M. V., 287 Schulten, K., 118 Schultz, T., 123 Schultz, T. W., 396, 397 Schulz-Gasch, T., 395 Schumann, U., 120 Schurr, J. M., 227 Schu¨tz, M., 82, 119 Schuurmans, D., 392 Schwartz, R. L., 124 Schwegler, E., 77, 78 Schwenke, D. W., 228, 232 Scott, D. W., 288 Scuseria, G. E., 78, 79, 80, 82, 118, 232

Seakins, P. W., 230 Sears, S. B., 287 Seelbach, U. P., 81 Segal, G. A., 225, 230, 444 Seijo, L, 120 Seminario, J. M., 79 Seng, C. K., 394 Sensi, P., 446 Serrano-Andre`s, L., 120 Seth, M., 119 Sham, L. J., 76 Shanmugasundaram, V., 287 Shannon, C. E., 287 Shao, Y., 79, 81 Shavitt, I., 119, 225 Shawe-Taylor, J., 392 Shelley, J. C., 261 Shelley, M. Y., 261 Shen, D. G., 395 Shepard, R., 119 Sheridan, R. P., 399 Shi, L., 400 Shimanouchi, T., 446 Shipley, G. G., 262 Shockcor, J., 393 Shoichet, B. J., 288 Siciliano, P., 400 Siebrand, W., 229 Siegbahn, P. E. M. Sierka, M., 78 Silva, W. A., 395 Silverman, R. B., 443 Silvi, B., 77 Simon, R. L., 450 Simonson, T., 449 Sinicropi, A., 118 Skodje, R. T., 225, 228 Skolnick, J., 259 Slater, J. C., 76 Smedarchina, Z., 229 Smit, B., 260, 261 Smith, B. R., 121 Smith, D. A., 451 Smith, E. R., 78 Smith, F. T., 117 Smith, G. M., 446, 447 Smith, G. S., 262 Smith Jr., V. H., 287, 288 Smith, N. P., 448 Smith, P. A., 399 Smith, S. C., 226 Smith, S. J., 444

Smola, A. J., 392, 393, 400 Snyder, J. P., 448 Sobolewski, A. L., 122, 123 Soddemann, T., 261 Soelvason, D., 78 Sommerfeld, T., 124 Somorjai, R., 399 Song, J. H., 400 Song, M., 394 Song, Q., 399 Song, S. O., 400 Sonnenburg, S., 398 Sorich, M. J., 399 Sperduti, A., 396 Spiess, H. W., 81 Spitzer, W. A., 445 Sprague, J. T., 446 Sprague, P. W., 449 Sridevi, U., 395 Stahl, M., 395 Stahl, M. T., 449 Stahura, F. L., 287, 288, 289 Stanton, J. F., 75, 119, 120 Staroverov, V. N., 78, 79 States, D. J., 232, 448 Statnikov, A., 394 Stechel, E., 80 Steckler, R., 223, 225, 228, 232 Steel, C., 224 Stefanov, B. B., 119, 232 Steinberg, M. I., 450 Steinwart, I., 394 Stepanov, N. F., 77 Stewart, J. J. P., 223, 228, 230, 447 Still, W. C., 449 Stock, G., 118 Stoll, H., 83, 120 Stone, A. J., 117, 120 Strain, J., 77, 78 Strain, M. C., 118, 232 Stratmann, R. E., 118, 232 Strauss, H. L., 227 Strobl, G., 260 Sturges, H. A., 288 Su, S., 120 Summers, R. M., 395 Sun, L. Z., 395 Sun, Q., 259, 260, 262 Sung, K. K., 392 Sutcliffe, B. T., 76, 444 Suter, U. W., 259, 261 Sutter, J. M., 289

Author Index Sutton, L. E., 444 Suykens, J. A. K., 392, 393, 395 Suzue, S., 448 Svetnik, V., 399 Swaminathan, S., 232, 448 Szabo, A., 75 Szalay, P. G., 119 Szilva, A. B., 82 Taft Jr., R. W., 445 Takahashi, H., 394 Takahata, Y., 397 Takahide, N., 394 Takeuchi, K., 400, 450 Takimoto, J., 258 Tamayo, P., 394 Tambe, S. S., 395 Tamboli, A. C., 262 Tanchuk, V. Y., 289 Tao, J., 78, 79 Tao, S., 398 Tapia, O., 230, 232 Tarroni, R., 120 Taskinen, J., 289 Tasumi, M., 446 Taylor, P. R., 76, 77, 119 Teitelbaum, H., 223, 224 Teller, E., 116, 117 Tennyson, J., 121 Teramoto, R., 396 Teter, M., 79 Tetko, I. V., 289, 395 Teukolsky, S. A., 80 Theodorou, D. N., 260 Thissen, U., 399 Thompson, D. L., 227 Thompson, J. D., 231, 232 Thompson, M. A., 231 Thompson, T. B., 396 Thorsteinsson, T., 120 Thrasher, K. J., 450 Tildesley, D. J., 260 Timmerman, H., 288 Tobita, M., 399 Todeschini, R., 288, 393, 450 Tollabi, M., 397 Tollenaere, J. P., 450 Tolley, A. M., 395 Tomasi, J., 118 Tomasi, J., 122, 232 Tong, C., 399 Toniolo, A., 122

467

Topol, E. J., 395 Toropov, A. A., 397 Torrie, G. M., 232 Toyota, K., 118 Trafalis, T. B., 400 Tries, V., 259 Trinajstic, N., 287 Trucks, G. W., 81, 232 Truhlar, D. G., 116, 117, 118, 121, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232 Truong, T. N., 223, 224, 227, 228, 229 Trushin, S. A., 122 Tsai, C. A., 395 Tsamardinos, I., 394 Tscho¨p, W., 258, 261 Tsuda, K., 393, 396 Tucker, S. C., 223, 229 Tuekam, B., 400 Tugendreich, S., 395 Tuligi, G., 398 Tully, J. C., 121, 124 Tweedale, A., 224 Udelhoven, T., 399 Ueda, N., 395, 396 Uematsu, M., 400 Ulaczyk, J., 400 Ung, C. Y., 399 Ungerer, P., 260 Urrestarazu Ramos, E., 396 U¨stu¨n, B., 399 Vaes, W. H. J., 396 Vahtras, O., 78 Valentini, G., 395 Valleau, J. P., 232 Van Catledge, F. A., 446 van der Spoel, D., 260 Van Gestel, T., 392 van Os, N. M., 261 van Voorhis, T., 80 Van Wazer, J. R., 232 van Wu¨llen, C., 81 Vanderbilt, D., 79 Vandewalle, J., 392 Vapnik, V. N., 392 Varandas, A. J. C., 121 Varmuza, K., 393 Vasudevan, V., 450 Vattulaine, I., 261 Veith, G. D., 398 Vendrame, R., 397

Venturoli, M., 261 Verhaar, H. J. M., 396 Veillard, A., 76 Vert, J.-P., 395, 396 Veryazov, V., 120 Verzakov, S., 399 Vetterling, W. T., 80 Villa, A. E. P., 289 Villa`, J., 223, 225, 226, 229, 231 Vinter, J. G., 447 Volykin, A., 400 von Frese, J., 399 von Homeyer, A., 393 Von Itzstein, M., 450 von Meerwall, E. D., 261 von Neumann, J., 116 Voth, G. A., 118, 261 Vreven, T., 118 Wagner, A. B., 450 Wagner, A. F., 227 Wailzer, B., 397 Wainwright, T. E., 262 Walch, S. P., 124 Walker, J. D., 398 Walter, D., 82 Walter, J., 76 Walters, D. E., 450 Walters, W. P., 394, 449 Wand, M. P., 287 Wanekaya, A. K., 400 Wang, D., 232 Wang, J., 396 Wang, J. P., 396 Wang, M., 396 Wang, Q., 395 Wang, Q. J., 395 Wang, S. Y., 122 Wang, T., 399 Wang, Y., 395 Wardlaw, D. A., 226, 227 Warren, G. L., 449 Warshel, A., 122, 230 Watson, D. G., 446 Weaver, D. F., 287, 288 Weaver, W., 287 Weber, H. J., 77 Weber, V., 82 Wegner, J. K., 287, 395 Weinhold, F., 225 Weis, P., 81 Weiss, R. M., 230

Werner, H.-J., 82, 119 Westheimer, F. H., 446 Weston, J., 392, 396 Weygand, M., 262 White, C. A., 76, 77, 78, 79, 80, 81 Whitesitt, C. A., 450 Whitten, J. L., 82 Widmark, P.-O., 120 Wiegand, S., 260 Wiest, S. A., 450 Wigner, E., 224, 225 Wigner, E. P., 116 Willett, P., 447 Williams, D. E., 77 Williams, G. J. B., 446 Williamson, R. C., 393 Wilson, J. C., 450 Wilson, J. W., 444 Wilson Jr., E. B., 224 Wilson, K. R., 232 Wilson, P. J., 81 Wilson, S., 77 Windle, A. H., 258 Windus, T. L., 120 Winget, P., 231, 232 Winkler, D. A., 399 Winzler, R. J., 444 Wipke, W. T., 445, 446 Wirz, J., 446 Wittmer, J., 261 Wold, J. S., 448 Wold, S., 393 Wolinski, K., 81, 120 Wolschann, P., 397 Wolting, C., 400 Wong, B. Y., 262 Wong, L., 400 Wong, M. W., 119, 232 Wong, Y. W., 400 Wood, W. W., 262 Worgan, A. D. P., 398 Worth, G. A., 121, 122 Wu, D. H., 392 Wyatt, R. E., 117 Wynne-Jones, W. F. K., 223 Wysotzki, P., 396 Xantheas, S. S., 117, 118 Xidos, J. D., 232 Xing, J., 229 Xu, F. L., 398 Xu, Q. H., 394

Author Index Xu, X. B., 396 Xue, L., 288 Xue, Y., 395, 399 Xuong, N., 448 Yabushita, S., 119 Yan, Q., 260 Yang, J., 396 Yang, S. S., 398 Yang, U. C., 395 Yang, W., 76, 79 Yang, Z. R., 396 Yap, C. W., 395, 399 Yaris, R., 259 Yarkony, D. R., 76, 77, 80, 116, 117, 118, 119, 120, 121, 123, 124 Yazyev, O., 118 Yeang, C. H., 394 Yin, F., 396 Yoon, E. S., 400 You, L. W., 396 Yunes, R. A., 397 Zahradnik, R., 444, 446 Zakarya, D., 397 Zakrzewski, V. G., 118, 232

Zannoni, C., 262 Zell, A., 287, 395 Zelus, D., 394 Zernov, V. V., 399 Zewail, 116 Zgierski, M., 229 Zhai, H. L., 400 Zhan, Y. Q., 395 Zhang, C. L., 394 Zhang, J. Z., 117 Zhang, Q., 79 Zhang, S. D., 400 Zhang, Z., 119 Zhao, Q., 79 Zheng, C., 396 Zhou, X., 395 Zhu, C., 116 Zhu, T., 231, 232 Ziegler, T., 81 Zilberg, S., 121 Zimmerman, K. M., 450 Zirkel, A., 258 Zoebisch, E. G., 230 Zomer, S., 399, 400 Zupan, J., 393 Zurer, P., 450

Subject Index

Computer programs are denoted in boldface; databases and journals are in italics.

Abbott Laboratories, 407, 413, 418, 427 Abgene, 385 Accidental conical intersections, 90, 105 Actin filaments, 246 Activated complex, 128 Active learning support vector machines (AL-SVMs), 381 Active space, 99, 101 Adenine, 108, 111 Adiabatic approximation, 164 Adiabatic representation, 87 Adiabatic-diabatic representation, 87 ADME/Tox, 434 Agouron, 422 Alcon, 418 Allergan, 418 Allyl radical, 112 AM1, 192, 352, 367 American Chemical Society (ACS), 414 Ames test, 379 Analytic gradients, 100 Anchoring points, 235 Angiotensin II, 297 Anharmonic motions, 159 Anharmonic vibrational energy levels, 158 Anisotropic potential energy function, 238 AO-MP2 method, 67 Apparent randomness, 264 Aqueous solubility, 283 Aroma classification, 361 Array processors, 424 Artificial neural networks (ANNs), 302, 348, 362, 371, 379

ASVM, 390 Asymptotic scaling, 2 Atactic chains, 238 Atactic polystyrene, 249 Atom-centered basis set, 3 Atomic basis functions, 47 Atomic orbital, 3 Atomistic detail, 252 Atomistic models, 235, 242 Autocorrelation function, 246 Automatic text datamining, 384 Available Chemical Directory (ACD), 271, 276, 281, 372 Average-of-states MCSCF, 99 Avoided crossings, 84, 101 Backward Euler method, 143 Barnes-Hut (BH) tree methods, 35 Barrier height, 128 Barrierless association reactions, 157 BASF, 438 Basis functions, 3, 43, 97 Bayer, 425, 438 Bayes point machines, 291 Bayesian statistics, 283 Bead-spring models, 234 Benzodiazepine receptor ligands, 366 Berry phase, 89 Bimolecular rate constant, 203 Bimolecular reaction, 130, 140, 166, 188, 206 Binary hard-disk fluid, 256 Binary polymer melt, 243 Binary QSAR, 283, 284

BIND database, 385 Bioavailability, 421 BioCAD, 428 Bioconcentration factor (BCF), 369 BioDesign, 427 Bioinformatics, 385 Bioinformatics, 386 Biological activity, 299, 402 Biological membranes, 254 Biological systems, 106 Bio-medical terms, 385 Biorthogonality condition, 44 Biorthonormality condition, 44 BIOSYM, 428 Boltzmann inversion, 240, 241 Bond-breaking, 101, 106 Bond-fluctuation model, 251 Bond-making, 106 Boosted wrapper induction, 385 Born-Oppenheimer approximation, 5, 83, 85, 97, 126, 128, 131, 204 Born-Oppenheimer PES, 193 Bound support vectors, 322 Boundary effects, 273 Bovine spongiform encephalopathy (BSE), 379 Branching coordinate, 92 Branching plane, 91, 92 Branching space, 89, 91, 110 Bristol-Myers Squibb, 425 BSVM, 388 Calculation chemistry, 414 Calculations of biomolecules, 402 Cambridge Structural Database (CSD), 413 Canonical ensemble, 128, 136 Canonical MO coefficient matrix, 36, 42, 65 Canonical unified statistical (CUS) model, 138 Canonical variational theory (CVT), 127, 134 Canonical variational transition state theory, 127, 128, 131 Capillary electrophoresis, 380, 381 Carcinogenic activity, 360 Carcinogenicity, 421 Cartesian coordinates, 196, 239, 412 CASSCF, 99, 101 CASSCF/AMBER, 107 Catalyst, 430 CCSD, 98 CCSD(T), 98 Cell, 385 Central nervous system, 366

Centrifugal-dominant small-curvature approximation, 171 Cephalosporins, 408 Chain contour, 246 Chain diffusion coefficient, 247 Chain stiffness, 245 Chapman and Hall (CH) natural products database, 277 Charge distribution, 18 Charge-transfer reactions, 83, 106 CHARMM, 208, 211, 423, 427 CHARMMRATE, 191, 211 ChemDraw, 416 CHEMGRAF, 417 Chemical accuracy, 190 Chemical descriptors, 269, 283 Chemical Design Ltd., 417, 428 Chemical diversity, 275 Chemical engineering, 383 Chemical information, 263, 279 Chemical information content, 278 Chemical intuition, 284, 412 Chemical libraries, 263, 270, 275 Chemical reaction rates, 125 Chemical shifts, 60 Chemoinformatics, 264, 265, 269, 286, 317, 362, 385, 387 Chemometrics, 379 Chromophore, 106 Ciba-Geigy, 438 CIS, 98 CISD, 98 Class discrimination, 362 Class membership, 295, 302 Classical barrier height, 128 Classical CVT rate constant, 134 Classical partition function, 128 Classical threshold energies, 167 Classical trajectory calculations, 130 Classical turning points, 166, 182 Classification, 291, 293 Classification errors, 318 Classification hyperplane, 294 Classification rules, 302 ClogP, 297 CLOGP, 420 Closed-shell systems, 97 CNDO/2, 407, 410 Coarse-grained models, 242 Coarse-grained Monte Carlo simulations, 250 CODESSA, 347 Coffee, 382

Subject Index Collaboration gap, 412, 413 Collective bath coordinate, 133 COLUMBUS, 100, 104 Combinatorial chemistry, 430 Commercial software, 427 Complete active space second-order perturbation theory (CASPT2), 101, 107 Complete neglect of differential overlap (CNDO), 407 Complex descriptors, 273, 277 Complexity of descriptor, 273 Composite charge distribution, 18 Compound diversity, 269 Computational biology, 385 Computational chemistry, 265, 286, 387, 401 Computational Chemistry List (CCL), 428 Computational chemists, 404 Computer centers, 404 Computer graphics, 416 Computer use at pharmaceutical companies, 414 Computer-aided drug design (CADD), 413, 414, 417, 434 Computer-aided ligand design (CALD), 434 Computer-aided synthesis planning, 408, 412 Computers, 402 Apple Macintosh, 415, 426 Cray-2, 425 DEC-10, 409 Floating Point System (FPS)-164, 424 IBM 3033, 418 IBM 3083, 418 IBM 3278, 409 IBM 360, 408 IBM 4341, 418 IBM 7094, 404 IBM PC, 415 VAX 11/780, 415, 418 VAX 11/783, 418 VAX 11/785, 418 Condensed-phase reactions, 206 Configuration state functions (CSFs), 97 Configurational bias, 250 Conical intersections (CIs), 83, 84, 90, 93 Conjugate gradient algorithm, 147 Connectivity indices, 273, 377 Constrained minimization problems, 311 Continuous, charge distribution, 23, 32 Continuous fast multipole method (CFMM), 16, 34, 5 Continuous space models, 236 Contour length, 234, 249

473

Contracted Gaussian basis functions, 5 Contracted Gaussian distributions, 26 Contracted multipole integrals, 26 Contravariant basis vectors, 44 Conventional transition state theory, 126, 128 Core orbitals, 100 Corner cutting, 222 Corner-cutting tunneling, 164 Coulomb integrals, 15 Coulomb interactions, 255 Coulomb matrix, 15 Coulomb’s Law, 11 Coulomb-type contraction, 69 Coupled-cluster (CC) methods, 2, 98 Coupling matrix elements, 87 Coupling terms, 86 Covariant basis vectors, 44 Covariant integral representation, 47 CPK (Corey-Pauling-Koltun) models, 406 Cray Research, 424, 425 Creutzfeldt-Jacob disease, 379 Cross-entropy, 269 Cross-validation, 299, 302, 355, 363 Curved directions, 55 Curvilinear coordinates, 133, 150, 152, 154, 221, 246 Curvilinear internal coordinates, 152, 163 Curvy steps method, 55 Cytochromes P450, 372, 375 Cytosine, 107 Data mining, 429 Databases from scientific literature, 384 Daylight Chemical Information Systems, 419, 429 De novo design, 413 Debye-Hu¨ckel theory, 205 DEC, 415 Decision tree, 378 Decwriter II, 409 Degeneracy, 90 Degree of freedom, 128 Degree of polymerization, 249 Density functional theory (DFT), 2, 98 Density matrix, 6, 37 Density matrix-based coupled perturbed SCF (D-CPSCF), 62 Density matrix-based energy functional, 49 Density matrix-based quadratically convergent SCF (D-QCSCF), 55 Density matrix-based SCF, 42 Density operator, 48

Derivative coupling, 86, 96 Descriptors, 263, 283, 301 1-D, 272 2-D, 272, 281, 284 3-D, 281 Descriptor comparison, 269 Descriptor design, 285 Descriptor selection, 264, 283, 285, 347, 378 Descriptor space, 301 Descriptor variability, 275 Diabatic representation, 87 Diagonalization, 42 Differential Shannon entropy (DSE), 265, 275 Diffuse functions, 99 Diffusion, 203 Dihydrofolate reductase (DHFR), 422 Dipalmitoyl-phosphatidylcholine (DPPC), 255 Dipalmitoyl-phosphatidylethanolamine (DPPE), 256 Direct Born-Oppenheimer molecular dynamics, 57 Direct dynamics, 126, 190, 191, 217, 222 Direct mapping of the Lennard-Jones time, 250 Direct methods, 100 Direct SCF methods, 8 Directed acyclic graph SVM (DAGSVM), 339 Disconnect between computational chemists and medicinal chemists, 411 Dissipative particle dynamics, 255 Distinguished reaction coordinate (DRC), 208 Distortion energy, 204 Diversity of kernel methods, 391 Divide-and-conquer methods, 42 Dividing hypersurface, 128, 131, 152, 158, 205 DNA, 28, 378 DNA/RNA bases, 107 Docking, 430 Dow Chemical, 408 Dragon, 347, 372 Drieding models, 406 Drug design, 291, 371 Drug discovery, 403 Drug-like compound, 271, 362, 371, 375 Drug-like compound identification, 348 Drugs, 376 Dual-level dynamics, 199 DuPont, 418, 427 Dynamic correlation, 73, 97, 99, 108 Dynamic mapping, 246 Dynamical bottleneck, 128, 130, 173, 221 Dynamics, 104

Dynamics of polymers, 248 Dynamics trajectories, 280 Eckart barrier, 139 Eckart potential, 198 Ehrenfest dynamics, 105 Electron correlation, 1, 12, 64, 97 Electron density distributions, 279 Electron repulsion integral, 20 Electron transfer, 126 Electronegativity equalization method (EEM), 375 Electronic coordinates, 85 Electronic mail, 415 Electronic nose, 381 Electronic partition function, 148, 150 Electronic structure calculations, 126, 190 Electronic structure theory, 1 Electronic wavefunction, 85, 87 Electrostatic interaction, 255 Electrostatic potential, 17 Electrotopological indices, 377 Eli Lilly and Company, 402, 407, 427, 438 EMBO Journal, 385 Empirical valence bond method, 192 End-to-end distance, 246 Energy gradients, 57 Energy minimization, 53, 253 Energy transfer, 130 Ensemble averaging, 207 Ensemble of reaction paths, 221 Ensemble-averaged variation transition state theory (EA-VTST), 206, 207 Entanglement length, 248 Entanglement time, 249 Entropic separation (ES), 277, 281 Entropy, 263 Entropy metric, 264 Entropy-based information theory, 283 Envison, 418 Enzyme-catalyzed reactions, 206, 207 Equations of motion, 141 Equilibrium solvation, 206 Equilibrium solvation path (ESP) approximation, 206 Errors, 317 Espresso coffee, 382 ETA Systems, 424 Ethyl radical, 157 Ethyl tertiary butyl ether (ETBE), 382 Euler steepest-descent (ESD) method, 143 Evans and Sutherland PS300, 418

Subject Index Evolutionary algorithms, 302 Exchange-correlation functional, 6, 40 Exchange-type contractions, 35, 71 Excitation energies, 101 Excited state dynamics, 111 Excited state properties, 102 Excited states, 84, 98, 99, 103, 172 Experimental errors, 299 Extended Hu¨ckel theory (EHT), 407, 410 Far-field (FF) interactions, 29, 30 Fast multipole method (FMM), 16, 27, 34 Features, 301 Feature construction, 378 Feature functions, 293, 326 Feature selection, 264, 375 Feature space, 293, 323 Feed-forward neural networks, 351, 382 Fermi operator expansions (FOE), 42 Fermi operator projections (FOP), 42 Fermions, 47 Fingerprints, 273, 373 First order CI (FOCI), 100 Fixed basis functions, 5 Flux, 130, 138, 205 Fock matrix, 6, 37, 47 Fock operator, 5 Focused compound libraries, 376 FORTAN 77, 419 FORTRAN II, 404 FORTRAN IV, 409 Fourier transform Coulomb (FTC) method, 35 Fragrances, 361 Free diffusion, 248 Free energies, 241, 244 Free energy of activation, 129, 147 Free energy of reaction, 129 Free energy perturbation (FEP) theory, 423 Free software, 387 Full CI (FCI), 98 Full Multiple Spawning (FMS), 105 GAMESS, 101 GAMESSPLUSRATE, 191 Gangloside lipid (GM1), 256 Gasoline, 382 Gas-phase reactions, 127 Gauge-including atomic orbitals (GIAO), 61 GAUSSIAN, 97, 104 Gaussian 70, 409 Gaussian 76, 409 Gaussian 80, 409

475

GAUSSIAN 98, 217 Gaussian basis functions, 6 Gaussian distributions, 20, 270 Gaussian Inc., 419 Gaussian processes, 291 Gaussian very fast multipole methods (GvFFM), 35 GAUSSRATE, 191, 217 Generalized transition state, 131, 221 Generalized transition state dividing surface, 205 Generalized transition state partition function, 134, 149, 152 Generalized transition state theory, 127 Genetic algorithm, 284, 381 Genotoxicity of chemical compounds, 378 Geometric phase effect, 89, 113 Ghose-Crippen atom types, 372 Gini-SVM, 389 Gist, 389 GlaxoSmithKline, 438 GPDT, 389 Graining, 29 Graph descriptors, 301 Graph theory, 264 Graphical user interface (GUI), 427 Green fluorescent protein, 107 Gromacs, 241 Ground state, 84, 148 Group similarity, 280 Gyration radius, 246 Hamiltonian matrix, 88, 92 Hamilton’s equations of motion, 105 Hard margin nonlinear SVM classification, 334 Hard-disk model, 256 Hard-sphere fluids, 256 Harmonic vibrational energy levels, 158 Hartree-Fock (HF) method, 1, 97 Hartree-Fock reference, 99 Hartree-Fock wavefunction, 97 Health Designs, 420 Heaviside step function, 163 Heavy elements, 112 HERG (human ether-a-go-go) potassium channel inhibitors, 374 HeroSvm, 389 Hessian, 142, 151, 190 Heteropolymers, 254 High information content, 283 Highly symmetric reaction paths, 155

High-throughput screening, 430 Hilbert space, 323 Hindered internal rotations, 159 Histogram bins, 267 Historical development of computational chemistry, 401 Hoechst, 438 Hoffmann-LaRoche, 438 HOMO-LUMO gap, 36, 42 Hybrid functionals, 40 Hydrodynamics, 250 Hydrogen-atom transfer reaction, 109 Hydrophobicity, 297, 359 Hyperplane classifier, 302 IBM, 415 IBM mainframes, 416 Idempotency, 46 Imaginary frequency, 127, 128, 190 Imbalanced classification, 338 IMLAC, 418 Implicit solvation models, 126 Improved canonical variational theory (ICVT), 137 Inductive logic programming, 378 Inertial centrifugal effect, 169 Informatics, 431 Information content, 264, 269 Information content analysis, 263 Information content of a signal, 264 Information content of organic molecules, 278 Information theoretic analysis, 284 Information theory, 264 Information-rich descriptors, 273, 277, 285 Integral screening, 10 Integrated Scientific Information System (ISIS), 429 Interaction domains, 67 Interaction sites, 237 Interactive computing, 415 Interactive graphical terminals, 415 Intermediate partition function, 159 Internal contracted MRCI, 100 Internal coordinates, 192, 196 Internal degrees of freedom, 148 International Union of Pure and Applied Chemistry (IUPAC), 414 Internet, 385, 387 Interpolated optimized corrections (IOC) method, 200 Interpolated optimized energies, 202 Interpolated single-point energies, 200

Interpolated variational transition state theory by mapping (IVTST-M), 196 Interpolated VTST, 192 Intersystem crossings, 84, 106, 113 Intramolecular electron transfer, 106 Intrinsic reaction coordinate (IRC), 133 IR spectroscopy, 60 Isoinertial coordinates, 132, 140, 188 Iterative Boltzmann method (IBM), 240 Iterative structural coarse-graining, 242 Jahn-Teller effect, 90, 110 Jaynes entropy (JE), 269, 280 JmySVM, 388 JOELib, 374 Johnson & Johnson, 425 Journal of Biological Chemistry, 385 Journal of Computational Chemistry, 405, 414 Journal of Machine Learning Research, 386 Journal of Medicinal Chemistry, 435 Journal of Molecular Graphics, 416 JSVM, 391 Jury methods, 373 Jury SVM, 348, 372 Kappa indices, 377 Karuch-Kuhn-Tucker (KKT) conditions, 312, 321, 342 K-class support vector classification-regression (K-SVCR), 340 Kernel principal component analysis, 291 Kernel-based techniques, 291 Kernels, 294, 326 Additive, 333 Anova, 332 B spline, 295, 299, 316, 333, 354 Dot, 329, 353 Exponential RBF, 316, 331 Fourier series, 332 Gaussian RBF, 316, 331, 371 Graph, 378 Linear, 295, 329, 353 Neural, 332 Nonlinear, 295 Polynomial, 295, 330, 354 Radial basis function (RBF), 295, 375 Sigmoid, 332 Spline, 299, 332 SVM, 329 Tanh, 332 Tensor product, 333

Subject Index Kernels for biosequences, 349 Kernels for molecular structures, 350 KEX, 385 Keys, 429 Kier-Hall indices, 273, 367 Kinase inhibitors, 371 Kinetic isotope effects, 127 k-nearest neighbors (k-NNs), 302, 348, 371, 372, 373, 375, 378, 385 Kohn-Sham DFT, 6, 40 Kramers degeneracy, 113 Kuhn segment, 251 Kullback-Leibler (KL) function, 269 Lagrange function, 311 Lagrange multipliers, 103, 113, 311, 320 Laplace transform, 65 Large curvature transmission coefficient, 172 Large margin classifiers, 302 Large systems, 64 Large-curvature path (LCP), 189 Large-curvature tunneling (LCT), 164, 172, 173, 180, 222 Large-curvature tunneling paths, 192 Lattice models, 236, 250 Lattice site, 250 Lattice-Boltzmann models, 250 Lead compound, 406 Lead identification, 283 Lead-like compound, 271 Learning set, 302 LEARNSC, 390 Least-action path (LAP), 189 Least-action path tunneling (LAT), 189 Least-squares SVM regression (LS-SVMR), 380 Leave-one-out-model-selection, 388 Lederle, 418 Legendre polynomials, 21 Lennard-Jones (LJ) potentials, 255 Lennard-Jones parameters, 239 Lennard-Jones time, 250 Library design, 431 Library designers, 281 LIBSVM, 388 Light harvesting, 106 Lincs, 246 Linear classifiers, 324, 351, 363 Linear discriminant analysis (LDA), 301, 379 Linear regression, 378 Linear scaling, 1, 15, 37 Linear scaling calculation of SCF energies, 56

477

Linear scaling exchange, 38 Linear separable classes, 302 Linear support vector machines, 308 Linear transition state complex, 150 Linearly non-separable data, 317 Linearly separable classes of objects, 292 Linearly separable classification problems, 306 Linearly separable data, 308, 314 LinK method, 39, 40, 57 Lipid bilayers, 255 Lipid bilayer self-assembly, 255 Lipid simulations, 247 Lipophilicity, 407, 420 Local chain reorientation, 247 Local gauge-origin methods, 61 Local interactions, 242 Local minima, 126 Local packing of interaction centers, 242 Local quadratic approximation, 144 Local Shannon entropy, 264, 280 Local tacticity, 238 Local-equilibrium approximation, 130 Locating conical intersections, 102 Lock-and-key hypothesis, 402 Logistic regression, 372 logP(o/w), 270, 271, 276, 284 Long-range behavior of correlation effects, 67 LOO cross-validation, 388 Looms, 388 Loose transition states, 157 Low information content, 277 LS-SVMlab, 390 LSVM, 390 l-temperature, 257 MACCS, 373, 419 Machine learning, 291, 301, 306 MacroModel, 428 Mad cow disease, 379 Mainframes, 403, 416 MAKEBITS, 373 Management, 411 Mapping, 235, 301 Mapping between scales, 236 Mapping by chain diffusion, 247 Mapping through local correlation times, 247 Margin support vectors, 322 Marion Merrell Dow, 425 Mass spectra, 380 Mass-scaled coordinates, 133, 140, 212 Mass-weighted coordinates, 133 MATLAB, 294, 386

MATLAB SVM toolbox, 389, 390, 391
Maximum entropy, 269, 385
Maximum tunneling probability, 163
MC-TINKERATE, 191
McWeeny's purification, 50
Mean-field approach, 1
Mean-square displacement, 247
Mechanical models, 406
Mechanism of action (MOA), 352, 355
Mechanism of odor perception, 361
Mechanism of toxicity, 352
Medicinal chemists, 281, 411
Medline, 385
Menshutkin reaction, 217
Mercer's conditions, 328
Merck Molecular Force Field (MMFF94), 428
Merck, 408, 417, 418, 425, 427, 438
Meso-scale model, 234, 244
Methotrexate, 422
Methyl tertiary butyl ether (MTBE), 382
Metrics of information content, 269
Mexican hat, 90
Michaelis complex, 207
Microcanonical variational transition state theory (μVT), 137, 163
Microcanonical ensemble, 128, 137
Microcanonical rate constant, 137
Microcanonically optimized multidimensional tunneling (μOMT), 164
Microcanonically optimized transmission coefficient, 188
Microcanonically optimized tunneling probability, 189
Microscopic reversibility, 174
Microstates, 130
Milk, 382
MINDO/3, 410, 420
Minimum energy path (MEP), 129, 132, 140, 142, 210
MINITAB, 409
Mixed phospholipids, 256
MLF ANN, 377
MM2, 420
MMI, 420
MMI/MMPI, 410
MNDO, 420
MN-GSM, 217
Modeling, 279, 414
Modes transverse to the reaction coordinate, 131
MOLCAS, 101
MolconnZ, 373
Molecular descriptors, 402, 431
Molecular Design Ltd. (MDL), 417, 419, 429
Molecular Drug Data Report (MDDR), 271, 276, 281
Molecular dynamics (MD), 130, 208, 234, 246, 250, 279, 423
Molecular graph, 279, 378
Molecular graphics, 406, 416, 417
Molecular information content, 264
Molecular mechanics, 192, 410, 417, 420
Molecular Operating Environment (MOE), 271, 281, 373, 374
Molecular orbitals (MOs), 6
Molecular response properties, 59
Molecular similarity, 269
Molecular Simulations Inc., 427
MOLFEA, 379
Møller-Plesset perturbation theory, 2, 65, 98
MOLPRO, 100
Moment of inertia, 149, 150, 160, 199
Momentum, 130
Monte Carlo simulations, 234, 250, 256
Moore's Law, 3
MOPAC, 419, 427
MORATE, 191
Morgan index, 378
Morse function, 162
MP2, 2, 65, 98
M-SVM, 389
Multiclass dataset, 361
Multi-class SVM classification, 339
Multiconfiguration molecular mechanics (MCMM), 190, 192
Multiconfiguration SCF (MCSCF), 98
Multiconfiguration time-dependent Hartree (MCTDH) method, 104
Multidimensional tunneling, 125, 167
Multidimensional tunneling corrections, 164
MULTILEVELRATE, 191
Multiple linear regression (MLR), 302, 362
Multipole accelerated resolution of the identity (MARI-J), 35
Multipole expansion, 15, 20
Multipole integrals, 13
Multipole series, 13
Multipole translation operator, 26
Multipole-based integral estimates (MBIEs), 11, 72
Multireference configuration interaction (MRCI), 99
Multireference methods, 98
Multi-scale modeling, 235
Multistate perturbative methods, 101
Mutagenicity, 421
mySVM, 352, 388
mySVM/db, 388
Naïve Bayesian classifier, 372, 375
Narcotic pollutants, 352
National Institutes of Health (NIH), 386
National Library of Medicine, 386
Natural collision coordinates, 134
Natural products, 281
Natural representation, 46
Near degeneracies, 98, 101
Near-field (NF) interactions, 29
Near-field integral calculation, 35
Neglect of diatomic differential overlap (NDDO), 192
Neural network, 285, 351, 378, 382
New chemical entities (NCEs), 440
Newton-Raphson equation, 55, 103
NIR spectroscopy, 380
NLProt, 384
NMR, 60, 61, 246, 248
NMR chemical shielding, 61
Nobel Prize in Chemistry, 85
Noise, 299, 317
Nonadiabatic coupling, 103
Nonadiabatic nuclear dynamics, 96
Nonadiabatic processes, 83, 85
Nonadiabatic transitions, 86, 87, 108
Nonadiabatic tunneling, 172
Nonclassical reflection, 128, 131, 163
Noncrossing rule, 88, 110, 113
Non-drugs, 376
Nonequilibrium solvation (NES) effects, 206
Nonlinear classifier, 302, 324, 351
Nonlinear mapping, 294, 317
Nonlinear models, 291
Nonlinear separation surfaces, 323
Nonlinear support vector machines, 323
Nonlinear transition state complex, 150
Nonphysical results, 102
Nonredundant internal coordinates, 155
Nonrelativistic Hamiltonian, 85
Non-self-interactions, 243
No-recrossing assumption, 128, 130
Norfloxacin, 422, 436
Normal data distributions, 268
Normal distributions, 270
Normal mode frequencies, 151
Normal modes, 60, 93, 127, 128, 142, 151
Norwich Eaton, 418
Novartis, 427
ν-SVM classification, 337, 375, 376
Nuclear coordinates, 85
Nuclear displacements, 93
Nuclear wavefunction, 85
Nucleic acids, 107
Nucleic Acids Research, 386
Odd-electron systems, 113
Odor classification, 383
Off-lattice model, 251
One-dimensional spline interpolation large curvature tunneling (ILCT(1D)), 187
One-particle density matrix, 36, 47, 62
One-way flux, 131
ONX (order N exchange) algorithm, 38
Optimal basis density-matrix minimization (OBDMM), 42
Optimization techniques, 239
Optimized Euler stabilization method, 143
Optimized multidimensional tunneling (OMT), 164, 211
Optimizing the SVM model, 347
Optimum separation hyperplane (OSH), 308, 311, 318, 334
Optimum tunneling paths, 164
Orbital minimization (OM), 42
Orbital Shannon entropy, 280
Organic photochemistry, 106
Organophosphates, 382
ORTEP (Oak Ridge Thermal Ellipsoid Program), 406
Oscillator strength, 108
Outliers, 268, 271, 340
Overfitting, 330, 370
Oxford Molecular, 428
Ozone, 91, 105
Page-McIver (PM) algorithm, 144, 212, 217
Pair distribution function, 241
Pariser-Parr-Pople (PPP) theory, 407
Partial least squares (PLS), 301, 348, 362, 373, 376
Partition functions, 127, 147, 148, 199
Pattern classification, 301, 308
Pattern recognition, 292, 301
Peaked conical intersections, 93
Permutational symmetry, 39
Persistence length, 245
Perturbation theory, 98, 101, 112
Pfizer, 424
Pharmaceutical industry, 401
Pharmaceutical R&D, 440
Pharmacia, 424
Pharmacophore, 411
Phase separation, 256
Phase space, 128, 250
Phospholipids, 247, 254
Photochemical damage, 106
Photochemical reactions, 106
Photochemistry, 83, 106, 126
Photodissociation, 106
Photo-initiated electron transfer, 107
Photoisomerization, 106
Photophysics, 83
Photosynthesis, 83, 106
Physicochemical property, 283
PHYSPROP database, 284
PM3, 361
Polarizabilities, 60
Polarization energy, 204
Polycarbonate, 238
Polycyclic aromatic hydrocarbons (PAHs), 360
Polydimethylsiloxane, 246
Polygen, 427, 428
Polyisoprene, 240, 244, 247
Polymer coarse-graining, 234
Polymers, 254
Polypropylene (PP), 384
POLYRATE, 127, 132, 155, 157, 161, 168, 191, 200, 217, 222
Polystyrene, 237, 241
Polystyrene melt, 244
Positive majority consensus SVM (PM-CSVM), 372
Positive probability consensus SVM (PP-CSVM), 372
Post-HF methods, 2, 64
Potential energy, 244
Potential energy function (PEF), 190
Potential energy surface (PES), 83, 125, 190
Potentials of mean force (PMF), 126, 205, 208, 240
Practical aspects of SVM classification, 350
Practical aspects of SVM regression, 362
PreBIND, 385
Predictive model, 299
Predictor-corrector algorithm, 143
Pressure, 245
Pressure correction potential, 245
Primitive Gaussian distributions, 26
Principal anharmonicity, 162
Principal component analysis (PCA), 283, 301, 348
Principal component regression, 302
Principal force constants, 162
Principal moments of inertia, 150
Probability density function, 283, 284
Proceedings of the National Academy of Sciences, 385
Product region, 128
Profiling of chemical libraries, 275
Projected gradient techniques, 104
Projected Hessian, 156
Projection operators, 48
Propagation, 104
Property descriptors, 277
Protein classification, 349
Protein Data Bank (PDB), 413
Protein homology detection, 387
Protein names, 385
Protein sequence similarity, 349
Pseudo-eigenvalue problem, 6
Pseudo-time, 251
PSVM, 391
PubMed, 385, 386
PubMed Central, 387
Pure polymer, 243
Purity transformation, 51
Pyrazines, 361
QCPE Newsletter, 405
QSPR, 284, 347, 377
Quantitative structure-activity relationships (QSARs), 283, 291, 292, 296, 347, 352, 363, 366, 369, 376, 413
Quantitative structure-enantioselectivity relationships (QSERs), 377
Quantitative structure-toxicity model, 367
Quantized VTST calculation, 139
Quantum biology, 410
Quantum calculations, 301
Quantum chemical tree code (QCTC), 35
Quantum chemistry, 1
Quantum Chemistry Program Exchange (QCPE), 405, 409, 418, 420, 427
Quantum descriptors, 352
Quantum dynamics, 104
Quantum effects, 128, 130, 135, 138, 163, 236
Quantum effects on reaction coordinate motion, 163
Quantum mechanical/molecular mechanical (QM/MM) methods, 106, 126
Quantum mechanics, 83, 192, 264, 279, 402, 410
Quantum pharmacology, 410
Quantum threshold energy, 166
Quasiadiabatic mode, 181
Quasiadiabatic states, 97
Quasidegenerate perturbation theory, 91
R, 387
Racah's normalization, 24
Radial distribution functions (RDFs), 240, 242
Radiationless transition, 96, 110
Raman spectra, 381
Raman spectroscopy, 60
Random forest, 373
Rapid nonadiabatic transitions, 84
Rapier, 385
Rational drug design, 430
Rattle, 246
REACCS, 419
Reactant region, 128
Reaction coordinate, 128, 206
Reaction field, 204
Reaction mechanisms, 106
Reaction path curvature, 172, 188
Reaction paths, 85, 129, 140, 152, 169, 176, 183, 206
Reaction potential energy surfaces, 192
Reaction swath, 172, 186
Reactions in liquids, 203
Reactive normal mode, 131
Reattachment, 250
Receptor mode, 181
Recognition of chemical classes, 371
Recrossing, 221
Recrossing transmission coefficient, 210
Rectilinear coordinates, 132, 150
Recursive partitioning (RP), 371, 372
Reduced mass, 133
Redundant curvilinear coordinates, 217
Redundant internal coordinates, 155, 192
Regression, 291, 296
Relative entropy function, 269
Relativistic effective core potentials, 112
Relaxation times, 234
Remote homology detection, 349
Reorientation of the dividing surface (RODS), 136, 145
Representative tunneling effects (RTE), 216
Reptation model, 248, 250
Retinal, 107
Retrographics VT640, 418
Reverse mapping, 252
Rhodopsin, 107
Ridge regression, 348, 375
Robust linear discriminant analysis (RLDA), 379
Rohm and Haas, 418
Roothaan-Hall equations, 5, 42, 57
Rotational partition function, 148
Rotational symmetry number, 149, 150
Rouse modes, 247, 248
Rouse time, 249
R-SVM, 391
Rule of Five, 271, 434
Rydberg states, 99
σ-π correlation, 100, 108
Saddle points, 126, 131, 147, 151, 190, 199
SAS, 421
Scaled Shannon entropy (SSE), 267
Scaling behaviors, 234
SCF convergence, 58
SCF energy gradients, 57
Schering-Plough, 407
Schrödinger equation, 1, 85, 96, 104
Schwarz integral screening, 9, 38
Scientific information, 291, 384
SciFinder Scholar, 439
SciLab, 387
SCOP superfamilies, 349
Screening assays, 284
Seam, 103
Seam coordinate, 92
Seam space, 89
Searle, 418, 424
Second order CI (SOCI), 100
Segmental dynamics, 246
Segmental relaxation time, 248
Self-consistent field, 4
Self-consistent modeling techniques, 245
Self-organizing maps, 351
Selwood dataset, 378
Semiempirical molecular orbital theory, 192, 410
Sensors, 381
Separable equilibrium solvation (SES) model, 205
Sequential minimal optimization (SMO), 313
Shake, 246
Shannon entropy (SE), 263, 279
Shape indices, 273
Shepard interpolation method, 192, 194
Shepard point, 193
Silicon Graphics Inc. (SGI), 426, 433
Similar objects, 301
Similarity searching, 277, 419, 429
SimpleSVM toolbox, 391
Simplex, 239
Single decision tree, 373
Single reference methods, 98, 101
Single-chain distribution potentials, 238
Single-chain Monte Carlo simulations, 238
Slack variable, 318, 335, 340
Slater determinants, 5, 47, 97
Slater-Condon rules, 5
Small-curvature tunneling (SCT), 163, 169
SmartLab, 389
Smith, Kline and French, 408
SmithKline Beecham, 418
SMx universal solvent models, 204
Soft margin nonlinear SVM classification, 335
Soft margin SVMR, 340
Software vendors, 419, 421
Solubility, 283
Solute geometry, 204
Solute-solvent interactions, 204
Solvation effects, 279
Solvent, 251
Solvent effects, 218
Solvent molecules, 204, 279
Solvent reaction field, 204
Solvent rearrangement, 204
Solvent-accessible surface area (SASA), 205, 284
Solvent-free models, 256
Sparse SVM, 347
Specific reaction parameters (SRPs), 191
Spherical coordinates, 239
Spherical harmonic functions, 22
Spherical multipole expansion, 24
Spider, 390
Spin orbitals, 5
Spin-forbidden processes, 110
Spin-forbidden transitions, 106
Spin-orbit coupling, 106, 112
Spin-orbit coupling operator, 112
Spongiform encephalopathy, 379
Static mapping, 238
Stationary points, 126, 142
Statistical learning theory, 291, 292, 306
Statistical mechanics, 263
Steady-state approximation, 203
Steepest descent, 132
Stereo glasses, 418
Steroids dataset, 378
Stochastic gradient boosting (SGB) method, 373
Stochastic matching, 378
Stretch-bend partition function, 159
Structural descriptors, 299, 347, 352, 361, 369
Structural keys, 269
Structural risk minimization (SRM), 306
Structure factors, 240
Structure of polymers, 242
Structure-activity relationships (SARs), 292, 317, 407
Structure-based drug design (SBDD), 422
Structure-odor relationships, 361, 362
Structures, 85
Sturges' rule, 271
Sub-linear scaling, 39, 40
Substructure keys, 431
Substructure searching, 419
Super-atoms, 237, 244, 250
Supercomputers, 424
Superminicomputer, 415
Supervised learning, 291
Support vector machines (SVMs), 291, 302, 348, 351, 372, 375, 378, 379
Support vectors, 293
Support vectors selection, 348
Surface-hopping models, 105
SVM classification, 292
SVM hyperplane, 294
SVM regression, 292
SVM regression (SVMR), 340, 362, 367, 369
SVM regression models, 362
SVM resources on the web, 385
SVM/LOO, 390
SvmFu, 391
SVMsequel, 390
SVMstruct, 387
SVMTorch, 388
SwissProt, 384
SYBYL, 417, 419, 427
Symmetry, 84, 90, 110
Symmetry-allowed conical intersections, 90
Symmetry-required conical intersections, 90
System/environment separation, 207
Tablet production methods, 380
Tanimoto similarity, 377
Taylor expansion, 54, 55, 91, 154
Taylor series, 193, 195
Tektronix, 418
Temperature-dependent transmission coefficients, 131
Tensor, 43
Tensor notation, 43
Teratogenicity, 421
Test set, 283, 284
Tests for outliers, 268
Text mining, 291, 384
Text recognition systems, 385
Theoretical chemistry, 414
Theoretical chemists, 404, 407
Thermal annealing, 384
Thermodynamics, 129
Thermoplastic polymers, 384
Three-state conical intersections, 110, 111
Thrombin inhibitors, 375
Tight transition state, 157, 206
Tight-binding (TB) calculations, 50
Tilted cones, 93
Time reversal operator, 113
Time reversal symmetry, 113
Time-dependent DFT (TDDFT), 101
Time-dependent Schrödinger equation, 105
Time-independent Schrödinger equation, 4, 85
TinySVM, 389
TOPKAT, 420
Topography, 93
Topological indices, 264, 301
Torch, 387
Torsade de pointes (TdP), 373
Torsion, 159
Torsion partition function, 159
Toxicity, 352, 359, 363, 421
Toxicity evaluations, 366
Toxicity of aromatic compounds, 366
Toxicological endpoints, 421
Training set, 283, 284, 295, 302, 317
Trajectory-Surface-Hopping (TSH), 105
Trans-1,4-polyisoprene, 249
Transition state, 85, 128
Transition state dividing surfaces, 126
Transition state ensemble, 208
Transition state partition function, 134
Transition state theory (TST), 126, 128
Transmission coefficient, 130, 131, 139, 167, 168, 186
Tree Kernels, 390
TrEMBL, 384
Trimethoprim, 422
Tripos Associates, 417, 419, 427
Tunneling, 127, 128, 131, 139, 163
Tunneling amplitude, 180, 183
Tunneling effects, 214, 221
Tunneling energies, 163, 187
Tunneling paths, 163, 169, 172, 176, 183
Tunneling probabilities, 163, 169, 172
Tunneling swath, 129
Tunneling transmission coefficient, 211
Turning point, 173, 182
Two-electron integral screening, 8
Two-electron integrals, 6, 8, 24, 37
Ultra-fast experimental techniques, 84
Ultra-short excited state lifetimes, 107
Umbrella sampling, 208
Uncertainty principle, 130
Unified statistical (US) model, 137
Unimolecular reaction, 130, 148, 167, 189
United atom (UA) models, 236
Unphysical moves, 250
Unphysically high pressure, 245
Unscaled coordinates, 133
Upjohn, 407, 438
Uracil, 94, 111
Urine profiling, 380
Valence coordinates, 152
Valence force coordinates, 152
Valence-Rydberg methods, 101
Vapnik-Chervonenkis dimension, 292, 306
Variational configuration interaction, 97
Variational dividing surfaces, 145
Variational principle, 99
Variational reaction path (VRP) algorithm, 145
Variational transition state theory (VTST), 125
Vertical cones, 93
Vertical conical intersections, 93
Vertical excitation energies, 102, 108
Very fast multipole methods (vFMM), 35
Vibrational excited states, 172
Vibrational frequencies, 60, 142, 157
Vibrational modes, 134
Vibrational partition functions, 131, 149, 150, 159
Vibrational spectra, 111
Virtual orbitals, 97
Virtual screening, 430
Vision, 83, 106
Water, 95, 254
Water molecules, 247
Wave vector space, 241
Wavepackets, 104
Weighted SVM, 338, 383
Weka, 387, 388
Well-separatedness (WS) criterion, 29, 33
Wide margin classifiers, 306
Wilson B matrix, 155
Wilson C matrix, 155
Wilson G matrix, 156
Wilson GF matrix method, 156
Word processing, 415
Word processors, 409
Workstations, 426
World Drug Index (WDI), 372
X-PLOR, 423
YaLE, 387
Zero-curvature tunneling (ZCT), 164, 169
ZINC compound database, 276
Zwitterionic head groups, 255