NMR of Biomolecules: Towards Mechanistic Systems Biology

Wiley, 2012. — 631 p. NMR is one of the most powerful methods for imaging of biomolecules. This book is the ultimate NMR


English Pages [631]


Author / Uploaded
Bertini I.
McGreevy K.S.
Parigi G. (Eds.)


Edited by Ivano Bertini, Kathleen S. McGreevy, and Giacomo Parigi

NMR of Biomolecules

Related Titles

Keeler, J.
Understanding NMR Spectroscopy
2010
ISBN: 978-0-470-74608-0

de Graaf, R.
In Vivo NMR Spectroscopy
Principles and Techniques
2007
ISBN: 978-0-470-02670-0

Edited by Ivano Bertini, Kathleen S. McGreevy, and Giacomo Parigi

NMR of Biomolecules

Towards Mechanistic Systems Biology

The Editors

Prof. Dr. Ivano Bertini
University of Florence
Department of Chemistry and Magnetic Resonance Center (CERM)
Via L. Sacconi 6
50019 Sesto Fiorentino
Italy

Kathleen S. McGreevy
University of Florence
Department of Chemistry and Magnetic Resonance Center (CERM)
Via L. Sacconi 6
50019 Sesto Fiorentino
Italy

Prof. Giacomo Parigi
University of Florence
Department of Chemistry and Magnetic Resonance Center (CERM)
Via L. Sacconi 6
50019 Sesto Fiorentino
Italy

Cover
The artwork on the cover attempts to convey the idea of the systems of interacting biomolecules that are at the basis of Life. The Birth of Venus, which has been reimagined for the cover in this spirit, was painted by Sandro Botticelli in the late 15th century and is held by the Uffizi Gallery in Florence. The painting depicts the birth of the goddess of love as she emerges as a fully grown adult from the sea.

According to Plato, as well as members of the Florentine Platonic Academy, Venus had two aspects: an earthly goddess who aroused humans to physical love, and a heavenly goddess who inspired intellectual love. Who better than she to represent our passion for the study of biomolecular structures and mechanisms, and the physical-intellectual duality that leads us to learn more about the living world around and within each of us? Venus has, in fact, already been used as the logo by the Society of Biological Inorganic Chemistry for the Journal of Biological Inorganic Chemistry (JBIC).

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty can be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Card No.: applied for

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.d-nb.de.

© 2012 Wiley-VCH Verlag & Co. KGaA, Boschstr. 12, 69469 Weinheim, Germany

Wiley-Blackwell is an imprint of John Wiley & Sons, formed by the merger of Wiley’s global Scientific, Technical, and Medical business with Blackwell Publishing.

All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprinting, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.

Cover Design Adam-Design, Weinheim
Typesetting Thomson Digital, Noida, India
Printing and Binding
Printed in
Printed on acid-free paper

Print ISBN: 978-3-527-32850-5
ePDF ISBN: 978-3-527-64452-0
oBook ISBN: 978-3-527-64450-6
ePub ISBN: 978-3-527-64451-3
Mobi ISBN: 978-3-527-64453-7

Contents

Preface
List of Contributors
List of Abbreviations

Part One Introduction

1 NMR and its Place in Mechanistic Systems Biology
Ivano Bertini, Kathleen S. McGreevy, and Giacomo Parigi

2 Structure of Biomolecules: Fundamentals

2.1 Structural Features of Proteins
Lucia Banci and Francesca Cantini
2.1.1 Introduction: From Primary to Quaternary Structure
2.1.2 Geometrical and Conformational Properties
2.1.2.1 Backbone Dihedral Angles
2.1.2.2 Side-Chain Dihedral Angles
2.1.3 Secondary Structure Elements in Proteins
2.1.4 Prediction of Secondary Structure
2.1.5 Structural Motifs and Structural Domains – Combination of Secondary Structural Elements and Structural Motifs
2.1.6 Types of Folds and their Classification
2.1.6.1 Folds of the α Class
2.1.6.2 Folds in the β Class
2.1.6.3 Folds in the α/β Class
2.1.6.4 Folds in the α + β Class
2.1.7 Tertiary Structure
2.1.8 Quaternary Structure

2.2 Nucleic Acids
Mirko Cevec, Hendrik R.A. Jonker, Senada Nozinovic, Christian Richter, and Harald Schwalbe
2.2.1 Introduction
2.2.1.1 Conformations
2.2.2 DNA Structure
2.2.2.1 B-DNA and Derivatives
2.2.2.2 A-DNA
2.2.2.3 Z-DNA
2.2.2.4 Nonstandard DNA Structures
2.2.2.4.1 Circular DNA
2.2.2.4.2 Helical Junction
2.2.2.4.3 Triple Helix
2.2.2.4.4 i-Motif
2.2.2.4.5 Quadruplex DNA
2.2.3 RNA Structure
2.2.3.1 Regular RNA Structure – A-Form Helices
2.2.3.2 Mismatches, Bulges, and Unusual Base Pairing
2.2.3.3 Reversal and Alteration of Strand Direction: Commonly Observed Loop and Turn Motifs
2.2.3.3.1 U-Turn
2.2.3.3.2 K-Turn
2.2.3.3.3 C-Loop
2.2.3.3.4 E-Loop
2.2.3.4 Tetraloops and Tetraloop–Receptor Contact
2.2.3.5 Higher-Order RNA Tertiary Structure Elements: Coaxial Stacking Motifs
2.2.3.6 DNA–RNA Hybrids

3 What Can be Learned About the Structure and Dynamics of Biomolecules from NMR

3.1 Proteins Studied by NMR
Lucio Ferella, Antonio Rosato, and Paola Turano
3.1.1 Why NMR Structures?
3.1.2 NMR Bundle
3.1.3 Protein Dynamics
3.1.4 Intermolecular Interactions Involving Proteins

3.2 Nucleic Acids Studied by NMR
Janez Plavec
3.2.1 Structure, Mobility, and Function

Part Two Role of NMR in the Study of the Structure and Dynamics of Biomolecules

4 Determination of Protein Structure and Dynamics
Lucio Ferella, Antonio Rosato, and Paola Turano
4.1 Determination of Protein Structures
4.1.1 Resonance Assignment
4.2 NMR Restraints
4.2.1 Distance Restraints
4.2.2 Dihedral Angles
4.2.3 Residual Dipolar Couplings
4.3 Structure Calculations
4.3.1 Traditional
4.3.2 Automated NOESY Assignment
4.3.3 Energy Refinement of Protein Structures
4.3.4 Chemical Shift-Based Approaches for Protein Structure Determination
4.4 Validation of Protein Structures
4.4.1 Experimental Data
4.4.2 Geometric Quality
4.5 Protein Dynamics and NMR Observables
4.5.1 NMR Observables Affected by Dynamics
4.5.2 NMR Experiments to Measure Dynamics and their Interpretation
4.6 Protocols
4.6.1 Sample Labeling
4.6.2 NMR Assignment
4.6.3 Manual Collection of Restraints
4.6.4 Structure Calculations
4.6.5 Structure Refinement
4.6.6 Chemical Shift-Based Structure Calculations
4.6.7 Structure Validation
4.6.8 Protein Dynamics
4.7 Troubleshooting
4.7.1 Data Collection
4.7.2 Structure Calculations
Further Reading

5 DNA
Janez Plavec
5.1 NMR Spectroscopy of DNA
5.2 Assessment of the Folding Topology
5.3 Resonance Assignment through Sequential and Interstrand Interactions
5.4 Pseudorotation of Deoxyribofuranose Rings
5.5 Backbone Conformation
5.6 Natural Abundance Nucleobase Substitutions
5.7 Natural Abundance Heteronuclear Experiments
5.8 Site-Specific Low Isotopic Enrichment
5.9 Translational Diffusion Coefficients
5.10 Determination of Three-Dimensional Structure
5.11 Search for Transient Structures
5.12 Protocols
5.12.1 Sample Preparation and Initial NMR Experiments
5.12.2 Stoichiometric Analysis through Translational Diffusion
5.12.3 Sequential Assignment
5.12.4 Assessment of the Preferred Sugar Pucker
5.12.5 Conformations along the Backbone
5.12.6 Nucleobase Substitutions and Site-Specific Low 13C/15N Isotopic Enrichment
5.12.7 Topology and Atomic-Detail Three-Dimensional Structure Determination
5.13 Example Experiments and Troubleshooting
Further Reading

6 RNA
Richard Štefl and Vladimír Sklenář
6.1 NMR Spectroscopy of RNA
6.2 Preparation of RNA Samples for NMR
6.2.1 In Vitro Transcription Using T7 RNA Polymerase
6.2.2 In Vivo Recombinant RNA Synthesis
6.2.3 Chemical Synthesis
6.2.4 Segmental Labeling
6.3 Probing of the RNA Fold
6.4 Assessment of the Spectral Resolution
6.5 Strategy for the Resonance Assignment
6.5.1 Hydrogen Bond Formation and Base Pair Identification
6.5.2 Through-Bond-Type Experiments – Base Spin System Identification
6.5.3 Through-Bond-Type Experiments – Sugar Spin System Identification
6.5.4 Sequential Connectivities
6.6 Collection of Structural Information
6.6.1 Collection of Distance-Dependent Structural Restraints
6.6.2 Collection of Torsion-Angle-Dependent Structural Restraints
6.6.3 Collection of Long-Range Structural Restraints
6.7 Structural Calculation of RNA
6.8 Assessment of Quality of NMR Structures
6.9 Protocols
6.9.1 Flowcharts
6.9.1.1 Sample Preparation
6.9.1.2 Collection of the Experimental Data
6.9.1.3 Structure Calculation
6.9.2 Protocol for In Vitro Transcription Using Bacteriophage T7 RNA Polymerase
6.9.2.1 Template Design
6.9.2.2 Transcription Protocol
6.9.2.3 Purification of RNA
6.9.2.4 Simple Protocol for Bacteriophage T7 RNA Polymerase Preparation
6.10 Troubleshooting
Further Reading

7 Intrinsically Disordered Proteins
Isabella C. Felli, Roberta Pierattelli, and Peter Tompa
7.1 Intrinsically Disordered Proteins
7.1.1 When the Concept was First Introduced
7.1.2 Functions and Functional Advantages Associated with Disorder
7.2 Importance of NMR to Study IDPs
7.3 Structural and Dynamic Information on IDPs – NMR Observables
7.3.1 Reduced Chemical Shift Dispersion and Sequence-Specific Assignment
7.3.2 Chemical Shifts and Secondary Structural Propensities
7.3.3 Additional Observables and Conformational Averaging
7.3.4 Scalar Couplings
7.3.5 Residual Dipolar Couplings
7.3.6 Solvent Exposure
7.3.7 15N Relaxation and Heteronuclear NOEs
7.3.8 Paramagnetic Relaxation Rate Enhancements
7.3.9 Proton–Proton NOEs
7.3.10 Relevance of Structural Disorder in In Vivo and In-Cell Studies of IDPs
7.4 Protocols
7.4.1 Use of Bioinformatic Tools and Databases
7.4.2 Use of NMR in the Characterization of IDPs
7.5 Troubleshooting
7.5.1 Bioinformatics Can Help!
7.5.2 Understanding the Function of Unfolded Protein Regions
7.5.3 Does In Vitro Reflect In Vivo Behavior?
Further Reading

8 Paramagnetic Molecules
Ivano Bertini, Claudio Luchinat, and Giacomo Parigi
8.1 Paramagnetism-Assisted NMR
8.2 Scalar and Dipolar Electron Spin–Nuclear Spin Interactions: Hyperfine Shift
8.2.1 Contact Contributions to the Hyperfine Shift
8.2.2 Pseudocontact Contributions to the Hyperfine Shift
8.3 Scalar and Dipolar Electron Spin–Nuclear Spin Interactions: PRE
8.4 Indirect Electron Spin–Nuclear Spin Effects: Paramagnetism-Induced RDCs
8.5 Cross-Correlation Between Curie and Dipolar Relaxation
8.6 Good Metal Ions and Bad Metal Ions
8.7 Paramagnetism-Based Drug Discovery
8.8 Protocols
8.8.1 Collecting the Paramagnetism-Based Restraints
8.8.2 Protocols to Extract Structural Information from the PCS, PRDC, PRE and PCCR
8.8.3 Protocols for Protein–Protein Interactions
8.8.4 Protocols for the Analysis of Conformational Freedom in Two-Domain Proteins
8.8.5 Example Experiment
8.9 Troubleshooting
8.9.1 Tips and Tricks to Optimize Signal Detection in Paramagnetic Systems
8.9.2 Selection of the Paramagnetic Ion
8.9.3 Switching Between Diamagnetic and Paramagnetic Systems
Further Reading

Part Three Role of NMR in the Study of the Structure and Dynamics of Biomolecular Interactions

9 NMR Methodologies for the Analysis of Protein–Protein Interactions
Tobias Madl and Michael Sattler
9.1 Introduction
9.2 Dynamics and Ligand Binding
9.3 General Strategy
9.4 Overview of Methods
9.4.1 Sample Preparation
9.4.2 Structures of Domains/Subunits
9.4.3 Interfaces
9.4.3.1 Chemical Shift Perturbations (CSPs)
9.4.3.2 NOEs
9.4.3.3 Cross-Saturation
9.4.3.4 Differential Line-Broadening
9.4.3.5 Hydrogen Exchange
9.4.3.6 Solvent PREs
9.4.4 Domain/Subunit Orientation
9.4.4.1 NMR Relaxation Data
9.4.4.2 Residual Dipolar Couplings (RDCs)
9.4.4.3 Paramagnetic Restraints
9.4.5 Structure Calculations
9.5 Outlook
9.6 Protocols for the Analysis of Protein Complexes
9.6.1 Spin Labeling and Paramagnetic Tagging
9.6.2 Structures of the Individual Domains/Subunits
9.6.3 Optimizing Conditions for Structural Studies of the Protein Complex
9.6.4 Detection of Dynamics
9.6.5 Determining Interaction Interfaces Using Solvent PREs
9.6.6 Structure Calculation Approach
9.6.7 Example Experiment
9.7 Troubleshooting
Further Reading

10 Metal-Mediated Interactions
Simone Ciofi-Baffoni
10.1 Theoretical Background
10.2 Protocol for the Structural Determination of a Metal-Mediated Complex
10.2.1 Optimization of Experimental Conditions to Obtain the Protein–Protein Complex
10.2.2 Titrations to Map Protein–Protein Interfaces and Obtain Binding Characteristics
10.2.3 Modeling
10.2.4 Definition of Metal Coordination
10.2.5 Determination of Three-Dimensional Structure
10.3 Example Experiment
10.4 Troubleshooting
Further Reading

11 Protein–Paramagnetic Protein Interactions
Peter H.J. Keizers, Yoshitaka Hiruma, and Marcellus Ubbink
11.1 Paramagnetic Sources in Protein Complexes
11.2 Types of NMR Restraints Obtained from Paramagnetic Centers
11.3 Protein Complexes
11.3.1 Structures of Protein Complexes
11.3.2 Dynamics in Protein Complexes
11.3.2.1 Plastocyanin and Cytochrome f
11.3.2.2 Cytochrome c and Cytochrome c Peroxidase
11.3.2.3 Histidine Phosphocarrier Protein and Enzyme I
11.3.2.4 Adrenodoxin and Cytochrome c
11.3.2.5 Calmodulin Domain Dynamics
11.4 Protocols
11.4.1 Protein Titrations to Obtain the Kd and a Binding Map
11.4.2 Paramagnetic Tagging and Use of Paramagnets in NMR
11.4.3 Ensemble Modeling
11.5 Example Experiment
11.6 Troubleshooting
Further Reading

12 Protein–RNA Interactions
Vijayalaxmi Manoharan, Jose Manuel Perez-Cañadillas, and Andres Ramos
12.1 Introduction
12.1.1 Post-Transcriptional Regulation and RNA Recognition Domains
12.1.2 Protein–RNA Interfaces
12.2 NMR Methodology
12.2.1 Using NMR to Investigate Protein–RNA Interactions
12.2.2 Mapping Interaction Sites
12.2.2.1 Chemical Shift Perturbation
12.2.2.2 Cross-Saturation
12.2.2.3 Paramagnetic Relaxation Enhancement
12.2.3 Affinity and Specificity
12.2.3.1 Determining Dissociation Constants
12.2.3.2 Determining Stoichiometry
12.2.3.3 Scaffold-Independent Analysis
12.2.4 High-Resolution Structure Determination and Dynamics
12.3 Protocols and Troubleshooting
12.3.1 Preparation of Protein–RNA Complexes
12.3.1.1 Materials and Sample Preparation
12.3.1.2 Troubleshooting
12.3.2 Protocol and Example Experiment 1: Calculating the Affinity of a Protein–RNA Interaction
12.3.2.1 Materials and Sample Preparation
12.3.2.2 Troubleshooting
12.3.3 Protocol and Example Experiment 2: Paramagnetic Labeling
12.3.3.1 Materials and Sample Preparation
12.3.3.2 Troubleshooting
12.3.4 Protocol and Example Experiment 3: SIA
12.3.4.1 Materials and Sample Preparation
12.3.4.2 Troubleshooting
Further Reading

13 Protein–DNA Interactions
Lidija Kovacic and Rolf Boelens
13.1 State of the Art
13.2 Conclusions and Perspectives
13.3 Protocols
13.3.1 Sample Preparation
13.3.2 NMR Methodology
13.3.3 Identification of the Interaction Surfaces
13.3.4 Structure Calculations
13.4 Troubleshooting
Further Reading

Part Four NMR in Drug Discovery

14 High-Throughput Screening and Fragment-Based Design: General Considerations for Lead Discovery and Optimization
Maurizio Pellecchia
14.1 High-Throughput Screening and Fragment-Based Design
14.2 General Aspects of NMR Spectroscopy in Hit Identification and Optimization Processes
14.3 Chemical Shift Perturbation as a Screening Method
Further Reading

15 Ligand-Observed NMR in Fragment-Based Approaches
Paweł Śledź, Chris Abell, and Alessio Ciulli
15.1 Ligand-Observed NMR Spectroscopy
15.2 On the Transient Binding of Small Molecules to the Protein
15.3 Questions Asked by Ligand-Based Fragment Screening
15.3.1 Direct Yes/No Binding Experiments
15.3.1.1 STD
15.3.1.2 WaterLOGSY
15.3.1.3 Relaxation-Edited One-Dimensional Experiments
15.3.2 Affinity Measurements and Affinity-Oriented Screening
15.3.3 Information on the Binding Mode
15.4 Summary
15.5 Protocols
15.5.1 General Aspects of Sample Preparation
15.5.2 Preparing Samples
15.5.3 Pulse Sequences
15.5.4 Automation
15.6 Example Experiments
15.6.1 One-Dimensional Direct Binding Experiments
15.6.2 Competition NMR Screening Experiments
15.6.3 Two-Dimensional ILOE Binding Experiments
15.7 Troubleshooting
15.7.1 Sample Preparation
15.7.2 Experimental Setup
Further Readings

16 Interactions of Metallodrugs with DNA
Hong-Ke Liu and Peter J. Sadler
16.1 Metallodrugs and DNA Interactions
16.1.1 Metallodrugs
16.1.2 General Features of Metallodrug–DNA Interactions
16.1.3 NMR Techniques and Metallodrug–DNA Interactions
16.2 Coordinative Binding
16.3 Groove Binding
16.3.1 Major Groove Binding
16.3.2 Minor Groove Binding
16.4 Intercalation and Insertion
16.4.1 Metallo-Intercalators
16.4.2 DNA Insertion and Metallo-Inserters
16.5 Dual Binding (Coordination and Intercalation)
16.5.1 Intercalation by Metallodrug with σ-Bonded Intercalator
16.5.2 Intercalation by Metallodrug with π-Bonded Intercalator
16.6 Protocols
16.6.1 Determination of the Coordinative Binding Sites
16.6.2 Detection of Intercalation and Insertion
16.6.3 Detection of Metallodrug–DNA Groove-Binding Interactions
16.6.4 Example Experiment
16.7 Tricks and Troubleshooting
Further Reading

17 RNA as a Drug Target
Jan-Peter Ferner, Elke Duchardt Ferner, Jörg Rinnenthal, Janina Buck, Jens Wöhnert, and Harald Schwalbe
17.1 RNA as a Target for Small Molecules
17.2 Chemical Shift Perturbation and Paramagnetic Relaxation Enhancement
17.3 Nuclear Overhauser Effect-Based Methods
17.4 Fluorine Labeling of RNA
17.5 Ligand-Based Methods
17.6 Protocols
17.6.1 Determination of the Cooperativity Between Mg²⁺ Binding Sites in RNA by CSP Analysis
17.6.2 Ligand-Binding Site Mapping by CSP Tracking
17.6.3 Mapping Mg²⁺-Binding Sites in RNA Using MnCl₂ PRE Experiments
17.6.4 Ligand-Binding Site Mapping by NOE Methods
17.6.5 Mapping Outer Sphere Mg²⁺-Binding Sites by Co(NH₃)₆³⁺ NOESY Experiments
17.7 Troubleshooting
Further Reading

18 Fluorine NMR Spectroscopy for Biochemical Screening in Drug Discovery
Claudio Dalvit
18.1 Enzymatic Inhibition Mechanisms
18.2 n-FABS
18.2.1 One Substrate and One Enzyme
18.2.2 One Substrate and One Enzyme in the Presence of Serum Albumin and/or Enzymes of the Cytochrome P450 Superfamily
18.2.3 One Substrate with Multiple Enzymes
18.2.4 Multiple Substrates with One Enzyme
18.2.5 Multiple Substrates with Multiple Enzymes
18.2.6 Application to Functional Genomics
18.3 Comparison of n-FABS with Other Biophysical Techniques
18.4 Outlook
18.5 Protocols
18.5.1 Protocol for Setup of the n-FABS Assay
18.5.2 Protocol for a Screening Run with n-FABS
18.5.3 Protocol for Measuring the IC50 of Identified Inhibitors
18.5.4 Example Experiment
18.6 Troubleshooting
Further Reading

19 NMR of Peptides
Johannes G. Beck, Andreas O. Frank, and Horst Kessler
19.1 Introduction
19.2 Resonance Assignment
19.3 Stereostructure and Conformational Restraints
19.3.1 NOEs and ROEs
19.3.2 3J Coupling Constants
19.3.3 Residual Dipolar Couplings
19.4 Structure Calculation
19.5 Importance of Peptide Conformations for Biological Activity
19.6 Protocols
19.6.1 Resonance Assignment
19.6.2 ROESY: Extraction of Accurate Distances
19.6.3 3J Couplings: Use in Structure Determination and Useful Experiments
19.6.4 Modeling of the Structure
19.6.4.1 Distance Geometry
19.6.4.2 MD
19.7 Troubleshooting
Further Reading

Part Five Solid-State NMR

20 Biomolecular Solid-State NMR/Basics
Emeline Barbet-Massin and Guido Pintacuda
20.1 Introduction
20.2 NMR Hamiltonian
20.3 Magic Angle Spinning
20.4 Cross-Polarization
20.5 Heteronuclear 1H Decoupling
20.6 Dipolar Recoupling
20.7 Recent Progress: New Probes – Ultrafast MAS – High Magnetic Fields
20.8 Protocols
20.8.1 Hardware Setup
20.8.2 Magic Angle Setting
20.8.3 Adamantane
20.8.3.1 MAS Spinning
20.8.3.2 Calibrate the 1H Channel
20.8.4 Calibrate the 13C Channel
20.8.4.1 Reference the Spectra
20.8.5 Cross-Polarization
20.8.5.1 Set the Spinning
20.8.5.2 Calibrate the 1H Channel
20.8.5.3 Setup a Cross-Polarization Experiment
20.8.5.4 Calibrate the 13C/15N Fields
20.8.6 Heteronuclear Decoupling Sequences
20.8.7 DCP
20.8.8 Recoupling Sequences
20.8.9 Example of an Experiment
20.9 Troubleshooting
20.9.1 Tips to Optimize Signal Acquisition and Sensitivity
20.9.2 Narrowing the 13C or 15N Linewidths
20.9.3 Recoupling Optimization
Further Reading

21 Protein Dynamics in the Solid State
Jozef R. Lewandowski and Lyndon Emsley
21.1 Introduction
21.2 Basic Concepts
21.3 Coherent versus Incoherent Processes: Decay is not Always Relaxation
21.4 Deuterium as a Probe of Dynamics
21.5 15N and 13C T1 – Spin-Lattice Relaxation
21.6 Protocols
21.6.1 Measuring 15N T1 Relaxation
21.6.2 Measuring 13C T1 Relaxation
21.6.3 Heteronuclear NOE
21.6.4 Measuring 15N Dipolar–CSA Cross-Correlated Relaxation
21.6.5 Measuring 15N R1ρ
21.6.6 Measuring Motionally Averaged Secular Interactions
21.6.7 Slow Conformational Exchange
21.6.8 Identification of Highly Mobile Sites
Further Reading

22 Microcrystalline Proteins – An Ideal Benchmark for Methodology Development
W. Trent Franks, Barth-Jan van Rossum, Benjamin Bardiaux, Enrico Ravera, Giacomo Parigi, Claudio Luchinat, and Hartmut Oschkinat
22.1 Microcrystalline Protein Sample Preparation
22.2 Sequential Assignment of Proteins
22.3 Structural Restraints
22.3.1 Distance Measurements: Cross-Relaxation-Like Transfer
22.3.1.1 PDSD/DARR/RAD
22.3.1.2 ChhC and NhhC Experiments
22.3.1.3 1H-Detected NOE
22.3.2 Distance Measurements: Multiple-Pulse Dipolar Recoupling
22.3.2.1 RFDR
22.3.2.2 TSAR, PAR, and PAIN
22.3.2.3 REDOR
22.3.2.4 TEDOR
22.3.2.5 Proton Dipolar Recoupling: TMREV and LG-CP
22.3.3 Parameters Encoding Torsion Angles
22.3.3.1 Chemical Shifts
22.3.3.2 Double Dip-Shift Spectroscopy
22.3.3.3 CSA–Dipolar Interactions
22.4 Paramagnetic Systems
22.4.1 PRE
22.4.2 PCS
22.5 Benchmarking of the Solid-State NMR Structure Determination Methodology: Comparison of Structure Calculation Protocols and Accuracy of Structures
22.6 Protocols
22.6.1 Three-Dimensional Nanocrystalline Sample Preparation
22.6.2 Sequential Assignments of Solid-State NMR Spectra
22.6.3 Distance Restraint Assignments with Solid-State NMR
22.6.4 Secondary Structure Prediction by TALOS
22.6.5 Dipolar Tensor, CSA, and Vector Angle Fitting
22.6.6 Use of PCS for Structural Restraints: Structure of the Catalytic Domain of MMP-12 as an Example
22.7 Troubleshooting
Further Reading

23 Structural Studies of Protein Fibrils by Solid-State NMR
Anja Böckmann and Beat H. Meier
23.1 Background
23.2 NMR Spectra of Fibrils
23.3 Outlook
23.4 Protocols and Examples
23.4.1 Sample Preparation
23.4.2 Experiments for the Resonance Assignment
23.4.3 Experiments to Obtain Structural Restraints
23.4.4 Flexible Elements
23.4.5 Structure Calculation in Fibrils
23.5 Troubleshooting
Further Reading

24 Solid-State NMR on Membrane Proteins: Methods and Applications
A.A. Cukkemane, M. Renault, and M. Baldus
24.1 Solid-State NMR of Membrane Proteins
24.1.1 Magic Angle Spinning
24.1.2 Methods for High-Resolution Structural Investigation of Membrane Proteins
24.1.3 Investigation of Membrane Protein Topology
24.1.4 Sensitivity and Resolution
24.2 MAS Applied to Ion Channels and Retinal Proteins
24.2.1 Retinal Proteins
24.2.2 Chimeric Potassium Channel KcsA–Kv1.3
24.3 Protocols
24.3.1 Sample Preparation
24.3.2 Isotope Labeling
24.3.3 Resonance Assignment
24.3.4 Collecting NMR Restraints
24.3.5 A Real Experiment
24.4 Troubleshooting
24.4.1 Tips and Tricks in Optimizing Sample Preparation
24.4.2 Improving Sensitivity
Further Reading

Part Six Frontiers in NMR Spectroscopy

25 Dynamic Nuclear Polarization
Thomas F. Prisner
25.1 Dynamic Nuclear Polarization at High Magnetic Fields
25.2 Theoretical Background
25.2.1 Overhauser Effect
25.2.2 Solid Effect
25.2.3 Three-Spin Cross-Polarization: Cross-Effect
25.2.4 Many-Spin Cross-Polarization: Thermal Mixing
25.3 Protocols
25.3.1 HF Liquid DNP Spectrometers
25.3.2 Shuttle DNP
25.3.3 SS MAS DNP
25.3.4 Low-Temperature Dissolution Polarizer
25.4 Example Experiment
25.5 Perspectives
25.5.1 HF Liquid DNP
25.5.2 Shuttle DNP
25.5.3 SS MAS DNP
25.5.4 Dissolution DNP
Further Reading

26 13C Direct Detection NMR
Isabella C. Felli and Roberta Pierattelli
26.1 13C Direct Detection NMR for Biomolecular Applications
26.1.1 13C NMR Properties and Application Areas
26.1.2 Problem of Homonuclear Decoupling in the Direct Acquisition Dimension
26.1.3 Experiments for High-Resolution NMR
26.2 Protocols for Experimental Setup
26.3 Troubleshooting
Further Reading

27 Speeding Up Multidimensional NMR Data Acquisition
Bernhard Brutscher, Dominique Marion, and Lucio Frydman
27.1 Multidimensional NMR: Basic Concepts and Features
27.1.1 Discrete Data Sampling, Aliasing, and Truncation Artifacts
27.1.2 Time Requirements in N-Dimensional NMR
27.2 Fast Methods in N-Dimensional NMR
27.2.1 Sparse Time-Domain Data Sampling and Processing
27.2.1.1 Sampling Scheme and the Point Spread Function
27.2.1.2 Sparse Sampling Schemes
27.2.1.3 Alternative Data Processing Methods
27.2.2 Fast-Pulsing Methods
27.2.2.1 Repetition Rate and Experimental Sensitivity
27.2.2.2 Longitudinal 1H Relaxation Enhancement
27.2.2.3 Ernst Angle Excitation
27.2.2.4 BEST and SOFAST Experiments
27.2.3 Single-Scan Ultrafast Two-Dimensional NMR
27.2.3.1 General Principles
27.2.3.2 Spatiotemporal Encoding Process
27.2.3.3 Spatiotemporal Decoding: Two-Dimensional NMR in One Scan
27.3 Protocols for Fast N-Dimensional NMR and Troubleshooting
27.3.1 Setting Up an Appropriate Sampling Grid
27.3.1.1 Information-Driven Spectral Aliasing
27.3.1.2 Random Sparse Sampling and MaxEnt Processing
27.3.2 Practical Considerations for the Setup of Fast-Pulsing Experiments
27.3.3 Setting Up a Single-Scan Two-Dimensional Experiment
27.3.3.1 Properties of Different Encoding Schemes
27.3.3.2 Single-Scan Two-Dimensional NMR Spectrum: Characteristics
27.3.4 Fast Two-Dimensional NMR for Sample Quality Screening and Molecular Interaction Studies
27.3.5 Real-Time Two-Dimensional NMR Measurements
Further Reading

28 Metabolomics
Leonardo Tenori
28.1 Metabolomics in Systems Biology
28.2 NMR and Metabolomics
28.3 Data Analysis
28.4 Success in the Application of Metabolomics
28.5 Protocols
28.5.1 Serum Extraction
28.5.2 Plasma Extraction
28.5.3 Urine Collection
28.5.4 Tip on Samples Labeling
28.5.5 Sample Preparation for NMR Analysis
28.5.5.1 Urine
28.5.5.2 Serum/Plasma
28.5.6 Buffer Recipes
28.5.7 NMR Analysis (Working with a 600-MHz Bruker Spectrometer)
28.5.7.1 Urine
28.5.7.2 Serum/Plasma
28.5.8 Spectral Processing
28.5.9 Bucketing
28.5.10 Data Mining
28.5.11 Example Experiment
28.6 Troubleshooting
Further Reading

29 In-Cell Protein NMR Spectroscopy 479
David S. Burz, David Cowburn, Kaushik Dutta, and Alexander Shekhtman
29.1 Background 479
29.2 Specific Applications 480
29.2.1 Structure Determination 481
29.2.2 Perturbations within the Cell – How In-Cell and In Vitro are Different Environments 481
29.2.3 Specific Protein–Protein Interactions: STINT-NMR 483
29.2.4 Other Protein–Ligand Interactions 484
29.2.5 Post-Translational Modification – In-Cell Biochemistry 484
29.2.5.1 Other In-Cell Biochemical Studies 485
29.2.6 Nucleic Acids In-Cell 485
29.3 Conclusions and Future Directions 485
29.4 Protocols and Example Experiments 486
29.4.1 Experimental Design of Multiple Expression Systems In-Cell 486
29.4.2 Detailed STINT Protocol 487
29.4.2.1 Data Acquisition and Analysis 488
29.4.2.2 Example Experiment 489
29.4.3 Detailed SMILI-NMR Protocol 489
29.4.3.1 Example Experiment 489
29.4.4 Detailed Protocol for Post-Translational Modification 491
29.4.4.1 Example Experiment 491
29.5 Troubleshooting 492
29.5.1 No Signal 492
29.5.2 Weak Signal 493
29.5.3 Signal is Extracellular 493
29.5.3.1 Control Experiments for Protein Leakage/Cell Lysis 493
29.5.3.2 Cell Viability 493
29.5.4 Cell Growth 493
Further Reading 493

30 Structural Investigation of Cell-Free Expressed Membrane Proteins 497
Solmaz Sobhanifar, Sina Reckel, Frank Löhr, Frank Bernhard, and Volker Dötsch
30.1 Introduction 497
30.2 Cell-Free Expression of Membrane Proteins 498
30.3 Cell-Free Expression in Membrane-Mimetic Environments 499
30.4 Strategies for Functional Protein Expression 500
30.5 Cell-Free Approaches for Structural Studies 501
30.6 Cell-Free Labeling Strategies for Backbone Assignment 502
30.7 Structure Determination with Limited Nuclear Overhauser Effect Long-Distance Restraints 502
30.8 Protocols 504
30.8.1 Extract 504
30.8.2 Reaction Devices 505
30.8.3 DNA Template Preparation 506
30.8.4 Different Modes for the Cell-Free Production of Membrane Proteins 506
30.8.5 Example Experiment 507
30.9 Troubleshooting 507
Further Reading 508

Part Seven Computational Aspects 509

31 Grid Computing 511
Antonio Rosato
31.1 Grid Infrastructure 511
31.2 e-NMR Web Platform 512
31.3 Protocols 515
31.3.1 How to Register with the e-NMR/WeNMR VO 515
31.3.2 Performing a CYANA Calculation 516
31.3.3 Refining a Structure with AMBER 516
31.4 Troubleshooting 517
Further Reading 518

32 Protein–Protein Docking with HADDOCK 521
Christophe Schmitz, Adrien S.J. Melquiond, Sjoerd J. de Vries, Ezgi Karaca, Marc van Dijk, Panagiotis L. Kastritis, and Alexandre M.J.J. Bonvin
32.1 Protein–Protein Docking: General Concepts 521
32.1.1 Why Protein–Protein Docking? 521
32.1.2 General Methods for Protein–Protein Docking 522
32.2 Gathering Experimental Information for Data-Driven Docking 522
32.2.1 Chemical Shift Perturbations 523
32.2.2 Cross-Saturation Experiments 524
32.2.3 Hydrogen/Deuterium Exchange 524
32.2.4 Intermolecular NOEs 524
32.2.5 Paramagnetic Relaxation Enhancement 524
32.2.6 Pseudocontact Shift 525
32.2.7 Residual Dipolar Coupling 525
32.2.8 Diffusion Anisotropy 526
32.2.9 Non-NMR Information 526
32.3 How Does HADDOCK Use the Information? 526
32.3.1 Incorporation of Ambiguous Distance Restraints 527
32.3.2 Incorporation of Unambiguous Distance Restraints 527
32.3.3 Incorporation of Shape Restraints 528
32.3.4 Incorporation of Orientation Restraints 528
32.3.5 Symmetry Restraints 528
32.3.6 Additional Docking Mode 528
32.3.7 Overview of a HADDOCK Run 529
32.4 Protocol: A Guided Tour of the HADDOCK Web Interface 530
32.4.1 Prerequisite: Registration 530
32.4.2 Description of the Web Interface 531
32.4.3 Analysis of the Docking Run 533
32.5 Troubleshooting 534
32.5.1 General Considerations 534
32.5.2 Problems Related to the PDB File 534
32.5.3 Problems Encountered During Docking 535
Further Reading 535

33 Automated Protein Structure Determination Methods 537
Paul Guerry and Torsten Herrmann
33.1 NMR Experiment-Driven Protein Modeling 537
33.2 NOE-Based Structure Determination 538
33.2.1 Chemical Shift Ambiguity 539
33.2.2 Ambiguous Distance Restraints 539
33.2.3 Using Intermediate Structures and Elimination of Artifacts 540
33.2.4 Structure Calculation and Energy Refinement 540
33.2.5 Structure Validation 541
33.3 Sequence-Specific Resonance Assignment 541
33.4 NMR Signal Identification 542
33.5 Perspectives 542
33.6 Protocols 543
33.7 Example Structure Determination and Troubleshooting 544
33.7.1 Sequence-Specific Resonance Assignment 545
33.7.2 Amino Acid Sequence, Cis/Trans Isomerization, and Redox State 545
33.7.3 NOE Assignment, Structure Calculation, and Structure Validation 545
Further Reading 546

34 NMR Structure Determination of Protein–Ligand Complexes 549
Ulrich Schieborr, Sridhar Sreeramulu, and Harald Schwalbe
34.1 Protein–Ligand Complex Structure Determination by NMR 549
34.2 Methods for High-Affinity Binders 550
34.3 Methods for Low-Affinity Binders 552
34.4 Protocols and Troubleshooting 558
Further Reading 561

35 Small Angle X-Ray Scattering/Small Angle Neutron Scattering as Methods Complementary to NMR 563
M.V. Petoukhov and D.I. Svergun
35.1 Introduction 563
35.2 Invariants 566
35.3 Ab Initio Shape Determination 567
35.4 Validation of Atomic Models 568
35.5 Rigid-Body Modeling of Quaternary Structure 568
35.6 Equilibrium Mixtures and Flexible Systems 569
35.7 Protocols 570
35.7.1 Validation of High-Resolution Models by Solution Scattering 570
35.7.2 Unrestrained Rigid-Body Modeling of a Complex 571
35.7.3 Rigid-Body Modeling with Contact Restraints from NMR Chemical Shifts 572
35.7.4 Rigid-Body Modeling of a Complex with Orientational Constraints from NMR RDCs 572
35.7.5 Characterization of Flexible Systems 573
35.8 Troubleshooting 573
Further Reading 574

References 575
Index 609

Preface

The use of NMR to solve protein structures has a tradition that dates back to 1984 (M.P. Williamson, T.F. Havel, and K. Wüthrich (1985) J. Mol. Biol. 182, 295). Since that time, the role of NMR in structural biology has constantly increased in terms of the number of researchers involved and the scientific relevance of the results. Spectrometers are becoming more and more powerful, with magnetic fields that currently reach 22 T, and high-temperature superconducting materials raise the possibility that this value can be surpassed. The investment required for a magnet of this field strength is currently around €10 million (US$14.3 million) and an estimate of €18 million (US$25.7 million) is reasonable for new-generation magnets. It is clear that NMR is a technology that deserves a special place in research infrastructures, as individual schools may find it difficult to have a battery of machines, all at the forefront of the technology, dedicated to various types of experiments. In 1994, the European Commission (EC) began financing transnational access to NMR instrumentation at some research infrastructures, which have continued and grown in number up to the present with the EC-funded Bio-NMR1) project. In Europe, the recent European Strategy Forum for Research Infrastructures (ESFRI) Roadmap identifies NMR as a fundamental node in the Integrated Structural Biology Infrastructure (INSTRUCT),2) while it also plays a role in the EU-OPENSCREEN (European Infrastructure of Open Screening Platforms for Chemical Biology)3) infrastructure, Euro-BioImaging,4) and Biobanking and Biomolecular Resources Research Infrastructure (BBMRI).5) The EC-funded electronic infrastructures (e-NMR6) and WeNMR7)) provide nonspecialists with tools for automatic data handling, structure calculations, molecular dynamics simulations, and the creation of interaction models, so that the potential of NMR technology can blossom in favor of the progress of science.
Much of this reasoning was debated during the FP6-funded Coordination Action NMR-Life8) and resulted in a booklet entitled NMR in Mechanistic Systems Biology,9) which ultimately served as the spark for this volume. We were pleased when Gregor Cicchetti from Wiley-VCH noticed this booklet and proposed that we edit a book on the very same subject, and we gathered a number of outstanding contributors to fulfill the task.

1) http://www.bio-nmr.net.
2) http://www.structuralbiology.eu.
3) http://www.eu-openscreen.eu.
4) http://www.eurobioimaging.eu.
5) http://www.bbmri.eu.
6) http://www.enmr.eu.
7) http://www.wenmr.eu.
8) http://www.postgenomicnmr.net.
9) http://www.postgenomicnmr.net/NMRLife/docs/NMR_in_MSB.pdf.

Our intention, which we hope pervades the book, was to provide a text for graduate students, junior post-docs, and other newcomers that would serve as an introduction to the field, addressing classical NMR approaches from solution to the solid state, providing some tips and tricks not available in journal articles, and providing perspectives on future developments. It is our hope that the Protocols and Troubleshooting sections will be of assistance and guidance when choosing experiments and overcoming difficulties.

However, everyone who has experience in editing books knows how difficult a task it is – obtaining the manuscripts on time, convincing everyone to adhere to a template and write for students and not for their fellow professors, and even drawing the line on what content to include and when to call an end to the editorial process, including substitution of recalcitrant contributors. We editors have tried our best to overcome these difficulties, but we are aware that much more could have been done. For example, the development of isotopic labeling has been fundamental for the development of NMR, but we decided not to address it here. The reader should therefore be aware that the field of NMR is even broader and more exciting than it appears from our efforts!

Part I of the book (Introduction) explains NMR's role in Mechanistic Systems Biology and provides a broad overview of biomolecular structure before identifying what NMR can teach us about the structure and dynamics of biomolecules.
Parts II–VII address a series of relevant topics in NMR-driven biological research: the role of NMR in the study of the structure and dynamics of biomolecules, its role in the study of the structure and dynamics of biomolecular interactions, NMR in drug discovery, solid-state NMR, frontiers in NMR spectroscopy, and computational aspects.

We would like to take the opportunity to thank, in addition to Gregor, Dr. Marco Fragai of the Center for Magnetic Resonance (CERM) at the University of Florence for his assistance in editing some chapters of the book; Professor Claudio Luchinat, who consistently demonstrates his friendship and his willingness to support any initiative of CERM; and the scientific personnel of CERM, who have contributed to discussions and sustained the work. It is our sincere hope that this book will find a home not only in NMR facilities, but also in biomedical laboratories around the world, where it can be of use to the broader scientific community and help spread NMR as a technique for the study of biological systems.

Florence, January 2012

Ivano Bertini
Kathleen S. McGreevy
Giacomo Parigi

List of Contributors

Chris Abell University of Cambridge University Chemical Laboratory Lensfield Road Cambridge CB2 1EW UK

Johannes G. Beck Technische Universität München Department Chemie Lichtenbergstrasse 4 85747 Garching Germany

Alexandre M.J.J. Bonvin Utrecht University Bijvoet Center for Biomolecular Research Padualaan 8 3584 CH Utrecht The Netherlands

Marc Baldus Utrecht University Bijvoet Center for Biomolecular Research Padualaan 8 3584 CH Utrecht The Netherlands

Frank Bernhard Goethe University Frankfurt Institute of Biophysical Chemistry and Center for Biomolecular Magnetic Resonance (BMRZ) Max-von-Laue-Strasse 9 60438 Frankfurt am Main Germany

Bernhard Brutscher Institut de Biologie Structurale – Jean-Pierre Ebel UMR5075 CNRS-CEA-UJF 41 rue Jules Horowitz 38027 Grenoble Cedex France

Lucia Banci University of Florence Department of Chemistry and Magnetic Resonance Center (CERM) Via L. Sacconi 6 50019 Sesto Fiorentino Italy

Emeline Barbet-Massin Université de Lyon Institut de Sciences Analitiques Centre de RMN à Très Hauts Champs 5 rue de la Doua 69100 Villeurbanne France

Benjamin Bardiaux Leibniz-Institut für Molekulare Pharmakologie NMR-Supported Structural Biology Robert-Rössle-Strasse 10 13125 Berlin Germany

Ivano Bertini University of Florence Department of Chemistry and Magnetic Resonance Center (CERM) Via L. Sacconi 6 50019 Sesto Fiorentino Italy

Janina Buck Goethe University Frankfurt Center for Biomolecular Magnetic Resonance (BMRZ) and Institute of Organic Chemistry and Chemical Biology Max-von-Laue-Straße 7 60438 Frankfurt am Main Germany

Anja Böckmann Université de Lyon IBCP UMR 5086 CNRS 7 passage du Vercors 69367 Lyon France

David S. Burz University at Albany Department of Chemistry 1400 Washington Avenue Albany, NY 12222 USA

Rolf Boelens Utrecht University Bijvoet Center for Biomolecular Research Padualaan 8 3584 CH Utrecht The Netherlands

Francesca Cantini University of Florence Department of Chemistry and Magnetic Resonance Center (CERM) Via L. Sacconi 6 50019 Sesto Fiorentino Italy


Mirko Cevec Goethe University Frankfurt Center for Biomolecular Magnetic Resonance (BMRZ) and Institute of Organic Chemistry and Chemical Biology Max-von-Laue-Straße 7 60438 Frankfurt Germany

Volker Dötsch Goethe University Frankfurt Institute of Biophysical Chemistry and Center for Biomolecular Magnetic Resonance (BMRZ) Max-von-Laue-Strasse 9 60438 Frankfurt am Main Germany

Simone Ciofi-Baffoni University of Florence Department of Chemistry and Magnetic Resonance Center (CERM) Via L. Sacconi 6 50019 Sesto Fiorentino Italy

Elke Duchardt-Ferner Goethe University Frankfurt Center for Biomolecular Magnetic Resonance (BMRZ) and Institute of Molecular Biosciences Max-von-Laue-Straße 9 60438 Frankfurt am Main Germany

Alessio Ciulli University of Cambridge University Chemical Laboratory Lensfield Road Cambridge CB2 1EW UK

Kaushik Dutta New York Structural Biology Center New York, NY 10027 USA

David Cowburn Yeshiva University Albert Einstein College of Medicine 1300 Morris Park Avenue Bronx, NY 10461 USA

Abhishek A. Cukkemane Utrecht University Bijvoet Center for Biomolecular Research Padualaan 8 3584 CH Utrecht The Netherlands

Claudio Dalvit University of Neuchâtel Department of Chemistry Avenue de Bellevaux 51 2000 Neuchâtel Switzerland

Marc van Dijk Utrecht University Bijvoet Center for Biomolecular Research Padualaan 8 3584 CH Utrecht The Netherlands

Lyndon Emsley Université de Lyon CNRS/ENS Lyon/UCB-Lyon 1 Centre de RMN à Très Hauts Champs 5 rue de la Doua 69100 Villeurbanne France

Isabella C. Felli University of Florence Department of Chemistry and Magnetic Resonance Center (CERM) Via L. Sacconi 6 50019 Sesto Fiorentino Italy

Lucio Ferella University of Florence Magnetic Resonance Center (CERM) Via L. Sacconi 6 50019 Sesto Fiorentino Italy

Jan-Peter Ferner Goethe University Frankfurt Center for Biomolecular Magnetic Resonance (BMRZ) and Institute of Organic Chemistry and Chemical Biology Max-von-Laue-Straße 7 60438 Frankfurt am Main Germany

Andreas O. Frank Vanderbilt University School of Medicine Department of Biochemistry 802 Robinson Research Building Nashville, TN 37232-0146 USA

W. Trent Franks Leibniz-Institut für Molekulare Pharmakologie NMR-Supported Structural Biology Robert-Rössle-Strasse 10 13125 Berlin Germany

Lucio Frydman Weizmann Institute of Science Department of Chemical Physics Chemical Science Building Rehovot 76100 Israel

Paul Guerry Université de Lyon CNRS/ENS Lyon/UCB-Lyon 1 Centre de RMN à Très Hauts Champs 5 rue de la Doua 69100 Villeurbanne France

Torsten Herrmann Université de Lyon CNRS/ENS Lyon/UCB-Lyon 1 Centre de RMN à Très Hauts Champs 5 rue de la Doua 69100 Villeurbanne France

Yoshitaka Hiruma Leiden University Leiden Institute of Chemistry Gorlaeus Laboratories Einsteinweg 55 2333 CC Leiden The Netherlands

Hendrik R.A. Jonker Goethe University Frankfurt Johann Wolfgang Goethe-University Center for Biomolecular Magnetic Resonance (BMRZ) and Institute of Organic Chemistry and Chemical Biology Max-von-Laue-Straße 7 60438 Frankfurt Germany

Ezgi Karaca Utrecht University Bijvoet Center for Biomolecular Research Padualaan 8 3584 CH Utrecht The Netherlands

Panagiotis L. Kastritis Utrecht University Bijvoet Center for Biomolecular Research Padualaan 8 3584 CH Utrecht The Netherlands

Peter H.J. Keizers Leiden University Leiden Institute of Chemistry Gorlaeus Laboratories Einsteinweg 55 2333 CC Leiden The Netherlands

Horst Kessler Technische Universität München Department Chemie Lichtenbergstrasse 4 85747 Garching Germany

Lidija Kovacic University of Utrecht Bijvoet Center for Biomolecular Research Padualaan 8 3584 CH Utrecht The Netherlands and Jožef Stefan Institute Department of Molecular and Biomedical Sciences Jamova cesta 39 1000 Ljubljana Slovenia

Józef R. Lewandowski Université de Lyon CNRS/ENS Lyon/UCB-Lyon 1 Centre de RMN à Très Hauts Champs 5 rue de la Doua 69100 Villeurbanne France

Hong-Ke Liu Nanjing Normal University College of Chemistry and Materials Science Jiangsu Key Laboratory of Biofunctional Materials Wenyuan Road 1, Nanjing 210046 China

Frank Löhr Goethe University Frankfurt Institute of Biophysical Chemistry and Center for Biomolecular Magnetic Resonance (BMRZ) Max-von-Laue-Strasse 9 60438 Frankfurt am Main Germany

Claudio Luchinat University of Florence Department of Chemistry and Magnetic Resonance Center (CERM) Via L. Sacconi 6 50019 Sesto Fiorentino Italy

Tobias Madl Helmholtz Zentrum München Institute of Structural Biology Ingolstädter Landstrasse 1 85764 Neuherberg Germany and Technische Universität München Biomolecular NMR and Munich Center for Integrated Protein Science Department Chemie Lichtenbergstrasse 4 85747 Garching Germany

Vijayalaxmi Manoharan Medical Research Council National Institute for Medical Research Molecular Structure Division The Ridgeway Mill Hill London NW7 1AA UK

Dominique Marion Institut de Biologie Structurale – Jean-Pierre Ebel UMR5075 CNRS-CEA-UJF 41 rue Jules Horowitz 38027 Grenoble Cedex France

Kathleen S. McGreevy University of Florence Department of Chemistry and Magnetic Resonance Center (CERM) Via L. Sacconi 6 50019 Sesto Fiorentino Italy

Beat H. Meier ETH Zurich Laboratory of Physical Chemistry Wolfgang-Pauli-Strasse 10 8093 Zurich Switzerland

Adrien S.J. Melquiond Utrecht University Bijvoet Center for Biomolecular Research Padualaan 8 3584 CH Utrecht The Netherlands

Senada Nozinovic Goethe University Frankfurt Center for Biomolecular Magnetic Resonance (BMRZ) and Institute of Organic Chemistry and Chemical Biology Max-von-Laue-Straße 7 60438 Frankfurt Germany

Hartmut Oschkinat Leibniz-Institut für Molekulare Pharmakologie NMR-Supported Structural Biology Robert-Rössle-Strasse 10 13125 Berlin Germany

Giacomo Parigi University of Florence Department of Chemistry and Magnetic Resonance Center (CERM) Via L. Sacconi 6 50019 Sesto Fiorentino Italy

Maurizio Pellecchia Sanford-Burnham Medical Research Institute 10901 North Torrey Pines Road La Jolla, CA 92037 USA

Jose Manuel Perez-Canadillas Consejo Superior de Investigaciones Científicas (CSIC) Instituto de Quimica Fisica “Rocasolano” Serrano 119 28006 Madrid Spain

Maxim V. Petoukhov European Molecular Biology Laboratory Hamburg Outstation Notkestrasse 85 22607 Hamburg Germany

Roberta Pierattelli University of Florence Department of Chemistry and Magnetic Resonance Center (CERM) Via L. Sacconi 6 50019 Sesto Fiorentino Italy

Guido Pintacuda Université de Lyon Institut de Sciences Analitiques Centre de RMN à Très Hauts Champs 5 rue de la Doua 69100 Villeurbanne France

Janez Plavec National Institute of Chemistry Slovenian NMR Center Hajdrihova 19 1000 Ljubljana Slovenia and EN-FIST Center of Excellence Dunajska 156 1000 Ljubljana Slovenia and University of Ljubljana Faculty of Chemistry and Chemical Technology Askerceva cesta 5 1000 Ljubljana Slovenia

Thomas F. Prisner Goethe University Frankfurt Institute of Physical and Theoretical Chemistry and Center for Biomolecular Magnetic Resonance (BMRZ) Max-von-Laue-Strasse 7 60438 Frankfurt Germany

Jörg Rinnenthal Goethe University Frankfurt Center for Biomolecular Magnetic Resonance (BMRZ) and Institute of Organic Chemistry and Chemical Biology Max-von-Laue-Straße 7 60438 Frankfurt am Main Germany

Andres Ramos Medical Research Council National Institute for Medical Research Molecular Structure Division The Ridgeway Mill Hill London NW7 1AA UK

Antonio Rosato University of Florence Department of Chemistry and Magnetic Resonance Center (CERM) Via L. Sacconi 6 50019 Sesto Fiorentino Italy

Enrico Ravera University of Florence Department of Chemistry and Magnetic Resonance Center (CERM) Via L. Sacconi 6 50019 Sesto Fiorentino Italy

Sina Reckel Goethe University Frankfurt Institute of Biophysical Chemistry and Center for Biomolecular Magnetic Resonance (BMRZ) Max-von-Laue-Strasse 9 60438 Frankfurt am Main Germany

Marie Renault Utrecht University Bijvoet Center for Biomolecular Research Padualaan 8 3584 CH Utrecht The Netherlands

Christian Richter Johann Wolfgang Goethe-University Center for Biomolecular Magnetic Resonance (BMRZ) and Institute of Organic Chemistry and Chemical Biology Max-von-Laue-Straße 7 60438 Frankfurt Germany

Barth-Jan van Rossum Leibniz-Institut für Molekulare Pharmakologie NMR-Supported Structural Biology Robert-Rössle-Strasse 10 13125 Berlin Germany

Peter J. Sadler University of Warwick Department of Chemistry Gibbet Hill Road Coventry CV4 7AL UK

Michael Sattler Helmholtz Zentrum München Institute of Structural Biology Ingolstädter Landstrasse 1 85764 Neuherberg Germany and Technische Universität München Biomolecular NMR and Munich Center for Integrated Protein Science Department Chemie Lichtenbergstrasse 4 85747 Garching Germany

Ulrich Schieborr Goethe University Frankfurt Center for Biomolecular Magnetic Resonance (BMRZ) and Institute of Organic Chemistry and Chemical Biology Max-von-Laue-Straße 7 60438 Frankfurt am Main Germany

Christophe Schmitz Utrecht University Bijvoet Center for Biomolecular Research Science Faculty Padualaan 8 3584 CH Utrecht The Netherlands

Harald Schwalbe Goethe University Frankfurt Center for Biomolecular Magnetic Resonance (BMRZ) and Institute of Organic Chemistry and Chemical Biology Max-von-Laue-Straße 7 60438 Frankfurt am Main Germany

Alexander Shekhtman University at Albany Department of Chemistry 1400 Washington Avenue Albany, NY 12222 USA

Vladimír Sklenár Masaryk University National Center for Biomolecular Research Faculty of Science and Central European Institute of Technology (CEITEC) Kamenice 5 625 00 Brno Czech Republic

Sridhar Sreeramulu Goethe University Frankfurt Center for Biomolecular Magnetic Resonance (BMRZ) and Institute of Organic Chemistry and Chemical Biology Max-von-Laue-Straße 7 60438 Frankfurt am Main Germany

Richard Štefl Masaryk University National Center for Biomolecular Research Faculty of Science and Central European Institute of Technology (CEITEC) Kamenice 5 625 00 Brno Czech Republic

Dimitri I. Svergun European Molecular Biology Laboratory Hamburg Outstation Notkestrasse 85 22607 Hamburg Germany

Leonardo Tenori University of Florence Magnetic Resonance Center (CERM) and FiorGen Foundation Via L. Sacconi 6 50019 Sesto Fiorentino Italy

Paweł Śledź University of Cambridge University Chemical Laboratory Lensfield Road Cambridge CB2 1EW UK

Solmaz Sobhanifar Goethe University Frankfurt Institute of Biophysical Chemistry and Center for Biomolecular Magnetic Resonance (BMRZ) Max-von-Laue-Strasse 9 60438 Frankfurt am Main Germany

Peter Tompa VIB Department of Structural Biology Vrije Universiteit Brussel Pleinlaan 2 1050 Brussels Belgium and Hungarian Academy of Sciences Institute of Enzymology Biological Research Center Karolina út 29 1113 Budapest Hungary

Paola Turano University of Florence Department of Chemistry and Magnetic Resonance Center (CERM) Via L. Sacconi 6 50019 Sesto Fiorentino Italy

Marcellus Ubbink Leiden University Leiden Institute of Chemistry Gorlaeus Laboratories Einsteinweg 55 2333 CC Leiden The Netherlands

Sjoerd J. de Vries Utrecht University Bijvoet Center for Biomolecular Research Padualaan 8 3584 CH Utrecht The Netherlands and Technische Universität München Physik-Department T38 James Franck Straße 1 85748 Garching Germany

Jens Wöhnert Goethe University Frankfurt Center for Biomolecular Magnetic Resonance (BMRZ) and Institute of Molecular Biosciences Max-von-Laue-Straße 9 60438 Frankfurt am Main Germany

List of Abbreviations

Aβ  amyloid beta
9-AA  9-aminoacridine
ABC  ATP-binding cassette
AcP  acetyl phosphate
ACRAMTU  acridinylthiourea
ACS  automatic sample changer
ADME-T  adsorption, distribution, metabolism, excretion, and toxicity
A-E  alanine-glutamic acid
AFM  atomic force microscopy
ahaz  3-aminohexahydroazepine
AIR  ambiguous interaction restraint
AKT  serine/threonine protein kinase
Amp  ampicillin
AO  acridine orange
APHH  adiabatic passage through the Hartmann-Hahn condition
APSY  automated projection spectroscopy
ATP  adenosine triphosphate
Atx1  antioxidant protein 1
AUIM  ataxin 3 ubiquitin interacting motif
ben  benzene
BEST  band-selective excitation short-transient
bip  biphenyl
BMRB  biological magnetic resonance data bank
BP  back-projection
BPTI  bovine pancreatic trypsin inhibitor
bpy  bipyridine
bR  bacteriorhodopsin
BRCT  breast cancer 1 C-terminal
BSA  buried surface area
BSA  bovine serum albumin
BSE  bovine spongiform encephalopathy
BURP  band-selective uniform-response pure-phase
BUSI  proteinase inhibitor from bull seminal plasma
1C3  1-[(3-aminopropyl)amino]-anthracene-9,10-dione
CA  certification authority
CA150  coactivator of 150 kDa
Cam  chloramphenicol
cAMP  cyclic adenosine monophosphate
CAP  catabolite activator protein
CAP  cAMP-binding protein

CASD  critical assessment of automatic structure determination
CATH  class, architecture, topology, homologous superfamily
CBP  CREB-binding protein
Ccc2a  domain a of Ca2+-sensitive cross-complementer2
CCPN  collaborative computing project for NMR
CCR  cross correlation rate
Cdk2  cyclin-dependent kinase 2
CE  computing element
CECF  continuous exchange cell-free
CF  cell-free
CFCF  continuous flow cell-free
chrysi  5,6-chrysene quinone diimine
CIDNP  chemically induced dynamic nuclear polarization
CIP1  cyclin-dependent kinase inhibitor 1
CJD  Creutzfeldt-Jakob disease
CLEANEX  clean chemical exchange spectroscopy
CMC  critical micelle concentration
COSY  correlation spectroscopy
CP  cross polarization
CPMG  Carr-Purcell-Meiboom-Gill
CPP  cellular penetrating peptide
CPU  central processing unit
CREB  cAMP-response element-binding protein
CRINEPT  cross-correlated relaxation-enhanced polarization transfer
CRIPT  cross-correlated relaxation-induced polarization transfer
CS  chemical shift
CsA  cyclosporin A
CSA  chemical shift anisotropy
CSD  Cambridge structural database
CSI  chemical shift index
CSP  chemical shift perturbation
CST  chemical-shift tensor
CT  constant time
CW  continuous wave
CWLG  continuous wave Lee-Goldburg
CycA  cyclin A
cym  p-cymene
CYPA  cyclophilin A
dap  1,12-diazaperylene
DARR  dipolar assisted rotational resonance
DBD  DNA binding domain
D-CF  detergent based cell-free
DCP  double cross polarization
DD  dipole-dipole
DDM  n-dodecyl-β-D-maltoside
DEER  double electron-electron resonance
DEPT  distortionless enhancement by polarization transfer
DFS  dynamic frequency shift
DFT  discrete Fourier transform
DG  distance geometry
dha  dihydroanthracene
DHFR  dihydrofolate reductase
DIDC  direct interpretation of dipolar couplings
dien  diethylenetriamine
DIPAP  double in-phase/antiphase
DIPSI  decoupling in the presence of scalar interactions
DISCO  differences and sums of traces within COSY spectra

dmen  N-dimethylethylenediamine
DMF  N,N-dimethylformamide, N,N-dimethylmethanamide
DMSO  dimethylsulfoxide
DNA  deoxyribonucleic acid
DNP  dynamic nuclear polarization
DO3A  1,4,7-tris(acetic acid)-1,4,7,10-tetraazacyclododecane
DOSY  diffusion ordered spectroscopy
DOTA  1,4,7,10-tetraazacyclododecane-1,4,7,10-tetraacetic acid
DOTP  1,4,7,10-tetraazacyclododecane-N,N′,N″,N‴-tetrakis(methylenephosphonic acid)
DP  discriminating power
DPC  dodecyl-phosphocholine
DPP  dipeptidyl peptidase
dppz  dipyrido[3,2-a:2′,3′-c]phenazine
dpzm  4,4′-dipyrazolylmethane
DQ  double quantum
DQF  double quantum filtered
DR  dummy residue
DREAM  dipolar recoupling enhanced by amplitude modulation
DS3E  double spin-state-selective excitation
dsDNA  double-strand DNA
dsRBD3  double stranded RNA binding domain 3
dsRNA  double-strand RNA
DSS  2,2-Dimethyl-2-silapentane-5-sulfonic acid
DTPA  diethylene triamine pentaacetic acid, 2-[bis[2-[bis(carboxymethyl)amino]ethyl]amino]acetic acid
DTPA-BMA  diethylenetriamine pentaacetic acid bis(methylamide)
DTT  dithiothreitol, (2S,3S)-1,4-bis-sulfanylbutane-2,3-diol
dx  doxorubicin
E2A  glucose-specific enzyme IIA
EAS  explicit averaged sum
EBURP  exciting band-selective uniform-response pure-phase
ECOSY  exclusive correlation spectroscopy
EDTA  ethylenediaminetetraacetic acid, 2,2′,2″,2‴-(ethane-1,2-diyldinitrilo)tetraacetic acid
EM  electron microscopy
EmrE  Escherichia coli multidrug resistance protein
en  ethylenediamine
ENDOR  electron nuclear double resonance
EOM  ensemble optimization method
EPI  echo-planar imaging
EPR  electron paramagnetic resonance
ERCC1  excision repair cross complementing 1
ERETIC  electronic reference to access in vivo concentrations
EROS  ensemble refinement with orientational restraints
Eth  ethidium bromide
et-NOESY  exchange transferred nuclear Overhauser effect spectroscopy
EXAFS  extended X-ray absorption fine structure
EXSY  exchange spectroscopy
FABS  fluorine atoms for biochemical screening
FBDD  fragment-based drug design
FDM  filter diagonalization method
FID  free induction decay
FKBP  FK506-binding protein
FlgM  flagellar anti-sigma factor
FRB  FKBP-rapamycin-binding
FRET  fluorescent resonance energy transfer

FF  force field
FFT  fast Fourier transform
FM  feeding mixture
fMD  free molecular dynamics
FP  fluorescence polarization
FT  Fourier transform
FTA  fluid-turbulence adapted
FTIR  Fourier transform infrared spectroscopy
GB1  immunoglobulin binding domain of protein G
GFP  green fluorescent protein
GFT  G-matrix Fourier transform
GMP  guanosine monophosphate
GP-AFC  Gly-Pro-7-amino-4-trifluoromethylcoumarin
GPCR  G protein-coupled receptor
GTP  guanosine triphosphate
GUI  graphical user interface
HAMP  histidine kinases, adenylyl cyclases, methyl-accepting chemotaxis proteins, and phosphatases
HCA  hierarchical clustering analysis
HDX  hydrogen-deuterium exchange
HEPES  2-[4-(2-hydroxyethyl)piperazin-1-yl]ethanesulfonic acid, 2-[4-(2-hydroxyethyl)-2,3,5,6-tetrahydropyrazin-1-yl]ethanesulfonate
HET  2-hydroxyethanethiolato-2,2′,2″-terpyridine
HETLOC  heteronuclear long-range coupling
HF  high field
HH  head-to-head
HiPIP  high-potential iron-sulphur protein
HIV  human immunodeficiency virus
HMBC  heteronuclear multiple bond coherence
HMQC  heteronuclear multiple quantum coherence
HOESY  heteronuclear Overhauser effect spectroscopy
HORROR  homonuclear rotary resonance
HoxD9  homeobox protein D9
HPLC  high-performance liquid chromatography
HPr  histidine-containing phosphocarrier protein
HR  high resolution
HRP1  hetero ribonucleoprotein 1
HRS  hepatocyte growth factor-regulated tyrosine kinase substrate
HSA  human serum albumin
HSQC  heteronuclear single quantum coherence
HTS  high-throughput screening
IDP  intrinsically disordered protein
IDR  intrinsically disordered region
IEP  isoelectric point
ILOE  interligand Overhauser effect
ILV  Ile, Leu, Val
ILVA  Ile, Leu, Val, Ala
IMP  integral membrane protein
INADEQUATE  incredible natural abundance double quantum transfer experiment
INEPT  insensitive nuclei enhanced by polarization transfer
INPHARMA  interligand NOEs for pharmacophore mapping
IPAP  in-phase/antiphase
IPSL  3-(2-iodoacetamido-proxyl)
IPTG  induced T7 promoter-lac operator (PT7/lacOp)
IRES  internal ribosomal entry site
ITAM  immunoreceptor tyrosine-based activation motif

ITC  isothermal titration calorimetry
IUPAC  international union of pure and applied chemistry
IXL  interstrand cross-link
JS-ROESY  jump-symmetrized rotating frame Overhauser effect spectroscopy
Kan  kanamycin
KcsA  Streptomyces lividans potassium channel
KH  K homology
KH3  third KH domain
KID  kinase inducible domain
Kip1  kinase inhibitor 1
k-NN  k-nearest-neighbour
KPi  potassium phosphate
KSRP  KH-type splicing regulatory protein
KTX  kaliotoxin
LAB  laboratory frame
Lac  lactose
LacR  lactose repressor
LBT  lanthanide binding tag
LC  liquid crystalline
L-CF  lipid based cell-free
LDA  linear discriminant analysis
LED  light emitting diode
LG-CP  Lee-Goldburg cross polarization
LILBID  laser induced liquid bead ion desorption
LMPG  1-myristoyl-2-hydroxy-sn-glycero-3-[phospho-rac-(1-glycerol)]
LOGSY  ligand observed by gradient spectroscopy
LPPG  1-palmitoyl-2-hydroxy-sn-glycero-3-[phospho-rac-(1-glycerol)]
LRE  longitudinal relaxation enhancement
LV  lowest-value
MAP  microtubule-associated protein
MAS  magic angle spinning
MaxEnt  maximum entropy
MBP  maltose binding protein
Mbp1  Mlu1 cell cycle box binding protein
MD  molecular dynamics
MDD  multidimensional decomposition
MDL  molecular design limited
MFR  molecular fragment replacement
Mia40  mitochondrial intermembrane space import and assembly 40
miRNA  microRNA
MIRROR  mixed rotational and rotary resonance
MLEV  Malcolm Levitt's composite-pulse decoupling sequence
MMP  matrix metalloproteinase
MO  maximum occurrence
MP  membrane protein
MPL  mass-per-length
MRI  magnetic resonance imaging
mRNA  messenger RNA
MRS  magnetic resonance spectroscopy
MS  mass spectrometry
MSA  mobility shift microfluid assay
α-MSH  alpha-melanocyte stimulating hormone
MT-II  Ac-Nle-c[Asp-His-D-Phe-Arg-Trp-Lys]-NH2
MTSSL  1-oxy-2,2,5,5-tetramethyl-Δ3-pyrroline-3-methyl methanethiosulfonate
MW  molecular weight

MWCO: molecular weight cutoff
NMR: nuclear magnetic resonance
NIH: National Institutes of Health
NN: neural network
NOE: nuclear Overhauser effect
NOESY: nuclear Overhauser effect spectroscopy
NTA: nitrilotriacetic acid
NTP: nucleotide triphosphate
NUS: non-uniformly sampled
NusA: N utilization substance protein A
NzExHSQC: Nz-chemical exchange heteronuclear single-quantum coherence
OAc: acetate
Oct-1: octamer 1
OD: optical density
ODMR: optically detected magnetic resonance
OGT: optimal growth temperature
OPLS: optimized potentials for liquid simulations
P21: protein 21
P32/98: (2S,3S)-2-amino-3-methyl-1-(1,3-thiazolidin-3-yl)pentan-1-one
P53: protein 53
PAGE: polyacrylamide gel electrophoresis
PAIN-CP: proton assisted insensitive nuclei cross polarization
PAK: p21-activating kinase protein
PAP: poly(A) polymerase
PAR: proton assisted recoupling
PARIS: phase-alternated recoupling irradiation scheme
PAS: principal axes system
PC: principal component
PCA: principal component analysis
PCCR: paramagnetic cross correlation rate
P-CF: precipitate generating cell-free
PCR: polymerase chain reaction
PCS: pseudocontact shift
PDB: protein data bank
PDMS: poly(dimethylsiloxane)
PDSD: proton driven spin diffusion
PECOSY: primitive exclusive correlation spectroscopy
PEG: polyethylene glycol
PFG: polyfluorinated glycine
phen: phenanthroline
phi: phenanthrenequinone diimine
PHIP: para-hydrogen induced polarization
PH-PDMAA: dimethylacrylamide copolymer
PI3: phosphoinositide 3
PI3-SH3: SH3 domain of the PI3 kinase
PIE: polyadenylation inhibition element
PISEMA: polarization inversion spin-exchange at the magic angle
PKA: protein kinase A
PLS: partial least square
PME: particle mesh Ewald
PMSF: phenylmethanesulfonylfluoride
POP: prolyl oligopeptidase
POST-C7: permutationally offset stabilized C7
ppm: parts per million
pqx: 2-(2′-pyridyl)quinoxaline
PRDC: paramagnetism-based residual dipolar coupling
PRE: paramagnetic relaxation enhancement

PrPSc: misfolded prion protein (Sc for scrapie)
PSF: point spread function
PTM: post-translational modification
PULCON: pulse length based concentration
py: pyrimidine
PyAc: pyridine-2-yl acetate
2/3QF-COSY: double/triple quantum filtered correlated spectroscopy
RAD: RF assisted diffusion
RCSA: residual chemical shift anisotropy
RDC: residual dipolar coupling
REBURP: refocusing exciting band-selective uniform-response pure-phase
REDOR: rotational echo double resonance
RF: radiofrequency
RFDR: radiofrequency driven recoupling
RISC: RNA-induced silencing complex
RM: reaction mixture
rMD: restrained molecular dynamics
RMS: root mean square
RMSD: root mean square deviation
ROCSA: recoupling of chemical shift anisotropy
rOCT: rat organic cation transporter
ROE: rotating frame Overhauser effect
ROESY: rotational nuclear Overhauser effect spectroscopy
ROG: red-orange-green
RPF: recall, precision and F-measure
rpm: revolutions per minute
RNA: ribonucleic acid
RRE: Rev response element
RRM: RNA recognition motif
R, R-Me2trien: 2R, 9R-diamino-4,7-diazadecane
rRNA: ribosomal ribonucleic acid
RT: real time
RXR: retinoic X receptor
S/N: signal-to-noise
S3E: spin-state-selective excitation
SA: simulated annealing
SAG: strain-induced alignment in a gel
SAIL: stereo-array isotope labeling
SAR: structure activity relationship
SANS: small angle neutron scattering
SAS: small angle scattering
SAXS: small angle X-ray scattering
Sco: synthesis of cytochrome c oxidase
SCOP: structural classification of proteins
SCRM: self-consistent RDC-based model-free
SD: standard deviation
SDS: sodium dodecyl sulfate
SDSL: site directed spin labeling
SDS-PAGE: sodium dodecyl sulphate - polyacrylamide gel electrophoresis
SIA: scaffold independent analysis
SE: storage element
SE-DIPAP: sensitivity enhanced DIPAP
SH3: src-homology domain 3
SI: International System of Units
siRNA: small interfering RNA
SLAPSTIC: spin labels attached to the protein side chains as a tool to identify interacting compounds

SMILI-NMR: screening small molecule interactor library using in-cell NMR
snoRNA: small nucleolar RNA
SOD: superoxide dismutase
SOFAST: selective optimized-flip-angle short-transient
SOM: self organising map
SOS: structural information using Overhauser effects and selective labelling
SPAM: solubility, purity and aggregation of the molecule
SPC5: supercycled post-C5
SPECIFIC-CP: spectrally induced filtering in combination with cross polarization
SPINAL: small phase incremental alteration
SPITZE-HSQC: spin state selective zero overlap HSQC
SPP: single protein production
SPR: surface plasmon resonance
sPRE: solvent paramagnetic relaxation enhancement
SRII: sensor rhodopsin II
SS: solid state
ssNMR: solid state nuclear magnetic resonance
ssDNA: single-strand DNA
ssRNA: single-strand RNA
ST: single transition
STAM: signal-transducing adapter molecule
STD: saturation transfer difference
Ste5: sterile 5
STINT-NMR: structural interactions using in-cell NMR
Stm: streptomycin
Sup35: nonsense suppressor 35
SVD: singular value decomposition
SVM: support vector machine
SW: spectral width
TAR: trans-activation response element
TCEP: tris(2-carboxyethyl)phosphine, 2,2′,2″-phosphanetriyltripropanoic acid
TCM: traditional Chinese medicine
TEDOR: transferred echo double resonance
TehA: potassium-tellurite ethidium and proflavin transporter
TEMPO: 2,2,6,6-tetramethylpiperidine 1-oxyl, 2,3,4,6-tetramethylpiperidine-1-oxyl
TEMPOL: 4-hydroxy-2,2,6,6-tetramethylpiperidine 1-oxyl
terpy: terpyridine
tha: tetrahydroanthracene
TIM: triosephosphate isomerase
TINS: target immobilized NMR screening
TIP: transferable intermolecular potential
TLC: thin layer chromatography
T-MREV: transverse Mansfield-Rhim-Elleman-Vaughan
TM: transmembrane
tmen: N,N,N-trimethylethylenediamine
TMS: transmembrane segment
TOCSY: total correlation spectroscopy
TPPI: time-proportional phase incrementation
TPPM: two-phase phase-modulated
TrII: transducer II
TRIS: tris(hydroxymethyl)aminomethane, 2-amino-2-hydroxymethylpropane-1,3-diol
tRNA: transfer RNA

trNOE: transferred nuclear Overhauser effect
TROSY: transverse relaxation optimized spectroscopy
TSAR: third spin assisted recoupling
TSP: 2,2,3,3-D4-trimethylsilyl propionate
U2AF65: U2 small nuclear ribonucleoprotein particle auxiliary factor
UIM: ubiquitin interacting motif
uHTS: ultra high throughput screening
upl: upper distance limit
Ure2: urea uptake protein 2
UTR: untranslated region
UV: ultraviolet
VEAN: vector angles from dipolar tensor
VHS: Vps27p, HRS and STAM
VO: virtual organization
WaterLOGSY: water-ligand observed by gradient spectroscopy
WAXS: wide angle X-ray scattering
WT: wild type
XPF: xeroderma pigmentosum group F
YPD: yeast extract peptone dextrose
ZiaA: zinc ATPase
ZGPF: Z-Gly-Pro-Phe
ZPP: Z-prolyl-prolinal
ZQ: zero quantum

Amino Acids

Ala, A: alanine
Arg, R: arginine
Asn, N: asparagine
Asp, D: aspartic acid
Cys, C: cysteine
Glu, E: glutamic acid
Gln, Q: glutamine
Gly, G: glycine
His, H: histidine
Ile, I: isoleucine
Leu, L: leucine
Lys, K: lysine
Met, M: methionine
Phe, F: phenylalanine
Pro, P: proline
Ser, S: serine
Thr, T: threonine
Trp, W: tryptophan
Tyr, Y: tyrosine
Val, V: valine

Nucleobases

A: adenosine-5′-phosphate
G: guanosine-5′-phosphate
C: cytidine-5′-phosphate
U: uridine-5′-phosphate
T: thymidine-5′-phosphate

A lower case d in front of the name means that it is a deoxyribonucleotide (otherwise, it is a ribonucleotide). A lower case c placed in front of some nucleotides means cyclic.

CpG: C-phosphate-G (a phosphate links the two nucleosides together)

Part One Introduction

NMR of Biomolecules: Towards Mechanistic Systems Biology, First Edition. Edited by Ivano Bertini, Kathleen S. McGreevy, and Giacomo Parigi. © 2012 Wiley-VCH Verlag GmbH & Co. KGaA. Published 2012 by Wiley-VCH Verlag GmbH & Co. KGaA.

1 NMR and its Place in Mechanistic Systems Biology

Ivano Bertini, Kathleen S. McGreevy, and Giacomo Parigi

NMR can be used to solve the structure of biomolecules and to show how they interact. Interactions among biomolecules and between biomolecules and small ligands can be studied from the thermodynamic, structural, and kinetic points of view. Through metabolomic studies, NMR can monitor the whole metabolic process. This provides a way of understanding the mechanisms of life at the molecular level and of modeling them. We can refer to these achievements as the mechanistic contribution to Systems Biology or Mechanistic Systems Biology.

As we begin our exploration of the possibilities afforded by the application of NMR to the study of biomolecules and the resulting contribution to Mechanistic Systems Biology, we should first try to define the term Systems Biology itself. This is an old designation that has traditionally referred to the classical biology of a system, but which acquires a new meaning as more and more biological pathways (i.e., those series of molecular interactions in a cell that lead to certain products or changes) are discovered [1–3]. These pathways cross and intersect and, as a result, the overexpression, underexpression, or inhibition of an individual gene product leads to consequences in all of the pathways in which that product is involved. A full understanding of the biology of a system therefore requires knowledge of all of the pathways that compose that system.

We should then try to understand what the latter phrase really means. What constitutes knowledge of a pathway? In many bioengineering departments throughout the Anglo-Saxon world, it is a knowledge of kinetic parameters – how fast the reactions in a given pathway will proceed – that allows processes and alterations to be mathematically simulated [4,5]. In fact, one understanding of Systems Biology is the computational simulation of biological processes. However, although living organisms are not at thermodynamic equilibrium, a knowledge of the thermodynamic parameters involved – the relative stabilities of the reactants and products – is still of primary importance. NMR can provide both kinetic and thermodynamic parameters, or at least their upper and lower limits, but most importantly, at present, NMR can provide structural information on all of the biomolecules in an interacting group in addition to a structural characterization of the interaction itself. The progression of biochemical pathways, in terms of the mechanisms of the interactions along the paths, can therefore be understood at the atomic level; it is from this that the name Mechanistic Systems Biology originates. As it happened, we became aware of this term in 2008 when Antonio Rosato and Ivano Bertini were explaining their contribution to Systems Biology from a structural point of view to Shankar Subramaniam, who was the Director of the Biomedical Engineering Department at the University of San Diego; he informed them that this is, in fact, called Mechanistic Systems Biology.

Fig. 1.1 Detection of the conformations experienced by proteins in different conditions and of their adducts allows the reconstruction of the mechanisms occurring in cells. As an example, we show here the folding process of the mitochondrial protein Cox17 induced by Mia40 [7]. Cox17 is a fully unfolded protein in the cytoplasm, so that it has the conformational plasticity to enter the intermembrane space of mitochondria through a translocase of the outer membrane channel. A transient complex is formed here with the Mia40 protein, thanks to the interaction of the hydrophobic residues of Mia40 (in red) and Cox17 (in blue). Mia40 induces the formation of an α-helix on a specific region of Cox17 and of an intermolecular disulfide bond (Cys residues are shown in yellow). The next step consists of switching of the intermolecular disulfide bond to the first intramolecular disulfide bond within Cox17, so that Mia40 is released. Finally, the formation of the second α-helix is induced by the interactions of its hydrophobic residues with the first helix and the second intramolecular disulfide bond is formed. Once the two disulfide bonds are formed, the protein, now folded, is trapped in the intermembrane space.

Mechanistic Systems Biology has the potential to become an important field within biology. With the term mechanistic, we can imagine a play whose plot is represented by a biochemical pathway. From our seats in the audience, we can see the three-dimensional structures of the biomolecular actors as they interact with one another in various scenes (Figure 1.1). In real terms, we can achieve an atomic-level, three-dimensional view of proteins interacting with one another or with DNA/RNA to achieve a biochemical result. Observing three-dimensional structures allows one to better understand the biomolecule, and provides a further means to intervene in and monitor its function. The goals and challenges of Systems Biology remain the same. Mechanistic Systems Biology will provide the basis for molecular pharmacology, where drugs can be described as inhibitors or activators of a pathway and the side-effects due to the consequences on other pathways can be predicted.

NMR has a primary role within this framework. It can provide a biomolecule's structure and can monitor its interactions in a mechanistic frame. Both thermodynamic and kinetic information on the interactions between biomolecules can be obtained, thus providing a powerful tool for the modeling of the system. NMR is especially important when the interactions are weak and the interacting species are in fast exchange. The interacting molecules can be observed and it is often possible to intervene in the interaction. Furthermore, NMR is not yet a mature science – it must still develop in the field of membrane proteins and immobilized (though not necessarily crystalline) forms, and must extend its investigative power to larger biomolecules.

NMR is a technique capable of monitoring single nuclei. The initial information comes from the chemical shift – the resonance frequency of the nucleus when the substance is placed in a magnetic field. The chemical shift tells us about the surroundings of the atom whose nucleus is observed. Typically, in a protein, the chemical shift of a ¹³C backbone nucleus provides information on its secondary structure (see Chapter 2.1). A closer look at the spectra tells us which other atoms/nuclei are connected through chemical bonds. Another type of experiment provides information on the distances to nearby nuclei. All these pieces of information can be combined into a structural model through dedicated computer software. Another nuclear parameter that should be mentioned is the rate of return to equilibrium of an ensemble of nuclear spins after a perturbation with a radiofrequency pulse. These rates provide precious information on the mobility of atoms.
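
The chemical shift mentioned above is conventionally reported not as an absolute frequency but as a field-independent offset from a reference resonance, expressed in parts per million (ppm). The short sketch below assumes that standard definition, δ = (ν − ν_ref)/ν_ref × 10⁶; the function name and the numerical values are illustrative assumptions and are not taken from the text.

```python
def chemical_shift_ppm(nu_hz: float, nu_ref_hz: float) -> float:
    """Chemical shift (ppm) of a resonance at nu_hz relative to a reference at nu_ref_hz."""
    return (nu_hz - nu_ref_hz) / nu_ref_hz * 1e6


# Illustrative numbers (assumed, not from the text): a reference nucleus
# resonating at exactly 600 MHz and a nucleus resonating 4800 Hz above it.
reference_hz = 600.0e6
observed_hz = reference_hz + 4800.0

shift = chemical_shift_ppm(observed_hz, reference_hz)
print(f"{shift:.2f} ppm")
```

Because the offset is divided by the reference frequency, the same nucleus gives the same value in ppm on spectrometers operating at different fields, which is why shifts are tabulated in this form.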

Magnetic nuclei are needed to perform NMR experiments; if an element has several isotopes, the one that provides the sharpest signal is preferred. The naturally most abundant ¹²C has no magnetic moment and is silent in NMR experiments. ¹⁴N is magnetic but is poorly suited to high-resolution NMR, whereas ¹⁵N is well suited. Procedures are now well developed to produce biomolecules that are 100% labeled with ¹³C and ¹⁵N. ¹H is a good nucleus, as is ³¹P.

Structural and dynamic information represents the strength of NMR: given a biomolecule, the average structure in solution can be obtained, the dynamics of nuclei can be monitored, and conformational equilibria can be followed. Unfortunately, there is a limit – the size. The larger the protein, the more crowded and less informative the spectrum; the NMR lines are also broader. More intense magnetic fields provide better resolution, which is why there is a drive to build magnets with higher and higher fields, from the 2.2 T of 1970 up to 22 T in 2010, with the prospect of 25 T in 2015. In parallel, probe technology has also been developed, up to the modern cryoprobes, which increase the signal-to-noise ratio by a factor of 4 with respect to traditional, noncryogenically cooled probes. The structures of proteins or domains up to 35 kDa can currently be determined routinely, with 70 kDa as an upper limit [6]. Partial information can be obtained up to 1 MDa.

It is not only the developments in NMR instrumentation that are advancing the frontiers of NMR investigations in the Life Sciences. Simultaneous advances in labeling strategies make increasingly complex experiments feasible, thus improving the possibilities of understanding the structural and dynamic features of biosystems. New methodologies provide plenty of new opportunities: in-cell NMR, which is attracting a significant amount of attention because of its unique role in providing information directly in cells; cell-free expression; and advances in solid-state NMR applied to biomolecules, which allow the study of immobilized proteins such as membrane proteins and fibrils. NMR is now tackling the Systems Biology approach with a new technology called metabolomics, which provides information on metabolites in biological fluids, tissues, and bacterial cultures, and is thus an independent source of data for Systems Biology itself. Furthermore, the information technology approach, in our view, is closely bound to the development of NMR; software programs for the analysis of NMR data are continuously upgraded and interconnected to keep pace with the methodological advances and to satisfy the demand for fast (and accurate) analysis of increasing amounts of data. The contribution of NMR to Mechanistic Systems Biology is thus foreseen as being of increasing importance.
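
To relate the field strengths quoted above to the spectrometer frequencies by which NMR instruments are usually named, the following minimal sketch converts a magnetic field in tesla into the ¹H resonance (Larmor) frequency. It assumes the standard proton gyromagnetic ratio, γ/2π ≈ 42.577 MHz/T, a constant that is not given in the text; the function name is hypothetical.

```python
# Proton gyromagnetic ratio divided by 2*pi, in MHz per tesla.
# Assumed standard value; it does not appear in the text.
GAMMA_1H_MHZ_PER_T = 42.577


def proton_frequency_mhz(field_tesla: float) -> float:
    """1H resonance frequency (MHz) in a static magnetic field (tesla)."""
    return GAMMA_1H_MHZ_PER_T * field_tesla


# Field strengths and years as quoted in the text.
for year, field in [(1970, 2.2), (2010, 22.0), (2015, 25.0)]:
    print(f"{year}: {field:5.1f} T -> {proton_frequency_mhz(field):7.1f} MHz")
```

Under this assumption, 2.2 T corresponds to a proton frequency of roughly 94 MHz and 22 T to roughly 940 MHz, an order-of-magnitude increase over the period mentioned.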

2 Structure of Biomolecules: Fundamentals

2.1 Structural Features of Proteins

Lucia Banci and Francesca Cantini

The structural organization of proteins follows a hierarchical order: from the specific order of amino acids forming the polypeptide chain (primary sequence), to local organization of polypeptide stretches (secondary structure elements), followed by the overall spatial, three-dimensional protein organization (tertiary structure), and finally the interactions between various folded chains (quaternary structure). The three-dimensional spatial arrangement adopted by the protein is defined by three dihedral angles, ω (omega), φ (phi), and ψ (psi), which describe the rotation of the backbone atoms about the C–N′, N–Cα, and Cα–C bonds, respectively, and determine the relative orientation of the groups forming the protein backbone. The φ and ψ angles that the backbone of a polypeptide chain can adopt are limited to specific ranges summarized by the Ramachandran plot. Residues located in different types of secondary structure have different φ/ψ combinations and hydrogen bond patterns between the carbonyl oxygen and the amide hydrogen of the amino acids. The most common types of secondary structure elements are helices, parallel and antiparallel pleated β-sheets, and β-turns. They can also combine to form structural motifs that constitute portions of the overall tertiary structure and are often related to specific functional features. Different combinations of secondary structure elements and motifs form structural domains. The latter are the fundamental units of the tertiary structure, and comprise parts of the polypeptide chain that are structurally independent and can fold separately. Each structural domain can be classified according to its content of α-helices and β-sheets, and their relative location. At this simplest level of classification, the types of folds are grouped into four main classes: α, β, α/β, and α + β. A fifth class is formed by protein domains that have little if any secondary structure. Many proteins can assemble into multimers composed of two or more monomers, defining a quaternary structure.

2.1.1 Introduction: From Primary to Quaternary Structure

Protein three-dimensional structures form when nascent polypeptide chains fold to acquire specific, compact, and defined spatial conformations. The amino acid sequence constitutes the primary structure of the polypeptide chain, which is held together by peptide bonds formed between the carbonyl group of one amino acid and the amino group of the next. The carbonyl and the amide groups, together with the Cα–H group of each amino acid, constitute the regularly repeating unit of the polypeptide, which is called the protein backbone.

Segments of the polypeptide chain can take ordered, regular conformations characterized by well-defined dihedral angles between the groups of the backbone. These locally organized arrangements typically involve short (5–20 residues) segments and are termed secondary structure elements. Common secondary structure elements include α-helices, β-strands, β-sheets, and β- and γ-turns. These structural organizations are stabilized by repeating patterns of hydrogen bonds between backbone carbonyl oxygens and amino groups. Elements of secondary structure pack together to form the tertiary structure, which defines the three-dimensional arrangement of the protein amino acids. Therefore, in folded proteins, residues that are distal in the primary sequence can be brought into close proximity.

Proteins are often multimeric in their functional state, where multiple independently folded polypeptide chains interact with each other to form a supramolecular protein complex. This arrangement is called quaternary structure. Multimeric proteins can be composed of copies of a single folded polypeptide (e.g., a homodimer, such as the two identical subunits of the Cu,Zn superoxide dismutase 1 (SOD1)) or can contain two or more different polypeptides (such as heterodimers or heterotetramers, like the two α- and two β-chains of hemoglobin).

The structural organization of proteins therefore follows a hierarchical order: from the specific order of amino acids forming the polypeptide chain (which constitutes the primary sequence or structure), to local organization of polypeptide stretches (defining the secondary structure elements), followed by the overall spatial, three-dimensional protein organization (thus determining the tertiary structure), and finally the interactions between various folded chains (which form the quaternary structure) (Figure 2.1.1). In this chapter, we review some of the overall properties of the structural organization of proteins that will be, in the following chapters, correlated to the NMR parameters.

Fig. 2.1.1 Levels of structural organization in proteins: from primary (i.e., the amino acid sequence) to secondary structure elements (an α-helix and an antiparallel three-stranded β-sheet are shown here) to tertiary and quaternary structure. Tertiary and quaternary structures are represented by the monomer and the dimer of the human Cu,Zn superoxide dismutase (PDB ID: 1L3N), respectively.

2.1.2 Geometrical and Conformational Properties

2.1.2.1 Backbone Dihedral Angles

A dihedral angle is deﬁned as the angle between two planes and therefore involves three chemical bonds between four atoms. The line of intersection between the two planes is deﬁned by the bond between the second and third atom of the set. A dihedral angle can also be deﬁned as the angle at which one plane needs to be rotated about the line of intersection in order to align with the other plane. The three-dimensional spatial arrangement adopted by the protein backbone is determined by the relative orientation of the groups forming the protein backbone, which can be deﬁned by three dihedral or torsion angles, v (omega), w (phi), and y (psi), which describe the rotation of the backbone atoms about each of the three bonds in the basic repeating unit of the polypeptide chain (Figure 2.1.2a). The v angle is deﬁned by the rotation of the carbonyl carbon–amide nitrogen bond and involves the Ca–C–N0 –Ca0 atoms. The partial double character characterizing the central peptide bond makes it planar and restricts the v angle to be 180 in the common trans peptide bonds or 0 in the rare cis peptide bond. Consequently, the atoms Ca–C–N0 –Ca0 are conﬁned to be in a plane in which Ca and Ca0 are in either a trans or a cis orientation with respect to the peptide bond. Trans peptide bonds are favored in the majority of cases because of fewer steric clashes between the side-chain atoms of neighboring residues. Experimentally, in the majority of cases, when a cis peptide bond is present, a Pro is the amino acid on the Cterminal side (i.e., X–Pro, where X is any amino acid) [1]. The w angle designates the rotation about the bond connecting an amide nitrogen and the Ca of the same amino acid, and involves the C–N0 –Ca0 –C0 atoms (Figure 2.1.2a). The y angle designates the rotation between Ca and the adjacent carbonyl atom, thus involving the N0 –Ca0 –C0 –N00

2.1.3 Secondary Structure Elements in Proteins

Cα''

C'

O

ψ φ H

H Cα' R N'

ω C O Cα

(b) 180 Antiparallel β−sheet

Psi (degrees)

135

The most common types of local backbone structural organization are helices and parallel and antiparallel pleated b-sheets. In helices, the backbone polypeptide chain wraps up in a spiral-like arrangement, where the different number of residues per turn and the dihedral angle values deﬁne the type of helix (Table 2.1.1). The most common helix is the right-handed a-helix, which occurs when a stretch of consecutive residues have w and y dihedral angles of approximately 62 and 41 , respectively [4,5]. These values fall in the bottom left quadrant of the Ramachandran plot (Figure 2.1.2b). The a-helix has 3.6 residues per turn, which corresponds to a rise of 5.4 A (a rise of about 1.5 A per residue)

H

N''

2.1.2.2 Side-Chain Dihedral Angles

2.1.3 Secondary Structure Elements in Proteins

9

(a)

atoms (Figure 2.1.2a). As a consequence of their deﬁnition the values of the w and y angles determine the distance between the ﬁrst and fourth backbone atoms of the set deﬁning that dihedral angle (i.e., the backbone C–C0 and N0 –N00 distances, respectively). Steric effects between the main-chain and the side-chain atoms limit the rotation about the single bonds of the protein backbone, thus limiting the w and y angles that the backbone of a polypeptide chain can adopt. Ramachandran [2] summarized the range of values that can be adopted by these angles in a diagram called the Ramachandran plot, where the allowed and disallowed combinations of the w and y dihedral angle values are shown (Figure 2.1.2b). Experimentally, in folded proteins, it can be observed that the majority of the w and y dihedral angle pairs are clustered in the sterically allowed regions, with one important exception represented by the Gly residue. Indeed, the latter lacks the Cb group and therefore no steric collision with a side-chain is present, thus making the number of combinations/pairs for w and y angles higher than for other amino acids.

Rotation around the bonds between the side-chain carbon atoms can in principal deﬁne several different conformations for any side-chain longer than Ala. Actually, the staggered conformations are the most energetically favored and therefore the dihedral angles along the side-chain carbon–carbon bonds tend to assume values close to 60 , 180 , or 300 [3]. These dihedral angles are indicated as x1–x5, with the number increasing by moving from the Ca along the side-chain (Figure 2.1.3a). Consequently, the dihedral angle deﬁned by the atoms N–Ca–Cb–Cc and corresponding to rotation around the Ca–Cb bond is named x1. The stereochemistry about this bond is determined by the polypeptide backbone rather than by the rest of the sidechain. This can be exempliﬁed for a Val residue where, in each of the three staggered conformations, one of the three substituents on the Ca atom must reside between the two methyl groups, which is the most hindered position (Figure 2.1.3b). As the smallest functional group (i.e., the hydrogen atom) occupies this position most frequently, the distribution of the dihedral angle x1 values for Val is such that it assumes values around 180 in the majority of the cases, values around 300 less frequently, and values around 60 very rarely. The amino acids with only one carbon atom bound to Cb, in addition to the Ca, such as the Leu residue, have a x1 value around 300 in more than half of the cases. In this way the bulk substituent is placed between the two least bulky atoms bound to Ca. This holds for the Ile residue as well (Figure 2.1.3c). The dihedral angle x2 is deﬁned by the atoms Ca–Cb–Cc–Cd, and by the rotation of the bond between Cb and Cc (Figure 2.1.3a). In amino acids like Gln, Lys, Arg, and Met, x2 takes, in the majority of cases, a value around 180 , while in aromatic amino acids, the most common value for x2 is 90 , so that the plane of the ring is perpendicular to the bond between the Ca and Cb atoms.

j

Parallel β-sheet

90 Left-handed α−helix

45 0

310 helix Right-handed α helix

-45

π helix

-90

Region occupied mainly by glycine residues

-135 -180 -135 -90 -45

0

45

90 135 180

Phi (degrees) Fig. 2.1.2 (a) Scheme of the backbone dihedral angles v, w, and y. The atoms Ca–C–N0 –Ca0 and Ca0 – C0 –N00 –Ca00 are confined to be in a plane (peptide plane). The atoms bound to Ca and Ca00 (Ha and sidechain) are not shown for clarity. (b) Ramachandran plot. The most favored, the allowed, and the generously allowed regions are shown in red, yellow, and beige, respectively. The sterically disallowed conformations (atoms come closer than the sum of the van der Waals radii) are in white. The graph has been generated with the PROCHECK program [46].

10

j

2.1 Structural Features of Proteins

Fig. 2.1.3 Side-chain dihedral angles are named x1 (defined by the atoms N–Ca–Cb–Cc), x2 (defined by the atoms Ca–Cb–Cc–Cd), x3 (defined by the atoms Cb–Cc–Cd–Ce), and so on. (a) Side-chain dihedral angles for a Lys residue. Schematic representation of the side-chain dihedral angle x1 for Val (b) and Ile (c) residues. For Val the staggered conformation with x1 ¼ 180 is the sterically and energetically more favored because the two methyl groups bound to Cb are both close to the small Ha atom, while for Ile the most favored conformation is the one with x1 ¼ 300 .

and a radius of about 2.3 A, not taking into account the side-chains [6]. An a-helix has 13 atoms per turn and is also known as a 3.613-helix. This secondary structure arrangement is stabilized by regular patterns of hydrogen bonds between the carbonyl oxygen of each amino acid (n) with the amide hydrogen of the amino acid located four positions further in the polypeptide chain (n þ 4). Thus, all NH and CO groups are involved in hydrogen bonds, with the exception of the initial NH and the ﬁnal CO groups at the two ends of the helix (Figure 2.1.4). All the hydrogen bonds in an a-helix are aligned along the same direction as the peptide units forming them are aligned along the helical axis. Each peptide unit has a dipole moment arising from the different polarity of the NH and CO groups that is also parallel to the helical axis with the positive pole towards the N-terminal end of the a-helix and the negative pole toward the C-terminal end. Such a parallel arrangement of dipoles creates a macrodipole moment of the a-helix, which is positive at the N-terminus and negative at the C-terminus [7]. The net electrostatic effect is mainly due to the four NH groups at the N-terminus and to the four CO groups at the C-terminus that are not engaged in hydrogen bonds. The helix dipole determines electrostatic interactions that favor the antiparallel alignment of a-helices in proteins. Hydrogen bond partners of the four initial NH and four ﬁnal CO groups are often provided by polar groups that ﬂank the two helix termini [8]. Helix boundary residues (the ﬁrst and last helical residues) at the N- and C-terminus are called Ncap and Ccap, respectively, and the two pairs of four terminal residues lacking backbone hydrogen bond partners are sometimes named Ncap–N1–N2–N3 and C3–C2–C1–Ccap. There is a striking propensity for certain amino acids to be present in these positions. 
Gly, Asn, Asp, and Ser are the residues most frequently found at the Ncap position, as their short polar side-chains can form hydrogen bonds with the NH groups of the second and/or third helical residue (N2 and N3, respectively) [9]. A special form of Ncap, called the capping box, is constituted by a Ser–X–X–Glu motif that forms a reciprocal pair of hydrogen bonds. Here, the hydroxyl oxygen of Ser forms a standard Ncap hydrogen bond with the amide of the Glu (located at the N3 position) and the side-chain carbonyl oxygen of Glu (N3) forms a hydrogen bond with the amide of the Ncap residue, thus engaging two of the four terminal polar groups in hydrogen bonds [9,10]. At the C-terminus of α-helices, a Gly-based capping motif is present in the majority of cases. The most common location of a helix in a protein structure is on its surface, with one side facing the protein interior and the other exposed to the solvent. This property of the α-helix is a consequence of an asymmetrical distribution of hydrophobic and polar residues, which can be visualized by drawing the helical wheel. Since one turn in an

Table 2.1.1 Mean backbone dihedral angle values and patterns of hydrogen bonds for the various types of secondary structure elements (for helices, the number of residues per turn and the rise per residue are also reported).

| Secondary structure element | Residue | φ (deg) | ψ (deg) | Hydrogen bonds between NH and CO groups | Residues per turn | Rise per residue (Å) |
|-----------------------------|---------|---------|---------|-----------------------------------------|-------------------|----------------------|
| Right-handed α-helix        |         | −62     | −41     | nCO, n + 4 NH                           | 3.6               | 1.5                  |
| 3₁₀-Helix                   |         | −49     | −18     | nCO, n + 3 NH                           | 3.0               | 1.7                  |
| π-Helix                     |         | −57     | −70     | nCO, n + 5 NH                           | 4.4               | 1.1                  |
| Left-handed α-helix         |         | +62     | +41     | nCO, n + 4 NH                           | 3.6               | 1.5                  |
| Parallel β-strands          |         | −119    | +115    | iNH, j + 1 CO; j + 1 NH, i + 2 CO       | –                 | –                    |
| Antiparallel β-strands      |         | −139    | +135    | iNH, jCO; jNH, iCO                      | –                 | –                    |
| β-Turn a) type I            | i + 1   | −64     | −19     | iCO, i + 3 NH                           | –                 | –                    |
|                             | i + 2   | −90     | 2       |                                         | –                 | –                    |
| β-Turn a) type II           | i + 1   | −61     | 132     | iCO, i + 3 NH                           | –                 | –                    |
|                             | i + 2   | 82      | 3       |                                         | –                 | –                    |
| β-Turn a) type III          | i + 1   | −58     | −33     | iCO, i + 3 NH                           | –                 | –                    |
|                             | i + 2   | −64     | −26     |                                         | –                 | –                    |
| γ-Turn a)                   | i + 1   | 75      | −64     | iCO, i + 2 NH                           | –                 | –                    |

a) Mean values of the dihedral angles for amino acids i + 1 and i + 2 in β-turns.


α-helix is 3.6 residues long, each residue can be plotted every 100° (= 360°/3.6) around a spiral, showing its projection onto a plane perpendicular to the helical axis. The wheel representation is useful for detecting possible amphipathic helices, that is, helices with one hydrophilic and one hydrophobic side. This feature is found in many helices of globular proteins and can play a crucial role in helix–helix interactions; it can also be relevant for the formation of pores in membranes, as amphipathic helices can interact with each other and align together in order to span membranes. Other types of helices have been found in protein structures where the polypeptide chain is either more tightly or more loosely coiled than in the right-handed α-helix. These helices are called the 3₁₀-helix and π-helix, respectively. The 3₁₀-helix has three residues per turn, thus placing an amino acid every 120° along the helix, and a rise of 1.7 Å per residue. There are 10 atoms in the ring closed by each hydrogen bond, and the hydrogen bonds are between residues n and n + 3. The φ and ψ dihedral angles of the 3₁₀-helix (mean values around −49° and −18°, respectively, Table 2.1.1) lie at the edge of an allowed, minimum energy region of the Ramachandran plot (Figure 2.1.2b). The π-helix is also known as the 4.4₁₆-helix as there are 4.4 residues per turn, with hydrogen bonds between residues n and n + 5, thus having 16 atoms in the ring closed by each hydrogen bond. This provides a rise of 1.1 Å per residue. The φ and ψ dihedral angles of the π-helix (mean values around −57° and −70°, respectively, Table 2.1.1) lie at the very edge of an allowed, minimum energy region of the Ramachandran plot (Figure 2.1.2b). The core of the π-helix is almost perfectly square and has a clearly larger radius than that of the α-helix, so that the polypeptide backbone is no longer in van der Waals contact across the helical axis, but forms an axial hole that is still too small for solvent water to fill.
In contrast, the 3₁₀-helix is thinner and rises more steeply than the α-helix, and van der Waals contacts across the helical axis may cause repulsion. These features explain the rarity of both π- and 3₁₀-helices; they are not energetically favorable since the backbone atoms are either too tightly packed (in the 3₁₀-helix) or too loosely packed (in the π-helix). They usually occur only at the end of an α-helix and involve only three or four residues. An α-helix can in principle also be left-handed, with φ and ψ dihedral angles of approximately +62° and +41° (Table 2.1.1), which would place the residues in the slightly forbidden zone in the upper right quadrant of the Ramachandran plot (Figure 2.1.2b). However, this kind of helix is strongly disfavored for the natural L-amino acids due to the steric hindrance between the side-chain and the backbone carbonyl; few examples of left-handed α-helices are therefore observed [11,12]. The other secondary structural elements found in proteins are β-strands, which are hydrogen-bonded to each other to form β-sheets. A β-strand is a polypeptide stretch adopting an extended conformation in such a way that the residue side-chains are perpendicular to the peptide backbone plane, and alternately point to one side and then to the other. The average values for the dihedral angles of residues in β-strands are about −129° for φ and +120° for ψ (Table 2.1.1), which lie within the largest allowed region of the Ramachandran plot (upper left quadrant, Figure 2.1.2b). In reality, β-strands are not fully extended, but have a pleated conformation as a consequence of the tetrahedral geometry of the Cα atom. The full extension of a strand is also prevented by the chirality of the side-chain itself, which imposes some level of twist on the strand.
In contrast to α-helices, a β-sheet involves different regions of the polypeptide chain; in other words, β-strands in different segments of the polypeptide chain are brought close to one another in space and interact with each other to form pleated sheets. The β-strands of a β-sheet can be parallel, antiparallel, or a combination of these two arrangements. Like helices, β-sheets are stabilized by regular patterns of hydrogen bonds. In pleated sheets of parallel β-strands, the amide proton and the carbonyl oxygen of an amino acid of one β-strand are hydrogen-bonded to the carbonyl oxygen and the amide proton, respectively, of two different amino acids on the other β-strand that are two positions apart from each other in the sequence. In pleated sheets of antiparallel


Fig. 2.1.4 Side view of a right-handed α-helix (a). In the top and bottom views of the same α-helix, the four non-hydrogen-bonded carbonyl groups at the C-terminus (b) and the four non-hydrogen-bonded amide NH groups at the N-terminus (c) are pointing upwards towards the viewer. The side-chain atoms are not shown for clarity. The residues are represented as balls and sticks, and the hydrogen bonds (i.e., between Oᵢ and Nᵢ₊₄) are shown as dashed black lines.


Fig. 2.1.5 Parallel (a) and antiparallel (b) three-stranded β-sheets. The side-chain atoms are not shown for clarity. The residues are represented as balls and sticks, and the hydrogen bonds are shown as dashed black lines.

β-strands, the amide proton and the carbonyl oxygen of an amino acid of one β-strand form hydrogen bonds with the carbonyl oxygen and the amide proton, respectively, of the same amino acid located on the other β-strand. This hydrogen bond pattern forms rings of 14 atoms that alternate with rings of 10 atoms. Pleated sheets of parallel β-strands are usually slightly less stable than those formed by antiparallel β-strands due to the nonlinearity of the interstrand hydrogen bonds (Figure 2.1.5). The regular arrangement of β-strands in pleated β-sheets can be distorted by β-bulges [13]. A β-bulge is defined as a segment of two β-strands, located between two consecutive interstrand hydrogen bonds, which includes two residues on one strand and one residue on the other (Figure 2.1.6a). A β-bulge is therefore the consequence of an extra residue on the bulged strand, which has an increased backbone length with respect to its pairing strand; this causes the strand to bulge out of the plane of the sheet. This pattern has a major effect on the location of the side-chains of the two strands. While in the standard pleated β-sheet the side-chains are alternately above and below the plane of the sheet, the presence of a bulge puts this alternation out of register on one strand, inducing a slight bend of the sheet and locally accentuating the usual right-handed strand twist. The β-bulges are usually found in antiparallel β-sheets, while they are extremely rare in parallel ones. The irregularities in the hydrogen bond pattern of a β-sheet that lead to the formation of β-bulges have been classified as classic, wide, bent, and special. They differ from one another in the number of extra residues on each strand and the number of hydrogen bonds.
The presence of a β-bulge can influence the directionality of β-strands, which together with the orientation of the side-chains affects the position of crucial amino acids and can therefore play a key role in the overall protein structure. A third type of local structure is the β-turn, which is a short polypeptide stretch where the protein chain makes a 180° change in direction, doubling back on itself [14]. The β-turn is the most common form of turn and occurs when the carbonyl oxygen of the first amino acid in the turn is hydrogen-bonded to the amide proton of the amino acid three residues away along the chain (i.e., Oᵢ and Nᵢ₊₃) (Figure 2.1.6b). In addition to this hydrogen bond, β-turns are characterized by the proximity of the α carbons of the first and the fourth amino acid, which are usually at a distance of 5–6 Å. There are three different types of β-turn: types I, II, and III (Table 2.1.1). They differ in the main-chain conformation of the two residues in the loop (i + 1, i + 2), that is, in the specific values of their φ and ψ angles, with type III being a single turn of a 3₁₀-helix [15]. The mean values of the dihedral angles for amino acids i + 1 and i + 2 in the β-turns are φᵢ₊₁ = −64°, ψᵢ₊₁ = −19° and φᵢ₊₂ = −90°, ψᵢ₊₂ = 2° for type I; φᵢ₊₁ = −61°, ψᵢ₊₁ = 132° and φᵢ₊₂ = 82°, ψᵢ₊₂ = 3° for type II; and φᵢ₊₁ = −58°, ψᵢ₊₁ = −33° and φᵢ₊₂ = −64°, ψᵢ₊₂ = −26° for type III. For each of these types there also exists the opposite type, in which the backbone dihedral angles of the two loop residues have the opposite sign from those in the original types. They are called type I′ (Figure 2.1.6c), II′, and III′. There is a preference for some amino acids to be in β-turns [14]. Type I can have any residue in positions i to i + 3 with the exception of Pro at position i + 2. In type II β-turns, Pro is favored at position i + 1, while Gly is commonly found at position i + 2.
As β-turns are usually solvent exposed, hydrophilic residues such as Asn, Asp, Ser, and Cys are often present at position i, where their polar side-chain can form a hydrogen bond with the backbone amide proton of residue i + 2. When a β-turn occurs between two consecutive antiparallel β-strands it is known as a β-hairpin. This is the tightest possible turn between two β-strands and also constitutes the simplest supersecondary structure motif involving β-strands (see Section 2.1.5). Other types of turns are found in protein structures and differ in the number of residues in the loop defining the turn: there are three in the α-turn and only one in the γ-turn. In the latter type the hydrogen bond occurs between the CO of the first amino acid and the NH of the third one. Experimental and theoretical approaches to the study of β-turns and their role in protein folding and stability have been recently reviewed [16,17].
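The distance criterion mentioned above (α carbons of residues i and i + 3 lying 5–6 Å apart) can be turned into a simple screen for turn candidates. The sketch below is only illustrative: the function name `find_turn_candidates` and the 7 Å default cutoff are assumptions rather than values taken from this chapter, and the coordinates in the usage note are synthetic. Because consecutive residues in an α-helix also satisfy a short Cα(i)–Cα(i + 3) distance, a real analysis would additionally have to exclude helical segments.

```python
import math


def find_turn_candidates(ca_coords, cutoff=7.0):
    """Return (i, distance) pairs where Cα(i) and Cα(i+3) are closer than `cutoff` (Å).

    `ca_coords` is a list of (x, y, z) tuples, one per residue, in sequence order.
    The cutoff is an assumed, adjustable parameter.
    """
    hits = []
    for i in range(len(ca_coords) - 3):
        d = math.dist(ca_coords[i], ca_coords[i + 3])  # Euclidean distance in Å
        if d < cutoff:
            hits.append((i, d))
    return hits
```

With the synthetic coordinates `[(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 5.5, 0.0)]` the function reports a single candidate starting at index 0, whereas a fully extended chain with 3.8 Å spacing along one axis yields no candidates.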


Protein regions of variable length and irregular shape that are neither helices nor β-sheets, usually located between two secondary structure elements, are defined as loops. Even though no regular repeating patterns of backbone dihedral angles are present for these segments, they fall in the lowest energy regions of the Ramachandran plot. Loop regions are rich in charged and polar residues as they are often found on the surface of the protein.
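The helical wheel projection described earlier in this section (one residue every 100° for an α-helix) is easy to reproduce numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the set of residues treated as hydrophobic is a simplifying choice, not a classification given in this chapter. It places each residue on the wheel and measures how strongly the hydrophobic residues cluster on one face (a value near 1 means one hydrophobic face, a value near 0 means an even spread).

```python
import math

# Assumption: a simple binary hydrophobic set chosen for illustration only.
HYDROPHOBIC = set("AVLIMFW")


def wheel_angles(sequence, step_deg=100.0):
    """Angular position (degrees) of each residue on the helical wheel."""
    return [(i * step_deg) % 360.0 for i in range(len(sequence))]


def hydrophobic_face(sequence, step_deg=100.0):
    """Return (clustering, direction) of hydrophobic residues on the wheel.

    clustering is the length of the mean unit vector (0..1);
    direction is its angle in degrees, or None if no hydrophobic residue is present.
    """
    angles = [a for res, a in zip(sequence, wheel_angles(sequence, step_deg))
              if res in HYDROPHOBIC]
    if not angles:
        return 0.0, None
    x = sum(math.cos(math.radians(a)) for a in angles) / len(angles)
    y = sum(math.sin(math.radians(a)) for a in angles) / len(angles)
    return math.hypot(x, y), math.degrees(math.atan2(y, x)) % 360.0
```

For the designed sequence `LKKLLKKLLKKL` the clustering value is about 0.73, as expected for an amphipathic helix, while the strictly alternating `LKLKLKLKLKLK` gives about 0.15. Changing `step_deg` to 120.0 would give the corresponding projection for a 3₁₀-helix.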

2.1.4 Prediction of Secondary Structure

The most straightforward way to predict secondary structure elements is the analysis of known three-dimensional protein structures in relation to their sequence. There is indeed a general preference for some amino acids to adopt a specific conformation. For example, residues with long side-chains, such as Met, Glu, and Gln, are often present in α-helices [18], while Pro residues are disfavored due to the ring conformation of the backbone, which imposes fixed dihedral angles far from those of the α-helix, and the lack of the backbone amide proton, which prevents the formation of hydrogen bonds. Indeed a Pro residue, when located in the middle of an α-helix, breaks the hydrogen bond pattern, thus producing a kink in this secondary structural element [8]. Gly residues are often found only in the first turn of α-helices or in β-turns. The β-strands are able to accommodate Gly residues, giving rise to β-bulges (see Section 2.1.3) that kink the main-chain, but do not disrupt the tight structure formed by the hydrogen bonds. Gly residues are also particularly favored in loop regions because of their intrinsic flexibility; loop regions are also rich in polar residues as they are often solvent exposed. Statistical analyses of the propensity of residues to be located in a given secondary structure element are used by several algorithms developed to predict the occurrence of secondary structural elements as a function of the primary sequence [19]. In addition to a residue's own propensity for a given secondary structure, its conformation has been found to be strongly influenced by the nature of the preceding and following residues. Prediction methods that also take into account the influence of the neighboring residues give significantly more accurate predictions than methods based only on individual residue preferences.
More recent approaches, which have led to significant improvements in the accuracy of secondary structure prediction, include information derived from protein sequence alignments. Instead of analyzing the sequence of only the protein of interest, multiple alignments of homologous sequences are also introduced as input for the prediction. State-of-the-art algorithms incorporate this information with the use of computational neural networks designed and parameterized to predict protein secondary structure elements, thus drastically improving the accuracy of the prediction. Some examples of such programs include JPRED,1) YASPIN,2) PHD/PHDpsi,3) and PSIPRED.4) A more complete list can be found within the Network Protein Sequence Analysis web site.5)
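The idea of combining single-residue propensities with the influence of neighboring residues can be sketched as a sliding-window average. The example below is a toy illustration only and is not the algorithm of any of the programs named above, which rely on alignments and neural networks. The score table is a placeholder that encodes just the qualitative statements of this section (Met, Glu, and Gln favor helices; Pro and Gly disfavor them); the numerical values, window size, threshold, and function names are all assumptions.

```python
# Placeholder scores: +1 for residues described as helix-favoring, -1 for
# helix-disfavoring ones, 0 for everything else. These are NOT measured propensities.
TOY_HELIX_SCORE = {"M": 1.0, "E": 1.0, "Q": 1.0, "P": -1.0, "G": -1.0}


def windowed_score(sequence, scores=TOY_HELIX_SCORE, window=5):
    """Average the per-residue score over a window centered on each position.

    Windows are truncated at the sequence ends.
    """
    half = window // 2
    result = []
    for i in range(len(sequence)):
        segment = sequence[max(0, i - half):i + half + 1]
        result.append(sum(scores.get(res, 0.0) for res in segment) / len(segment))
    return result


def predict_helix(sequence, threshold=0.5, window=5):
    """Label each residue 'H' (helix-like window) or '-' (otherwise)."""
    return "".join("H" if s >= threshold else "-"
                   for s in windowed_score(sequence, window=window))
```

Because each position is scored from its whole window, a single unfavorable residue embedded in a favorable stretch lowers the score of its neighbors as well, which is the neighbor effect described in the text in its crudest form.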

2.1.5 Structural Motifs and Structural Domains – Combination of Secondary Structural Elements and Structural Motifs

Local three-dimensional organizations of two or three secondary structure elements are termed structural motifs or supersecondary structures. They constitute portions

1) http://www.compbio.dundee.ac.uk/jpred
2) http://www.ibi.vu.nl/programs/yaspinwww
3) http://www.predictprotein.org
4) http://www.bioinf.cs.ucl.ac.uk/psipred
5) http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_server.html

Fig. 2.1.6 (a) Example of a classic β-bulge. The side-chains (green spheres) of the three residues forming the β-bulge (in this example residues Ile15, Lys16, and Lys24 of the solution structure of staphylococcal nuclease, PDB ID: 2KHS) are all above the plane. The hydrogen bonds are shown as dashed black lines. (b) Type I and (c) type I′ β-turns. The residues forming the β-turns are shown as balls and sticks. PDB IDs 9PAP and 2ACT were used to generate type I and I′ β-turns, respectively. The hydrogen bonds (i.e., between Oᵢ and Nᵢ₊₃) are shown as dashed black lines.


Fig. 2.1.7 (a) Topology diagram of a Greek key motif. Three antiparallel β-strands are connected by two short β-turns, followed by a long turn connecting the fourth β-strand, which is hydrogen-bonded in an antiparallel arrangement with the first β-strand. (b) Up-and-down topological diagram of eight antiparallel β-strands joined by hairpins. An up-and-down β-barrel is formed when the first β-strand is joined by hydrogen bonds to the last one. (c) Jelly roll topological diagram.

of the overall tertiary structure and are often related to specific functional features. One of the most common motifs is the helix–turn–helix [20]. It is often associated with specific binding sites, such as those for DNA [21]. When the stretch connecting the two helices is longer than a turn, the structural motif is called a helix–loop–helix; this motif can also be involved in DNA or calcium binding. The helix–loop–helix in which the two helices are perpendicular is also called an EF-hand motif [22]. If the stretch connecting the two helices is instead only formed by two residues, the latter are oriented approximately perpendicular to the helical axes. In this case the second amino acid of the loop is located in the positive φ region of the Ramachandran plot and is therefore often a Gly residue, while the first amino acid takes a conformation typical of an α-helix. Another common structural motif is the β-hairpin, which involves a β-turn connecting two consecutive antiparallel β-strands. The most common is a β-hairpin with a two-residue loop, where the two β-strands are connected by either a type I′ or a type II′ turn (see also Section 2.1.3), as these turns have the correct geometry to produce the twist of the β-sheet. In β-hairpins with a type I′ turn, the first residue adopts a left-handed α-helical conformation; for this reason, the favored residues in this position are Gly, Asp, or Asn. They can adopt conformations with positive φ angles due to the absence of a side-chain (as in the case of Gly) or because of the hydrogen bonds between the side-chain atoms of residue i and the main-chain atoms of residue i + 1, as in the case of Asp or Asn. The residue i + 2 is usually a Gly as it is the only residue that can take backbone φ dihedral angles consistent with this conformation. β-Hairpins with three or four residues in the turn have also been found in some protein structures. Another supersecondary structure, common to many proteins, is formed by a β-hairpin and an α-helix.
The latter is tightly packed on one side of the sheet formed by the β-hairpin, forming a left-handed α-helix–β-hairpin fold [23]. While a β-hairpin produces an antiparallel arrangement of two consecutive β-strands, the β–α–β structural motif is formed by two parallel consecutive β-strands connected by an α-helix. Each secondary structure element is separated by loops that can vary in length. The α-helix is essentially parallel to the β-strands, but lies above the plane formed by the two β-strands (this motif is also called a right-handed β–α–β motif). It has also been found that the two β-strands can be connected by a 3₁₀-helix [24]. A more complex structural motif is the Greek key motif formed by four antiparallel β-strands. In this motif three antiparallel β-strands are connected by two short β-turns, followed by a long turn connecting the fourth β-strand, which is hydrogen-bonded in an antiparallel arrangement with the first β-strand (Figure 2.1.7). Structural domains are the fundamental units of the tertiary structure and comprise sequential or nonsequential parts of the polypeptide chain that are structurally independent and can fold separately [25]. Domains are formed by different combinations of secondary structure elements and motifs. A domain is therefore any region within the whole protein that maintains its tertiary structure even if separated from the rest of the polypeptide. Protein three-dimensional structures can be constituted by a single domain or by two or more domains, which are usually associated with specific functions. The same type of domain can be found in a variety of different protein structures. Indeed, protein domains can be considered as units of protein structure that, assembling in a variety of different combinations, give rise to different three-dimensional structures. As domains often also constitute functional units, different combinations of them are used by nature to modulate and differentiate functional properties.
Domains typically have 50–200 residues, although smaller and larger domains do occur. Matrix metalloproteinases are an example of multidomain proteins: these enzymes contain propeptide, catalytic, and hemopexin domains [26]. These domains are usually linked by a hinge region that permits them to move with respect to one another. There are also examples in which a domain of a multidomain protein cannot fold properly if the polypeptide chain is detached from its neighbors because it is formed by nonsequential parts of the polypeptide chain. An example of this is the ATP-binding domain


of P-type ATPases [27], which is constituted by two smaller domains: the nucleotide-binding domain and the phosphorylation domain. The sequence of the former is a stretch inserted in the latter. The nucleotide-binding domain can fold independently, maintaining its proper three-dimensional structure even when isolated. On the contrary, the sequence of the phosphorylation domain is interrupted by the stretch forming the nucleotide-binding domain and would therefore require a chimeric linker in order to be properly folded.

2.1.6 Types of Folds and their Classification

Each structural domain can be classified according to its content of α-helices and β-sheets, and their relative location [25,28]. At this simplest level of classification, the types of folds are grouped into four main classes: α class, which contains domains with only α-helices as secondary structure elements; β class, in which protein domains have only β-sheet secondary structure; α/β class, which groups domains with β-strands connected by α-helices; and α + β class, where the α-helix and β-strand secondary structure elements do not mix together, but tend to pack separately in the protein. A fifth class is formed by protein domains that have little if any secondary structure, but are stabilized by disulfide bonds or metal ions. CATH,6) which is a manually curated database for the classification of protein domain structures, identifies 1282 folds and 152 920 domains from 104 238 structures deposited in the Protein Data Bank (PDB) (the latest version 3.4 of CATH is based on the PDB release of 13 November 2010) [29]. In the CATH database, 21.2% of the domains are classified as belonging to the α class, 25.6% to the β class, 51.7% to the α/β class and α + β class, and the remaining 1.5% to the class of proteins with low secondary structure content. Another structure classification database is SCOP (Structural Classification of Proteins), which is a comprehensive ordering of all proteins of known structure, according to their structural and evolutionary relationships [30]. In this database protein domains are grouped according to their sequence and structural and functional relationships. Statistics on the current release and other information are available from the SCOP web site.7)

2.1.6.1 Folds of the α Class

Coiled-coil, four-helix bundle, and globin types of fold represent the three major ways for α-helices to pack together to generate domains of this class [31] (Figure 2.1.8). The coiled-coil fold consists of two amphipathic α-helices interacting via hydrophobic side-chain interactions, while the side-chains of the hydrophilic residues point towards the solvent. The packing of the side-chains on the hydrophobic side follows the knobs-in-holes arrangement, in which each side-chain of a residue in the first helix (the knob) interacts with the side-chains of four residues located in the second helix (the hole). This arrangement is the structural basis of fibrous proteins and can comprise hundreds of amino acids [32]. The four-helix bundle fold is formed by four antiparallel or parallel α-helices arranged in a bundle [33]. Each helix makes an angle of about 20° with the following one, so that the entire motif has a left-handed twist. The four-helix bundle fold is stabilized by interactions between the side-chains of the residues, which are arranged to form ridges separated by grooves on each helix. The four helices are usually placed in such a way that the ridges fit into the grooves of the adjacent helix (ridges-in-grooves

6) http://www.cathdb.info
7) http://scop.mrc-lmb.cam.ac.uk/scop

Fig. 2.1.8 Examples of coiled-coil (a, PDB ID: 2XU6, MDV1 coiled-coil domain), four-helix bundle (b, PDB ID: 1QPU, cytochrome b562), and globin (c, PDB ID: 2HHB, human deoxyhemoglobin) types of fold. They represent the three major ways for α-helices to pack together to generate domains of the α class. The heme porphyrin rings are shown as sticks in cytochrome b562 and human deoxyhemoglobin.


model) and the two helices are tilted by an angle of about 50° or about 20° depending on whether the ridges occur every four or three residues, respectively [4,34]. A hydrophobic core is present in the middle of the bundle. Four-helix bundles can also be formed by a dimer of coiled-coil structures packed against each other according to the ridges-in-grooves arrangement, where the two helices of each coiled-coil structure interact with each other according to the knobs-in-holes model. The four-helix bundle fold is present in a large variety of all-α-class proteins, where it carries out different functions. Examples of proteins with such a fold are cytochrome b562, ferritin, and cytokines. The third type of α-class fold is the globin fold, typical of myoglobin and hemoglobin (heme-containing proteins) [35]. It consists of seven or eight α-helices arranged at an angle of about +50° with respect to each other and forming a hydrophobic pocket that can accommodate organic as well as metal-containing cofactors. In myoglobin and hemoglobin, it is large enough to accommodate a porphyrin ring.

2.1.6.2 Folds in the β Class

Structural domains that contain only β-sheets and turns or irregular loops are called β-domains and are classified as β class [31] (Figure 2.1.9). Folds within this class are constituted almost entirely by β-sheets, in the majority of cases with an antiparallel arrangement. The antiparallel β-strands can be arranged into two β-sheets usually packed one against the other (β-sandwich) or can form a distorted barrel where the strands are connected by β-turns or larger loops (β-barrel). In the latter case a closed cylindrical structure is formed where all the strands are hydrogen-bonded with each other (Figure 2.1.9a). In the former case the last strand of each β-sheet is not hydrogen-bonded to any other strand (Figure 2.1.9b). In both types of fold, the β-strands can interact in two ways. When the barrel is formed by antiparallel β-strands that are adjacent in sequence and connected by hairpins, the resulting structure has an up-and-down three-dimensional arrangement, with the β-strands curved and twisted (Figure 2.1.7b). The side-chains of the hydrophobic residues point inside the barrel, where hydrophobic molecules can also be accommodated, while the hydrophilic residues point towards the solvent.

Fig. 2.1.9 Examples of β-barrel (a, PDB ID: 1PRN, membrane channel porin), β-sandwich (b, PDB ID: 1F6L, variable light chain dimer of antiferritin antibody), and six-bladed β-propeller (c) types of fold. In the β-propeller the six motifs formed by four up-and-down antiparallel β-strands are shown with different colors. (d) γB-crystallin protein (PDB ID: 4GCR). The protein is composed of two domains. Each domain is built from an eight-stranded antiparallel β-sandwich structure composed of two Greek key motifs, shown in red and green, respectively, for the N-terminal domain. The C-terminal domain is shown as a white ribbon for clarity. (e) Greek key β-barrel fold adopted by the N-terminal domain of Fusarium oxysporum trypsin protein (PDB ID: 1FN8). The Greek key motif is shown in red. The four β-strands forming the Greek key motif are numbered as in Figure 2.1.7a.


An up-and-down arrangement is also present in proteins where six β-sheets, each composed of four adjacent antiparallel β-strands, are arranged in a circle around a central axis to generate a β-propeller type of fold (Figure 2.1.9c). Each up-and-down motif of four antiparallel β-strands is called a propeller blade, as the whole structure is a six-blade propeller [36]. The other way to connect antiparallel β-strands is through the formation of a Greek key motif (see Section 2.1.5 and Figure 2.1.7a); the resulting structure is called a Greek key barrel or sandwich [37] (Figure 2.1.9d–e). The Greek key motif is the most common arrangement of β-strands found in antiparallel β-structures. Cases have also been reported of β-barrels where only one β-sheet has a Greek key arrangement, while the other β-sheet takes an up-and-down arrangement. A variation of the Greek key motif is the jelly roll motif (Greek key-like motif), formed by eight β-strands paired to make four antiparallel interactions that resemble a long β-hairpin, where only two strands (strands 4 and 5) are adjacent in sequence. This motif then wraps around the two central strands, forming a roll (Figure 2.1.7c).

2.1.6.3 Folds in the α/β Class

This class includes domains formed by β-sheets where the β-strands are connected through α-helices, thus forming parallel-stranded β-sheets. Therefore, these types of folds are frequently formed by combinations of β–α–β structural motifs. There are three main subgroups within this class: α/β-barrels, α/β-twists or open twisted sheets, and α/β-horseshoe folds (Figure 2.1.10). In α/β-barrels, the β–α–β–α motif is repeated four times or more and the twisted β-strands create a barrel surrounded by the α-helices. This fold is also called the TIM fold [38] because it was first discovered in the three-dimensional structure of triosephosphate isomerase (Figure 2.1.10a). The antiparallel α-helices are amphipathic and are packed against the barrel, shielding it from the solvent. The eight β-strands form a tightly packed hydrophobic core entirely constituted by the side-chains of hydrophobic residues. The structure is further stabilized by hydrophobic interactions between residues, predominantly Val, Leu, and Ile, located on the hydrophobic side of the α-helices and on the external side of the β-strands. In the α/β-twist type of fold the parallel β-strands, which are not consecutive in sequence, form a twisted open sheet surrounded by α-helices on both sides. A peculiarity of this structure, also called the Rossmann fold [39], which was first observed for the enzyme lactate dehydrogenase in 1970, is the presence of a crevice between two β–α–β motifs that are connected by a further α-helix (Figure 2.1.10b). This crevice places the α-helices on one side for half of the β-sheet and on the opposite side for the other half of the β-sheet plane. The number of β-strands involved in the formation of this type of fold can vary from 4 to 10. There are also cases in which the β-sheet is formed by a mix of parallel and antiparallel β-strands, the latter connected by hairpins.
In this type of fold the α-helices are amphipathic, while the β-sheet is predominantly hydrophobic as it is shielded by the α-helices from interaction with the solvent. The α/β-horseshoe fold domains are formed by repeats of Leu-rich motifs in which a characteristic pattern of conserved Leu residues plays a relevant structural role [40] (Figure 2.1.10c). Each repeat is formed by a right-handed β–loop–α motif, connected by loops in ways similar to those observed for the α/β-barrels, thus forming an arrangement that resembles a horseshoe. The concave surface of the horseshoe consists of a large parallel β-sheet formed by a variable number of β-strands, whereas the convex surface is formed by the α-helices.

2.1.6.4 Folds in the α + β Class

This class includes domains formed by both β-sheets and α-helices, located independently of one another. There is no specific organization of the secondary

Fig. 2.1.10 (a) α/β-barrel/TIM fold present in the N-terminal domain of Ala racemase (PDB ID: 1EPV). (b) α/β-twist/Rossmann fold in the enzyme lactate dehydrogenase (PDB ID: 1A5Z). (c) α/β-horseshoe fold in porcine ribonuclease inhibitor (PDB ID: 2BNH).


structure elements within this class, but they can take any arrangements already described for the α and β classes. The fifth class of fold, also called the cross-linked irregular domain, groups small proteins that do not have an extensive hydrophobic core or a large number of secondary structure elements interacting with each other. These folds can be stabilized by disulfide bonds or metal ions that can connect different parts of the protein far away in sequence and with little or no secondary structure. Metal ion cross-linked domains are common (e.g., in zinc finger transcription factors).

2.1.7 Tertiary Structure

In a folded protein, secondary structure elements as well as loops and other parts of the polypeptide chain fold into a compact structure stabilized by interactions involving hydrophilic and hydrophobic residues. The resulting relative position of each atom of the protein in space is called the protein's tertiary structure. The association of secondary structure elements is not random, but is mainly driven by hydrophobic interactions of the side-chains of hydrophobic residues, which thus become buried from the solvent and packed against each other to establish van der Waals interactions, forming the so-called hydrophobic core. The distribution of amino acids therefore varies from the surface of the protein, which remains in contact with the solvent, to its interior, which is more or less inaccessible to the solvent. The differences in amino acid distribution are related to the hydrophobicity of the individual amino acids. This property can be quantified through a score associated with individual residues that indicates how strongly the side-chains are repelled from water (i.e., the tendency of amino acids to favor or disfavor interaction with water) [41]. The more positive the score, the more the amino acid tends to avoid an aqueous environment, while negative numbers indicate hydrophilic side-chains. The hydrophilic or hydrophobic properties of each residue can also be scored by taking into account the properties of the preceding and following amino acids, thus providing the hydropathy index. The latter is defined as the mean value of the hydrophobicity scores of the amino acids in a stretch, usually 19 residues long, centered on the amino acid of interest. Formation of the hydrophobic core upon protein folding implies not only that the hydrophobic side-chains are packed into the protein core, but also that their main chain must fold into the interior. The main chain, being highly polar, must be neutralized, as it has to reside in a hydrophobic environment.
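The windowed average that defines the hydropathy index can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the Kyte–Doolittle hydropathy scale (the text cites reference [41] without naming a scale), and the function name `hydropathy_profile` and the test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (an assumption; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return (center_index, mean_score) for every full window.

    The index is 0-based and refers to the residue at the window center.
    Residues closer than window // 2 to either end get no value.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    scores = [KYTE_DOOLITTLE[res] for res in sequence.upper()]
    half = window // 2
    profile = []
    for center in range(half, len(scores) - half):
        segment = scores[center - half:center + half + 1]
        profile.append((center, sum(segment) / window))
    return profile


if __name__ == "__main__":
    # Hypothetical sequence: a hydrophobic stretch flanked by charged residues.
    seq = "KKDEKRDE" + "LIVLAVILLAIVLLGIVAL" + "DEKRKDEK"
    for center, value in hydropathy_profile(seq):
        print(f"{center + 1:3d} {seq[center]} {value:+.2f}")
```

With a 19-residue window the profile peaks where the window is centered on the hydrophobic stretch and falls off toward the charged flanks, which is the behavior the index is meant to capture.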
One way to compensate for the hydrophilicity of the peptide backbone is to engage its atoms in networks of hydrogen bonds. The α-helix and β-sheet arrangements maximize the hydrogen bond network between the peptide bonds of the backbone, thus reducing its hydrophilicity. If hydrophilic residues are buried in the protein interior, their side-chains are usually hydrogen-bonded to backbone NH or CO groups or are involved in salt bridges (interactions between side-chains with opposite charge) [42]. Sometimes, extensive hydrogen bond networks involving the side-chains of partially buried charged residues have been observed within a protein structure. These clusters of hydrogen bonds can be essential to functionally orient important amino acids. For example, a complex network of hydrogen bonds stabilizes the conformation of the metal binding residues (which are partially buried) in SOD1 [43,44]. This network is completely conserved among the several SOD1s from various organisms for which structural characterization has been reported and involves both side-chains and backbone groups of polar and nonpolar amino acids. Charged residues, such as Arg, Lys, Asp, Glu, and His, when located in the protein interior, often function as catalytic or metal binding sites. In the majority of cases, however, hydrophilic residues are located on the protein surface with their backbone and side-chain groups interacting with the solvent molecules. The hydrophilic residues are mostly located in loop regions or in


amphipathic α-helices or β-strands. Nevertheless, protein surfaces are never totally polar, as isolated nonpolar residues can be in contact with the solvent. A number of hydrophobic residues can also be clustered on the protein surface, forming specific binding sites. One example is Mia40 (mitochondrial intermembrane space import and assembly 40), a protein having a key role in oxidative protein folding in the mitochondrial intermembrane space and containing a hydrophobic cleft formed by a number of hydrophobic and aromatic residues, all of which are highly conserved [45]. This hydrophobic cleft functions as a substrate recognition and binding site, stabilizing initial noncovalent interactions that appropriately position the unfolded substrates, which usually have exposed hydrophobic segments. The protein surface distribution of hydrophobic and hydrophilic residues is therefore important in protein–protein recognition processes, as the protein interface in protein–protein complexes can involve different types of interactions (i.e., hydrogen bonds, salt bridges, and hydrophobic contacts).

2.1.8 Quaternary Structure

The quaternary structure of proteins is defined as the combination of two or more folded polypeptide chains to form a complete, functional unit. It describes the arrangement and position of each of the subunits in a multiprotein complex. Many proteins can indeed self-assemble into multimeric proteins composed of two or more subunits, also called monomers, thus forming protein complexes usually called dimers, trimers, tetramers, and so on, depending on whether they have two, three, four, or more subunits. These multimeric complexes can be formed by a single type of subunit, thus being called homodimers, homotrimers, and so on, or by different types of subunits, with the complex taking the prefix hetero-. The noncovalent association of the various subunits is mostly driven by hydrophobic interactions on the protein surface among nonpolar side-chains that thus become buried at the subunit interface. The interactions between subunits are tight and specific, as they should exclude wrong molecules from interfering with the protein assembly. Subunit interactions are further stabilized by electrostatic interactions between charged groups and by hydrogen bonds between polar groups. Subunit complementarity is essential because formation of the quaternary structure requires that all van der Waals contacts, as well as more hydrogen bonds with the other subunit than with water molecules, are established. A well-characterized example of complementarity between interacting surfaces occurs in coiled-coil structures (see Section 2.1.6.1), which are dimers of α-helices formed through the knobs-into-holes arrangement. It is possible that one or more subunits of oligomeric proteins are only partially folded before oligomerization and that only upon interaction with the other subunits do they assume their final, correct, tertiary structure.
In coiled-coil proteins, for example, the two subunits are frequently unfolded as monomers and assume their folded structure only upon dimerization.


2 Structure of Biomolecules: Fundamentals

2.2 Nucleic Acids

Mirko Cevec, Hendrik R.A. Jonker, Senada Nozinovic, Christian Richter, and Harald Schwalbe

Six torsion angles (α, β, γ, δ, ε, and ζ) define the conformation of the DNA and RNA phosphodiester backbone. The structure of the nucleobase is restricted to a planar conformation, while its orientation relative to the sugar, described by the torsion angle χ around the glycosidic bond, is flexible. The bimodal distribution of χ reflects the major anti and minor syn conformations of nucleobases. The ribose puckering can be described by the pseudorotation phase angle P and the puckering maximum amplitude νmax, which are defined in terms of the five endocyclic ribose torsion angles ν0–ν4. While DNA is mostly found in a C2′-endo (S-type) conformation, RNA is mainly found in a C3′-endo (N-type) conformation. The right-handed B-form DNA composed of Watson–Crick base pairs is the most common and biologically relevant DNA structure. Variability of RNA folds is considerably larger than for DNA and an RNA double helix most often resembles A-form DNA. Mismatched base pairs, sheared base pairs, base triplets, and base wobbles, such as GA, GT/U, AA, GG, AC, or CGC+, often lead to a variety of extrahelical loop regions and bulges. DNAs and RNAs can fold into complex tertiary structures (e.g., i-motif, DNA/RNA quadruplexes, U- and K-turn, C- and E-loop, tetraloops, riboswitches, and DNA–RNA hybrids).

2.2.1 Introduction

This chapter describes the diverse conformations of DNA and RNA structures. To date, only 5% of the biomolecular structures solved and deposited in structural databases (Protein Data Bank (PDB) and Nucleic Acid Database (NDB)) contain DNA, and just 3% contain RNA. The strength of NMR in solving those structures is evident, as 18% of the DNA-containing and as many as 23% of the RNA-containing entries were solved using this method, in comparison to 11% for proteins. NMR even accounts for 41 and 51% of the experimentally determined structures among the entries containing DNA alone and RNA alone, respectively. For the large complexes of nucleic acids with proteins, 95% were solved by X-ray crystallography; however, the contribution of NMR is especially evident for the smaller (below 40 kDa) protein–RNA complexes, of which nearly 40% have been solved by NMR [1]. In nucleic acids, four different building blocks (i.e., four nucleotides) are covalently connected via a phosphodiester backbone to form a polymeric chain. A nucleotide consists of three chemically and structurally distinct components: a nucleobase, a sugar, and a phosphodiester group. The backbone of the chain is made by the sugars and phosphodiesters joined by ester bonds at the 5′ and 3′ carbons of the sugar.

NMR of Biomolecules: Towards Mechanistic Systems Biology, First Edition. Edited by Ivano Bertini, Kathleen S. McGreevy, and Giacomo Parigi. © 2012 Wiley-VCH Verlag GmbH & Co. KGaA. Published 2012 by Wiley-VCH Verlag GmbH & Co. KGaA.


Fig. 2.2.1 (a) Chemical structures of the nucleobases adenine (A), guanine (G), thymine (T), uracil (U), and cytosine (C). (b) Nucleotide structure and definition of torsion angles. The 2′-OH group present only in RNA is indicated in blue. (c) Watson–Crick base pairs. (d) Left: structures of the two most populated sugar conformations, C3′-endo and C2′-endo. Right: presentation of the sugar pucker mode described by the pseudorotation phase P and puckering maximum amplitude νmax.

The phosphodiester group gives nucleic acids their acidic properties, as they are fully ionized at physiological pH. The sugar moiety is a ribose furanose ring for RNA or a deoxyribose furanose ring for DNA. The presence of the 2′-OH group is essential for the structure and function of RNA, which exhibits higher conformational and chemical diversity than DNA. Genetic information is encoded by four different nucleobases that are attached to the sugars by N-glycosidic bonds. Two nucleotides, adenosine-5′-phosphate (A) and guanosine-5′-phosphate (G), contain fused-ring purines. Cytidine-5′-phosphate (C) and uridine-5′-phosphate (U) contain single-ring pyrimidines. In DNA, uridine is replaced by its methylated counterpart thymidine-5′-phosphate (T) (Figure 2.2.1). The sequence of the four nucleotides, given by the single-letter code of the attached nucleobases (A, G, C, U, or T), comprises the primary structure of the nucleic acid [2].

2.2.1.1 Conformations

The local conformation of RNA/DNA is divided into three conformational regions: backbone, sugar, and nucleobase. The conformation of the phosphodiester backbone


is defined by six torsion angles (α, β, γ, δ, ε, and ζ). The structure of the nucleobase is restricted to a planar conformation, while its orientation relative to the sugar, described by the torsion angle χ around the glycosidic bond, is flexible. The ribose puckering can be described by the pseudorotation phase angle P and the puckering maximum amplitude νmax, which are defined in terms of the five endocyclic ribose torsion angles ν0–ν4 (Figure 2.2.1). Moreover, the backbone torsion angle δ correlates with the sugar pucker mode. Commonly, the sugar pucker is characterized by which carbon atom is displaced out of the plane and is termed endo (e.g., C3′-endo). In RNA, the orientation of the additional 2′-OH group is described by a further torsion angle. In total, there are 9 and 10 degrees of freedom in DNA and RNA, respectively. Statistical analysis of torsion angle populations based on crystal structures reveals that single angle distributions show multiple and broad peaks [3,4]. The torsions α and γ have trimodal distributions, while the torsion β is mainly restricted to the trans conformation (180°) with a broad Gaussian-like distribution. The bimodal distribution of χ reflects the major anti and minor syn conformations of nucleobases. For steric reasons, the torsion angle ε is limited to values above 180°, but shows a flat distribution. The sugar pucker is mostly observed to be in either a C3′-endo or a C2′-endo conformation, corresponding to the bimodal distribution of the torsion angle δ. RNA and DNA have fundamentally different preferences for ribose conformation, linked to the presence of the 2′-OH group. While DNA is mostly found in a C2′-endo conformation, corresponding to the S-type conformation, RNA is mainly found in a C3′-endo conformation, corresponding to the N-type conformation. In noncanonical regions of structure, RNA adopts a C2′-endo conformation or exists in an equilibrium of both conformations.
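The pseudorotation description of the sugar pucker can be made concrete with a short Python sketch. The text does not give the formulas, so the sketch assumes the widely used Altona–Sundaralingam convention, tan P = [(ν4 + ν1) − (ν3 + ν0)] / [2 ν2 (sin 36° + sin 72°)] and νmax = ν2 / cos P. The function names, the N/S classification by hemisphere, and the test values (P = 18° and 162°, νmax = 38°) are illustrative choices.

```python
import math

# sin 36 deg + sin 72 deg, the constant of the assumed Altona-Sundaralingam formula.
_SCALE = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))


def pseudorotation(nu0, nu1, nu2, nu3, nu4):
    """Return (P, nu_max) in degrees from the endocyclic torsions nu0..nu4 (degrees)."""
    numerator = (nu4 + nu1) - (nu3 + nu0)
    denominator = 2.0 * nu2 * _SCALE
    # atan2 picks the correct quadrant, so nu2 < 0 needs no special case.
    phase = math.degrees(math.atan2(numerator, denominator)) % 360.0
    # Equivalent to nu2 / cos(P), but stays finite when cos(P) is close to zero.
    nu_max = math.hypot(numerator, denominator) / (2.0 * _SCALE)
    return phase, nu_max


def pucker_type(phase):
    """Classify by hemisphere of the pseudorotation wheel: 'N' or 'S'."""
    return "N" if math.cos(math.radians(phase)) >= 0.0 else "S"


def ideal_torsions(phase, nu_max):
    """Idealized torsions nu_j = nu_max * cos(P + 144 deg * (j - 2)), j = 0..4."""
    return [nu_max * math.cos(math.radians(phase + 144.0 * (j - 2)))
            for j in range(5)]


if __name__ == "__main__":
    for label, phase in (("C3'-endo", 18.0), ("C2'-endo", 162.0)):
        p, amp = pseudorotation(*ideal_torsions(phase, 38.0))
        print(f"{label}: P = {p:.1f} deg, nu_max = {amp:.1f} deg, type {pucker_type(p)}")
```

Feeding idealized torsions back through `pseudorotation` recovers the phase and amplitude they were built from, which serves as a quick consistency check on the convention.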
The highest flexibility in both RNA and DNA is seen for the torsion angle ζ, which populates all possible values without any sharp preference except for RNA/DNA in a canonical helical structure. In RNA, this is strongly correlated to the sugar conformation (Figure 2.2.2). Generally, the torsion angles of RNA and DNA folded in a helical structure populate a very narrow range of values (Table 2.2.1). In noncanonical regions, diverse conformations are possible showing less interdependence between torsion angles. Figure 2.2.2 underlines this fact and shows the comparison of torsion angles when RNA is present in C3′-endo and C2′-endo conformations. In proteins, the backbone conformation is described by the Ramachandran diagram showing the correlation of φ,ψ in torsion angle space. In RNA, the corresponding ζi,αi+1 correlation diagram defines the two torsions around the phosphorus atom along the 5′→3′ chain direction of RNA (Figure 2.2.2). Both torsion angles show high variability, yet form defined conformational clusters [4].
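Each of the torsion angles discussed above is a dihedral angle between four consecutive atoms, so it can be computed directly from atomic coordinates. The following Python sketch shows one way to do this. The atom quadruples follow the standard nucleic acid nomenclature, which the text refers to only through Figure 2.2.1b; the function name `torsion_angle`, the choice to report angles in the 0–360° range, and the example coordinates are assumptions made for illustration.

```python
import math

# Atom quadruples of the backbone and glycosidic torsions (standard nomenclature).
TORSION_ATOMS = {
    "alpha": ("O3'(i-1)", "P", "O5'", "C5'"),
    "beta": ("P", "O5'", "C5'", "C4'"),
    "gamma": ("O5'", "C5'", "C4'", "C3'"),
    "delta": ("C5'", "C4'", "C3'", "O3'"),
    "epsilon": ("C4'", "C3'", "O3'", "P(i+1)"),
    "zeta": ("C3'", "O3'", "P(i+1)", "O5'(i+1)"),
    "chi_purine": ("O4'", "C1'", "N9", "C4"),
    "chi_pyrimidine": ("O4'", "C1'", "N1", "C2"),
}


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p0, p1, p2, p3):
    """Torsion about the p1-p2 bond in degrees, reported in [0, 360)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    # Signed angle between the planes (p0, p1, p2) and (p1, p2, p3).
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x)) % 360.0


if __name__ == "__main__":
    # Hypothetical coordinates (in Angstrom) chosen to give simple angles.
    print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # gauche-like
    print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # trans
```

Applying `torsion_angle` to the ζ atoms of residue i and the α atoms of residue i+1 for every residue of a structure would yield the points of a ζi,αi+1 correlation plot.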


Fig. 2.2.2 Ribosomal RNA torsion angle distribution extracted from the crystal structure (PDB ID: 1JJ2, 2736 residues). The one-dimensional torsion angle distributions are shown separately for residues with C3′-endo (black) and C2′-endo (blue) sugar conformations. The two-dimensional plot shows the correlation of torsion angles ζi and αi+1 for all residues.


Table 2.2.1 Backbone torsion angles and torsion angle around the glycosidic bond, in degrees (°) [5].

             α     β     γ     δ     ε     ζ     χ
B-DNA       30   136    31   143   141   161    98
C-DNA       37   160    37   157   161   106    97
D-DNA       59   156    64   145   163   131   102
A-DNA       52   175    42    79   148    75   157
A-RNA       69   179    55    82   154    71   161
Z (G)-DNA   52   179   174    95   104    65    59
Z (C)-DNA  140   137    51   138    97    82   154

Each nucleobase possesses a different combination of functionalities (keto, imino, and amino groups) that allow for hydrogen-bonding interactions. These are fundamental for the formation of base pairs that build up secondary and tertiary structures.

2.2.2 DNA Structure

2.2.2.1 B-DNA and Derivatives

DNA molecules need to be structurally flexible for their function. Double-stranded DNA can form several structurally different duplexes: A, B, C, and D. The right-handed B-form DNA (Figure 2.2.3) is the most common and biologically relevant DNA structure. Watson and Crick suggested its model in 1953, with the two antiparallel polynucleotide strands wrapped around the helix axis [6]. The most important feature of the double-helix model is the specific pairing of complementary bases through hydrogen bonds. There are two hydrogen bonds between adenine and thymine and three between guanine and cytosine (Figure 2.2.1c). Nucleobases are turned toward the inside of the duplex and are stacked perpendicular to the helix axis. The repeating unit is one nucleotide. Each nucleobase is turned by 36° around the axis and separated from the next by 3.4 Å [5]. Almost 10.5 base pairs comprise one complete turn with a helix pitch of 34 Å and a helix diameter of 20 Å. The glycosidic torsion angle adopts an anti conformation and the sugar is primarily in a C2′-endo conformation, which makes the intrastrand phosphate–phosphate distance equal to 7.0 Å. The B-DNA structure shows sequence-dependent modifications that influence the helical twist and backbone torsion angles. The backbone of sugars and phosphodiesters runs along the outside of the double helix, and defines a major and a minor groove. Oxygen atoms of

Fig. 2.2.3 Structures of B-DNA, Z-DNA, A-DNA, and A-RNA.


the phosphate groups are negatively charged, and are counterbalanced by water molecules and metal ions. The major groove of B-DNA is almost 2 times wider than the minor groove, but the depths are almost identical [5]. Other molecules can more easily interact with the major groove because purine and pyrimidine nucleobases are more exposed to the solution. The right-handed C-DNA is a member of the B-family and is present at low-humidity conditions. C-form DNA has 9.3 base pairs per turn, base pairs tilted by 8°, a shallower major groove, and a deeper minor groove. D-DNA is another member of the B-family, but its structure has not yet been precisely defined. D-form DNA is formed by purine/pyrimidine AT sequences and has only 8 base pairs per helical turn, with each base rotated by 45° around the axis. D-DNA has a very deep and narrow minor groove, which is a perfect place for binding water molecules and cations.

2.2.2.2 A-DNA

The structure of nucleic acids is highly affected by salt concentration and hydration. At lower relative humidity and higher salt concentration, B-DNA is transformed into the right-handed A-DNA. A-form DNA is shorter, wider, and more conformationally rigid than B-DNA (Figure 2.2.3) [7]. Eleven base pairs are needed for one complete helical turn. The glycosidic torsion angle is anti. Sugar puckers are C3′-endo, which places consecutive phosphate groups closer to each other (5.9 Å). The base pairs are twisted, tilted, and displaced away from the helix axis, generating the deep major groove and shallow minor groove. The A-DNA backbone torsion angles have narrower ranges than those of B-DNA. A-form DNA has important biological functions, and has been found in many protein–DNA complexes and DNA–RNA hybrids.

2.2.2.3 Z-DNA

The left-handed Z-DNA (Figure 2.2.3) was named after the characteristic zigzag course of the phosphate groups, formed by alternating d(CG) sequences under high salt conditions [8]. The repeating unit is composed of two nucleotides (CpG), with purine and pyrimidine nucleosides having different conformations. Deoxyguanosine adopts a syn glycosidic torsion angle and C3′-endo sugar pucker, while deoxycytidine adopts an anti torsion angle and C2′-endo sugar pucker. Twelve base pairs compose one helical turn with a pitch of 45 Å and a helix diameter of 18 Å. The Z-DNA helix does not have a distinct major groove, while the minor groove is deep and narrow. The functional importance of Z-DNA remains debated; however, a role in gene regulation has been assigned to it.

2.2.2.4 Nonstandard DNA Structures

In DNA, a large number of base-pairing schemes are possible besides Watson–Crick hydrogen bonding, which allows DNA molecules to also form higher-order structures, including triplex and quadruplex structures. These nonstandard DNA base pairings include reversed Watson–Crick pairings, a GT wobble pair, Hoogsteen pairings, and reversed Hoogsteen pairings. Mismatched base pairs, such as GA, GT, AA, GG, or AC, usually occur during replication and are repaired by enzymes in the cell. Bulged and looped-out bases in non-B-form DNA are also essential for DNA recognition.

2.2.2.4.1 Circular DNA

Double-helix DNA in plasmids, bacteria, viruses, mitochondria, and chloroplasts forms a covalently closed circle termed circular DNA. DNA in this form can be


underwound (supercoiled) when it contains fewer helical turns than the B-form structure. Underwound DNA is important for many biological processes in which separation of the DNA strands is needed (e.g., replication and transcription or the structural dynamics of nucleosomes).

2.2.2.4.2 Helical Junction

Branch sites appear in DNA during replication or homologous genetic recombination when two DNA duplexes need to cross over. A Holliday junction is a helical junction of cruciform shape at the meeting point of four antiparallel base-paired B-DNA arms.

2.2.2.4.3 Triple Helix

Three polynucleotide strands can form a triple helix with the third strand incorporated in the major groove of a Watson–Crick-bonded DNA duplex. A triplex is composed of base triplets and is stabilized by an arrangement of hydrogen bonds. Figure 2.2.4a shows the crystal structure of a triplex with the third pyrimidine strand bound parallel to the purine strand of the DNA duplex to form TAT and C+GC triplets through Hoogsteen hydrogen bonds [9]. Crystals were grown at low pH because cytosine N3 needs to be protonated to stabilize C+GC triplets. Protonation of cytosines and the length of the triplex are the main obstacles when searching for sites at which transcription of target genes can be inhibited via triplex formation. This antigene approach has been used to downregulate the oncogene c-myc [10]. Triangular DNA structures were used as nanotubes that could be loaded with a cargo that is released when a specific strand of DNA is added [11].

2.2.2.4.4 i-Motif

Under low pH or at high temperature, cytosine-rich DNA sequences can form a four-stranded quadruplex termed the i-motif. One of the two cytosines in the CC+ base pair is protonated at the N3 position, which allows formation of three hydrogen bonds. These mismatch base pairs are formed by antiparallel intercalation of two parallel duplexes.
An example of an intramolecular i-motif structure was found in the human telomeric repeat sequence d(CCCTAA5meCCCTAACCCUAACCCT) (with methylated (5-methylcytosine) C7) [12]. The structure reveals a C3′-endo sugar conformation, three lateral loops, two narrow grooves, and two wide grooves (Figure 2.2.4b).

2.2.2.4.5 Quadruplex DNA

Interest in the quadruplex structure increased in the 1990s after the discovery of guanine-rich repetitive sequences at the ends of chromosomes and their involvement in chromosomal maintenance. G-rich DNA sequences at the telomeric ends can form unusual structures called G-quadruplexes. More than 350 000 distinct sites in the human genome have the potential to form G-quadruplex structures. The formation and decay of specific G-quadruplex structures in promoter regions of the human genome contribute to the regulation of gene expression. Quadruplex DNA molecules are built upon a G-quartet motif (Figure 2.2.4c). A G-quartet consists of four guanine bases held together in a coplanar arrangement by eight Hoogsteen hydrogen bonds. The major requirement for G-quadruplex formation is the presence of cations, which reduce repulsions between guanine carbonyl oxygen atoms and also promote stacking of G-quartets. The phosphate backbones form four grooves of different width and depth. The folding topologies of G-quadruplex structures critically depend on the number of G-rich repeats, sequence details, and the nature as well as the concentration of metal ions present in solution (Figure 2.2.4d). G-quadruplex structures can have parallel and antiparallel strand orientations with mixed anti and syn glycosidic torsion angles. Their features are comparable to those of double-stranded DNA, with similar torsion angles. Figure 2.2.4e–g shows examples of a unimolecular [13], a bimolecular [14], and a quadrimolecular G-quadruplex [15], respectively. G-quadruplexes have the potential to be used as nanomolecular devices [16].
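Sites with the potential to fold into a unimolecular G-quadruplex are usually located by a sequence scan. The Python sketch below uses one common heuristic, four runs of at least three guanines separated by loops of one to seven nucleotides. This pattern is an assumption and is not taken from the text, and the function name `find_putative_g4` is hypothetical. A match indicates sequence potential only; whether a quadruplex actually forms depends on the cations and conditions described above.

```python
import re

# Heuristic (assumed) pattern: four G-runs of >= 3 guanines, loops of 1-7 nt.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_putative_g4(sequence):
    """Return (start, end, match) for non-overlapping hits on the given strand.

    Positions are 0-based and the end index is exclusive.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Sequences written out from the formulas in Figure 2.2.4e-g.
    examples = {
        "d(AG3(TTAG3)3)": "AGGGTTAGGGTTAGGGTTAGGG",
        "d(G3T4G4)": "GGGTTTTGGGG",
        "d(TG4T)": "TGGGGT",
    }
    for name, seq in examples.items():
        print(name, find_putative_g4(seq))
```

Only the first sequence contains four G-runs on a single strand and is therefore reported; the other two can form quadruplexes only by association of two or four strands, which a single-strand scan of this kind does not detect.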


Fig. 2.2.4 (a) Crystal structure of the triplex DNA (PDB ID: 1D3R). (b) NMR structure of the human telomeric i-motif d(CCCTAA5meCCCTAACCCUAACCCT) (PDB ID: 1EL2). (c) Guanine quartet showing the Hoogsteen hydrogen bonds. (d) Various G-quadruplex strand stoichiometries. (e) Crystal structure of the unimolecular d(AG3(TTAG3)3) G-quadruplex in K+ ions (PDB ID: 1KF1). (f) NMR structure of the bimolecular d(G3T4G4) G-quadruplex in Na+ ions (PDB ID: 1U64). (g) Crystal structure of the quadrimolecular d(TG4T) G-quadruplex (PDB ID: 352D).

2.2.3 RNA Structure

Three major classes of RNA can be distinguished based on their structure and functional role: messenger, transfer, and ribosomal RNA. Even though the chemical constitution of RNA does not differ very much from that of DNA, X-ray crystallography and NMR structure determination have revealed a huge variability of RNA folds that is considerably larger than for DNA. The effect of the extra hydroxyl group in the sugar (deoxyribose to ribose) is profound and causes the RNA backbone to adopt a different conformation. The B-form duplex is unfavorable due to steric clashes. An RNA double helix therefore most often resembles A-DNA more closely than B-DNA. In addition, the removal of the methyl group in the pyrimidine base thymine, resulting in a uracil, makes the RNA particularly susceptible to mutations replacing a GC base pair with a GU wobble or an AU base pair. RNAs display the classic Watson–Crick base pairing in the helices, but the more complex parts are defined by a large variety of base–base interactions, such as the cis/trans Hoogsteen pairs and base triplets, and diverse base–sugar arrangements. The RNA backbone is rotameric and analysis of the available three-dimensional structures reveals a variety of possible conformations, while the sugar commonly prefers the C3′-endo and C2′-endo states [4,17,18]. The three-dimensional structure of RNA is primarily directed by the sequence, is influenced by many factors including temperature, solvent, cations, and cofactors (such as other RNAs and proteins), and is often subject to conformational changes. The RNA


Fig. 2.2.5 (a) Some examples of RNA structure elements. The nucleic acid backbone is indicated by a thick line and the bases by thin lines. The sizes of the stems, loops, connections, and bulges are variable. (b) GU wobble base pair. (c) Reverse Hoogsteen AU base pair. (d) cis-Watson–Crick–Hoogsteen GA base pair. (e) CGCH+ base triplet.

readily folds back on itself to form short double-stranded stem and loop structures, frequently stabilized by cations and featuring typical elements such as bulges, hairpin loops, internal loops, junctions, and pseudoknots (Figure 2.2.5).

2.2.3.1 Regular RNA Structure – A-Form Helices

RNA can adopt a rigid helix conformation: the A-form RNA is an 11-fold helix with a deep narrow major groove (Figure 2.2.3). A slightly different 12-fold helix, called A′, can also be formed, in which the major groove is a bit wider, resembling more that of A-DNA [19,20]. Both helices feature the typical C3′-endo sugar conformation and an anti glycosidic torsion angle to the nucleobase. An extensive analysis of RNA structures identified 14 A-form and 18 non-A-form dinucleotide conformational classes [4], of which one, with an anticlockwise rotation of the nucleobases, is A-like as some of the subsequent backbone torsion angles (αi+1 and γi+1) cancel out. A minor, but distinct AII-RNA stacking conformation also has very typical backbone torsion values for the subsequent nucleotide (αi+1 155° instead of 295° and γi+1 180° instead of 55°) to compensate for a so-called crankshaft motion. Some A-form conformational classes show a difference in the first nucleotide; those show a conjoint change of the α and the


γ torsion angles or display the ribose in a C2′-endo conformation. In a few cases an unusual value of the β torsion angle is observed (around 110° or 220° instead of 180°), caused by an atypical rotation around γ.

2.2.3.2 Mismatches, Bulges, and Unusual Base Pairing

Not surprisingly, like proteins and DNA, many RNAs can also fold into complex tertiary structures. Within the RNA tertiary structures, base-pair mismatches often lead to a variety of extrahelical loop regions and bulges (Figure 2.2.5a). These fascinating regions are often subject to detailed structural and dynamical studies as they are frequently involved in diverse regulation mechanisms, among others through loop–loop interactions and binding to other biomolecular targets. Even total structural rearrangements are observed that can be triggered by cations, small ligands, and substrates (as for riboswitches, ribozymes, etc.). The non-Watson–Crick mismatches result in structural elements with interesting features like the formation of sheared base pairs, base triplets, and base wobbles (Figure 2.2.5b–e). For example, as for DNA, diverse stable RNA quadruplex structures can be formed. UG4U forms a parallel-stranded tetramer held together by Hoogsteen-paired G-quartets [21]. A variety of possible quadruplex structures have been identified that serve many different functions in biological systems, such as ligand binding, gene expression regulation, or stabilization of tertiary structure [22].

2.2.3.3 Reversal and Alteration of Strand Direction: Commonly Observed Loop and Turn Motifs

The most frequently observed structural elements in RNA are hairpins, loops, and bulges, which are often identified as potential receptors and therefore ideal drug targets. A broad variety of common structural motifs have been identified and, to describe the subsets of nucleotides in these elements, additional one-letter codes are used: R (for purines: A,G), Y (for pyrimidines: C,U), M (C,A), K (U,G), W (U,A), and S (C,G) for the double subsets; B, D, H, and V for the triple subsets, which exclude A, C, G, and U, respectively; and N for any ribonucleoside.

2.2.3.3.1 U-Turn

A well-characterized sharp reversal of the RNA backbone is frequently observed after a single-stranded uridine base (U-turn). The U-turn is composed of a UNR sequence pattern flanked by a pyrimidine [23,24] and characterized by two typical hydrogen bonds from the U-sugar to the R-base and the U-base to the phosphate backbone following the sequence (Figure 2.2.6a). One predominant member of this motif is the GNRA-tetraloop-type U-turn [25]. Some of the features observed in U-turns are also observed in other motifs, such as the T-loop [26].

2.2.3.3.2 K-Turn

A very sharp tight kink is introduced into the axis of helical RNA by a so-called kink-turn motif (K-turn) comprising a three-nucleotide bulge that is flanked by conserved sheared AG (A Hoogsteen edge to G sugar edge) base pairs on the 3′ side (Figure 2.2.6b) [27,28]. In particular, the first AG base pair is strongly buckled. This conserved element is stabilized by a number of critical hydrogen bonds and commonly requires the presence of divalent or monovalent metal ions or interactions with proteins for proper folding [29–32]. In the absence of cations a more extended bulge conformation is formed [33].

2.2.3.3.3 C-Loop

A common asymmetric internal loop motif in which the first nucleobase is usually a cytosine (therefore called a C-loop) causes an increase of the helical twist of the stem


Fig. 2.2.6 (a) U-turn RNA motif of the crystal structure of the 58 nucleotide ribosomal RNA fragment (PDB ID: 1HC8). (b) K-turn RNA motif of the Haloarcula marismortui large ribosomal subunit (PDB ID: 1JJ2). (c) C-loop RNA motif of the Escherichia coli threonyl-tRNA synthetase mRNA (PDB ID: 1KOG). (d) Kissing loop RNA motif of the Vibrio vulnificus A-riboswitch with U34A65C61–G37 and A33A66C60–G38 tetrads (PDB ID: 1Y26). (e) The UUCG tetraloop RNA motif and NMR restraints statistics of the 14mer cUUCGg tetraloop RNA structure (PDB ID: 2KOC).

between the two Watson–Crick base pairs flanking the motif (Figure 2.2.6c) [28,34]. This motif is characterized by the formation of two typical crosswise non-Watson–Crick base pairs and stacking interactions between both strands. Extruded nucleobases from the shorter strand are usually involved in tertiary interactions. In some C-like motifs, the high twist angle between the flanking base pairs is maintained, although these motifs lack the usual C residue and the second base pair [28].

2.2.3.3.4 E-Loop

The E-loop is another frequently observed structure motif characterized by an asymmetrical internal loop, generally with AGUA on one side and RAM on the other, and involved in cross-strand base pairing [35–39]. E-loop motifs can also be symmetric and tolerate higher variations of bases than other motifs as identified by RNA motif scans [40].

2.2.3.4 Tetraloops and Tetraloop–Receptor Contact

Stem–loop motifs show a wide variety in size and sequence, yet the four-nucleotide tetraloops are frequently found capping RNA double helices. A wealth of structural and dynamic data is available for many different tetraloop sequences, displaying their characteristic stabilizing hydrogen bond networks. Two well-conserved stable motifs are commonly observed: the GNRA and the YNMG (comprising the UNCG and, more rarely, the CUYG) tetraloops (Figure 2.2.6e) [41–43]. In each of these motifs, the turn in the RNA strand is formed between the second and third residue, which frequently present a C2′-endo sugar conformation. The tetraloops are generally stabilized by characteristic unusual base pairings (such as syn–anti, side-by-side, and wobble) between the first and fourth nucleotide, which differ between the two motifs. The stability of the loops is also dependent on the composition of the closing base pairs [44].
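The one-letter subset codes introduced above lend themselves to a simple sequence scan for consensus patterns such as GNRA or UNCG. The following Python fragment is a minimal sketch of that idea: the dictionary follows the code definitions given in the text, while the function names and the test sequences are hypothetical and chosen only for illustration. A sequence match merely flags a candidate; it says nothing about whether the stretch actually folds into the corresponding loop.

```python
import re

# One-letter nucleotide subset codes as defined in the text
# (B, D, H, and V exclude A, C, G, and U, respectively).
SUBSET_CODES = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "M": "AC", "K": "GU", "W": "AU", "S": "CG",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG",
    "N": "ACGU",
}


def motif_to_regex(motif):
    """Translate a consensus such as 'GNRA' into a regular expression string."""
    return "".join("[" + SUBSET_CODES[code] + "]" for code in motif.upper())


def find_motif(sequence, motif):
    """Return (start index, matched substring) for every occurrence of the
    consensus in the sequence; overlapping matches are included."""
    # A lookahead with a capture group lets the scan report overlapping hits.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical example sequences, not taken from the text.
print(find_motif("GGCACUUCGGUGCC", "UNCG"))
print(find_motif("CCGAAAGG", "GNRA"))
```

Translating the codes into character classes keeps the scan independent of any particular motif, so other consensus patterns mentioned in this section (UNR, WGNN, GANC) can be searched with the same two functions.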


In a less common WGNN tetraloop motif, the turn occurs at the 3′ side of the universal syn G residue, which is strongly exposed [45]. This well-defined loop conformation is suggested to recognize double-stranded RNA substrates. Another unusual GANC tetraloop consensus sequence is found in group IIC introns and forms a U-turn-like sharp angle [46]. The second through fourth nucleotides in this motif stack on one another and the first and fourth form a cis–trans sugar edge/Watson–Crick base pair. Long-range tertiary hydrogen bonding contacts between a tetraloop motif and a receptor motif combine, and add additional base pairing and stacking interactions, forming a stable global tetraloop–receptor RNA fold [47,48]. Typically, the second exposed nucleotide of the tetraloop stacks and forms additional hydrogen bonds to nucleotides within the receptor.

2.2.3.5 Higher-Order RNA Tertiary Structure Elements: Coaxial Stacking Motifs

The most fundamental way for RNA to achieve higher-order organization is the coaxial stacking of bases [49]. The structural organization of junctions in which the helices come together frequently involves metal ion binding. This is clearly observed in the structure of the hammerhead ribozyme catalytic center, where one helix is oriented towards two coaxially stacked helices by tertiary contacts and specifically bound magnesium ions [50–53]. Coaxial stacking between motifs in the organization of the RNA structure is also evident in the structures of the P4–P6 domain and the hepatitis delta ribozyme, both having two sets of coaxially stacked helices that are packed against one another [54]. In the case of riboswitches, the tertiary structure can be stabilized by ligand binding [55]. Riboswitches have two key domains – an aptamer for ligand binding and an expression platform. The two parallel helices originating from the three-way junction ligand-binding pocket are dynamically linked together by a number of hydrogen bonds in their loops in order to regulate gene expression. Prominent coaxial stacking motifs in large RNA molecules are the kissing loop interaction and the pseudoknot (Figure 2.2.5). In a kissing loop (Figure 2.2.6d), the single-stranded loop regions of two hairpins interact through base pairing, forming a composite coaxially stacked helix [56,57]. The pseudoknot is a common feature that is always defined by two stems and by two or three loop regions [58,59]. This complex structural motif occurs when a loop pairs with a complementary sequence outside the loop. A number of very detailed structures have been determined by solution NMR and X-ray crystallography, showing important roles for base triplets and bound water molecules [60–62]. The hepatitis delta virus manifests a complex five-helical arrangement forming a double pseudoknot [63,64].

2.2.3.6 DNA–RNA Hybrids

In addition to structures of solely RNA and DNA, hybrid combinations can be formed [65]. These DNA–RNA hybrids are frequently involved in essential biological processes, such as (reverse) DNA/RNA transcription. These modified hybrid systems are therefore very suitable for pharmaceutical drug design purposes, specifically targeting messenger or viral RNA. The hybrids are unique targets for binding small molecular ligands [66]. The structures of RNA–DNA duplexes, hybrids, and variations have been studied extensively, and were generally found to vary in stability [67,68] and to exhibit characteristic features that are not common in DNA or RNA alone. The actual helical conformation of an RNA–DNA hybrid duplex can be A-form, B-form, or in between, depending upon the relative humidity [69–72]. Even the incorporation of a single ribonucleotide can already drive the typical B-form DNA duplex to an A-form [73,74]. NMR studies revealed that the DNA strand generally exhibits a character in between A- and B-form and adopts a C2′-endo sugar conformation, while the RNA has an


A-type character and adopts the C3′-endo conformation [75–79]. An interesting example of a DNA–RNA hybrid is the 10–23 DNA enzyme, which consists of a DNA core with two variable long arms that selectively bind to the complementary RNA target. It can potentially be used to downregulate protein expression by cleaving messenger RNA. Its structure is still unknown, as X-ray crystallography experiments have only elucidated a novel fold of a dimeric inactive complex [80].


3 What Can be Learned About the Structure and Dynamics of Biomolecules from NMR

3.1 Proteins Studied by NMR

Lucio Ferella, Antonio Rosato, and Paola Turano

NMR can provide structural and dynamic information on proteins. The three-dimensional structure of a protein determined from NMR data consists of an ensemble (bundle) of different conformers that equally satisfy the experimental restraints. The precision of the bundle is described quantitatively by the root mean square deviation (RMSD) of the atomic coordinates. The RMSD can be computed by including all of the atoms of the protein or only a selection of them; the RMSD of the backbone atoms of ordered residues is the most commonly reported parameter. Protein dynamics can affect RMSD values. At the per-residue level, the motions of the proteins can reduce the local density of experimental restraints, leading to elevated (with respect to the rest of the protein) backbone RMSD values. However, the reverse is not true – locally high RMSD values do not necessarily indicate the presence of enhanced dynamics, but can simply be a consequence of restraint inconsistency. Protein dynamics takes place on a variety of different timescales, which can be sampled by different types of NMR measurements. The measurement of nuclear relaxation rates is a particularly relevant and common approach to sampling some of these timescales. NMR has provided ample evidence of the importance of protein dynamics for processes such as catalysis and molecular recognition. Finally, NMR experiments provide an easy, fast, and very convenient means of investigating intermolecular interactions. Chemical shift mapping is the most widely used approach for this kind of application.

3.1.1 Why NMR Structures?

The first NMR solution structure of a biological macromolecule was solved by K. Wüthrich and coworkers in 1984 for proteinase inhibitor IIA from bull seminal plasma – a protein containing 57 amino acids [1]. The value of the NMR approach to protein structure determination was especially confirmed when the NMR structure in solution of rat metallothionein-2 revealed that the available crystal structure of the same protein needed revision [2]. Since these pioneering papers, nearly 9000 NMR structures of proteins, nucleic acids and adducts thereof have been deposited in the Protein Data Bank (PDB), which amounts to about 12% of the total number of structures (Figure 3.1.1). In addition to structural information, NMR provides information about the dynamics of proteins and nucleic acids. It is this unique combination of structure and dynamics that makes NMR a special technique, in

NMR of Biomolecules: Towards Mechanistic Systems Biology, First Edition. Edited by Ivano Bertini, Kathleen S. McGreevy, and Giacomo Parigi. © 2012 Wiley-VCH Verlag GmbH & Co. KGaA. Published 2012 by Wiley-VCH Verlag GmbH & Co. KGaA.


Fig. 3.1.1 Pie chart representation of PDB data. Sectors in the main circle represent the relative proportion of structures solved with the three main structural techniques (X-ray, NMR, and cryo-electron microscopy). For each technique, a pie chart is reported and divided in slices that represent the distribution of the different types of biological systems that have been structurally characterized: isolated proteins, isolated nucleic acids, and macromolecular complexes (mainly protein–protein and protein–nucleic acid complexes).

Fig. 3.1.2 Bar chart of the molecular weight distribution of structures solved by NMR, X-ray, and cryo-electron microscopy. The total number of structures solved by each technique is equated to 100 and the frequencies of the various molecular weight classes are reported proportionally. The preferential use of NMR for low-molecular-weight systems and of cryo-electron microscopy for very large assemblies clearly emerges.

particular for the analysis of systems and events that are inherently dynamic, such as intrinsically unfolded proteins or macromolecular recognition. Currently, X-ray diffraction studies of crystallized molecules and high-resolution NMR spectroscopy in solution are the two primary experimental methods providing structural information on biological macromolecules at the atomic level (Figure 3.1.1). For both techniques, structural studies have been carried out mainly for proteins (92.5% of PDB entries contain at least one protein chain). It is thus apparent that both methods can reliably produce high-quality structures for a wide variety of proteins targeted in structural biology and structural proteomics projects. Nevertheless, each has its own technical limits and barriers. In particular, NMR methods for determining high-resolution structures are generally confined to smaller (less than 25 kDa) proteins (Figure 3.1.2), whereas X-ray crystallography requires crystals yielding


diffraction data of sufficient quality. Cryo-electron microscopy is gaining importance as a means to provide structures of extremely large biological assemblies such as the ribosome, microtubules, and viruses; advances in methodology have steadily decreased the minimum size for which cryo-electron microscopy is routinely applicable, which is currently around 200 kDa. Cryo-electron microscopy usually yields lower resolution than X-ray crystallography, but is more likely to provide biologically relevant conformations due to the absence of crystal lattice constraints. The different approaches can be combined; structures of individual protein components are determined at atomic resolution by X-ray crystallography (or NMR) and then docked into a cryo-electron microscopy map to assemble a pseudoatomic resolution structural model of the whole complex and to detect protein contact interfaces. As a minor remark, it should be noted that NMR restraints such as distance restraints often explicitly include the positions of hydrogen atoms and therefore these positions are reported in the PDB coordinate files. By contrast, X-ray structures do not generally include hydrogen atoms in atomic coordinate files, because the heavy atoms dominate the diffraction pattern and the hydrogen atoms are not explicitly seen. Sample preparation for both NMR and X-ray crystallography generally requires that the protein of interest is homogeneous, stable over days or even weeks, soluble enough, and not irreversibly aggregated at high concentrations. Even when these conditions are met, one cannot be certain that the samples will produce well-diffracting crystals or will give NMR spectra of high enough quality for structure determination.
As the broad sample requirements for the two techniques are similar, it is not surprising that NMR spectroscopy has been used to rapidly screen protein samples in order to identify those that have the greatest potential for successful crystallization, with some positive outcomes. For example, it has been shown that the analysis of the ¹H one-dimensional NMR spectra allows proteins to be categorized into groups that produce crystals with a similar success rate, but differ significantly in the quality of the diffraction pattern that these crystals provide [3]. Indeed, the crystals obtained from the proteins whose spectra have higher chemical shift dispersion (indicative of the presence of a defined tertiary structure) and narrower signal linewidth (ruling out conformational equilibria) on average diffract to higher resolution. The use of ¹H–¹⁵N heteronuclear single-quantum coherence (HSQC) spectra, where a single peak for each backbone amide is observed in high-resolution and high-sensitivity two-dimensional maps of ¹⁵N-labeled samples, is made possible by the relatively low cost of ¹⁵N enrichment and represents a valid screening criterion in high-throughput structural genomics studies. It has been shown that the analysis of ¹H–¹⁵N two-dimensional HSQC spectra of protein constructs enables the optimization of domain crystallization by guiding the rational design of the constructs before initiating crystallization trials [4]. Although there is clearly some correlation between NMR spectral quality and the likelihood of obtaining well-diffracting crystals, this correlation is far from stringent. In other words, not all proteins with excellent or good ¹H–¹⁵N two-dimensional HSQC spectra can be expected to perform well in crystallization trials. In a study of 159 proteins, the percentage of high-resolution X-ray structures that resulted from samples with excellent or good ¹H–¹⁵N two-dimensional HSQC spectra was only about 20% [5].
Interestingly, 33% of the proteins with lower quality, promising ¹H–¹⁵N two-dimensional HSQC spectra could be successfully subjected to structure determination by X-ray. As ¹H–¹⁵N two-dimensional HSQC spectra are, in general, very good indicators of the feasibility of structure determination by NMR, these data altogether indicate that protein samples that are feasible for structure determination by NMR may not be equally suited for X-ray structure determination, even though they meet the requirements of high solubility and monodispersion. In another study, it was shown that only 21 proteins out of 263 (8%) were deemed amenable to structure determination by both NMR and X-ray based on initial ¹H–¹⁵N two-dimensional HSQC spectra and optimized crystallization trials [6]. However, the use of both methods in parallel increased the total number of targets amenable to structure determination to 107 (41%), with 43


Fig. 3.1.3 ¹H–¹⁵N HSQC spectra of proteins with different degrees of folding. (a) Spectrum of a fully folded protein: all the peaks are sharp, intense, and distributed over a large chemical shift range; the number of resonances attributable to backbone amides matches the number of protein residues. (b) In a partially unfolded protein the chemical shift spreading of the peaks in the ¹H dimension is reduced; the most intense peaks are broad and collapsed in a narrow region centered around 8.5 ppm, typical for random-coil structures. Peaks outside this range are weak. (c) The backbone amide resonances of an unfolded protein are all collapsed in a narrow range centered at about 8.3 ppm in the ¹H chemical shift dimension. Peak pairs corresponding to the side-chain NH2 groups of Gln and Asn are indicated by horizontal lines: their resolution also decreases with an increasing degree of unfolding.

amenable to NMR only and 43 amenable to X-ray crystallographic methods only. These studies highlight the strong complementarity of NMR and X-ray in terms of the chances of enabling the high-resolution structure determination of a given protein sample. One of the possible reasons for this behavior can be the presence of highly ﬂexible or even unstructured regions within the protein construct. These regions may not affect greatly the quality of NMR spectra, but can be extremely adverse to crystallization. If the unstructured regions are at the protein termini, it may be possible to remove them through redesign of the protein construct; however, this is often impossible if the unstructured regions are, for example, located in a long loop in between secondary structure elements. A simple quantitative evaluation of the HSQC spectrum quality (Figure 3.1.3) can easily be implemented by counting the number of well-detectable (i.e., intense and well-resolved) amide peaks and comparing it with the number of non-Pro residues contained in the protein sequence; this number provides an estimate of the number of residues in folded protein regions (Figure 3.1.3a). Weak minor peaks and broad peaks collapsed in the spectrum center arise from protein regions lacking a deﬁned three-dimensional structure (Figure 3.1.3b and c). On the other hand, a number of amide resonances larger than expected indicates the presence of multiple conformations, which interconvert slowly on the NMR chemical shift timescale. The ﬁndings outlined in the preceding paragraph strongly suggest that there is no a priori scientiﬁc reason to prefer one or the other structural technique for the characterization of a protein target. From the practical point of view it must be pointed out that NMR structure determination of proteins has been supported by automation to a lesser degree than protein structure determination by X-ray. 
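The peak-counting evaluation of HSQC spectrum quality described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example sequence are hypothetical, the number of well-detectable peaks is taken as a user-supplied count rather than derived from a spectrum, and the option to skip the N-terminal residue (whose amide is commonly not observed) is an addition that is not part of the procedure given in the text.

```python
def expected_amide_peaks(sequence, skip_n_terminus=False):
    """Number of backbone amide peaks expected in a 1H-15N HSQC spectrum,
    i.e. the number of non-Pro residues in the sequence.

    skip_n_terminus is an optional refinement (an assumption, not from the
    text) that leaves the first residue out of the count."""
    residues = sequence.upper()
    if skip_n_terminus:
        residues = residues[1:]
    return sum(1 for aa in residues if aa != "P")


def observed_to_expected_ratio(well_detectable_peaks, sequence,
                               skip_n_terminus=False):
    """Ratio of well-detectable amide peaks to expected peaks.

    A ratio close to 1 suggests that most residues sit in folded regions;
    a ratio clearly below 1 points to regions lacking a defined structure;
    a ratio above 1 hints at multiple slowly interconverting conformations."""
    expected = expected_amide_peaks(sequence, skip_n_terminus)
    if expected == 0:
        raise ValueError("sequence contains no non-Pro residues")
    return well_detectable_peaks / expected


# Hypothetical seven-residue example: five non-Pro residues, four sharp peaks.
print(expected_amide_peaks("MKPLAPG"))
print(observed_to_expected_ratio(4, "MKPLAPG"))
```

No cutoff values are built in, because the text does not specify any; how far the ratio may deviate from 1 before a sample is set aside remains a judgment for the experimenter.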
Advancements in the development and use of automated programs for NMR are reviewed in Chapters 4 and 33. In addition, NMR requires isotopic enrichment of the protein samples, with the degree and type of labeling being different for different protein sizes (see Chapter 4). As a distinct advantage of NMR, various reports have demonstrated that the conformation observed in X-ray structures is not necessarily the dominant one in solution, which is typically closer to the physiological state of soluble proteins. For example, in the histidine-containing phosphocarrier protein (HPr) from Enterococcus faecalis, the strained conformation of a key active-loop residue observed in the X-ray structure, which was originally proposed to be important for catalysis, was subsequently shown by NMR to be at most a minority conformation in solution [7]. A likely reason for this kind of behavior is that when more than one conformational state exists in solution the crystallization procedure will select only one of them (the most crystal-prone conformation). Consistently, different conformations of some ﬂexible regions of matrix metalloproteinase-12 have been observed as frozen in different crystalline environments, while the mobility in solution studied by NMR revealed conformational equilibria on various timescales [8]. As already mentioned, a main limitation of solution-state NMR is the size limit of the proteins for which de novo structure determination can be successfully tackled (Figure 3.1.2). The physical reason for this limit is the fact that slower molecular tumbling in solution increases the linewidths of the NMR signals, possibly preventing their detection (see Chapter 4 for a more detailed discussion). At the same time, the increased number of residues in large proteins causes severe spectral overlap, making spectral assignments difﬁcult. 
Pushing the size limit of solution-state NMR is one of the present frontiers for researchers interested in the advancement of NMR methodology [9–13]. Therefore, X-ray crystallography is the technique of choice to structurally characterize large proteins or macromolecular complexes at atomic resolution; systems that are even larger (or cannot be crystallized) can be investigated through cryo-electron microscopy (Figure 3.1.2). X-ray and NMR can be synergistically combined with the latter technique, which has lower structural resolution, by providing atomic-level information for individual components of macromolecular assemblies whose overall shape has been determined by cryo-electron microscopy [14]. The size limit in NMR imposed by the slow molecular tumbling instead does not apply to solid-state NMR experiments, due to the fact that each protein molecule is not


rotating, but is blocked in a fixed orientation (see Chapter 20). Therefore, in the last few years there has been a significant effort by a number of research teams worldwide to develop new methodologies for protein structure determination by solid-state NMR. These can be applied to many different sample types, including, among others, soluble proteins in the microcrystalline state, protein fibrils, and membrane proteins embedded in lipid bilayers (note that in favorable cases it is possible to study membrane proteins by solution NMR methods, such as upon reconstitution of the system in micelles formed by detergent molecules) [15]. Microcrystals of soluble proteins are somewhat easier to produce than high-quality crystals suitable for X-ray structure determination, thereby justifying the interest in solid-state NMR as an alternative means for the structural investigation of large proteins. However, as the technique is still in the phase of burgeoning methodological development, there are no standardized or generally accepted protocols available so far. Several proof-of-principle cases have appeared in the literature (see Chapter 22). A greater success story for solid-state NMR is probably in the investigation of protein fibrils (see Chapter 23), where this technique has proved crucial and unique to obtain detailed atomic-level information on the packing of fibrils [16]. Membrane proteins have been the subject of solid-state NMR investigations for more than three decades (see Chapter 24) [17]. Membrane proteins are also systems for which successful crystallization is hard to achieve. The pioneering studies on bacteriorhodopsin and gramicidin demonstrated that solid-state NMR permitted the study of molecules for which other structural methods would fail. More recently, solid-state NMR has been applied to a wide range of diverse proteins, yielding extensive characterization of their structures as well as of the structural changes associated with protein function.
For example, after light activation, retinal proteins shuttle through various states associated with conformational changes, both in the protein and in the chromophore. They have been characterized for rhodopsin. Another system extensively studied by solid-state NMR is the M2 proton channel, which has been investigated both in terms of the channel structure and function as well as ligand binding. Indeed, this technique has uniquely allowed researchers to study ligand (from drugs to toxins) binding to functional membrane proteins (see also Section 3.1.4 and Chapter 24).


3.1.2 NMR Bundle

As is described in detail in Chapter 4, the determination of the solution structure of a protein by NMR involves the following steps (Figure 3.1.4): acquisition of experimental data, assignment of resonances to atoms in the molecular structure, assignment of observed interactions to atom pairs, and translation of these interactions into

Fig. 3.1.4 Key steps that guide the iterative process of NMR structure determination. The inner cycle is generally repeated several times to correct possible errors in assignment or restraint evaluation and to enable feedback mechanisms such as the use of preliminary protein structures obtained in early cycles to supplement the initial resonance assignments and the initial collection of structural restraints. When a satisfactory number of conformers sharing good consistency with experimental data is achieved, the cycle branches off to validation/quality checks. The dashed line indicates the possible need for the further analysis of experimental data and structural restraints to improve structure quality.


structural restraints, such as distance restraints and orientational restraints; further restraints can be derived from chemical shift information. In addition, for paramagnetic metal-containing proteins, restraints can be obtained by exploiting the interaction of nuclei with the unpaired electron spin density of the metal (Chapter 8). An initial set (typically called a bundle) of conformers is calculated using all the possible restraints. The bundle is then manually or automatically analyzed to identify errors in the restraints and/or assignments, and the whole procedure is iterated until a satisfactory number of consistent (i.e., not conﬂicting with one another in the structure calculation process) restraints have been assembled. Even with the ﬁnal set of restraints, the calculation procedure does not yield a single set of atomic coordinates, but results in a bundle of conformers, of arbitrary size (usually 20–40), which are all equally consistent with the available restraints. In other words, an NMR structure is represented by a bundle of 20–40 different conformers (Figure 3.1.5), each of which is an independent full set of atomic coordinates, rather than, as happens with X-rays, by an individual conformer. This is due to the fact that the large majority of structural restraints, such as distances, are input to the structure generation programs in the form of intervals of allowed values, such as upper and lower distance limits (the latter often being taken as the sum of the van der Waals radii of the involved atoms). The need to use ranges instead of exact values for restraints stems from the fact that the NMR observables are averaged over all the conformations spanned by the polypeptide chain in solution on timescales that, depending also on the exact nature of the interaction measured, can be as long as hundreds of microseconds. 
Consequently, each restraint derived from the NMR data is the result of an average over the values that the corresponding structural parameter has in each individual conformation sampled in solution. This average is not necessarily structurally meaningful, in that there is no guarantee that it corresponds to the real value assumed in any of the sampled conformations. Notably, in consideration of the fact that all biological macromolecules are inherently flexible in solution, and that this flexibility is often of physiological and functional relevance, it has been proposed that X-ray structures would also be better described as ensembles of different conformations [18]. In addition to the above considerations on protein mobility, it is also important to keep in mind that the structural restraints are generally derived from NMR observables using approximate or heuristic equations and that the error propagation from the NMR measurement to the structural restraint usually cannot be evaluated (indeed, often the error on the measurement itself is difficult to assess or even unknown). Altogether, as already mentioned, these characteristics make the use of restraint ranges the easiest possible computational approach. The above considerations are particularly true for distance restraints, which has stimulated researchers to develop alternative restraints. In this sense, the most plausible alternatives are orientational restraints (namely residual dipolar couplings, RDCs), which can in principle be used as the sole source of information for structural determination [19,20], as discussed in detail in Chapter 4. In spite of the complexity of the problem, there have been various attempts to use relatively sophisticated approaches to NMR data analysis aimed at extracting more careful estimates of upper and lower distance limits [21,22].
However, from the practical point of view these approaches require a considerable effort (with respect to what is required by conventional approaches) towards the very careful estimation of the input NMR parameters, largely preventing the application of automated methods, or alternatively require speciﬁcally designed experiments to be carried out, possibly leading to a signiﬁcant increase of the instrument time needed.
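Two points made above, namely that an averaged observable need not correspond to any conformation actually sampled and that restraints are therefore handled as intervals, can be illustrated with a small numerical sketch. The Python fragment below assumes the commonly used r⁻⁶ distance dependence of the NOE, which is not stated explicitly in the text; the function names, the two distances, and the 1.8–5.0 Å interval are hypothetical values chosen for illustration only.

```python
def noe_effective_distance(distances, populations=None):
    """Apparent distance obtained when an r**-6 dependent observable is
    averaged over several conformations (assumed model: <r**-6>**(-1/6))."""
    if populations is None:
        populations = [1.0] * len(distances)  # equally populated states
    total = float(sum(populations))
    mean_inv6 = sum(p / total * float(d) ** -6
                    for d, p in zip(distances, populations))
    return mean_inv6 ** (-1.0 / 6.0)


def restraint_violation(distance, lower, upper):
    """Distance by which a value falls outside the allowed [lower, upper]
    interval; 0.0 means the restraint is satisfied."""
    if distance < lower:
        return lower - distance
    if distance > upper:
        return distance - upper
    return 0.0


# Hypothetical proton pair that is 2.5 A apart in one state, 6.0 A in another.
apparent = noe_effective_distance([2.5, 6.0])
print(round(apparent, 2))                        # strongly biased to 2.5 A
print(restraint_violation(apparent, 1.8, 5.0))   # averaged value fits
print(restraint_violation(6.0, 1.8, 5.0))        # the 6.0 A state alone does not
```

In this example the apparent distance lies close to the shorter of the two values and matches neither state, which is one reason why an interval, rather than a single exact distance, is the more defensible input for structure calculation.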

Fig. 3.1.5 Bundle of 30 conformers resulting from the solution structure determination of the soluble domain of a copper-transporting protein (PDB ID: 1YJR). The bundle is graphically presented as the best superimposition of the backbone atoms of the structures forming it; side-chains are not shown for the sake of clarity. Regular secondary structures are visualized using different colors (red for α-helices and blue for β-strands). In (a) and (b) each single component is represented by a separate line. (a) A smooth continuous line is used with a cartoon arrow representation of the C-terminal part of β-sheets to emphasize the presence of such structural elements. (b) A broken line connecting the Cα atoms of consecutive residues is used to represent the protein backbone, thus permitting every amino acid to be picked up. (c) The variable precision of the bundle along the protein chain is displayed by a smooth tube with varying radius, so that local structural uncertainty is indicated by a wide tube, while a thin tube indicates the high local precision of the structure.


As an NMR structure is represented by a bundle of conformers equally consistent with the data, all subsequent analyses must be run over all the members of the bundle. Therefore, the parameters derived from the analysis of an NMR structure should be reported as an average ± standard deviation over the bundle. Some typical properties under consideration here are solvent accessible area, cavity volumes, atom–atom distances, and so on. The precision of the NMR bundle itself is clearly a parameter of great interest, as it determines the uncertainty of the various derived parameters. The precision of an observation is generally understood as the measure of reproducibility of repeated measurements and can be expressed as the variance of the measured value around its average value. For NMR structures, the quantity of interest is the uncertainty in the molecular coordinates (in Cartesian space), which is described using the coordinate root mean square deviation (RMSD) and is normally taken as the precision of the structure. For two conformers (A and B), the corresponding equation is:

\[
\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\left(x_i^{A}-x_i^{B}\right)^{2}+\left(y_i^{A}-y_i^{B}\right)^{2}+\left(z_i^{A}-z_i^{B}\right)^{2}\right]} \tag{3.1.1}
\]

which must be computed after structure superposition; N is the number of corresponding atoms included in the comparison. In practice, the summation is computed over the heavy atoms (i.e., all atoms but hydrogen) of the entire protein or of the protein backbone only (referred to as the heavy-atom RMSD and the backbone RMSD, respectively). For a bundle of NMR conformers, the current practice is to report the average value of the RMSD between each conformer and the average conformer. The latter corresponds to the set of coordinates generated by averaging the coordinates of each atom over all conformers in the bundle; the average conformer does not correspond to a physically acceptable conformation. Sometimes when an NMR structure is deposited in the PDB, there will be separate entries for both the bundle and the energy-minimized average conformer, namely a mean conformation calculated from the ensemble and then subjected to energy minimization to restore reasonable stereochemical properties. This is a way of generating a single representative structure, as it is much easier to visualize structural features from the energy-minimized average than from the bundle. However, for highly disordered regions, a minimized average will not be informative and may even be misleading. Another approach for defining a single representative structure is to identify the conformer in the bundle that is closest (i.e., has the smallest RMSD) to the average structure; this approach has the advantage of avoiding the intermediate generation of a highly distorted conformation, as happens with the calculation of the energy-minimized average conformer. In this case, the PDB page reports which of the conformers within the bundle is the representative one (as indicated by the authors). The RMSD is a parameter describing the global precision of the structure, but can also be used to indicate the local precision (e.g., at the per-residue level).
This is done by computing the summation only over the atoms of each individual protein residue. An example is shown in Figure 3.1.6b. It is nearly always found that the per-residue RMSD is quite variable along the protein sequence, with local maxima that are typically located at the protein termini and in loop regions connecting elements of secondary structure (which instead normally contain the local minima). This general profile has several causes: (i) the extent and diversity of protein conformations sampled by each residue are greater in regions lacking secondary structure; (ii) the local density of structural restraints depends on the structural features around each amino acid, so that densely packed core residues are more restrained than surface-exposed or terminal residues (Figure 3.1.6a); and (iii) unfavorable protein motional regimes in a sequence region can quench, partly or totally, internuclear interactions, making nuclear Overhauser effect (NOE) observables undetectable and therefore preventing the determination of structural restraints for the amino acids of that region. The profile of the per-residue RMSD is therefore the result of a combination of the dynamic and structural


Fig. 3.1.6 RMSD values and the number of distance restraints per residue for the NMR bundle of Figure 3.1.5. (a) Intraresidue, sequential, medium-range, and long-range distance restraints for each residue are indicated by white, light gray, dark gray, and black bars, respectively. Only the restraints used in the final calculation have been included. (b) The per-residue RMSD values to the mean conformer are displayed for the backbone (filled squares) and all heavy atoms (open circles). The correlation between the local number of restraints and the RMSD value is clear (e.g., at the termini or in the region around residue 15).

properties of the protein. It must not be taken as a measure of local protein dynamics, which should instead be experimentally characterized through specific experiments (see Section 3.1.4). Another frequent misconception is that the NMR bundle might represent the true conformational space sampled by the structure. This is simply wrong, for at least two reasons. (i) As already commented, the number of restraints always affects the local structural variability: if the restraints are insufficient to define the local conformation, then the observed variability is simply a random sampling of the entire theoretically available conformational space and not a description of the conformational space actually sampled by the polypeptide chain. (ii) As the restraints, particularly distance and dihedral angle restraints, are provided as ranges of allowed values and the structure calculation programs use simplified force fields, the distribution of conformations in the bundle is not energetically meaningful. A further element in this context is the fact that current NMR structure determination protocols implicitly tend to minimize the global RMSD of the final bundle. This often leads to overestimating the precision of the structure with respect to what would be allowed by the restraints [23]. In other words, the NMR bundles frequently describe only a subset of the conformational space that would be consistent with the experimental restraints. To conclude this discussion on what the RMSD is not, it is important to stress that the RMSD does not reflect at any level the accuracy of the structure. As is described in detail in Chapter 4, the accuracy of NMR structures must be evaluated through specific parameters, which describe the level of agreement with either the restraints or standard stereochemical parameter distributions.
As both the global and the per-residue RMSD depend on the quality of the superposition of the conformers of the bundle, it is sensible to include only the protein regions that are precisely defined by NMR data in the superposition and, consequently, in the global RMSD calculations. In practice, one wants to include as many amino acids as possible in the global RMSD computation, while avoiding the inclusion of structure regions whose conformation is largely the result of uncontrolled computational factors arising during the structure determination process rather than being determined by the data. This choice can be made manually, such as by including only secondary structure elements or well-defined regions identified by visual inspection, or through one of the various algorithms available in the literature. For example, Snyder and Montelione have defined a structural order parameter that can be used to identify a set of core atoms for RMSD calculation [24].
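The bundle statistics discussed above can be illustrated with a short sketch. It assumes that the conformers have already been superimposed and that each conformer is supplied as a list of (x, y, z) tuples in the same atom order; the function names, the toy coordinates, and the residue-to-atom mapping are hypothetical and serve only to show how Equation 3.1.1, the average conformer, the representative conformer, and the per-residue RMSD relate to one another.

```python
import math

def rmsd(conf_a, conf_b):
    """Coordinate RMSD (Equation 3.1.1) between two superimposed conformers."""
    if not conf_a or len(conf_a) != len(conf_b):
        raise ValueError("conformers must contain the same, non-zero number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(conf_a, conf_b)
    )
    return math.sqrt(squared / len(conf_a))

def mean_conformer(bundle):
    """Average the coordinates of each atom over all conformers of the bundle."""
    n_conf = len(bundle)
    n_atoms = len(bundle[0])
    return [
        tuple(sum(conf[i][k] for conf in bundle) / n_conf for k in range(3))
        for i in range(n_atoms)
    ]

def bundle_precision(bundle):
    """Return (mean RMSD to the average conformer, its standard deviation,
    index of the conformer closest to the average conformer)."""
    mean = mean_conformer(bundle)
    values = [rmsd(conf, mean) for conf in bundle]
    avg = sum(values) / len(values)
    # Population standard deviation over the members of the bundle.
    sd = math.sqrt(sum((v - avg) ** 2 for v in values) / len(values))
    representative = min(range(len(values)), key=values.__getitem__)
    return avg, sd, representative

def per_residue_rmsd(bundle, residue_atoms):
    """Local precision: restrict the summation to the atoms of each residue.
    residue_atoms maps a residue number to a list of atom indices (hypothetical layout)."""
    mean = mean_conformer(bundle)
    profile = {}
    for residue, indices in residue_atoms.items():
        values = [
            rmsd([conf[i] for i in indices], [mean[i] for i in indices])
            for conf in bundle
        ]
        profile[residue] = sum(values) / len(values)
    return profile

# Toy bundle: three conformers of a two-atom "protein" (illustrative values only).
bundle = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, -1.0, 0.0)],
]
avg, sd, representative = bundle_precision(bundle)
profile = per_residue_rmsd(bundle, {1: [0], 2: [1]})
print(f"RMSD to mean: {avg:.3f} +/- {sd:.3f}; representative conformer: {representative}")
print(profile)
```

In this toy case the first atom is identical in all conformers and the second one is not, so the per-residue profile is zero for residue 1 and non-zero for residue 2, mimicking the contrast between a well-defined and a poorly defined region. A real analysis would first superimpose the conformers on the well-defined regions only, which this sketch does not attempt.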


The fact that NMR structures are represented as bundles of conformers makes the comparison of two different NMR structures a fairly complex task. To decide whether a structural difference, which may affect the backbone of a region of the protein, is meaningful, it is necessary to assess whether it is larger than the structural uncertainty within each of the two bundles. Qualitatively, this is often done by checking whether the backbone RMSD between the corresponding regions of the two structures (i.e., the RMSD calculated including only the residues in the region that is potentially structurally different) is larger than the sum of the backbone RMSD values calculated over the two bundles for the same protein regions. If this check is positive, then the difference is assumed to be meaningful. Note that the outcome of the test is dependent on how the structures have been superimposed; in some cases it might be possible that the inclusion or exclusion of the regions of interest from the superposition affects the decision of whether or not a structural rearrangement is significant. More sophisticated statistical approaches to structure comparison have been proposed [25]. Structure similarity measures other than RMSD might also be useful for the purpose of comparing NMR structures [26], even though they have not been commonly used in the bio-NMR community. As a final remark we note that with the definition given in Equation 3.1.1, the RMSD is dependent on the length of the polypeptide chain: it increases with increasing protein size [27]. A normalized version of the RMSD has been proposed (RMSD100), which could be useful when comparing the precision of the NMR structures of different proteins [27]. However, possibly because of the relatively narrow range of the molecular weights of proteins structurally characterized by NMR (Figure 3.1.2), this measure has not received much attention in the bio-NMR community.
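The qualitative test described above, and the size normalization, can be written down in a few lines. This is only a sketch: the function names are hypothetical, and the expression used for RMSD100 is not given in the text. The form RMSD / (1 + ln sqrt(N/100)) is assumed here to be the normalization proposed in [27] and should be verified against that reference before use.

```python
import math

def difference_is_meaningful(rmsd_between, rmsd_bundle_1, rmsd_bundle_2):
    """Qualitative check: the RMSD between the two structures (for the region of
    interest) must exceed the sum of the RMSD values within the two bundles."""
    return rmsd_between > rmsd_bundle_1 + rmsd_bundle_2

def rmsd100(rmsd_value, n_residues):
    """Size-normalized RMSD. ASSUMED form: RMSD / (1 + ln sqrt(N / 100)),
    where N is the number of residues; check against reference [27]."""
    denominator = 1.0 + math.log(math.sqrt(n_residues / 100.0))
    if denominator <= 0.0:
        raise ValueError("chain too short for this normalization")
    return rmsd_value / denominator

# Illustrative values in angstroms (not taken from any real structure).
print(difference_is_meaningful(2.0, 0.6, 0.8))   # region differs by more than the joint uncertainty
print(difference_is_meaningful(1.2, 0.6, 0.8))   # difference is within the joint uncertainty
print(rmsd100(1.5, 100))                         # unchanged for a 100-residue chain
```

With the assumed form, a chain of exactly 100 residues has a denominator of 1, so RMSD100 equals the RMSD; longer chains are scaled down and shorter ones scaled up.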


3.1.3 Protein Dynamics

As we have extensively commented in the previous sections, a protein (as well as any other molecule) in solution does not adopt a single conformation, but is instead continuously sampling a variety of conformational states as a result of thermal energy. Over time, each individual molecule in solution is incessantly going from one conformation to another. In terms of dihedral angle values, changing conformation most commonly involves hopping from one allowed state to another, rather than exploring a continuum of angles. The interconversion among the different conformations therefore involves an activation energy for the system, which in moving from one allowed conformation to another must transiently go through a disallowed state. A complete description of proteins thus requires a multidimensional energy landscape that defines the relative probabilities of the conformational states (thermodynamics) and the energy barriers between them (kinetics) (Figure 3.1.7). The energy barriers separating the different conformations of a protein can vary dramatically, so the timescales for the transitions between states range from picoseconds (typical for localized motions such as librations of small, relatively unhindered groups) to many seconds (or even hours or days, for collective motions associated with large conformational rearrangements such as global unfolding) (Figure 3.1.8). Owing to their different effects on NMR observables, it is a general practice to separate protein motions into two groups: those that are faster than the tumbling rate

Fig. 3.1.7 Free-energy profile as a function of the one-dimensional representation of the conformational space. A protein is typically characterized by a multiple-minima free-energy landscape. Fluctuations between the different local minima are activated processes with activation barriers of varying height that translate into a wide distribution of rate constants for the protein motions; the higher the barrier, the slower the process.

Fig. 3.1.8 Typical timescale range for different types of protein motions.


for the reorientation of the molecule in solution as a whole and those that are slower than it. The tumbling rate of a protein in solution depends on its size and shape as well as on the viscosity of the solution. In a first approximation, the correlation time for the tumbling ($\tau_r$) can be derived from the Stokes–Einstein equation:

$$\tau_r = \frac{4\pi\eta R^3}{3kT} \qquad (3.1.2)$$

where the molecule is considered as a rigid sphere of radius R, so that the correlation time becomes proportional to its molecular weight ($\eta$ is the viscosity, k is the Boltzmann constant, and T is the temperature). $\tau_r$ values for proteins range from a few to hundreds of nanoseconds. Motions occurring on timescales shorter than the rotational correlation time are regarded as internal fast motions; instead, the processes occurring on timescales longer than the rotational correlation time are described as collective conformational equilibria. Hereafter, and also in Chapter 4, we will refer to protein dynamics, defined as any time-dependent change in atomic coordinates. Different NMR experiments have been developed to specifically investigate the dynamic equilibria of proteins occurring on the various timescales (Figure 3.1.8). These experiments provide information on the rates of interconversion among the states; sometimes they can additionally provide an estimate of the relative population or of the different structural properties of the interconverting states. Often the experiments for the study of protein dynamics focus on the measurement of nuclear relaxation times or rates (one being the reciprocal of the other), from which the information of interest can be extracted. In spectroscopy, the relaxation process is the one by which a system returns to equilibrium after a perturbation; the relaxation time is the time constant for such a process, assumed to be of the first order (i.e., the equilibrium is reached through random and uncorrelated steps). Given the cylindrical symmetry of the NMR experiment, returning to the equilibrium is different along the direction of the external magnetic field and perpendicular to it.
Two relaxation rates are therefore defined: the longitudinal relaxation rate (or R1), which refers to the re-establishment of equilibrium along the direction of the applied magnetic field (z direction), and the transverse relaxation rate (or R2), in the direction perpendicular to it (xy plane). In NMR the rate of return of a nuclear spin system to equilibrium is determined by the time-dependent magnetic fields experienced by each nucleus, which are due to molecular motions. Therefore, the experimentally determined nuclear relaxation rates can be used, usually under a set of assumptions, to obtain information on protein dynamics. Among the NMR approaches to characterize protein dynamics that are not based on relaxation, one of the most insightful is the modulation of residual dipolar couplings (RDCs). We mentioned in the previous section that RDCs are a powerful source of structural information. However, they can also be used, through a relatively complex experimental approach described in Chapter 4, to gain knowledge on dynamic processes occurring on timescales between nanoseconds and microseconds. On this timescale, relaxation measurements are not informative. The investigation of protein dynamics by NMR methods has provided insight into the properties of a wide variety of diverse systems, contributing significantly to our current understanding of how catalysis works. For example, it has been shown that cyclophilin A (CYPA), which catalyzes the reversible cis/trans isomerization of prolyl peptide bonds, experiences a global conformational exchange process on a timescale of about 1 ms that coincides with the chemical step of peptidylprolyl isomerization of the substrate on the enzyme. This global process was separated in terms of the dynamics of the binding, dissociation, and isomerization steps along the catalytic cycle.
In addition, it was shown that the motions detected during catalysis are already present in the free enzyme with frequencies similar to the turnover numbers, indicating that free CYPA presamples the conformational substates observed during catalysis [28,29]. A similar behavior (i.e., presampling by the free protein of conformations that are fixed in the interaction with the substrate or with a


macromolecular partner (conformational selection)) has been demonstrated also for ubiquitin on the microsecond timescale [30]. Interestingly, for the enzyme dihydrofolate reductase (DHFR), which has a multistep catalytic cycle, the protein at each step dynamically samples a conformation similar to that observed at the subsequent step of the catalytic cycle; this again indicates that the protein motions enable the biological function of DHFR by allowing it to sample the adjacent states at each step in the cycle [31]. Protein motions faster than those discussed in the previous paragraph are also relevant to biology. In the bio-NMR community the amplitude of the motions that are called fast (taking place on the pico- to nanosecond timescale) is typically described by the square of a so-called generalized order parameter (S²) (see Chapter 4) [32,33]. This parameter, which has site-specific values, is used to describe the amplitude of internal motions that reorient a bond vector (the vector coinciding with, for example, a backbone ¹⁵N–¹H bond or a side-chain ¹³C–¹H bond) with respect to the rest of the molecule. Site-specific values of S² can range from 0 for a completely disordered bond vector to 1 for a completely rigid bond vector [34]. Empirically, S² values for backbone amide sites are found to be greater than 0.8 in the secondary structures, and between 0.5 and 0.8 for loops, turns, and termini [35,36]. The order parameters for side-chain methyl ¹³C sites can reflect the flexibility of the entire side-chain, and are experimentally observed to span the entire range between 0 and 1 [37]. An example of a case of biological relevance is the NMR characterization of cyclic AMP (cAMP) binding to the catabolite activator protein (CAP), a transcriptional activator that has been a prototype for understanding effector-mediated allosteric control of protein activity.
In this system, cAMP switches CAP from an inactive conformation, which binds DNA weakly and nonspecifically, to the active conformation, which binds DNA strongly and specifically. This conformational change does not take place when cAMP binds to the S62F mutant of CAP, which is nevertheless highly competent for DNA binding. The driving force for the latter interaction is the favorable change in conformational entropy arising from enhanced protein motions on the pico- to nanosecond timescale induced by DNA binding in the mutant CAP. The wild-type protein instead becomes more rigid upon DNA binding (i.e., unfavorable entropy of binding). Pico- to nanosecond protein motions therefore provide a means for the activation of allosteric response [38]. The examples shown in the preceding paragraphs focused on the investigation of the dynamics of the protein backbone. Albeit more challenging, especially from the point of view of isotope-labeling requirements, it is possible to perform a variety of very informative measurements that directly sample the dynamics of protein side-chains. Methyl groups have been the subject of particular attention, as already mentioned in the previous paragraph. For example, NMR relaxation measurements for the methyl groups of Ile, Leu, and Val residues of proteins encapsulated in the proteasome antechamber showed that the folded structure is destabilized, as shown by the low S² values for the methyl-axis bond vectors (which are due to enhanced mobility caused by unfolding) [39]. The destabilization is caused by the interaction of the proteins with the antechamber walls; this is demonstrated by the high correlation time for reorientation of the methyl group axis, which is consistent with the overall tumbling of the molecule being slowed down with respect to the free form by the interaction with the proteasome (a hetero 28-mer).
Dynamics can be investigated not only by solution-state NMR, as already described, but also through solid-state NMR methods (see also Chapter 20b). Here, we would like to stress that solid-state NMR is currently the principal technique for the study of the dynamic properties of systems such as immobilized membrane proteins; consequently, given the relevance of protein dynamics to catalysis and function in general, it has a unique role in providing a view on the functioning of these important cellular components. For example, site-resolved information has been obtained on the dynamics of proteorhodopsin, indicating that protein hydration has an effect on local mobility only for the residues in flexible loops and tails [40].
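As a numerical illustration of Equation 3.1.2 and of the separation of motions into those faster and slower than the overall tumbling, the following sketch evaluates the rigid-sphere correlation time. The radius (2 nm), the viscosity (taken as that of water near 25 °C, about 0.89 mPa·s), the temperature, and the function names are assumptions chosen for illustration; a real protein is neither a perfect sphere nor unhydrated, so the result is only an order-of-magnitude estimate.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant k, in J/K

def tumbling_correlation_time(radius_m, viscosity_pa_s, temperature_k):
    """Rigid-sphere rotational correlation time, Equation 3.1.2:
    tau_r = 4 * pi * eta * R**3 / (3 * k * T), returned in seconds."""
    return (4.0 * math.pi * viscosity_pa_s * radius_m ** 3
            / (3.0 * BOLTZMANN * temperature_k))

def classify_motion(timescale_s, tau_r_s):
    """Label a motion relative to the overall tumbling, following the text."""
    if timescale_s < tau_r_s:
        return "internal fast motion"
    return "conformational equilibrium"

# Assumed values: sphere of radius 2 nm in water at 298 K.
tau_r = tumbling_correlation_time(radius_m=2.0e-9,
                                  viscosity_pa_s=0.89e-3,
                                  temperature_k=298.0)
print(f"tau_r = {tau_r * 1e9:.1f} ns")
print(classify_motion(50e-12, tau_r))   # a 50 ps libration
print(classify_motion(1e-3, tau_r))     # a 1 ms exchange process
```

Under these assumptions the correlation time comes out at roughly 7 ns, within the range of a few to hundreds of nanoseconds quoted above. Because the radius enters as R cubed, doubling the radius lengthens the correlation time eightfold, which is the origin of the proportionality to molecular weight.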


3.1.4 Intermolecular Interactions Involving Proteins

The same experimental techniques described in Section 3.1.1 constitute the main methods used today to determine the structure of macromolecular protein complexes. As with individual protein structures, the majority of the structures of protein complexes have been determined by X-ray crystallography (Figure 3.1.1); nevertheless, NMR spectroscopy has also played an important role for protein adducts. As discussed below, there are actually a number of specific cases where NMR is clearly the methodology of choice or even the only suitable approach. Indeed, inherent limitations of crystallography are the requirement for the complex to crystallize in the first place, which may be difficult if one or both of the components of the complex have significant amounts of disorder or if the complex has a transient nature, and the possibility that crystals do not contain the biologically relevant conformation of the proteins. The latter possibility is related to the balance between the energy of intermolecular interactions in the complex versus the energy of crystal packing, similarly to the case of individual protein structures described in Section 3.1.1. In cases where the interaction is very weak (Kd > 100 μM), NMR is essentially the only approach that permits the determination of high-resolution structures that can be regarded with confidence as being biologically relevant. NMR, however, is by no means limited to the study of such weak protein–protein interactions (see Chapters 9–11). Another field of application is the design of protein ligands/inhibitors, which is aimed at obtaining molecules that bind tightly and specifically to the intended target. In the process one often begins with compounds that bind loosely.
This has led to the development of a very active area called structure–activity relationships by NMR, where lead compounds with dissociation constants as low as 10 mM can be rapidly screened for interaction with the active site of the target protein (see Chapters 14 and 15) [41]. The most widely used approach for probing the interface of intermolecular interaction of a protein by NMR spectroscopy is the chemical shift perturbation (CSP) mapping experiment, described in more detail in Chapter 9. Briefly, the ¹⁵N–¹H HSQC spectrum of one protein is monitored when the unlabeled interaction partner is titrated in, and the perturbations (changes) of the chemical shifts are recorded. The perturbations arise because the interaction modifies the environment of the protein interface, thereby modifying the chemical shifts of the nuclei that are at the interface or close to it. The usefulness and popularity of the approach are due to the straightforward nature of the technique and the high sensitivity of the experiment, which, with state-of-the-art equipment, can be recorded in a few minutes on a 100–200 μM protein sample. To interpret the experimental results, only the sequence-specific assignment of NMR frequencies to individual atoms of the protein backbone is needed. If a structure of the protein, solved by either NMR or X-ray methods or predicted via homology modeling, is also available, the chemical shift mapping can be used in data-driven docking calculations (e.g., with the program HADDOCK (Chapter 32)) to obtain the structure of the complex. If both partners in the interaction are proteins, the chemical shift mapping is typically repeated for both of them by using one protein enriched in ¹⁵N and the other protein unlabeled in one experiment and vice versa in the other. This allows one to experimentally identify the interaction interface for both partners, making more data available for docking.
¹³C enrichment can be exploited as well, typically to look at methyl resonances in systems of larger molecular weight [42]. When one of the partners in a protein–protein complex cannot be enriched isotopically or cannot be directly investigated because of its size, indirect information on the location of its interaction interface can be obtained from other in vitro experiments, such as site-directed mutagenesis, or from bioinformatics; this information can be advantageously used in docking calculations together with the chemical shift mapping of the other partner (Chapter 32).


Chemical shift perturbation is clearly the simplest and quickest technique for the investigation of intermolecular interactions by NMR. Nevertheless, a quite large portfolio of alternative and complementary experiments has been developed by NMR spectroscopists. These can provide more detailed information (e.g., on the involvement of amino acid side-chains in the interaction) or can be applied to systems of higher complexity (e.g., multisubunit complexes). These are analyzed in Chapters 9–18 and 34.
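To make the chemical shift mapping procedure more concrete, the sketch below combines the ¹H and ¹⁵N shift changes of each residue into a single value and flags the most perturbed residues. The text does not prescribe a formula; the weighted Euclidean combination, the ¹⁵N scaling factor of 0.14, and the cutoff of one standard deviation above the mean are common conventions that are assumed here, and all shift values and function names are invented for illustration.

```python
import math

def combined_csp(delta_h_ppm, delta_n_ppm, n_scale=0.14):
    """Combined 1H/15N chemical shift perturbation in ppm.
    ASSUMED convention: sqrt(dH**2 + (n_scale * dN)**2)."""
    return math.sqrt(delta_h_ppm ** 2 + (n_scale * delta_n_ppm) ** 2)

def perturbed_residues(free, bound, n_scale=0.14):
    """free and bound map residue number -> (1H ppm, 15N ppm).
    Returns the residues whose combined CSP exceeds mean + 1 SD (assumed cutoff)."""
    csp = {
        res: combined_csp(bound[res][0] - free[res][0],
                          bound[res][1] - free[res][1],
                          n_scale)
        for res in free if res in bound
    }
    values = list(csp.values())
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return sorted(res for res, value in csp.items() if value > mean + sd)

# Hypothetical HSQC peak positions for the free and partner-bound protein.
free = {10: (8.20, 120.0), 11: (7.90, 118.5), 12: (8.55, 121.3),
        13: (8.10, 119.0), 14: (7.75, 117.2)}
bound = {10: (8.21, 120.1), 11: (7.90, 118.5), 12: (8.75, 122.3),
         13: (8.11, 119.0), 14: (7.76, 117.3)}
print(perturbed_residues(free, bound))
```

With these invented values only residue 12 stands out from the background. In practice the flagged residues would be mapped onto the protein structure and, as described above, could be supplied as restraints to a data-driven docking calculation.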


3 What Can be Learned About the Structure and Dynamics of Biomolecules from NMR

3.2 Nucleic Acids Studied by NMR

Janez Plavec

DNA and RNA are two different nucleic acids found in the cells of every living organism. Both play significant roles in cell biology. DNA and RNA structures are similar, as they formally both consist of long chains of nucleotide units. DNA oligonucleotides usually exist as a tightly associated double helix. The arrangement of the 5′ and 3′ ends of DNA strands is antiparallel. DNA contains the genetic instructions used in the development and functioning of all known living organisms, whereas RNA molecules are involved in protein synthesis and in the transmission of genetic information. One of the major differences between DNA and RNA is the sugar moiety, with 2′-deoxyribose being replaced by ribose in RNA. The four bases found in DNA are adenine (A), cytosine (C), guanine (G), and thymine (T). A fifth pyrimidine base, called uracil (U), usually takes the place of thymine in RNA. In the early 1950s, the structure of the double-stranded DNA molecule demonstrated its unique ability to self-replicate. Hydrogen-bond complementarity is at the heart of the recognition and stabilization of double-stranded DNA. Although the general structural features of regular DNAs and RNAs are well known, there exist a plethora of structural motifs that are different from the regular double helix. Detailed knowledge of their structural characteristics is necessary for understanding their biological function. NMR spectroscopy allows the study of the structures of nucleic acid oligomers in solution at close to physiological conditions. In order to delineate the relationships between the structures, thermodynamic stabilities, and biological function of nucleic acids, it is highly desirable to complement the structural studies by characterization of equilibrium conformational fluctuations. NMR is a powerful method to study molecular motions and sequence-dependent flexibility at atomic resolution on a wide range of timescales.

3.2.1 Structure, Mobility, and Function

Nucleic acids and their interactions with proteins remain at the center of molecular biology. Both DNA and RNA are still providing surprises despite decades of structural, biophysical, and biochemical investigation. Recent studies utilize nucleic acids to make new materials or focus on interactions of small-molecule ligands with large assemblies like the ribosome, while novel in vivo methods enable probing of RNA structure and proteins that remodel nucleic acid structures (e.g., helicases), correct for chemical damage to DNA, modulate gene expression by binding to RNAs, and so on.

NMR of Biomolecules: Towards Mechanistic Systems Biology, First Edition. Edited by Ivano Bertini, Kathleen S. McGreevy, and Giacomo Parigi. © 2012 Wiley-VCH Verlag GmbH & Co. KGaA. Published 2012 by Wiley-VCH Verlag GmbH & Co. KGaA.


Since the determination of the B-form DNA duplex structure over 50 years ago [1], DNA has continued to surprise with new and polymorphic structures. Apart from its enormous biological impact, DNA is used as a material for the creation of novel nanotechnologies that rely on building complex higher-order shapes known as DNA origami [2–4]. The ability of multiple strands to come together, as with the four-way Holliday junction in DNA recombination, allows the construction of complex higher-order DNA structures. Uncommon structures of DNA have been established recently, together with diverse interactions with its ligands. New G-quadruplex structures [5,6] continue to emerge and provide insight into their biological roles in the context of telomere function [7–9]. Complexes with G-quadruplex-binding small molecules are promising for the modulation of telomere capping and therefore protection against exonucleases. Other ligand complexes with DNA include structures of minor-groove-binding molecules that recognize the minor groove in a sequence-specific way. Duplex DNA continues to exhibit polymorphism and certain sequences have the propensity to form unusual base pairs, which may facilitate sequence-specific protein recognition. In addition, a number of structures have emerged of chemically modified DNA, which is important for understanding the effects of mutagenic lesions and how they are recognized by the DNA repair machinery. An increasing list of non-natural DNA analogs includes both non-native backbones and nonstandard nucleobases. NMR spectroscopy is a unique method to screen for the formation of nonstandard structural elements and, if the quality of a sample allows, to establish local details and stability as well as the dynamic behavior of noncanonical helical-type structures with single-residue resolution.
It is possible to unambiguously identify hydrogen-bonding schemes in GC and AU Watson–Crick base pairs and non-Watson–Crick base pairs that involve imino to nitrogen hydrogen bonds, such as reverse AU Hoogsteen and head-to-head GA base pairs [10]. Traditionally, hydrogen-bonding schemes were identified based on their characteristic nuclear Overhauser effect (NOE) patterns (see Chapter 5). Single-crystal structure determination of DNA four decades ago [11] provided representation of the duplex families at atomic resolution, thus offering insights into the relationship between sequence and geometry, and also revealed water structure, cation coordination, and binding modes of small-molecule ligands and drugs. However, a single sequence, d(CGCGAATTCGCG), known as the Dickerson–Drew dodecamer, which adopts the B-form duplex, accounts for around 10% of all DNA structures deposited in the Protein Data Bank (PDB). The structural dissection of the Dickerson–Drew dodecamer has helped our understanding of DNA geometry and the narrow A-tract minor groove, in particular. The double helix exhibits bending and coordinates cations in a specific fashion. Mg²⁺ ion binding at the GpC and alkali metal ion binding at the ApT step in the major and minor grooves, respectively, are interspersed with water molecules localized in the narrow minor groove (the so-called spine of hydration). Narrowing of the minor groove and the resulting strongly negative electrostatic potential, first observed in the structure of the Dickerson–Drew dodecamer, are widely used by proteins for DNA recognition as an indirect readout [12]. Despite all the progress, most common DNA sequences remain structurally unexplored and it is evident that the sequence space has not been sampled very well.
Abundant sequences in noncoding regions of eukaryotic genomes serve specific purposes, such as the formation of nucleosome structures, and therefore their structural properties, including potentials for drug binding and other features, should be studied in more detail. Bioinformatics surveys have mapped previously known sequence motifs such as G-rich stretches that were typically associated with the telomeric ends of eukaryotic chromosomes to other regions, such as promoters and 5′-untranslated regions. A better understanding of G-quadruplexes in promoters and the factors associated with their formation and disintegration is of interest in drug discovery and could lead to new cancer therapies. For example, the G-quadruplex of the c-myc oncogene that is involved in tumor biology could be targeted with small molecules in order to repress its transcriptional topology and recognition [13,14]. Structure determinations based

on X-ray crystallography and solution-state NMR spectroscopy of natural human telomere sequences demonstrated significant deviations in conformation and loop geometry depending on the method used [15]. Subsequent investigations revealed that the symmetrical propeller-like structure in the crystal, with all-parallel G-quartets and three double-chain reversal loops, was not a major form in solution [16,17]. However, NMR structures of intramolecular G-quadruplexes formed by human telomere oligonucleotides also exhibit significant differences as a result of the nature of the monovalent metal ions present (Na⁺ versus K⁺ ions). Different conformations have been observed even in the presence of K⁺ ions. Three potassium forms have been established by NMR studies in solution so far [18–22]. Interestingly, the structure with only two G-quartets is thermodynamically more stable than the previously identified K⁺ forms consisting of three G-quartets each. Detailed structural analysis by solution-state NMR studies has demonstrated that the increased stability can be attributed to extensive base pairing and stacking interactions in the loops. Human telomeric DNA sequences are a good example of the heterogeneity of strand topology and loop conformation in G-quadruplexes. NMR spectroscopic studies (a simple heteronuclear correlation experiment is shown in Figure 3.2.1), further described in Chapters 5 and 13, offer a unique insight into their structural details as well as dynamic properties that might be relevant for the functional roles of quadruplexes and other secondary structures.

Small molecules interact with DNA through electrostatic interactions, intercalation, and groove binding (see Chapter 16). The latter provides the highest sequence specificity, and groove binders have been shown to prefer AT-rich regions with narrower, negatively polarized minor grooves that permit favorable interactions with their cationic moieties.
Minor-groove-binding agents have been extensively studied as potential therapeutics with antibacterial, antiprotozoal, antitumor, and antiviral activity. The shapes of minor-groove binders typically match the curvature of the groove, thus providing optimal van der Waals contacts and hydrogen bonds.

Progress in the structural biology of RNA has lagged behind that of DNA, although crystal structures have now been solved for most of the large primordial RNAs, including the ribosome, group I and group II self-splicing introns, and RNase P. Many functional RNAs are very large or flexible, making crystallization difficult. RNA-binding proteins control the expression of genetic information by regulating transcription, splicing, translation, and RNA metabolism. Genomics has revealed the deep richness of RNA-binding proteins in different organisms. The field of nucleic acid structure and nucleic acid–protein interactions has progressed in recent years due to an expanding perception of the roles of RNA in biology. Structural studies follow the exciting developments and have revealed the basic architectures of many key systems, including the ribosome and transcription machinery. A basic understanding of nucleic acids from a structural, biophysical, and biochemical standpoint has played an important role in exploring novel biological functions. NMR spectroscopy can merge structure and mechanism. An emerging idea is that much of the functional complexity of RNA is related not only to the details of its intricate three-dimensional structure, but also to its potential to adaptively adopt very distinct conformations on its own or in response to specific cellular signals, including the recognition of proteins, nucleic acids, metal ions, metabolites, vitamins, changes in temperature, and even RNA biosynthesis itself [23,24]. The conformational transitions that are spatially and temporally tuned to achieve a variety of functions can be probed through NMR studies (see Chapters 6, 12, and 17).
Motions in RNA range from rearrangements in secondary structure and large-scale collective bending and twisting of helical domains to more localized changes in base pairing and stacking, sugar repuckering, and fluctuations along the phosphodiester backbone, all of which occur over a range of timescales [25]. The free energy landscape consists of several energy minima. Different folding pathways can lead to distinct kinetically trapped RNA structures. Many RNAs exhibit self-induced transitions involving short-lived non-native structural motifs that dynamically form during cotranscriptional folding.

Fig. 3.2.1 Two-dimensional ¹³C–¹H heteronuclear multiple-bond correlation (HMBC)-type NMR experiment that utilizes long-range JC–H scalar coupling constants to correlate imino and H8 protons within a guanine base. One of the correlations in the two-dimensional spectrum is indicated by a straight line. The through-bond correlations between guanine imino and H8 protons proceed via ¹³C at position 5 at natural abundance using long-range J couplings, which are indicated by green arrows.

Fig. 3.2.2 (a) Multinuclear NMR study has demonstrated that the G-quadruplex adopted by d(G₃T₄G₄) exhibits two cation-binding sites between three of its G-quartets. The titration of tighter-binding K⁺ ions into the solution of the d(G₃T₄G₄)₂ quadruplex folded in the presence of ¹⁵NH₄⁺ ions uncovered a mixed mono-K⁺–mono-¹⁵NH₄⁺ form that represents an intermediate in the conversion of the di-¹⁵NH₄⁺ form into the di-K⁺ form. (b) Riboswitches are noncoding RNA elements that directly bind small-molecule metabolites and thereby switch gene expression on or off in response to the binding of a specific metabolite. They consist of two components: an aptamer domain that interacts with a small-molecule ligand and an expression platform that converts folding changes in the aptamer into altered mRNA processing.

Small RNA molecules play central roles in regulating eukaryotic gene expression by means of RNA interference and related pathways [26,27]. Fundamentally, RNA-mediated genetic control begins with the production of 20- to 30-nucleotide RNAs whose sequences can base-pair with segments of mRNA transcripts. Once generated, these microRNAs (miRNAs) or small interfering RNAs (siRNAs) assemble into large multiprotein effectors called RNA-induced silencing complexes (RISCs), which bind to target transcripts and trigger their destruction. Structural studies have provided insights into the critical features of siRNAs and miRNAs that ensure their efficient incorporation into the RISC [28].

Riboswitches are cis-acting mRNA elements that allow cells to adaptively change gene expression in response to their changing environment. Riboswitches illustrate how complex RNA dynamics can be used to achieve highly tunable and adaptable biological regulation [29–31]. Riboswitches are capable of sensing and responding to different physiological parameters, such as the concentration of metabolites, vitamins, or Mg²⁺ ions, or temperature [32]. The resulting signals turn off and, in some rare cases, turn on gene expression [33]. The sensing and signaling operations of riboswitches are made possible by highly orchestrated conformational transitions involving two RNA domains. A conserved aptamer domain binds to its metabolite target with high selectivity and affinity. Recent NMR studies have yielded detailed insights into the structure of the more disordered ligand-free state of purine-sensing aptamer domains [34]. These studies show that while the long-range loop–loop interactions that stabilize global structure are transiently preformed and reinforced by Mg²⁺ binding, the ligand-binding pocket is largely disordered in the absence of ligands. The probing of conformational changes in aptamer domains helped to unravel the detailed mechanism.
The results have so far revealed complex multistep mechanisms that vary even across related riboswitches (Figure 3.2.2). A conformational change caused by binding of a metabolite to the aptamer domain leads to the secondary structure conversion of a downstream expression platform. As a result, a transcription-terminating helix is formed that inhibits translation or activates catalytic self-cleavage and mRNA degradation.

Enzymatic action by RNAs, called ribozymes, is an inherently dynamic process. Just like protein enzymes, ribozymes probably exploit the dynamics of functional groups and domains to guide the catalytic process along a specific reaction coordinate [35,36]. The dynamics contributing to catalysis mostly involve vibrations, torsion angle changes, sugar repuckering, and longitudinal and lateral motions of nucleobases. These processes take place on the timescale of tens of femtoseconds to a few nanoseconds and can be probed by spin relaxation measurements. More global structural changes required to properly position reaction participants in the catalytic core are coupled to the local dynamics at the nanosecond to minutes timescale. The results of biophysical methods such as single-molecule fluorescence resonance energy transfer (FRET) and fluorescence correlation spectroscopy can be complemented by NMR methods to elucidate the conformational dynamics using Carr–Purcell–Meiboom–Gill (CPMG), residual dipolar coupling (RDC), ZZ-exchange, lineshape analysis, or hydrogen/deuterium exchange.

RNA functions are assisted in the cellular environment by proteins forming RNA–protein complexes. Proteins contribute to RNA function and protect it from chemical or enzymatic degradation. The intracellular binding of proteins begins alongside RNA transcription. It is often energetically driven by ATP (or GTP) consumption. The formation of RNA–protein complexes can trigger large-scale refolding of an RNA.
The intricate interplay of protein binding and induced changes in RNA-folding pathways and RNA dynamics is still poorly understood. Advances in biophysical techniques including NMR spectroscopy are allowing the resolution and visualization of intrinsic RNA motions that may potentiate specific functional transitions.

Part Two Role of NMR in the Study of the Structure and Dynamics of Biomolecules

NMR of Biomolecules: Towards Mechanistic Systems Biology, First Edition. Edited by Ivano Bertini, Kathleen S. McGreevy, and Giacomo Parigi. © 2012 Wiley-VCH Verlag GmbH & Co. KGaA. Published 2012 by Wiley-VCH Verlag GmbH & Co. KGaA.

NMR can capture the structural and dynamic properties of proteins

NMR allows researchers to capture the structural and dynamic properties of proteins, and to study how these properties vary depending on the states sampled during protein function. Here, the NMR structures of oxidized (a) and reduced (b) rat cytochrome b5 are reported as backbone traces (the protein hydrophobic core is in red). The position of backbone amide moieties in very slow exchange with the solvent is also shown (cyan spheres), defining protein regions with restricted global dynamics. This demonstrates that reduction of the heme iron ion from the +3 to the +2 state (left to right), which occurs during the physiological function of the protein, has a significant effect on the dynamics of the polypeptide chain [1].

4 Determination of Protein Structure and Dynamics

Lucio Ferella, Antonio Rosato, and Paola Turano

High-resolution NMR spectroscopy and X-ray diffraction are the only techniques providing atomic-level structural information on proteins. In addition, NMR has the unique capability of allowing researchers to investigate the internal dynamics of the polypeptide in solution over a wide range of timescales. By coupling structural and dynamic aspects, NMR spectroscopy thus affords a complete picture of the behavior of proteins. We start by describing so-called multidimensional NMR experiments, which are performed on samples enriched in the stable isotopes ¹⁵N and ¹³C (and possibly ²H, for larger proteins) to determine the resonance frequency of each individual nucleus in the protein. This first step is necessary to enable all further studies on structure and dynamics. To structurally characterize a protein, a number of conformational restraints must be collected; these typically consist mainly of upper limits on hydrogen–hydrogen distances and dihedral angle restraints, and can be supplemented with residual dipolar couplings. The methods for the collection of these restraints and the computational approaches that exploit them to obtain an energy-optimized structure are explained in some detail. Structure calculation methods relying exclusively on chemical shift information are also mentioned. After a protein structure has been determined, it has to be thoroughly validated before it is made available to the scientific community. This step is carried out through a combination of evaluation against experimental data and comparison to standard stereochemical/geometric features. Finally, we address the characterization of protein dynamics based on NMR relaxation measurements as well as on other experimental approaches on various timescales. Protein dynamics can be investigated independently of or in parallel with structure determination projects.

4.1 Determination of Protein Structures

4.1.1 Resonance Assignment

The original procedure for solution structure determination via NMR was based on the use of two-dimensional ¹H–¹H NMR experiments for spectral assignment and derivation of structural restraints [2,3]. Multidimensional NMR experiments in structural biology can be grouped into two broad classes: coherence transfer experiments, which provide connectivities between nuclei linked through covalent bonds, and dipole–dipole experiments, which detect through-space interactions between pairs of nuclei that are close in space, independently of the presence of chemical bonds between them. In the case of ¹H–¹H two-dimensional maps, the correlation spectroscopy (COSY) [4,5] experiment provides cross-peaks between pairs

of geminal or vicinal protons, exploiting the ²J and ³J ¹H–¹H scalar couplings; protons that are more than three chemical bonds apart give no cross-signal because coupling constants of ⁴J and above are close to 0. In the total correlation spectroscopy (TOCSY) experiment [6], magnetization is dispersed over the complete spin system of an amino acid by successive scalar couplings. Therefore, in addition to the cross-peaks present in a COSY spectrum, signals that originate from the interaction of all protons of a spin system appear. The pattern of interactions is interrupted by the presence of nonprotonated atoms. As a consequence, due to the presence of backbone carbonyls, TOCSY patterns remain confined within a single amino acid. In the case of aromatic residues, due to the presence of the nonprotonated Cγ, two distinct patterns are observed, one for HN, Hα, and Hβs, and the other for the aromatic ring protons; no scalar connectivities exist between them. TOCSY experiments (possibly in combination with COSY) identify the chemical nature of the amino acid responsible for a given spin pattern. Nevertheless, ambiguities exist because some amino acids have identical spin systems and therefore identical signal patterns. This is, for example, the case for the HN/Hα/Hβs of Cys, Asp, Asn, and aromatic residues (Phe, His, Trp, and Tyr) or for the HN/Hα/Hβs/Hγs of Glu, Gln, and Met. Spin pattern identification has to be followed by the essential step of sequence-specific assignment: each amino acid or group of amino acids identified by TOCSY should be unequivocally assigned to a specific position in the polypeptide sequence. Due to the lack of ¹H–¹H scalar connectivities between different residues, this has to be accomplished via through-space interactions.
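The difference between COSY and TOCSY connectivities described above can be illustrated with a small graph model. The following is a minimal sketch, not a simulation of real spectra: it assumes a simplified phenylalanine topology (the atom names and the `HEAVY_BONDS` and `PROTON_PARENT` tables are illustrative choices, not taken from the text), treats two protons as COSY-connected when they sit on the same or on directly bonded heavy atoms (two or three bonds apart), and treats a TOCSY spin system as a connected set of COSY-linked protons.

```python
from itertools import combinations

# Toy model of phenylalanine (illustrative assumption): bonds between heavy
# atoms, and the heavy atom to which each proton is attached.
HEAVY_BONDS = {
    ("N", "CA"), ("CA", "CB"), ("CB", "CG"),
    ("CG", "CD1"), ("CG", "CD2"), ("CD1", "CE1"),
    ("CD2", "CE2"), ("CE1", "CZ"), ("CE2", "CZ"),
}
PROTON_PARENT = {
    "HN": "N", "HA": "CA", "HB2": "CB", "HB3": "CB",
    "HD1": "CD1", "HD2": "CD2", "HE1": "CE1", "HE2": "CE2", "HZ": "CZ",
}


def bonded(a, b):
    """True if heavy atoms a and b share a covalent bond."""
    return (a, b) in HEAVY_BONDS or (b, a) in HEAVY_BONDS


def cosy_pairs(parent):
    """Proton pairs two (geminal) or three (vicinal) bonds apart."""
    pairs = set()
    for p, q in combinations(sorted(parent), 2):
        a, b = parent[p], parent[q]
        if a == b or bonded(a, b):
            pairs.add((p, q))
    return pairs


def tocsy_spin_systems(parent):
    """Connected components of the COSY graph, one per spin system."""
    neighbours = {p: set() for p in parent}
    for p, q in cosy_pairs(parent):
        neighbours[p].add(q)
        neighbours[q].add(p)
    seen, systems = set(), []
    for start in sorted(parent):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        systems.append(frozenset(component))
    return systems


for system in tocsy_spin_systems(PROTON_PARENT):
    print(sorted(system))
```

In this toy model the nonprotonated Cγ leaves the Hβ and ring protons four bonds apart, so the residue splits into two separate patterns, one for HN/Hα/Hβ and one for the ring protons.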
The gold standard for obtaining such information is the nuclear Overhauser effect spectroscopy (NOESY) experiment [7], which provides cross-peaks for all pairs of protons at a short distance (within about 5 Å) from one another. Of course, this condition is met for most of the protons belonging to the same amino acid, so that most of the cross-peaks present in TOCSY experiments are also present in the NOESY maps. However, as dipolar interactions are independent of the presence of scalar couplings, cross-peaks are observed between aromatic ring protons and their backbone protons, thus allowing the linking of a given aromatic spin pattern to the HN, Hα, and Hβ resonances of the same amino acid. More importantly, cross-peaks are observed between backbone protons of amino acids that are adjacent in sequence along the polypeptide chain. The sequence-specific assignment therefore consists of connecting different spin systems through NOESY peaks between their backbone protons. Possible ambiguities in the process are resolved by considering that the chemical nature of the amino acids that have sequential spin systems should match the protein sequence. NOESY maps contain much more information than the simple, albeit essential, sequential interactions. A whole range of short proton–proton connectivities that are specific for given secondary structure elements characterize protein NOESY spectra. They extend to residues separated by up to five amino acids and are therefore classified as short/medium-range nuclear Overhauser effects (NOEs). Their identification helps to define local structural features such as the presence of helices and turns. Last, but not least, NOE connectivities are present in NOESY maps that simply emerge from the tertiary structure of the protein; most of them connect the nuclei of residues that are close in space though more than five amino acids apart in sequence.
As such, they are defined as long-range NOEs and have a fundamental importance for the definition of the protein fold, and should therefore be unambiguously assigned to specific proton pairs. Exhaustive COSY/TOCSY and NOESY assignment constitutes the experimental input data for the subsequent structure calculations. Evaluation of the intensity of NOESY cross-peaks through their integration provides an estimate of proton–proton distances. Such distances represent the restraints to be used as input data in structural calculations. The translation of NOE intensities into distances is not straightforward; as will be discussed in more detail later, a number of factors may affect the NOESY cross-peak intensity (mobility, spin diffusion, solvent exchange, etc.) so that the linear correlation suggested by the NOE equation between peak volume and r⁻⁶ (where r is

the proton–proton distance) is generally not valid. Nevertheless, if NOESY spectra have been acquired in such a way as to avoid severe spin-diffusion effects, peak volumes can be safely converted into upper bounds for the corresponding ¹H–¹H distances. A large number of such upper distance limits (usually at least 15–20 per amino acid) has to be simultaneously satisfied by the calculated structure. The use of ¹H–¹H NMR experiments imposes severe limitations on the size of the proteins to be studied: the limited chemical shift range for protons combined with the high number of proton pairs that fall within 5 Å in a folded protein causes heavy overlap of cross-peaks, thus giving rise to ambiguities in their attribution to specific proton–proton pairs and making their integration difficult. In order to overcome the resolution problems related to protein size, spreading along a third frequency axis, which corresponds to the NMR frequencies of the ¹⁵N spins in the ¹⁵N-labeled protein, was introduced [8–10]. As a result, the NMR peaks of the two-dimensional ¹H–¹H spectrum are distributed over several ¹H–¹H planes. Each plane corresponds to a different ¹⁵N chemical shift (or, better, to a small ¹⁵N chemical shift range), and on that plane only the ¹H–¹H peaks connecting the HN protons bound to nitrogen atoms with that ¹⁵N chemical shift to their correlated protons are present. The use of ¹⁵N-edited TOCSY and NOESY experiments allowed NMR spectroscopists to increase the size threshold of proteins whose structures can be determined by solution NMR. Nevertheless, with proteins larger than 18–20 kDa, ¹⁵N editing is not enough to provide the peak separation needed for spectral analysis. As a consequence, a novel approach was developed that relies on the use of doubly labeled (¹³C,¹⁵N) protein samples in triple-resonance NMR experiments [11–16].
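The conversion of NOESY peak volumes into upper distance bounds mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it uses the simple r⁻⁶ dependence with a single calibration constant derived from one reference proton pair of known distance, then loosens the estimate by a relative `slack` and caps it at `cap`. The function names and the default values of `slack` and `cap` are hypothetical choices made for this example, not values prescribed by the text or by any structure calculation program.

```python
def calibration_constant(ref_volume, ref_distance):
    """Constant k in V = k / r**6, from one reference pair (distance in angstrom)."""
    if ref_volume <= 0 or ref_distance <= 0:
        raise ValueError("reference volume and distance must be positive")
    return ref_volume * ref_distance ** 6


def upper_distance_limit(volume, k, slack=0.2, cap=5.0):
    """Upper bound (angstrom) for a proton pair with the given peak volume.

    slack and cap are illustrative assumptions: the r**-6 estimate is
    enlarged by a relative margin and never allowed to exceed cap.
    """
    if volume <= 0:
        raise ValueError("peak volume must be positive")
    estimate = (k / volume) ** (1.0 / 6.0)
    return min(estimate * (1.0 + slack), cap)


# Hypothetical example: a reference pair at 2.0 angstrom with volume 64
# (arbitrary units), and a weaker peak with volume 4.
k = calibration_constant(ref_volume=64.0, ref_distance=2.0)
print(upper_distance_limit(4.0, k))
```

Using the result as an upper bound rather than an exact distance reflects the caveat in the text: mobility, spin diffusion, and solvent exchange make the r⁻⁶ relation unreliable, so a loose limit is the safer restraint.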
These experiments are called triple resonance because three different nuclei (¹H, ¹³C, and ¹⁵N) are correlated through heteronuclear scalar couplings. The most important advantage of triple-resonance spectra is their simplicity: they contain only a few signals at each frequency, often only one or two, so the problem of spectral overlap is markedly reduced. The magnetization is efficiently transferred through ¹J or ²J heteronuclear scalar couplings, which are relatively large (Figure 4.1); experiment sensitivity is thus high, transfer times are shorter than for homonuclear couplings, and the signal intensity losses due to the fast relaxation associated with high molecular weight are smaller than in ¹H–¹H experiments. A large set of triple-resonance experiments exists. Their nomenclature is systematic: the names of all nuclei that are used for magnetization transfer during the experiment are listed in the order of their use, bracketing the names of nuclei that are used only for transfer and whose frequencies are not detected in the spectra. For example, in the HNCO experiment, the magnetization is transferred from the HNᵢ proton via the Nᵢ atom to the directly attached COᵢ₋₁ carbon atom and returns the same way to the HNᵢ nucleus, which is directly detected. The frequencies of all three nuclei are recorded. In the companion experiment, the HN(CA)CO, the magnetization is transferred from the HNᵢ proton via the Nᵢ atom and the Cαᵢ and Cαᵢ₋₁ nuclei to the COᵢ and COᵢ₋₁ carbon atoms, respectively, and back the same way. The Cα atom acts only as a relay nucleus and its frequency is not detected; only the frequencies of HN, N, and CO are recorded.
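The naming rule just described (nuclei listed in order of use, with relay-only nuclei in brackets) lends itself to a small parser. The sketch below is an illustration rather than an established tool: `parse_experiment` and the short list of recognized nucleus labels in `NUCLEUS` are assumptions made for this example, and they cover only backbone-type names such as HNCO or CBCA(CO)NH.

```python
import re

# Nucleus labels recognized by this sketch (an illustrative, incomplete list);
# two-letter labels are listed first so they win over single letters.
NUCLEUS = re.compile(r"CA|CB|CO|HA|HB|H|N|C")
# A segment is either a bracketed group or a run of unbracketed characters.
SEGMENT = re.compile(r"\(([^()]*)\)|([^()]+)")


def parse_experiment(name):
    """Split an experiment name into (recorded nuclei, relay-only nuclei)."""
    recorded, relay = [], []
    consumed = 0
    for match in SEGMENT.finditer(name):
        if match.start() != consumed:
            raise ValueError(f"unbalanced brackets in {name!r}")
        consumed = match.end()
        bracketed = match.group(1) is not None
        segment = match.group(1) if bracketed else match.group(2)
        nuclei = NUCLEUS.findall(segment)
        if "".join(nuclei) != segment:
            raise ValueError(f"unrecognized nucleus label in {name!r}")
        (relay if bracketed else recorded).extend(nuclei)
    if consumed != len(name):
        raise ValueError(f"unbalanced brackets in {name!r}")
    return recorded, relay


for experiment in ("HNCO", "HN(CA)CO", "HNCA", "HN(CO)CA", "CBCA(CO)NH"):
    recorded, relay = parse_experiment(experiment)
    print(experiment, "records", recorded, "relays via", relay)
```

For HN(CA)CO, for instance, the parser reports H, N, and CO as recorded and CA as the relay nucleus, matching the description in the text.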
A similar situation is encountered in the HNCA/HN(CO)CA pair (Figure 4.2): in the former experiment the magnetization transfer path goes from the HNᵢ proton via the Nᵢ atom to the Cαᵢ and Cαᵢ₋₁ carbon atoms (Figure 4.2, red); in the latter experiment the magnetization is transferred from the HNᵢ proton via the Nᵢ atom and the COᵢ₋₁ to the Cαᵢ₋₁ carbon atom (Figure 4.2, blue). The HNCO/HN(CA)CO pair consists of a sensitive and an insensitive experiment, whereas both experiments in the HNCA/HN(CO)CA pair are reasonably sensitive. The differences in sensitivity are due to differences in the scalar coupling constants responsible for the magnetization transfer. The most basic triple-resonance experiments are used first to provide backbone sequential assignments, while others are used at more advanced stages for the identification of side-chains. This procedure results in a strategy that is reversed with respect to the originally developed philosophy, which was based on ¹H–¹H

Fig. 4.1 ¹J and ²J coupling constant values in proteins.

Fig. 4.2 HNCA and HN(CO)CA experiments are used in tandem to achieve the assignment of HN, N, and Cα backbone resonances. Pairs of strips are obtained at a given ¹⁵N chemical shift value, which corresponds to the chemical shift value of an amide nitrogen. At the frequency of the attached HNᵢ, a connectivity to the Cα of the previous amino acid is observed in the HN(CO)CA experiment (left strip in the pair, blue peak), while connectivities to both the Cα of the same amino acid and the Cα of the previous amino acid are present in the HNCA experiment (right strip in the pair, red peaks). The sequential assignment is made by matching the Cα chemical shifts.
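The Cα matching procedure summarized in the caption of Fig. 4.2 can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Strip` class, the helper names, the matching tolerance of 0.15 ppm, and the chemical shift values in `STRIPS` are all invented for the example and do not come from the text or from measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strip:
    label: str            # arbitrary name of the amide (HN, N) spin system
    hnca_shifts: tuple    # Cα shifts (ppm) seen in HNCA: own and preceding residue
    hncoca_shift: float   # Cα shift (ppm) seen in HN(CO)CA: preceding residue only


def own_ca(strip, tol=0.15):
    """The HNCA peak that is absent from HN(CO)CA belongs to the residue itself."""
    candidates = [s for s in strip.hnca_shifts
                  if abs(s - strip.hncoca_shift) > tol]
    if len(candidates) != 1:
        raise ValueError(f"cannot identify own Cα for strip {strip.label}")
    return candidates[0]


def sequential_links(strips, tol=0.15):
    """Pairs (a, b) where the preceding-residue Cα of b matches the own Cα of a."""
    links = []
    for a in strips:
        for b in strips:
            if a is not b and abs(own_ca(a, tol) - b.hncoca_shift) <= tol:
                links.append((a.label, b.label))
    return links


# Invented shifts for three strips, chosen so that A precedes B precedes C.
STRIPS = [
    Strip("A", (58.2, 55.1), 55.1),
    Strip("B", (62.4, 58.2), 58.2),
    Strip("C", (45.3, 62.4), 62.4),
]
print(sequential_links(STRIPS))
```

In practice Cα shifts are often degenerate within any realistic tolerance, which is one reason the text pairs this experiment with others (for example those that add Cβ or CO shifts) before a link is accepted.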

Fig. 4.3 The HCCH-TOCSY experiment is specifically designed to correlate side-chain aliphatic proton and ¹³C resonances via ¹JCH and ¹JCC coupling constants. Using known Cα and Cβ chemical shifts from the backbone assignment, one can get the Hα and Hβ chemical shifts by finding strips at each carbon shift that have peaks at the same hydrogen chemical shift values. Further peaks for the Hγ and Hδ atoms (if present in that particular amino acid type) will be visible in the same strip, which in turn will allow identification of the carbons they are attached to.

connectivities: sequence-specific assignment of the backbone resonances precedes the identification of side-chains. Triple-resonance experiments for sequential assignment are usually analyzed in pairs: the HNCO/HN(CA)CO pair provides the identification of the triplets HNᵢ–Nᵢ–COᵢ and HNᵢ–Nᵢ–COᵢ₋₁, while the HNCA/HN(CO)CA pair provides the identification of the triplets HNᵢ–Nᵢ–Cαᵢ and HNᵢ–Nᵢ–Cαᵢ₋₁, thus leading to the full backbone assignment. The CBCANH/CBCA(CO)NH pair provides the identification of the sets HNᵢ–Nᵢ–Cαᵢ–Cβᵢ and HNᵢ–Nᵢ–Cαᵢ₋₁–Cβᵢ₋₁. These two experiments, although characterized by a much lower sensitivity than the other two pairs, additionally provide the ¹³Cβ frequencies that are instrumental for the identification of residue type and secondary structure (via the chemical shift index (CSI)). Several triple- and double-resonance experiments have been developed for side-chain assignment. Among the triple-resonance experiments, the HBHA(CO)NH [17] (which correlates the backbone amide proton and nitrogen frequencies of one residue with the Hα and Hβ frequencies of the preceding residue) and the H(CC)(CO)NH [18] (which correlates the backbone amide proton and nitrogen frequencies of one residue with the side-chain proton frequencies of the preceding residue) are worth mentioning; their experimental sensitivity is, however, rather low. One of the most popular experiments for side-chain assignment is the HCCH-TOCSY experiment [19] (Figure 4.3), which relies on the strong one-bond ¹H–¹³C (125–250 Hz) and ¹³C–¹³C (35–55 Hz) couplings for the magnetization transfer. It provides nearly complete assignments of aliphatic ¹H and ¹³C resonances, with the exception of some signals of long aliphatic side-chains for which substantial overlap remains even in the three-dimensional spectrum.
The experiment does not allow assignment of the aromatic side-chains because, due to the large difference in carbon chemical shift between aliphatic and aromatic nuclei, it would require too much radiofrequency (RF) power. Alternative approaches for aromatic residues may include ¹H–¹H TOCSY (if the protein is not too large) or aromatic-tailored ¹H–¹³C heteronuclear single-quantum coherence (HSQC)-NOESY. The use of triple-resonance experiments, which contain few peaks along three different nuclear dimensions, provides the best answer for reducing signal overlap through a reduction of the expected number of peaks. Strategies for line sharpening for systems above 30 kDa combine at least partial ²H labeling of nonexchangeable hydrogens with novel NMR pulse schemes. Transverse relaxation-optimized spectroscopy (TROSY) [20], which makes use of interference effects between different relaxation mechanisms, is a spectroscopic means to reduce broadening to such an extent that satisfactory linewidths can be achieved in NMR experiments with very large molecules (up to 100 kDa) [21–23]. TROSY works best with deuterated proteins at high magnetic fields. Cross-correlated relaxation-induced polarization transfer (CRIPT) [24,25] and cross-correlated relaxation-enhanced polarization transfer (CRINEPT) [25] elements have been developed that, in combination with TROSY, allow signal detection in ²H,¹⁵N-labeled systems with molecular weights up to 1000 kDa. An alternative approach for high-molecular-weight systems has also been developed: in this case ¹³C is directly acquired and its smaller magnetic moment with respect to ¹H translates into a reduction of signal linewidth (see Chapter 26). A full set of experiments that allow sequential assignments in a protonless approach has been developed that applies to systems below 60–70 kDa [26–29].
The coherence transfer at the basis of experiments relying on scalar couplings suffers from reduced transverse relaxation times in higher molecular weight systems. In contrast, the magnetization transfer phenomena at the basis of NOESY experiments occur when the magnetization is along the z-axis. Two processes are operative during the NOESY mixing time: cross-relaxation, which is responsible for the magnetization transfer through dipolar coupling, and longitudinal relaxation, which restores the magnetization to equilibrium values. The longitudinal relaxation times are long in large proteins. The cross-relaxation increases with molecular weight, being directly proportional to the rotational correlation time of the molecule. The NOESY intensities therefore gain from both processes. Intraresidue cross-peaks are detectable in ¹³C–¹³C NOESY [30,31]; the

Protein molecular weight (kDa)

Protein labeling scheme

NMR experiments for spectral assignment