DNA and RNA Modification Enzymes: Structure, Mechanism, Function, and Evolution
 1587063298, 9781587063299, 2009011104

MOLECULAR BIOLOGY INTELLIGENCE UNIT

Henri Grosjean

Henri Grosjean, PhD

Université Paris-Sud Institut de Génétique et de Microbiologie CNRS Orsay, France

Landes Bioscience Austin, Texas USA

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution
Molecular Biology Intelligence Unit
Landes Bioscience

Copyright ©2009 Landes Bioscience. All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Printed in the USA.

Please address all inquiries to the publisher: Landes Bioscience, 1002 West Avenue, Austin, Texas 78701, USA
Phone: 512/ 637 6050; Fax: 512/ 637 6079
www.landesbioscience.com

The chapters in this book are available in the Madame Curie Bioscience Database. http://www.landesbioscience.com/curie

ISBN: 978-1-58706-329-9

While the authors, editors and publisher believe that drug selection and dosage and the specifications and usage of equipment and devices, as set forth in this book, are in accord with current recommendations and practice at the time of publication, they make no warranty, expressed or implied, with respect to material described in this book. In view of the ongoing research, equipment development, changes in governmental regulations and the rapid accumulation of information relating to the biomedical sciences, the reader is urged to carefully review and evaluate the information provided herein.

Library of Congress Cataloging-in-Publication Data

DNA and RNA modification enzymes : structure, mechanism, function, and evolution / [edited by] Henri Grosjean.
p. ; cm. -- (Molecular biology intelligence unit)
Includes bibliographical references and index.
ISBN 978-1-58706-329-9
1. Nucleic acids--Metabolism. 2. Nucleosidases. 3. DNA--Methylation. 4. Methyltransferases. I. Grosjean, Henri. II. Series: Molecular biology intelligence unit (Unnumbered : 2003)
[DNLM: 1. DNA Restriction-Modification Enzymes--physiology. 2. DNA Methylation--physiology. 3. DNA Restriction-Modification Enzymes--ultrastructure. 4. Evolution, Molecular. 5. RNA Processing, Post-Transcriptional--physiology. QU 135 D629 2009]
QP620.D585 2009
572.8'6--dc22

2009011104

Dedication

To all my former students, postdocs and colleagues with whom I shared the daily excitements of seeking ‘the secret of life’ at the molecular level.

About the Editor...

HENRI GROSJEAN, PhD, began his studies at the University of Brussels in Belgium, earning degrees in chemistry and biochemistry. After his postdoctoral stay in the Department of Biochemistry and Biophysics at Yale University, he accepted a Professorship in the Faculty of Sciences at the University of Brussels. His early teaching centered on fundamental biochemistry while he also directed a small research group. He left the post after 20 years to join the Center of National Research (CNRS) in France as a Group Leader and full-time researcher in the Laboratory of Structural Enzymology and Biochemistry in Gif-sur-Yvette near Paris. After 42 years working in academic research, he still enjoys working as an Emeritus Scientist at the University of Paris-11 in Orsay.

Dr. Grosjean’s interest in science began with problems related to the specificity of aminoacyl-tRNA synthetases (doctoral research), followed by the accuracy of decoding the genetic message on the ribosome (postdoctoral research), and finally settled on the problems of RNA maturation, in particular the enzymatic formation of modified nucleotides in RNAs of the three biological domains and its evolutionary aspects. Thus his main scientific interest during his career has been the biogenesis and functions of RNA, including posttranscriptional modifications, the molecular basis of the accuracy and efficiency of the translation process, and the evolution of the decoding machinery.

Dr. Grosjean remarks on his career: "Scientific research is a fantastic ‘full time’ job where everyday you enjoy learning always a little bit more about the mechanism and evolution of very elaborate and fantastic biological systems."

CONTENTS

Preface

1. Nucleic Acids Are Not Boring Long Polymers of Only Four Types of Nucleotides: A Guided Tour
   Henri Grosjean
      Origin of Nucleic Acids Research
      Discovery of Noncanonical Nucleosides
      Distribution of Modified Nucleosides in the Three Domains of Life
      The Case of Transfer and Ribosomal RNAs
      RNA and DNA Modification Enzymes

2. DNA Methylation: From Bug to Beast
   Stephanie R. Coffin, Benjamin A. Youngblood and Norbert O. Reich
      Epigenetic Methylation
      Prokaryotic DNA Methylation
      Eukaryotic DNA Methylation

3. DNA Restriction-Modification Systems in Prokaryotes
   John H. White, Gareth A. Roberts and David T.F. Dryden
      RM Systems
      Antirestriction

4. Experimental Approaches to Study DNA Base Flipping
   Saulius Klimašauskas and Zita Liutkevičiūtė
      The Phenomenon of Base Flipping
      X-Ray Crystallography
      NMR Spectroscopy and Imino Proton Exchange
      Biochemical Studies
      Optical Spectroscopy
      Chemical Probing
      Photochemical Approaches

5. Molecular Modeling of Base Flipping in DNA
   U. Deva Priyakumar and Alexander D. MacKerell Jr
      Base Pair Opening Versus Flipping in DNA
      Base Flipping in Presence of Proteins
      Theoretical Approaches for Studying Base Flipping
      Summary and Future Prospects

6. M⋅HhaI and M⋅EcoRI: Paradigms for Understanding the Conformational Mechanisms of DNA Methyltransferases
   Norbert O. Reich and Stephanie R. Coffin
      M⋅EcoRI
      M⋅HhaI

7. Mechanism and Evolution of DNA Recognition by DNA-(adenine N6)-Methyltransferases from the EcoDam Family
   Albert Jeltsch and Tomasz P. Jurkowski
      DNA Recognition by T4Dam
      DNA Recognition by EcoDam
      DNA Recognition by M.FokI and M.EcoRV
      Dynamics of DNA Recognition by DNA MTases

8. Structures and Activities of Mammalian DNA Methyltransferases
   Xiaodong Cheng and Robert M. Blumenthal
      Mammalian DNA Methyltransferases
      The SRA Domain of the Dnmt1 Guide UHRF1 Flips 5-Methylcytosine out of the DNA Helix
      Dnmt3L Is a Regulatory Factor for de Novo DNA Methylation
      Dimeric Dnmt3a Suggests That de Novo DNA Methylation Depends on CpG Spacing
      Dnmt3L Connects Unmethylated Lysine 4 of Histone H3 to de Novo DNA Methylation
      Oligomerization by Dnmt3 Family
      The Effects of ICF Mutants

9. DNA Methylation and Human Diseases: An Overview
   Wolfgang A. Schulz and Olusola Y. Dokun
      Inherited Diseases
      Acquired Diseases
      Aging

10. Expanding the Chemical Repertoire of DNA Methyltransferases by Cofactor Engineering
    Basar Gider and Elmar Weinhold
      Modified Substrates and Cofactors for Enzyme-Mediated Labeling
      Conclusions and Prospects for Synthetic AdoMet Analogs

11. Studying Antibody Maturation Using Techniques for Detecting Uracils in DNA
    Rachel Parisien and Ashok S. Bhagwat
      Biochemical Pathways That Introduce Uracils in DNA
      Pathways for Removing Uracils from DNA
      DNA-Cytosine Deaminases and Antibody Maturation
      Role of Uracil in Antibody Maturation
      Methods for Detecting and Quantifying Uracils in DNA
      Application to Studies of Antibody Maturation

12. Enzymatic Formation of the Hypermodified DNA Base J (β-D-Glucopyranosyloxymethyluracil)
    Robert Sabatini, Laura Cliffe, Saara Vainio and Piet Borst
      Detection of Base J
      The Two-Step Biosynthesis Pathway
      Characterization of Two Distinct Thymidine Hydroxylases in J Biosynthesis
      Identification of the Glucosyl Transferase
      Regulation of J Synthesis by Thymidine Hydroxylases
      J in Leishmania

13. DNA Demethylation
    Teresa Roldán-Arjona and Rafael R. Ariza
      Changes in DNA Methylation Patterns in Animals
      Changes in DNA Methylation Patterns in Plants
      The Search for an Enzymatic Mechanism of Active DNA Demethylation in Animal Cells
      Active DNA Demethylation in Plants

14. Demethylation of DNA and RNA by AlkB Proteins
    Pål Ø. Falnes, Erwin van den Born and Trine J. Meza
      The Discovery of the AlkB Mechanism
      AlkB-Mediated DNA Repair
      AlkB-Mediated RNA Repair
      Human AlkB Homologues
      Possible Regulatory Roles for AlkB Proteins

15. The APOBEC1 Paradigm for Mammalian Cytidine Deaminases That Edit DNA and RNA
    Harold C. Smith
      The APOBEC Protein Family
      Apolipoprotein B mRNA Editing Opens a New Field
      Identification of the Minimal Components of Editosome Assembly
      Subcellular Distribution of Editing Factors Determines Their Access to Substrates
      Stringent Control of APOBEC Proteins
      Regulation through Macromolecular Complex Formation

16. Mechanism of Action and Structural Aspects of ADARs (A-to-I) and APOBEC-Related (C-to-U) Deaminases
    Joseph E. Wedekind and Peter A. Beal
      The Zinc-Dependent Deaminase (ZDD) Signature Motif
      The Conserved ‘Cytidine Deaminase’ or CDA Architecture
      Adenosine Deaminases That Act on tRNAs (ADATs/Tads)
      Details of the ADAT/Tad Structure
      The TadA Mechanism as a Paradigm for Adenosine and Cytidine Editing Enzymes
      Adenosine Deaminases That Act on Duplex RNA (ADARs)
      ADAR Function and Structure
      The ADAR2 Mechanism
      APOBEC-1, AID and APOBEC2 Cytidine Deaminases
      hA2 and AID Intersubunit Interactions: A Comparative Modeling Approach
      APOBEC3G Domain Organization and Evidence for Subunit Oligomerization

17. Structure of RNA Editing Substrates and Their Recognition by RNA Base Deaminase
    Christophe Maris and Frédéric H.-T. Allain
      Introduction: RNA Editing
      Adenosine to Inosine Editing by ADARs: Mechanism of Substrate Recognition
      Cytidine-to-Uridine Editing of apoB mRNA

18. Biological Roles of ADARs
    Bret S.E. Heale and Mary A. O’Connell
      Classical Editing Substrates of ADARs: Mammalian GluR-B and Serotonin (5-HT2c) Transcripts
      ADAR Activity in Model Organisms; Mice, Flies, Worms
      Disorders Associated with Lack of RNA Editing
      RNA Editing of Alu Repeats
      siRNA/miRNA Interference by ADARs
      ADARs in Cancer
      Innate Immunity and Inflammation

19. The Interplay between RNA and DNA Modifications: Back to the RNA World
    Patrick Forterre and Henri Grosjean
      Early Pathways from RNA to Modern DNA (T-DNA)
      Importance of DNA Stability
      Nucleotide Modifications in the Context of Present-Day Viruses/Cell Competition
      Versatility of the Modification Apparatus
      The Virogenesis Hypothesis for the Origin of DNA
      The First Modifications, Back to the RNA World, Beyond and After

20. Folate-Dependent Thymidylate-Forming Enzymes: Parallels between DNA and RNA Metabolic Enzymes and Evolutionary Implications
    Hannu Myllykallio, Stephane Skouloubris, Henri Grosjean and Ursula Liebl
      Introduction: Historical Background
      Folate-Dependent Thymidylate Synthase of the DNA Metabolism
      Folate-Dependent Ribothymidylate Synthase of the RNA Metabolism

21. Folds and Functions of Domains in RNA Modification Enzymes
    Anna Czerwoniec, Joanna M. Kasprzak, Katarzyna H. Kaminska, Kristian Rother, Elzbieta Purta and Janusz M. Bujnicki
      The Diversity of 3D-Folds in RNA Modification Enzymes
      Catalytic Domains in RNA Methyltransferases
      Domains Involved in RNA-Binding: Three Major Modes of Substrate Recognition
      General Features of Domains in RNA-Modifying Enzymes and Their Relationship to DNA-Modifying Enzymes

22. Enzyme-RNA Substrate Recognition in RNA-Modifying Enzymes
    Robert T. Byrne, David G. Waterman and Alfred A. Antson
      General Principles of Protein-RNA Interactions
      Modularity in RNA-Modifying Enzymes
      The Various Recognition Modes of RNA Substrates by RNA-Modifying Enzymes
      Predominantly Rigid-Body Docking: Modification of the Anticodon by MnmA
      Conclusions and Future Prospects
      Supplementary Information: The Physical Forces Involved in Protein-RNA Interactions

23. Molecular Basis of tRNA Processing Reactions
    Michelle Mitchell and Hong Li
      5ʹ End Processing
      3ʹ End Processing
      Intron Removal

24. RNA-Modifying Metalloenzymes
    Mohamed Atta, Marc Fontecave and Etienne Mulliez
      Redox Iron Centers and RNA Modification
      Nonredox Fe Centers and RNA-Modifications
      Zinc and RNA Modification

25. Pseudouridine Formation, the Most Common Transglycosylation in RNA
    Eugene G. Mueller and Adrian R. Ferré-D’Amare
      Introduction and Nomenclature
      Three-Dimensional Structure
      RNA Recognition
      Substrate Nucleobase Flipping and Active Site Conservation
      Catalytic Mechanism

26. Enzymatic Formation of the 7-Deazaguanosine Hypermodified Nucleosides of tRNA
    Dirk Iwata-Reuyl and Valérie de Crécy-Lagard
      Introduction: 7-Deazaguanosine Modified Nucleosides of tRNA
      Overview of 7-Deazaguanosine Biosynthesis
      Structure and Mechanism of the GCHY-1, QueD and QueF Enzymes
      Structure and Mechanism of the TGT Enzymes
      Other Enzymes of the Pathway
      Distribution of the Pathways

27. Biogenesis and Functions of Thio-Compounds in Transfer RNA: Comparison of Bacterial and Eukaryotic Thiolation Machineries
    Akiko Noma, Naoki Shigi and Tsutomu Suzuki
      Biogenesis and Function of 2-Thiolated Uridine Derivatives
      Biogenesis of Sulfur-Containing Cofactors Shares a Common Sulfur-Relay System with 2-Thiouridine Formation
      Biogenesis of the Other Sulfur-Containing Nucleosides in tRNA

28. Enzymatic Formation of 5-Aminomethyl-Uridine Derivatives in tRNA: Functional and Evolutionary Implications
    Yoshitaka Bessho and Shigeyuki Yokoyama
      Introduction: Properties of 5-Substituents of tRNA Wobble Uridines
      Biosynthesis of 5-Aminomethyl-Uridine Derivatives
      Structure and Mechanism of the MnmE Enzyme
      Structure and Mechanism of the GidA Enzyme
      Mechanistic Features of the Bifunctional Enzyme, MnmC
      Evolutionary Aspects of the U34-Modification Metabolism

29. Deciphering the Complex Enzymatic Pathway for Biosynthesis of Wyosine Derivatives in Anticodon of tRNAPhe
    Jaunius Urbonavičius, Louis Droogmans, Jean Armengaud and Henri Grosjean
      Discovery of the So-Called ‘Y’ Base
      Other Members of the Wyosine Families in Eukaryota
      Wyosine Derivatives also Exist in Archaea
      Biosynthesis of Wyosine Derivatives in Eukarya
      Role of Wyosine Derivatives During Translation Process

30. Multicomponent 2ʹ-O-Ribose Methylation Machines: Evolving Box C/D RNP Structure and Function
    Keith T. Gagnon, Guosheng Qu and E. Stuart Maxwell
      Box C/D RNAs: Diversity of Sequence and Structure
      Box C/D RNP Structure and Assembly
      Structure, Function and Evolution of the L7Ae/15.5kD Core Protein
      Structure, Function and Evolution of the NOP56 and NOP58 Core Proteins
      Structure, Function and Evolution of Fibrillarin
      The Evolving Box C/D RNP Machinery

31. Multicomponent Machines in RNA Modification: H/ACA Ribonucleoproteins
    Petar Grozdanov and U. Thomas Meier
      H/ACA RNAs
      H/ACA Core Proteins
      Beyond Formation of Pseudouridines
      Architecture of H/ACA RNPs
      Biogenesis of H/ACA RNPs
      Dyskeratosis Congenita

32. Spliceosomal snRNA Pseudouridylation
    John Karijolich, Chao Huang and Yi-Tao Yu
      Discovery of U snRNA Pseudouridylation
      Pseudouridylation of snRNA in Vertebrates
      Pseudouridylation of snRNA in Saccharomyces cerevisiae
      Spliceosomal snRNA Pseudouridylation Affects Pre-mRNA Splicing
      Minor Spliceosomal snRNAs Are Pseudouridylated

33. Transfer RNA Aminoacylation and Modified Nucleosides
    Richard Giegé and Jacques Lapointe
      Role of Modified Nucleosides for tRNA Structure
      Idiosyncratic Involvement of Modified Nucleosides in tRNA Identity
      Considerations on Evolution

34. Crystallographic Studies of Decoding by Modified Bases: Correlation of Structure and Function
    Albert Weixlbaumer and Frank V. Murphy IV
      Structural Studies on Inosine
      N6-Threonylcarbamoyladenosine 37
      Structural Studies on 5-Methylaminomethyluridine 34
      Structural Studies on cmo5U and m6A

35. Roles of the Ultra-Conserved Ribosomal RNA Methyltransferase KsgA in Ribosome Biogenesis
    Jason P. Rife
      Biology, Chemistry, and Evolution of KsgA
      KsgA Orthologs
      KsgA’s Relationship to ERM Methyltransferases

36. Antibiotic Resistance in Bacteria through Modification of Nucleosides in 16S Ribosomal RNA
    Graeme L. Conn, Miloje Savic and Rachel Macmaster
      Resistance to Antibiotics via Loss of Methylation of 16S rRNA
      Resistance to Antibiotics Through Methylation of 16S rRNA
      Aminoglycoside Resistance 16S rRNA Methyltransferases in Pathogenic Bacteria
      Antibiotic Resistance RNA Methyltransferase Enzymes: Structure and Function

37. Antibiotic Resistance in Bacteria Caused by Modified Nucleosides in 23S Ribosomal RNA
    Birte Vester and Katherine S. Long
      The Cfr Methyltransferase Targets A2503 at the Peptidyl Transferase Center
      RNA Methyltransferases Acting on Nucleotides in the Peptide Exit Tunnel
      The Tsr Methyltransferase Targets Nucleotide A1067 at the GTPase Center
      Three Different RNA Methyltransferases That Confer Orthosomycin Resistance
      The TlyA Methyltransferase Targets Nucleotides on Intersubunit Bridge B2A at the Ribosomal Subunit Interface

38. Function of Modified Nucleosides in RNA Stabilization
    Armine Hayrapetyan, Salifu Seidu-Larry and Mark Helm
      Concept of Structural Equilibrium
      Potential Mechanisms of Structural Alteration on the Nucleotide Scale
      Examples for Impediment of Watson-Crick Base-Pairing by Methylation
      Effect of m1A on a Structural Equilibrium in Human Mitochondrial tRNA Lys
      Structural Contributions of Pseudouridine
      Structural Reinforcement by tRNA Modifications in Thermophilic Organisms

39. Roles of tRNA Modifications in tRNA Turnover
    Eric M. Phizicky, Elizabeth J. Grayhack, Irina Chernyakov and Joseph M. Whipple
      Pre-tRNA Nuclear Surveillance by the TRAMP Complex and the Nuclear Exosome
      Biochemical Analysis of tRNA Degradation by the TRAMP Complex and Nuclear Exosome
      Prospects for Future Research on the tRNA Nuclear Surveillance Pathway
      Degradation of Mature tRNA through the Rapid tRNA Decay (RTD) Pathway
      Prospects for Future Research on the Rapid tRNA Degradation Pathway
      Other Uncharacterized Pathways in Which tRNA Levels Are Modulated

40. The “PACE” Concept Pointed at New Key Proteins Involved in RNA Metabolism
    Jean Armengaud
      Central Metabolism
      Protein Synthesis, Folding and Posttranslational Modifications
      Maintenance of Genomic Stability
      mRNA Synthesis and Maturation
      rRNA Maturation
      tRNA Maturation
      RNA Recycling and Degradation
      Eleven PACEs Are Still Poorly Characterized

41. Chemical Synthesis of DNA and RNA Containing Modified Nucleotides
    Sébastien Porcher and Mark Helm
      The Automated Synthesis of Standard RNA Sequences

Appendix 1: Chemical Structures, Classification of Modified Nucleosides in RNA and the MODOMICS Database Concerning the Corresponding RNA Modification Enzymes
    Kristian Rother, Anna Czerwoniec, Janusz M. Bujnicki and Henri Grosjean

Appendix 2: Databases of DNA Modifications
    Kristian Rother, Grzegorz Papaj and Janusz M. Bujnicki

Appendix 3: RNA Modification Subsystems in the SEED Database
    Valérie de Crécy-Lagard and Gary Olsen

Appendix 4: List of Available Phosphoramidites of Modified Nucleotides for Chemical DNA/RNA Synthesis
    Salifu Seidu-Larry, Sebastien Porcher, Ronald Micura and Mark Helm

Appendix 5: S-Adenosyl-l-Methionine and Analogs
    Elmar Weinhold and Saulius Klimašauskas

Appendix 6: Web Links to Databases about RNA and DNA Modifications and Related Topics
    Henri Grosjean and Kristian Rother

Index

EDITOR Henri Grosjean

Université Paris-Sud Institut de Génétique et de Microbiologie CNRS Orsay, France Email: [email protected] Chapters 1, 19, 20, 29, Appendices 1, 6

CONTRIBUTORS
Note: Email addresses are provided for the corresponding authors of each chapter.

Frédéric H.-T. Allain ETH Zurich Institute of Molecular Biology and Biophysics Zürich, Switzerland Email: [email protected] Chapter 17

Peter A. Beal Department of Chemistry University of California, Davis One Shields Avenue Davis, California, USA Chapter 16

Alfred A. Antson York Structural Biology Laboratory Department of Chemistry University of York York, UK Chapter 22

Rafael R. Ariza Departamento de Genética Edificio Gregor Mendel Campus de Rabanales Universidad de Córdoba Spain Chapter 13

Jean Armengaud Lab Biochim System Perturb Bagnols-sur-Cèze, France Email: [email protected] Chapters 29, 40

Mohamed Atta CNRS and Université Joseph Fourier Grenoble, France Email: [email protected] Chapter 24

Yoshitaka Bessho RIKEN Systems and Structural Biology Center and Spring-8 Center Harima Institute Tsurumi, Yokohama, Japan Email: [email protected] Chapter 28

Ashok S. Bhagwat Department of Chemistry Wayne State University Detroit, Michigan, USA Email: [email protected] Chapter 11

Robert M. Blumenthal Department of Medical Microbiology and Immunology and Program in Bioinformatics and Proteomics/Genomics University of Toledo Health Science Campus Toledo, Ohio, USA Chapter 8

Piet Borst The Netherlands Cancer Institute Division of Molecular Biology Amsterdam, The Netherlands Chapter 12

Janusz M. Bujnicki Laboratory of Bioinformatics and Protein Engineering International Institute of Molecular and Cell Biology Warsaw, Poland and Bioinformatics Laboratory Institute of Molecular Biology and Biotechnology Adam Mickiewicz University Poznan, Poland Email: [email protected] Chapter 21, Appendices 1, 2

Robert T. Byrne York Structural Biology Laboratory Department of Chemistry University of York York, UK Email: [email protected] Chapter 22

Stephanie R. Coffin Department of Chemistry and Biochemistry and Biomolecular Science and Engineering Program University of California Santa Barbara, California, USA Chapters 2, 6

Graeme L. Conn Department of Biochemistry Emory University School of Medicine Atlanta, Georgia, USA Email: [email protected] and Manchester Interdisciplinary Biocentre Faculty of Life Sciences University of Manchester Manchester, UK Email: [email protected] Chapter 36

Anna Czerwoniec Bioinformatics Laboratory Institute of Molecular Biology and Biotechnology Adam Mickiewicz University Poznan, Poland Chapter 21, Appendix 1

Xiaodong Cheng Department of Biochemistry Emory University School of Medicine Atlanta, Georgia, USA Email: [email protected] Chapter 8

Irina Chernyakov Department of Biochemistry and Biophysics University of Rochester School of Medicine Rochester, New York, USA Chapter 39

Valérie de Crécy-Lagard Department of Microbiology and Cell Science University of Florida Gainesville, Florida, USA Email: [email protected] Chapter 26, Appendix 3

Olusola Y. Dokun Department of Urology Heinrich Heine University Düsseldorf, Germany Chapter 9

Laura Cliffe University of Georgia Department of Biochemistry and Molecular Biology Athens, Georgia, USA Chapter 12

Louis Droogmans Université Libre de Bruxelles Laboratoire de Microbiologie Institut de Recherches Microbiologiques J.-M. Wiame Bruxelles, Belgium Chapter 29

David T.F. Dryden School of Chemistry University of Edinburgh Edinburgh, Scotland Email: [email protected] Chapter 3

Pål Ø. Falnes Department of Molecular Biosciences University of Oslo Oslo, Norway Email: [email protected] Chapter 14

Adrian R. Ferré-D'Amaré Howard Hughes Medical Institute and Division of Basic Sciences Fred Hutchinson Cancer Research Center Seattle, Washington, USA Email: [email protected] Chapter 25

Marc Fontecave CNRS and Université Joseph Fourier Grenoble, France Chapter 24

Patrick Forterre Institut Pasteur Département de Microbiologie Paris, France Email: [email protected] Chapter 19

Keith T. Gagnon Department of Molecular and Structural Biochemistry North Carolina State University Raleigh, North Carolina, USA Chapter 30

Basar Gider Institute of Organic Chemistry RWTH Aachen University Aachen, Germany Chapter 10

Richard Giegé Architecture et Réactivité de l'ARN Université Louis Pasteur de Strasbourg CNRS, IBMC Strasbourg, France Email: [email protected] Chapter 33

Elizabeth J. Grayhack Department of Biochemistry and Biophysics University of Rochester School of Medicine Rochester, New York, USA Chapter 39

Petar Grozdanov Department of Anatomy and Structural Biology Albert Einstein College of Medicine Bronx, New York, USA Chapter 31

Armine Hayrapetyan Institute of Pharmacy and Molecular Biotechnology University of Heidelberg Heidelberg, Germany Chapter 38

Bret S.E. Heale MRC Human Genetics Unit Western General Hospital Edinburgh, UK Chapter 18

Mark Helm Institute of Pharmacy and Molecular Biotechnology Department of Chemistry University of Heidelberg Heidelberg, Germany Email: [email protected] Chapters 38, 41, Appendix 4

Chao Huang Department of Biochemistry and Biophysics University of Rochester Medical Center Rochester, New York, USA Chapter 32

Dirk Iwata-Reuyl Department of Chemistry Portland State University Portland, Oregon, USA Email: [email protected] Chapter 26

Albert Jeltsch Biochemistry Laboratory School of Engineering and Science Jacobs University Bremen Bremen, Germany Email: [email protected] Chapter 7

Tomasz P. Jurkowski Biochemistry Laboratory School of Engineering and Science Jacobs University Bremen Bremen, Germany Chapter 7

Katarzyna H. Kaminska Laboratory of Bioinformatics and Protein Engineering International Institute of Molecular and Cell Biology Warsaw, Poland Chapter 21

John Karijolich Department of Biochemistry and Biophysics University of Rochester Medical Center Rochester, New York, USA Chapter 32

Joanna M. Kasprzak Bioinformatics Laboratory Institute of Molecular Biology and Biotechnology Adam Mickiewicz University Poznan, Poland Chapter 21

Saulius Klimašauskas Institute of Biotechnology Laboratory of Biological DNA Modification Vilnius, Lithuania Email: [email protected] Chapter 4, Appendix 5

Jacques Lapointe Biochimie et Microbiologie CREPSIP Université Laval Pavillon Marchand Québec, Qué, Canada Chapter 33

Hong Li Institute of Molecular Biophysics Department of Chemistry and Biochemistry Florida State University Tallahassee, Florida, USA Email: [email protected] Chapter 23

Ursula Liebl Laboratoire d’Optique et Biosciences Ecole Polytechnique, CNRS and INSERM Palaiseau, France Chapter 20

Zita Liutkevičiūtė Institute of Biotechnology Laboratory of Biological DNA Modification Vilnius, Lithuania Chapter 4

Katherine S. Long Department of Biology University of Copenhagen Copenhagen, Denmark Chapter 37

Alexander D. MacKerell Jr. Department of Pharmaceutical Sciences School of Pharmacy University of Maryland Baltimore, Maryland, USA Email: [email protected] Chapter 5

Rachel Macmaster Manchester Interdisciplinary Biocentre Faculty of Life Sciences University of Manchester Manchester, UK Chapter 36

Ronald Micura Leopold Franzens University Institute of Organic Chemistry Center of Molecular Biosciences Innsbruck, Austria Appendix 4

Michelle Mitchell Institute of Molecular Biophysics Department of Chemistry and Biochemistry Florida State University Tallahassee, Florida, USA Chapter 23

Eugene G. Mueller Department of Chemistry University of Louisville Louisville, Kentucky, USA Chapter 25

Christophe Maris ETH Zurich Institute of Molecular Biology and Biophysics Zürich, Switzerland Chapter 17

Etienne Mulliez CNRS and Université Joseph Fourier Grenoble, France Chapter 24

E. Stuart Maxwell Department of Molecular and Structural Biochemistry North Carolina State University Raleigh, North Carolina, USA Email: [email protected] Chapter 30

U. Thomas Meier Department of Anatomy and Structural Biology Albert Einstein College of Medicine Bronx, New York, USA Chapter 31

Trine J. Meza Department of Molecular Biosciences University of Oslo Oslo, Norway Chapter 14

Frank V. Murphy IV NE-CAT Advanced Photon Source Argonne National Laboratory Argonne, Illinois, USA Chapter 34

Hannu Myllykallio Institute of Genetics and Microbiology Université Paris-Sud and Laboratoire d’Optique et Biosciences Ecole Polytechnique and INSERM Palaiseau, France Email: [email protected]; [email protected] Chapter 20

Akiko Noma Department of Chemistry and Biotechnology Graduate School of Engineering University of Tokyo Tokyo, Japan Chapter 27

Sébastien Porcher Laboratory of Nucleic Acids Chemistry Department of Chemistry Lausanne, Switzerland Email: [email protected] Chapter 41

Mary A. O’Connell MRC Human Genetics Unit Western General Hospital Edinburgh, UK Email: [email protected] Chapter 18

U. Deva Priyakumar Department of Pharmaceutical Sciences School of Pharmacy University of Maryland Baltimore, Maryland, USA Chapter 5

Gary Olsen Department of Microbiology University of Illinois at Urbana-Champaign Urbana, Illinois, USA Appendix 3

Elzbieta Purta Laboratory of Bioinformatics and Protein Engineering International Institute of Molecular and Cell Biology Warsaw, Poland Chapter 21

Grzegorz Papaj Laboratory of Bioinformatics and Protein Engineering International Institute of Molecular and Cell Biology Warsaw, Poland Appendix 2

Guosheng Qu Department of Molecular and Structural Biochemistry North Carolina State University Raleigh, North Carolina, USA Chapter 30

Rachel Parisien Department of Chemistry Wayne State University Detroit, Michigan, USA Chapter 11

Eric M. Phizicky Department of Biochemistry and Biophysics University of Rochester School of Medicine Rochester, New York, USA Email: [email protected] Chapter 39

Norbert O. Reich Biomolecular Science and Engineering Program University of California Santa Barbara, California, USA Email: [email protected] Chapters 2, 6

Jason P. Rife Department of Medicinal Chemistry Institute for Structural Biology and Drug Discovery Virginia Commonwealth University Richmond, Virginia, USA Email: [email protected] Chapter 35

Gareth A. Roberts School of Chemistry University of Edinburgh Edinburgh, Scotland Chapter 3

Teresa Roldán-Arjona Departamento de Genética Edificio Gregor Mendel Campus de Rabanales Universidad de Córdoba Spain Chapter 13

Kristian Rother Laboratory of Bioinformatics and Protein Engineering International Institute of Molecular and Cell Biology Warsaw, Poland Email: [email protected] Chapters 21, 41, Appendices 1, 2, 6

Robert Sabatini University of Georgia Department of Biochemistry and Molecular Biology Athens, Georgia, USA Email: [email protected] Chapter 12

Miloje Savic Manchester Interdisciplinary Biocentre University of Manchester Manchester, UK Chapter 36

Wolfgang A. Schulz Department of Urology Heinrich Heine University Düsseldorf, Germany Email: [email protected] Chapter 9

Salifu Seidu-Larry Institute of Pharmacy and Molecular Biotechnology University of Heidelberg Heidelberg, Germany Chapter 38

Naoki Shigi Biomedical Information Research Center National Institute of Advanced Industrial Science and Technology Tokyo, Japan Chapter 27

Stephane Skouloubris Institute of Genetics and Microbiology Université Paris-Sud CNRS, France Chapter 20

Harold C. Smith Department of Biochemistry and Biophysics University of Rochester School of Medicine and Dentistry Rochester, New York, USA Email: [email protected] Chapter 15

Tsutomu Suzuki Department of Chemistry and Biotechnology Graduate School of Engineering University of Tokyo Tokyo, Japan Email: [email protected] Chapter 27

Jaunius Urbonavičius Université Libre de Bruxelles Laboratoire de Microbiologie Institut de Recherches Microbiologiques J.-M. Wiame Bruxelles, Belgium and Université Libre de Bruxelles Institut de Biologie et de Médecine Moléculaires Gosselies-Charleroi, Belgium Email: [email protected] Chapter 29

Saara Vainio The Netherlands Cancer Institute Division of Molecular Biology Amsterdam, The Netherlands Chapter 12

Erwin van den Born Department of Molecular Biosciences University of Oslo Oslo, Norway Chapter 14

Birte Vester Department of Biochemistry and Molecular Biology University of Southern Denmark Odense, Denmark Email: [email protected] Chapter 37

David G. Waterman Diamond Light Source Ltd Harwell Science and Innovation Campus Oxfordshire, UK Chapter 22

Joseph E. Wedekind Department of Biochemistry and Biophysics University of Rochester School of Medicine & Dentistry Rochester, New York, USA Email: [email protected] Chapter 16

Joseph M. Whipple Department of Biochemistry and Biophysics University of Rochester School of Medicine & Dentistry Rochester, New York, USA Chapter 39

John H. White School of Chemistry University of Edinburgh Edinburgh, Scotland Chapter 3

Shigeyuki Yokoyama RIKEN Systems and Structural Biology Center and Spring-8 Center Harima Institute Tsurumi, Yokohama and Department of Biophysics and Biochemistry Graduate School of Science The University of Tokyo Tokyo, Japan Chapter 28

Benjamin A. Youngblood Department of Microbiology and Immunology Emory University School of Medicine Atlanta, Georgia, USA Chapter 2

Elmar Weinhold Institute of Organic Chemistry RWTH Aachen University Aachen, Germany Email: [email protected] Chapter 10

Albert Weixlbaumer MRC Laboratory of Molecular Biology Cambridge, UK Chapter 34

Yi-Tao Yu Department of Biochemistry and Biophysics University of Rochester Medical Center Rochester, New York, USA Email: [email protected] Chapter 32

PREFACE

Modified deoxy- and ribonucleosides, distinct from the canonical nucleosides adenosine, guanosine, cytosine and uridine or thymidine, are found in DNA and RNAs of all living organisms, as well as of viruses, mitochondria and chloroplasts. In DNA, chemical alteration of a base or a phosphate occurs by pre-replicative or post-replicative enzymatic processes, while in RNAs, chemical alteration of a base or a ribose always occurs after RNA synthesis, at the polymer level. DNA and RNA editing, that is, the replacement of a canonical base by another at the polymer level, also exists in eukaryotic cells, certain archaea, mitochondria and chloroplasts. The variety of biochemical processes allowing such nucleic acid modification and editing is astonishing. They influence the maturation, folding and stabilization of RNAs and allow an accurate, efficient and regulated translation process. In DNA, they allow genetic imprinting, immunoglobulin class switch recombination, somatic hypermutation, self-defence against viruses and probably many other functions that have still to be discovered. The challenge is to understand how and why these intriguing, very diversified types of ‘fine-tuning’ of the structure and functions of nucleic acids by so-called ‘minor nucleosides’ have emerged since the first living cells appeared on earth some billions of years ago. This volume is a timely and comprehensive description of the many facets of DNA and RNA modification-editing processes and, to some extent, repair mechanisms. Each chapter offers fundamental principles as well as up-to-date information on recent advances in the field (up to the end of 2008). Each concludes with a short ‘conclusion and future prospects’ section and an extensive list of 35 to 257 references (87 on average). Contributors are geneticists, structural enzymologists and molecular biologists working at the forefront of this exciting, fast-moving and diverse field.

This book will be of major interest to PhD students and university teachers alike. It will also serve as an invaluable reference tool for new researchers in the field, as well as for specialists of RNA modification enzymes, who are generally not well informed about similar processes acting on DNA, and vice versa for specialists of DNA modification-editing and repair processes, who are usually not much acquainted with developments in the RNA maturation field.

The book comprises 41 chapters. The common links between them are mainly the enzymatic aspects of the different modification-editing and repair machineries: structural, mechanistic, functional and evolutionary aspects. It starts with two general and historical overviews of the discovery of modified nucleosides in DNA and RNA and of the corresponding modification-editing enzymes. Then follow 11 chapters on DNA modification and editing (mechanistic and functional aspects). Two additional chapters cover problems related to DNA/RNA repair and base editing by C-to-U deaminases, followed by three chapters on RNA editing by C-to-U and A-to-I types of deamination. Discussions about the interplay between DNA and RNA modifications and the emergence of DNA are covered in two independent chapters, followed by 20 chapters on different but complementary aspects of RNA modification enzymes and their cellular implications. The last chapter describes the present state of the art for incorporating modified nucleosides by in vitro chemical synthesis. At the end of the book, six appendices give useful details on modified nucleosides, modification-editing enzymes and nucleoside analogs. This information is usually difficult to obtain from the current scientific literature.

Henri Grosjean, PhD

Acknowledgements

The editor, Henri Grosjean, would like to thank each author individually for accepting the invitation to contribute to this book and for providing me with an excellent, well-focused and up-to-date chapter within a reasonable period of time. Thanks also for agreeing, in some cases, to revise or slightly modify the original galley proof for better coordination of the chapter within the general framework of the book. I also thank all the authors and co-authors who helped me in my duty to scientifically edit all the book chapters by reviewing, commenting on and advising on one, or sometimes several, other chapters of the book (cross-referencing system). For several chapters, advice was also sought from ‘external’ referees. The editor is especially indebted to Juan Alfonzo (Ohio State University, Columbus, OH, USA), Brenda Bass (University of Utah, Salt Lake City, UT, USA), Glenn Björk (University of Umeå, SE), Bertrand Castaing (CNRS, Orléans, FR), Wayne Decatur (University of Massachusetts, Amherst, MA, USA), Aaron Dinner (University of Chicago, Chicago, IL, USA), Steve Douthwaite (University of Southern Denmark, Odense, DK), Catherine Florentz (IBMC-CNRS, Strasbourg, FR), Skip Fournier (University of Massachusetts, Amherst, MA, USA), Nicolas Glansdorff (University of Brussels, BE), Elizaveta Gromova (University of Moscow, Russia), Wilhelm Guschlbauer (France), Steve Hajduk (University of Georgia, Athens, GA, USA), Anne-Lise Haenni (University of Paris 7, FR), Andrew Hanson (University of Florida, Gainesville, FL, USA), Roland Hartmann (University of Marburg, DE), Anita Hopper (Ohio State University, Columbus, OH, USA), David Hornby (University of Sheffield, UK), Huang Niu (Yale University, New Haven, CT, USA), Mike Ibba (Ohio State University, Columbus, OH, USA), Alain Krol (CNRS-IBMC, Strasbourg, FR), Gordona Maravic (University of Zagreb, Croatia), Mario Mörl (University of Leipzig, DE), Olivier Namy (IGM, University of Paris-11, FR), Jacques Ninio (ENS, Paris, FR), Marie Öhman (University of Stockholm, SE), Nina Papavasiliou (Rockefeller University, New York, USA), Alfred Pingoud (University of Giessen, DE), Pascale Romby (IBMC, CNRS, Strasbourg, FR), Roy Todd (MIT, Cambridge, MA, USA), Barbara Sedgwick (Cancer Research, Potters Bar, UK), Mike and Rebecca Terns (University of Georgia, Athens, USA) and David Tollervey (University of Edinburgh, Scotland). In fine, each chapter of this book has been reviewed by a minimum of two, sometimes three and in a few cases even four independent reviewers. Last but not least, during the scientific editing of each individual chapter I greatly appreciated the help and pertinent advice of the publisher's staff at Landes Bioscience in the US, especially Celeste Carlton, Erin O’Brien and Cynthia Conomos. Without their help, this book would certainly not have been produced in a reasonable period of time. I enjoyed working with them.

Chapter 1

Nucleic Acids Are Not Boring Long Polymers of Only Four Types of Nucleotides: A Guided Tour

Henri Grosjean

Abstract

Chemically altered nucleosides derived from canonical ribo- or deoxyribonucleoside derivatives of adenosine, cytosine, guanosine, and uridine or thymidine are found in all types of nucleic acids, DNA and RNA. They are particularly abundant in noncoding RNAs, such as transfer RNAs and ribosomal RNA of higher organisms. By increasing the structural diversity of nucleic acids, modified nucleosides play important roles in gene expression and in regulating many aspects of RNA functions. They also contribute to nucleic acid stability and to protection of genetic materials against virus aggression. In this chapter we present a historical overview of the discovery, occurrence, and diversity of the many naturally occurring modified nucleosides that are present in both DNA and RNA of diverse organisms. We also briefly describe the different enzymes that accomplish these nucleic acid ‘decorations’. More information about the structure, function, biosynthesis and evolutionary aspects of selected modified nucleosides in DNA and RNA and their corresponding modification enzymes can be found elsewhere in this volume.

Origin of Nucleic Acids Research

Discovery of Deoxyribonucleic Acid (DNA)

Friedrich Miescher discovered an unknown compound, later identified as chromatin, in 1869. He extracted a gelatinous material from various cells (initially human pus) and discovered that it contained much inorganic phosphorus. This newly identified biochemical material was named ‘nuclein’ because it was always associated with what the histologists designated nuclei. During the period 1885-1900, it was discovered that besides phosphorus, ‘nuclein’ was also rich in a carbohydrate (later identified as a deoxypentose) and in the organic bases adenine, thymine, guanine, and cytosine. The linear structure of the purified organo-phosphate polymer was finally solved by Phoebus Levene (period 1909-1929). At that time, the DNA polymer was thought to be the scaffold of some important elements within the chromatin. No connection was made between this ‘boring long polymer with only four types of nucleotides’ and the molecular basis of transmission of hereditary characteristics that geneticists were eagerly seeking. Detailed study of polymeric DNA began in 1928 when Fred Griffith suspected that a “genetic transforming principle” was associated with the ‘nuclein’. However, it was only in 1944 that Oswald Avery and his research group,1 using almost-pure DNA from Streptococcus cell extracts, inferred that DNA contains genetic information. It took another year before Avery demonstrated that the transforming activity disappeared after DNase treatment.2 Then the race to identify the detailed chemical structure of the ‘genetic’ DNA really started. First, Rollin Hotchkiss3 confirmed the genetic nature of DNA, while Erwin Chargaff4 discovered that adenine with thymine and guanine with cytosine always exist in a 1:1 ratio, although the ratio of G+C/A+T varies from species to species. Based on these crucial observations, together with the very first crystallographic data of DNA fibers obtained by Rosalind Franklin working in the laboratory of Maurice Wilkins, and based on competitor Linus Pauling’s suggestion that DNA could have a helical shape, Francis Crick and James Watson proposed in 1953 the double helix structure of DNA,5,6 which revolutionized our concept of the transmission of genetic characters. Next came the identification and purification of the first DNA restriction enzymes7,8 that recognize a defined sequence in DNA and cut it specifically. Together with the invention of techniques for DNA sequencing,9,10 these advances allowed the development of recombinant DNA technology11 and opened the field of modern molecular biology.12

*Corresponding Author: Henri Grosjean, Institute of Genetics and Microbiology, Université Paris-Sud, CNRS UMR 8621, F-91405, France. Email: [email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.
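Chargaff's observation in the paragraph on the discovery of DNA can be illustrated with a short calculation. The following is a minimal sketch in Python, not taken from the chapter: the two example sequences and the helper names (`reverse_complement`, `duplex_composition`, `gc_to_at_ratio`) are hypothetical and chosen only for illustration. It builds the complementary strand by Watson-Crick pairing, counts the bases over both strands, and shows that A:T and G:C come out 1:1 while (G+C)/(A+T) depends on the sequence.

```python
from collections import Counter

# Watson-Crick pairing partners for the four canonical DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def duplex_composition(top_strand: str) -> Counter:
    """Count bases over both strands of a double-stranded DNA."""
    return Counter(top_strand) + Counter(reverse_complement(top_strand))


def gc_to_at_ratio(counts: Counter) -> float:
    """(G+C)/(A+T), the quantity that varies from species to species."""
    return (counts["G"] + counts["C"]) / (counts["A"] + counts["T"])


# Hypothetical example sequences, made up for this illustration.
at_rich = duplex_composition("ATTATAGCAT")
gc_rich = duplex_composition("GGCGCATGCC")

for name, counts in (("AT-rich", at_rich), ("GC-rich", gc_rich)):
    print(name, dict(counts), gc_to_at_ratio(counts))
```

In both duplexes the counts of A and T are equal, as are those of G and C, simply because every base on one strand is paired with its partner on the other; only the (G+C)/(A+T) ratio differs between the two sequences.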

Discovery of Ribonucleic Acids (RNAs)

It was not until later in the 20th century that scientists realized there are two types of nucleic acids, DNA and RNA, the latter involving ribose instead of deoxyribose and uridine (or pseudouridine, the ‘fifth ribonucleoside’; see below) instead of thymidine. The reason was that little attention was given to the presence of RNases, and any ‘RNA’ identified in cell extracts was just a mix of degradation products a few nucleotides long.13,14 Degradation of DNA by metal-dependent DNases was easier to avoid. Thus while DNA research was progressing well, the chemistry of the second type of nucleic acid (RNA) remained obscure until the 1950s. Only after introducing detergent (as for DNA preparation15), associated with phenol for purification, were the first long RNA polymers finally identified in 1956-58 (ribosomal RNA16 and ‘soluble’ RNA, now called transfer RNA17). Wide interest in these new types of nucleic acids emerged only after Crick hypothesized in 1955 (but published only in 1958) that an RNA molecule should be the intermediate between DNA and proteins (known as the ‘RNA adaptor hypothesis’18), and later on advanced the ‘Wobble hypothesis’ for decoding mRNA.19 Initially, Crick thought the adaptor molecules might be the small RNA molecules that were known to be present in cell extracts, until the ‘soluble’ RNAs (tRNAs), able to be specifically aminoacylated,20 were identified and characterized in 1958. The concept of messenger RNA and regulatory mechanisms in the synthesis of proteins was formulated in 1961 (refs. 21, 22). The genetic code was finally solved and officially presented during a Cold Spring Harbor Symposium23 in 1966. In the meantime (1965-67) the first tRNAs, specific for alanine,24 tyrosine,25 serine26 and phenylalanine,27 all from yeast, were fully sequenced. These sequences included the identification and location of no less than 17 different noncanonical nucleosides, among them two hypermodified nucleosides, N6-isopentenyladenosine (i6A) and wybutosine (yW). The first crystals of tRNAs were produced and the first three-dimensional structure of one of them28 was finally solved in 1974. This was the birth of structural biology of the nucleic acids.

Discovery of Noncanonical Nucleosides

Modified Nucleosides in Genomic DNAs

During the period 1920-45, naturally occurring nucleic acid polymers (DNA and RNA) were thought to contain only four canonical nucleosides (ribo- or deoxy-derivatives): adenosine, cytosine, guanosine, and uridine or thymidine. However, ater analyzing a picrate precipitate from a hydrolysate of DNA of avian tubercule bacilli, Johnson and Coghill29 detected a minor amount of a methylated cytosine derivative (m5dC, Fig. 1). his report was later disputed by Vischer et al30 because they could not reproduce the result, but Johnson and Coghill were in fact correct. Only in 1948 was the presence of m5dC in DNA from calf thymus31,32 irmly established

Nucleic Acids Are Not Boring Long Polymers of Only Four Types of Nucleotides

3

Figure 1. Modified bases (and phosphate) in DNAs. In boxes are the chemical structures or the description of the chemical composition of adducts to selected atoms of pyrimidine ring (upper part) and purine ring (bottom part) found in cellular (nuclear) genomic DNA of prokaryotes and eukaryotes (in grey boxes) or in viral DNA (mainly bacteriophages; open boxes). Conventional symbols used in scientific literature are given. Information comes mainly from references 40, 41, 60 and those given in text. The corresponding full names and chemical characteristics can be found in references 61, 62 and in the MODomics data base (see Appendix 1 by Rother et al in this volume). A color version of this image with all the atoms belonging to cytosine and uracil rings is available at www.landesbioscience.com/curie.

using the new technique of paper chromatography of DNA hydrolysates.31,32 This was followed in 1958 by the detection of N6-methyladenine (m6dA) in microbial DNA.33 It was not until much later, in 1964, that the methylation of cytosines and adenines within DNA molecules was shown to occur by enzymatic post-replicative modification (see below the section concerning enzymes). A surprise discovery during the period 1953-63 was that the DNA of some bacterial viruses lacks deoxycytosine (dC) or deoxythymidine (dT) and instead contains 5-hydroxymethyldeoxycytosine34 (hm5dC), 5-hydroxymethyldeoxyuridine35 (hm5dU) or simply deoxyuridine36 (dU). These modified cytosines or thymidines, which completely replace the standard base dC or dT, are generated at the precursor level (prereplicative modification), unlike the m5dC and m6dA in bacterial and mammalian DNA, and are subsequently incorporated into phage DNA by the bacteriophage polymerase.37 However, the hm5dC in phage DNA can be further glucosylated at the polymer level38,39 by direct transfer of glucose from UDP-glucose to form hexosylated derivatives glc-hm5dC / glc-hm5dU and even di-glucosylhydroxymethyldeoxycytosine glc-glc-hm5dC. Note that a minor amount of hm5dU and glc-hm5dU (also designated Base J) has recently been found in genomic DNA of flagellated protozoa of the order Kinetoplastida (Trypanosoma brucei for example) and in the closely related unicellular alga Euglena gracilis (see chapter by Sabatini et al). In this case hydroxylation of deoxythymidine and the subsequent glycosylation step occur at the polymer level. Later (1972-81) came the discovery of new deoxyuridine derivatives containing putrescinyl, glutamyl or dihydroxypentyl groups linked to C5 of the uracil ring (symbolized by Put-m5dU, Glu-m5dU and Dhp5dU, respectively40,41). In the case of Dhp5dU, glucose or gluconolactone-1-phosphate can be further attached to one of the two free hydroxyl groups, leading to hypermodified Glc-Dhp5dU and GlcP-Dhp5dU, respectively. In these latter cases, depending on the type of chemical alteration, the extent of replacement of the canonical dT by modified dU derivatives was estimated to be 15-60%. In E. coli phage Mu, a substantial number of adenines were found modified to N6-carbamoylmethyladenine (ncm6A, 15% of dA), while in S. elongatus phage S-2L 100% of adenines are modified to 2-aminoadenine42 (m2A) or N2-N6-dimethyladenine43 (m2,6A). Also, in the phage DDV1 infecting Shigella sonnei, a trace amount of 7-methylguanine (m7dG, about 1% of dG) was found. Other types of noncanonical modified deoxynucleosides would most probably be identified were more bacterial and phage DNAs to be explored, an endeavor that unfortunately has been much neglected in the past decade (discussed in the chapter by Forterre and Grosjean). More recently (1983-87), N4-methylcytosine (m4dC) and also deoxyinosine (dI) were identified in some bacterial DNA,44-46 especially from thermophiles. The selective advantage of m4dC over m5dC at high temperatures is thought to be the avoidance of mutagenic m5dU (dT) resulting from heat-induced deamination of m5dC; m4dC is indeed more resistant to deamination at high temperature than m5dC (discussed in ref. 47).
A surprising recent discovery (2005) is that the phosphoryl group in bacterial DNA can be thiolated to form a phosphorothioate linkage of the Sp chiral configuration;48,49 the mechanism remains to be elucidated (commented on by Eckstein50). In conclusion, relatively few naturally occurring modified deoxynucleosides have so far been identified in genomic DNAs (summarized in Fig. 1). The most common modifications are simple methylation of either the C5 atom of the cytosine ring (m5dC in almost all kinds of organisms) or the exocyclic amine groups of adenine (m6dA mainly in Bacteria and Archaea) or cytosine (m4dC mainly in thermophilic Bacteria and Archaea). Unusual deoxynucleosides (sometimes hypermodified) are confined to bacteriophages and viruses (reviewed in refs. 40,41).

Modified Nucleosides in Coding and Noncoding RNAs

In the case of RNA, the story is very different and far more complex. In contrast to DNA, we now know that every position of a pyrimidine or a purine ring (Figs. 2 and 3, respectively) can be posttranscriptionally modified, not only by methylation or hydroxymethylation, but also by deamination, transglycosylation, acetylation, reduction, thiolation, oxidation, ribosylation, formylation, isomerization, selenation, or multiple group additions or transfers, singly or sequentially (Fig. 2). Moreover, the 2’-hydroxyl group of the ribose moiety can be methylated (alone or in combination with base modifications) or ribosylated with a bulky adenosine-5’-phosphate group. To date, 110-119 naturally occurring modified nucleosides (the count depends on how certain hypermodified nucleosides are considered) have been identified in different types of RNAs, not only tRNAs and rRNAs, but also mRNAs and small RNAs such as sn/snoRNAs, miRNAs, and chromosomal RNAs. The most widespread RNA modifications are base or ribose methylations (symbolized by mX or Xm, respectively) and isomerization of uridine into pseudouridine (Psi). The majority of hypermodified ribonucleosides occur in transfer RNAs; these modifications include long lateral chains or multiple substituents on two or more atoms of the same purine or pyrimidine ring (see below). How was this vast body of information on the identity and location of the many modified nucleosides in RNA acquired? The story starts only in 1951, after the discovery of m5dC in DNA, when W. Cohn and his colleagues51 used paper chromatography of an acid hydrolyzate of enriched ‘soluble RNA’ of yeast to identify a new compound in addition to the four expected ribonucleosides. This compound, initially designated by a question mark ‘?’, was shown later, in 1957-58, to be 5-ribosyluracil, also called the ‘fifth nucleoside in RNA’ (now designated pseudouridine52,53). Pseudouridine accounts for about 4% of the molecular weight of the total constituent nucleosides in yeast tRNAs and is the most abundant modified nucleoside identified so far in all kinds of tRNA as well as rRNA. The next modifications to be identified (about one year later) were 2’-O-methylribose derivatives (Xm = Cm, Um, Gm, Am); these are the second most abundant class.54,55 Also identified in the same period were 5-methylribouridine (m5U, ribothymine or ribo-T, one of which occurs in almost every tRNA), 5-methylribocytosine (m5C, also abundant in mammalian tRNAs, up to 2-3 per tRNA) and a few other simple methylated adenine and guanine derivatives (m1A, m2G, m1G, m7G…; refs. 56-59). Mainly from sequence information of tRNAs from yeast, E. coli and mammals, many other modified nucleosides were identified, including N6-isopentenyladenosine (i6A and its variant ms2i6A), wybutosine (yW; see chapter by Urbonavicius et al), and N6-threonylcarbamoyladenosine (t6A). By 1970, just 20 years after the discovery of Psi in RNAs, 35 well-characterized modified nucleosides had been identified in RNA compared to only five in DNA (reviewed in the book by RH Hall,60 the only one available to date dealing with RNA and DNA modifications). For details concerning chemical structures, occurrence and classification of modified ribonucleosides, as well as metabolic pathways and enzymes catalyzing RNA modification reactions, consult the MODOMICS database at http://modomics.genesilico.pl; see also Appendix 1 by Rother et al and additional Web links in Appendix 6 in this volume. Other useful sources of information are in references 61-64.
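The symbol convention used above (a lowercase prefix before the nucleoside letter for a base modification, as in mX, and a trailing "m" for 2’-O-methylation of the ribose, as in Xm) can be illustrated with a short Python sketch. The function name and the regular expression are hypothetical helpers introduced here for illustration only; they cover simple symbols built on A, C, G or U and are not an official parser of the nomenclature.

```python
import re

# Hypothetical helper, not taken from the chapter or from any database API.
# It splits a short-hand symbol such as "m5U", "Gm" or "m2Gm" into:
#   - an optional lowercase prefix describing the base modification,
#   - the parent nucleoside letter (A, C, G or U),
#   - an optional trailing "m" marking 2'-O-methylation of the ribose.
SYMBOL = re.compile(r"^(?P<prefix>[a-z0-9,]*)(?P<base>[ACGU])(?P<ribose>m?)$")


def describe(symbol: str) -> dict:
    """Return the three components of a simple modified-nucleoside symbol."""
    match = SYMBOL.match(symbol)
    if match is None:
        # Symbols outside this simple scheme (for example "Psi") are rejected.
        raise ValueError(f"unsupported symbol: {symbol!r}")
    return {
        "base_modification": match["prefix"] or None,
        "parent": match["base"],
        "ribose_methylated": match["ribose"] == "m",
    }


if __name__ == "__main__":
    for symbol in ("m5U", "m7G", "Cm", "m2Gm"):
        print(symbol, describe(symbol))
```

Under these assumptions, "m5U" is read as a base modification only, "Cm" as a ribose methylation only, and a doubly modified symbol such as "m2Gm" as both at once.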

Degree, Extent, and Pattern of Nucleoside Modifications

A modified nucleoside at a given position within a population of RNA molecules may not be present in all of them, so that the molar ratio (or percentage) of a given modified nucleoside at a given site in a population of RNA molecules (referred to as the degree of modification) can be less than 1/1 (less than 100% modified). The degree of modification may vary according to the physiological conditions (oxygen concentration, temperature, availability of metabolic intermediates or cofactors, metabolic stress, malignancy…) of the cell from which the RNA came, thus creating a micro-heterogeneity in the RNA population (‘modivariants’; see chapter by Giegé and Lapointe). In some cases, modivariants can be separated by simple chromatographic procedures. For example, the molar ratio of ribothymine (m5U, ribo-T) at position 54 in the T-Psi loop of all types of tRNAs is usually 1/1 (100% U-54 methylated), while the molar ratio of thiolation on C2 of the ring of the same uridine-54 in the tRNA of thermophiles (harboring m5s2U instead of m5U, as in Thermus thermophilus; see chapter by Noma et al) can be less than 1/1 or even zero, especially when the organism is grown at temperatures below that optimal for growth. It is important to remember that the RNA modification data banks (tRNA, rRNA, snRNA) indicate the presence of a given modified nucleotide at a given position of an RNA molecule (m5s2U in the example above), but never the degree of modification of that particular base or ribose. This caveat is particularly relevant for ribosomal RNA, where for instance the degree of modification of a particular Psi or 2’-O-methylribose (Xm) can be very low. Since the DNA genome in principle exists in only one copy per cell, the notion of degree of modification does not apply to DNA. 
However, in some microorganisms this is not the case: Synechococcus, for example, has about 10 copies of the chromosome, while in certain hyperthermophilic and halophilic archaea this number can be as high as 20. In such cases the notion of degree of methylation should apply. The extent of nucleic acid modification concerns the relative amount of a given modified nucleoside that exists at several positions within a given RNA or DNA molecule, usually expressed as % replacement of total nucleosides (or of the total of a particular canonical one) in the whole nucleic acid molecule. For example, the extent of post-replicative modification of dC into m5dC for bacterial, archaeal, and eukaryal genomic DNA is generally 1-8% of the total dC, except for mammalian and plant DNAs where m5dC can reach 30% of total dC. In phage DNAs, where modifications arise by a prereplicative event, the extent of modification can reach 100%. The extent of total modifications in tRNA molecules from plants and mammals is also high (up to 25%), whereas that in homologous tRNAs from bacteria is lower (2-15%; reviewed in refs. 40, 60). The pattern of modifications in RNA/DNA is a more complex, qualitative concept. Here, comparison of different nucleic acids is made by taking into account the type, location and diversity of

Figure 2. Modified bases and ribose in RNAs. In the boxes are the various types of chemical groups that can be enzymatically attached to selected atoms of a pyrimidine ring (in red in the color version available on the Web) during maturation of RNA precursor in Bacteria, Eukarya or Archaea. The base modifications that are also found in DNA are circled. Conventional symbols are also given; the complete scientific names can be obtained in references 61 and 62.

modifications, which of course differ greatly from one type of nucleic acid to another (for example DNA versus RNA, or rRNA versus tRNA or mRNA). More interesting is that distinct and characteristic patterns of modification exist between homologous nucleic acids from phylogenetically distant organisms (see Fig. 1), as well as between tRNAs of the same organism (see below). The pattern of modification is the ‘fingerprint’ or ‘identity card’ of an RNA molecule, in the same way that a restriction pattern is the ‘fingerprint’ or ‘identity card’ of a DNA molecule. As more sequences of RNAs themselves (not the sequences of their genes or RT-PCR products) become available, this important feature of nucleic acids will become more evident.
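The difference between the degree and the extent of modification defined in this section can be made concrete with a minimal Python sketch. The function names and all numerical values below are illustrative assumptions, not measurements reported in the chapter: the degree is computed for one site across a population of molecules, the extent for one kind of nucleoside across a whole molecule.

```python
from typing import Sequence


def degree_of_modification(site_states: Sequence[bool]) -> float:
    """Molar ratio of modified molecules at ONE site in a population.

    Each entry says whether one molecule of the population carries
    the modification at that site (True) or not (False).
    """
    if not site_states:
        raise ValueError("the population must contain at least one molecule")
    return sum(site_states) / len(site_states)


def extent_of_modification(modified: int, total: int) -> float:
    """Percentage of a canonical nucleoside replaced by its modified form.

    `total` counts every occurrence of the nucleoside in the molecule,
    modified or not; `modified` counts only the modified occurrences.
    """
    if total <= 0 or not 0 <= modified <= total:
        raise ValueError("need 0 <= modified <= total and total > 0")
    return 100.0 * modified / total


if __name__ == "__main__":
    # Invented population: 7 of 10 tRNA molecules thiolated at U-54.
    population = [True] * 7 + [False] * 3
    print(degree_of_modification(population))   # molar ratio at one site

    # Invented genome: 30 m5dC among 1000 dC positions.
    print(extent_of_modification(30, 1000))     # % replacement of dC
```

With these invented numbers the degree at the site is 0.7 (70% of the molecules modified) and the extent is 3% of total dC, which falls within the 1-8% range quoted above for most genomic DNAs.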

Distribution of Modified Nucleosides in the Three Domains of Life

Nucleosides Found in Coding and Noncoding RNAs

Figure 4 shows the symbols of 107 structurally distinct modified ribonucleosides identified so far in different RNAs from various Eukarya, Bacteria, or Archaea.65 The information comes primarily from RNA sequence data and from analysis of RNA nucleoside composition by thin-layer chromatography, high performance liquid chromatography and/or mass spectrometry (for examples see refs. 66-70). Symbols in normal characters (in red in the version on the Web site of this chapter), in bold italics (in blue), or in underlined normal characters (in black) correspond to modified nucleosides found in tRNAs, in rRNAs, or in both tRNAs and rRNAs, respectively. Organelle (mitochondrial and chloroplastic) tRNAs and rRNAs contain their own set of modified nucleosides, some of which (like cmnm5U, k2C, τm5U, τm5s2U, f5C, f5Cm) are not present in cytoplasmic RNAs of the eukaryotic host cell. The corresponding modification enzymes, now encoded in the host genome, are believed to have originated from ancient bacterial endosymbionts. Therefore, while present in the Eukaryal domain, mitochondrial modified nucleosides should be considered as ‘bacterial by origin’ or at least as belonging to both Eukarya and Bacteria (they are boxed in the E-B intersection sector in Fig. 4). Symbols of modified nucleosides outside the circles correspond to those found in eukaryal mRNAs (normal

Figure 3. Modified bases and ribose in RNAs. In the boxes are the various types of chemical groups that can be enzymatically attached to selected atoms of a purine ring (in red in the color version available at www.landesbioscience.com/curie) during maturation of RNA precursor in Bacteria, Eukarya or Archaea. The wyosine G-hypermodification leads to the formation of a third purine ring; see chapter by Urbonavicius et al. Concerning G-transglycosylation of deazaguanine derivatives (with a chemical group attached to C7 instead of N7 as in guanine) and further hexosylations of the G-derivatives, see chapter by Iwata-Reuyl and de Crécy-Lagard. The base modification that is also found in DNA is circled. The complete scientific names of each symbol can be found in references 61 and 62.

characters, in red) and snRNAs (bold italics, in green) or in both mRNAs and snRNAs (underlined italics, in black). Five members of this eukaryal group are unique to mRNAs and/or snRNAs, while others are also present in eukaryal tRNAs and/or rRNAs (indicated by an arrow). Figure 4 shows that more than half of the modified ribonucleosides are domain specific. These presumably arose later during evolution, after the separation of organisms into the three domains. About one fifth of the modified nucleosides are located within overlapping sectors of the circles and are thus found in two domains: either Eukarya and Bacteria (E+B), or Bacteria and Archaea (B+A), or Archaea and Eukarya (E+A). The remaining fifth of modified nucleosides are present in all kinds of organisms (E+B+A). They are the simplest types of modification, and several are found in all types of RNAs. From this observation, it has been inferred71 that they correspond to relics of modified nucleosides that were present in primordial organisms existing before the three biological domains separated. However, the reality may not be so simple. Symbols like m1G or m5U within the central common sector E+B+A correspond to modifications that are located in different positions and in different types of RNA molecules, each of them being produced by site-specific as well as RNA-specific enzymes that do not necessarily belong to the same protein family. Some cases most probably represent convergent rather than divergent evolution, so that the evolutionary history of the emergence of RNA modification machinery is

Figure 4. Phylogenetic distribution of modified nucleosides present in RNAs from the three domains of life. Symbols are written differently according to whether they were found in tRNAs, rRNAs, mRNAs and/or sn(o)RNAs. For details see text. A color version of this image is available at www.landesbioscience.com/curie.

complex (see for examples refs. 72,73, also chapters by Czerwoniec et al, by Myllykallio et al and by Forterre and Grosjean). Concerning doubly modified nucleosides of the type xNm (like m2Gm or ac4Cm), the majority have so far been found in archaeal RNAs. They correspond in fact to combinations of simple methylation of the ribose (Gm or Cm) and of enzymatic alteration of the base (m2G or ac4C), each of the ‘independent’ modifications being found within the three overlapping E+B+A sectors, or in the E+A sectors (Fig. 5). Thus, while modified nucleosides like xNm’s are indeed found mainly in archaeal RNAs, the corresponding modification enzymes may not necessarily be unique to Archaea. Lastly, modifications like imG, imG2, mimG, yW, OHyW, o2yW and OHyW*, or preQo, preQ1, Q, oQ, gluQ, manQ, GalQ and G+, or nm5U, cmnm5U and mnm5U are merely intermediates of the same phylogenetically related stepwise metabolic reaction chain (see chapters by Urbonavicius et al for wyosine derivatives, by Iwata-Reuyl and de Crécy-Lagard for queuosine derivatives, and by Bessho and Yokoyama for the modified uridines series). Consequently, the real diversity of naturally occurring modified nucleosides as it appears in Figure 4 could probably be reduced from 107 to about half that number of truly distinct, biosynthetically unrelated types of chemical

Figure 5. Localization of ‘doubly modified’ nucleosides at the base and the ribose (in red in the color version available at www.landesbioscience.com/curie) of RNAs of Archaea. Lines point out which ones among these hypermodified nucleosides correspond to non ribose methylated counterparts in Eukarya and Bacteria.

structures in RNAs. However, exploring more RNAs, especially from extremophiles, may uncover new types of modified nucleoside.
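The way Figure 4 sorts modified nucleosides into domain-specific and shared sectors amounts to grouping symbols by the set of domains in which they occur. The Python sketch below shows that bookkeeping on a toy subset of six symbols. The domain assignments are assumptions loosely based on examples mentioned in the text (m5U and m1G in the common E+B+A sector, organelle-type nucleosides counted in the E+B sector, archaeosine and xNm-type nucleosides in Archaea); they do not reproduce the full data of Figure 4, and the function names are hypothetical.

```python
from collections import defaultdict

# Toy subset only. Assignments follow examples quoted in the text and are
# simplified; they are NOT the complete data set of Figure 4.
OCCURRENCE = {
    "m5U": {"Eukarya", "Bacteria", "Archaea"},
    "m1G": {"Eukarya", "Bacteria", "Archaea"},
    "k2C": {"Eukarya", "Bacteria"},       # bacterial and organelle tRNAs
    "cmnm5U": {"Eukarya", "Bacteria"},    # bacterial and organelle tRNAs
    "G+": {"Archaea"},                    # archaeosine
    "m2Gm": {"Archaea"},                  # xNm type, mainly archaeal so far
}


def sectors(occurrence):
    """Group symbols by the exact set of domains in which they occur."""
    grouped = defaultdict(list)
    for symbol, domains in occurrence.items():
        grouped[frozenset(domains)].append(symbol)
    return dict(grouped)


def domain_specific_fraction(occurrence):
    """Fraction of symbols found in one domain only."""
    specific = [s for s, domains in occurrence.items() if len(domains) == 1]
    return len(specific) / len(occurrence)


if __name__ == "__main__":
    for domains, symbols in sectors(OCCURRENCE).items():
        print("+".join(sorted(domains)), "->", symbols)
    print(domain_specific_fraction(OCCURRENCE))
```

Applied to a complete table of symbols, the same grouping would give the proportions discussed above (domain-specific, shared by two domains, common to all three); merging biosynthetically related symbols before counting would correspond to the reduced diversity suggested in the text.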

Nucleosides Found in Genomic DNAs

Figure 6 summarizes the types of modified deoxyribonucleosides found in DNA from different origins. Symbols for modified deoxynucleosides found in genomic DNA of a cell, or in DNA of bacteriophages and eukaryal viruses, are indicated in different types of boxes. In the case of cellular genomic DNA, almost all (if not all) modified deoxynucleosides are formed by post-replicative enzymatic modification processes; their extent ranges from 1% to 30% (refs. 40,60). In viruses, on the other hand, modified deoxynucleosides are derived either by post-replicative modification processes or via incorporation of modified deoxynucleotide precursors directly into DNA by the viral DNA-dependent DNA polymerase (prereplicative process). In this latter case the extent of DNA modification can reach 100%. In the case of viruses and bacteriophages, only a few modified deoxynucleosides (m5dC, m6dA and hm5dU) are shared with the genomic DNA of Bacteria or Eukarya (reviewed in refs. 40,41).

The Case of Transfer and Ribosomal RNAs

To date (January 2009) more than six hundred tRNAs from more than one hundred different organisms of the three domains of life (with a strong bias for bacterial tRNAs) have been sequenced, and the type and location of each individual naturally occurring modified nucleoside have been identified (see ref. 74 and http://trnadb.bioinf.uni-leipzig.de). Figure 7 summarizes the available information in one cumulative ‘tRNA modification map’. As can be seen, a large number of nucleotides in tRNAs can be enzymatically altered in many different ways, the most common modification being pseudouridine. Independent maps for Eubacteria, Archaea, protists, animals, plants, mitochondria and chloroplasts are available in reference 75 (not updated since 1995, but nevertheless still useful). As a rule, tRNAs from eukaryotes (including plants) are more heavily modified than the homologous tRNAs from Eubacteria. Transfer RNAs from organelles and parasitic organisms like Mollicutes76 are those for which the extent of modification is the lowest (1-6%). Only 60 archaeal tRNAs have been sequenced so far (the majority from halophiles), so it is hard to

Figure 6. Phylogenetic distribution of modified deoxynucleosides present in genomic DNA from the three domains of life. Distinction is made according to their origins: from cellular/nuclear DNAs (in circles) or from virus/bacteriophage DNAs (in squares). This figure complements the information in Figure 1. Special attention is drawn to m4dC, which is mostly found in thermophilic organisms. This methylated cytosine is more resistant than m5dC to chemical deamination, which becomes important at high temperature, and once deaminated it is enzymatically repaired, while the deaminated product of m5dC (=dT) is not (see text).

generalize about them. However, analysis of the base composition of bulk tRNAs from several hyperthermophilic organisms indicates that they are heavily modified and are rich in stabilizing 2’-O-methylated nucleosides (reviewed in ref. 77), while for tRNAs of halophiles,78 where there is a compensatory stabilizing effect of high salt concentration in the cytoplasm, the extent of modification is rather low. Some modified nucleosides, like m5U (ribo-T) and Psi located at positions 54 and 55 of the so-called T-Psi loop, are almost ubiquitous in all kinds of tRNAs. They usually correspond to modified nucleosides whose function is to stabilize the 3D-core of the nucleic acid. Other modified nucleosides are unique to a given tRNA isoacceptor, like the wyosine derivatives found exclusively at position 37 of eukaryal and archaeal tRNA-Phe (see chapter by Urbonavicius et al) or lysidine (k2C) present in all bacterial and most organelle tRNA-Ile. They are generally located in the tRNA anticodon loop, whose function is to decode the genetic information in mRNAs. Note that the distribution of the modified nucleosides of the anticodon loop is clearly ‘domain specific’ (Fig. 8). Among them, 5-substituted hypermodified uridines of the type Xo5(s2)U(m) and Xm5(s2)U(m) involved in decoding the two-codon boxes (discussed in chapters by Bessho and Yokoyama and by Weixelbaumer and Murphy) are the most diversified. These modified nucleosides are genuine ‘signatures’ of the origin of tRNA; this applies also to certain anticodon

Figure 7. Schematic representation of transfer RNA cloverleaf and positions where a given modified nucleoside has been found (majority from sequencing data, about 500 have been compiled). Information about modified nucleosides in tRNAs from selected groups of organisms like Bacteria, fungi, animals, chloroplasts, mitochondria, Archaea, Mollicutes can be obtained in references 75-78.

base modifications (essentially for the wobble base of the anticodon) identified in the ‘endosymbiotic’ mitochondrial and chloroplastic tRNAs. The same kind of analysis can be performed with ribosomal RNAs of the small and large subunits (refs. 62-64 and http://biochem.umass.edu/fournierlab/3dmodmap). Much less information about the types and locations of modified nucleosides is available for rRNAs than for tRNAs. However, from what is known, rRNAs from eukaryotes are much more heavily modified than their homologs in bacteria (commented on in ref. 76). Concerning archaeal rRNAs, the only

Figure 8. Distribution in the three domains of life of hypermodified nucleosides located at position 34 (wobble position of the anticodon, upper part of the figure) or at position 37, 3’-adjacent to the anticodon (bottom part of the figure). This figure complements the information given in Figure 7.

ones that have been carefully investigated are those from the halophile Haloferax volcanii and the closely related Haloarcula marismortui (refs. 63,64,78), which are not representative of the whole domain of Archaea. Only pseudouridine and 2’-O-methylation of various archaeal rRNAs (as well as of tRNAs) are currently being studied, because of their special interest for RNA-guided machineries.79,80

RNA and DNA Modification Enzymes

Discovery of RNA Modification Enzymes

The first evidence for the existence of enzymes able to modify nucleic acids at the polymer level came in 1962-63. After incubating transfer RNA with E. coli cell extract and S-AdoMet labeled in the methyl group, three groups81-83 demonstrated independently that radioactivity appeared in methylated bases in RNA. The first identified modification enzyme82 was tRNA:m5U54 methyltransferase, now designated TrmA in Bacteria and Trm2 in Eukarya. Soon after followed the discovery of similar activities for other methyl transfers specific for the formation of m1G, m7G, m2A, m6A, m2,2G and m5C in E. coli transfer RNAs84 and four additional distinct activities for the formation of m6A, m6,6A, m7G and m5C in ribosomal RNA.85 These simple but important experiments illuminated a new feature of RNA metabolism, namely that methyl group incorporation can take place after polymerization and not, as was shown earlier (1958) for bacteriophage DNA, by incorporation of deoxyribonucleotide triphosphate analogs (such as m5dCTP) during replication.37 Since the pioneering work on RNA methylation, all subsequent modifications identified in RNAs of many different types of cells have been found to occur in the same way, i.e., by enzymatic posttranscriptional alteration of a base and/or of the ribose at the RNA precursor level. Many other RNA processing enzymes catalyzing reactions as diverse as 5’- and 3’-trimming, 5’-capping, RNA splicing, and CCA and polyA addition likewise act posttranscriptionally (RNA maturation process, see chapter by Hall and Li). The precise interplay of these various types of RNA alterations ultimately produces fully mature RNAs with many new chemical ‘decorations’, as described in the preceding paragraphs. Later, in 1975, a completely different type of RNA methyltransferase using 5,10-methylene tetrahydrofolate (CH2-THF) instead of S-AdoMet as methyl donor was discovered in Streptococcus faecalis (ref. 86, and chapter by Myllykallio et al). Thus, while S-AdoMet is by far the major cellular source of methyl groups (and is often called the ‘universal methyl donor’), an alternative solution exists for methylating RNAs. An enzyme catalyzing the insertion of a guanine in tRNA (via a transglycosylation reaction)87 was identified in rabbit erythrocytes in 1973-75. It was only a few years later that the physiological function of this ‘G’-inserting enzyme was discovered:88,89 the insertion of a deazaguanine derivative in the anticodon of a few selected tRNAs. 
This enzyme, now designated tRNA-guanine-34 insertase (abbreviated tgt), removes the encoded guanine located at the first position of the anticodon of precursor tRNA by cleaving the canonical C1’-N9 glycosidic bond and inserting in its place a premodified 7-deazaguanosine derivative precursor (or a guanine as in the original observation of Farkas and coworkers;87 see chapter by Iwata-Reuyl and de Crécy-Lagard). A similar type of enzyme was found in Archaea in 1997. In this case,90 formation of archaeosine (G+, another type of deazaguanine derivative) at position 15 in the D-loop of archaeal tRNAs depends on a similar, phylogenetically related tRNA-guanine-15 insertase designated a-tgt. It should be mentioned that formation of pseudouridine in RNA proceeds by a similar mechanism, except that it is the genetically encoded uracil base itself that is reattached to the RNA after a 180° rotation, with formation of a noncanonical C1’-C5 glycosidic bond (cis-transglycosylation or isomerization reaction; see chapter by Mueller and Ferre d’Amare). Another remarkable recent discovery (1996-97) is that some RNA modification enzymes are ‘guided by RNA’. This was first demonstrated in the case of enzymatic formation of 2’-O-methylribose in yeast and mammalian rRNAs,91,92 immediately followed by the same discovery in the case of Psi formation, also in rRNAs.93,94 This observation has since been extended to the formation of 2’-O-methylribose and Psi in many other RNAs (tRNAs, snRNAs, snoRNAs) of Eukarya and/or Archaea; however, neither Bacteria nor organelles examined so far use this ‘RNA-assisted’ type of enzyme, which is in fact an ‘RNA-assisted’ multiprotein enzymatic complex (see chapters by Gagnon et al, by Grozdanov and Meier and by Karijolich et al). 
Note that a given ribose methylation or uridine isomerization in RNA can be mediated by a ‘classical’ all-protein enzyme in one organism, while in another organism the same modification is catalyzed by the RNA-assisted multiprotein machinery (see for examples refs. 95-99). This observation raises interesting questions about the evolutionary pressures that favour one type of RNA modification system over the other. Perhaps the main advantage for a cell of using an ‘RNA-assisted’ enzyme machinery instead of a ‘protein-only’ enzyme is that many more nucleosides in RNAs can be targeted with only a few proteins (besides the enzyme) needed to build the RNA-guided machinery, together with a huge array of guide RNAs (whose sequences are more versatile than those of proteins). However, this might not be the sole advantage (discussed in chapters by Gagnon et al, by Grozdanov and Meier and by Karijolich et al).

Discovery of DNA Modification Enzymes

At almost the same time as tRNA:m5U54 methyltransferase was discovered (1963), but before the first sequence-specific restriction enzyme was identified8,12 (and the importance of the restriction/modification self-defence mechanism in bacteria was recognized), enzymatic activities for ‘post-replicative’ methylation in polymeric DNA were beginning to be identified.100 Partially purified S-AdoMet-dependent methyltransferases of E. coli were shown to catalyze the formation of m5dC and m6dA in double-stranded DNA.101 Similar enzymes were subsequently identified in many other types of bacterial and eukaryotic cells, as well as in certain bacteriophages (reviewed in refs. 40,41; and chapters by Coin et al, by Cheng and Blumenthal and by Jeltsch and Jurkowski). Enzymatic post-replicative DNA glucosylation (in fact formation of hypermodified glucopyranosyloxymethyluracil, base J) was discovered with DNA of bacteriophages38,39,102 before it was found in eukaryotic DNA103,104 (see chapter by Sabatini et al). In the 1980s, enzymes catalyzing formation of m4dC in bacteria were discovered,44 and also a new family of demethylases/dealkylases acting on both RNA and DNA (AlkB family of enzymes; see chapter by Falnes et al). Another family of dual-specificity enzymes exists, catalyzing the conversion of C to U in single-stranded DNA or RNAs, including cellular mRNAs (Apobec deaminases; see chapter by Smith). These deaminases play an essential role in cellular defense against viruses and allow new opportunities for variability in gene expression.

Conclusion and Future Prospects

DNA and RNA are key cellular polymers in all organisms. To fulfil their multiple functions, these molecules need more than just four canonical nucleosides. To date more than one hundred chemically distinct noncanonical modified nucleosides have been identified in nucleic acids of many different organisms of the three domains of life (although mesophilic free-living bacteria and viruses have received most attention). The majority of these modified nucleosides occur in RNAs, especially tRNAs. However, the organisms that have been explored represent only a tiny fraction of extant terrestrial taxa. The analysis of nucleic acids of more organisms, especially of the many types of extremophiles (often Archaea), is consequently very likely to reveal additional peculiar ‘decorations’ of nucleic acids. Another limitation is the type of RNA species that can be examined. Some, such as mRNA, sn(o)RNA, microRNA, and viral RNA, are hard to isolate in sufficient amounts for unambiguous identification of their modified nucleoside content (see however refs. 105-109). Hopefully, technical developments, including a new generation of very sensitive mass spectrometers, will help the identification of new modified (deoxy)ribonucleosides, their fine structures, and most importantly their distributions (pattern of modification/identity card) among many different nucleic acids (RNA and DNA) of the three domains of life. To account for the many different modified (deoxy)ribonucleosides identified so far in different types of nucleic acids, a correspondingly large number of different enzymes with distinct specificities must exist. Already 130 RNA-modification enzymes are catalogued in MODOMICS (end of 2008). They correspond to more than one hundred distinct types of chemical reactions, most of which are S-AdoMet-dependent methylations of a base, of an already modified base, or of the 2’-hydroxyl of ribose (see Appendix 1 by Rother et al). 
In the case of DNA-modification enzymes, owing to their considerable interest (and commercial value) in relation to the restriction/modification process, DNA methyltransferases from many different organisms have been characterized, purified and studied (see Appendix 1 by Rother et al). The number of identified RNA or DNA modification enzymes is increasing very fast, and within the next decade we might reasonably expect it to double or triple. How many different DNA/RNA modification enzymes exist in a given cell is still difficult to estimate, and of course, how many such enzymes exist in all types of living organisms is impossible to predict. Nowadays, we have techniques that allow identification and characterization of both genes and the corresponding modification enzymes. The enzymes can be produced in recombinant form and studied in vitro to identify their mechanism and specificity, as well as their crystal structure. The next challenges will be to understand how all these enzymatic activities are coordinated and regulated in the cell, where each individual reaction occurs within the cellular milieu, how enzymes are organized in complexes with other proteins of the nucleic acid maturation process, how post-replicative and post-transcriptional processes emerged and diversified within each of the three domains of life, and, most importantly, what the functions are of all these ‘dam’ modified nucleosides in RNA and DNA. Nucleic acids are emphatically not ‘boring long polymers of only four nucleotides’. The purpose of this book is precisely to respond, at least in part, to the important questions that they raise.

Acknowledgements

HG is Emeritus Scientist at the University of Paris-XI in Orsay, working in the laboratory of Professor Jean-Pierre Rousset, who is acknowledged for his kind hospitality. I gratefully acknowledge critical reading of this manuscript by Prof. Andrew Hanson (University of Florida, Gainesville) and by Kristian Rother (Laboratory of Bioinformatics, Warsaw).

References

1. Avery OT, MacLeod CM, McCarty M. Studies on the chemical nature of the substance inducing transformation of pneumococcal types. J Exp Med 1944; 79:137-158.
2. McCarty M, Avery OT. Studies on the chemical nature of the substance inducing transformation of pneumococcal types: II. Effect of deoxyribonuclease on the biological activity of the transforming substance. J Exp Med 1946; 83:89-96.
3. Hotchkiss RD, Marmur J. Double marker transformation as evidence of linked factors in deoxyribonucleate transforming agents. Proc Natl Acad Sci USA 1954; 40:55-60.
4. Chargaff E. Structure and function of nucleic acids as cell constituents. Fed Proc 1951; 10:654-659.
5. Watson JD, Crick FHC. A structure for deoxyribose nucleic acid. Nature 1953; 171:737-738.
6. Watson JD, Crick FHC. Genetical implications of the structure of deoxyribonucleic acid. Nature 1953; 171:964-967.
7. Meselson M, Yuan R. DNA restriction enzyme from E. coli. Nature 1968; 217:1110-1114.
8. Smith HO, Wilcox KW. A restriction enzyme from Hemophilus influenzae. I. Purification and general properties. J Mol Biol 1970; 51:379-391. See also the paper by Danna K, Nathans D. Specific cleavage of simian virus 40 DNA by restriction endonuclease of Hemophilus influenzae. Proc Natl Acad Sci USA 1971; 68:2913-2917.
9. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 1977; 74:5463-5467.
10. Maxam A, Gilbert W. A new method for sequencing DNA. Proc Natl Acad Sci USA 1977; 74:560-564.
11. Berg P. Dissections and reconstructions of genes and chromosomes. Science 1981; 213:296-303.
12. Roberts RJ. How restriction enzymes became the workhorses of molecular biology. Proc Natl Acad Sci USA 2005; 102:5905-5908.
13. Singh H, Lane BG. The separation, estimation and characterization of alkali-stable derived from commercial ribonucleate preparation. Can J Biochem 1964; 42:87-93.
14. Holley RW, Apgar J, Merrill SH. Evidence for the liberation of a nuclease from human fingers. J Biol Chem 1961; 236:PC42.
15. Marko AM, Butler GC. The isolation of sodium deoxyribonucleate with sodium dodecyl sulphate. J Biol Chem 1951; 190:165-176.
16. Colter JS, Brown RA. Preparation of nucleic acids from Ehrlich ascites tumor cells. Science 1956; 123:1077-1078.
17. Hoagland MB, Stephenson ML, Scott JF et al. A soluble RNA intermediate in protein synthesis. J Biol Chem 1958; 231:241-257.
18. Crick FHC. On protein synthesis. Symp Soc Exp Biol 1958; 12:138-163.
19. Crick FHC. Codon-anticodon pairing: The wobble hypothesis. J Mol Biol 1966; 19:184-191.
20. Hoagland MB, Stephenson ML, Scott JF et al. A soluble ribonucleic acid intermediate in protein synthesis. J Biol Chem 1958; 231:241-257.
21. Jacob F, Monod J. Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol 1961; 3:318-356.
22. Brenner S, Jacob F, Meselson M. An unstable intermediate carrying information from genes to ribosomes for protein synthesis. Nature 1961; 190:576-581.
23. The Genetic Code. Vol 31. Cold Spring Harbor Symp Quant Biol 1966.
24. Holley RW, Apgar J, Everett GA et al. Structure of a ribonucleic acid. Science 1965; 147:1462-1465.
25. Madison JT, Everett GA, Kung H. Nucleotide sequence of a yeast tyrosine tRNA. Science 1966; 153:531-534.
26. Zachau HG, Dutting D, Feldman H. Nucleotidsequenzen zweier serin spezifischer tRNA. Angew Chem 1966; 78:392-393.
27. RajBhandary UL, Chang SH, Stuart A et al. Studies on polynucleotide: The primary structure of yeast phenylalanine tRNA. Proc Natl Acad Sci USA 1967; 57:751-758.
28a. Kim SH, Suddath FL, Quigley GJ et al. Three-dimensional tertiary structure of yeast tRNA. Science 1974; 185:435-440.
28b. Robertus JD, Ladner JE, Finch JT et al. Structure of yeast phenylalanine tRNA at 3 angstroms resolution. Nature 1974; 250:546-551.
29. Johnson TB, Coghill RD. The discovery of 5-methyl-cytosine in tuberculinic acid, the nucleic acid of the Tubercle bacillus. J Am Chem Soc 1925; 47:2838-2844.
30. Vischer E, Zamenhof S, Chargaff E. Microbial nucleic acids: the desoxypentose nucleic acids of avian tubercle bacilli and yeast. J Biol Chem 1949; 177:429-438.
31. Hotchkiss RD. The quantitative separation of purines, pyrimidines and nucleosides by paper chromatography. J Biol Chem 1948; 175:315-332.
32. Wyatt GR. Occurrence of 5-methylcytosine in nucleic acids. Nature 1950; 166:237-238.
33. Dunn DB, Smith JD. The occurrence of 6-methylaminopurine in microbial deoxyribonucleic acids. Nature 1955; 175:336-339, and Biochem J 1958; 68:627-636.
34. Wyatt GR, Cohen SS. The bases of the nucleic acids of some bacterial and animal viruses: the occurrence of 5-hydroxymethylcytosine. Biochem J 1953; 55:774-782.
35. Kallen RG, Simon M, Marmur J. The occurrence of a new pyrimidine base replacing thymine in a bacteriophage DNA: 5-hydroxymethyluracil. J Mol Biol 1962; 5:248-250.
36. Takahashi I, Marmur J. Replacement of thymidylic acid by deoxyuridylic acid in the DNA of a transducing phage for B. subtilis. Nature 1963; 197:794-795.
37. Bessman MJ, Lehman IR, Adler J et al. Enzymatic synthesis of DNA. 3. The incorporation of pyrimidine and purine analogues into DNA. Proc Natl Acad Sci USA 1958; 44:633-640.
38. Lehman IR, Pratt EA. On the structure of the glucosylated hydroxymethylcytosine nucleotides of coliphages T2, T4 and T6. J Biol Chem 1960; 235:3254-3259.
39. Takahashi I, Marmur J. Glucosylated DNA from a transducing phage for B. subtilis. Biochem Biophys Res Commun 1963; 10:289-292.
40. Warren RAJ. Modified bases in bacteriophage DNAs. Ann Rev Microbiol 1980; 34:137-158 (review).
41. Gommers-Ampt JH, Borst P. Hypermodified bases in DNA. FASEB J 1995; 9:1034-1042 (review).
42. Kirnos MD, Khudyakov IY, Alexandrushkina NI et al. 2-aminoadenine in an adenine substituting for a base in S-2L cyanophage DNA. Nature 1977; 369-370.
43. Khudyakov IY, Kirnos MD, Alexandrushkina NI et al. Cyanophage S-2L contains DNA with 2,6-diaminopurine substituted for adenine. Virology 1978; 88:8-18.
44. Janulaitis A, Klimasauskas S, Petrusyte M et al. Cytosine modification in DNA by BcnI methylase yields N4-methylcytosine. FEBS Lett 1983; 161:131-134.
45. Ehrlich M, Gama-Sosa MA, Carreira LH et al. DNA methylation in thermophilic bacteria: N4-methylcytosine and N6-methyladenine. Nucl Acids Res 1985; 13:1399-1412.
46. Ehrlich M, Wilson GG, Kuo KC et al. N4-methylcytosine as a minor base in bacterial DNA. J Bact 1987; 169:939-943.
47. Grosjean H, Oshima T. How nucleic acids cope with high temperature. In: Gerday C, Glansdorff N, eds. Physiology and Biochemistry of Extremophiles. Washington, DC: ASM Press, 2007:39-56.
48. Zhou X, He X, Liang J et al. A novel DNA modification by sulphur. Mol Microbiol 2005; 57:1428-1438.
49. Wang L, Chen S, Xu T et al. Phosphorothioation of DNA in bacteria by dnd genes. Nature Chem Biol 2007; 3:709-710.
50. Eckstein F. News and views: Phosphorothioation of DNA in bacteria. Nature Chem Biol 2007; 3:689-690.
51. Cohn WE, Volkin E. Nucleoside-5’-phosphates from ribonucleic acid. Nature 1951; 167:483-484.
52. Davis FF, Allen FW. Ribonucleic acids from yeast which contain a fifth nucleotide. J Biol Chem 1957; 227:907-915.
53. Cohn WE. 5-Ribosyl uracil, ribofuranyl nucleoside in RNA. Biochim Biophys Acta 1959; 32:569-571.
54. Smith JD, Dunn DB. An additional sugar component of RNA. Biochim Biophys Acta 1959; 31:573-575.
55. Lane BG, Butler GC. The isolation, identification and properties of dinucleotides from alkali hydrolyzates of RNA. Can J Biochem Physiol 1959; 37:1329-1350.
56. Littlefield JW, Dunn DB. The occurrence and distribution of thymine and three methylated adenine bases in RNA from several sources. Biochem J 1958; 70:642-651.
57. Adler M, Weissmann B, Gutman AB. Occurrence of methylated purine bases in RNA. J Biol Chem 1958; 230:717-723.
58. Smith JD, Dunn DB. The occurrence of methylated guanines in ribonucleic acids from several sources. Biochem J 1959; 72:294-301.
59. Dunn DB. Additional components in RNA of rat liver fractions. Biochim Biophys Acta 1959; 34:286-288.
60. Hall RH. The Modified Nucleosides in Nucleic Acids. New York/London: Columbia University Press, 1971.
61. Limbach PA, Crain PF, McCloskey JA. Summary: the modified nucleosides of RNA. Nucl Acids Res 1994; 22:2183-2196.
62. McCloskey JA, Rozenski J. The small subunit rRNA modification database. Nucl Acids Res 2005; 33:D135-138.
63. Piekna-Przybylska D, Decatur WA, Fournier MJ. New bioinformatics tool for analysis of nucleotide modifications in eukaryotic rRNA. RNA 2007; 13:1-8.
64. Piekna-Przybylska D, Decatur WA, Fournier MJ. The 3D rRNA modification maps database: with interactive tools for ribosome analysis. Nucl Acids Res 2008; 36:D178-183.
65. Woese CR, Kandler O, Wheelis ML. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria and Eukarya. Proc Natl Acad Sci USA 1990; 87:4576-4579.
66. Gehrke CW, McCune RA, Gama-Sosa MA et al. Quantitative reversed-phase high-performance liquid chromatography of major and modified nucleosides in DNA. J Chromatogr 1984; 301:199-219.
67. Gehrke CW, Kuo KC. Ribonucleoside analysis by reversed-phase high-performance liquid chromatography. J Chromatogr 1989; 471:3-36.
68. Grosjean H, Keith G, Droogmans L. Detection and quantification of modified nucleotides in RNA using thin-layer chromatography. In: Gott JM, ed. RNA Interference, Editing and Modification - Methods in Molecular Biology. Totowa: Humana Press, 2004; 265:357-392.
69. Wagner TM, Nair V, Guymon R et al. A novel method for sequence placement of modified nucleotides in mixtures of tRNA. Nucleic Acids Symp Series 2004; 48:263-264.
70. Gott JM. Methods in Enzymology. Vols. 424 and 425. Academic Press-Elsevier, 2007.
71. Cermakian N, Cedergren R. Modified nucleosides always were: an evolutionary model. In: Grosjean H, Benne R, eds. Modification and Editing of RNA. Washington DC: ASM Press, 1998:535-541.
72. Anantharaman V, Koonin EV, Aravind L. Comparative genomics and evolution of proteins involved in RNA metabolism. Nucl Acids Res 2002; 30:1427-1464.
73. Urbonavicius J, Auxilien S, Walbott E et al. Acquisition of a bacterial RumA-type tRNA(uracil-54,C5)-methyltransferase by Archaea through an ancient horizontal gene transfer. Mol Microbiol 2008; 67:323-333.
74. Jühling J, Mörl M, Hartmann V et al. tRNAdb 2009: compilation of tRNA sequences and tRNA genes. Nucl Acids Res 2009; 37, Database issue: D159-D162.
75. Grosjean H, Sprinzl M, Steinberg S. Posttranscriptionally modified nucleosides in tRNA: their locations and frequencies. Biochimie 1995; 77:139-141.
76. de Crécy-Lagard V, Marck C, Grosjean H. Comparative RNomics and modomics in mollicutes: prediction of gene function and evolutionary implications. IUBMB Life 2007; 59:634-658.
77. Grosjean H, Gupta R, Maxwell S. Modified nucleotides in archaeal RNAs. In: Blum P, ed. Archaea, New Models for Prokaryotic Biology. Norwich: Horizon Press, 2008:164-196; www.caister.com.
78. Grosjean H, Gaspin C, Marck C et al. RNomics and Modomics in the halophile Haloferax volcanii: identification of RNA modification genes. BMC Genomics 2008; 9:470-496.
79. Omer AD, Ziesche S, Decatur WA et al. RNA-modifying machines in archaea. Mol Microbiol 2003; 48:617-629.
80. Muller S, Charpentier B, Branlant C et al. A dedicated computational approach for the identification of archaeal H/ACA sRNAs. Methods Enzymol 2007; 425:355-387.
81. Svensson I, Boman HG, Eriksson KG et al. Studies on microbial RNA: Transfer of methyl groups from methionine to soluble RNA from E. coli. J Mol Biol 1963; 7:254-271.
82. Fleissner E, Borek E. A new enzyme of RNA synthesis: RNA methylase. Proc Natl Acad Sci USA 1962; 48:1199-1203.
83. Starr JL. The incorporation of methyl groups into amino acid transfer ribonucleic acid. Biochem Biophys Res Commun 1963; 10:175-180.
84. Hurwitz J, Gold M, Anders M. The enzymatic methylation of RNA and DNA. 3. Purification of soluble RNA-methylating enzymes. J Biol Chem 1964; 239:3462-3473.
85. Hurwitz J, Anders M, Gold M et al. The enzymatic methylation of RNA and DNA. 7. The methylation of ribosomal RNA. J Biol Chem 1965; 240:1256-1266.
86. Delk AS, Rabinowitz JC. Biosynthesis of ribosylthymine in the tRNA of S. faecalis: a folate-dependent methylation not involving S-adenosylmethionine. Proc Natl Acad Sci USA 1975; 72:528-530.
87. Farkas WR, Hankins WD, Sing R. The guanylation of tRNA: an enzymatic reaction. Biochim Biophys Acta 1973; 294:94-105.
88. Okada N, Harada F, Nishimura S. Specific replacement of Q-base in the anticodon of tRNA by guanine catalyzed by a cell-free extract of rabbit reticulocytes. Nucl Acids Res 1976; 3:2593-2603.
89. Itoh YH, Itoh T, Haruna I et al. Substitution of guanine for a specific base in tRNA by extracts of Ehrlich ascites tumor cell. Nature 1977; 267:467.
90. Watanabe M, Matsuo M, Tanaka S et al. Biosynthesis of archaeosine, a novel derivative of 7-deazaguanosine specific to archaeal tRNA, proceeds via a pathway involving base replacement in the tRNA polynucleotide chain. J Biol Chem 1997; 272:20146-20151.
91. Kiss-Laszlo Z, Henry Y, Bachellerie JP et al. Site-specific ribose methylation of preribosomal RNA: a novel function for small nucleolar RNAs. Cell 1996; 85:1077-1088.
92. Nicoloso M, Qu LH, Michot B et al. Intron-encoded, antisense small nucleolar RNAs: the characterization of nine novel species points to their role as guides for 2’-O-ribose methylation of rRNAs. J Mol Biol 1996; 260:178-195.
93. Ni J, Tien AL, Fournier MJ. Small nucleolar RNAs direct site-specific synthesis of pseudouridines in ribosomal RNA. Cell 1997; 89:565-573.
94. Ganot P, Bortolin ML, Kiss T. Site-specific pseudouridine formation in preribosomal RNA is guided by small nucleolar RNA. Cell 1997; 89:799-809.
95. Bonnerot C, Pintard L, Lutfalla G. Functional redundancy of Spb1p and a snR52-dependent mechanism for the 2’-O-ribose methylation of a conserved rRNA position in yeast. Mol Cell 2003; 12:1309-1315.
96. Renalier MH, Joseph N, Gaspin C et al. The Cm56 tRNA modification in archaea is catalyzed either by a specific 2’-O-methylase, or a C/D sRNP. RNA 2005; 11:1051-1063.
97. Ma X, Yang C, Alexandrov A et al. Pseudouridylation of yeast U2 snRNA is catalyzed by either an RNA-guided or RNA-independent mechanism. EMBO J 2005; 24:2403-2413.
98. Gurha P, Joardar A, Chaurasia P et al. Differential roles of archaeal box H/ACA proteins in guide RNA-dependent and independent pseudouridine formation. RNA Biol 2007; 4:101-109.
99. Decatur WA, Schnare MN. Different mechanisms for pseudouridine formation in yeast 5S and 5.8S rRNAs. Mol Cell Biol 2008; 28:3089-3100.
100. Gold M, Hurwitz J, Anders M. The methylation of RNA and DNA. II. On the species specificity of the methylation enzymes. Proc Natl Acad Sci USA 1963; 50:164-169.
101. Gold M, Hurwitz J. Enzymatic methylation of ribonucleic acid and deoxyribonucleic acid. V. Purification and properties of DNA-methylating activity of E. coli. J Biol Chem 1964; 239:3858-386.
102. Kornberg SR, Zimmerman SB, Kornberg A. Glucosylation of deoxyribonucleic acid by enzymes from bacteriophage-infected E. coli. J Biol Chem 1961; 236:1487-1493.
103. Rae P, Steele R. Modified bases in the DNAs of unicellular eukaryotes. Biosystems 1978; 10:37-53.
104. Borst P, Sabatini R. Base J: Discovery, biosynthesis and possible functions. Ann Rev Microbiol 2008; 62:235-251.
105. Yu B, Yang Z, Li J et al. Methylation as a crucial step in plant microRNA biogenesis. Science 2005; 307:932-935.
106. Ebhardt HA, Thi EP, Wang MB et al. Extensive 3’ modification of plant small RNAs is modulated by helper component-proteinase expression. Proc Natl Acad Sci USA 2005; 102:13398-13403.
107. Ohara T, Sakaguchi Y, Suzuki T et al. The 3’ termini of mouse Piwi-interacting RNAs are 2’-O-methylated. Nature Struct Biol 2007; 14:349-350.
108. Kawahara Y, Megraw M, Kreider E et al. Frequency and fate of micro-RNA editing in human brain. Nucl Acids Res 2008; 36:5270-5280.
109. Habig J, Taraka D, Bass B. mi-RNA editing, we should have inosine this coming. Molec Cell 2007; 25:712-713.

Chapter 2

DNA Methylation: From Bug to Beast

Stephanie R. Coffin, Benjamin A. Youngblood and Norbert O. Reich*

Abstract

In this chapter, the history of DNA modifying enzymes is briefly summarized with a focus on DNA methyltransferases. The current understanding of methylation in prokaryotes and eukaryotes is summarized and recent findings in both areas are discussed. The future outlook for research in both kingdoms is also articulated.

Introduction

Essentially all biological processes involving DNA, such as replication, transcription, recombination, transposition and modification, require proteins that interact with specific DNA sequences. The proteins and enzymes that contribute to this elegant, albeit complex, recognition system, which spans all kingdoms of life, have been investigated since before the landmark discovery of the structure of DNA in 1953.1 DNA modifying enzymes play key roles in virtually all of these biological processes, from helicases unwinding the DNA so that replication/transcription can occur,2 to ligases that repair or link disconnected DNA strands.3 One of the most intriguing subfields within the DNA modifying enzyme family is that of DNA methylation. This base-specific modification contributes to gene regulation, genomic imprinting and other biological pathways. In this introductory chapter, we will focus mainly on the history, understanding and scientific outlook for methyltransferases in both prokaryotic and eukaryotic organisms.

Although the first discovery of 5-methyl-cytosine in 1925 involved DNA obtained from Tubercle bacillus,4 the occurrence of eukaryotic DNA methylation was not described until a second observation was made by Hotchkiss in 1948 on calf-thymus DNA using paper chromatography.5 Host-controlled modifications in bacteria that programmed the degradation of foreign genomes while protecting the host genome were then described in the 1950s and later became known as restriction-modification systems.6 Further, the occurrence and importance of DNA and RNA methylation in eukaryotes were also reported during this period.7 The underlying mechanisms of DNA restriction and modification in prokaryotes were first described using bacteriophage λ in various hosts in 1962.8 The restriction-modification hypothesis was further corroborated by the identification of the first methyltransferase in E. coli.9 The two enzymes that contribute to the restriction (cleavage) and modification (methylation) of DNA were given the names endonuclease and methyltransferase because of their respective modifications to the DNA.10 These discoveries, in conjunction with the first purification11 and application12 of a restriction endonuclease, resulted in the 1978 Nobel Prize in Physiology or Medicine being jointly awarded to Werner Arber, Hamilton Smith and Daniel Nathans.

Soon after the discovery of prokaryotic restriction endonucleases, Herbert Boyer and Stanley Cohen invented the now common biochemical technique of DNA cloning, which exploited the use of such enzymes to allow genes to be transferred between species.13 This discovery, combined with over-expression and purification techniques, has led to both structural and mechanistic characterizations of many proteins, including the focus of this book, the RNA and DNA modifying enzymes. In fact, the field of biotechnology as it is known today would not be possible without the ability to enzymatically manipulate DNA. Further, future successes in biotechnology will be correlated with understanding how these enzymes function. DNA modifying enzymes offer unique opportunities to be used as tools to answer fundamental questions regarding enzyme function and DNA recognition. Therefore, scientists continue to investigate DNA modifying enzymes; this book summarizes the current efforts being made in this regard.

*Corresponding Author: Norbert O. Reich, Biomolecular Science and Engineering Program and the Department of Chemistry and Biochemistry, University of California, Santa Barbara, California, 93103.

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.

Epigenetic Methylation

Epigenetic regulation occurs through changes in gene expression that result from modifications made after the genetic sequence has been established. Methylation is the best understood epigenetic DNA modification. There are two kinds of epigenetic methylation: the initial laying down of the methylation pattern on a chromosome, or de novo methylation, is distinct from the preservation of this pattern, or maintenance methylation. Although the known relevance of DNA methylation to diverse organisms and biological pathways continues to expand, the mechanisms which determine site location and modification remain poorly understood. A significant effort remains focused on determining how these enzymes, both individually and in concert with other cellular factors, function to modify DNA.

Prokaryotic DNA Methylation

DNA methyltransferases (MTases) are a family of enzymes responsible for transferring a methyl group from the cofactor S-adenosyl methionine (AdoMet) to adenines at the N-6 position, or to cytosines at the N-4 or C-5 position, a reaction that occurs predominantly in duplex DNA. Although all MTases share a set of common motifs to bind AdoMet (X-I-II-III) and catalyze the methyltransfer (IV, V, VI, VII, VIII, IX), the relative position of these motifs further divides the exocyclic MTases into various classes: α, β, γ, ζ, δ, ε, with the α-, β- and γ-class enzymes being most common (Fig. 1). Interestingly, motif IX is not found in the exocyclic MTases, whereas it is involved in the proper folding of the target recognition domain (TRD) within C-5 endocyclic MTases, as was revealed by domain swapping experiments.14,15 Prokaryotic MTases frequently co-exist with a cognate restriction endonuclease to form the restriction-modification systems present in bacteria.16 These systems protect the cell from invading DNA, although alternative functions have been proposed and tested.17 The restriction endonuclease cleaves foreign DNA at specific sites also recognized by the cognate MTase and generates double strand breaks, thereby rendering the invading DNA inactive. In order to distinguish self from nonself DNA, the MTase modifies the native DNA, thus protecting it from cleavage by the restriction endonuclease. Over 4000 restriction-modification systems have been identified bioinformatically, 3800 of which are characterized to some extent.18 Many of these systems are heavily used in the biotechnology industry. The study of endonucleases and methyltransferases has also provided insights into the mechanisms of sequence-specific nucleic acid modification which have found application in many other fields. A few examples of well characterized prokaryotic MTases will be discussed.
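The self versus nonself logic of a restriction-modification system can be illustrated with a toy model. The sketch below is not taken from the chapter: it assumes a single recognition sequence (GCGC, the M.HhaI site quoted in this section), treats methylation as a per-site flag rather than a per-base, per-strand mark, and uses hypothetical function names and made-up sequences.

```python
import re

SITE = "GCGC"  # recognition sequence quoted in the text for M.HhaI


def find_sites(seq, site=SITE):
    """Return 0-based start positions of every (possibly overlapping) site."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def methylate_host(seq, site=SITE):
    """Toy MTase: flag every recognition site in the host DNA as protected."""
    return set(find_sites(seq, site))


def cleavable_sites(seq, protected, site=SITE):
    """Toy endonuclease: any site without a protective flag can be cleaved."""
    return [pos for pos in find_sites(seq, site) if pos not in protected]


# Made-up sequences, for illustration only.
host = "AAGCGCTTGCGCAA"
phage = "TTGCGCAAT"

host_marks = methylate_host(host)
print(cleavable_sites(host, host_marks))  # [] : host DNA is protected
print(cleavable_sites(phage, set()))      # [2] : unmodified invading DNA is cut
```

The only design point is that the endonuclease and the MTase share one recognition function (`find_sites`), mirroring the statement that both enzymes recognize the same specific sites.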
The best characterized bacterial MTase was first cloned from a restriction-modification system found in Haemophilus haemolyticus.19 M.HhaI recognizes the sequence 5ʹ-GCGC-3ʹ and methylates the C-5 position of the first cytosine within this sequence. M.HhaI was the first MTase to be crystallized with its cofactor AdoMet20 and has since been crystallized with numerous DNA substrates.21-24 Additionally, a high resolution wild-type crystal structure has made it possible for numerous mutants with perturbed kinetic and thermodynamic parameters to be structurally characterized.25-30 Perhaps the most intriguing observation made from the M.HhaI ternary complex structure was the 180° extrahelical flipping out of the target cytosine base (see chapter by Klimasauskas and Liutkeviciute in this book). The ten conserved motifs found within the M.HhaI structure serve as a scaffold against which other MTases can be compared and aligned. Additionally, because M.HhaI is a cytosine-specific MTase, the mechanistic and structural relevance to the human methyltransferases is compelling. Extensive studies on the specificity and catalytic mechanism of this enzyme have provided insights into the mechanisms of the entire class of DNA cytosine methyltransferases.31,32

Figure 1. Motif arrangements of bacterial methyltransferases. The linear arrangement of motifs involved in AdoMet binding (■) and catalysis (■) is shown.

Orphan methyltransferases, which lack a cognate endonuclease, are found in bacteria but do not function as a part of a restriction-modification system. The cell-cycle regulated MTase (CcrM), first identified in Caulobacter crescentus, is responsible for methylating the adenine in the sequence 5ʹ-GANTC-3ʹ and is commonly found in α-proteobacteria.33,34 CcrM is essential for viability.35 CcrM is expressed in the predivisional cell and is later degraded prior to replication.33 Tight control of CcrM expression is thought to aid in replication timing, cellular structure and cellular division, as exposure to CcrM throughout the cell cycle caused deficiencies in each of these areas.33,34 Expression of DnaA, a transcription regulator that is required for DNA replication initiation,36 was found to be dependent on the methylation state of two GANTC sites within the dnaA promoter.37 This finding provides a direct link between the methylation state of the chromosome and DNA replication timing. In a broader sense, it expands the role of DNA methylation in prokaryotes from merely a means of protection, as seen with restriction-modification MTases, to regulation of transcript production.

The DNA adenine MTase found in the γ-proteobacterium E. coli (EcoDam) is also an orphan MTase and methylates the N-6 position of the adenine within the DNA sequence 5ʹ-GATC-3ʹ. EcoDam shares high sequence identity with other orphan methyltransferases found in γ-proteobacteria and, similar to CcrM, is involved in postreplicative mismatch repair, gene regulation, chromosome replication timing and nucleoid structure determination.38,39 Although Dam is not essential for viability in E. coli, homologues have been found to be essential in other organisms.39 Knock-out studies in E. coli have revealed widespread changes in both RNA and protein expression levels upon deletion of the dam gene.40 Further, a growing number of bacterial pathogens have been found to require adenine methylation for virulence,41 thus making EcoDam and its homologues viable targets for the design of antibiotics.42 The unique processive mechanism by which EcoDam methylates multiple GATC sites on the same DNA substrate further distinguishes it from restriction-modification methyltransferases, which generally methylate multiple sites in a distributive manner.39 Recently, EcoDam processivity has been demonstrated to be modulated by the composition and amount of flanking DNA surrounding the GATC sites.43,44 A better understanding of processive methylation could result in an additional intervention point for antibiotic design (see also chapter by Jeltsch and Jurkowski in this volume).45
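The recognition sequences quoted in this section (5ʹ-GCGC-3ʹ for M.HhaI, 5ʹ-GANTC-3ʹ for CcrM and 5ʹ-GATC-3ʹ for EcoDam) lend themselves to a small worked example. The following is a minimal sketch, assuming that N stands for any of the four bases and scanning only the strand as written; the dictionary layout, the function name `target_positions` and the test sequence are illustrative choices, not part of the chapter.

```python
import re

# Recognition sequence and 0-based offset of the methylated base within it,
# as described in the text (first cytosine of GCGC; the adenine of GANTC/GATC).
MTASES = {
    "M.HhaI": ("GCGC", 1),
    "CcrM": ("GANTC", 1),
    "EcoDam": ("GATC", 1),
}

# Only the symbols needed here; N is assumed to mean "any base".
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def target_positions(seq, site, offset):
    """Return 0-based positions of the target base for every site in seq."""
    pattern = "".join(IUPAC[base] for base in site)
    # The lookahead lets overlapping sites be reported as well.
    return [m.start() + offset for m in re.finditer(f"(?={pattern})", seq)]


seq = "TTGATCAGCGCTGAGTCAGATTC"  # made-up sequence for illustration
for name, (site, offset) in MTASES.items():
    print(name, target_positions(seq, site, offset))
# M.HhaI [8]
# CcrM [13, 19]
# EcoDam [3]
```

All three sites are palindromic (each reads the same as its reverse complement), so a site found on one strand implies a site at the same coordinates on the other; the sketch therefore reports top-strand positions only.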

Eukaryotic DNA Methylation

Our understanding of DNA methyltransferase structure and function has largely been the result of extensive work on the bacterial enzymes, as summarized above. In contrast, extensive examination of eukaryotic DNA methylation using genetic and molecular biological approaches has only recently been complemented by biochemical dissection of the eukaryotic enzymes. The following section briefly summarizes the known role of DNA methylation in eukaryotes and the mechanisms which lead to epigenetic gene regulation.

Epigenetic DNA modifications contribute to the exquisite and complex regulatory process of eukaryotic cell differentiation that occurs in all stages of mammalian life. Heritable changes in gene regulation are observed in the initial steps of mammalian cellular differentiation during embryogenesis and continue to play a role in cell-lineage specific gene regulation in terminally differentiated cells, such as the epigenetic regulation of immune response cytokines during the adaptive immune response of T-cells. The complex process of cellular differentiation invokes a common theme among the different cell types and stages: the regulation of tissue-specific de novo methylation. The biochemical analyses applied to the prokaryotic homologs of the eukaryotic DNA MTases cannot fully explain the targeting mechanism of the eukaryotic methyltransferases. We will first discuss the role of DNA methylation in tissue differentiation, often referred to as cell fate decisions, and provide examples. Next, we will discuss aberrant states of DNA methylation and their association with disease. To conclude, some of the proteins involved in catalyzing DNA methylation and the potential mechanism for target specificity will be identified and explained. The mechanism for propagation of heritable DNA methylation patterns during eukaryotic cell division is understood in much greater detail than the propagation of heritable histone modifications; thus this section will focus predominantly on recounting the observations and conclusions of biological phenomena associated with DNA methylation.

DNA Methylation: Normal and Aberrant Cellular Differentiation

DNA methylation, along with other epigenetic modifications, provides for eukaryotic tissue-specific gene regulation, which is imprinted during embryogenesis.46,47 The importance of the epigenome imprint applied during embryogenesis is emphasized by the many embryonic lethal phenotypes associated with the knock-out of the imprinting enzymes and interpreters, some of which will be discussed in the following section.48 It is generally believed that as cells become terminally differentiated, the imprinted epigenetic code for the cell provides information allowing for expression of the tissue-specific transcriptome.46,48 Recent reports have described the successful reprogramming of differentiated cells into cells retaining stem cell-like properties, correlating with a revision of the epigenetic imprint.49,50 Also, it has been observed that during an adaptive immune response “fully” differentiated naïve T-cells utilize changes in DNA methylation to tightly regulate the effector response of the T-cells and then,51-54 following antigen clearance, reprogram the antigen-specific T-cells to generate memory T-cells.52,53 These observations have been used to explain some of the improved quality of memory antigen-specific T-cells compared to naïve T-cells.52-54 Moreover, both of these examples clearly show that the epigenetic “status” of cells is both malleable and adaptive. Further, changes in DNA methylation at particular loci in both the adaptive immune response and reprogramming of stem cells suggest that understanding the mechanism for specificity of the eukaryotic DNA methyltransferase(s) will go beyond a “simple” linear searching mechanism along the DNA.

As stated previously, properly orchestrated epigenetic modifications are essential for the normal development of mammals.48,55 Furthermore, it is now clear that many human diseases involve or arise from disruption of the cellular machinery involved in epigenetic processes: ICF syndrome (severe immunodeficiency associated with mutations in the DNA methyltransferase DNMT3b gene),56 Fragile X syndrome (one of the most common causes of mental retardation, associated with expansion of a CGG repeat stimulating DNA methylation and silencing of the FMR1 gene),56 Rett syndrome, resulting from mutation of the methyl interpreter MeCP2,57 many different cancers (lung, skin and colon, associated with the mutagenic potential of 5-methyl cytosine)58,59 and finally aging-dependent demethylation,60-62 aging being the most unavoidable and influential environmental factor, with an impact on every living being. These examples of disease arising from aberrant targeting of the methylation machinery provide motivation for better understanding the driving forces behind eukaryotic DNA methyltransferase specificity.

Genomic Imprinters and Interpreters

Unlike prokaryotes, which methylate both adenines and cytosines within duplex DNA, the predominant substrate for nearly all eukaryotic DNA methylation is cytosine at the C-5 position. The role of DNA methylation in eukaryotic biology has received much attention;63,64 yet we are just beginning to scratch the surface of the role and mechanism for tissue-specific DNA methylation patterns. We will discuss the eukaryotic DNA methyltransferases (DNMT1, DNMT3a, DNMT3b and DNMT2) and the methyl-binding domain (MBD) proteins in more detail (see also chapter by Cheng and Blumenthal in this volume).

DNMT1: Maintenance methylation is the propagation of a methylation pattern through the semi-conservative nature of DNA replication during cell division. The biochemical fractionation of eukaryotic methylation activity was accomplished in the late 1960s and was utilized to describe the basic mechanism of CpG methylation.65-68 The preference of the purified MTase for hemi-methylated DNA, as suggested by Holliday and Pugh63 and by Riggs,64 was demonstrated by Gruenbaum et al in 1982.69 This observation served as the foundation for what is now considered epigenetic heritable programming. The discovery provided a mechanism for the heritable transmission of DNA methylation programming following cell division: the parental strand, which carries the methylated cytosine, marks the site as a substrate for the methyltransferase. Thus the preference of the MTase for a hemi-methylated substrate promotes the methylation of the daughter strand following DNA replication, thereby propagating the methyl pattern during cell division. The study of eukaryotic DNA methylation remained a correlative science until the cloning of the DNA MTase known as DNMT1, which allowed for expression of the enzyme and gene knockout experiments.70 These studies then established a causal role for DNA methylation in tissue-specific gene regulation.
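The maintenance mechanism described above can be illustrated with a toy model. The following Python sketch is only an illustration under simplifying assumptions: each CpG site is reduced to a pair of booleans (parental strand, new strand), and the function names and the five-site pattern are hypothetical, not taken from any of the cited studies.

```python
def replicate(parent_marks):
    """Semi-conservative replication (toy model): the parental strand keeps
    its marks and the newly synthesized strand starts unmethylated, so every
    previously methylated site becomes hemi-methylated."""
    return [(mark, False) for mark in parent_marks]


def maintenance_methylate(duplex_sites):
    """Toy maintenance MTase: methylate the new strand only at sites where
    the parental strand is already methylated (hemi-methylated sites)."""
    return [(parental, parental or nascent) for parental, nascent in duplex_sites]


# Hypothetical methylation pattern at five CpG sites (True = methylated).
pattern = [True, False, True, True, False]

hemi = replicate(pattern)                 # state right after replication
restored = maintenance_methylate(hemi)    # state after maintenance methylation
daughter_pattern = [nascent for _, nascent in restored]

print("pattern inherited:", daughter_pattern == pattern)
```

In this sketch unmethylated sites stay unmethylated because the toy enzyme never acts without a methylated parental strand, which is the property that makes the pattern heritable; de novo methylation is deliberately left out.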
In addition to the core motifs common to the cytosine MTase catalytic domain, a bioinformatic analysis of DNMT1 revealed several additional motifs in the N-terminal domain of the enzyme associated with protein-protein and protein-nucleic acid interactions (Figs. 1 and 2). Characterization of these motifs should provide further insight into the complex specificity of the eukaryotic DNMTs. For example, the retinoblastoma protein (Rb) interacts with the N-terminus of DNMT1 and has been shown to inhibit DNMT1 methylation.71,72 It is tempting to postulate that tissue-specific methylation patterns simply arise from nuclear localization of associated proteins such as Rb, which in turn regulate methyltransferase activity. The list of proteins that are found to directly and indirectly modulate DNMT specificity has become quite large, providing evidence that the tissue-dependent locus specificity of the enzyme may arise through interactions with other proteins. For further review and a list of these proteins please see the chapter by Cheng and Blumenthal in this book. Another provocative hypothesis regarding DNMT1 specificity has invoked the allosteric binding of nucleic acids. Several labs have now demonstrated that binding of nucleic acid to DNMT1, most likely through the Zn finger motif in the N-terminal domain of the enzyme,73 serves to inhibit the enzyme allosterically.74-76 Finally, the specificity of the enzyme may be modulated by splicing variants. This could directly and indirectly modulate the enzyme specificity by changing the conformation of the folded protein, thus changing its sequence specificity, or by modulating or even deleting binding sites for accessory proteins/nucleic acids. Only a few splicing variants of DNMT1 are described, most notably an oocyte-specific splicing variant (DNMT1o) which results in the deletion of the charge-rich portion of the enzyme.

Figure 2. Cartoon representation of eukaryotic C5 methyltransferases. The N-terminal domain contains a motif reported to bind to PCNA, a cysteine rich (CXXC) motif, a DNA replication foci motif (Repli.), a charge rich domain (Charge) and a polybromo-1 (Polyb) homologous region. The C-terminal domain contains the conserved methyltransferase motifs described in the prokaryotic section, along with the catalytic motif (C). The amino acid length of the human protein is listed next to the protein; the mouse amino acid length is in parentheses. Not shown is the oocyte-specific splicing variant of DNMT1. Amino acid lengths for isoform 1 are listed for DNMT3a and 3b.

DNMT3a: Interestingly, even after knocking out DNMT1 a basal level of CpG methylation remained, suggesting that there existed another CpG methyltransferase.77 The genetic characterization of DNMT1 paved the way for analysis of the de novo methyltransferases. The mammalian DNMT3 family was cloned in 1998.78 DNMT3a localizes to the cytoplasm and nucleus and its expression is developmentally regulated.79 To date there are four alternative splicing transcript variants of DNMT3a.

DNMT3b: DNMT3b localizes primarily to the nucleus and its expression is developmentally regulated. As mentioned previously, mutations in this gene cause the immunodeficiency-centromeric instability-facial anomalies (ICF) syndrome. To date there are 6 known alternative splicing transcript variants encoding different isoforms of the enzyme.

DNMT2: The enzymatic function of DNMT2 has been a matter of some controversy. Recently it has been demonstrated that DNMT2 has de novo RNA-methyltransferase activity targeting the cytosine bases in tRNA(Asp).80,81 Thus it is probable that this enzyme is actually an RNA methyltransferase (see also chapter by Forterre and Grosjean in this volume).

Methyl-Binding Interpreters: Interpretation of the cellular methylation program appears to be mediated mostly by a class of proteins referred to as Methyl-Binding Domain (MBD) Proteins.82 Currently there are 6 characterized proteins that are found to directly and indirectly bind to methylated CpG DNA.83 Each of these proteins contains a conserved MBD.
Their general mode of action is to locate the methylated substrate and block transcription. At first glance this appears to be a simple task, but then why are there 6 different proteins for this job? It is likely that interpretation and the degree of transcriptional repression are tissue-specific events. Indeed, it has been observed that splicing variants of the various MBDs are associated with specific tissues. Further, it has also been shown that accessory proteins to the MBDs are expressed in a tissue-specific manner. For instance, the protein MTA3 forms part of a B-cell-specific transcriptional repressive complex with Mi-2/NuRD, which in turn associates with MBD2 and MBD3.84 MBD2 serves mainly as a repressor of transcription that binds to methylated promoters. MBD2 represses transcription both by sterically blocking other proteins from interacting with the DNA and by recruiting chromatin remodeling factors. For example, MBD2 has been shown to associate with histone deacetylases, which suggests that MBD2’s repressive ability can be manifested indirectly by directing the closure of nucleosomes.85 Interestingly, MBD2 null mice are viable and fertile. This is not to say that MBD2 is unimportant; rather, the importance of this protein is more notable in differentiated cells. During an immune response, naïve CD4 and CD8 T-cell differentiation is severely impaired in MBD2 null mice.51,52 Generally, this suggests that one of the reasons for multiple MBD proteins is to enhance the tissue-specific interpretation of the methylation pattern. The MeCP2 protein also serves to repress transcription via binding to methylated CpG. Interestingly, MeCP2 is dispensable in ES cells, yet mutation of MeCP2 is most notably associated with the neurological disorder known as Rett syndrome. This has been confirmed in MeCP2 null mice, which exhibit symptoms of Rett syndrome. This, like the T-cell deficiency in MBD2 null mice, suggests that methyl pattern-dependent gene expression is temporal and tissue specific. For further review of methyl-binding proteins, including MBD4 and Kaiso which were not covered here, please see references 86-88.

Figure 3. Cartoon representation of DNA C5 methyl-binding proteins. Abbreviations: MBD: methyl-binding domain; TRD: Transcription repression domain; GR: glycine and arginine repeats; CXXC: cysteine rich domain; DNA Glyc.: DNA glycosylase domain. E: poly glutamate region.

Future Outlook

The study of bacterial DNA MTases has provided insights and experimental approaches with impact well beyond those originally intended. For example, the observation of base flipping and the various approaches to study this amazing conformational change, originally made with these enzymes, have now been replicated in DNA repair enzymes and enzymes that work on RNA. Our understanding of eukaryotic DNA MTases is significantly less mature. However, through the intertwining of the various epigenetic processes and their requisite multi-protein complexes, it has become very apparent that these enzymes make important structural and functional contributions to tissue-specific gene expression. In this section, we will discuss the scientific outlook for both bacterial and eukaryotic MTases.

Bacterial DNA Methyltransferases

It is commonly expected that the specificity of a “well-understood” class of enzymes, such as the bacterial methyltransferases, can be rationally re-engineered and that inhibitors (drugs) can be rationally designed to interfere with enzyme function. We aren’t there yet on either account. Although numerous successes have been reported in re-engineering the specificity of DNA binding proteins (e.g., zinc fingers),89 the same is not true for any group of enzymes that modify DNA, including the restriction endonucleases and DNA MTases. This is not for lack of trying, and some success can be claimed for isolated examples.90 However, the general fascination with enzymes is driven by their combined efficiency (kcat/Km) and specificity (essentially the same ratio, compared between preferred and nonpreferred substrates). When looked at objectively, claims of re-engineered enzymes fail by these standards. In other words, the “new” enzyme lacks either the efficiency or the specificity of the wild type enzyme. Why is this, why should we try and how might we proceed?
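The two yardsticks named above can be written down explicitly. The short Python sketch below is offered only as an illustration: it computes the efficiency kcat/Km and expresses specificity as the ratio of that quantity for a preferred versus a nonpreferred substrate. The function names and all numerical values are hypothetical and chosen for easy arithmetic; they are not measurements for any real methyltransferase.

```python
def catalytic_efficiency(kcat, km):
    """Efficiency of an enzyme on one substrate, kcat/Km."""
    return kcat / km


def specificity(kcat_pref, km_pref, kcat_non, km_non):
    """Specificity as the ratio of kcat/Km for the preferred substrate
    to kcat/Km for the nonpreferred substrate."""
    return catalytic_efficiency(kcat_pref, km_pref) / catalytic_efficiency(kcat_non, km_non)


# Hypothetical kinetic constants (kcat in min^-1, Km in micromolar).
wild_type = specificity(kcat_pref=2.0, km_pref=0.5, kcat_non=0.5, km_non=8.0)
engineered = specificity(kcat_pref=0.5, km_pref=2.0, kcat_non=0.5, km_non=8.0)

print("wild type specificity:", wild_type)
print("engineered specificity:", engineered)
```

With these made-up numbers the hypothetical engineered enzyme has lost both efficiency on its preferred substrate and discrimination against the nonpreferred one, which is the kind of shortfall the paragraph above describes.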

Why Is It So Challenging to Re-Engineer the Specificity of DNA Methyltransferases?

One could argue that, because DNA modifying enzymes are required to manifest high specificity, the biological consequences of engineered changes are frequently fatal and the enzymes are thus difficult to re-engineer. This argument invokes an underlying evolution-driven selection against modifications that might lead to specificity changes. Interestingly, the inherent promiscuity observed with all enzymes has frequently been invoked as a basis for the evolution of new specificities. Why hasn’t this formed the basis of bona fide successes in the drive to re-engineer DNA methyltransferases, even those relying on directed evolution methods? Emerging work on some bacterial methyltransferases suggests a highly interdigitated indirect readout mechanism that drives requisite conformational changes (for example, see chapter by Jeltsch and Jurkowski in this book). Perhaps recapitulating the needed interdigitation for new sequence specificity is beyond the capabilities of current directed evolution methods?

Why Should We Try to Re-Engineer DNA Methyltransferases?

The practical use of DNA methyltransferases continues to expand and forms a strong motivator for such studies. The use of enzyme-directed methods to selectively label nucleic acids (e.g., SMILing DNA, see chapter by Gider and Weinhold in this book) is currently limited to those enzymes that work efficiently with the modified cofactors.91 The use of DNA methyltransferases to selectively modify nucleic acids in vivo92 would be aided by the availability of enzymes with diverse sequence specificities.

How Might We Re-Engineer DNA Methyltransferases?

This challenge can be parsed into two subcategories: truly new recognition sequences that are not currently available, and sequences for which available endonucleases can provide a basis for selection. The latter are largely a proof-of-principle device, although for certain applications they can provide a novel reagent. The assortment of methods now available for directed evolution of new enzyme function includes unnatural amino acid mutagenesis, gene shuffling, intein-based methods, error prone PCR, cassette mutagenesis and artificial cell methods. We anticipate that the use of such methods, when coupled with detailed structural insights and comparative analysis of related sequences, is likely to lead to the desired changes in methyltransferase sequence specificity.

Inhibitors of DNA Methyltransferases

Following the reports that knocking out certain bacterial DNA adenine MTases results in avirulent or nonviable bacteria,41 efforts were initiated at several drug companies to develop antibacterial approaches based on interfering with these enzymes. This was in part motivated by the realization that humans lack any DNA adenine MTase activity. In spite of this interest, only a single report has described the identification of several classes of inhibitors that work selectively against the bacterial enzymes, based on screening a chemical library of small molecules.42 The availability of several high resolution crystal structures of various enzymes, including those involved in regulating virulence genes in human pathogens, makes this situation even harder to understand. Nevertheless, the development of new antibiotics is both a technical and commercial challenge. For example, although many suitable and selective leads were identified in the effort mentioned above, converting these to cell-active compounds is a frequently encountered obstacle. Also, in spite of the public outcry for the need for new antibiotics, many of the large drug companies studiously avoid this class of drugs. Finally, despite the extensive efforts by medicinal chemists to develop selective inhibitors of diverse AdoMet-dependent enzymes based on the AdoMet backbone, no drugs have been forthcoming. Certainly the recent successes with kinase inhibitors suggest it may be possible to design MTase-selective inhibitors based on this approach.

Eukaryotic DNA Methyltransferases

Future studies of these enzymes will likely center on developing a deeper understanding of the complexes which they form with other proteins, particularly those involved in the other “epigenetic pathways” of chromatin/histone remodeling and RNAi. In particular, the fundamental question in human DNA methylation is to understand how the patterns of de novo methylation are laid down, regardless of whether the MTases are causative or simply follow the cues set down by other cellular factors. The study of mammalian DNA MTases is not unique in this regard, as understanding how transient multi-protein complexes function seems to be widely appreciated as essential. The real challenge is: how does one go from pull-downs to real biochemical understanding? Demonstration that two or more proteins are colocalized is important, but remains only the first step in understanding if there are functional consequences and the underlying mechanisms. This is certainly a new and important frontier in the field of eukaryotic DNA MTases. Finally, similar to the situation for bacterial DNA MTases, we have very few inhibitors or drugs that target these enzymes. Again, this is not for lack of interest or trying, although the latter is difficult to truly judge as such efforts are carried out behind the cloak of industry. It is telling that the major drugs targeting the human enzymes were developed over 30 years ago and are now being reformulated.93 The few newer inhibitors have not yet been adequately compared with the older drugs. In spite of the most frequent medical indication (cancer), which allows for drugs with reduced therapeutic indices, targeting the methyltransferases themselves is not likely to be the best therapeutic strategy. This is a complex issue which is in part a direct result of the epigenetic nature of the disease; the goal is not to kill the cell as in classical chemotherapy, but rather to reprogram. The underlying assumption that the aberrant, promoter-specific, hyper-methylated DNA prevalent in tumor cells is precisely that which will be selectively or preferentially reversed upon treatment remains to be shown. More promising may be the disruption of tumor-specific protein:MTase complexes which lead to the aberrant hyper-methylation of particular promoters. This intriguing approach is directly reliant on a broader understanding of the mechanisms leading to normal (and aberrant) de novo methylation, which, as mentioned above, is the paramount unanswered question in the human DNA methylation field.

References

1. Watson JD, Crick FHC. Molecular Structure of Nucleic Acids - A Structure for Deoxyribose Nucleic Acid. Nature 1953; 171(4356):737-738.
2. Tuteja N, Tuteja R. Unraveling DNA helicases - Motif, structure, mechanism and function. European Journal of Biochemistry 2004; 271(10):1849-1863.
3. Doherty AJ, Suh SW. Structural and mechanistic conservation in DNA ligases. Nucleic Acids Res 2000; 28(21):4051-4058.
4. Johnson TB CR. The Discovery of 5-Methyl-Cytosine in Tuberculinic Acid, The Nucleic Acid of the Tubercle Bacillus. J Am Chem Soc 1925; 47:2838-2844.
5. Hotchkiss RD. The Quantitative Separation of Purines, Pyrimidines, and Nucleosides by Paper Chromatography. J Biol Chem 1948; 175(1):315-332.
6. Arber W. Host-Controlled Modification of Bacteriophage. Ann Rev Microbiol 1965; 19:365-378.
7. Srinivasan PR, Borek E. Enzymatic Alteration of Nucleic Acid Structure - Enzymes Put Finishing Touches Characteristic of Each Species on RNA + DNA by Insertion of Methyl Groups. Science 1964; 145(363):548-553.
8. Arber W, Dussoix D. Host Specificity of DNA Produced by Escherichia-Coli .1. Host Controlled Modification of Bacteriophage Lambda. J Mol Biol 1962; 5(1):18-36.
9. Gold M, Hurwitz J. Enzymatic Methylation of Ribonucleic Acid and Deoxyribonucleic Acid .V. Purification + Properties of Deoxyribonucleic Acid-Methylating Activity of Escherichia Coli. J Biol Chem 1964; 239(11):3858-3865.
10. Arber W, Linn S. DNA Modification and Restriction. Ann Rev Biochem 1969; 38:467-500.
11. Smith HO, Wilcox KW. A Restriction Enzyme from Hemophilus-Influenzae .1. Purification and General Properties. J Mol Biol 1970; 51(2):379-391.
12. Danna K, Nathans D. Studies of SV40 DNA .1. Specific Cleavage of Simian Virus 40 DNA by Restriction Endonuclease of Hemophilus Influenzae. P Natl Acad Sci USA 1971; 68(12):2913-2917.
13. Cohen SN, Chang ACY, Boyer HW, Helling RB. Construction of Biologically Functional Bacterial Plasmids In-Vitro. P Natl Acad Sci USA 1973; 70(11):3240-3244.
14. Klimasauskas S, Nelson JL, Roberts RJ. The Sequence Specificity Domain of Cytosine-C5 Methylases. Nucleic Acids Res 1991; 19(22):6183-6190.
15. Mi S, Roberts RJ. How M-MspI and M-HpaII Decide Which Base to Methylate. Nucleic Acids Res 1992; 20(18):4811-4816.
16. Wilson GG. Organization of Restriction-Modification Systems. Nucleic Acids Res 1991; 19(10):2539-2566.
17. Ishikawa K, Watanabe M, Kuroita T et al. Discovery of a novel restriction endonuclease by genome comparison and application of a wheat-germ-based cell-free translation assay: PabI (5'-GTA/C) from the hyperthermophilic archaeon Pyrococcus abyssi. Nucleic Acids Res 2005; 33(13).
18. Roberts RJ, Vincze T, Posfai J et al. REBASE - enzymes and genes for DNA restriction and modification. Nucleic Acids Res 2007; 35:D269-D270.
19. Caserta M, Zacharias W, Nwankwo D et al. Cloning, Sequencing, In vivo Promoter Mapping, and Expression in Escherichia-Coli of the Gene for the HhaI Methyltransferase. J Biol Chem 1987; 262(10):4770-4777.
20. Cheng XD, Kumar S, Posfai J et al. Crystal-Structure of the HhaI DNA Methyltransferase Complexed with S-Adenosyl-L-Methionine. Cell 1993; 74(2):299-307.
21. Klimasauskas S, Kumar S, Roberts RJ et al. HhaI Methyltransferase Flips Its Target Base Out of the DNA Helix. Cell 1994; 76(2):357-369.
22. O’Gara M, Horton JR, Roberts RJ et al. Structures of HhaI methyltransferase complexed with substrates containing mismatches at the target base. Nat Struct Biol 1998; 5(10):872-877.
23. O’Gara M, Roberts RJ, Cheng XD. A structural basis for the preferential binding of hemimethylated DNA by HhaI DNA methyltransferase. J Mol Biol 1996; 263(4):597-606.
24. O’Gara M, Klimasauskas S, Roberts RJ et al. Enzymatic C5-cytosine methylation of DNA: Mechanistic implications of new crystal structures for HhaI methyltransferase-DNA-AdoHcy complexes. J Mol Biol 1996; 261(5):634-645.
25. Dong AP, Zhou L, Zhang X et al. Structure of the Q237W mutant of HhaI DNA methyltransferase: an insight into protein-protein interactions. Biol Chem 2004; 385(5):373-379.
26. Shieh FK, Youngblood B, Reich NO. The role of Arg165 towards base flipping, base stabilization and catalysis in M.HhaI. J Mol Biol 2006; 362(3):516-527.
27. Shieh FK, Reich NO. AdoMet-dependent methyl-transfer: Glu119 is essential for DNA C5-cytosine methyltransferase M.HhaI. J Mol Biol 2007; 373(5):1157-1168.
28. Youngblood B, Shieh FK, Los Rios S et al. Engineered extrahelical base destabilization enhances sequence discrimination of DNA methyltransferase M.HhaI. J Mol Biol 2006; 362(2):334-346.
29. Youngblood B, Buller F, Reich NO. Determinants of sequence-specific DNA methylation: Target recognition and catalysis are coupled in M.HhaI. Biochemistry 2006; 45(51):15563-15572.
30. Youngblood B, Shieh FK, Buller F et al. S-Adenosyl-L-methionine-dependent methyl transfer: Observable precatalytic intermediates during DNA cytosine methylation. Biochemistry 2007; 46(30):8766-8775.
31. Sankpal UT, Rao DN. Structure, function, and mechanism of HhaI DNA methyltransferases. Crit Rev Biochem Mol 2002; 37(3):167-197.
32. Svedruzic ZM, Reich NO. The mechanism of target base attack in DNA cytosine carbon 5 methylation. Biochemistry 2004; 43(36):11460-11473.
33. Wright R, Stephens C, Shapiro L. The CcrM DNA methyltransferase is widespread in the alpha subdivision of proteobacteria, and its essential functions are conserved in Rhizobium meliloti and Caulobacter crescentus. J Bacteriol 1997; 179(18):5869-5877.
34. Zweiger G, Marcynski G, Shapiro L. A Caulobacter DNA Methyltransferase That Functions Only in the Predivisional Cell. J Mol Biol 1994; 235(2):472-485.
35. Stephens C, Reisenauer A, Wright R et al. A cell cycle-regulated bacterial DNA methyltransferase is essential for viability. P Natl Acad Sci USA 1996; 93(3):1210-1214.
36. Gorbatyuk B, Marczynski GT. Physiological consequences of blocked Caulobacter crescentus dnaA expression, an essential DNA replication gene. Mol Microbiol 2001; 40(2):485-497.
37. Collier J, McAdams HH, Shapiro L. A DNA methylation ratchet governs progression through a bacterial cell cycle. P Natl Acad Sci USA 2007; 104(43):17111-17116.
38. Lobner-Olesen A, Skovgaard O et al. Dam methylation: coordinating cellular processes. Curr Opin Microbiol 2005; 8(2):154-160.
39. Casadesus J, Low D. Epigenetic gene regulation in the bacterial world. Microbiology and Mol Biol Rev 2006; 70(3):830-856.
40. Oshima T, Wada C, Kawagoe Y et al. Genome-wide analysis of deoxyadenosine methyltransferase-mediated control of gene expression in Escherichia coli. Mol Microbiol 2002; 45(3):673-695.
41. Heusipp G, Falker S, Schmidt MA. DNA adenine methylation and bacterial pathogenesis. International J Med Microbiol 2007; 297(1):1-7.
42. Mashhoon N, Pruss C, Carroll M et al. Selective inhibitors of bacterial DNA adenine methyltransferases. J Biomol Screen 2006; 11(5):497-510.
43. Coffin SR, Reich NO. Modulation of Escherichia coli DNA methyltransferase activity by biologically derived GATC-flanking sequences. J Biol Chem 2008; 283(29):20106-20116.
44. Peterson SN, Reich NO. GATC flanking sequences regulate dam activity: Evidence for how Dam specificity may influence pap expression. J Mol Biol 2006; 355(3):459-472.
45. Breyer WA, Matthews BW. A structural basis for processivity. Protein Sci 2001; 10(9):1699-1711.
46. Reik W. Stability and flexibility of epigenetic gene regulation in mammalian development. Nature 2007; 447(7143):425-432.
47. Surani MA. Imprinting and the initiation of gene silencing in the germ line. Cell 1998; 93(3):309-312.
48. Li E. Chromatin modification and epigenetic reprogramming in mammalian development. Nat Rev Genet 2002; 3(9):662-673.
49. Bernstein BE, Mikkelsen TS, Xie XH et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 2006; 125(2):315-326.
50. Mikkelsen TS, Hanna J, Zhang XL et al. Dissecting direct reprogramming through integrative genomic analysis. Nature 2008; 454(7200):49-U1.
51. Hutchins AS, Mullen AC, Lee HW et al. Gene silencing quantitatively controls the function of a developmental trans-activator. Mol Cell 2002; 10(1):81-91.
52. Kersh EN. Impaired memory CD8 T cell development in the absence of methyl-CpG-binding domain protein 2. J Immunol 2006; 177(6):3821-3826.
53. Kersh EN, Fitzpatrick DR, Murali-Krishna K et al. Rapid demethylation of the IFN-gamma gene occurs in memory but not naive CD8 T cells. J Immunol 2006; 176(7):4083-4093.
54. Northrop JK, Thomas RM, Wells AD et al. Epigenetic remodeling of the IL-2 and IFN-gamma loci in memory CD8 T cells is influenced by CD4 T cells. J Immunol 2006; 177(2):1062-1069.
55. Ting AH, McGarvey KM, Baylin SB. The cancer epigenome - components and functional correlates. Gene Dev 2006; 20(23):3215-3231.
56. Robertson KD. DNA methylation and chromatin - unraveling the tangled web. Oncogene 2002; 21(35):5361-5379.
57. Amir RE, Van den Veyver IB, Wan M et al. Rett syndrome is caused by mutations in X-linked MECP2, encoding methyl-CpG-binding protein 2. Nat Genet 1999; 23(2):185-188.
58. Jones PA, Baylin SB. The fundamental role of epigenetic events in cancer. Nat Rev Genet 2002; 3(6):415-428.
59. Jones PA. Epigenetics in carcinogenesis and cancer prevention. Epigenetics in Cancer Prevention: Early Detection and Risk Assessment 2003; 983:213-219.
60. Egger G, Liang GN, Aparicio A et al. Epigenetics in human disease and prospects for epigenetic therapy. Nature 2004; 429(6990):457-463.
61. Richardson BC. Role of DNA methylation in the regulation of cell function: Autoimmunity, aging and cancer. J Nutr 2002; 132(8):2401S-2405S.
62. Wilson VL, Jones PA. DNA Methylation Decreases in Aging But Not in Immortal Cells. Science 1983; 220(4601):1054-1057.
63. Holliday R, Pugh JE. DNA Modification Mechanisms and Gene Activity During Development. Science 1975; 187(4173):226-232.
64. Riggs AD. X-Inactivation, Differentiation, and DNA Methylation. Cytogenet Cell Genet 1975; 14(1):9-25.
65. Burdon RH, Martin BT, Lal BM. Synthesis of Low Molecular Weight Ribonucleic Acid in Tumour Cells. J Mol Biol 1967; 28(2):357-371.
66. Kalousek F, Morris NR. Deoxyribonucleic Acid Methylase Activity in Rat Spleen. J Biol Chem 1968; 243(9):2440-2442.
67. Kalousek F, Morris NR. Purification and Properties of Deoxyribonucleic Acid Methylase from Rat Spleen. J Biol Chem 1969; 244(5):1157-1163.
68. Sheid B, Srinivasan PR, Borek E. Deoxyribonucleic Acid Methylase of Mammalian Tissues. Biochemistry 1968; 7(1):280-285.
69. Gruenbaum Y, Cedar H, Razin A. Substrate and Sequence Specificity of A Eukaryotic DNA Methylase. Nature 1982; 295(5850):620-622.
70. Bestor T, Laudano A, Mattaliano R et al. Cloning and Sequencing of A cDNA Encoding DNA Methyltransferase of Mouse Cells - the Carboxyl-Terminal Domain of the Mammalian Enzymes Is Related to Bacterial Restriction Methyltransferases. J Mol Biol 1988; 203(4):971-983.
71. Pradhan S, Kim GD. The retinoblastoma gene product interacts with maintenance human DNA (cytosine-5) methyltransferase and modulates its activity. EMBO Journal 2002; 21(4):779-788.
72. Robertson KD, Ait-Si-Ali S, Yokochi T et al. DNMT1 forms a complex with Rb, E2F1 and HDAC1 and represses transcription from E2F-responsive promoters. Nat Genet 2000; 25(3):338-342.
73. Bestor TH. Activation of Mammalian DNA Methyltransferase by Cleavage of a Zn Binding Regulatory Domain. EMBO Journal 1992; 11(7):2611-2617.
74. Bolden A, Ward C, Siedlecki JA et al. DNA Methylation - Inhibition of De novo and Maintenance Methylation In vitro by RNA and Synthetic Polynucleotides. J Biol Chem 1984; 259(20):2437-2443.
75. Glickman JF, Flynn J, Reich NO. Purification and characterization of recombinant baculovirus-expressed mouse DNA methyltransferase. Biochem Biophys Res Co 1997; 230(2):280-284.
76. Svedruzic ZM, Reich NO. Mechanism of allosteric regulation of DNMT1’s processivity. Biochemistry 2005; 44(45):14977-14988.
77. Lei H, Oh SP, Okano M et al. De novo DNA cytosine methyltransferase activities in mouse embryonic stem cells. Development 1996; 122(10):3195-3205.
78. Okano M, Xie SP, Li E. Cloning and characterization of a family of novel mammalian DNA (cytosine-5) methyltransferases. Nat Genet 1998; 19(3):219-220.
79. La Salle S, Mertineit C, Taketo T et al. Windows for sex-specific methylation marked by DNA methyltransferase expression profiles in mouse germ cells. Dev Biol 2004; 268(2):403-415.
80. Goll MG, Kirpekar F, Maggert KA et al. Methylation of tRNA(Asp) by the DNA methyltransferase homolog DNMT2. Science 2006; 311(5759):395-398.
81. Jurkowski TP, Meusburger M, Phalke S et al. Human DNMT2 methylates tRNA(Asp) molecules using a DNA methyltransferase-like catalytic mechanism. RNA 2008; 14(8):1663-1670.
82. Meehan RR, Lewis JD, Mckay S et al. Identification of A Mammalian Protein That Binds Specifically to DNA Containing Methylated CpGs. Cell 1989; 58(3):499-507.
83. Hendrich B, Bird A. Identification and characterization of a family of mammalian methyl-CpG binding proteins. Mol Cell Biol 1998; 18(11):6538-6547.
84. Fujita N, Jaye DL, Geigerman C et al. MTA3 and the Mi-2/NuRD complex regulate cell fate during B lymphocyte differentiation. Cell 2004; 119(1):75-86.
85. Ng HH, Zhang Y, Hendrich B, Johnson CA et al. MBD2 is a transcriptional repressor belonging to the MeCP1 histone deacetylase complex. Nat Genet 1999; 23(1):58-61.
86. Fatemi M, Wade PA. MBD family proteins: reading the epigenetic code. J Cell Sci 2006; 119(15):3033-3037.
87. Klose RJ, Bird AP. Genomic DNA methylation: the mark and its mediators. Trends Biochem Sci 2006; 31(2):89-97.
88. Sansom OJ, Maddison K, Clarke AR. Mechanisms of Disease: methyl-binding domain proteins as potential therapeutic targets in cancer. Nat Clin Prac Oncol 2007; 4(5):305-315.
89. Santiago Y, Chan E, Liu PQ et al. Targeted gene knockout in mammalian cells by using engineered zinc-finger nucleases. P Natl Acad Sci USA 2008; 105(15):5809-5814.
90. Cohen HM, Tawfik DS, Griffiths AD. Altering the sequence specificity of HaeIII methyltransferase by directed evolution using in vitro compartmentalization. Protein Eng Design Select 2004; 17(1):3-11.
91. Klimasauskas S, Weinhold E. A new tool for biotechnology: AdoMet-dependent methyltransferases. Trends Biotechnol 2007; 25(3):99-104.
92. Vogel MJ, Peric-Hupkes D, van Steensel B. Detection of in vivo protein-DNA interactions using DamID in mammalian cells. Nat Prot 2008; 2(6):1467-1478.
93. Stresemann C, Lyko F. Modes of action of the DNA methyltransferase inhibitors azacytidine and decitabine. Int J Cancer 2008; 123(1):8-13.

Chapter 3

DNA Restriction-Modification Systems in Prokaryotes

John H. White, Gareth A. Roberts and David T.F. Dryden*

Abstract

DNA Restriction-Modification systems are found in most bacteria. The Type I, II and III systems modify specific nucleotide sequences within the host genome using a methyltransferase. The absence of this specific methylation pattern on invading foreign DNA triggers the destruction of the invading DNA by the restriction endonuclease. Type IV systems only have an endonuclease function and attack foreign DNA containing methylated or otherwise modified sequences that are not found in the host genome. Invading mobile genetic elements, such as phage, plasmids and transposons, have developed a range of counter-measures including specially modified nucleotides, antirestriction and antimodification proteins.

Introduction

Restriction-Modification (RM) systems, originally observed in model prokaryotes,1,2 are now known to be widespread in the Bacteria and Archaea,3 falling into four Types.4 An RM system protects the cell from subversion by foreign DNA, whether introduced by transformation, conjugation or transduction. It does this by identifying short DNA recognition sequences on the foreign DNA and rendering the whole DNA molecule inviable with an endonuclease. Classic RM systems, Types I to III,4 are bipartite in nature, featuring two antagonistic enzymatic functions: a restriction endonuclease function (REase) which cuts unmodified foreign DNA and a methyltransferase function (MTase) which methylates specific bases in the recognition sequence on the host DNA. MTases from all known RM systems produce just three modified bases: 5-methylcytosine (m5C), N4-methylcytosine (m4C) and N6-methyladenine (m6A) (see also chapter by Reich and Coffin in this volume).5 In this review we shall introduce the four Types of RM system in prokaryotes and indicate how mobile genetic elements evade RM systems.

RM Systems

Type I

Type I RM enzymes were the first RM systems to be identified and characterised and are widespread in Bacteria and Archaea.3 Progress in the field has been extensively reviewed.6-9 Genetic complementation, antibody cross-reactivity, DNA hybridisation and sequence comparison indicate five distinct families of Type I RM systems: IA–IE.9,10 Type I RM systems are composed of three proteins encoded by three genes: hsdR encodes the restriction subunit (R), hsdM the methyltransferase (modification) subunit (M) and hsdS the DNA sequence-recognising specificity subunit (S). Type I RM systems exist as two functional complexes: an MTase composed of 2 M subunits and an S subunit, and the REase composed of 2 R subunits, 2 M subunits and an S subunit. The adenine methylation pattern of the bipartite DNA recognition sequence determines the activities of the Type I RM complexes. DNA containing an unmethylated recognition sequence is usually restricted by the REase, given ATP, S-adenosyl methionine (SAM) and Mg2+ as cofactors. Hemimethylated DNA, methylated on one strand (with one N6-methylated adenine, m6A, following semiconservative replication of fully methylated DNA), is the preferred substrate for Type I MTase activity. This is not because of enhanced DNA binding relative to unmethylated DNA; rather, hemimethylated DNA is a better substrate for the MTase. SAM is the methyl donor. DNA methylated on both strands is recognised as “self” and is neither modified nor restricted.

*Corresponding Author: David T.F. Dryden, School of Chemistry, Joseph Black Building, University of Edinburgh, West Mains Road, Edinburgh, Scotland, EH9 3JJ, UK. Email: [email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.

HsdS

The specificity subunit is the heart of both Type I MTase and REase, determining the DNA sequences which are bound by the REase and MTase. DNA sequence specificity resides in two target recognition domains (TRDs) which vary in sequence: the N terminal TRD recognises the 5ʹ element (3 or 4 bases) of the DNA sequence and the C terminal TRD recognises the 3ʹ element (4 or 5 bases). For example, the sequence recognised by the EcoKI Type I system is AACNNNNNNGTGC. TRDs can be swapped between systems to generate new target specificities (see also chapter by Jeltsch and Jurkowski in this volume).
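The bipartite nature of the recognition sequence can be illustrated with a short search routine. The following is a minimal sketch, not a description of how the enzyme itself scans DNA: it assumes that N stands for any of the four bases, and the function names (`find_sites`, `reverse_complement`) are hypothetical helpers introduced here for illustration. Because the EcoKI sequence is asymmetric, both strands are searched.

```python
import re

# EcoKI recognition sequence as quoted in the text; N is assumed to mean any base.
ECOKI_SITE = "AACNNNNNNGTGC"

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sites(seq: str, site: str = ECOKI_SITE):
    """Return sorted (position, strand) pairs for every match of `site`.

    Positions are 0-based and always refer to the left-most base of the
    match on the input (top) strand. A lookahead is used so that
    overlapping matches are not missed.
    """
    seq = seq.upper()
    body = "".join("[ACGT]" if base == "N" else base for base in site)
    pattern = re.compile("(?=" + body + ")")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [(len(seq) - m.start() - len(site), "-") for m in pattern.finditer(rc)]
    return sorted(hits)

if __name__ == "__main__":
    example = "TT" + "AAC" + "GGGGGG" + "GTGC" + "TT"  # invented test sequence
    print(find_sites(example))
```

Swapping a TRD, as described above, would correspond in this sketch simply to changing one half of the `site` string.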

HsdM

The methyltransferase subunit binds the methyl donor SAM, monitors the degree of methylation of the target sequence and methylates adenine bases as required. For each bipartite recognition sequence, one adenine is methylated per strand. Type I MTases only generate m6A. It is thought that methylation occurs via a base flipping mechanism common to all MTases (see Type II MTases, also in chapter by Klimasauskas and Liutkeviciute in this volume).
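The way in which the methylation state of the target directs the activity of a Type I system (unmethylated DNA is usually restricted, hemimethylated DNA is the preferred substrate for methylation, fully methylated DNA is left alone) can be summarised as a simple decision rule. The sketch below is a deliberate simplification under that assumption; the names `Outcome` and `type_i_outcome` are hypothetical and the rule ignores cofactor availability and the word "usually" in the description above.

```python
from enum import Enum

class Outcome(Enum):
    RESTRICT = "restrict"    # unmethylated target: treated as foreign DNA
    METHYLATE = "methylate"  # hemimethylated target: newly replicated host DNA
    IGNORE = "ignore"        # fully methylated target: recognised as "self"

def type_i_outcome(top_methylated: bool, bottom_methylated: bool) -> Outcome:
    """Map the methylation state of the two target adenines to the expected outcome."""
    methylated_strands = int(top_methylated) + int(bottom_methylated)
    if methylated_strands == 0:
        return Outcome.RESTRICT
    if methylated_strands == 1:
        return Outcome.METHYLATE
    return Outcome.IGNORE

if __name__ == "__main__":
    for state in [(False, False), (True, False), (True, True)]:
        print(state, type_i_outcome(*state).value)
```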

HsdR

Endonuclease activity resides in the R subunit which acts on DNA containing unmethylated recognition sequences. Sequence analysis and mutagenesis indicate that the R subunit is an ATP hydrolysing translocase. In the presence of SAM, Mg2+ and ATP, the REase binds the unmethylated recognition sequence, translocates DNA bidirectionally past and through the bound complex, whilst hydrolysing ATP, before cleaving the DNA at a distant site (Fig. 1).11,12

Figure 1. The postulated mechanism of action of Type I restriction enzymes. The enzyme binds to a target sequence (black box) on DNA (thin line) and commences translocation of the DNA using its motor domains (grey boxes) whilst remaining bound to the target sequence. This causes looping and twisting of the DNA. When translocation ceases by, for example, collision with another Type I enzyme, double-strand DNA cleavage occurs at the collision site on the DNA.


Figure 2. The base flipping mechanism used by DNA methyltransferases as exemplified by the HhaI methyltransferase (ribbon diagram) bound to a short duplex viewed down the helical axis (stick model) (Structure 4mht from PDB by Xiaodong Cheng). The flipped base coming out of the side of the DNA is clearly visible.

Type II

Type II systems are intensively studied as they are an essential tool in modern molecular biology. Many Type II RM systems are encoded by two genes: one encoding a REase subunit which might function as a monomer, dimer or tetramer, the other an MTase subunit which usually exists as a monomer. For most Type II RM systems, the REase and MTase are independent entities and the tendency has been to review them independently.13-15 Type II RM systems currently fall into 11 families, based on a variety of criteria, including how many target sites are required to trigger cutting, how many sites are cut, whether the target sequence is symmetric or asymmetric and where the DNA is cut relative to the target sequence.4 Type II REases generally require Mg2+ as a cofactor (with only a few exceptions, e.g., BfiI, which does not require a divalent cation cofactor), cutting 4-8 bp symmetric unmethylated sequences at fixed locations in or adjacent to the recognition site.13 Currently, five types of catalytic centers are identified in Type II REases: PD ... (D/E)XK, PLD, GIY-YIG, HNH and “halfpipe”. Among these families, the enzymes of the PD ... (D/E)XK family are best characterized both structurally and with respect to the mechanism of DNA cleavage. Type II MTases recognise the same recognition sequence as the partner REase and use SAM as the methyl donor to methylate cytosines at N4 or C5 or adenines at N6.14,15 The structures of a number of Type II MTases have been reported14,15 and it appears that, in common with MTases which methylate RNA, proteins and small molecules, there is a common core structure: a seven-stranded β sheet, the “SAM-dependent MTase fold”. The same basic enzyme structure is used to methylate small molecules and macromolecules. In the case of DNA MTases this is achieved by flipping the base to be methylated out of the DNA double helix, breaking hydrogen bonds and stacking interactions (Fig. 2).

Type III

Type III RM enzymes contain two subunits, res and mod, which assemble to form an endonuclease complex with stoichiometry res2mod2.6,7 This enzyme complex has activity as a REase, an MTase and an ATPase. Mod subunits contain the MTase activity with a “SAM-dependent MTase fold” and a single TRD recognising a 5 or 6 base pair asymmetric target sequence. The MTase uses SAM as a methyl donor, while the REase requires ATP and Mg2+.6,7 In common with Type I enzymes, Type III MTases only generate m6A. In order to be restricted by a Type III RM complex, a DNA molecule must possess 2 inversely orientated copies of the asymmetric recognition site, whereas a single copy is sufficient for the MTase activity. The REase activity of the Type III RM enzyme EcoP15I has been observed at the level of the single molecule by atomic force microscopy16-18 and a model proposed for the recognition of sites in inverse orientation.17,18
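The requirement for two inversely orientated copies of an asymmetric site can be expressed as a simple check on a sequence. The sketch below is illustrative only and rests on two assumptions that are not taken from the text: the EcoP15I recognition sequence is taken to be 5ʹ-CAGCAG-3ʹ, and the relative arrangement of the two sites (head-to-head versus tail-to-tail) and their spacing are ignored. The function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of `motif` in `seq`, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count

def site_orientations(seq: str, site: str = "CAGCAG"):
    """Return (forward, inverted) counts of an asymmetric site on the top strand.

    A site on the bottom strand appears on the top strand as the reverse
    complement of the recognition sequence.
    """
    seq = seq.upper()
    return count_overlapping(seq, site), count_overlapping(seq, reverse_complement(site))

def is_restrictable(seq: str, site: str = "CAGCAG") -> bool:
    """True if at least one copy of the site is present in each orientation."""
    forward, inverted = site_orientations(seq, site)
    return forward > 0 and inverted > 0

if __name__ == "__main__":
    same_orientation = "CAGCAG" + "TTTT" + "CAGCAG"   # invented examples
    inverse_orientation = "CAGCAG" + "TTTT" + "CTGCTG"
    print(is_restrictable(same_orientation), is_restrictable(inverse_orientation))
```

In this simplified picture, a molecule carrying sites in one orientation only can still be methylated at each site but would not be cut.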

Type IV

Type IV systems differ from classic RM systems in that they lack an MTase; instead, they restrict modified DNA. More than 300 predicted Type IV systems have been found, but only a handful of systems have been characterised.3 The best understood system is found in E. coli K-12. McrBC is a REase which only cuts modified DNA, including phage genomes that have been methylated at the N4 and C5 positions on cytosines. Mg2+ is a cofactor and McrBC activity requires GTP, presumably for translocation of DNA. The enzyme recognises two dinucleotides, each a purine followed by a methylated cytosine, separated by 40-3000 bp, and cutting occurs 10, 20, 30, 40 and 50 bp from one site.17,19,20

Antirestriction

RM is primarily a defence against invading mobile genetic elements. It is therefore no surprise that mobile elements have evolved a variety of antirestriction strategies.21-23

Recognition Site Elimination

Phage may evolve a bias in their DNA sequence, eliminating functional restriction sites by mutation. One interesting recent example is the 127 kb genome of S. aureus bacteriophage K, which completely lacks Sau3A1 sites,24 a simple 4 base sequence, 5ʹ-GATC-3ʹ, which would be expected to occur hundreds of times in an unbiased genome of that size. A more subtle example is bacteriophage T7, which has 36 EcoP15 sites in the same orientation and, given the requirement for Type III systems to have two inversely orientated recognition sequences, is immune to restriction by the REase. The related bacteriophage T3, which has head-to-head EcoP15 sites, is not immune.25
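The statement that a 4 base site should occur hundreds of times in a genome of this size follows from a simple probability estimate. The sketch below assumes independent bases and, by default, equal base frequencies; both are simplifying assumptions (the real base composition of the phage genome is not uniform), and the function name `expected_site_count` is hypothetical.

```python
from math import prod

def expected_site_count(genome_length: int, site: str, base_freqs=None) -> float:
    """Expected number of occurrences of `site` on one strand of a random sequence.

    Bases are assumed independent; `base_freqs` maps each base to its
    frequency and defaults to 0.25 for each of A, C, G and T.
    """
    if base_freqs is None:
        base_freqs = {base: 0.25 for base in "ACGT"}
    p_site = prod(base_freqs[base] for base in site.upper())
    positions = max(genome_length - len(site) + 1, 0)  # possible start positions
    return positions * p_site

if __name__ == "__main__":
    # 127 kb genome, 4 base site: roughly 127000 / 4**4, i.e. close to 500 sites.
    print(round(expected_site_count(127_000, "GATC")))
```

Since GATC is palindromic, each occurrence on one strand is also an occurrence on the other, so the estimate does not need to be doubled. Passing a skewed `base_freqs` dictionary shows how far base composition alone can lower the expectation, which is relevant when judging whether the complete absence of sites reflects selection.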

DNA Modification Masking Recognition Sites

Phage may evade recognition by REases by chemically modifying bases in their genome.5 T-even coliphages contain hydroxymethylcytosine (HMC)23 and the HMC may also be glucosylated or gentiobiosylated.21 DNA containing HMC is resistant to many restriction enzymes, with glucosylation enhancing resistance.21 B. subtilis phages also employ modified bases. For example, bacteriophage SPO1 replaces thymine with 5-hydroxymethyluracil and bacteriophage PBS1 replaces it with uracil. Both modifications inhibit REases.21 The bacteriophage Mu mom gene product modifies about 15% of adenine residues to N6-(1-acetamido)adenine, thereby conferring resistance of its DNA to Type I and Type III REases.27 B. subtilis bacteriophages such as SPR and φ3T encode MTases which modify the recognition site of BsuRI so the viral sites are recognised as self DNA by the host BsuRI RM system.21

Noncovalent Recognition Site Masking

It is not necessary to carry out a chemical modification to camouflage restriction sites. Interaction between a protein and DNA can block restriction sites from the RM system. For example, bacteriophage P1 is poorly restricted by Type I RM systems in vivo, but DNA purified from viral particles is a good substrate in vitro. Two proteins, DarA and DarB, which are co-injected with the phage P1 DNA, protect the DNA against a range of Type I RM systems.28 Another example is ArdC. This was discovered in an IncW plasmid, pSa, and may shield single-stranded DNA from degradation as it is transferred from donor to recipient cells during conjugation.29

Stimulation of DNA Modification

Avoiding restriction by increasing the activity of the host MTase to facilitate modification of nonself DNA is a strategy adopted by phage λ. Ral is a bacteriophage λ protein which stimulates Type I MTase activity through an unknown mechanism.30


Protein Inhibitors of Type I RM Systems

Mobile genetic elements also encode proteins which interfere with RM systems. The bacteriophage T3 0.3 gene product, a small polypeptide expressed very early in phage infection of E. coli, has two distinct properties which act against Type I RM systems: a SAMase activity inhibits Type I and Type III RM systems by depleting the host E. coli cell of SAM, and an antirestriction function inhibits the action of the Type I enzyme EcoKI by direct interaction. These two activities are separable by mutation, but their structural basis remains unknown.31 Bacteriophage T7 encodes ocr, an unrelated antirestriction protein which is an effective inhibitor of all families of Type I MTases and REases.32 Ocr inhibits Type I RM systems by mimicking the shape and charge distribution of a bent piece of DNA about 20 bases long.32 Ocr does this so effectively that it has a fifty-fold higher affinity than DNA for EcoKI.32 Homologues of T3 SAMase and T7 ocr have been found in related bacteriophages, but are not particularly widespread in nature. Other families of antirestriction proteins are more common. The ArdA family of proteins is found on bacterial chromosomes, on conjugative plasmids and on conjugative transposons such as Tn916.22 Each ArdA gene encodes a protein about 20 kDa in size which, in common with ocr, is highly acidic: this strongly implies that it is a DNA mimic. The current model for ArdA action is that it is expressed early in conjugation in the recipient cytoplasm22 and inhibits Type I REases before they can act on the double-stranded form of the invading DNA molecule. The mechanism of action of ArdA proteins is not known but biochemical characterisation has begun. In particular, recent work33,34 suggests that ArdA may interact differentially with the EcoKI REase and MTase, inhibiting the REase but, as a consequence of its lower affinity for the MTase, permitting the MTase to modify the conjugated DNA.

ArdB genes are found in pathogenicity islands, prophages, bacteriophages and conjugative plasmids. ArdB proteins are smaller and less acidic than ArdA but could also be DNA mimics. To date they have only been studied in vivo: they are strongly active against Type I restriction and exhibit a modest effect against Type II restriction.22 They have no effect on modification functions.

Conclusions and Future Prospects

RM systems are a very diverse group of enzymes and are widespread in the prokaryotes. Their correct operation requires the recognition of various base modifications within specified DNA target sequences. The cleavage of foreign DNA lacking the correct modification pattern has forced the evolution of a range of antirestriction counter-measures, including further base modification to render the DNA immune to the RM system and antirestriction proteins that mimic the structure of DNA. Outstanding areas requiring further research are the mechanisms used by Type I and III RM enzymes to switch between their endonuclease and methyltransferase activities and the antirestriction systems. Given the potential role of antirestriction systems in assisting the spread of mobile elements by horizontal gene transfer, it is surprising that only a few have been thoroughly investigated biochemically and structurally.

References

1. Bertani G, Weigle JJ. Host controlled variation in bacterial viruses. J Bacteriol 1953; 65:113-121.
2. Luria SE, Human ML. A nonhereditary, host-induced variation of bacterial viruses. J Bacteriol 1952; 64:557-569.
3. Roberts RJ, Vincze T, Posfai J et al. REBASE: enzymes and genes for DNA restriction and modification. Nucleic Acids Res 2007; 35:D269-D270.
4. Roberts RJ, Belfort M, Bestor T et al. A nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases and their genes. Nucleic Acids Res 2003; 31:1805-1812.
5. Warren RAJ. Modified bases in bacteriophage DNAs. Annu Rev Microbiol 1980; 34:137-158.
6. Sistla S, Rao DN. S-adenosyl-L-methionine-dependent restriction enzymes. Crit Rev Biochem Mol Biol 2004; 39:1-19.
7. Dryden DTF, Murray NE, Rao DN. Nucleoside triphosphate dependent restriction enzymes. Nucleic Acids Res 2001; 29:3728-3741.
8. Murray NE. Type I restriction systems: sophisticated molecular machines (a legacy of Bertani and Weigle). Microbiol Mol Biol Rev 2000; 64:412-434.
9. Murray NE. 2001 Fred Griffith review lecture. Immigration control of DNA in bacteria: self versus nonself. Microbiology 2002; 148:3-20.
10. Chin V, Valinluck V, Magaki S et al. KpnBI is the prototype of a new family (IE) of bacterial Type I restriction-modification system. Nucleic Acids Res 2004; 32:e138.
11. Dryden DTF. Reeling in the bases. Nat Struct Mol Biol 2004; 11:804-806.
12. Jindrova E, Schmid-Nuoffer S, Hamburger F et al. On the DNA cleavage mechanism of Type I restriction enzymes. Nucleic Acids Res 2005; 33:1760-1766.
13. Pingoud A, Fuxreiter M, Pingoud V et al. Type II restriction endonucleases: structure and mechanism. Cell Mol Life Sci 2005; 62:685-707.
14. Cheng X, Roberts RJ. AdoMet-dependent methylation, DNA methyltransferases and base flipping. Nucleic Acids Res 2001; 29:3784-3795.
15. Bheemanaik S, Reddy YVR, Rao DN. Structure, function and mechanism of exocyclic DNA methyltransferases. Biochem J 2006; 399:177-190.
16. Reich S, Gössl I, Reuter M et al. Scanning force microscopy of DNA translocation by the Type III restriction enzyme EcoP15I. J Mol Biol 2004; 341:337-343.
17. Crampton N, Roes S, Dryden DTF et al. DNA looping and translocation provide an optimal cleavage mechanism for the Type III restriction enzymes. EMBO J 2007; 26:3815-3825.
18. Crampton N, Yokokawa M, Dryden DTF et al. Fast-scan atomic force microscopy reveals that the Type III restriction enzyme EcoP15I is capable of DNA translocation and looping. Proc Natl Acad Sci USA 2007; 104:12755-12760.
19. Sutherland E, Coe L, Raleigh EA. McrBC: a multisubunit GTP-dependent restriction endonuclease. J Mol Biol 1992; 225:327-348.
20. Pieper U, Groll DH, Wünsch S et al. The GTP-dependent restriction enzyme McrBC from Escherichia coli forms high-molecular mass complexes with DNA and produces a cleavage pattern with a characteristic 10-base pair repeat. Biochemistry 2002; 41:5245-5254.
21. Kruger DH, Bickle TA. Bacteriophage survival: multiple mechanisms for avoiding the deoxyribonucleic acid restriction systems of their hosts. Microbiol Rev 1983; 47:345-360.
22. Wilkins BM. Plasmid promiscuity: meeting the challenge of DNA immigration control. Environ Microbiol 2002; 4:495-500.
23. Tock MR, Dryden DTF. The biology of restriction and anti-restriction. Curr Opin Microbiol 2005; 8:466-472.
24. O’Flaherty S, Coffey A, Edwards R et al. Genome of Staphylococcal phage K: a new lineage of Myoviridae infecting Gram-positive bacteria with a low G+C content. J Bacteriol 2004; 186:2862-2871.
25. Meisel A, Bickle TA, Kruger DH et al. Type III restriction enzymes need two inversely orientated recognition sites for DNA cleavage. Nature 1992; 355:467-469.
26. Wyatt GR, Cohen SS. A new pyrimidine base from bacteriophage nucleic acids. Nature 1952; 170:1072-1073.
27. Hattman S. Unusual modification of bacteriophage Mu DNA. J Virol 1979; 32:468-475.
28. Iida S, Streiff MB, Bickle TA et al. Two DNA anti-restriction systems of bacteriophage P1, darA and darB: characterisation of darA phages. Virology 1987; 157:156-166.
29. Belogurov AA, Delver EP, Agafonova OV et al. Antirestriction protein Ard (type C) encoded by IncW plasmid pSa has a high similarity to the “protein transport” domain of TraC1 primase of promiscuous plasmid RP4. J Mol Biol 2000; 296:969-977.
30. Zabeau M, Friedman S, Van Montagu M et al. The ral gene of phage lambda I: identification of a non-essential gene that modulates restriction and modification in E. coli. Mol Gen Genet 1980; 179:63-73.
31. Spoerel N, Herrlich P, Bickle TA. A novel bacteriophage defence mechanism: the anti-restriction protein. Nature 1979; 278:30-34.
32. Walkinshaw MD, Taylor P, Sturrock SS et al. Structure of ocr from bacteriophage T7, a protein that mimics B-form DNA. Mol Cell 2002; 9:187-194.
33. Nekrasov SV, Agafonova OV, Belogurova NG et al. Plasmid-encoded antirestriction protein ArdA can discriminate between Type I methyltransferase and complete restriction-modification system. J Mol Biol 2007; 365:284-297.
34. Serfiotis-Mitsa D, Roberts GA, Cooper LP et al. The ORF18 gene product from conjugative transposon Tn916 is an ArdA antirestriction protein that inhibits type I DNA restriction-modification systems. J Mol Biol 2008; 383:970-981.

Chapter 4

Experimental Approaches to Study DNA Base Flipping

Saulius Klimašauskas* and Zita Liutkevičiūtė

Abstract

The most dramatic and localized enzyme-induced conformational distortion to the helical structure of DNA is base flipping, in which a nucleobase is unpaired, removed from the stack and further rotated out 180˚ to assume a fully extrahelical position. Since its first demonstration in crystal structures of cytosine methyltransferase-DNA complexes, numerous studies have revealed that base flipping is a fundamental mechanism in DNA modification and repair, is involved in initiation of replication, transcription and recombination and lately has been shown to mediate sequence-specific recognition by restriction endonucleases. Here we discuss the variety of experimental approaches that are used to study enzyme-induced base flipping in different systems. X-ray crystallography of protein-DNA complexes is the sole method providing the ultimate proof of base flipping. NMR spectroscopy offers important inroads into dynamic aspects of base flipping, but its potential has not been fully exploited. An attractive method to detect and study base flipping in solution is fluorescence spectroscopy; it uses DNA substrates containing fluorescent base analogs, most often 2-aminopurine. Chemical probing, which exploits enhanced chemical reactivity of flipped out bases in DNA, is a simple method that can be performed in a standard laboratory. Biochemical binding studies often show an enhanced affinity for substrates containing mismatched base pairs, which indirectly points to a disruption of the target base pair upon interaction with enzyme.

The Phenomenon of Base Flipping

Normally, DNA exists as the B-form double-stranded helix in which partner bases on the two complementary strands make Watson-Crick pairs. The base pairs are stacked face-to-face to form the inner core of the double helix with the sugar-phosphate backbone wrapping around the outer edge of the structure. An important inherent feature of the DNA is its conformational plasticity and flexibility. Although the double helix is thermodynamically stable at physiological conditions, it undergoes dynamic conformational fluctuations including spontaneous transient disruptions of base pairing interactions (a phenomenon called DNA breathing). Besides slight sequence-dependent variations, the helical structure is often perturbed by interactions with proteins and other cellular components. The most common distortions of the DNA helix include bending/kinking, unwinding and strand separation, which may occur to a different extent during various stages of DNA metabolism. At the nucleotide level, these changes constitute base unstacking (on one or both faces), base pair twisting and base pair opening events, respectively. The most dramatic and yet highly localized noncovalent distortion to the regular structure is base flipping, in which a nucleobase is unpaired, removed from the stack and further rotated out 180˚ to assume an extreme extrahelical conformation. Although such conformations are very unstable in free DNA and can only occur transiently, they can be stabilized upon interaction with other biomolecules.

The first demonstration of base flipping appeared in 1994 with a high-resolution crystal structure of the HhaI methyltransferase-DNA complex in which the target cytosine is completely flipped out of the DNA helix and into the catalytic site of the enzyme (see Fig. 1).1 Although greeted with much surprise, this new phenomenon was subsequently shown to occur in many systems where an enzyme needs to gain access to a DNA base. Numerous studies revealed that base flipping is a fundamental mechanism in DNA modification and repair2 and is also used by proteins responsible for the opening of the DNA or RNA helix during replication, transcription and recombination.3,4 More recent and fairly unexpected findings, in which sequence-specific target recognition by restriction endonucleases5 and hemimethylated CpG-specific UHRF1 proteins6-8 involves a complete expulsion of nucleotides out of the DNA helix, suggest that many other enzymes or DNA-binding proteins may employ this mechanism in their interactions with DNA. Protein-induced flipping of bases in RNA is also well documented in a variety of systems.9-11 Numerous structural and mechanistic studies of DNA base flipping have since been performed in different systems. Examples of the most studied systems are the HhaI DNA methyltransferase and uracil-DNA glycosylase. An important motivation to study base flipping was its widespread occurrence among DNA enzymes. As a localized conformational distortion, it offered the promise of an ideal model for new inroads into fundamental mechanisms of protein-DNA interactions. On the down side, base flipping presented a significant experimental challenge due to its extreme and dynamic nature.

Structural features, occurrence in different systems and mechanistic aspects of base flipping have been summarized in a series of review articles.2,12-15 Computational analysis of base flipping is discussed in the chapter by Priyakumar and MacKerell in this book. Here we attempt to discuss the variety of experimental approaches that were developed to study the occurrence and the mechanisms of base flipping in double helical nucleic acids.

*Corresponding Author: Saulius Klimašauskas, Institute of Biotechnology, V.A. Graičiūno 8, LT-02241 Vilnius, Lithuania. Email: [email protected]

X-Ray Crystallography

X-ray crystallography of protein-DNA complexes holds the crown among experimental methods for providing the ultimate proof of base flipping. Indeed, a high resolution cocrystal structure of a reaction complex can reveal the position of a target base relative to the rest of the helix and show the conformation of the nucleotide and its neighbors on both strands of the DNA. Examples of crystallographically proven base-flipping systems include DNA methyltransferases, DNA glycosylases, apurinic/apyrimidinic endonucleases, glucosyltransferases, restriction endonucleases and some other systems (see Table 1). Crystallographic studies showed that DNA base flipping comes in a variety of flavors (see Fig. 1) such as sole flipping of the target base itself,1,16 flipping of a base located on the opposite DNA strand to the target base (repair enzymes)17,18 or flipping of both nucleosides of a target base pair (repair enzymes, M.EcoDam, restriction endonucleases).5,19,20 In many cases, a concerted bending of the DNA helix is also observed.16,20 Although crystal structures reveal many structural details at atomic resolution, they provide only static snapshots, usually at the end of a flipping pathway; many dynamic and mechanistic aspects can only be discerned using other methods (see below). Thus, crystallography lays down a structural basis for further solution studies. An important extension of the method is the use of DNA substrates containing conformationally restricted nucleotide analogs, or mutant proteins, to trap base-flipping intermediates.21,22 However, interpretation of such experiments requires utmost caution since chemical alterations to a system may cause unnatural conformations in the target nucleotide. A major limitation of the method is that cocrystallization of proteins with their DNA substrates is often tedious or even impossible. Covalent cross-linking with catalysis-based analogs1,23,24 or alkyldisulfide tethers25 can be used to obtain stable protein-DNA complexes amenable to crystallization. In the absence of cocrystals, base flipping can be predicted on the basis of topological considerations. This is valid in cases when catalytic residues are located in a concave pocket of a protein and thus cannot come into close proximity with the target base in B-DNA without a substantial conformational rearrangement of the protein-DNA complex. Many examples show that the rod-shaped helical DNA molecule is more flexible than a globular protein and thus the former often undergoes the required conformational changes, although cases when conformational changes in the protein accompany binding of the flipped out base are not uncommon.13

Figure 1. Types of enzymatic DNA base flipping observed in crystal structures of protein-DNA complexes (left to right): target nucleotide flipping (HhaI DNA methyltransferase, PDB entry 1mht), opposite nucleotide flipping (T4-pdg, formerly known as T4 endonuclease V, 1vas), damaged dinucleotide flipping (DNA photolyase, 1tez) and flipping of both nucleotides in a central base pair (restriction endonuclease Ecl18kI, 2fqz). Highlighted are DNA sites targeted by the enzymes, arrows point at flipped out bases. Protein residues are omitted for clarity.

Table 1. Base-flipping systems proven by crystal structures of protein-DNA complexes

| Specific Protein | Catalytic Reaction | Primary Reference | PDB Entry |
| --- | --- | --- | --- |
| DNA methyltransferases | | | |
| M.HhaI | Forms 5-methylC on both strands of a DNA recognition site | 1 | 1mht |
| M.HaeIII | Forms 5-methylC on both strands of a DNA recognition site | 23 | 1dct |
| M.TaqI | Forms N6-methylA on both strands of a DNA recognition site | 101 | 1g38 |
| M.T4Dam | Forms N6-methylA on both strands of a DNA recognition site | 102 | 1q0t |
| M.EcoDam | Forms N6-methylA on both strands of a DNA recognition site | 19 | 2g1p |
| DNA glycosylases | | | |
| T4-Pdg (formerly known as T4 endonuclease V) | Removes pyrimidine dimers from DNA | 18 | 1vas |
| Human UDG | Removes uracil from DNA | 103 | 4skn |
| E. coli MUG | Removes uracil or thymine from DNA containing G:T or G:U | 104 | 1mwi |
| Human AAG | Removes 3-methylA from DNA | 105 | 1bnk |
| E. coli AlkA | Removes 3-methylA from DNA | 106 | 1diz |
| hOGG1 | Removes 8-oxoG from DNA | 107 | 1ebm |
| B. stearothermophilus EndoIII | Removes oxidized pyrimidine from DNA | 108 | 1p59 |
| E. coli MutY | Removes adenines from mismatch base pair | 109 | 1rrq |
| Apurinic/apyrimidinic endonucleases | | | |
| E. coli endonuclease IV | Cleaves the DNA backbone at apurinic/apyrimidinic sites | 20 | 1qum |
| Human apurinic/apyrimidinic endonuclease (HAP1 or APE1) | Cleaves the DNA backbone at apurinic/apyrimidinic sites | 110 | 1dew |
| Other DNA repair proteins | | | |
| S. cerevisiae Rad4 | Binds to the lesion and recruits the multi-subunit transcription factor TFIIH | 17 | 2qsh |
| E. coli AlkB | Oxidizes N-alkylated base lesions to restore standard bases in single-stranded DNA and RNA | 111 | 3bkz |
| Human ABH2 | Oxidizes 1-methylA damage to restore A in double-stranded DNA | 111 | 3btx |
| Anacystis nidulans DNA photolyase | Repairs pyrimidine dimers via photo-induced cleavage of the cyclobutane ring | 71 | 1tez |
| Glucosyltransferases | | | |
| T4 bacteriophage BGT | Transfers the glucose moiety of UDP-glucose to the 5-hydroxymethylC bases making β-glucosidic bond | 16 | 1m5r |
| T4 bacteriophage AGT | Transfers the glucose moiety of UDP-glucose to the 5-hydroxymethylC bases making α-glucosidic bond | 112 | 1y8z |
| Sequence-specific endonucleases | | | |
| R.HinP1I | Cleaves phosphodiester bonds on both strands of a recognition site | 113 | 2flc |
| R.Ecl18kI | Cleaves phosphodiester bonds on both strands of a recognition site | 5 | 2fqz |
| R.PspGI | Cleaves phosphodiester bonds on both strands of a recognition site | 96 | 3bm3 |
| Tn5 transposase | Excises and integrates a transposon | 114 | 1muh |
| Other DNA binding proteins | | | |
| SRA domain of UHRF1 (also known as ICBP90, Np95) | Directs Dnmt1 methylation to hemi-methylated CpG sites | 6-8 | 2zkf, 3clz, 2zo1 |

NMR Spectroscopy and Imino Proton Exchange

NMR spectroscopy is a powerful, well-established technique for tackling various aspects of nucleic acid structure.26 In contrast to crystal structures, NMR can potentially give insights into dynamic aspects of base flipping. Smaller molecules are amenable to structure determination using heteronuclear labeling and 2D or 3D sampling techniques. However, dealing with larger protein-DNA complexes may be a challenge due to slow molecular tumbling or insufficient solubility. The first attempt to study enzyme-induced base flipping by NMR in solution was performed for the M.HhaI DNA methyltransferase.27 Two 5-fluorocytosine residues were incorporated into the target and a reference position within a cognate DNA substrate. 19F chemical shift analysis of the free DNA duplex and the M.HhaI-DNA complexes revealed the existence of multiple conformers of the target 5-fluorocytosine along the base flipping pathway that were not seen in the previous crystal structures. To assess the exchange dynamics between stacked and flipped-out states, the T1, T2 and T1ρ spin relaxation times of 19F for the free duplex and the enzyme-DNA binary complex were determined. The observed relaxation parameters indicated that base pair lifetimes of the target and the reference residue are longer than 1 ms and are most likely similar; hence no dramatic acceleration of the internal motional processes in the DNA duplex upon binding of M.HhaI could be detected in these experiments. A more recent NMR analysis of interactions between cyclobutane pyrimidine dimer photolyase and its single- and double-stranded DNA substrates was performed employing 13C or 15N segmentally labeled DNA substrates.28 Chemical shift differences of 1H-13C HSQC resonances from the cyclobutane pyrimidine moiety upon binding of the deuterated protein and its mutant indicated intimate contacts between the DNA lesion and a Trp residue in a cavity in the enzyme.
In light of largely preserved base pairing in the rest of the DNA duplex (derived from analysis of the imino region of a 1H-15N HSQC spectrum), a very localized but dramatic conformational change at the damaged dinucleotide (i.e., base flipping) was proposed. A series of NMR experiments has been devoted to studying the dynamics of base pairing in DNA in solution29,30 and in the solid state.31 As mentioned above, double helical nucleic acids undergo spontaneous conformational fluctuations at physiological conditions, which include transient disruptions of base pairing interactions. The imino protons, which reside on N1 of guanine and N3 of thymine/uracil, are not accessible to bulk solvent in a closed base pair, but can be exchanged with those of water in an open state. Based on a two-state model, the lifetimes of the closed and open states for individual base pairs can be derived from the analysis of spin inversion recovery or spin saturation transfer from water. In general, the base pair lifetimes (in the closed state) have been found to be in the range of 1-5 ms for A:T base pairs and 10-50 ms for G:C pairs at 15˚C, but can vary by a large margin in different sequence contexts.30 Analogous comparative experiments have also been performed using DNA-protein complexes and the corresponding free DNA duplexes in order to establish the roles of enzymes in the base flipping mechanism.27,32 No acceleration, or only a small acceleration, of the breathing rate upon binding of an enzyme was typically observed and interpreted as a passive mechanism by which the enzyme merely catches the spontaneously flipped-out base.13,22,32 It should be noted that, due to their dynamic nature, the NMR-detectable open base pairs have not been structurally characterized by other experimental means.
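The two-state picture used to interpret imino proton exchange can be made concrete with a short numerical sketch. It assumes a simple closed ⇌ open equilibrium in which the opening rate is the reciprocal of the closed-state lifetime and the opening equilibrium constant equals the ratio of the open to the closed lifetime. The function name and the input values (a 10 ms closed lifetime, within the G:C range quoted above, and an opening equilibrium constant of 1e-7) are illustrative choices, not measurements for any particular base pair.

```python
def two_state_summary(tau_closed_s, k_open_eq):
    """Summarize a closed <-> open two-state base pair.

    tau_closed_s : lifetime of the closed state in seconds (illustrative input)
    k_open_eq    : opening equilibrium constant K = [open]/[closed] (illustrative input)
    Returns (opening rate in s^-1, open-state lifetime in s, fraction open).
    """
    # The opening rate constant is the reciprocal of the closed-state lifetime.
    k_op = 1.0 / tau_closed_s
    # K = k_op / k_cl = tau_open / tau_closed, so tau_open = K * tau_closed.
    tau_open_s = k_open_eq * tau_closed_s
    # Equilibrium population of the open state.
    fraction_open = k_open_eq / (1.0 + k_open_eq)
    return k_op, tau_open_s, fraction_open


# Hypothetical G:C pair: 10 ms closed lifetime, K = 1e-7.
k_op, tau_open, f_open = two_state_summary(0.010, 1e-7)
print(f"opening rate: {k_op:.0f} s^-1")
print(f"open-state lifetime: {tau_open:.1e} s")
print(f"fraction open: {f_open:.1e}")
```

Under these assumed numbers the base pair opens about a hundred times per second but stays open only for about a nanosecond, which illustrates why such transient open states are hard to characterize structurally.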
Computational estimates of the minimum rotation of a base that is required to allow hydrogen exchange with solvent are in the range of 30-40˚, which is only 20-25% of the full 180˚ rotation observed in most flipped-out complexes.32 An estimated free energy barrier for the open state derived by Arrhenius treatment of an average equilibrium constant of 10−7 (ref. 30) is around 9 kcal/mol, which accounts for roughly half of the total 15-20 kcal/mol required for a complete rotational expulsion of the nucleotide.33,34 The majority of stacking interactions may still be preserved in such open intermediates, especially in cases when the complementary bases move asymmetrically towards opposite DNA grooves. In all likelihood, the nucleobases remain largely obscured within the DNA stack in such open


base pairs and therefore cannot be regarded as extrahelical or flipped out. However, in many reports dealing with mechanistic issues of the passive and active roles of enzymes, hardly any distinction is made between the terms “base pair opening”, “base flipping” and “extrahelical base”, which are indiscriminately used as synonyms.22,32 Most importantly, the conformational motions that are observed in such NMR experiments largely reflect early events along the pathway to a fully flipped-out state, and such bases are insufficiently exposed to be simply captured in a concave catalytic site of an enzyme in a passive manner. A more realistic model for passive base flipping comes from observing the capture of extrahelical guanine bases by macrocyclic glycans such as β-cyclodextrin. The β-cyclodextrin macrocycle traps a guanine base in a high-affinity guest-host complex. Due to nearly irreversible capture of extrahelical guanines at saturating concentrations of this compound, DNA undergoes a first-order denaturation reaction (low temperature melting) with a rate of 0.003 s−1 at 51˚C.35 Remarkably, the latter number matches the apparent rate of target cytosine flipping (kflip ∼ kchem = 0.2 min−1 = 0.003 s−1 at 37˚C) in a mutant (Q237G) of the HhaI methyltransferase that is deficient in promoting active base flipping.36 Although such a close match of the rates observed in chemical and enzymatic systems may appear fortuitous, it clearly illustrates that the events of spontaneous flipping of nucleobases into extended extrahelical positions in DNA occur at frequencies several orders of magnitude lower than the NMR-detectable imino proton exchange. This means that the NMR-derived exchange rates are less predictive than was generally thought (and were often overexploited) for assigning an active or passive role to an enzyme in base flipping, and at best can provide an upper estimate for the rate of spontaneous appearance of unpaired bases in DNA.
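The numbers quoted in this passage can be checked with a few lines of arithmetic. The sketch below assumes that the free energy of the open state follows from the equilibrium constant as ΔG = −RT ln K, which is one reading of the treatment mentioned in the text. It evaluates this at 15˚C, an assumed temperature taken from the lifetime measurements, and compares the flipping rate of 0.2 min−1 with the opening rate of a base pair with a 10 ms closed lifetime. The function names and the 10 ms value are illustrative.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # molar gas constant in kcal/(mol*K)


def opening_free_energy_kcal(k_eq, temp_celsius):
    """Free energy of the open state relative to the closed state: dG = -RT ln K."""
    temp_kelvin = temp_celsius + 273.15
    return -R_KCAL_PER_MOL_K * temp_kelvin * math.log(k_eq)


def per_minute_to_per_second(rate_per_min):
    """Convert a first-order rate constant from min^-1 to s^-1."""
    return rate_per_min / 60.0


# Equilibrium constant from the text; 15 C is an assumed temperature.
dg_open = opening_free_energy_kcal(1e-7, 15.0)

# Apparent flipping rate quoted for the Q237G mutant of M.HhaI.
k_flip = per_minute_to_per_second(0.2)

# Opening ("breathing") rate for an assumed 10 ms closed-state lifetime.
k_breathing = 1.0 / 0.010

# How many orders of magnitude separate breathing from full flipping.
orders_of_magnitude = math.log10(k_breathing / k_flip)

print(f"dG(open) = {dg_open:.1f} kcal/mol")
print(f"k_flip = {k_flip:.4f} s^-1")
print(f"breathing / flipping = 10^{orders_of_magnitude:.1f}")
```

With these inputs the free energy comes out slightly above 9 kcal/mol and the two rates differ by between four and five orders of magnitude, in line with the statements in the text.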
Since most DNA modification and repair enzymes operate at turnover rates (kcat or kchem) faster than 1 min−1, they cannot fully rely on DNA breathing for their base flipping needs. For example, extensive NMR and kinetic studies of DNA uracil glycosylase conclude that partial capture (with [...] 80% of cases, but also in a substantial fraction of benign regions from cancer-carrying prostates. Detailed analyses show that hypermethylation affects essentially each CpG site in the cancers, whereas the methylation pattern in the benign tissues is patchy, partial and heterogeneous.66 These patterns suggest a precursor relationship, whereby partial methylation changes accumulating in an aging organ predispose to the development of a cancer, in which the methylation changes are aggravated. Similar findings have been reported for further genes in bladder and colon cancers.73,74 Although other aging-related diseases have not been studied in as much detail, there are intriguing hints at similar relationships. For instance, two genes found to become more strongly methylated in human cortex with age showed hypermethylation in Alzheimer’s patients.75 A second major question in aging research is to what extent the phenomenon of replicative senescence that can be observed in cultured normal cells contributes to aging at the organism level. Normal cells can be propagated in culture for a number of passages, but depending on the cell type and donor, they eventually cease to proliferate irreversibly and exhibit characteristic changes in


cell morphology and gene expression, which together constitute the senescent phenotype. Various mechanisms contribute, including telomere erosion and accumulation of cell cycle inhibitor proteins, which activate cellular checkpoints and arrest cell cycle progression. Intriguingly, cellular senescence in vitro is associated with changes in DNA methylation. For instance, normal human fibroblasts in culture accumulate CpG island methylation at the ER and IGF2 genes.76 Other genes hypermethylated in senescent cells are involved in cytoskeleton regulation and interferon signaling.77 Intriguingly, related genes are also downregulated and hypermethylated in the cancer types most consistently associated with aging, such as prostate cancer.78 Conversely, cells prevented from undergoing senescence by DNA tumor viruses do not show the corresponding methylation changes.79 Thus, although the extent to which senescent cells contribute to aging in humans is certainly controversial, the changes in DNA methylation patterns in aging tissues, and more severely in cancers, resemble those in senescent cells in culture.

Conclusions and Future Prospects

Several large-scale studies on DNA methylation and gene expression across the human genome are underway.80 These are expected to lay a better foundation for the study of DNA methylation changes in many human diseases, to detail the relationship of DNA methylation with gene expression and to elucidate the differences in DNA methylation patterns among individuals and between cell types. In such studies, it may be important to consider that DNA methylation may exert functions in the specification of individual cell types, e.g., in the immune system and perhaps the nervous system, and certainly in the distinction between mesenchymal and epithelial cells within a tissue. These functions may become a major focus of future research. Large-scale studies and investigations on individual genes will hopefully combine to yield a comprehensive description of DNA methylation changes in various human cancer types and their relation to subtypes and progression, thereby providing a solid basis for the use of DNA methylation assays in cancer diagnostics. In the immune system, the mechanisms by which DNA methylation contributes to the specification and functional selection of the various cell types appear to be emerging as a focus of research. Clearly, much more needs to be known about the involvement of DNA methylation in immune diseases and especially in the more common situations in which the immune system reacts insufficiently or inappropriately. Despite intriguing hints, not least from hereditary diseases, the functions of DNA methylation in the specific workings of the human brain remain largely conjectural. Accordingly, although the involvement of epigenetic mechanisms in the pathogenesis of psychiatric diseases is plausible,81 there is to date no substantial evidence, especially regarding DNA methylation. The evidence for changes of DNA methylation in degenerative diseases, e.g., of the brain and cardiovascular system, is accumulating.
In these diseases, the hardest task may be to disentangle DNA methylation changes from general disturbances of gene expression and methyl group metabolism and to establish their functional contribution. Finally, gradual changes of DNA methylation in aging tissues that are aggravated in the actual diseases may link degenerative diseases and cancers to aging. The study of the mechanisms establishing DNA methylation patterns in human cells has made enormous progress recently. There is now a consensus that DNA methylation at any particular sequence is brought about by an interaction of DNA methyltransferases, chromatin-modifying and chromatin-remodeling protein complexes, transcriptional activators and repressors and probably RNA components. In a sense, however, this recognition has complicated rather than simplified the issue, as the particular combinations and interactions of these factors appear to differ from site to site. It may take considerable effort for general rules to emerge. The aberrant methylation patterns observed in some inherited diseases and more frequently in cancer cells need specific investigation, but may be particularly helpful in identifying those rules. The ultimate goals of basic research into human diseases are improved prevention, diagnosis and therapy. Obviously, describing DNA methylation changes in human diseases and understanding their role in pathophysiology and the mechanisms bringing them about should aid in achieving all three purposes. DNA hypermethylation assays to detect cancers highlight the diagnostic application. Drugs targeting DNA methylation or chromatin modifications are already in use, primarily for cancer


treatment.81,82 Unfortunately, the record, especially of drugs inhibiting DNA methyltransferases, has so far not been brilliant. The discussion of the several reasons for their limited efficacy deserves a chapter of its own.57 To mention one major problem in brief, the DNA methyltransferase inhibitors presently approved for clinical use are nucleoside inhibitors that react covalently with DNA methyltransferases after having been incorporated into DNA, thereby depleting the cell of these enzymes during successive rounds of replication. This roundabout mode of action damages DNA and induces repair and cellular checkpoints, but inhibits DNA methyltransferases only partially and induces methylation changes unselectively and rather slowly. Moreover, the changes often revert. Newer approaches therefore aim to develop DNA methyltransferase inhibitors acting directly on the enzymes or at specific genes. Moreover, techniques for the selective enhancement of DNA methylation at specific sites are under development. A more general problem in the development of epigenetic inhibitors has so far rarely been addressed, but it should be evident from this chapter. If DNA methylation indeed contributes to the specification of cell function in the immune and neuronal systems, systemic use of unspecific epigenetic inhibitors may interfere. The detailed elucidation of the role of DNA methylation in various tissues may help to estimate the degree of this interference and to develop methods to detect and prevent it.

Acknowledgements

Work on DNA methylation in our laboratory is supported by the Deutsche Forschungsgemeinschaft and the Deutsche Krebshilfe.

References

1. Viré E, Brenner C, Deplus R et al. The polycomb group protein EZH2 directly controls DNA methylation. Nature 2006; 439:871-874.
2. Muegge K. Lsh, a guardian of heterochromatin at repeat elements. Biochem Cell Biol 2005; 83:548-54.
3. Vaissière T, Sawan C, Herceg Z. Epigenetic interplay between histone modifications and DNA methylation in gene silencing. Mutat Res 2008; 659:40-48.
4. Kangaspeska S, Stride B, Métivier R et al. Transient cyclical methylation of promoter DNA. Nature 2008; 452:112-115.
5. Reik W. Stability and flexibility of epigenetic gene regulation in mammalian development. Nature 2007; 447:425-432.
6. La Salle S, Oakes CC, Neaga OR et al. Loss of spermatogonia and wide-spread DNA methylation defects in newborn male mice deficient in DNMT3L. BMC Dev Biol 2007; 7:104.
7. Mertineit C, Yoder JA, Taketo T et al. Sex-specific exons control DNA methyltransferase in mammalian germ cells. Development 1998; 125:889-897.
8. Heard E, Disteche CM. Dosage compensation in mammals: fine-tuning the expression of the X chromosome. Genes Dev 2006; 20:1848-1867.
9. Ogawa Y, Sun BK, Lee JT. Intersection of the RNA interference and X-inactivation pathways. Science 2008; 320:1336-1341.
10. Goll MG, Bestor TH. Eukaryotic cytosine methyltransferases. Annu Rev Biochem 2005; 74:481-514.
11. Schulz WA, Steinhoff C, Florl AR. Methylation of endogenous human retroelements in health and disease. Curr Topics Microbiol Immunol 2006; 310:211-250.
12. Steinhoff C, Schulz WA. Transcriptional regulation of the human LINE-1 retrotransposon L1.2B. Mol Genet Genomics 2003; 270:394-402.
13. Jelinic P, Shaw P. Loss of imprinting and cancer. J Pathol 2007; 211:261-268.
14. Miranda TB, Jones PA. DNA methylation: the nuts and bolts of repression. J Cell Physiol 2007; 213:384-390.
15. Weber M, Hellmann I, Stadler MB et al. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet 2007; 39442-39453.
16. Farthing CR, Ficz G, Ng RK et al. Global mapping of DNA methylation in mouse promoters reveals epigenetic reprogramming of pluripotency genes. PLoS Genet 2008; 4:e1000116.
17. Mohn F, Weber M, Rebhan M et al. Lineage-specific polycomb targets and de novo DNA methylation define restriction and potential of neuronal progenitors. Mol Cell 2008; 30:755-66.
18. Dokun OY, Florl AR, Seifert HH et al. Relationship of SNCG, S100A4, S100A9 and LCN2 gene expression and DNA methylation in bladder cancer. Int J Cancer 2008; 123:2798-2807.
19. Ehrlich M. The ICF syndrome, a DNA methyltransferase 3B deficiency and immunodeficiency disease. Clin Immunol 2003; 109:17-28.
20. Robertson KD. DNA methylation and human disease. Nat Rev Genet 2005; 6:597-610.


21. Jin B, Tao Q, Peng J et al. DNA methyltransferase 3B (DNMT3B) mutations in ICF syndrome lead to altered epigenetic modifications and aberrant expression of genes regulating development, neurogenesis and immune function. Hum Mol Genet 2008; 17:690-709.
22. Scarano MI, Strazzullo M, Matarazzo MR et al. DNA methylation 40 years later: its role in human health and disease. J Cell Physiol 2005; 204:21-35.
23. Garrick D, Sharpe JA, Arkell R et al. Loss of Atrx affects trophoblast development and the pattern of X-inactivation in extraembryonic tissues. PLoS Genet 2006; 2:e58.
24. Tang P, Park DJ, Marshall Graves JA et al. ATRX and sex differentiation. Trends Endocrinol Metab 2004; 15:339-344.
25. Bienvenu T, Chelly J. Molecular genetics of Rett syndrome: when DNA methylation goes unrecognized. Nat Rev Genet 2006; 7:415-426.
26. Chang Q, Khare G, Dani V et al. The disease progression of Mecp2 mutant mice is affected by the level of BDNF expression. Neuron 2006; 49:341-8.
27. Lalande M, Calciano MA. Molecular epigenetics of Angelman syndrome. Cell Mol Life Sci 2007; 64:947-60.
28. Horsthemke B, Wagstaff J. Mechanisms of imprinting of the Prader-Willi/Angelman region. Am J Med Genet 2008; 146A:2041-2052.
29. Mackay DJ, Hahnemann JM, Boonen SE et al. Epimutation of the TNDM locus and the Beckwith-Wiedemann syndrome centromeric locus in individuals with transient neonatal diabetes mellitus. Hum Genet 2006; 119:179-184.
30. Peters J, Williamson CM. Control of imprinting at the Gnas cluster. Adv Exp Med Biol 2008; 626:16-26.
31. Jin P, Warren ST. Understanding the molecular basis of fragile X syndrome. Hum Mol Genet 2000; 9:901-908.
32. van der Maarel SM, Frants RR. The D4Z4 repeat-mediated pathogenesis of facioscapulohumeral muscular dystrophy. Am J Hum Genet 2005; 76:375-386.
33. Tsumagari K, Qi L, Jackson K et al. Epigenetics of a tandem DNA repeat: chromatin DNaseI sensitivity and opposite methylation changes in cancers. Nucleic Acids Res 2008; 36:2196-2207.
34. Polansky JK, Kretschmer K, Freyer J et al. DNA methylation controls Foxp3 gene expression. Eur J Immunol 2008; 38:1654-63.
35. Landolfi MM, Scollay R, Parnes JR. Specific demethylation of the CD4 gene during CD4 T-lymphocyte differentiation. Mol Immunol 1997; 34:53-61.
36. Wilson CB, Makar KW, Shnyreva M et al. DNA methylation and the expanding epigenetics of T-cell lineage commitment. Semin Immunol 2005; 17:105-119.
37. Reiner SL. Epigenetic control in the immune response. Hum Mol Genet 2005; 14:41-46.
38. Corcoran AE. Immunoglobulin locus silencing and allelic exclusion. Semin Immunol 2005; 17:141-154.
39. Santourlidis S, Graffmann N, Christ J et al. Lineage-specific transition of histone signatures in the killer cell Ig-like receptor locus from hematopoietic progenitor to NK cells. J Immunol 2008; 180:418-25.
40. McStay B, Grummt I. The epigenetics of rRNA genes: from molecular to chromosome biology. Annu Rev Cell Dev Biol 2008; 24:131-157.
41. Strickland FM, Richardson BC. Epigenetics in human autoimmunity. Epigenetics in autoimmunity - DNA methylation in systemic lupus erythematosus and beyond. Autoimmunity 2008; 41:278-286.
42. Lee BH, Yegnasubramanian S, Lin X et al. Procainamide is a specific inhibitor of DNA methyltransferase 1. J Biol Chem 2005; 280:40749-40756.
43. Sánchez-Pernaute O, Ospelt C, Neidhart M et al. Epigenetic clues to rheumatoid arthritis. J Autoimmun 2008; 30:12-20.
44. Perl A, Nagy G, Koncz A et al. Molecular mimicry and immunomodulation by the HRES-1 endogenous retrovirus in SLE. Autoimmunity 2008; 41:287-297.
45. Gilliet M, Lande R. Antimicrobial peptides and self-DNA in autoimmune skin inflammation. Curr Opin Immunol 2008; 20:401-407.
46. Karikó K, Weissman D. Naturally occurring nucleoside modifications suppress the immunostimulatory activity of RNA: implication for therapeutic RNA development. Curr Opin Drug Discov Devel 2007; 10:523-532.
47. Herrmann W. Significance of hyperhomocysteinemia. Clin Lab 2006; 52:367-374.
48. Girelli D, Friso S, Trabetti E et al. Methylenetetrahydrofolate reductase C677T mutation, plasma homocysteine and folate in subjects from northern Italy with or without angiographically documented severe coronary atherosclerotic disease: evidence for an important genetic-environmental interaction. Blood 1999; 93:1118-1120.
49. Ulrey CL, Liu L, Andrews LG et al. The impact of metabolism on DNA methylation. Hum Mol Genet 2005; 14:R139-147.
50. Wilson AS, Power BE, Molloy PL. DNA hypomethylation and human diseases. Biochim Biophys Acta 2007; 1775:138-162.


51. James SJ, Melnyk S, Pogribna M et al. Elevation in S-adenosylhomocysteine and DNA hypomethylation: potential epigenetic mechanism for homocysteine-related pathology. J Nutr 2002; 132:2361S-2366S.
52. Choi SW, Friso S. Interactions between folate and aging for carcinogenesis. Clin Chem Lab Med 2005; 43:1151-1157.
53. Coppen A, Bolander-Gouaille C. Treatment of depression: time to consider folic acid and vitamin B12. J Psychopharmacol 2005; 19:59-65.
54. Laird PW. Cancer epigenetics. Hum Mol Genet 2005; 14:R65-76.
55. Esteller M. Epigenetic gene silencing in cancer: the DNA hypermethylome. Hum Mol Genet 2007; 16:R50-59.
56. Jones PA, Baylin SB. The epigenomics of cancer. Cell 2007; 128:683-692.
57. Stresemann C, Lyko F. Modes of action of the DNA methyltransferase inhibitors azacytidine and decitabine. Int J Cancer 2008; 123:8-13.
58. Strunnikova M, Schagdarsurengin U, Kehlen A et al. Chromatin inactivation precedes de novo DNA methylation during the progressive epigenetic silencing of the RASSF1A promoter. Mol Cell Biol 2005; 25:3923-3933.
59. Park IK, Qian D, Kiel M et al. Bmi-1 is required for maintenance of adult self-renewing haematopoietic stem cells. Nature 2003; 423:302-305.
60. Ohm JE, McGarvey KM, Yu X et al. A stem cell-like chromatin pattern may predispose tumor suppressor genes to DNA hypermethylation and heritable silencing. Nat Genet 2007; 39:237-242.
61. Schlesinger Y, Straussman R, Keshet I et al. Polycomb-mediated methylation on Lys27 of histone H3 premarks genes for de novo methylation in cancer. Nat Genet 2007; 39:232-236.
62. Hoffmann MJ, Schulz WA. Causes and consequences of DNA hypomethylation in human cancer. Biochem Cell Biol 2005; 83:296-321.
63. Rodriguez J, Vives L, Jordà M et al. Genome-wide tracking of unmethylated DNA Alu repeats in normal and cancer cells. Nucleic Acids Res 2008; 36:770-784.
64. Issa JP. CpG island methylator phenotype in cancer. Nat Rev Cancer 2004; 4:988-993.
65. Jass JR. Classification of colorectal cancer based on correlation of clinical, morphological and molecular features. Histopathology 2007; 50:113-130.
66. Florl AR, Steinhoff C, Müller M et al. Coordinate hypermethylation at specific sites in prostate carcinoma precedes LINE-1 hypomethylation. Brit J Cancer 2004; 91:985-994.
67. Howard G, Eiges R, Gaudet F et al. Activation and transposition of endogenous retroviral elements in hypomethylation induced tumors in mice. Oncogene 2008; 27:404-408.
68. Futscher BW, Oshiro MM, Wozniak RJ et al. Role for DNA methylation in the control of cell type specific maspin expression. Nat Genet 2002; 31:175-179.
69. Boyer LA, Lee TI, Cole MF et al. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 2005; 122:947-956.
70. Li JY, Pu MT, Hirasawa R et al. Synergistic function of DNA methyltransferases Dnmt3a and Dnmt3b in the methylation of Oct4 and Nanog. Mol Cell Biol 2007; 27:8748-8759.
71. Feinberg AP. Phenotypic plasticity and the epigenetics of human disease. Nature 2007; 447:433-440.
72. Dammann R, Schagdarsurengin U, Seidel C et al. The tumor suppressor RASSF1A in human carcinogenesis: an update. Histol Histopathol 2005; 20:645-663.
73. Neuhausen A, Florl AR, Grimm MO et al. DNA methylation alterations in urothelial carcinoma. Cancer Biol Ther 2006; 8:993-1001.
74. Shen L, Kondo Y, Rosner GL et al. MGMT promoter methylation and field defect in sporadic colorectal cancer. J Natl Cancer Inst 2005; 97:1330-1338.
75. Siegmund KD, Connor CM, Campan M et al. DNA methylation in the human cerebral cortex is dynamically regulated throughout the life span and involves differentiated neurons. PLoS ONE 2007; 2:e895.
76. Issa JP. Aging, DNA methylation and cancer. Crit Rev Oncol Hematol 1999; 32:31-43.
77. Fridman AL, Tang L, Kulaeva OI et al. Expression profiling identifies three pathways altered in cellular immortalization: interferon, cell cycle and cytoskeleton. J Gerontol A Biol Sci Med Sci 2006; 61:879-889.
78. Schulz WA, Alexa A, Jung V et al. Factor interaction analysis for chromosome 8 and DNA methylation alterations highlights innate immune response suppression and cytoskeletal changes in prostate cancer. Mol Cancer 2007; 6:14.
79. Liu L, Zhang J, Bates S et al. A methylation profile of in vitro immortalized human cell lines. Int J Oncol 2005; 26:275-285.
80. Suzuki MM, Bird A. DNA methylation landscapes: provocative insights from epigenomics. Nat Rev Genet 2008; 9:465-476.
81. Ptak C, Petronis A. Epigenetics and complex disease: from etiology to new therapeutics. Annu Rev Pharmacol Toxicol 2008; 48:257-276.
82. Strathdee G, Brown R. Aberrant DNA methylation in cancer: potential clinical interventions. Expert Rev Mol Med 2002; 4:10-17.

Chapter 10

Expanding the Chemical Repertoire of DNA Methyltransferases by Cofactor Engineering

Basar Gider and Elmar Weinhold*

Abstract

Redesigning enzyme catalysis is of general interest in biological research and biotechnology. Enzymes often possess some degree of substrate promiscuity that can be exploited to change the course of enzymatic reactions. In this chapter we discuss examples of modified substrates for various enzymes and their use for targeted labeling of proteins, carbohydrates and nucleic acids. These modified substrates are either used to directly attach reporter groups to the targets or to connect chemical handles suitable for subsequent bioorthogonal labeling reactions. A particular emphasis is given to cofactor engineering for DNA methyltransferases (MTases) to expand their catalytic function. Naturally, these enzymes catalyze the transfer of the activated methyl group from the ubiquitous cofactor S-adenosyl-L-methionine (AdoMet or SAM) to nucleobases within specific DNA recognition sequences. With engineered AdoMet analogs, larger chemical groups than methyl can be delivered by DNA MTases. This method for sequence-specific DNA labeling is very flexible both in terms of reporter groups and of DNA sequences. In addition, these cofactor analogs could provide powerful tools for targeted functionalization and labeling of RNA and proteins using appropriate RNA and protein MTases as catalysts.

Introduction

Nature provides us with a huge variety of enzymes that catalyze an enormous number of chemical transformations. These biocatalysts often show remarkable catalytic efficiencies with their natural substrates, and expanding their catalytic scope to nonnatural substrates is of general interest in biotechnology and chemical biology. Broadening substrate specificity is typically achieved by protein engineering that uses site-directed or random mutagenesis to exchange individual amino acid residues within the enzyme of interest. Although protein engineering has proven to be extremely powerful,1 this approach inherently focuses on changing the catalyst itself. Another approach to expand the catalytic repertoire of enzymes makes use of enzymatic substrate promiscuity. Carefully designed synthetic substrate and cofactor analogs can be used to trick the enzymes and lead to new useful transformations. This chapter will focus on substrate and cofactor engineering for enzyme-mediated technologies leading to targeted labeling of biopolymers. These technologies will be classified under the three main groups of biopolymers: proteins, carbohydrates and nucleic acids. Different aspects of substrate and cofactor design will be discussed for the various enzymes. In particular, we will concentrate on

*Corresponding Author: Elmar Weinhold, Institute of Organic Chemistry, RWTH Aachen University, Landoltweg 1, D-52056 Aachen, Germany. Email: [email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.


cofactor engineering to expand the chemical repertoire of the DNA methyltransferases (MTases). Naturally, DNA MTases catalyze the transfer of the activated methyl group from the ubiquitous cofactor S-adenosyl-L-methionine (AdoMet or SAM) to their target nucleotides within specific DNA sequences. Two classes of AdoMet analogs have been developed which convert these enzymes into alkyltransferases. In the first class, the amino acid side chain of AdoMet is replaced by an aziridinyl residue, leading to coupling of the cofactor analogs with DNA. In the second class of cofactor analogs, the methyl group of AdoMet is enlarged, resulting in direct transfer of extended carbon chains. Design considerations and mechanistic aspects of these engineered cofactor analogs will be described. In the last section, prospects and applications of these AdoMet analogs will be discussed.

Modified Substrates and Cofactors for Enzyme-Mediated Labeling

Protein and Carbohydrate Labeling

Comprehensive understanding of biological systems often requires protein and carbohydrate labeling methods to investigate cellular processes in vitro as well as in their native environments. Commonly, genetic fusions with fluorescent proteins, e.g., green fluorescent protein (GFP),2 are used for cellular imaging of proteins and excellent results can be achieved. However, GFP consists of 238 amino acid residues and there are several potential disadvantages in using fluorescent protein fusions. These include possible structural perturbations or steric hindrance of protein interactions, low fluorescent brightness and low photostability. Thus, other approaches for site-specific protein labeling are of interest.3 Covalent protein labeling within cells can be achieved by fusion with the human DNA repair protein O6-alkylguanine-DNA alkyltransferase (hAGT).4 This protein naturally transfers the methyl group from the damaged nucleobase O6-alkylguanine in DNA 1 to a cysteine residue within its active site, resulting in self-modification and inactivation (Fig. 1A). Importantly, hAGT also accepts a benzyl group from O6-benzylguanines 2, even when these are not part of DNA, which provides a basis for the use of engineered benzyl analogs (Fig. 1B). Several substrate analogs with additional biotin or fluorophores attached to the benzyl group of O6-benzylguanine have been synthesized and used to label hAGT fusion proteins in vitro and in vivo. This system, called SNAP-tag, is very versatile in terms of molecular labels. In a recent study, a mutant hAGT was selected that accepts O6-benzylcytosine derivatives and used in combination with the SNAP-tag to perform simultaneous two-color labeling in cells.5 The hAGT protein is still quite large (about 200 amino acid residues), which makes the development of shorter peptide tags interesting. Very recently, lipoic acid ligase (LplA) was employed to label cell surface proteins with reporter molecules.
LplA from Escherichia coli catalyzes an adenosine triphosphate-dependent covalent ligation of lipoic acid (3, Fig. 1C) with specific lysine residues in three proteins involved in oxidative metabolism (E2p, E2o and H-protein).6 Besides its high peptide sequence specificity, this enzyme shows a pronounced promiscuity for lipoic acid analogs. Several carboxylic acids containing a terminal azide or alkyne were tested and 8-azidooctanoic acid (4) was found to be the best substrate among the tested lipoic acid analogs (Fig. 1D). In addition, a 22-amino-acid recognition sequence for LplA was designed and genetically fused to cell surface proteins. After expression of the cell surface fusion protein, the peptide tag was modified with an azide by incubation with 8-azidooctanoic acid (4) and LplA and then fluorescently labeled in a strain-promoted 1,3-dipolar cycloaddition with cyclic alkynes.7 Lipoic acid ligase allows enzymatic modification with a chemical reporter group, which is then specifically labeled with fluorescence or affinity probes in a second chemical step. However, one-step direct labeling requires that the label is part of the substrate for posttranslational modification. The phosphopantetheinyltransferases (PPTases) from Bacillus subtilis (Sfp) and from Escherichia coli (AcpS) show pronounced substrate promiscuity towards modifications of coenzyme A (CoA) at the terminal thiol while maintaining high peptide sequence specificity. Naturally, they catalyze the transfer of the 4ʹ-phosphopantetheinyl group from CoA (5) to specific serine residues in peptidyl carrier protein (PEP) or in acyl carrier proteins (ACP) (Fig. 1E). Simple conjugation of the CoA thiol with maleimide-functionalized reporters yielded substrate analogs 6 for direct labeling of

Expanding the Chemical Repertoire of DNA Methyltransferases by Cofactor Engineering


Figure 1. Protein labeling with synthetic substrates for human O6-alkylguanine-DNA alkyltransferase (hAGT), lipoic acid ligase from Escherichia coli (LplA) and the phosphopantetheinyltransferases Sfp and AcpS. A) Natural methyl group transfer from damaged O6-alkylguanine in DNA 1 to a specific cysteine residue within the active site of hAGT. Note that hAGT becomes inactive after covalent self-modification. B) Genetic fusion of hAGT to a target protein and direct transfer of a label (gray sphere) attached to the benzyl group of O6-benzylguanines 2, which serve as substrate analogs for hAGT. C) Natural ligation of lipoic acid (3) with a specific lysine residue within the natural substrate dihydrolipoamide acetyltransferase subunit (E2p) of the pyruvate dehydrogenase complex by LplA. D) Ligation of the lipoic acid analog 4 containing an azide function to a short recognition sequence for LplA genetically fused to a target protein. The bioorthogonal azide function can be selectively modified in a second step with reporters containing alkynes. E) Natural transfer of the 4ʹ-phosphopantetheinyl group from CoA (5) to specific serine residues in peptidyl carrier protein (PEP) or to acyl carrier proteins (ACP) by Sfp and AcpS. F) Direct transfer of labels (gray sphere) from thiol-modified CoA analogs 6 to short peptide tags genetically fused to target proteins.


the short proteins PEP (80 amino acids) and ACP (77 amino acids) as well as fusion proteins on cell surfaces.8,9 In addition, it was possible to select short peptide tags (12 amino acid residues) for Sfp and AcpS. They minimize steric hindrance when fused to a target protein and allow one-step labeling with CoA analogs carrying various reporter groups with high efficiency and specificity (Fig. 1F). Furthermore, Sfp and AcpS show high specificity for their individual tags, which opens the door to sequential orthogonal labeling with two reporters.10

Glycosylation of proteins and lipids participates in central biological events inside and outside the cell. This makes glycans attractive targets for imaging their localization, trafficking and dynamics by labeling. The complex glycan structures are assembled by glycosyltransferases (GTs), which transfer sugars from nucleotide sugar donors to acceptor sugars. GTs show some degree of substrate promiscuity, which has been utilized for GT-mediated tagging of glycoconjugates. For example, uridine 5ʹ-diphospho-α-D-galactose (UDP-Gal, 7) acts as donor sugar for various galactosyltransferases (GalTs) and the human β3GalT5 transfers the galactose residue to N-acetyl-D-glucosamine (GlcNAc), forming Gal(β1-3)GlcNAc structures (Fig. 2A). This enzyme and some other GalTs were also able to transfer the biotinylated UDP-Gal analog 8, leading to labeling of the formed glycan attached to the test protein bovine serum albumin (BSA) (Fig. 2B).11 Introducing carbohydrate-reporting groups into cellular glycoconjugates can also be performed by metabolic incorporation, which takes advantage of several permissive enzymes within a biosynthetic pathway. Typically, peracetylated sugar analogs carrying small bioorthogonal chemical groups are taken up by the cell, hydrolyzed, metabolized to the activated donor sugars,

Figure 2. Carbohydrate labeling with a synthetic substrate for the human galactosyltransferase β3GalT5 and metabolic incorporation of sugar analogs for cell surface labeling. A) Natural transfer of galactose from the donor uridine 5ʹ-diphospho-α-D-galactose (UDP-Gal, 7) to the acceptor N-acetyl-D-glucosamine conjugated to bovine serum albumin (BSA). B) Analogous enzymatic reaction with the biotinylated UDP-Gal analog 8 leading to direct biotin labeling of the glycoconjugate. C) Metabolic incorporation of peracetylated sugar analogs carrying terminal azido or alkynyl groups for bioorthogonal labeling within glycoconjugates (R = additional sugar residues) after cell surface display.


incorporated into glycoconjugates and displayed on the cell surface. In an early study, peracetylated N-azidoacetylmannosamine (9) was delivered into the biosynthetic pathway of sialic acid, leading to azido-tagged glycoconjugate 10 on the cell surface (Fig. 2C). The azido group was then chemoselectively labeled with biotin in a Staudinger ligation reaction and modified sialic acids on the cell surface were visualized by addition of fluorescein-conjugated avidin.12 More recently, the peracetylated fucose analog 11 carrying a terminal alkynyl group was used for metabolic incorporation (Fig. 2C). Modified glycoconjugate 12 was visualized by coupling the terminal alkyne with a biotin azide in a copper-catalyzed 1,3-dipolar cycloaddition (click reaction) followed by the addition of fluorescein-conjugated streptavidin.13 Such a strategy was also successful for imaging membrane-associated glycans in a whole organism. Zebrafish embryos were incubated with N-azidoacetylgalactosamine (13) and azide-labeled glycoconjugates 14 were fluorescence-labeled in a nontoxic copper-free click reaction employing a difluorinated cyclooctyne reagent for the strain-promoted 1,3-dipolar cycloaddition.14

Nucleic Acid Labeling

Enzymatic labeling of DNA or RNA is typically performed with modified deoxynucleoside triphosphates (dNTPs) or nucleoside triphosphates (NTPs) carrying reporter groups attached to their nucleobases. Incorporation of the modified nucleotides is mostly achieved by DNA or RNA polymerases, which often show a pronounced tolerance towards chemical groups at the 5 position of pyrimidines and the 7 position of purines (deazapurines). There are three main labeling methods when using DNA polymerases: random-primed labeling, nick translation and polymerase chain reaction. Alternatively, terminal deoxynucleotidyl transferase can be used to extend the 3ʹ ends of DNA with labeled dNTPs. Although very powerful, these methods do not provide sequence-specific labeling of native DNA. Bearing in mind the size of DNA and the recurrence of only a small number of monomeric units, sequence-specific labeling is a challenging task. However, nature has already partly solved this problem. DNA methyltransferases (DNA MTases) catalyze the nucleophilic attack of either adenine or cytosine residues within specific double-stranded DNA sequences onto the activated methyl group of the cofactor S-adenosyl-L-methionine (AdoMet or SAM, 15) and can be regarded as enzymes which label DNA with a methyl group. DNA MTases can be categorized into three classes defined by their target base and methylation position. They modify the exocyclic nitrogen at the 6 position of adenine (DNA adenine-N6 MTases), the exocyclic nitrogen at the 4 position of cytosine (DNA cytosine-N4 MTases) or the carbon at the 5 position of cytosine (DNA cytosine-C5 MTases), leading to methylated DNA and the demethylated cofactor product S-adenosyl-L-homocysteine (AdoHcy or SAH, 16) (Fig. 3, see also chapter by Coffin, Youngblood and Reich in this volume). Most if not all bacterial and archaeal DNA MTases exhibit a clearly defined sequence and base specificity.
In bacteria, these enzymes are often accompanied by restriction endonucleases (REases), forming restriction-modification systems, and protect the host DNA against fragmentation by the cognate REases. REBASE, a database for REases and DNA MTases, currently lists about 1000 DNA MTases with over 200 distinct recognition sequences ranging from two to eight base pairs in length.15 Thus, a great number of DNA sequences can be targeted, with the general sequence specificity repertoire comparable to that of the widely used REases. Unfortunately, the methyl group is not an attractive reporter group and transfer of larger chemical entities is needed for sequence-specific DNA labeling. Two classes of synthetic AdoMet analogs capable of delivering larger chemical groups have been engineered. In the first class, the reactive methylsulfonium center of AdoMet is replaced with an aziridinyl group. Aziridines are well known to become good electrophiles upon protonation of the ring nitrogen, and nucleophilic attack on one of the carbon atoms leads to covalent bond formation with concomitant ring opening. In the aziridine cofactor N-adenosylaziridine (17) one of the electrophilic methylene groups is placed at a similar position as the electrophilic methyl group of AdoMet (Fig. 4A) and the adenosyl moiety serves as molecular anchor for cofactor binding by DNA MTases. Thus, it is not too surprising that N-adenosylaziridine (17) can function as a cofactor for DNA MTases (Fig. 4B).16 Alternatively, the 5ʹ-N-adenosyl mustard 18 containing the full amino acid side chain has been prepared and demonstrated to be coupled with DNA, presumably via its aziridinium intermediate, by DNA MTases (Fig. 4C).17


Figure 3. Reactions catalyzed by DNA methyltransferases (DNA MTases). DNA MTases catalyze the nucleophilic attack of the exocyclic amine N6 of adenine, the exocyclic amine N4 of cytosine or C5 of the cytosine ring within their recognition sequences (thick black lines) onto the activated methyl group of the cofactor S-adenosyl-L-methionine (AdoMet or SAM, 15), leading to N6-methyladenine, N4-methylcytosine or C5-methylcytosine residues within specific DNA sequences ranging from two to eight base pairs.

Most importantly, these cofactors can be used as a delivery system for chemical groups attached to various positions of the adenine ring.18 Azido groups, as in cofactor 19, or terminal alkyne groups, as in cofactor 20, can be delivered to DNA and further modified by Staudinger ligation or copper-catalyzed 1,3-dipolar cycloaddition reactions (two-step labeling) (Fig. 4D,E).19-21 It is also possible to directly attach reporter groups, like fluorophores or biotin, to the adenine ring via a flexible linker and use the corresponding cofactors 21 for enzymatic one-step DNA labeling (Fig. 4F).22,23 Using the adenine-specific DNA MTase from Thermus aquaticus (M.TaqI) it was demonstrated that biotinylation of long plasmid DNA with the aziridine cofactor 22 is quantitative, sequence- and base-specific (Fig. 5A).24 Crystal structure analysis of the complex formed between M.TaqI, 22 and a 10 base pair duplex oligodeoxynucleotide showed that the overall structure is almost identical to the ternary complex structure with the target adenine flipped out of the DNA helix (Fig. 5B). Most importantly, a continuous electron density was observed between the target adenine and the cofactor (Fig. 5C). The structure suggests that the reaction with the aziridine cofactors proceeds in analogy to the natural reaction with AdoMet and demonstrates that modification occurs at the exocyclic amino group of the target adenine within the double-stranded 5ʹ-TCGA-3ʹ DNA recognition sequence of M.TaqI. Accordingly this method was termed Sequence-specific Methyltransferase-Induced Labeling of DNA (SMILing DNA). However, a major difference between the natural cofactor AdoMet and the synthetic aziridine cofactors is that the DNA MTase-catalyzed nucleophilic attack of adenine or cytosine residues in DNA on the activated methyl group of AdoMet results in methyl group transfer, whereas nucleophilic attack on the aziridine ring leads to ring opening and coupling of the whole cofactor to the target nucleobase.
As a result, potent product inhibitors preventing further turnover are formed within the active sites, and the DNA MTases have to be used in stoichiometric amounts with respect to target sites for DNA labeling. Although prokaryotic DNA MTases can be easily obtained in milligram quantities and microgram amounts of labeled DNA are generally sufficient for various applications, this inherent feature prompted the development of more efficient AdoMet analogs. AdoMet analogs with simple methyl group replacements, like ethyl or propyl, have been obtained from L-ethionine or L-propionine and adenosine triphosphate using AdoMet synthetases.
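The stoichiometric requirement described above can be made concrete with a small calculation. The following is only a sketch: it locates the 5ʹ-TCGA-3ʹ recognition sequence of M.TaqI in a DNA string and estimates how many picomoles of target adenines a given mass of DNA contains, which is roughly the amount of enzyme and aziridine cofactor that would be needed. The function names, the demo sequence and the 3000 bp plasmid with four sites are hypothetical, the 650 g/mol average mass per base pair is a common approximation rather than a value from this chapter, and counting two target adenines per palindromic site (one on each strand) is an assumption.

```python
SITE = "TCGA"          # M.TaqI recognition sequence (from the text)
TARGET_OFFSET = 3      # position of the target adenine within TCGA
AVG_BP_MASS = 650.0    # g/mol per base pair; common approximation (assumption)


def find_sites(seq, site=SITE):
    """Return 0-based start positions of every occurrence of the site."""
    seq = seq.upper()
    hits = []
    pos = seq.find(site)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(site, pos + 1)
    return hits


def target_adenines(seq):
    """Top-strand positions of the adenine in each TCGA site.

    TCGA is palindromic, so every site carries a second target adenine
    on the bottom strand, opposite the T that starts the site.
    """
    return [start + TARGET_OFFSET for start in find_sites(seq)]


def pmol_target_adenines(length_bp, n_sites, dna_ug):
    """Picomoles of target adenines (two per site) in dna_ug of DNA."""
    pmol_dna = dna_ug * 1e-6 / (length_bp * AVG_BP_MASS) * 1e12
    return pmol_dna * n_sites * 2


demo = "GGATCGATTTCGAACC"  # made-up sequence, for illustration only
print(find_sites(demo))
print(target_adenines(demo))
# Hypothetical 3000 bp plasmid with four TCGA sites, 1 microgram of DNA
print(round(pmol_target_adenines(3000, 4, 1.0), 2))
```

With catalytic cofactors the enzyme amount could in principle be much lower than this estimate, which is the motivation for the AdoMet analogs discussed next.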


Figure 4. Aziridine cofactors and reactions mediated by DNA methyltransferases (DNA MTases). A) Natural transfer of the activated methyl group from the ubiquitous cofactor AdoMet (15). B) and C) Enzymatic coupling of the synthetic cofactor N-adenosylaziridine (17) or the 5ʹ-N-adenosyl mustard 18 with DNA. D, E) Functionalization of DNA with abiotic chemical groups attached to the aziridine cofactor 19 or the nitrogen mustard 20 for two-step DNA labeling. F) One-step DNA labeling using aziridine cofactors 21 with attached reporter or affinity groups (gray sphere) for direct enzymatic coupling with DNA.

S-adenosyl-L-ethionine and S-adenosyl-L-propionine can serve as cofactors for MTases, but the enzymatic alkyl transfer rates decline drastically with increasing size of the transferable group (methyl >> ethyl > propyl).25,26 This decreased reactivity could be explained by unfavorable interactions of the extended side chains with the enzymes but also by an energetically less favorable transition state resulting from extra steric crowding within the penta-coordinated transition state of the MTase-catalyzed SN2-type reactions. The latter effect is well known for SN2 reactions in organic chemistry and leads to significantly reduced reaction rates. Recently, the allylic and propargylic AdoMet analogs 23 and 24 were chemically synthesized from the corresponding halides or triflates and AdoHcy (16) and it was demonstrated that they serve as efficient cofactors for members of all classes of DNA MTases (Fig. 6A,B).27,28 The enzymes can be used in catalytic amounts and the transfer of the extended side chains is sequence-specific. This rescue of reactivity can be attributed to a conjugative stabilization of the SN2-type transition state by the neighboring double or triple bond. Hence, this class of cofactors was termed double-activated AdoMet analogs because the reactive carbon placed between the sulfonium center and the unsaturated bonds appears to be activated by both neighboring groups. Targeted labeling of DNA by methyltransferase-directed Transfer of Activated Groups (mTAG) was achieved with the double-activated cofactor 25, which contains a propargylic side chain for activation and a primary amine for further modification (Fig. 6C). Long


Figure 5. Sequence-specific biotinylation of DNA with aziridine cofactor 22 and the DNA methyltransferase M.TaqI. A) Coupling of the biotinylated aziridine cofactor 22 with the target adenine within the double-stranded 5ʹ-TCGA-3ʹ DNA recognition sequence of M.TaqI. B) Three-dimensional structure of M.TaqI in complex with a biotinylated 10 base pair duplex oligodeoxynucleotide formed in the presence of aziridine cofactor 22 at 1.9 Å resolution. C) Magnification of the extrahelical adenine and the covalently linked cofactor (boxed in B) with the electron density distribution (2Fobs-Fcalc) contoured at 2.0 σ. Note that no electron density was observed for the biotin residue indicating flexibility in the crystal.

plasmid DNA was sequence-specifically amino-functionalized using different DNA MTases and selectively labeled with activated esters of fluorophores or biotin in a second step.29 In addition, it was demonstrated that DNA MTases could not modify DNA when their recognition sequences are methylated. This makes the system attractive for genomic methylation analysis.

Figure 6. Double-activated AdoMet analogs and reactions catalyzed by DNA methyltransferases (DNA MTases). A,B) Transfer of an allylic or propargylic group to DNA from the double-activated AdoMet analogs 23 and 24. C) Transfer of a primary amino group from the double-activated AdoMet analog 25 to DNA for subsequent sequence-specific labeling with NHS esters of reporter groups.


Conclusions and Prospects for Synthetic AdoMet Analogs

Targeted DNA labeling with aziridine and double-activated AdoMet analogs in combination with sequence-specific DNA MTases offers exciting new applications in DNA-based technologies. For example, convenient construction of DNA junctions and selective placement of nanoparticles on long DNA via biotin-streptavidin interactions have recently been demonstrated and could lead to applications in nanobiotechnology.30,31 Another area of use is the directed labeling or functionalization of eukaryotic plasmid DNA for studying and controlling cell transfections, with the potential to improve gene delivery.32 Furthermore, applications in biochemistry (functional studies of DNA-binding/modifying enzymes), molecular biology (isolation of DNA-binding proteins) and medical diagnosis (detection of DNA methylation patterns, genotyping) can be envisioned. Besides DNA MTases, many more MTases acting on other substrates like RNA, proteins or small molecules are found in nature. Thus, it was put forward that these new classes of AdoMet analogs in combination with RNA and protein MTases could provide powerful tools for targeted functionalization and labeling of RNA and proteins.33 In fact, it was demonstrated that aziridine cofactors can be utilized by individual protein and small molecule MTases34,35 and it is expected that many more MTases possess some degree of cofactor promiscuity allowing them to catalyze new reactions not found in nature.

References

1. Toscano MD, Woycechowsky KJ, Hilvert D. Minimalist active-site redesign: teaching old enzymes new tricks. Angew Chem Int Ed 2007; 46(18):3212-3236.
2. Tsien RY. The green fluorescent protein. Annu Rev Biochem 1998; 67:509-544.
3. Marks KM, Nolan GP. Chemical labeling strategies for cell biology. Nat Methods 2006; 3(8):591-596.
4. Keppler A, Gendreizig S, Gronemeyer T et al. A general method for the covalent labeling of fusion proteins with small molecules in vivo. Nat Biotech 2003; 21(1):86-89.
5. Gautier A, Juillerat A, Heinis C et al. An engineered protein tag for multiprotein labeling in living cells. Chem Biol 2008; 15(2):128-136.
6. Green DE, Morris TW, Green J et al. Purification and properties of the lipoate protein ligase of Escherichia coli. Biochem J 1995; 309(3):853-862.
7. Agard NJ, Baskin JM, Prescher JA et al. A comparative study of bioorthogonal reactions with azides. ACS Chem Biol 2006; 1(10):644-648.
8. Yin J, Liu F, Li X et al. Labeling proteins with small molecules by site-specific posttranslational modification. J Am Chem Soc 2004; 126(25):7754-7755.
9. George N, Pick H, Vogel H et al. Specific labeling of cell surface proteins with chemically diverse compounds. J Am Chem Soc 2004; 126(29):8896-8897.
10. Zhou Z, Cironi P, Lin AJ et al. Genetically encoded short peptide tags for orthogonal protein labeling by Sfp and AcpS phosphopantetheinyl transferases. ACS Chem Biol 2007; 2(5):337-346.
11. Bulter T, Schumacher T, Namdjou DJ et al. Chemoenzymatic synthesis of biotinylated nucleotide sugars as substrates for glycosyltransferases. ChemBioChem 2001; 2(12):884-894.
12. Saxon E, Bertozzi CR. Cell surface engineering by a modified Staudinger reaction. Science 2000; 287(5460):2007-2010.
13. Hsu T-L, Hanson SR, Kishikawa K et al. Alkynyl sugar analogs for the labeling and visualization of glycoconjugates in cells. Proc Natl Acad Sci USA 2007; 104(8):2614-2619.
14. Laughlin ST, Baskin JM, Amacher SL et al. In vivo imaging of membrane-associated glycans in developing zebrafish. Science 2008; 320(5876):664-667.
15. Roberts RJ, Vincze T, Posfai J et al. REBASE-enzymes and genes for DNA restriction and modification. Nucleic Acids Res 2007; 35:D269-D270.
16. Pignot M, Siethoff C, Linscheid M et al. Coupling of a nucleoside with DNA by a methyltransferase. Angew Chem Int Ed 1998; 37(20):2888-2891.
17. Weller RL, Rajski SR. Design, synthesis and preliminary biological evaluation of a DNA methyltransferase-directed alkylating agent. ChemBioChem 2006; 7(2):243-245.
18. Pljevaljcic G, Schmidt F, Weinhold E. Sequence-specific methyltransferase-induced labeling of DNA (SMILing DNA). ChemBioChem 2004; 5(3):265-269.
19. Comstock LR, Rajski SR. Conversion of DNA methyltransferases into azidonucleosidyl transferases via synthetic cofactors. Nucleic Acids Res 2005; 33(5):1644-1652.
20. Comstock LR, Rajski SR. Methyltransferase-directed DNA strand scission. J Am Chem Soc 2005; 127(41):14136-14137.


21. Weller RL, Rajski SR. DNA methyltransferase-moderated click chemistry. Org Lett 2005; 7(11):2141-2144.
22. Pljevaljcic G, Pignot M, Weinhold E. Design of a new fluorescent cofactor for DNA methyltransferases and sequence-specific labeling of DNA. J Am Chem Soc 2003; 125(12):3486-3492.
23. Pljevaljcic G, Schmidt F, Peschlow A et al. Sequence-specific DNA labeling using methyltransferases. In: Niemeyer CM, ed. Methods in Molecular Biology: Bioconjugation Protocols. Totowa: Humana Press, 2004:145-161.
24. Pljevaljcic G, Schmidt F, Scheidig AJ et al. Quantitative labeling of long plasmid DNA with nanometer precision. ChemBioChem 2007; 8(13):1516-1519.
25. Parks LW. S-adenosylethionine and ethionine inhibition. J Biol Chem 1958; 232(1):169-176.
26. Schlenk F, Dainko JL. S-n-propyl analog of S-adenosylmethionine. Biochim Biophys Acta General Subjects 1975; 385(2):312-323.
27. Dalhoff C, Lukinavicius G, Klimasauskas S et al. Direct transfer of extended groups from synthetic cofactors by DNA methyltransferases. Nat Chem Biol 2006; 2(1):31-32.
28. Dalhoff C, Lukinavicius G, Klimasauskas S et al. Synthesis of S-adenosyl-L-methionine analogs and their use for sequence-specific transalkylation of DNA by methyltransferases. Nat Protoc 2006; 1(4):1879-1886.
29. Lukinavicius G, Lapiene V, Stasevskij Z et al. Targeted labeling of DNA by methyltransferase-directed transfer of activated groups (mTAG). J Am Chem Soc 2007; 129(10):2758-2759.
30. Wilkinson S, Diechtierow M, Estabrook AE et al. Molecular scale architecture: engineered three- and four-way junctions. Bioconjugate Chem 2008; 19(2):470-475.
31. Braun G, Diechtierow M, Wilkinson S et al. Biomolecular tools for nanoscale assembly. Bioconjugate Chem 2008; 19(2):476-479.
32. Schmidt FHG, Hüben M, Gider B et al. Sequence-specific methyltransferase-induced labelling (SMILing) of plasmid DNA for studying cell transfection. Bioorg Med Chem 2008; 16(1):40-48.
33. Klimasauskas S, Weinhold E. A new tool for biotechnology: AdoMet-dependent methyltransferases. Trends Biotechnol 2007; 25(3):99-104.
34. Osborne T, Weller Roska RL, Rajski SR et al. In situ generation of a bisubstrate analogue for protein arginine methyltransferase 1. J Am Chem Soc 2008; 130(14):4574-4575.
35. Zhang C, Weller RL, Thorson JS et al. Natural product diversification using a nonnatural cofactor analogue of S-adenosyl-L-methionine. J Am Chem Soc 2006; 128(9):2760-2761.

Chapter 11

Studying Antibody Maturation Using Techniques for Detecting Uracils in DNA

Rachel Parisien and Ashok S. Bhagwat*

Abstract

Uracil is a rare base in DNA and its presence can provide a biological starting point for mutagenesis or cell death. This chapter will cover various ways in which uracil may be introduced in DNA and removed from DNA, and the consequences of its occurrence. The focus of the chapter is a class of enzymes that convert cytosines in DNA to uracil, with particular emphasis on an enzyme required for generating antibody diversity. Various methods for detecting and quantitating uracils in DNA and the application of these techniques to understanding the mechanism of antibody maturation will be discussed.

Introduction and Overview

Uracil can arise in DNA through deamination of cytosines as a result of the action of endogenous and exogenous chemicals, as well as a class of enzymes known as APOBECs (see chapter by Smith in this book). It may also be incorporated by DNA polymerases that utilize dUTP present in cells. An important distinction between the two ways in which uracil can arise is that C to U conversion has the potential of causing transition mutations, while pairing of dU with dA during replication has no mutational consequences. To counter the mutational consequences of cytosine deamination, all cells contain one or more DNA glycosylases that excise uracils and initiate repair that ultimately restores a cytosine in its place. This is the principal reason why uracil levels are low in genomic DNA, making its detection difficult. A variety of chemo-enzymatic techniques have been utilized to quantitate uracils and nearly all the techniques depend on the exquisite ability of Escherichia coli uracil-DNA glycosylase (UDG) to excise only uracils from DNA. The sensitivity of some of the techniques approaches 1 U in 10⁶ bases and hence a human genome must accumulate 10,000 uracils before they become detectable in a bulk detection assay. Activation-induced deaminase (AID) is one member of the APOBEC family and is essential for the maturation of antibodies through hypermutations that increase antibody diversity and recombination that switches original IgM type antibodies to other isotypes. Genetic studies in animals clearly suggest that dU is a necessary intermediate in antibody maturation and biochemical studies of AID show that it is a single-strand specific DNA-cytosine deaminase that prefers the nontemplate strand of transcribed DNA as its target. Studies in E.
coli have also shown an increase in uracil levels in plasmid DNA following expression of AID, but similar efforts to demonstrate uracils in the DNA of B-cells undergoing antibody maturation have failed, probably because of the limited sensitivity of the assay. The continuing challenge in the field is to develop tools that will allow one to visualize and quantitate uracils thought to be generated at the active immunoglobulin gene locus in B-cells undergoing maturation.

*Corresponding Author: Ashok S. Bhagwat, 443 Chemistry Building, Department of Chemistry, Wayne State University, Detroit, Michigan 48202, USA. Email: [email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.
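The detection limit quoted in the overview above (about 1 uracil in 10⁶ bases) can be turned into a number of uracils per genome with a short calculation. This is a rough sketch only: the haploid human genome size of 3.2 × 10⁹ base pairs is an assumed round figure that is not given in the chapter, and the function name is hypothetical.

```python
DETECTION_LIMIT = 1e-6       # uracils per base, from the text
HAPLOID_GENOME_BP = 3.2e9    # base pairs; assumed round figure, not from the text


def uracils_needed(genome_bp, copies=1, limit=DETECTION_LIMIT):
    """Uracils a genome must carry to reach the bulk detection limit.

    Each base pair contributes two bases, one per strand.
    """
    total_bases = 2 * genome_bp * copies
    return total_bases * limit


print(round(uracils_needed(HAPLOID_GENOME_BP)))            # one genome copy
print(round(uracils_needed(HAPLOID_GENOME_BP, copies=2)))  # diploid cell
```

Under these assumptions the threshold falls between several thousand and somewhat more than ten thousand uracils per cell, the same order of magnitude as the figure of 10,000 given in the text.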

Biochemical Pathways That Introduce Uracils in DNA

Uracil is normally found only in RNA, and in early work on the base composition of DNA uracil was rarely mentioned. The exceptions were Bacillus phages PBS1 and PBS2, which were known to contain uracil in place of thymine.1 However, over the past 35 years several chemical and biochemical pathways have emerged that result in the presence of this thymine analog in DNA. The first pathway to be recognized for the creation of uracils in DNA was the hydrolytic deamination of cytosines.2 This reaction occurs in simple aqueous buffers and is both pH- and temperature-dependent. There are currently two proposed chemical mechanisms for the deamination of cytosine at neutral pH. The first postulates a direct attack at the 4 position of the pyrimidine ring by a hydroxyl ion, while the second involves an addition-elimination reaction.3 The rate of this process in double-stranded (DS) DNA is 7.0 × 10⁻¹³ sec⁻¹, while the rate in single-stranded (SS) DNA is 140-fold higher,4,5 showing that pairing of complementary strands protects cytosines. The rate in DS DNA predicts that there should be 80 uracils generated in the human genome per day, but as biological processes such as replication, recombination and transcription create regions of transient localized denaturation, the frequency of deamination in these regions could be much higher.6 Additionally, duplex DNA undergoes spontaneous localized denaturation, called "breathing", which could also provide a better substrate for hydrolytic deamination.7 If unrepaired, the uracil is paired with adenine by both prokaryotic and eukaryotic DNA polymerases, resulting in C:G to T:A transition mutations (Fig. 1).
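The estimate of about 80 uracils per genome per day follows from the first-order rate constant given above. The sketch below reproduces it under stated assumptions: a haploid genome of 3.2 × 10⁹ base pairs and a G:C content of 41% are assumed values that do not appear in the chapter, and the function names are hypothetical. It also converts the rate constants into half-lives of an individual cytosine.

```python
import math

K_DS_PER_SEC = 7.0e-13       # deamination rate in DS DNA, from the text
SS_FOLD_INCREASE = 140       # SS DNA is 140-fold faster, from the text
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

HAPLOID_GENOME_BP = 3.2e9    # assumed genome size, not from the text
GC_FRACTION = 0.41           # assumed fraction of G:C base pairs


def cytosine_count(genome_bp, gc_fraction):
    """Every G:C base pair holds exactly one cytosine, on either strand."""
    return genome_bp * gc_fraction


def expected_deaminations(n_cytosines, rate_per_sec, seconds):
    """First-order kinetics: n * (1 - exp(-k*t)), about n*k*t when k*t << 1."""
    return n_cytosines * (-math.expm1(-rate_per_sec * seconds))


def half_life_years(rate_per_sec):
    """Half-life of a single cytosine at the given first-order rate."""
    return math.log(2) / rate_per_sec / SECONDS_PER_YEAR


n_c = cytosine_count(HAPLOID_GENOME_BP, GC_FRACTION)
per_day = expected_deaminations(n_c, K_DS_PER_SEC, SECONDS_PER_DAY)
print(f"uracils per genome per day (DS DNA): {per_day:.0f}")
print(f"cytosine half-life in DS DNA: {half_life_years(K_DS_PER_SEC):,.0f} years")
k_ss = K_DS_PER_SEC * SS_FOLD_INCREASE
print(f"cytosine half-life in SS DNA: {half_life_years(k_ss):,.0f} years")
```

The result depends linearly on the assumed number of cytosines, so a diploid genome would give roughly twice as many events per cell per day.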

Figure 1. Mutational consequences of cytosine deamination to uracil. The base excision repair (BER) pathway can restore the original C:G pair starting with uracil excision by UNG. However, if replication occurs before repair, half of the daughter molecules will contain C to T mutations.


Uracil may be directly incorporated in DNA during normal DNA replication. This is because all cells contain some amount of dUTP as a result of normal metabolism (Fig. 2A) and both bacterial and eukaryotic DNA polymerases can readily incorporate it through Watson-Crick base pairing to a template adenine. The extent of uracil misincorporation has been found to be directly related to the size of the intracellular dUTP pool8 and in normal cells, the pool of dUTP is kept low by

Figure 2. A) Pathways in human cells for the incorporation of uracil in DNA. Both dCTP and dUDP contribute to the intracellular dUTP pool; however, in normal cells the pool is small relative to the pool of dTTP due to the activity of dUTPase. B) Replication of dU in DNA. Uracil in DNA is not mutagenic if it is incorporated by a DNA polymerase across from an adenine in the template.


the enzyme dUTPase, which efficiently converts dUTP into dUMP. Unlike cytosine deamination, incorporation of dU in place of dT is not mutagenic (Fig. 2B). More recently, a class of enzymes called APOBEC (apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like) has been found in higher eukaryotes; these enzymes deaminate cytosines in both DNA and RNA and serve to increase the uracil content in DNA. They perform a wide variety of biological functions, ranging from specialized mRNA editing (Apobec1; chapter by Smith in this book) and antibody maturation (activation-induced deaminase; AID) to host cell defense against retroviruses such as HIV (Apobec3 family; chapter by Wedekind and Beal in this book).9-12 Some of these will be discussed briefly in the next section and in depth in the chapter by Smith and the chapter by Wedekind and Beal in this book. The role of AID in antibody maturation will be discussed in detail below. Finally, chemical agents such as nitrous acid, nitric oxide and bisulfite can convert cytosines to uracil and cause C:G to T:A mutations. Nitrous acid and bisulfite are frequently used in in vitro mutagenesis experiments involving purified DNA, but there is little evidence that such chemicals can be generated in vivo. In contrast, nitric oxide is produced endogenously in a variety of mammalian tissues and plays important roles in vasodilation and the antimicrobial action of macrophages.13 NO• can react with cytosines in DNA to generate uracils, causing mutations in bacteria as well as mammalian cells.14,15 However, the role played by this chemical reaction in the antimicrobial action of NO• is unclear at present.
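The contrast drawn in this section between the two sources of uracil (deamination of C is mutagenic, incorporation of dU opposite A is not) can be illustrated with a toy model of one round of semi-conservative replication. This is a simplified sketch, not a model from the chapter: strands are written position-aligned rather than antiparallel, repair is ignored, and all names are hypothetical.

```python
# Base pairing seen by a DNA polymerase; a template uracil directs insertion of A.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def synthesize(template):
    """Return the new strand, position-aligned with its template."""
    return "".join(PAIRING[base] for base in template)


def replicate(top, bottom):
    """Semi-conservative replication: each parental strand templates one daughter."""
    return [(top, synthesize(top)), (synthesize(bottom), bottom)]


def read_as_dna(strand):
    """Uracil carries the same coding information as thymine."""
    return strand.replace("U", "T")


def mutated_daughters(original_top, top, bottom):
    """Count daughter duplexes whose top strand differs from the original."""
    daughters = replicate(top, bottom)
    return sum(read_as_dna(d_top) != original_top for d_top, _ in daughters)


# Case 1: C:G pair, the C is deaminated to U before replication.
print(mutated_daughters("ACGT", top="AUGT", bottom="TGCA"))

# Case 2: T:A pair, dU was incorporated opposite A in place of dT.
print(mutated_daughters("ATGT", top="AUGT", bottom="TACA"))
```

In the first case one of the two daughters carries a C to T change, as stated in the legend to Figure 1; in the second case both daughters read the same as the original sequence.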

Pathways for Removing Uracils from DNA

Since uracils generated through “spontaneous” (caused by cellular water) cytosine deamination can lead to mutations, all cells contain one or more pathways for removing this base from DNA. The first report of uracil removal activity was made by Nyberg and Lindahl in 1974.16 They purified an enzyme from E. coli that hydrolyzed the N-glycosidic linkage, releasing free uracil base and creating an abasic site in DNA. The enzyme has a turnover number of more than 800 per minute and there are approximately 300 enzyme molecules per cell.17 This uracil-N-glycosylase (UNG) initiates the first step in a base excision repair pathway that restores C:G base pairs, preventing mutation (Fig. 3), and has been found in every branch of the tree of life.18,19 There are several different families of uracil-DNA glycosylases18 and the E. coli enzyme referred to above is the prototype of the first family discovered. This family of enzymes can efficiently excise uracil from both single-stranded and duplex DNA, with little activity towards uracil found in RNA.19 Human UNG also belongs to this family and removes U misincorporated across from A in DNA, as well as U from U•G mispairs resulting from cytosine deamination.20 Additionally, UNG plays an important role in immune function, specifically in antibody maturation, which will be described below.

The prototypical member of family 2 is the eukaryotic thymine-DNA glycosylase (TDG). This enzyme is unusual in that it excises thymines from T:G mismatches, but also removes uracil from U•G mispairs and 3,N4-ethenocytosine (εC) from εC:G pairs.21,22 Thus the biological functions of TDG and its bacterial homolog Mug remain unclear. Vertebrate SMUG1 (single-strand-specific monofunctional uracil-DNA glycosylase 1) is representative of family 3. Despite its name, the preferred substrate for this enzyme is in fact duplex DNA,23 with the Xenopus enzyme showing 700-fold higher activity on duplex DNA than on single-stranded (SS) DNA under single-turnover conditions.24 Thus SMUG1 may serve as a relatively efficient backup for UNG in the repair of U•G mismatches.23 Additionally, SMUG1 can excise uracil from U:A pairs and may be the primary enzyme responsible for removal of the oxidation damage product of thymine, 5-hydroxymethyluracil, in mammalian cells.25,26 Finally, mammalian MBD4 (methyl-binding domain 4) is a monofunctional DNA glycosylase related to E. coli Endonuclease III which can also remove T or U mispaired with G, with highest specificity at methyl-CpG sites.27 There are no homologs of SMUG1 or MBD4 among bacteria. The remaining enzyme families are found only in archaea, some bacteria and hyperthermophiles28 and will not be discussed here. The redundancy of enzymes for removing uracil from DNA only serves to highlight the importance of its removal for maintaining genomic integrity.
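The turnover number and copy number quoted above for E. coli UNG can be combined into a rough upper bound on excision capacity per cell. The short Python sketch below is only a back-of-the-envelope illustration: it assumes that every enzyme molecule works at the quoted turnover number (that is, is saturated with substrate), which will not generally hold in vivo, and the function name is ours.

```python
# Figures quoted in the text for E. coli uracil-DNA glycosylase (UNG).
TURNOVER_PER_MIN = 800   # uracils excised per enzyme per minute (a lower bound)
ENZYMES_PER_CELL = 300   # approximate number of UNG molecules per cell


def max_excisions_per_min(turnover_per_min: int, enzymes_per_cell: int) -> int:
    """Excision capacity per cell, ASSUMING every enzyme is substrate-saturated."""
    return turnover_per_min * enzymes_per_cell


capacity = max_excisions_per_min(TURNOVER_PER_MIN, ENZYMES_PER_CELL)
print(f"{capacity:,} uracils per minute per cell")
```

Under this assumption the capacity is on the order of 10⁵ excisions per minute per cell.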

Figure 3. Genetic rearrangements during antibody maturation. A fully rearranged IgH gene resulting from V(D)J recombination is shown at the top. In some organisms, including humans, the Ig gene can undergo somatic hypermutation (SHM, part a) and class switch recombination (CSR, part c). In others, gene conversion (GC, part b) is the predominant pathway. During SHM, point mutations (depicted by vertical lines) are introduced into the V(D)J gene segment. In gene conversion, part of the V(D)J segment is replaced with the sequence of a pseudo-V segment (ψV), shown as a darker section. During CSR, double-strand breaks are introduced into two different switch regions (Sμ and Sγ), which are ligated together, resulting in an IgH gene coding for (in this example) the IgG isotype antibody and a circular DNA product composed of the intervening DNA.

DNA-Cytosine Deaminases and Antibody Maturation

The APOBEC proteins are a subfamily within a large superfamily of enzymes involved in nucleotide metabolism called cytidine deaminases and will be discussed in greater depth in the chapters by Smith, by Wedekind and Beal, and by Maris and Allain in this book. These enzymes edit DNA, RNA or both by converting cytosine to uracil and affect diverse physiological functions. The APOBECs are characterized by a minimum of one zinc-binding catalytic domain and are distributed only within the vertebrate lineage.29 The family consists of AID, APOBEC1, APOBEC2, APOBEC3 (A-F) and APOBEC4,29,30 the most ancient of which may be AID and APOBEC2.31 There is no known function for APOBEC2 and APOBEC4 (see Chapter by Wedekind and Beal in this book), while AID plays an important physiological role in adaptive immunity.32 The APOBEC1 and APOBEC3 subgroups evolved later and are only expressed in mammals. APOBEC1 was the first enzyme to be identified and the name of this enzyme family is based on its function in lipid metabolism.33 It is the catalytic subunit of an RNA-editing enzyme that converts cytosine 6666 to uracil in the apolipoprotein B (apoB) mRNA, which leads to synthesis of a truncated form of the protein33 (Chapter by Smith in this book). The role of some of the APOBEC3s is still unknown, but several function as an innate immune defense against viral infections, particularly by retroviruses. It has been speculated that the APOBEC3s originally developed as a way to prevent genomic instability by regulating endogenous retrotransposons and from there may have evolved to gain activity against exogenous invading genetic elements34 (see also Chapter by Smith in this book). In the remainder of this chapter we will focus only on AID.

As mentioned above, AID activity is a crucial component of the adaptive immune system, specifically in the creation of high-affinity antibodies. Antibodies are homodimers of heterodimers consisting of a heavy chain and a light chain held together by disulfide bonds. Each half of an antibody contains one light and one heavy chain, which together form the antigen-binding pocket. These binding domains are called variable regions because antibodies that bind different antigens have different primary sequences in this region. Furthermore, these primary sequences evolve during antibody maturation. The remainder of each chain is called the constant region, of which there are five main isotypes: α, γ, δ, ε and μ.35 The total number of antibody genes has to be considerably less than 50,000 (the estimated number of genes in the mammalian genome), yet the immune system is capable of producing over 10¹¹ different antibodies.35 This expansion of the antigen-binding repertoire in the body, leading to synthesis of high-affinity antibodies, comes in two stages.

The first major mechanism for creating greater diversity is called V(D)J recombination.
In humans, there are approximately 40 variable (VH), 26 diversity (DH) and 6 joining (JH) heavy chain genes.36 Additionally, there are 40 variable (Vκ) and 5 joining (Jκ) kappa light chain genes as well as 30 variable (Vλ) and 4 joining (Jλ) lambda light chain genes in the genome. As each B-cell undergoes development, it acquires a unique variable domain through genetic rearrangements of these genes [VDJ in the heavy chains and VJ in the light chains; together referred to as V(D)J recombination], which enables the immune system to create 10⁶ to 10⁷ different antibodies.37 These antibodies express only one isotype constant segment and are displayed on the B-cell surface. They are referred to as IgM antibodies.

The second level of antibody diversification and isotype switching occurs after the mature, naive B-cell is exposed to antigen (Fig. 3). During this process, the immunoglobulin genes undergo genetic alterations: either somatic hypermutation (SHM) and class switch recombination (CSR), or gene conversion (GC).38 SHM is a mutational process in many mammals, including humans, that introduces point mutations (both transitions and transversions) scattered throughout the rearranged V(D)J segment of the Ig genes (Fig. 3). The mutation frequency of SHM is up to a million-fold higher than normal39 and transcription of the Ig gene is required for this process to occur.40,41 This is an iterative process of mutations followed by clonal selection: B-cells producing antibodies with higher affinity are stimulated to undergo cell division, whereas B-cells producing antibodies with lower affinity are not and are thus eliminated from the population.35

Each constant segment (Cμ, Cδ, etc.) is preceded by a sequence called the switch (S) region containing short repetitive sequences.
It is located within the intron separating the exons for the different constant segments and is transcribed from a promoter that lies within the intron prior to the genetic recombination events of class switch recombination. Double-strand breaks in two different S regions are joined together, eliminating the intervening segment as a piece of circular DNA (Fig. 3). This results in an isotype switch from IgM to one of IgG, IgE or IgA.38 Transcription of the S regions is again required to initiate the double-strand breaks. GC is also a recombinational process; it replaces part of the rearranged V(D)J segment with sequence from a pseudo-V gene (Fig. 3). It is found only in some vertebrates, such as rabbits, chickens and sheep, but not in humans.

The gene for AID was discovered in the Honjo lab in Japan in 1999, which reported it as a new cytidine deaminase specific to the germinal center.42 A year later, the same group created AID−/− mice, which displayed a phenotype defective in both CSR and SHM.43 Identification of human patients defective in AID further strengthened this discovery. Several classes of human patients defective in antibody maturation have been described.44 They are defined by a defect in which lack of CSR creates an increased level of IgM isotype antibodies. One class of such patients with hyper-IgM syndrome (HIGM2)45 has mutations in the AID gene. Subsequently, it was shown using a chicken cell line, DT40, that AID is required for GC in Ig genes,46,47 demonstrating that this gene is absolutely required for antibody maturation.
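The combinatorial contribution of V(D)J recombination described earlier in this section can be checked with simple arithmetic. The Python sketch below uses only the approximate human segment counts quoted in the text; it assumes that one segment of each type is chosen independently and that any heavy chain can pair with any kappa or lambda light chain, and it ignores any additional diversity generated at the segment junctions. The variable and function names are illustrative.

```python
from math import prod

# Approximate human germline segment counts quoted in the text.
HEAVY = {"V": 40, "D": 26, "J": 6}
KAPPA = {"V": 40, "J": 5}
LAMBDA = {"V": 30, "J": 4}


def combinations(segments: dict) -> int:
    """Number of ways to pick one segment of each type."""
    return prod(segments.values())


heavy = combinations(HEAVY)                          # 40 * 26 * 6
light = combinations(KAPPA) + combinations(LAMBDA)   # a kappa OR a lambda chain
total = heavy * light                                # independent pairing assumed

print(f"heavy chain combinations: {heavy:,}")
print(f"light chain combinations: {light:,}")
print(f"heavy x light pairings:   {total:,}")
```

Under these assumptions the product is about 2 × 10⁶, which falls within the 10⁶ to 10⁷ range quoted above.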

Role of Uracil in Antibody Maturation

When UNG−/− mice were first described,48 no phenotype was attributed to the repair deficiency. In fact, unlike E. coli, murine UNG−/− cells did not show a significant increase in mutation frequency, possibly because of the activity of a backup uracil-DNA glycosylase, SMUG1.23,48 However, single cell gel electrophoresis (“comet”) assays show that genomic DNA in these cells does accumulate significantly higher levels of uracil.48,49 Subsequent studies with UNG−/− mice not only revealed altered SHM spectra and reduced CSR in these mice (see below), but also the presence of B-cell lymphomas and a slightly shortened life span.50 These mice showed abnormal cell growth in lymphatic tissue such as spleen and lymph nodes at a higher frequency than WT mice.50 However, this study did not assay for mutations in these tumors and did not assess levels of uracil in genomic DNA; hence the tumor occurrence cannot be directly correlated with increased levels of uracil in DNA.

Early evidence that uracil may be created in DNA during SHM came from studies of ung mutants of E. coli. When the mutagenicity of AID was studied using the rifampicin-resistance (RifR) assay, the RifR mutant frequency was nine-fold higher in ung− cells than in ung+ cells.51 In a related study using mice, CSR was substantially reduced and the SHM spectrum was affected by an UNG defect.
The percentage of C:G to T:A transitions among SHMs was 31% in UNG+/+ mice and increased to 52% in UNG−/− mice.52 Interestingly, the distribution of mutations within the Ig gene was similar in the two genetic backgrounds, suggesting that UNG is involved in determining the type of base substitutions found in SHM, but not their local distribution.52 Similar results were also obtained in a chicken cell line where UNG was inhibited by expressing a specific inhibitor of the enzyme, UGI.53 In this case, the frequency of C to T mutations increased from 38% of the total to 86% when UGI was expressed.54 Both studies point to an important role for UNG in SHM and suggest that an intermediate in the SHM pathway is uracil-containing DNA. However, as some of the SHMs were not C:G to T:A transitions, this leaves open the possibility that either some of the U•G mispairs created by the action of AID were processed by repair enzymes other than UNG or that cytosine deaminations may not be the only starting point for SHMs.

As mentioned above, several classes of human patients defective in antibody maturation have been described.44 One class of hyper-IgM patients (HIGM5) has mutations in the UNG gene.55,56 Three of the four UNG mutations found in HIGM5 patients are deletions that result in premature termination and a substantial shortening of the protein. It is reasonable to assume that the truncated proteins expressed in these cells are completely defective in uracil excision. The remaining patient contained functional UNG, but it was not transported into the nucleus.49 These results support an important role for UNG in CSR and have generally been interpreted to mean that UNG is required for the formation of double-strand breaks (DSBs) in the switch regions, presumably by helping process the uracils created by AID.52 Begum et al57 have questioned such a role for UNG and raised questions about whether the ability of AID to convert cytosines in DNA to uracil is required for CSR.
They reported that the formation of γ-H2AX (i.e., phosphorylation of the minor histone H2AX) required AID, but not UNG. H2AX is phosphorylated in response to DNA strand breaks and is used as an indicator of DSBs during CSR. These investigators expressed UGI and found that γ-H2AX foci could still be observed in response to AID expression.57 Furthermore, expression of the UNG single mutants D145N, N204V, H268L or F242S in UNG−/− B-cells through retroviral infection apparently led to normal CSR. However, neither of the double mutants tested, D145N-N204V and H268L-D145N, could complement the UNG knockout. This apparent requirement for UNG in CSR was interpreted as being “structural” rather than catalytic.57

This view of a structural, but not catalytic, role for UNG in CSR has been challenged based on theoretical considerations58 as well as experimental results.59 The latter investigators delivered different mutants of murine UNG to UNG−/− cells by retroviral infection, and cell extracts were examined for uracil excision activity on a SS DNA substrate. They found that extracts from cells transfected with UNG single, but not double, mutants contained significant catalytic activity and this correlated well with their ability to complement the CSR defect in host cells. The weak activities of the single mutants of UNG were also confirmed using purified proteins. Similar restoration of CSR activity was obtained by transfecting the UNG−/− cells with retrovirus containing SMUG1, but not TDG, cDNA. These results led to the conclusion that a DNA glycosylase activity with the ability to excise uracils from SS DNA is absolutely required for CSR. This is consistent with the existence of a uracil-containing DNA intermediate during antibody maturation.
When a defect in mismatch repair (MMR) was combined with a UNG defect in mice, CSR was completely ablated.60 The overall hypermutation frequency remained unchanged, but now essentially all the mutations (99%) were targeted at C:G pairs and were C:G to T:A transitions.60-62 This contrasts with the MMR-defective MSH2−/− mice, where 26% of the mutations were still at T:A pairs and 16% of the mutations at C:G pairs were transversions,60,62 and with MSH6−/− mice, where 5% of the mutations were at A:T pairs and 20% of the mutations at C:G pairs were transversions.61 MSH2 and MSH6 are components of the principal protein complex that binds and recognizes base-base mismatches.63 As noted above, a significant fraction of hypermutations in UNG−/− mice are not C:G to T:A transitions.52,61 The dramatic shift in the mutation spectrum in the double mutant suggests that both UNG and MSH2-MSH6 process U•G mismatches created by AID and that this processing leads to mutations other than C:G to T:A during SHM. Together these studies show clearly that conversion of cytosines in DNA to uracil plays an essential role in initiating antibody maturation.

AID converts cytosines in SS DNA, but not in DS DNA, SS RNA or DNA-RNA hybrids, to uracil.64-67 It also does not act on the free nucleoside cytidine or the nucleotide cytidylate and is therefore not a “cytidine deaminase”.66,68 It is regrettable that many publications and databases continue to refer to AID and other APOBEC enzymes as cytidine deaminases, when they clearly do not act on such a substrate. Like DNA polymerases and DNA methyltransferases, their substrates are polynucleotides and this should be recognized by calling them DNA (or RNA) cytosine deaminases. AID is thought to contain a Zn²⁺ ion in its active site, based on the fact that, like APOBEC1, it is inactivated by treatment with a strong chelator such as 1,10-phenanthroline67,69 and it possesses a characteristic zinc-dependent deaminase amino acid motif (Chapter by Wedekind and Beal in this book).
When bacteriophage M13 SS DNA was used as substrate, AID showed a preference for certain cytosines in the target sequence.70,71 Based on these preferences, the most preferred sequence for AID is WRC (where W is A or T and R is A or G) and the least preferred sequence is SYC (where S is G or C and Y is C or T).71
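The WRC and SYC sequence preferences just described translate directly into a pattern search. The following Python sketch is only an illustration of how candidate preferred and disfavored cytosines might be located on a single DNA strand; the function name and the example sequence are hypothetical, and the scan covers only the strand supplied (the complementary strand would have to be reverse-complemented and scanned separately).

```python
import re

# IUPAC codes used in the text: W = A/T, R = A/G, S = G/C, Y = C/T.
# Lookahead groups are used so that overlapping motifs are all reported.
HOTSPOT = re.compile(r"(?=([AT][AG]C))")   # WRC, most preferred
COLDSPOT = re.compile(r"(?=([GC][CT]C))")  # SYC, least preferred


def target_cytosines(seq: str, pattern: re.Pattern) -> list[int]:
    """Return 0-based positions of the cytosine at the 3' end of each motif."""
    return [m.start() + 2 for m in pattern.finditer(seq.upper())]


example = "TTAGCTGGCCAACAT"  # hypothetical single-stranded sequence
print("WRC cytosines:", target_cytosines(example, HOTSPOT))
print("SYC cytosines:", target_cytosines(example, COLDSPOT))
```

In this example the cytosines at positions 4 and 12 lie in a WRC context (AGC and AAC), while the cytosine at position 9 lies in an SYC context (GCC).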

Methods for Detecting and Quantifying Uracils in DNA

Although it is widely accepted that AID converts cytosines in DNA to uracil during antibody maturation, there are several difficulties in detecting the promutagenic dU lesions. First, they are readily recognized and removed by the UDGs in the cell, or copied over and diluted during replication. As a result, high sensitivity is an essential requirement for a uracil-detection method. Second, some methods are only semi-quantitative and may be of limited use. Third, some methods allow one to identify the regions of DNA that contain uracil, while others do not. Therefore, it is important to review the strengths and weaknesses of the various available methods for detecting uracils in DNA.

There are two basic ways of detecting uracils in DNA, both of which depend on the high specificity of UNG towards its substrate. The first class of methods excises the uracils using UNG and then detects the free uracil. It is exemplified by the method published by the Ames group using tandem gas chromatography-mass spectrometry (GC-MS).72 Over the years, this assay has been improved to enable detection of about 5 to 200 uracils per 10⁶ base pairs.73 Briefly, a sample of DNA is incubated with UNG to remove uracil. The free uracil is derivatized with 3,5-bis(trifluoromethyl)benzyl bromide (BTFMBzBr), extracted and analyzed by GC-MS. An internal standard of ¹³C- and ¹⁵N-labeled uracil is used for validation. The assay requires about 3 μg of DNA per sample. This method is robust and quite sensitive, but it gives bulk results and cannot determine the location of the uracils in the genome being studied.

The second approach to detecting uracils is to excise them from DNA using UNG and then to detect the abasic sites left behind. There are many variants of this method; one example is the work of Roberts et al.74 They labeled the abasic site produced by UNG using O-4-nitrobenzylhydroxylamine (NBHA). The DNA is then digested into mononucleosides by incubation with DNase I, nuclease P1 and phosphodiesterase I, liberating the NBHA-labeled AP sites. Following digestion, HPLC-ESI-MS/MS was used for the separation of products and mass-based identification of the abasic (AP) site labeled with NBHA. This method has a reported sensitivity of 2 abasic sites per 10⁶ nucleotides. Other ways of detecting abasic sites include converting the AP lesions to nicks in DNA and then detecting the nicks by techniques such as gel electrophoresis to detect shortening of DNA fragments,75 elution of DNA bound to filters76 and single cell gel electrophoresis (“comet” assay).48

An important drawback of some of the techniques mentioned so far is that they do not identify the site where uracils are located in DNA.
One technique that allows such localization is the conversion of AP sites to nicks followed by ligation-mediated (LM) PCR.75 In this case the two strands of the nicked DNA are separated, and a primer specific for one of the strands and a DNA polymerase are used to create blunt ends at the site of the nick. A linker duplex is ligated to the newly created end and PCR is used to amplify the DNA using two primers, one specific for the linker and one for the gene that is the target of AID. The amplification products are cloned and sequenced to identify the site where the original nick was located.75

A novel reagent for detecting abasic sites in DNA was developed by reacting O-(carboxymethyl)hydroxylamine with biotin hydrazide in the presence of carbodiimide.77 The reagent is called aldehyde-reactive probe (ARP) and it specifically tags abasic sites in DNA with biotin residues. The number of biotin-tagged AP sites can then be determined colorimetrically by an ELISA-like assay using streptavidin conjugated to horseradish peroxidase (HRP), or by using fluorescently tagged streptavidin. An outline of the basic protocol used in our laboratory is presented in Figure 4A. The first step is to block pre-existing abasic sites in DNA by reacting them with methoxyamine. The uracils are then excised from DNA with E. coli UNG to generate abasic sites. This DNA is then incubated with ARP, which labels the abasic sites with biotin. The DNA is then spotted onto a membrane, fixed and incubated with Cy5-tagged streptavidin. A fluorescence scanner is used to quantify the Cy5 fluorescence and hence the uracil content. The sensitivity of this technique, when the streptavidin was tagged with horseradish peroxidase and chemiluminescence was used as the reporter instead of fluorescence, was reported to be 1 to 6 uracils/10⁶ base pairs78,79 and we find similar sensitivities using Cy5-streptavidin. A standard curve, comprising varying dilutions of a duplex oligo containing a single uracil, is included on every membrane.
The standard curve is linear over a range of 0.01 pmol to 1 pmol of uracil per sample and is used to determine the uracil content of cellular DNA samples (Fig. 4B). The method was validated using genomic DNA from three different strains of E. coli that were, respectively, ung+ dut+, ung−, or ung− dut− (defective in both UNG and dUTPase, Fig. 2A). The loss of Dut increases the incorporation of uracils in DNA, while the absence of UNG means that they are not excised from DNA. The results show that uracil levels in DNA from ung− cells are about 2-fold those of WT cells, while the loss of both Ung and Dut results in a 20-fold increase in uracils in DNA (Fig. 5A). These results are generally consistent with those reported by Lari et al,78 but the range of uracil levels seen in our assay is narrower. While these investigators found 1000-fold higher levels of uracil in ung− dut− DNA compared to ung, we found this difference to be 40-fold (Fig. 5A; ref. 78).
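The way a standard curve of this kind is used for quantitation can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis software used in our laboratory: the standard-curve readings are made-up numbers, the function names are ours, and the conversion from DNA mass to base pairs assumes the common approximation of about 650 g/mol per base pair of double-stranded DNA.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def uracils_per_million_bp(signal, slope, intercept, dna_ng, g_per_mol_bp=650.0):
    """Convert a spot signal to uracils per 10^6 bp of the spotted DNA."""
    uracil_pmol = (signal - intercept) / slope
    # The text gives 0.01 to 1 pmol as the linear range of the standard curve.
    if not 0.01 <= uracil_pmol <= 1.0:
        raise ValueError("signal falls outside the linear range of the standard curve")
    # ng -> g -> mol of base pairs -> pmol of base pairs (650 g/mol is ASSUMED)
    bp_pmol = dna_ng * 1e-9 / g_per_mol_bp * 1e12
    return uracil_pmol / bp_pmol * 1e6


# HYPOTHETICAL standards: pmol of uracil spotted vs. fluorescence (arbitrary units).
std_pmol = [0.01, 0.05, 0.1, 0.5, 1.0]
std_signal = [120.0, 520.0, 1020.0, 5020.0, 10020.0]
slope, intercept = fit_line(std_pmol, std_signal)

# HYPOTHETICAL sample: 6.5 micrograms of genomic DNA giving a signal of 520 units.
print(uracils_per_million_bp(520.0, slope, intercept, dna_ng=6500.0))
```

With these invented numbers the sample contains 0.05 pmol of uracil in 10 nmol of base pairs, that is, about 5 uracils per 10⁶ base pairs. Rejecting signals outside the 0.01 to 1 pmol range avoids extrapolating beyond the region where the curve is stated to be linear.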

Figure 4. A) Scheme of the ARP-based assay to detect uracil in DNA. A DNA sample (shown here as a plasmid) is treated with methoxyamine to block endogenous abasic sites and then treated with UDG to excise uracils. The resulting abasic sites are labeled with ARP and the DNA is transferred to a nylon membrane. The membrane is incubated with fluorescently tagged streptavidin and scanned in a phosphorimager. B) Typical standard curve for uracil quantitation. A series of dilutions of a 70-mer duplex containing a single uracil residue were treated with Ung-ARP to create the standard curve.

Figure 5. A) Quantitation of uracils in E. coli DNA. DNAs from three different strains of E. coli were subjected to the UNG-ARP assay. The relevant genetic background of the strains is indicated in the figure and the means (± S.D.) from three independent samples are shown. B) Detecting uracil-containing DNA following Southern transfer. The picture on the left is of an ethidium bromide stained agarose gel containing restriction fragments from bisulfite- and mock-treated plasmids. Shown on the right is a Southern blot of the same gel scanned for fluorescence following incubation of the membrane with Cy5-streptavidin. The black circles represent the restriction products of the bisulfite-treated plasmid, while the black squares represent the restriction products of the mock-treated plasmid.

A significant advantage of using Cy5-labeled streptavidin to detect uracils is that it can be applied easily to DNA immobilized on membranes following Southern transfers. An example of such an application is shown in Figure 5B. Plasmid DNA was treated with bisulfite to deaminate cytosines and was digested with restriction enzymes BamHI and EcoRV. In parallel, plasmid DNA mock-treated with buffer was digested with restriction enzyme PspGI. The digestion products were mixed together and were subjected to UNG and ARP treatment as described above. The DNA was then electrophoresed on an agarose gel and stained with ethidium bromide, followed by Southern transfer to a nylon membrane. The membrane was incubated with streptavidin-Cy5 conjugate and visualized using a phosphorimager (Fig. 5B). It can be seen that only the bands corresponding to DNA that was bisulfite-treated are fluorescent (Fig. 5B, lanes A and B, bands 1, 4 and 5). In principle, this technique can be extended further by treating DNA containing uracils with UNG, ARP and streptavidin-Cy5 and then using this DNA as the labeled probe in Southern hybridization experiments to map DNA fragments that contain uracil in plasmid or genomic DNA.

Application to Studies of Antibody Maturation

The Gearhart lab expressed the AID gene on a plasmid in E. coli and isolated plasmid DNA following induction of the gene.75 DNA was incubated with UNG and APE1 and then separated on a denaturing alkaline agarose gel. Twenty minutes after inducing AID, a 4.5-kb band corresponding to the linear form of the plasmid was seen and the intensities of the upper and lower circular plasmid bands decreased concurrently. There was more than a 2-fold increase in the linear band following UNG/APE1 digestion, suggesting a significant increase in the number of uracils in DNA.75 To identify the strand in which the uracils were located, plasmid DNA was again treated with UNG and APE1, separated on an alkaline agarose gel and then transferred to a membrane. Radioactive probes specific for either the template or the nontemplate strand of the transcription unit were incubated with the membrane and viewed by phosphorimaging. The images showed a larger decrease in band intensity for the nontranscribed strand than for the transcribed strand, indicating that more uracils are present in the former strand and that AID acts mainly on the nontranscribed strand within the transcription unit. They also used the LM-PCR technique to map the positions of deaminated cytosines within the target gene. They found that the locations of breaks were different between samples treated with and without UNG. Without UNG, the breaks were scattered and presumably represented spontaneous breaks introduced during plasmid extraction. In contrast, the DNA treated with UNG had more focused break points. They found that 67% of the breaks occurred at original cytosine residues, suggesting an increased presence of dU due to AID deamination of dC. If one assumes that AID behaves similarly in B-cells and E. coli, these data strongly support the idea that AID causes dC deaminations in vivo.

We have used the UNG-ARP assay on genomic DNA from murine and human cell lines. In particular we compared the amount of uracil in UNG+/+ and UNG−/− murine cell lines. There is a low level of uracils in DNA of UNG+/+ cells and this increases slightly in the UNG-defective cell line (Fig. 6). This result is similar to that of Nilsen et al48 and the lack of a large increase in dU in the latter cells is likely to be due to backup uracil removal activity by SMUG1. Additionally, we tested DNA from a well-characterized human B-cell lymphoma cell line, RAMOS. This cell line is thought to be UNG+/+. Although these cells constitutively express AID, the uracil level is no higher than those in murine cells. There are several possible explanations for this result. First, it is possible that the parent cells of RAMOS had even lower levels of uracils in DNA and the level seen here in this cell line represents a significant increase. This is less likely because all previous reports of uracil in human DNA have found 1 U per 10⁶ bases. The second possibility is that the UDG activities in cells excise most of the uracils created and hence the steady-state levels of U are low. A third possibility is that the activity of AID may be so highly targeted to regions of the genome such as the Ig gene that the overall level of uracil in genomic DNA does not change significantly. Clearly, possibilities two and three are not mutually exclusive. The lack of a significant increase in uracil levels seen here is consistent with preliminary measurements of the levels of uracil in DNA of tissue from a mouse containing an AID transgene (unpublished results).

Figure 6. Uracils in mammalian DNA. The experiment was performed as described for E. coli DNA. 92 Tag is UNG+/+, while 210 Tag is UNG−/−. RAMOS is a human cell line that continuously undergoes somatic hypermutation. The Y-axis shows the number of uracils in DNA per 10⁶ bases.
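The kind of tally behind a statement such as "a given percentage of breaks occurred at cytosine residues" in an LM-PCR mapping experiment can be illustrated with a short Python sketch. It is not the analysis used in the study cited above: the reference sequence and the break positions are invented for illustration, and the function name is ours.

```python
from collections import Counter


def break_base_fractions(sequence: str, break_positions: list[int]) -> dict[str, float]:
    """Fraction of mapped breaks that fall on each base of the reference strand."""
    if not break_positions:
        raise ValueError("no break positions supplied")
    counts = Counter(sequence[p].upper() for p in break_positions)
    total = len(break_positions)
    return {base: counts[base] / total for base in "ACGT"}


reference = "ATGCCAGCTTACGGCATCCA"   # HYPOTHETICAL reference strand
breaks = [3, 4, 7, 12, 15, 9]        # HYPOTHETICAL 0-based break positions

fractions = break_base_fractions(reference, breaks)
print(f"breaks at C: {fractions['C']:.0%}")
```

In this made-up data set three of the six breaks map to cytosines. In a real experiment the same tally would be compared between UNG-treated and untreated samples, and against the base composition of the target, before drawing conclusions.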

Future Prospects

It should be clear from these discussions that the task of developing techniques for the detection and quantification of uracil accumulation in DNA is far from complete. Although a large body of genetic evidence points to the presence of a promutagenic dU lesion during antibody maturation promoted by AID and during HIV restriction by the APOBEC3 family of enzymes, direct biochemical demonstration of the creation of this base in the Ig genes of maturing B-lymphocytes or in the minus DNA strand of retroviruses is lacking. Some of this failure may simply be due to lack of adequate effort, but the technical challenges are also daunting. It is clear that more sensitive techniques that can detect 1 dU in more than 10⁶ other bases, and tools that can be used to localize uracils in specific chromosomal loci and DNA fragments, are sorely needed.

There are several questions regarding AID action that also remain unanswered. The first is: how specific is the action of AID for V(D)J rearranged Ig genes? To put it a different way, does AID create C to U conversions only in the Ig genes or at many sites in different chromosomes? There is already considerable direct and indirect evidence that several genes in addition to Ig suffer hypermutations due to targeting by AID.80,81 However, in most of these cases specific genes were preselected for hypermutation studies and hence the sample may be highly biased. A genomics approach is needed to determine the true extent of gene targeting by AID (or a lack thereof). If AID is indeed quite selective in its deamination targets, then the obvious next question is: how does it find its target(s) in a vast ocean of DNA sequences?

A second set of questions regarding AID has to do with the requirement for transcription of Ig genes in SHM and CSR. This requirement has been discussed in detail in a previous review82 and remains poorly understood.
It is unclear, for example, whether transcription of Ig genes is part of the signal that directs AID to these genes. Clearly, this is unlikely to be the whole answer because that would predict that every transcribed gene in a maturing B-lymphocyte is acted upon by AID. That would certainly be lethal. A related question has to do with the strand bias of AID action. Several in vitro studies of AID and some in E. coli have shown that AID preferentially targets the nontemplate strand of a transcribed gene. However, SHMs show no strand bias and some other studies using E. coli model systems also show little strand bias in AID action.82 Additionally, during SHM AID appears to target a very limited part of the Ig gene, 1,500 base pairs starting 150 nucleotides downstream from the start of transcription. How does AID limit its action to this small segment of DNA? We will be in a better position to answer some of these questions when sensitive techniques for the quantitation and localization of uracils in DNA become available.

Finally, is the biological role of DNA cytosine deaminases limited to providing protection against a limited number of viruses (some retroviruses and hepatitis B) and retroelements and to promoting antibody maturation? The recent discovery in jawless vertebrates of sequence homologs of the APOBEC family of enzymes that are involved in the rearrangement of variable lymphocyte receptors83 provides a tantalizing expansion of the function of these enzymes. It is possible that in the near future previously unrecognized biological functions for enzymes that introduce uracils in DNA will be discovered, increasing the importance of studying these enzymes.

Acknowledgements

The work presented here was supported by grants from the National Institutes of Health (GM 57200 and CA 97899).

References

1. Takahashi I, Marmur J. Replacement of thymidylic acid by deoxyuridylic acid in the deoxyribonucleic acid of a transducing phage for Bacillus subtilis. Nature 1963; 197:794-795.
2. Lindahl T. Instability and decay of the primary structure of DNA. Nature 1993; 362(6422):709-715.
3. Shapiro R, Klein RS. The deamination of cytidine and cytosine by acidic buffer solutions. Mutagenic implications. Biochemistry 1966; 5(7):2358-2362.
4. Ehrlich M, Norris KF, Wang RY et al. DNA cytosine methylation and heat-induced deamination. Biosci Rep 1986; 6(4):387-393.
5. Frederico LA, Kunkel TA, Shaw BR. A sensitive genetic assay for the detection of cytosine deamination: determination of rate constants and the activation energy. Biochemistry 1990; 29(10):2532-2537.
6. Lindahl T, Nyberg B. Heat-induced deamination of cytosine residues in deoxyribonucleic acid. Biochemistry 1974; 13(16):3405-3410.
7. Lindahl T. DNA glycosylases, endonucleases for apurinic/apyrimidinic sites and base excision-repair. Prog Nucleic Acid Res Mol Biol 1979; 22:135-192.
8. Warner HR, Duncan BK, Garrett C et al. Synthesis and metabolism of uracil-containing deoxyribonucleic acid in Escherichia coli. J Bacteriol 1981; 145(2):687-695.
9. Driscoll DM, Wynne JK, Wallis SC et al. An in vitro system for the editing of apolipoprotein B mRNA. Cell 1989; 58(3):519-525.
10. Yoshikawa K, Okazaki IM, Eto T et al. AID enzyme-induced hypermutation in an actively transcribed gene in fibroblasts. Science 2002; 296(5575):2033-2036.
11. Okazaki IM, Kinoshita K, Muramatsu M et al. The AID enzyme induces class switch recombination in fibroblasts. Nature 2002; 416(6878):340-345.
12. Goff SP. Death by deamination: a novel host restriction system for HIV-1. Cell 2003; 114(3):281-283.
13. Bredt DS. Endogenous nitric oxide synthesis: biological functions and pathophysiology. Free Radic Res 1999; 31(6):577-596.
14. Routledge MN. Mutations induced by reactive nitrogen oxide species in the supF forward mutation assay. Mutat Res 2000; 450(1-2):95-105.
15. Wink DA, Kasprzak KS, Maragos CM et al. DNA deaminating ability and genotoxicity of nitric oxide and its progenitors. Science 1991; 254(5034):1001-1003.
16. Lindahl T. An N-glycosidase from Escherichia coli that releases free uracil from DNA containing deaminated cytosine residues. Proc Natl Acad Sci USA 1974; 71(9):3649-3653.
17. Lindahl T, Ljungquist S, Siegert W et al. DNA N-glycosidases: properties of uracil-DNA glycosidase from Escherichia coli. J Biol Chem 1977; 252(10):3286-3294.
18. Aravind L, Koonin EV. The alpha/beta fold uracil DNA glycosylases: a common origin with diverse fates. Genome Biol 2000; 1(4):RESEARCH0007.
19. Pearl LH. Structure and function in the uracil-DNA glycosylase superfamily. Mutat Res 2000; 460(3-4):165-181.
20. Kavli B, Sundheim O, Akbari M et al. hUNG2 is the major repair enzyme for removal of uracil from U:A matches, U:G mismatches and U in single-stranded DNA, with hSMUG1 as a broad specificity backup. J Biol Chem 2002; 277(42):39926-39936.
21. Neddermann P, Jiricny J. The purification of a mismatch-specific thymine-DNA glycosylase from HeLa cells. J Biol Chem 1993; 268(28):21218-21224.

22. Saparbaev M, Laval J. 3,N4-ethenocytosine, a highly mutagenic adduct, is a primary substrate for Escherichia coli double-stranded uracil-DNA glycosylase and human mismatch-specific thymine-DNA glycosylase. Proc Natl Acad Sci USA 1998; 95(15):8508-8513.
23. Nilsen H, Haushalter KA, Robins P et al. Excision of deaminated cytosine from the vertebrate genome: role of the SMUG1 uracil-DNA glycosylase. EMBO J 2001; 20(15):4278-4286.
24. Wibley JE, Waters TR, Haushalter K et al. Structure and specificity of the vertebrate anti-mutator uracil-DNA glycosylase SMUG1. Mol Cell 2003; 11(6):1647-1659.
25. Krokan HE, Drablos F, Slupphaug G. Uracil in DNA - occurrence, consequences and repair. Oncogene 2002; 21(58):8935-8948.
26. Boorstein RJ, Cummings A, Jr, Marenstein DR et al. Definitive identification of mammalian 5-hydroxymethyluracil DNA N-glycosylase activity as SMUG1. J Biol Chem 2001; 276(45):41991-41997.
27. Hendrich B, Hardeland U, Ng HH et al. The thymine glycosylase MBD4 can bind to the product of deamination at methylated CpG sites. Nature 1999; 401(6750):301-304.
28. Sartori AA, Jiricny J. Enzymology of base excision repair in the hyperthermophilic archaeon Pyrobaculum aerophilum. J Biol Chem 2003; 278(27):24563-24576.
29. Conticello SG. The AID/APOBEC family of nucleic acid mutators. Genome Biol 2008; 9(6):229.
30. Rogozin IB, Basu MK, Jordan IK et al. APOBEC4, a new member of the AID/APOBEC family of polynucleotide (deoxy)cytidine deaminases predicted by computational analysis. Cell Cycle 2005; 4(9):1281-1285.
31. Franca R, Spadari S, Maga G. APOBEC deaminases as cellular antiviral factors: a novel natural host defense mechanism. Med Sci Monit 2006; 12(5):RA92-98.
32. Neuberger MS, Harris RS, Di Noia J et al. Immunity through DNA deamination. Trends Biochem Sci 2003; 28(6):305-312.
33. Anant S, MacGinnitie AJ, Davidson NO. apobec-1, the catalytic subunit of the mammalian apolipoprotein B mRNA editing enzyme, is a novel RNA-binding protein. J Biol Chem 1995; 270(24):14762-14767.
34. Turelli P, Trono D. Editing at the crossroad of innate and adaptive immunity. Science 2005; 307(5712):1061-1065.
35. Janeway C, Travers P, Walport M et al. Immunobiology. 5th ed. London: Garland Publishing; 2001.
36. Honjo T, Alt FW, Neuberger M, eds. Molecular Biology of B-cells. London, UK: Elsevier Academic Press; 2004.
37. Goldsby R, Kindt T, Osborne B et al. Immunology. 5th ed. New York, NY: W.H. Freeman and Company; 2003.
38. Papavasiliou FN, Schatz DG. Somatic hypermutation of immunoglobulin genes: merging mechanisms for genetic diversity. Cell 2002; 109(Suppl):S35-44.
39. Berek C, Milstein C. The dynamic nature of the antibody repertoire. Immunol Rev 1988; 105:5-26.
40. Neuberger MS, Milstein C. Somatic hypermutation. Curr Opin Immunol 1995; 7(2):248-254.
41. Storb U. The molecular basis of somatic hypermutation of immunoglobulin genes. Curr Opin Immunol 1996; 8(2):206-214.
42. Muramatsu M, Sankaranand VS, Anant S et al. Specific expression of activation-induced cytidine deaminase (AID), a novel member of the RNA-editing deaminase family in germinal center B-cells. J Biol Chem 1999; 274(26):18470-18476.
43. Muramatsu M, Kinoshita K, Fagarasan S et al. Class switch recombination and hypermutation require activation-induced cytidine deaminase (AID), a potential RNA editing enzyme. Cell 2000; 102(5):553-563.
44. Durandy A, Revy P, Imai K et al. Hyper-immunoglobulin M syndromes caused by intrinsic B-lymphocyte defects. Immunol Rev 2005; 203:67-79.
45. Revy P, Muto T, Levy Y et al. Activation-induced cytidine deaminase (AID) deficiency causes the autosomal recessive form of the hyper-IgM syndrome (HIGM2). Cell 2000; 102(5):565-575.
46. Arakawa H, Hauschild J, Buerstedde JM. Requirement of the activation-induced deaminase (AID) gene for immunoglobulin gene conversion. Science 2002; 295(5558):1301-1306.
47. Harris RS, Sale JE, Petersen-Mahrt SK et al. AID is essential for immunoglobulin V gene conversion in a cultured B-cell line. Curr Biol 2002; 12(5):435-438.
48. Nilsen H, Rosewell I, Robins P et al. Uracil-DNA glycosylase (UNG)-deficient mice reveal a primary role of the enzyme during DNA replication. Mol Cell 2000; 5(6):1059-1065.
49. Kavli B, Andersen S, Otterlei M et al. B-cells from hyper-IgM patients carrying UNG mutations lack ability to remove uracil from ssDNA and have elevated genomic uracil. J Exp Med 2005; 201(12):2011-2021.
50. Nilsen H, Stamp G, Andersen S et al. Gene-targeted mice lacking the Ung uracil-DNA glycosylase develop B-cell lymphomas. Oncogene 2003; 22(35):5381-5386.

51. Petersen-Mahrt SK, Harris RS, Neuberger MS. AID mutates E. coli suggesting a DNA deamination mechanism for antibody diversification. Nature 2002; 418(6893):99-103.
52. Rada C, Williams GT, Nilsen H et al. Immunoglobulin isotype switching is inhibited and somatic hypermutation perturbed in UNG-deficient mice. Curr Biol 2002; 12(20):1748-1755.
53. Wang Z, Mosbaugh DW. Uracil-DNA glycosylase inhibitor gene of bacteriophage PBS2 encodes a binding protein specific for uracil-DNA glycosylase. J Biol Chem 1989; 264(2):1163-1171.
54. Di Noia J, Neuberger MS. Altering the pathway of immunoglobulin hypermutation by inhibiting uracil-DNA glycosylase. Nature 2002; 419(6902):43-48.
55. Imai K, Slupphaug G, Lee WI et al. Human uracil-DNA glycosylase deficiency associated with profoundly impaired immunoglobulin class-switch recombination. Nat Immunol 2003; 4(10):1023-1028.
56. Lee WI, Torgerson TR, Schumacher MJ et al. Molecular analysis of a large cohort of patients with the hyper immunoglobulin M (IgM) syndrome. Blood 2005; 105(5):1881-1890.
57. Begum NA, Kinoshita K, Kakazu N et al. Uracil DNA glycosylase activity is dispensable for immunoglobulin class switch. Science 2004; 305(5687):1160-1163.
58. Stivers JT. Comment on “Uracil DNA glycosylase activity is dispensable for immunoglobulin class switch”. Science 2004; 306(5704):2042; author reply 2042.
59. Di Noia JM, Williams GT, Chan DT et al. Dependence of antibody gene diversification on uracil excision. J Exp Med 2007; 204(13):3209-3219.
60. Rada C, Di Noia JM, Neuberger MS. Mismatch recognition and uracil excision provide complementary paths to both Ig switching and the A/T-focused phase of somatic mutation. Mol Cell 2004; 16(2):163-171.
61. Shen HM, Tanaka A, Bozek G et al. Somatic hypermutation and class switch recombination in Msh6(-/-) Ung(-/-) double-knockout mice. J Immunol 2006; 177(8):5386-5392.
62. Xue K, Rada C, Neuberger MS. The in vivo pattern of AID targeting to immunoglobulin switch regions deduced from mutation spectra in msh2-/- ung-/- mice. J Exp Med 2006; 203(9):2085-2094.
63. Modrich P. Mechanisms in eukaryotic mismatch repair. J Biol Chem 2006; 281(41):30305-30309.
64. Bransteitter R, Pham P, Scharff MD et al. Activation-induced cytidine deaminase deaminates deoxycytidine on single-stranded DNA but requires the action of RNase. Proc Natl Acad Sci USA 2003; 100(7):4102-4107.
65. Chaudhuri J, Tian M, Khuong C et al. Transcription-targeted DNA deamination by the AID antibody diversification enzyme. Nature 2003; 422(6933):726-730.
66. Dickerson SK, Market E, Besmer E et al. AID mediates hypermutation by deaminating single stranded DNA. J Exp Med 2003; 197(10):1291-1296.
67. Sohail A, Klapacz J, Samaranayake M et al. Human activation-induced cytidine deaminase causes transcription-dependent, strand-biased C to U deaminations. Nucleic Acids Res 2003; 31(12):2990-2994.
68. Beale RC, Petersen-Mahrt SK, Watt IN et al. Comparison of the differential context-dependence of DNA deamination by APOBEC enzymes: correlation with mutation spectra in vivo. J Mol Biol 2004; 337(3):585-596.
69. Navaratnam N, Morrison JR, Bhattacharya S et al. The p27 catalytic subunit of the apolipoprotein B mRNA editing enzyme is a cytidine deaminase. J Biol Chem 1993; 268(28):20709-20712.
70. Bransteitter R, Pham P, Calabrese P et al. Biochemical analysis of hypermutational targeting by wild type and mutant activation-induced cytidine deaminase. J Biol Chem 2004; 279(49):51612-51621.
71. Pham P, Bransteitter R, Petruska J et al. Processive AID-catalysed cytosine deamination on single-stranded DNA simulates somatic hypermutation. Nature 2003; 424(6944):103-107.
72. Blount BC, Ames BN. Development of a sensitive assay for detection of uracil in DNA. Adv Exp Med Biol 1993; 338:741-744.
73. Mashiyama ST, Courtemanche C, Elson-Schwab I et al. Uracil in DNA, determined by an improved assay, is increased when deoxynucleosides are added to folate-deficient cultured human lymphocytes. Anal Biochem 2004; 330(1):58-69.
74. Roberts KP, Sobrino JA, Payton J et al. Determination of apurinic/apyrimidinic lesions in DNA with high-performance liquid chromatography and tandem mass spectrometry. Chem Res Toxicol 2006; 19(2):300-309.
75. Martomo SA, Fu D, Yang WW et al. Deoxyuridine is generated preferentially in the nontranscribed strand of DNA from cells expressing activation-induced cytidine deaminase. J Immunol 2005; 174(12):7787-7791.
76. Andersen S, Heine T, Sneve R et al. Incorporation of dUMP into DNA is a major source of spontaneous DNA damage, while excision of uracil is not required for cytotoxicity of fluoropyrimidines in mouse embryonic fibroblasts. Carcinogenesis 2005; 26(3):547-555.
77. Kubo K, Ide H, Wallace SS et al. A novel, sensitive and specific assay for abasic sites, the most commonly produced DNA lesion. Biochemistry 1992; 31(14):3703-3708.

78. Lari SU, Chen CY, Vertessy BG et al. Quantitative determination of uracil residues in Escherichia coli DNA: contribution of ung, dug and dut genes to uracil avoidance. DNA Repair (Amst) 2006; 5(12):1407-1420.
79. Cabelof DC, Nakamura J, Heydari AR. A sensitive biochemical assay for the detection of uracil. Environ Mol Mutagen 2006; 47(1):31-37.
80. Okazaki IM, Kotani A, Honjo T. Role of AID in tumorigenesis. Adv Immunol 2007; 94:245-273.
81. Liu M, Duke JL, Richter DJ et al. Two levels of protection for the B-cell genome during somatic hypermutation. Nature 2008; 451(7180):841-845.
82. Samaranayake M, Bujnicki JM, Carpenter M et al. Evaluation of molecular models for the affinity maturation of antibodies: roles of cytosine deamination by AID and DNA repair. Chem Rev 2006; 106(2):700-719.
83. Rogozin IB, Iyer LM, Liang L et al. Evolution and diversification of lamprey antigen receptors: evidence for involvement of an AID-APOBEC family cytosine deaminase. Nat Immunol 2007; 8(6):647-656.

Chapter 12

Enzymatic Formation of the Hypermodified DNA Base J (β-D-Glucopyranosyloxymethyluracil)

Robert Sabatini,* Laura Cliffe, Saara Vainio and Piet Borst

Abstract

Base J (β-D-glucopyranosyloxymethyluracil) is the only hypermodified DNA base known in eukaryotes. It is present in the nuclear DNA of all flagellated protozoa of the order Kinetoplastida and in the closely related unicellular alga Euglena gracilis. Base J is a minor constituent of DNA, replacing at most 1% of thymidines, and it is mainly present in repetitive sequences, invariably including the telomeric repeats. The synthesis of the base involves two enzymatic steps: hydroxylation of a thymidine residue in DNA, producing HOMedU in DNA as a free intermediate, followed by addition of the glucose moiety. The enzymes involved in J biosynthesis, thymidine hydroxylase and glucosyl transferase, represent novel enzymes. Base J was originally identified in Trypanosoma brucei based on its developmentally regulated synthesis and localization correlating with the silencing of telomeric surface antigen genes of this deadly human parasite. In this chapter, we will focus primarily on T. brucei, in which the majority of work on J has been carried out and a potential function for the modified base is evident. The early history of base J discovery and recent developments in our understanding of J biosynthesis and function have been reviewed in detail.1 This chapter highlights the methods to detect J and HOMeUra, our current knowledge of the regulation of base J synthesis during the parasite’s life cycle and our most recent attempts to define the elusive function of base J.

Introduction

Trypanosoma brucei is the causative agent of Human African Trypanosomiasis (sleeping sickness). The success of this group of organisms stems from their ability to persist in the bloodstream of the definitive host by a process called antigenic variation. Antigenic variation refers to the ability of trypanosomes to evade the mammalian host immune response by regularly changing the trypanosome variant surface glycoprotein (VSG) coat (reviewed in ref. 2). From their pool of over 1000 VSG genes, trypanosomes express only one VSG gene at a time. Monoallelic expression of these genes is achieved through regulated transcription from specialized telomeric units termed expression sites. There are around 20 expression sites, of which only one is active at a time while the remainder are silenced (Fig. 1). It was the study of these silent expression sites that gave the first clue that a novel form of DNA modification was present in kinetoplastid flagellates, such as trypanosomes. Attempts to digest VSG genes from silent expression sites with restriction endonucleases yielded only partial digests.3 In contrast, the VSG gene in the active expression site was readily digested, strongly suggesting that the DNA of silent expression sites was modified. The search for the modified nucleotide led to the discovery of base J in trypanosome DNA (Fig. 2).4

In this chapter we will first describe how J was initially detected and the techniques we currently use to quantitate and localize this modified base in genomic DNA. Then we will discuss the J-biosynthesis pathway and the analysis of the two key enzymes involved. Finally, we will briefly describe how the analysis of J-biosynthetic enzymes will help to elucidate the biological function of base J, including the regulation of antigenic variation in African trypanosomes.

Figure 1. The localization of base J in silent and active telomeric VSG expression sites of T. brucei. The presence of J was determined by immunoprecipitation of J-containing DNA fragments using an antibody against the modified base, followed by dot blotting of the precipitated DNA fragments and hybridization with various probes.

*Corresponding Author: Robert Sabatini, University of Georgia, Department of Biochemistry and Molecular Biology, Athens, Georgia 30602-7229, USA. Email: [email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.

Detection of Base J

A standard nucleoside analysis of nuclear DNA of bloodstream form T. brucei yielded no candidate,5 indicating that the modified nucleoside was either not detectable by standard procedures or present below the detection limit of 0.1 mole %. However, the standard postlabelling procedure developed by Randerath and coworkers yielded two novel nucleotides, called pdV and pdJ, after separation of the ³²P-labeled 5ʹ-nucleotides on 2D-TLC.3 Figure 3A shows the kind of resolution that can be obtained with 2D-TLC. This figure also shows that pdJ is only present in DNA from bloodstream form T. brucei and not in insect form DNA. The pdV was later shown to be hydroxymethyldeoxyuridine 5ʹ-monophosphate6 and pdJ the glucosylated form of pdV.4

Figure 2. The biosynthesis of base J by a two-step modification of a specific thymine base in the DNA. Step one of the pathway involves the oxidation of a thymidine by a thymidine hydroxylase. The intermediate formed, HOMedU, is then glucosylated in step two, which results in the formation of base J. The dashed line represents the DNA backbone.

Figure 3. Methods for detecting base J. A) Detection of base J by 2D-TLC reveals that it is only present in bloodstream form trypanosomes. The autoradiograms show 2D-TLC separations of ³²P-postlabeled nucleotides derived from mini-chromosomal DNA of bloodstream- and procyclic-form T. brucei cell lines. The dashed arrow and solid arrow indicate pdV and pdJ, respectively. Reprinted, with permission, from Nucleic Acids Res 1991; 19(8):1745-1751 © 1991 by Nucleic Acids Research. B) The antibody that recognizes base J is highly sensitive and specific. The sensitivity of the antibody was determined by means of dilution series of T. brucei bloodstream form (BF) DNA (0.2 mole % J), procyclic-form (PC) DNA (no J) and bloodstream form DNA diluted in procyclic-form DNA. The specificity of the antibody was confirmed using DNA samples containing different DNA modifications: calf thymus (5-methylC), E. coli (6-methylA, 4-methylC, 5-methylC), phage φe (HOMeU), phage T2 (HOMeC, α-gluc- and β-gluc-α-gluc-HOMeC) and phage T4 (α- and β-gluc-HOMeC). Reprinted, with permission, from Genes Dev 1997; 11(23):3232-3241 © 1997 by Cold Spring Harbor Laboratory Press.

Although the 2D-TLC analysis remains an unambiguous and robust way to identify pdJ, the assay is not without its problems. The resolution is somewhat dependent on the batch of thin-layer plates; the sensitivity is limited; and the assay cannot be scaled up for simultaneous detection of J in many samples. Another problem is that the analysis gives no precise quantitative results, as shown by Van Leeuwen et al,7 who found that the presence of J in DNA interferes with the enzymatic digestion of DNA by micrococcal nuclease. Notably, this enzyme has problems cleaving the nucleotide bond 5ʹ of dJ. The longer DNA stretches remaining after digestion with micrococcal nuclease cannot then be completely digested by spleen phosphodiesterase. The result is that only about 50% of the pdJ is recovered7 and that the mole % of J reported in some of our early papers8,9 is underestimated by approximately 2-fold. With hindsight, the interference with cleavage by micrococcal nuclease can now be (speculatively) attributed to the formation of two H-bonds between the 2- and 3-hydroxyl groups of the glucose moiety of J and the nonbridging phosphoryl oxygen of the nucleotide 5ʹ of J (ref. 10 and see below).
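The approximately 2-fold underestimate follows directly from the recovery figure quoted above. As a minimal sketch, assuming that recovery can be treated as a single constant fraction (about 0.5), and using a function name and an input value that are purely illustrative and not taken from the original work, a postlabelling measurement could be corrected like this:

```python
def correct_for_recovery(measured_mole_percent: float,
                         recovery_fraction: float = 0.5) -> float:
    """Estimate the true mole % of J from a postlabelling measurement.

    Assumption: a constant fraction of pdJ is recovered after nuclease
    digestion (about 0.5 according to the text); real recovery may vary
    between samples.
    """
    if not 0.0 < recovery_fraction <= 1.0:
        raise ValueError("recovery_fraction must be in (0, 1]")
    return measured_mole_percent / recovery_fraction


# Hypothetical example: a measured value of 0.1 mole % J
corrected = correct_for_recovery(0.1)
print(f"corrected estimate: {corrected:.2f} mole % J")
```

The correction is only as good as the assumed recovery, which is why independent standards with known amounts of J matter for precise quantitation.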

The precise quantitation of the mole % of J in DNA would have been impossible without standards. These were provided by the group of Van Boom,11,12 who developed a method to synthesize pdJ and oligonucleotides containing pdJ, yielding DNA segments with known amounts of J for biochemical experiments. In the initial procedure some of the J was converted into amino-T during deprotection of the protected nucleotide used for oligonucleotide synthesis.13 This was rectified in a modified synthesis,14 which has been reproduced in another lab.10

The chemically synthesized pdJ was coupled to carrier protein and used to raise polyclonal antibodies in rabbits.15 These antisera detect pdJ with high specificity in DNA, as shown in Figure 3B. The only significant cross-reaction was found with DNA from bacteriophage T4, which contains β-glucosyl-hydroxymethylcytosine. The anti-pdJ antibodies can detect as little as 1 J in 10⁶ bases on dot blots,15 which is at least 100-fold more sensitive than the postlabelling analysis. A further bonus of these antibodies is that they can precipitate J-containing DNA, allowing an analysis of the distribution of J in DNA.15,16 It should be emphasized that the immunoprecipitation of J-DNA by these antibodies is somewhat variable and only provides semi-quantitative results.15-17 Unfortunately, attempts to generate monoclonal antibodies against pdJ have failed thus far.

The analysis of pdV (HOMedU-mononucleotide), the precursor of pdJ in DNA, has been more laborious. Until very recently the most effective way to detect low levels of HOMedU in DNA or oligonucleotides was digestion with the DNA glycosylase hSMUG1 (human single-strand-selective monofunctional uracil-DNA glycosylase), which is highly specific for DNA containing uracil or HOMeUra.18,19 Base excision generates an abasic site that is cleaved in alkali. This can be used as a sensitive oligonucleotide-based assay to detect formation of HOMeUra in DNA.
In contrast, J is resistant to all DNA glycosylases tested.19 Our attempts to raise antibodies against HOMedU-mononucleotide failed, but recently a commercial antibody was marketed by Abcam. We have confirmed that this detects the nucleotide in DNA in a dot blot assay with sensitivity comparable to detection of J in DNA with anti-J antiserum.

The generation of the specific J antisera has allowed the analysis of the presence of J among species, as well as detailed localization within the genome of T. brucei and related organisms. Phylogenetic analysis showed that J has been maintained in all kinetoplastids as well as in the closely related marine flagellate Diplonema and in Euglena.8,20 Analysis of the different life cycle stages of T. brucei confirmed that the insect form completely lacks base J.15 This developmental regulation appears to be unique to T. brucei, as two related kinetoplastids, T. cruzi and Leishmania, contain J in both the insect and mammalian life cycle stages.8,21 In T. brucei, J is found only in silent expression sites and not in the active one (Fig. 1), as already inferred from restriction enzyme digestion.15 Upon transcriptional activation of a silent site, J is lost from the site, but is maintained in the 50 bp repeats upstream of the promoter as well as in the telomere downstream of the VSG gene. The presence of J only in bloodstream form cells, as well as the correlation of its localization with silent but not active VSG expression sites, has led to the obvious hypothesis that J plays a role in the regulation of antigenic variation in T. brucei. This hypothesis, as discussed in the final section, can now be directly tested.

In addition to the telomeric and 50 bp repeats, J is found in a number of other repetitive sequences in T. brucei: 70 bp repeats within silent expression sites, 177 bp repeats in minichromosomes, the 5S RNA repeats and the mini-exon repeats.16 In this organism, around 50% of the total J is found in the telomeric repeats, while the distribution of J in other trypanosomatids is different. In Leishmania, 98% of the total J appears to be telomeric, as is the case in Crithidia fasciculata, whereas in T. cruzi, 75% of the total J is telomeric.21,22 The remaining 25% of the J in T. cruzi is associated with the subtelomeric regions.21 Interestingly, these regions contain members of the trans-sialidase gene family thought to be involved in host immune evasion and in cell invasion by T. cruzi.

The Two-Step Biosynthesis Pathway

J is synthesised in a two-step pathway (Fig. 2). The first step involves the oxidation of specific thymidine residues in DNA by a thymidine hydroxylase, which results in the formation of the intermediate base hydroxymethyluracil (HOMeUra). In the second step, this intermediate is converted into β-D-glucosyl-hydroxymethyluracil (base J) by addition of a glucose molecule by a glucosyl transferase (GT).

There are several lines of evidence supporting this pathway. Firstly, the specific localisation of J within the genome indicates that thymidine residues are modified in DNA, rather than synthesised and then incorporated during DNA replication. Secondly, as described above, the intermediate in J synthesis, HOMedU, is detectable in the DNA of bloodstream form trypanosomes by postlabelling and TLC analysis. Finally, the expression of the DNA glycosylase SMUG1 (see preceding section) is toxic in bloodstream but not insect stage trypanosomes.18 The resistance of the insect stage cells to the DNA cleavage following SMUG1 expression suggests that they are not capable of the first step of the J biosynthesis pathway. However, step 1 of the synthesis pathway can be bypassed by feeding the insect stage cells with HOMedU.23 This results in the synthesis of some J at random sites within the genome, implying that the GT is present in insect form cells.23 This indicates that the developmental regulation of J biosynthesis occurs primarily at the level of the thymidine hydroxylase enzyme. Furthermore, it suggests that the GT is nonspecific and is able to glucosylate HOMedU present anywhere in the genome in both insect and bloodstream form trypanosomes.

It is clear that both J and the J biosynthesis pathway have a number of unique features. The modified base most closely related to J is glucosyl-hydroxymethylC (glu-C) present in T-even bacteriophages.24 However, unlike base J, glu-C is synthesised at the nucleotide level and then incorporated into the DNA. Furthermore, the oxidation of thymidine residues by a thymidine hydroxylase enzyme is unusual. Thymine hydroxylase enzymes that oxidize the free base are known, but no thymidine hydroxylases that oxidize the base in DNA were known before we found them in kinetoplastids. The presence of a sugar transferase in the nucleus is also apparently unique.
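The stage-specific reasoning behind the two-step pathway can be summarized as a small truth table. The sketch below is only an illustrative boolean model: the class and function names are hypothetical, and the enzyme assignments per life cycle stage are the ones inferred in the text (thymidine hydroxylase activity absent from insect form cells, GT present in both stages). It says nothing about where in the genome J ends up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LifeCycleStage:
    name: str
    has_thymidine_hydroxylase: bool  # step 1: T -> HOMedU in DNA
    has_glucosyl_transferase: bool   # step 2: HOMedU -> J


def can_make_j(stage: LifeCycleStage, fed_homedu: bool = False) -> bool:
    """Return True if J is expected in DNA under this simplified model."""
    # HOMedU reaches the DNA either through step 1 or through feeding,
    # which bypasses step 1 (J then appears at random sites).
    homedu_in_dna = stage.has_thymidine_hydroxylase or fed_homedu
    return homedu_in_dna and stage.has_glucosyl_transferase


BLOODSTREAM = LifeCycleStage("bloodstream form", True, True)
INSECT = LifeCycleStage("insect form", False, True)

for stage in (BLOODSTREAM, INSECT):
    for fed in (False, True):
        print(stage.name, "| HOMedU fed:", fed,
              "| J made:", can_make_j(stage, fed))
```

In this model the only row without J is the unfed insect form, which matches the conclusion that developmental regulation acts at the hydroxylation step.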

Characterization of Two Distinct Thymidine Hydroxylases in J Biosynthesis

JBP1 and JBP2 stimulate J synthesis: For a long time, the search for J-synthesizing enzymes was frustratingly unsuccessful. However, during the search for J function, other experiments identified a protein binding to base J, therefore named J-binding protein (JBP).25 In silico screening led to the identification of JBP2, based on its homology to JBP1.25,26 JBP1 was purified from kinetoplastid nuclear extracts based on its ability to bind to J-containing oligonucleotides25 and its binding properties were verified and studied in detail in gel-shift experiments using recombinant JBP1 expressed in E. coli. These studies indicated that JBP1 binds specifically to J-containing duplex DNA with high affinity (40-140 nM).13 How JBP1 binds to J-DNA will be discussed below. Analysis of JBP2 indicates thus far that the protein is unable to bind J-DNA.

To determine the function of JBP1, both alleles of the gene were deleted from bloodstream form T. brucei. Quite unexpectedly, the loss of JBP1 resulted in a 20-fold reduction in J levels in the genome.17 This decrease in J was apparent in all sequences that normally contain J. Re-expression of the JBP1 gene in the JBP1-null cells resulted in a full rescue of wild-type J levels. While the disruption of JBP1 resulted in no other apparent defects, it was clear from these results that binding of JBP1 to J-DNA plays a crucial role in regulating the levels of J in the genome.26 This activity of JBP1 could reflect either the ability of the protein to stimulate increased synthesis of J or its ability to bind to J and so prevent its turnover. The latter option was tested by increasing J levels 10-fold by growing the JBP1-null cells in medium containing HOMedU, followed by a chase in the absence of HOMedU. In the absence of JBP1 the rate of J loss can be explained by simple dilution due to DNA replication.17 This shows that JBP1 binding does not protect base J from degradation, but rather has a role in stimulating J synthesis (i.e., catalytic).
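What "simple dilution due to DNA replication" predicts for the chase can be written down explicitly. The following is a minimal sketch under stated assumptions, not the analysis used in the cited work: it assumes no new J synthesis and no active removal, so that each round of replication halves the J content per genome, and it ignores the residual J level of JBP1-null cells. The function names and the starting value of 10 (relative units) are illustrative.

```python
import math


def j_level_after_chase(j_start: float, generations: float) -> float:
    """Relative J level after a chase, assuming pure replicative dilution.

    Assumptions: no new J synthesis, no active removal, and each round of
    DNA replication halves the J content per genome.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return j_start / (2.0 ** generations)


def generations_to_dilute(fold: float) -> float:
    """Number of replication rounds needed to dilute J by `fold`."""
    if fold < 1:
        raise ValueError("fold must be >= 1")
    return math.log2(fold)


# Hypothetical chase starting from a 10-fold elevated J level
for n in range(5):
    print(n, j_level_after_chase(10.0, n))
print(round(generations_to_dilute(10.0), 2))
```

A measured decay that tracks this halving curve is consistent with passive dilution, whereas a faster decay would point to active turnover of J.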
This conclusion is supported by the ratio of base J to JBP1 protein in kinetoplastid nuclei. Analysis of JBP1 concentrations in Trypanosoma, Leishmania and Crithidia indicates there are only 1.0-2.6 × 10³ molecules of JBP1 per cell, which is 30-60 fold less than the number of J residues (JBP1 binding sites) in the genome.27 JBP1 would clearly be unable to prevent J turnover when >95% of the J molecules in the genome are not bound to JBP1.
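The ">95% unbound" figure can be checked with one line of arithmetic. The sketch below assumes, for illustration only, that each JBP1 molecule occupies at most one J residue at a time; the function name is hypothetical, and the 30-fold and 60-fold ratios are the ones quoted in the text.

```python
def max_fraction_bound(j_to_jbp1_ratio: float) -> float:
    """Upper bound on the fraction of J residues occupied by JBP1.

    Assumption: each JBP1 molecule occupies at most one J residue.
    """
    if j_to_jbp1_ratio <= 0:
        raise ValueError("ratio must be positive")
    return min(1.0, 1.0 / j_to_jbp1_ratio)


for ratio in (30, 60):  # J residues per JBP1 molecule, from the text
    bound = max_fraction_bound(ratio)
    print(f"ratio {ratio}: at most {bound:.1%} bound, "
          f"at least {1 - bound:.1%} unbound")
```

Even at the lower ratio, fewer than 1 in 30 J residues could be occupied at any moment, which is why a purely protective role for JBP1 is implausible.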

Enzymatic Formation of the Hypermodiied DNA Base J (β-D-Glucopyranosyloxymethyluracil)

Studies with JBP1 thus already suggested that the protein, and by analogy also JBP2, participates in the regulation of J synthesis, and experiments with JBP2 confirmed and strengthened this hypothesis. In addition to the N-terminal region, which is homologous to JBP1, JBP2 contains a region with homology (24% identity, 45% similarity) to the SWI2/SNF2 family of ATPase/DNA helicase proteins involved in chromatin remodelling (Fig. 4).26 JBP2 localizes to the nucleus and interacts with chromatin independently of the presence of base J in DNA. In vivo analysis of T. brucei cells revealed that both JBP1 and JBP2 are developmentally regulated: they are expressed in bloodstream form, but undetectable in insect stage trypanosomes. Surprisingly, ectopic expression of JBP2 in insect stage trypanosomes, which normally lack base J, resulted in de novo site-specific synthesis of basal levels of J, while expression of JBP1 alone in these cells had no such effect.26 However, when insect stage cells expressing JBP2 were induced to co-express JBP1 (using a tetracycline-inducible expression system), the basal levels of J were further increased. These results lead to a model in which JBP2 is the key regulator of J synthesis, initiating site-specific de novo J synthesis in bloodstream form trypanosomes, and the role of JBP1 is to amplify and maintain the levels of the modified base (Fig. 5). Results from JBP2-null T. brucei support this model. The deletion of JBP2 from the bloodstream form trypanosome led to a 5-fold reduction in J.28 In addition, a telomere fragmentation assay confirmed the importance of JBP2 in de novo J synthesis. In WT cells, telomeric cleavage results in the growth of a new telomere that contains J. However, in a JBP2-null cell line, the new telomere lacks base J.28 The implications of this model will be discussed below in more detail (see section “Regulation of J Synthesis by Thymidine Hydroxylases”). Every organism that is known to contain J and has had its genome sequenced (i.e., T. brucei, T. cruzi and Leishmania) contains homologues of JBP2 and JBP1. This suggests that both proteins somehow work together to regulate J synthesis in any organism that contains the modified base.

Figure 4. Functional domains of JBP1 and JBP2. The region shared between JBP1 and JBP2 at the N-terminus is indicated by the hatched box. Within this region is the ∼70-amino-acid motif, indicated by the solid black box, which is related to the functional domain of the members of the Fe2+/2-oxoglutarate-dependent hydroxylase family. Indicated above and below the putative thymine hydroxylase motif for JBP2 and JBP1, respectively, is the amino acid signature shared among members of this hydroxylase family. The corresponding region of the bacterial AlkB motif is shown in the middle. The four key residues within this motif that have been implicated in catalysis by all members of this family of proteins are highlighted in bold. As discussed in the text, these residues have been directly implicated in JBP1 and JBP2 function in J biosynthesis in vivo. The solid black rectangle within the C-terminus of JBP1, labeled JBD, represents the putative 20-kDa minimal J-binding domain (see ref. 1). The solid black rectangle at the C-terminus of JBP2 represents the domain homologous to the SWI2/SNF2 family of ATPase/DNA helicases. TH, thymidine hydroxylase.

DNA and RNA Modification Enzymes

Figure 5. Proposed model for the regulation of J biosynthesis by the two thymidine hydroxylases JBP1 and JBP2. JBP2 is thought to recognize and bind specific chromatin domains (i.e., telomeric repeats), hydrolyze ATP and stimulate de novo hydroxylation of thymidine residues in DNA. The glucosyl transferase (GT) converts the HOMeUra intermediate (T-OH) into base J. JBP1 then binds base J and hydroxylates adjacent thymidines followed by the conversion into J by the GT enzyme. Reprinted, with permission, from the Annual Review of Microbiology, Volume 62 ©2008 by Annual Reviews www.annualreviews.org

JBP1 and 2 are thymidine hydroxylases: As mentioned above, JBP2 was identified in the T. brucei genome database based on homology at the N-terminus (34% identity, 45% similarity) to JBP1. Because of this homology, it was suggested that the proteins are directly involved in the synthesis of HOMedU and that their shared region might contain a thymidine hydroxylase domain.29 Upon close examination it was found that the conserved region shares (weak) homology with enzymes of the family of Fe2+- and 2-oxoglutarate-dependent dioxygenases (hydroxylases).29 These enzymes catalyze the oxidation of a wide variety of substrates using ferrous iron and 2-oxoglutarate as cofactors and molecular oxygen as cosubstrate.30 A characteristic member of this superfamily, the E. coli AlkB protein, is involved in DNA repair by catalyzing oxidative demethylation of the DNA base lesions 1-meA, 3-meC, 1-meG and 3-meT.30-34 (For more information consult the chapters by Falnes, Van den Born and Meza and by Roldán-Arjona and Ariza in this volume.) The oxidation of the methyl group of the damaged base by AlkB results in a hydroxymethyl moiety, which is spontaneously released as formaldehyde, regenerating the normal base (Fig. 6). One can easily imagine that the putative thymine hydroxylases involved in J biosynthesis could be related to members of this superfamily. The oxidation of the methyl moiety in 5-MedU (thymidine) that occurs during the formation of the nucleoside dJ intermediate, HOMedU, closely resembles the initial oxidation of the methyl group by AlkB during DNA repair (Fig. 6). However, in the hydroxylation of thymidine residues during J biosynthesis, the hydroxymethyl moiety is stably linked to carbon 5 of the base rather than to a ring nitrogen and is not spontaneously released. Indeed, the conversion of free thymine into HOMeUra in fungi also involves dioxygenases using Fe2+ and 2-oxoglutarate as cofactors (reviewed in refs. 30 and 34). 
Enzymes in the AlkB superfamily have a β-strand fold that contains a highly conserved motif consisting of four amino acids that bind Fe2+ and 2-oxoglutarate.31,34 These four amino acids are conserved in appropriate positions in the N-terminal region of JBP1 and JBP2 and are essential for JBP1 function:29 replacement of any of the four conserved residues with alanine or serine abolishes the ability of JBP1 to stimulate J synthesis. This is not due to an inability of the mutant JBP1 to enter the nucleus or to bind J-DNA.29 Recently, we have verified that these residues are also critical for the function of JBP2 (unpublished results). In order to confirm their role in catalyzing the first step of J biosynthesis in vivo, we have generated a bloodstream form T. brucei cell line that lacks both JBP1 and JBP2. While the individual JBP1-null and JBP2-null trypanosomes have reduced J levels (20- and 5-fold, respectively), the JBP1/JBP2-double-null trypanosome completely lacks

Figure 6. Comparison of the catalytic mechanism of AlkB and the proposed mechanism of the thymidine hydroxylases involved in J synthesis.

base J (L. Cliffe, R. Kieft and R. Sabatini, unpublished data). These cells still contain GT activity because HOMedU feeding leads to high levels of J. These results fully support the identification of JBP1 and JBP2 as Fe2+/2-oxoglutarate-dependent hydroxylases that catalyze the hydroxylation of thymidine in DNA to yield HOMedU. However, thus far we have failed to demonstrate hydroxylase activity of either JBP1 or JBP2 in vitro.29,35

Identification of the Glucosyl Transferase

The culturing of insect stage T. brucei in media containing HOMedU results in the synthesis of base J.23 HOMedUMP is incorporated randomly into the DNA, where it is then glucosylated to form base J, and similar results have been obtained with bloodstream form cells.23 Therefore, the GT functions regardless of the DNA sequence context of the HOMedU substrate and the enzyme is expressed in both bloodstream and insect form trypanosomes. However, other than this initial characterization of GT activity in vivo, no progress has been made regarding the isolation of the enzyme or the cloning of the gene. So far all attempts to identify the GT have remained fruitless; neither efforts to pull out the enzyme from nuclear extracts using HOMedU oligos as a substrate nor data-mining trials have been successful. The GT is unique in that it operates in the nucleus, whereas all other known GTs are cytoplasmic. It might therefore require unusual glucose donors or cofactors for function.

Regulation of J Synthesis by Thymidine Hydroxylases

As discussed above, JBP1 and JBP2 are thought to represent two distinct thymidine hydroxylases. This raises an interesting question: why are two thymidine hydroxylases required for J synthesis? The difference in substrate specificity between the two proteins implies a unique role for each in the biosynthesis pathway. JBP1 binds directly to the modified base in DNA whereas

JBP2 interacts with chromatin independently of J.26 We believe that these proteins work together to regulate J synthesis, ensuring the proper localization, levels and maintenance of J. JBP2 is not a true JBP in that it does not bind J-DNA, but its ability to bind chromatin in the absence of J is critical for the overall regulation of J biosynthesis. Somehow, JBP2 recognizes and binds to specific regions of the chromosome (i.e., telomeric), bringing the thymidine hydroxylase domain to DNA. One idea is that this interaction is via the SWI2/SNF2 domain of JBP2, as mutation of key residues within the ATPase region of the SWI2/SNF2 domain abolishes JBP2 function.26 However, whether ATP hydrolysis is required for JBP2 to bind to or remodel chromatin structure to optimize thymidine hydroxylase activity is unknown. How JBP2 recognizes and binds chromatin is also unknown (i.e., whether this recognition is at the level of sequence or structure of the DNA substrate). Elucidating how JBP1 binds J-DNA is essential to fully understand how the protein regulates J synthesis. Gel shift assays revealed that JBP1 bound J only when presented in the context of a double-stranded DNA molecule;13 JBP1 did not bind single-stranded J-DNA or free J-mononucleotide. Furthermore, optimal binding requires one helical turn of duplex DNA (B form). 
The nature of the helix appears to be crucial, as JBP1 cannot bind J when presented in an RNA/DNA duplex (A form).13 JBP1 does not make any sequence-specific contacts with the bases surrounding the modified base and can recognise J when presented in any sequence context.13,36 However, in vitro analysis showed a bias for JBP1 to bind J when presented in a telomeric repeat, correlating with high J levels in the telomeres in vivo.13 DNA footprinting techniques indicate that the only critical interactions between JBP1 and J-DNA occur via minor and major groove interactions at base J and a sequence-independent major groove contact at the nucleotide immediately 5ʹ of the base (J-1 position).36 It appears that this J-1 nucleotide is essential for the proper orientation of the glucose moiety of base J. Analysis of JBP1 binding to various modified DNA substrates indicates that the phosphoryl oxygen of the nucleotide at position J-1 locks the glucose moiety into an ‘edge-on’ conformation necessary for optimal JBP1 binding.10 This shows that the presentation/orientation of the glucose moiety in J-DNA also plays a critical role in the JBP1/J-DNA interaction. All current evidence strongly suggests that DNA structure is an essential component of the JBP1-J interaction. We believe that the structure of the DNA in vivo might therefore bias J propagation/maintenance by JBP1 in certain regions of the genome (i.e., telomeric and sub-telomeric repeats of T. brucei). Although initial studies suggested that JBP1 cannot bind unmodified DNA, more recent analysis using a more sensitive fluorescence anisotropy approach indicated that JBP1 can actually do so, but that this interaction occurs with 100-fold lower affinity than binding to J-DNA.10 This nonspecific interaction of JBP1 with unmodified DNA may explain why overexpression of JBP1, in the absence of endogenous JBP2 and JBP1 expression, leads to nonspecifically localized J synthesis (R. Sabatini and P. Borst, unpublished results). 
However, in a wild-type cell, JBP2 provides specific basal J-DNA for high-affinity JBP1 binding, directing the localization of JBP1-stimulated J synthesis. The presence of 30- to 60-fold more JBP1 binding sites than JBP1 molecules in the cell27 then further acts to restrict JBP1 function to specific regions within the genome. The telomere fragmentation data obtained with bloodstream form JBP2-null T. brucei cells support this idea. In this cell line, telomere fragmentation results in the growth of new telomeric repeats, which were shown to lack base J, despite the presence of endogenous JBP1.28 Presumably, the remaining large number of high-affinity JBP1 binding sites precluded any nonspecific interactions with the newly generated (and J-less) telomeric array. How a telomeric VSG gene expression site (ES) loses J when activated and regains it on silencing remains to be determined. It is probable that there is no active removal of J from the silent ES when it is activated. We have studied conditions in which J disappears from DNA, e.g., in the transition from bloodstream form to insect form trypanosome and in the loss of excess J from trypanosomes cultured in HOMedU (reviewed in ref. 1), and in all cases J is lost by simple dilution through replication of DNA in the multiplying trypanosomes. We therefore expect the J in ESs to be lost by dilution as well when a silent ES is activated. Both JBP1 and JBP2 contribute to the modification

of silent ESs, as some J is retained in the absence of JBP1.17 Once the ES is activated, J is absent from the ∼25-kb polymerase I transcription unit, but is still present in the flanking repetitive DNA sequences (50-bp and telomeric repeats). A possibility, raised 25 years ago (see ref. 1), is that active transcription interferes with modification. JBP1 fails to bind to single-stranded J-DNA or to a J-DNA/RNA duplex,13 suggesting that the highly processive Pol I transcription of the active ES might interfere with the J synthesis machinery (or minimally the thymidine hydroxylases). If JBP1 and JBP2 are unable to access the DNA, J will be diluted out following replication.17,23 The model of the regulation of J synthesis shown in Figure 5 is rather simplistic, indicating distinct functions of the two thymidine hydroxylases in de novo versus maintenance synthesis of J. However, we now have data suggesting that the story is not so straightforward. For example, T. brucei cell lines that are null for JBP1 show a 20-fold reduction in the levels of J.17 The remaining 5% of the steady-state J is presumably due to JBP2 function. However, in a JBP2-null cell, much more than 5% of the steady-state J is lost.28 The simplest explanation for this discrepancy is that JBP1 is quantitatively the main thymidine hydroxylase, but that it cannot maintain HOMedU in certain poorly accessible chromatin locations that require JBP2 for opening up. It also remains possible that JBP2 is more active in the presence of JBP1.
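The discrepancy described above becomes explicit once the fold reductions are converted into percentages. The following is a minimal sketch with hypothetical names; it uses only the 20-fold and 5-fold reductions quoted in the text and assumes, for the sake of the comparison, that the two enzymes would contribute independently and additively.

```python
# Illustrative arithmetic only. The names are assumptions; the fold
# reductions (20-fold for JBP1-null, 5-fold for JBP2-null) are from the text.

def percent_remaining(fold_reduction: float) -> float:
    """Percentage of wild-type J left after an n-fold reduction."""
    if fold_reduction <= 0:
        raise ValueError("fold_reduction must be positive")
    return 100.0 / fold_reduction


jbp1_null_remaining = percent_remaining(20)  # J left without JBP1
jbp2_null_remaining = percent_remaining(5)   # J left without JBP2

# Additive assumption: the J left in the JBP1-null is everything JBP2
# contributes, so deleting JBP2 alone should remove only that share.
loss_expected_if_additive = jbp1_null_remaining
loss_observed_in_jbp2_null = 100.0 - jbp2_null_remaining

if __name__ == "__main__":
    print(f"JBP1-null: {jbp1_null_remaining:.0f}% of wild-type J remains")
    print(f"JBP2-null: {jbp2_null_remaining:.0f}% of wild-type J remains")
    print(f"Loss expected in JBP2-null if additive: {loss_expected_if_additive:.0f}%")
    print(f"Loss observed in JBP2-null: {loss_observed_in_jbp2_null:.0f}%")
```

The gap between the expected and observed loss is the non-additivity that the text attributes either to JBP2 opening poorly accessible chromatin for JBP1 or to JBP2 being more active in the presence of JBP1.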

J in Leishmania

Whereas JBP1 is dispensable in T. brucei, it is essential in Leishmania tarentolae and L. major, the two Leishmania species studied thus far (refs. 35 and 37 and unpublished results). We therefore briefly summarize our present knowledge of J in Leishmania, emphasizing the differences between Leishmania and T. brucei. Whereas in T. brucei about half of all J is outside telomeres, this fraction is only about 1% in Leishmania species.22 In Leishmania, J is thus a telomeric modification. The fact that JBP1 is essential in Leishmania also suggests that J is essential, but we cannot rule out that JBP1 has an additional function unrelated to its role in J biosynthesis. We find this unlikely, however, as little JBP1 is sufficient for Leishmania survival. L. tarentolae contains about 1200 molecules of JBP1 per cell, 60-fold fewer than the number of J residues in Leishmania DNA.27 In attempts to generate a conditional KO of JBP1 in Leishmania, P-A. Genest35 constructed cell lines that produce only 10-15% of the wild-type level of JBP1 (ref. 35 and unpublished results). This resulted in a 2-fold decrease in J levels, but no growth phenotype. Apparently, 100-150 JBP1 molecules suffice to allow Leishmania to survive, albeit with reduced J levels. Unlike JBP1, JBP2 is not essential in Leishmania (S. Vainio, P-A. Genest and P. Borst, unpublished results). Remarkably, the JBP2-null cells gradually lose J as the cells are propagated. After 600-700 population doublings, the decrease stabilizes at an approximately 8-fold reduction and can be completely reversed to the wild-type level by introducing an ectopic JBP2 copy. These cells display normal morphology and grow at the same rate as wild-type cells. The slow drop and eventual stabilization of J levels in the JBP2-null Leishmania are not compatible with the Leishmania JBP2 being a constitutive de novo hydroxylase acting during every cell cycle, nor with the protein being needed only as the parasite undergoes a life cycle stage transition. 
Rather, the phenotype suggests that in the absence of JBP2, JBP1 is able to maintain the full J-modification pattern for a while, but over time most (80-90%) modifications are lost and cannot be recovered. Why this happens is at the moment unclear. A speculative interpretation is that parts of the telomeric chromatin become inaccessible to JBP1 and that the combined chromatin remodelling and thymidine hydroxylase activity of JBP2 is then needed to re-initiate J synthesis in those locations. Another interesting feature of the JBP2-null cells is that they are hypersensitive to bromodeoxyuridine (BrdU), a thymidine analogue that lowers the levels of J in Kinetoplastida by an unknown mechanism.23 The sensitivity of the JBP2-null cells to this compound correlates with the age (and thus J levels) of the cells. Although not as simple as one might hope, this synthetic lethal set-up offers a system that can be used to elucidate the function of base J in Leishmania. Thus far, our analysis has revealed a surprisingly normal landscape: the BrdU-treated JBP2-null cells do not stall at a specific stage of the cell cycle, do not exhibit loss of genome integrity and do not activate

DNA damage signalling or display gross telomeric abnormalities. But the search continues. J being a telomeric modification in Leishmania, the telomeres are at the heart of our research.
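The Leishmania figures quoted in this section can be put on a common scale with a short calculation. This is a rough sketch under stated assumptions: the names are hypothetical, the wild-type value of about 1200 JBP1 molecules per cell and the 10-15% residual expression are taken at face value from the text, and fold reductions are converted to percentages by simple division.

```python
# Rough conversion of the quoted Leishmania figures. Names are illustrative;
# the inputs (about 1200 molecules per cell, 10-15% residual JBP1, 2-fold and
# 8-fold J reductions) are the values given in the text.

WILD_TYPE_JBP1_PER_CELL = 1200  # approximate value for L. tarentolae


def residual_molecules(percent_of_wild_type: float) -> float:
    """JBP1 molecules per cell at a given percentage of wild-type expression."""
    return WILD_TYPE_JBP1_PER_CELL * percent_of_wild_type / 100


def percent_j_lost(fold_reduction: float) -> float:
    """Percentage of wild-type J lost after an n-fold reduction."""
    if fold_reduction <= 0:
        raise ValueError("fold_reduction must be positive")
    return 100.0 * (1.0 - 1.0 / fold_reduction)


if __name__ == "__main__":
    low, high = residual_molecules(10), residual_molecules(15)
    print(f"10-15% of wild type: about {low:.0f}-{high:.0f} JBP1 molecules per cell")
    print(f"2-fold reduction: {percent_j_lost(2):.1f}% of J lost")
    print(f"8-fold reduction: {percent_j_lost(8):.1f}% of J lost")
```

Taken at face value, these inputs give roughly 120-180 molecules per cell, the same order of magnitude as the 100-150 quoted above, and an 8-fold reduction corresponds to a loss that falls within the 80-90% range given for the JBP2-null cells.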

Conclusions and Future Prospects

Base J is the only hypermodified DNA base found in eukaryotes. Studies thus far have shown base J to be restricted to members of the order Kinetoplastida and to the closely related alga Euglena gracilis. Through analyzing the J biosynthetic pathway in kinetoplastids, we have identified two novel putative thymidine hydroxylase enzymes involved in the formation of base J. JBP2 is needed for de novo J synthesis, whereas JBP1 binds directly to the modified base and amplifies J synthesis, probably by hydroxylating adjacent thymine residues. Not only have these enzymes been critical in developing our understanding of how J synthesis is regulated, but they have also proven invaluable tools in addressing the long-standing question: ‘What is the function of base J?’ Recently, the Sabatini lab has generated a T. brucei cell line in which both JBP1 and JBP2 are deleted from the genome. The analysis of J levels confirmed that the resultant cell line completely lacks base J, demonstrating the requirement for both of these proteins in J biosynthesis. The generation of this J-null cell line has allowed us to start to analyze the function of base J. To our surprise, these data show that J plays no role in silencing of VSG expression sites, despite its colocalization with the silent but not the active ones. RT-PCR analysis indicates no derepression of the 19 silent ESs in the J-null cell line. However, our results indicate an increase in VSG switching rate and increased DNA rearrangements at the 70-bp repeats immediately upstream of the telomeric VSG genes (L. Cliffe, R. Kieft and R. Sabatini, unpublished data). This suggests that base J is involved in the regulation of homologous recombination of the telomeric VSG genes. Further work to characterise the nature and extent of the effects of J on recombination in T. brucei is underway. The function of J in other organisms is unknown. 
The differential localisation of J in various genomes (e.g., 98% telomeric in Leishmania versus largely internal in Euglena) implies that its function is not conserved among the J-containing species. Moreover, the generation of a viable J-null cell line in T. brucei demonstrates that the modified base is not essential in this organism. Yet in Leishmania, as discussed above, J is believed to be essential, given the inability to generate JBP1-null cell lines and the heightened sensitivity of JBP2-null cells to BrdU. Despite the recent advances in our knowledge of the J biosynthetic pathway, a number of key questions remain to be answered. One unknown is how JBP2 and JBP1 work together to regulate the genomic distribution of J. How does JBP2 recognize and bind specific regions of chromatin? We still do not know whether the JBP2-substrate interaction occurs at the level of DNA sequence or structure. What role does the SWI2/SNF2 domain play in this function? Determining the structure of JBP1 bound to J-DNA will help clarify the role of this thymidine hydroxylase in the regulation of J synthesis. Although we propose that both JBP1 and JBP2 have thymidine hydroxylase activity, we have not directly shown this using in vitro assays. The cofactor requirements will have to be determined once such an in vitro assay is available, allowing a critical test of the hypothesis that these enzymes are true members of the Fe2+/2-oxoglutarate hydroxylase family. Furthermore, we have not demonstrated that JBP2 has ATPase activity, although this can be inferred from the inactivation of JBP2 by mutations in the SWI2/SNF2 domain. The inability to purify recombinant JBP2 is a major hindrance to carrying out in vitro assays to screen for both thymidine hydroxylase and ATPase activity. 
A (high-throughput) hydroxylase assay would also benefit the attempts to find drug leads targeting J biosynthesis in Leishmania.1 Step two of the biosynthesis pathway, the glucosylation of HOMedU, has received relatively little attention in comparison to step one. Clearly, the identification of the glucosyl transferase would greatly enhance our understanding of the complete J biosynthetic pathway, but despite numerous efforts, the enzyme is yet to be found. Currently, we are trying to achieve this by means of an RNAi library screen. A link with other types of glucosylation reactions that

occur on queuosine derivatives at position 34 of tRNA, as described in the chapter by Iwata-Reuyl and de Crécy-Lagard in this volume, may be found. The future for base J research looks bright. Given the recent advances in the field, the coming years should allow us to make an in-depth analysis of the J biosynthetic pathway. Furthermore, we are now in a position where it will finally be possible to determine the function of this modified base.

References

1. Borst P, Sabatini R. Base J: Discovery, biosynthesis and possible functions. Annu Rev Microbiol 2008; 62:235-251.
2. Pays E. Regulation of antigen gene expression in Trypanosoma brucei. Trends Parasitol 2005; 21(11):517-520.
3. Gommers-Ampt J, Lutgerink J, Borst P. A novel DNA nucleotide in Trypanosoma brucei only present in the mammalian phase of the life-cycle. Nucleic Acids Res 1991; 19(8):1745-1751.
4. Gommers-Ampt JH, Van Leeuwen F, de Beer AL et al. Beta-D-glucosyl-hydroxymethyluracil: A novel modified base present in the DNA of the parasitic protozoan T. brucei. Cell 1993; 75(6):1129-1136.
5. Crozatier M, De Brij RJ, Den Engelse L et al. Nucleoside analysis of DNA from Trypanosoma brucei and Trypanosoma equiperdum. Mol Biochem Parasitol 1988; 31(2):127-131.
6. Gommers-Ampt JH, Teixeira AJ, van de Werken G et al. The identification of hydroxymethyluracil in DNA of Trypanosoma brucei. Nucleic Acids Res 1993; 21(9):2039-2043.
7. van Leeuwen F, de Kort M, van der Marel GA et al. The modified DNA base beta-D-glucosylhydroxymethyluracil confers resistance to micrococcal nuclease and is incompletely recovered by 32P-postlabeling. Anal Biochem 1998; 258(2):223-229.
8. van Leeuwen F, Taylor MC, Mondragon A et al. Beta-D-glucosyl-hydroxymethyluracil is a conserved DNA modification in kinetoplastid protozoans and is abundant in their telomeres. Proc Natl Acad Sci USA 1998; 95(5):2366-2371.
9. van Leeuwen F, Wijsman ER, Kuyl-Yeheskiely E et al. The telomeric GGGTTA repeats of Trypanosoma brucei contain the hypermodified base J in both strands. Nucleic Acids Res 1996; 24(13):2476-2482.
10. Grover RK, Pond SJ, Cui Q et al. O-glycoside orientation is an essential aspect of base J recognition by the kinetoplastid DNA-binding protein JBP1. Angew Chem Int Ed Engl 2007; 46(16):2839-2843.
11. Wade PA, Gegonne A, Jones PL et al. Mi-2 complex couples DNA methylation to chromatin remodelling and histone deacetylation. Nat Genet 1999; 23(1):62-66.
12. Wijsman ER, van den Berg O, Kuyl-Yeheskiely E et al. Synthesis of 5-(beta-D-glucopyranosyloxymethyl)-2ʹ-deoxyuridine and derivatives thereof. A modified D-nucleoside from the DNA of Trypanosoma brucei. Recl Trav Chim Pays-Bas 1994; 113:337-338.
13. Sabatini R, Meeuwenoord N, van Boom JH et al. Recognition of base J in duplex DNA by J-binding protein. J Biol Chem 2002; 277:958-966.
14. Turner JJ, Meeuwenoord N, Van Boom JH et al. Reinvestigation into the synthesis of oligonucleotides containing 5-(beta-D-glucopyranosyloxymethyl)-2ʹ-deoxyuridine. Eur J Org Chem 2003:3832-3839.
15. van Leeuwen F, Wijsman ER, Kieft R et al. Localization of the modified base J in telomeric VSG gene expression sites of Trypanosoma brucei. Genes Dev 1997; 11(23):3232-3241.
16. van Leeuwen F, Kieft R, Cross M et al. Tandemly repeated DNA is a target for the partial replacement of thymine by beta-D-glucosyl-hydroxymethyluracil in Trypanosoma brucei. Mol Biochem Parasitol 2000; 109:133-145.
17. Cross M, Kieft R, Sabatini R et al. J binding protein increases the level and retention of the unusual base J in trypanosome DNA. Mol Microbiol 2002; 46:37-47.
18. Ulbert S, Cross M, Boorstein R et al. Expression of the human DNA glycosylase hSMUG1 in Trypanosoma brucei causes DNA damage and interferes with J biosynthesis. Nucleic Acids Res 2002; 30(18):3919-3926.
19. Ulbert S, Eide L, Seeberg E et al. Base J, found in nuclear DNA of Trypanosoma brucei, is not a target for DNA glycosylases. DNA Repair 2004; 3(2):145-154.
20. Dooijes D, Chaves I, Kieft R et al. Base J originally found in kinetoplastid is also a minor constituent of nuclear DNA of Euglena gracilis. Nucleic Acids Res 2000; 28:3017-3021.
21. Ekanayake DK, Cipriano MJ, Sabatini R. Telomeric colocalization of the modified base J and contingency genes in the protozoan parasite Trypanosoma cruzi. Nucleic Acids Res 2007; 35(19):6367-6377.
22. Genest PA, ter Riet B, Cijsouw T et al. Telomeric localization of the modified DNA base J in the genome of the protozoan parasite Leishmania. Nucleic Acids Res 2007; 35(7):2116-2124.
23. van Leeuwen F, Kieft R, Cross M et al. Biosynthesis and function of the modified DNA base beta-D-glucosyl-hydroxymethyluracil in Trypanosoma brucei. Mol Cell Biol 1998; 18(10):5643-5651.
24. Gommers-Ampt JH, Borst P. Hypermodified bases in DNA. FASEB J 1995; 9(11):1034-1042.

25. Cross M, Kieft R, Sabatini R et al. The modified base J is the target for a novel DNA-binding protein in kinetoplastid protozoans. EMBO J 1999; 18(21):6573-6581.
26. DiPaolo C, Kieft R, Cross M et al. Regulation of trypanosome DNA glycosylation by a SWI2/SNF2-like protein. Mol Cell 2005; 17(3):441-451.
27. Toaldo CB, Kieft R, Dirks-Mulder A et al. A minor fraction of base J in kinetoplastid nuclear DNA is bound by the J-binding protein 1. Mol Biochem Parasitol 2005; 143(1):111-115.
28. Kieft R, Brand V, Ekanayake DK et al. JBP2, a SWI2/SNF2-like protein, regulates de novo telomeric DNA glycosylation in bloodstream form Trypanosoma brucei. Mol Biochem Parasitol 2007; 156(1):24-31.
29. Yu Z, Genest PA, ter Riet B et al. The protein that binds to DNA base J in trypanosomatids has features of a thymidine hydroxylase. Nucleic Acids Res 2007; 35(7):2107-2115.
30. Schofield CJ, Zhang Z. Structural and mechanistic studies on 2-oxoglutarate-dependent oxygenases and related enzymes. Curr Opin Struct Biol 1999; 9(6):722-731.
31. Aravind L, Koonin EV. The DNA-repair protein AlkB, EGL-9 and leprecan define new families of 2-oxoglutarate- and iron-dependent dioxygenases. Genome Biol 2001; 2(3):research 7.1-7.8.
32. Falnes PO, Bjoras M, Aas PA et al. Substrate specificities of bacterial and human AlkB proteins. Nucleic Acids Res 2004; 32(11):3456-3461.
33. Falnes PO, Johansen RF, Seeberg E. AlkB-mediated oxidative demethylation reverses DNA damage in Escherichia coli. Nature 2002; 419(6903):178-182.
34. Hausinger RP. Fe+2/alpha-ketoglutarate-dependent hydroxylases and related enzymes. Crit Rev Biochem Mol Biol 2004; 39(1):21-68.
35. Genest PA. Analysis of the modified DNA base J and the J-binding proteins in Leishmania. [Ph.D.]. Amsterdam, University of Amsterdam, 2007.
36. Sabatini R, Meeuwenoord N, van Boom JH et al. Site-specific interactions of JBP with base and sugar moieties in duplex J-DNA. J Biol Chem 2002; 277:28150-28156.
37. Genest PA, ter Riet B, Dumas C et al. Formation of linear inverted repeat amplicons following targeting of an essential gene in Leishmania. Nucleic Acids Res 2005; 33(5):1699-1709.

Chapter 13

DNA Demethylation

Teresa Roldán-Arjona* and Rafael R. Ariza

Abstract

Eukaryotic DNA methylation is performed by DNA-methyltransferases that catalyze transfer of a methyl group from S-adenosyl-L-methionine to carbon 5 of cytosine bases in DNA, giving rise to 5-methylcytosine (5-meC). Cytosine methylation is used as an epigenetic mark for maintenance of gene silencing across cellular divisions. However, this chemically stable modification may be removed from DNA through demethylation. DNA demethylation may take place as a passive process due to lack of maintenance methylation during several cycles of DNA replication, or as an active mechanism in the absence of replication. Extensive demethylation of the mammalian genome occurs in preimplantation embryos, first in the male pronucleus through an active mechanism independent of DNA replication and subsequently in both paternal and maternal chromosomes through a passive process. Localized demethylation at specific genes takes place later throughout development and tissue differentiation, and rapid cycles of DNA methylation and demethylation of CG dinucleotides at gene promoters have been recently reported. Despite many attempts to identify the mechanism responsible for active DNA demethylation in animal cells, its enzymatic basis remains controversial, although there is evidence for a role of thymine-DNA glycosylase after deamination of 5-meC to thymine. In plants, genetic and biochemical studies have revealed that the Arabidopsis DNA glycosylase domain-containing proteins DME and ROS1 initiate DNA demethylation. Both DME and ROS1 catalyze the release of 5-meC from DNA by a glycosylase/lyase mechanism, cleaving the phosphodiester backbone at the 5-meC removal site by successive β,δ-elimination and leaving a gap that has to be further processed to generate a 3ʹ-OH terminus suitable for polymerization and ligation. This repair-like pathway provides a mechanism to replace methylated cytosines with unmethylated cytosines.

*Corresponding Author: Teresa Roldán-Arjona, Departamento de Genética, Edificio Gregor Mendel, Campus de Rabanales s/n, Universidad de Córdoba, 14071-Córdoba, Spain. Email: [email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.

Introduction

DNA methylation is found in the genomes of diverse organisms including both prokaryotes and eukaryotes. In prokaryotes, DNA methylation occurs on both cytosine and adenine bases and is part of the host restriction system.1 However, only adenine methylation is used as an epigenetic signal in bacteria, regulating DNA-protein interactions.2 In multicellular eukaryotes methylation seems to be confined to cytosine bases and is associated with an inhibition of gene expression.3,4 Eukaryotic DNA methylation is detected in protists, fungi, plants and animals5 and plays important roles in the establishment of developmental programs6,7 and in genome defense against parasitic mobile elements.8 Hypermethylation of tumour suppressor genes is considered an important mechanism in the development of many common forms of cancer.9 DNA methylation is performed by DNA-methyltransferases that catalyze transfer of a methyl group from S-adenosyl-L-methionine to cytosine bases in DNA.10 Most mammalian and plant DNA methylation is restricted to symmetrical CG sequences, but plants also have significant levels of cytosine methylation in the symmetric context CHG (where H is A, C or T) and even in asymmetric sequences.4,11 DNA methylation patterns are established by de novo DNA methyltransferases acting on unmethylated double stranded DNA. Methylation in symmetrical sequences is preserved through cycles of DNA replication by maintenance DNA methyltransferases, which show a preference for hemimethylated substrates and methylate cytosines in the newly synthesized strand.10 Maintenance mechanisms for asymmetric methylation patterns are unknown, but they must include de novo methylation after each cell division.12

There are two general mechanisms by which DNA methylation inhibits gene expression: first, modification of cytosine bases can directly repress transcription by blocking transcriptional activators from binding to cognate DNA sequences;13 and second, proteins that recognize methylated DNA may recruit corepressors to silence gene expression.14-16 There is ample evidence that in vertebrates and plants methyl-CpG-binding proteins (MBPs) use transcriptional corepressor molecules to silence transcription and to modify surrounding chromatin, providing a link between DNA methylation and chromatin remodelling and modification.17-20

DNA methylation is a stable mark for the initiation, establishment and maintenance of gene silencing across cellular divisions, but resembles other macromolecular modifications in being reversible. Demethylation may take place as a passive process due to lack of maintenance methylation during several cycles of DNA replication, or as an active mechanism in the absence of replication.21 In contrast to the well studied genetics, biochemistry and biology of cytosine-DNA-methyltransferases, the enzymatic basis of active demethylation has remained elusive. In this chapter we will review the evidence available on DNA methylation changes in animals and plants and the efforts made to identify and characterize enzymatic mechanisms of active DNA demethylation. We will see that in animals these mechanisms are still poorly understood, while in plants there is strong genetic and biochemical evidence supporting a base excision process for active DNA demethylation.
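The arithmetic behind passive demethylation can be made explicit with a minimal sketch. It assumes semiconservative replication, a site that starts methylated on both strands, and no de novo methylation; the function name and the `maintenance_efficiency` parameter are hypothetical and are not taken from the studies cited here. Under these assumptions, each round of replication without maintenance methylation halves the fraction of strands that carry 5-meC.

```python
def methylated_strand_fraction(rounds, maintenance_efficiency=0.0):
    """Fraction of DNA strands carrying 5-meC at one site after `rounds` replications.

    Illustrative model only: the site starts methylated on both strands, and
    `maintenance_efficiency` (hypothetical, 0.0 to 1.0) is the probability that a
    newly synthesized strand is methylated when its template strand is methylated.
    """
    fraction = 1.0
    for _ in range(rounds):
        # Parental strands keep their mark; new strands acquire it only through
        # maintenance methylation on a methylated template.
        fraction = fraction * (1.0 + maintenance_efficiency) / 2.0
    return fraction


print(methylated_strand_fraction(3))        # no maintenance: 0.125
print(methylated_strand_fraction(3, 1.0))   # fully efficient maintenance: 1.0
```

In this idealized model the mark falls to one eighth of strands after three rounds without maintenance, whereas fully efficient maintenance keeps it constant. Any loss of methylation that is faster than this replication-dependent dilution, or that occurs without replication, would point to an active process.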

Changes in DNA Methylation Patterns in Animals

Although 5-meC is a stable epigenetic mark, DNA methylation patterns are dynamic during animal development, both at the global and local levels. One of the best examples of global changes in DNA methylation takes place in mammalian fertilized oocytes, where methylation is rapidly lost from the paternal genome before the onset of DNA replication.22,23 This active demethylation process is detectable both by immunofluorescence and bisulfite sequencing and does not affect the maternal genome, which is subsequently demethylated through a passive process during the cleavage stages. This implies that soon after fertilization paternal sequences are exposed to a putative demethylation machinery while maternal loci are somehow protected from this process.22 Following global demethylation of both parental genomes in preimplantation embryos, mammalian blastocysts undergo a de novo DNA methylation ultimately responsible for the methylation patterns characteristic of the adult animal.24 Genome-wide epigenetic reprogramming through active and/or passive DNA demethylation in zygotes is essential for normal development, as reflected by the difficulty of cloning mammals by somatic cell nuclear transfer (SCNT). In SCNT a differentiated somatic nucleus needs to be reprogrammed in an enucleated oocyte.25 DNA demethylation is absolutely necessary for the reprogramming of somatic cell nuclei,26 but is usually aberrant in mammalian cloned embryos and may contribute to the low efficiency of cloning.27 In fact, it has been shown that the methylation state of the donor nucleus is a major factor governing the efficiency of reprogramming after SCNT.28 Following global demethylation in zygotes, a second genome-wide demethylation wave takes place during germ-cell development.
This process has been particularly well studied in mouse embryos, where the primordial germ cells (PGCs) undergo a dramatic and apparently active demethylation process soon after they migrate to the developing gonads.29 This epigenetic reprogramming ensures the erasure of genomic imprints and is critical for returning PGCs to a totipotent state.30

In addition to genome-wide global demethylation, local demethylation occurs in tissue-specific genes throughout development and cellular differentiation.31 In fact, it has been proposed that the methylation pattern generated during development is regulated mainly through demethylation events.24 There is a large body of experimental evidence of local DNA demethylation required for the tissue-specific transcriptional activation of many genes. Some of the best data come from studies with differentiating muscle cells. Demethylation of the α-actin gene is essential to activate its transcription in myoblasts32 and there is a strong correlation between the temporal dynamics of demethylation of a single CG site of the 5ʹ-flanking region and myogenin expression during muscle differentiation.33 Local demethylation processes are also important for regulation of the immune response: the murine interleukin-2 gene promoter-enhancer region has been shown to be demethylated during T-cell activation34 and demethylation of a unique CG site in its human counterpart is a key memory mark in this regulatory event.35 The IFN-γ promoter is rapidly demethylated upon restimulation of memory T-cells, but not in naive cells.36 Transcriptional activation by nuclear receptors is also accompanied by methylation changes in the target genes. Thus, glucocorticoid hormones induce stable DNA demethylation within a key enhancer of the rat liver-specific tyrosine aminotransferase (Tat) gene.37 This demethylation is required for the recruitment of additional transcription factors and enhanced transcription activation of the gene. Recent evidence points towards an important role for changes in DNA methylation during memory-associated transcriptional regulation in the nervous system. Thus, demethylation and transcriptional activation of the synaptic plasticity gene reelin occur in the adult rat hippocampus during fear-conditioning and may be a key process during memory consolidation.38

Changes in DNA Methylation Patterns in Plants

In contrast to mammals, there is no evidence that plants undergo global genome demethylation after fertilization and remethylation during embryo development. On the contrary, there are hints that the DNA methylation status may be stably transmitted across generations. For example, a significantly reduced DNA methylation caused by the ddm1 mutation in Arabidopsis is not restored to normal levels even in a wild-type DDM1 background.39 The absence of a global DNA methylation resetting may have important consequences, given the characteristics of the plant life cycle. Unlike animals, plants do not set aside a dedicated germline early in development and therefore there is a chance that methylation changes in somatic cells are transmitted to the next generation. Thus, hypermethylated epialleles of the Arabidopsis SUPERMAN gene are stable across many generations40 and a naturally occurring hypermethylated version of the Lcyc gene in Linaria vulgaris is stably inherited.41

Although plants apparently do not experience genome-wide methylation changes, there is some evidence of modifications in DNA methylation during normal plant development. DNA methylation levels in tomato are lower in immature tissues than in mature tissues42 and a similar pattern is observed in young seedlings compared to adult rice plants.43 A trend towards increasing DNA methylation during plant development has also been reported in Arabidopsis.44 Local changes in DNA methylation during plant development will surely receive increased attention with the recent advent of whole genome methylation profiling techniques.45

There is also accumulating evidence of stress-induced changes in DNA methylation patterns. In maize, a significant decrease in DNA methylation is observed in roots upon cold treatment.46 This demethylation was genome-wide but not randomly distributed, and since chilled tissues immediately ceased DNA replication it has been suggested to be the result of an active process.46 A retrotransposon-like sequence, designated ZmMI1, was identified as a specifically demethylated region46 and this is in agreement with the cold-induced DNA demethylation of the Ac/Ds transposon region observed in the same species.47 Activation of transposons in response to stress is well documented in plants and could contribute to genome adaptation to environmental challenges.48 However, demethylation in response to stress is not limited to transposon sequences. In tobacco, a gene encoding a glycerophosphodiesterase-like protein (NtGPDL) and a pathogen-responsive gene (NtAlix1) are demethylated and expressed in response to aluminium stress49 and tobacco mosaic virus infection,50 respectively.

The most compelling evidence of alteration in DNA methylation patterns during plant development comes from imprinted genes. Imprinting is the differential expression of paternal and maternal alleles and has evolved both in placental mammals and flowering plants.51 Imprinting in plants affects the expression of genes in the endosperm.52 The paternal alleles of the Arabidopsis genes MEA, FIS2 and FWA are hypermethylated in the endosperm, whereas the maternal alleles are hypomethylated.53-55 In mammals, maternal-specific expression is achieved by paternal-specific methylation and silencing. In plants, however, methylation is the default state and the maternal expression is attained through the expression in the central cell of DEMETER (DME),55,56 which encodes a DNA glycosylase that excises 5-methylcytosine from DNA56,57 (see below).

The Search for an Enzymatic Mechanism of Active DNA Demethylation in Animal Cells

Despite many attempts to identify the mechanism responsible for active DNA demethylation in animal cells, its enzymatic basis remains controversial.21,58 Four major mechanisms have been proposed, according to the initial target of the process and the chemistry involved (Fig. 1). In the initial search for DNA demethylation mechanisms much attention focused on the identification of a “true” DNA demethylase, i.e., an enzymatic activity that directly removes the methyl group from 5-methylcytosine (Fig. 1A). The first report of an active demethylation mechanism in mammals described an activity that released tritium-labelled methyl-groups from DNA in murine erythroleukemic cells,59 but no further analysis has followed. A DNA demethylase activity that catalyzes the cleavage of a methyl residue from 5-methylcytosine and releases it as methanol was purified from human cells.60 The same group later proposed that this activity is identical to MBD2b (methyl CpG-binding domain protein 2b) after testing for demethylase activity following in vitro translation of mRNA derived from the corresponding cDNA.61 However, the demethylating activity of MBD2b could not be independently reproduced by other laboratories19,62 and its relationship to the activity originally described in human cell extracts remains unclear. Furthermore, paternal demethylation in fertilized oocytes lacking MBD2b takes place normally.63 Doubts have also arisen about the viability of the mechanism itself, which involves a thermodynamically unfavourable breakage of the carbon-carbon bond.64 A recent report suggests that the original methanol detection was an artifact and proposes instead that MBD2b catalyzes an oxidative DNA demethylation in which the methyl group is first oxidised to hydroxymethyl and then removed as formaldehyde.65 Other demethylation mechanisms requiring less-challenging chemistry have been described.
Studying rat myoblasts, Weiss et al66 proposed that demethylation takes place through the excision of the methylated CG dinucleotide (Fig. 1B) and suggested a role for RNA since the activity was sensitive to RNAse treatment. The activity was also reported in an independent study67 that reevaluated the role of RNA, suggesting that the inhibitory effects of RNAse were likely due to coating of the DNA substrate. No further characterization of this enzymatic mechanism has been described. Recently, Barreto et al68 reported that the protein Gadd45a (growth arrest and DNA-damage-inducible protein 45 alpha) has a key role in active DNA demethylation in conjunction with XPG, a nuclease that participates in nucleotide excision repair. These authors described that the expression of Gadd45a activates methylation-silenced reporter plasmids, promotes gene-specific and global demethylation and is required to avoid hypermethylation in cultured human cells.68 However, these results have been challenged by Jin et al,69 who argue that Gadd45a is not expressed in oocytes and zygotes, as would be expected from a demethylation factor. Jin et al were unable to substantiate a role of Gadd45a in DNA demethylation69 and cast doubt on the quantification of 5-meC in cells after knockdown of Gadd45a reported by Barreto et al.68 Therefore, the possible role of Gadd45a in DNA demethylation through nucleotide excision is uncertain.

A third proposed mechanism for active demethylation is removal of 5-meC as a free base followed by replacement with an unmodified cytosine70 (Fig. 1C). The initial step of this mechanism would be catalyzed by a DNA glycosylase that cleaves the labile N-glycosidic bond between the 5-meC base and the deoxyribose moiety in DNA, leaving an abasic site that must be further processed. DNA glycosylases initiate the base excision repair (BER) pathway, which in most organisms removes common base modifications (oxidation, deamination, alkylation) caused by endogenous genotoxic agents.71 A 5-methylcytosine-DNA glycosylase activity that releases 5-meC from DNA was identified and partially purified from human cells.72,73 A similar activity identified in chicken embryos74 was found to copurify with a protein homologous to human thymine DNA glycosylase (TDG).75,76 It was later reported that methyl CpG binding protein 4 (MBD4), another human DNA glycosylase with no sequence similarity to TDG, also has 5-methylcytosine-DNA glycosylase activity.77 Both TDG and MBD4 are DNA glycosylases that show a preference for U and T in U⋅G and T⋅G mismatches located within a CG context.78,79 However, they have been shown to have a very weak activity on 5-meC in 5-meC⋅G pairs compared to their activities towards U⋅G and T⋅G mismatches77,79,80 and hence their precise roles in DNA demethylation remain unclear.

Although the exact mechanism for DNA demethylation in mammals is still unknown, evidence in favour of a DNA repair-based process is accumulating. It has been reported that the demethylation upstream of the tyrosine aminotransferase (Tat) gene upon activation by the glucocorticoid receptor occurs independently of DNA replication and involves the generation of nicks in the DNA backbone 3ʹ to the 5-meC.81 In addition, DNA demethylation in mouse PGCs occurs before histone replacement, which supports a DNA-repair based mechanism.82

Figure 1. Proposed pathways for active DNA demethylation in animal cells. A) Direct removal of the methyl group (black circle), which is released as methanol. B) Excision of the methylated CpG dinucleotide and replacement with an unmethylated form. C) Excision of 5-methylcytosine by a 5-meC DNA glycosylase, followed by abasic site processing and replacement with unmethylated cytosine via the base excision repair (BER) machinery. D) Deamination of 5-methylcytosine to generate a T⋅G mismatch, excision of mismatched T by a thymine-DNA glycosylase, followed by abasic site processing and replacement with unmethylated cytosine via the BER machinery. (Adapted from Morgan et al106).

The notion that DNA demethylation may involve a DNA-repair process has fuelled the search for plausible mechanisms performed by known enzymes. One of the leading hypotheses is that demethylation might be achieved indirectly through deamination of 5-meC and repair of the ensuing T⋅G mismatch by a DNA glycosylase83 (Fig. 1D). The enzymes Aid and Apobec1 have been put forward as candidate deaminases in this process, since they both have 5-meC deaminase activity.84 Furthermore, the Aid and Apobec1 genes are colocalized within a cluster of pluripotency genes and are expressed in oocytes and primordial germ cells, which undergo epigenetic reprogramming.84 Since deaminases require single-stranded DNA, the initiation of DNA demethylation would probably need accessory proteins, such as chromatin remodeling and/or transcription factors, to expose 5-meC residues to deaminase action.84

The idea that demethylation may be initiated by a deamination process has received unexpectedly strong support from recent work with the ligand-dependent transcription factor estrogen receptor α (ERα).85,86 ERα induces a cyclical activation of its target promoters through ordered and periodic recruitment of a series of coactivator complexes, defining a “transcriptional clock” that limits the transcriptional response.87 Two recent studies report that DNA methylation shows a similar cyclical pattern of demethylation and remethylation at the promoter of pS2, an ERα-responsive gene,85 and at four other target genes.86 Remarkably, DNA methyltransferases Dnmt3a and b are cyclically recruited to the pS2 promoter at the beginning of each transcriptionally productive cycle, when demethylation occurs, together with TDG and other proteins that participate in BER.85 TDG is required for DNA demethylation and transcriptional activation of the pS2 promoter, as demonstrated through reduction of its expression by short interfering RNA (siRNA).85 Dnmt3a and b can deaminate 5-meC to thymine in vitro in the absence of the methyl donor S-adenosyl methionine (SAM), generating T⋅G mismatches that are substrates for TDG.85 The authors propose that the rapid DNA demethylation observed during the transcriptional cycles of ERα-responsive genes is achieved through the repair of T⋅G mismatches arising from 5-meC deamination catalyzed by Dnmt3a and b.85 This model implies that Dnmts are involved both in DNA methylation and demethylation. Although the model is attractive, additional data about the efficiency of the deamination activity of Dnmts in vivo are needed.
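The deamination route of Figure 1D can be followed base by base in a toy model. The sketch below only illustrates the order of events proposed in the text: the one-letter code `M` for 5-meC, the underscore for an abasic site and all function names are assumptions made here, and neither enzyme kinetics nor sequence context is represented.

```python
# Hypothetical one-letter codes used only in this sketch:
# "M" = 5-methylcytosine, "_" = abasic site left after base excision.

def deaminate(base):
    """Deamination step: 5-meC becomes T and C becomes U; other bases are unchanged."""
    return {"M": "T", "C": "U"}.get(base, base)


def excise_mismatched(base, partner):
    """Glycosylase step: remove T or U when it is mispaired with G."""
    if partner == "G" and base in ("T", "U"):
        return "_"
    return base


def fill_gap(base, partner):
    """Repair synthesis: insert the base complementary to the partner at an abasic site."""
    complement = {"G": "C", "C": "G", "A": "T", "T": "A"}
    return complement[partner] if base == "_" else base


def demethylate_via_deamination(base, partner="G"):
    """Return the successive states of one position along the proposed pathway."""
    after_deamination = deaminate(base)
    after_excision = excise_mismatched(after_deamination, partner)
    repaired = fill_gap(after_excision, partner)
    return [base, after_deamination, after_excision, repaired]


print(demethylate_via_deamination("M"))  # ['M', 'T', '_', 'C']
```

The same function applied to an unmethylated cytosine passes through a U⋅G mismatch and is also restored to C. This mirrors the observation that TDG and MBD4 act on both U⋅G and T⋅G mismatches.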

Active DNA Demethylation in Plants

In plants, there is convincing genetic and biochemical evidence that a family of DNA glycosylase domain-containing proteins typified by Arabidopsis DME (DEMETER) and ROS1 (REPRESSOR OF SILENCING 1) initiate erasure of 5-meC through a base excision repair process. DME was identified in a search for mutations causing parent-of-origin effects on seed viability88 and is expressed primarily in the central cell of the female gametophyte, where it is required for the expression of the maternal alleles of the imprinted genes MEA, FWA and FIS2.54,55,88 In the case of MEA imprinting, mutations in the methyltransferase gene MET1 suppress the requirement for DME53 and the maternal MEA allele is not hypomethylated in dme endosperm.56 ROS1 was identified in a screen for mutants with deregulated expression of the repetitive RD29A-LUC transgene.89 Whereas in wild-type plants the transgene and the homologous endogenous gene are expressed, ros1 mutants display transcriptional silencing and hypermethylation of both loci.89

In addition to DME and ROS1, the genome of Arabidopsis encodes two additional paralogs, referred to as DEMETER-LIKE proteins DML2 and DML3.88 All four proteins are large polypeptides containing a DNA glycosylase domain with significant sequence similarity to base excision DNA repair proteins in the HhH-GPD superfamily, named after its hallmark helix-hairpin-helix and Gly/Pro rich loop followed by a conserved aspartate.90 The HhH motif present in DME, ROS1, DML2 and DML3 includes an invariant lysine conserved in the subset of DNA glycosylases/lyases, which are able both to hydrolyze the N-glycosidic bond linking bases to DNA and to cleave the phosphodiester backbone at the site where a base has been removed.91 In addition to the DNA glycosylase domain, the proteins of the DME/ROS1 family share two other conserved domains of unknown function.57 The HhH-GPD superfamily of DNA glycosylases is widespread in all three domains of life (bacteria, archaea and eukaryotes) and its members are typically 200-400 amino acids long.92 However, proteins of the DME/ROS1 family are unusually large (1100-2000 amino acids) compared to typical DNA glycosylases. Furthermore, they appear to be unique to plants, with putative orthologs present in mosses and unicellular green algae. This suggests that active demethylation through excision of 5-meC may have appeared early during plant evolution.

DME and ROS1 are the best characterized in vitro among the members of this family of atypical DNA glycosylases (Fig. 2).56,57,93 Both DME and ROS1 remove 5-meC as a free base from DNA through a glycosylase/lyase mechanism57 and cleave the phosphodiester backbone at the 5-meC removal site by successive β,δ-elimination, leaving a gap that has to be further processed to generate a 3ʹ-OH terminus suitable for polymerization and ligation.56,57,93 Excision of 5-meC in vitro is more efficient on those sequences more likely to be methylated in vivo. Thus, DME and ROS1 erase 5-meC at CG, CHG and CHH sequences, with a preference for CG sites,56,57 which matches the pattern of DNA methylation in plants. Furthermore, both proteins remove 5-meC more efficiently from a CAG context than when located in the outer position of a CCG context,57 in agreement with the fact that CCG is the sequence showing the lowest methylation level among CHG sites.94 DML2 and DML3 are also 5-meC DNA glycosylases/lyases.95,96 While DML2 activity is very weak, at least in vitro, DML3 shows an enzymatic activity comparable to those of DME and ROS1, with a similar substrate specificity.95,96

Figure 2. Active demethylation in plants initiated by 5-meC DNA glycosylases. DNA glycosylases of the DME/ROS1 family remove 5-meC as a free base and cleave the phosphodiester backbone by successive β,δ-elimination, leaving a gap that has to be further processed through the BER pathway. The 3ʹ-phosphate is converted to a 3ʹ-OH terminus suitable for polymerization and ligation, probably by action of a polynucleotide kinase. Gap filling is performed by a DNA polymerase that inserts deoxycytidine monophosphate (dCMP) and the strand is finally sealed by a DNA ligase.

In addition to 5-meC paired to guanine, DME, ROS1 and DML3 also remove thymine from a T⋅G mismatch located at CG, CHG and CHH sequences.56,57,96 Therefore, the possibility cannot be ruled out that DME, ROS1 and/or DML3 also play a role in neutralizing the mutagenic consequences of the spontaneous deamination of 5-meC to thymine through their activity on T⋅G mismatches. The effect of dme and ros1 mutations on mutagenesis in vivo has not been assessed, but could be compounded by their epigenetic effects on plant development.

As noted above, an active demethylation pathway initiated by TDG and/or MBD4 DNA glycosylases acting on 5-meC has been proposed in animal cells.75,77,97,98 However, it has been argued that the main in vivo role for both proteins is to counteract the mutagenic potential of 5-meC and C deamination in CG sequences,79,99 given their high efficiency on U⋅G and T⋅G mismatches,76,78,79 compared to their weak activity on 5-meC⋅G base pairs.77,80 The proteins of the DME/ROS1 family are structurally unrelated to TDG, which belongs to a large group of uracil-DNA glycosylases different from the HhH-GPD family,100 but share with MBD4 a HhH-GPD DNA glycosylase domain located at the C-terminal half of the protein. However, unlike MBD4, they do not have a methyl-CpG binding domain.79 In contrast to the strong substrate specificity of TDG and MBD4 for T⋅G and U⋅G mismatches, DME and ROS1 show a preference for 5-meC over a T⋅G mismatch in a CG sequence context, the most frequent DNA methylation target in plant and animal genomes, and they do not display detectable activity on U⋅G mispairs.57 Thus, the biochemical properties of DME and ROS1, together with the available genetic evidence, suggest that an important role for both enzymes in vivo is excision of 5-meC. It remains to be explained how these enzymes locate and recognize 5-meC in DNA.
The methylated cytosine is not a “lesion” like the other base modifications of endogenous or exogenous origin that are substrates of repair DNA glycosylases. However, the extent of the influence of 5-meC on DNA structure is largely unknown, although it may alter the hydration pattern of DNA.101 An understanding of how plant 5-meC DNA glycosylases specifically recognize their target base will require solving their crystal structure in complex with DNA.

The precise in vivo roles of plant 5-meC DNA glycosylases are not fully understood. DME is probably required to specifically initiate erasure of 5-meC at MEA, FWA, FIS2 and perhaps other unidentified loci, in female gametes before fertilization.54-56 ROS1 is needed to prevent transcriptional gene silencing and hypermethylation of a repetitive transgene, but the observation of developmental abnormalities in ros1 mutants after inbreeding89 suggests that it also regulates expression of endogenous loci. In fact, CHG and CHH sites become hypermethylated at FWA and several transposons in ros1 mutants, with an additional slight increase in CG methylation.102 Furthermore, microarray analysis allowed the identification of several genes with reduced expression in ros1 plants and some of these showed hypermethylation at their promoter regions.102 A recent report based on 5-meC immunocapturing followed by genome-tiling microarray analysis described the identification of about 200 regions that become hypermethylated in a ros1 dml2 dml3 triple mutant.95 Most of the hypermethylation was located at genic regions, did not affect any particular gene class and accumulated predominantly at the 5ʹ and 3ʹ ends of genes, which is opposite to the methylation pattern found in wild-type plants.95 Much of the DNA methylation in Arabidopsis is directed by RNA interference (RNAi) pathways and the hypermethylated regions observed in ros1 dml2 dml3 mutants are enriched for small interfering RNAs.103 Furthermore, there is genetic evidence that ROS1 demethylation antagonizes de novo methylation directed by different RNAi pathways.103 Altogether, these results suggest that an important in vivo function for ROS1, DML2 and DML3 is to protect the genome from excess methylation.

On the other hand, a detailed analysis of the methylation distribution at the FWA gene and AtGP1 transposon in wild-type and dml mutant plants suggests that DML2 and DML3 may play additional roles in methylation dynamics.96 Mutations in DML2 and/or DML3 lead to hypermethylation of cytosine residues that are unmethylated or weakly methylated in wild-type plants, in agreement with a role in protecting the genome from excess methylation. But, intriguingly, sites that are heavily methylated in wild-type plants are hypomethylated in the mutants.96 Furthermore, a recent report describing the analysis of the Arabidopsis methylome at single-base resolution found, as expected, hundreds of discrete hypermethylated regions in a ros1 dml2 dml3 triple mutant, but also sites where the methylation levels were lower than in the wild-type.104 Altogether, these results suggest that ROS1, DML2 and DML3 are required not only for removing DNA methylation marks from improperly-methylated cytosines, but also for maintenance of high methylation levels in properly targeted sites.
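The sequence contexts discussed in this section (CG, CHG and CHH, where H is A, C or T) can be assigned mechanically from the two bases that follow a cytosine on the same strand. The helper below is a hypothetical illustration, not a tool used in the cited studies; it reads a 5ʹ→3ʹ string and ignores the complementary strand.

```python
H_BASES = {"A", "C", "T"}  # H stands for any base except G


def cytosine_context(seq, i):
    """Classify the cytosine at index i of a 5'->3' sequence as 'CG', 'CHG' or 'CHH'.

    Returns None when the position is too close to the 3' end or is followed
    by a symbol other than A, C, G or T.
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position %d is not a cytosine" % i)
    first = seq[i + 1:i + 2]    # empty string if past the end
    second = seq[i + 2:i + 3]
    if first == "G":
        return "CG"
    if first in H_BASES and second == "G":
        return "CHG"
    if first in H_BASES and second in H_BASES:
        return "CHH"
    return None


print(cytosine_context("CAG", 0))  # CHG
print(cytosine_context("CCG", 0))  # CHG (outer cytosine of CCG)
print(cytosine_context("CCG", 1))  # CG  (inner cytosine of CCG)
```

The CCG example shows why the text distinguishes the outer position of that trinucleotide: its outer cytosine lies in a CHG context, whereas its inner cytosine lies in a CG context.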

Conclusions and Future Prospects

It is somewhat paradoxical that the mechanism of active DNA demethylation in animals, where evidence of global and local demethylation is abundant, is less well understood than in plants (see also the chapters by Faines et al and by Parisien and Bhagwat in this volume). There is ample evidence supporting a DNA repair-based process during active DNA demethylation in mammals, but the responsible mechanism(s) is (are) still unknown. Recent data support a model in which demethylation is initiated by 5-meC deamination, followed by thymine excision from the ensuing T⋅G mismatch and replacement with an unmodified cytosine. When considering a repair-mediated DNA demethylation mechanism, the possibility should be contemplated that the enzymes that carry out the deamination step in local and global demethylation processes may be different. While Dnmt3a and b may initiate the rapid demethylation observed during cyclical transcriptional activation, cytidine deaminases such as Aid and Apobec1 have the catalytic activity and expression pattern adequate to initiate the paternal global demethylation observed in zygotes. There is some evidence that TDG may be responsible for the T⋅G repair step during local demethylation at transcriptionally active promoters, but its role, if any, during global demethylation remains unknown. In this regard, it is important to remark that tdg-null mouse embryos die during mid-gestation, while other DNA glycosylases, including another thymine-DNA glycosylase such as MBD4, are dispensable for embryonic development.105 However, no study on the capacity of tdg-null zygotes to perform DNA demethylation has been reported so far.

While the evidence available in animal cells remains fragmentary, our current understanding of DNA demethylation in plants is more solid but still far from complete. The data available about 5-meC DNA glycosylases of the DME/ROS1 family indicate that plant cells are able to use base excision not only to remove lesions from DNA but also to erase naturally occurring modified residues. The emerging notion is that an important role of 5-meC glycosylases is to protect the genome from excess methylation and this is in agreement with their likely evolution from ancient enzymes dedicated to genome maintenance. It remains to be determined how this protective role fits with the specific function of DME in activating the maternal alleles of imprinted genes. It is possible that plants have availed themselves of the differential expression of a 5-meC DNA glycosylase such as DME in male and female gametophytes for control of imprinting. In addition, recent data suggest that ROS1, DML2 and DML3 may be required not only to protect from deleterious methylation but also to maintain high methylation levels at appropriately targeted sites. Therefore, the final scenario for DNA demethylation in plants may be more complex than previously suspected, with dynamic DNA methylation/demethylation processes contributing both to the stability and flexibility of the epigenome.

Acknowledgements

Work in our laboratory is supported by grants from the Ministerio de Educación y Ciencia, Spain and the Junta de Andalucía, Spain.

References

1. Wilson GG, Murray NE. Restriction and modification systems. Annu Rev Genet 1991; 25:585-627.
2. Casadesus J, Low D. Epigenetic gene regulation in the bacterial world. Microbiol Mol Biol Rev 2006; 70(3):830-856.
3. Bender J. DNA methylation and epigenetics. Annu Rev Plant Physiol Plant Mol Biol 2004; 55:41-68.
4. Bird A. DNA methylation patterns and epigenetic memory. Genes Dev 2002; 16(1):6-21.
5. Colot V, Rossignol JL. Eukaryotic DNA methylation as an evolutionary device. Bioessays 1999; 21(5):402-411.
6. Holliday R, Pugh JE. DNA modification mechanisms and gene activity during development. Science 1975; 187(4173):226-232.
7. Riggs AD. X inactivation, differentiation and DNA methylation. Cytogenet Cell Genet 1975; 14(1):9-25.
8. Yoder JA, Walsh CP, Bestor TH. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet 1997; 13(8):335-340.
9. Esteller M. Aberrant DNA methylation as a cancer-inducing mechanism. Annu Rev Pharmacol Toxicol 2005; 45:629-656.
10. Goll MG, Bestor TH. Eukaryotic cytosine methyltransferases. Annu Rev Biochem 2005; 74:481-514.
11. Finnegan EJ, Genger RK, Peacock WJ et al. DNA methylation in plants. Annu Rev Plant Physiol Plant Mol Biol 1998; 49:223-247.
12. Finnegan EJ, Kovac KA. Plant DNA methyltransferases. Plant Mol Biol 2000; 43(2-3):189-201.
13. Watt F, Molloy PL. Cytosine methylation prevents binding to DNA of a HeLa cell transcription factor required for optimal expression of the adenovirus major late promoter. Genes Dev 1988; 2(9):1136-1143.
14. Boyes J, Bird A. DNA methylation inhibits transcription indirectly via a methyl-CpG binding protein. Cell 1991; 64(6):1123-1134.
15. Hendrich B, Bird A. Identification and characterization of a family of mammalian methyl-CpG binding proteins. Mol Cell Biol 1998; 18(11):6538-6547.
16. Hark AT, Schoenherr CJ, Katz DJ et al. CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus. Nature 2000; 405(6785):486-489.
17. Jones PL, Veenstra GJ, Wade PA et al. Methylated DNA and MeCP2 recruit histone deacetylase to repress transcription. Nat Genet 1998; 19(2):187-191.
18. Nan X, Ng HH, Johnson CA et al. Transcriptional repression by the methyl-CpG-binding protein MeCP2 involves a histone deacetylase complex. Nature 1998; 393(6683):386-389.
19. Ng HH, Zhang Y, Hendrich B et al. MBD2 is a transcriptional repressor belonging to the MeCP1 histone deacetylase complex. Nat Genet 1999; 23(1):58-61.
20. Zemach A, Grafi G. Characterization of Arabidopsis thaliana methyl-CpG-binding domain (MBD) proteins. Plant J 2003; 34(5):565-572.
21. Kress C, Thomassin H, Grange T. Local DNA demethylation in vertebrates: how could it be performed and targeted? FEBS Lett 2001; 494(3):135-140.
22. Oswald J, Engemann S, Lane N et al. Active demethylation of the paternal genome in the mouse zygote. Curr Biol 2000; 10(8):475-478.
23. Mayer W, Niveleau A, Walter J et al. Demethylation of the zygotic paternal genome. Nature 2000; 403(6769):501-502.
24. Weiss A, Cedar H. The role of DNA demethylation during development. Genes Cells 1997; 2(8):481-486.
25. Cibelli JB. Principles of cloning. Amsterdam: Academic Press, 2002.
26. Simonsson S, Gurdon J. DNA demethylation is necessary for the epigenetic reprogramming of somatic cell nuclei. Nat Cell Biol 2004; 6(10):984-990.
27. Dean W, Santos F, Stojkovic M et al. Conservation of methylation reprogramming in mammalian development: aberrant reprogramming in cloned embryos. Proc Natl Acad Sci USA 2001; 98(24):13734-13738.
28. Blelloch R, Wang Z, Meissner A et al. Reprogramming efficiency following somatic cell nuclear transfer is influenced by the differentiation and methylation state of the donor nucleus. Stem Cells 2006; 24(9):2007-2013.
29. Hajkova P, Erhardt S, Lane N et al. Epigenetic reprogramming in mouse primordial germ cells. Mech Dev 2002; 117(1-2):15-23.
30. Surani MA, Hayashi K, Hajkova P. Genetic and epigenetic regulators of pluripotency. Cell 2007; 128(4):747-762.
31. Frank D, Keshet I, Shani M et al. Demethylation of CpG islands in embryonic cells. Nature 1991; 351(6323):239-241.
32. Paroush Z, Keshet I, Yisraeli J et al. Dynamics of demethylation and activation of the alpha-actin gene in myoblasts. Cell 1990; 63(6):1229-1237.
33. Lucarelli M, Fuso A, Strom R et al. The dynamics of myogenin site-specific demethylation is strongly correlated with its expression and with muscle differentiation. J Biol Chem 2001; 276(10):7500-7506.
34. Bruniquel D, Schwartz RH. Selective, stable demethylation of the interleukin-2 gene enhances transcription by an active process. Nat Immunol 2003; 4(3):235-240.
35. Murayama A, Sakura K, Nakama M et al. A specific CpG site demethylation in the human interleukin 2 gene promoter is an epigenetic memory. EMBO J 2006; 25(5):1081-1092.
36. Kersh EN, Fitzpatrick DR, Murali-Krishna K et al. Rapid demethylation of the IFN-gamma gene occurs in memory but not naive CD8 T-cells. J Immunol 2006; 176(7):4083-4093.
37. Thomassin H, Flavin M, Espinas ML et al. Glucocorticoid-induced DNA demethylation and gene memory during development. EMBO J 2001; 20(8):1974-1983.
38. Miller CA, Sweatt JD. Covalent modification of DNA regulates memory formation. Neuron 2007; 53(6):857-869.
39. Kakutani T, Munakata K, Richards EJ et al. Meiotically and mitotically stable inheritance of DNA hypomethylation induced by ddm1 mutation of Arabidopsis thaliana. Genetics 1999; 151(2):831-838.
40. Jacobsen SE, Meyerowitz EM. Hypermethylated SUPERMAN epigenetic alleles in arabidopsis. Science 1997; 277(5329):1100-1103.
41. Cubas P, Vincent C, Coen E. An epigenetic mutation responsible for natural variation in floral symmetry. Nature 1999; 401(6749):157-161.
42. Messeguer R, Ganal MW, Steffens JC et al. Characterization of the level, target sites and inheritance of cytosine methylation in tomato nuclear DNA. Plant Mol Biol 1991; 16(5):753-770.
43. Sha AH, Lin XH, Huang JB et al. Analysis of DNA methylation related to rice adult plant resistance to bacterial blight based on methylation-sensitive AFLP (MSAP) analysis. Mol Genet Genomics 2005; 273(6):484-490.
44. Ruiz-Garcia L, Cervera MT, Martinez-Zapater JM. DNA methylation increases throughout Arabidopsis development. Planta 2005; 222(2):301-306.
45. Zhu JK. Epigenome sequencing comes of age. Cell 2008; 133(3):395-397.
46. Steward N, Ito M, Yamaguchi Y et al. Periodic DNA methylation in maize nucleosomes and demethylation by environmental stress. J Biol Chem 2002; 277(40):37741-37746.
47. Steward N, Kusano T, Sano H. Expression of ZmMET1, a gene encoding a DNA methyltransferase from maize, is associated not only with DNA replication in actively proliferating cells, but also with altered DNA methylation status in cold-stressed quiescent cells. Nucleic Acids Res 2000; 28(17):3250-3259.
48. Wessler SR. Turned on by stress. Plant retrotransposons. Curr Biol 1996; 6(8):959-961.
49. Choi CS, Sano H. Abiotic-stress induces demethylation and transcriptional activation of a gene encoding a glycerophosphodiesterase-like protein in tobacco plants. Mol Genet Genomics 2007; 277(5):589-600.
50. Wada Y, Miyamoto K, Kusano T et al. Association between up-regulation of stress-responsive genes and hypomethylation of genomic DNA in tobacco plants. Mol Genet Genomics 2004; 271(6):658-666.
51. Feil R, Berger F. Convergent evolution of genomic imprinting in plants and mammals. Trends Genet 2007; 23(4):192-199.
52. Huh JH, Bauer MJ, Hsieh TF et al. Endosperm gene imprinting and seed development. Curr Opin Genet Dev 2007; 17(6):480-485.
53. Xiao W, Gehring M, Choi Y et al. Imprinting of the MEA Polycomb gene is controlled by antagonism between MET1 methyltransferase and DME glycosylase. Dev Cell 2003; 5(6):891-901.
54. Jullien PE, Kinoshita T, Ohad N et al. Maintenance of DNA methylation during the Arabidopsis life cycle is essential for parental imprinting. Plant Cell 2006; 18(6):1360-1372.
55. Kinoshita T, Miura A, Choi Y et al. One-way control of FWA imprinting in Arabidopsis endosperm by DNA methylation. Science 2004; 303(5657):521-523.
56. Gehring M, Huh JH, Hsieh TF et al. DEMETER DNA glycosylase establishes MEDEA polycomb gene self-imprinting by allele-specific demethylation. Cell 2006; 124(3):495-506.
57. Morales-Ruiz T, Ortega-Galisteo AP, Ponferrada-Marin MI et al. DEMETER and REPRESSOR OF SILENCING 1 encode 5-methylcytosine DNA glycosylases. Proc Natl Acad Sci USA 2006; 103(18):6853-6858.
58. Wolffe AP, Jones PL, Wade PA. DNA demethylation. Proc Natl Acad Sci USA 1999; 96(11):5894-5896.
59. Gjerset RA, Martin DW Jr. Presence of a DNA demethylating activity in the nucleus of murine erythroleukemic cells. J Biol Chem 1982; 257(15):8581-8583.
60. Ramchandani S, Bhattacharya SK, Cervoni N et al. DNA methylation is a reversible biological signal. Proc Natl Acad Sci USA 1999; 96(11):6107-6112.
61. Bhattacharya SK, Ramchandani S, Cervoni N et al. A mammalian protein with specific demethylase activity for mCpG DNA. Nature 1999; 397(6720):579-583.
62. Wade PA, Gegonne A, Jones PL et al. Mi-2 complex couples DNA methylation to chromatin remodelling and histone deacetylation. Nat Genet 1999; 23(1):62-66.
63. Santos F, Hendrich B, Reik W et al. Dynamic reprogramming of DNA methylation in the early mouse embryo. Dev Biol 2002; 241(1):172-182.
64. Smith SS. Gilbert’s conjecture: the search for DNA (cytosine-5) demethylases and the emergence of new functions for eukaryotic DNA (cytosine-5) methyltransferases. J Mol Biol 2000; 302(1):1-7.
65. Hamm S, Just G, Lacoste N et al. On the mechanism of demethylation of 5-methylcytosine in DNA. Bioorg Med Chem Lett 2008; 18(3):1046-1049.
66. Weiss A, Keshet I, Razin A et al. DNA demethylation in vitro: involvement of RNA. Cell 1996; 86(5):709-718.
67. Swisher JF, Rand E, Cedar H et al. Analysis of putative RNase sensitivity and protease insensitivity of demethylation activity in extracts from rat myoblasts. Nucleic Acids Res 1998; 26(24):5573-5580.
68. Barreto G, Schafer A, Marhold J et al. Gadd45a promotes epigenetic gene activation by repair-mediated DNA demethylation. Nature 2007; 445(7128):671-675.
69. Jin SG, Guo C, Pfeifer GP. GADD45A does not promote DNA demethylation. PLoS Genet 2008; 4(3):e1000013.
70. Razin A, Szyf M, Kafri T et al. Replacement of 5-methylcytosine by cytosine: a possible mechanism for transient DNA demethylation during differentiation. Proc Natl Acad Sci USA 1986; 83(9):2827-2831.
71. Lindahl T, Wood RD. Quality control by DNA repair. Science 1999; 286(5446):1897-1905.
72. Vairapandi M, Duker NJ. Enzymic removal of 5-methylcytosine from DNA by a human DNA-glycosylase. Nucleic Acids Res 1993; 21(23):5323-5327.
73. Vairapandi M, Duker NJ. Partial purification and characterization of human 5-methylcytosine-DNA glycosylase. Oncogene 1996; 13(5):933-938.
74. Jost JP, Siegmann M, Sun L et al. Mechanisms of DNA demethylation in chicken embryos. Purification and properties of a 5-methylcytosine-DNA glycosylase. J Biol Chem 1995; 270(17):9734-9739.
75. Zhu B, Zheng Y, Hess D et al. 5-methylcytosine-DNA glycosylase activity is present in a cloned G/T mismatch DNA glycosylase associated with the chicken embryo DNA demethylation complex. Proc Natl Acad Sci USA 2000; 97(10):5135-5139.
76. Neddermann P, Gallinari P, Lettieri T et al. Cloning and expression of human G/T mismatch-specific thymine-DNA glycosylase. J Biol Chem 1996; 271(22):12767-12774.
77. Zhu B, Zheng Y, Angliker H et al. 5-Methylcytosine DNA glycosylase activity is also present in the human MBD4 (G/T mismatch glycosylase) and in a related avian sequence. Nucleic Acids Res 2000; 28(21):4157-4165.
78. Sibghat U, Gallinari P, Xu YZ et al. Base analog and neighboring base effects on substrate specificity of recombinant human G:T mismatch-specific thymine DNA-glycosylase. Biochemistry 1996; 35(39):12926-12932.
79. Hendrich B, Hardeland U, Ng HH et al. The thymine glycosylase MBD4 can bind to the product of deamination at methylated CpG sites. Nature 1999; 401(6750):301-304.
80. Hardeland U, Bentele M, Jiricny J et al. The versatile thymine DNA-glycosylase: a comparative characterization of the human, Drosophila and fission yeast orthologs. Nucleic Acids Res 2003; 31(9):2261-2271.
81. Kress C, Thomassin H, Grange T. Active cytosine demethylation triggered by a nuclear receptor involves DNA strand breaks. Proc Natl Acad Sci USA 2006; 103(30):11112-11117.
82. Hajkova P, Ancelin K, Waldmann T et al. Chromatin dynamics during epigenetic reprogramming in the mouse germ line. Nature 2008; 452(7189):877-881.
83. Reik W. Stability and flexibility of epigenetic gene regulation in mammalian development. Nature 2007; 447(7143):425-432.
84. Morgan HD, Dean W, Coker HA et al. Activation-induced cytidine deaminase deaminates 5-methylcytosine in DNA and is expressed in pluripotent tissues: implications for epigenetic reprogramming. J Biol Chem 2004; 279(50):52353-52360.
85. Metivier R, Gallais R, Tiffoche C et al. Cyclical DNA methylation of a transcriptionally active promoter. Nature 2008; 452(7183):45-50.
86. Kangaspeska S, Stride B, Metivier R et al. Transient cyclical methylation of promoter DNA. Nature 2008; 452(7183):112-115.
87. Metivier R, Penot G, Hubner MR et al. Estrogen receptor-alpha directs ordered, cyclical and combinatorial recruitment of cofactors on a natural target promoter. Cell 2003; 115(6):751-763.
88. Choi Y, Gehring M, Johnson L et al. DEMETER, a DNA glycosylase domain protein, is required for endosperm gene imprinting and seed viability in Arabidopsis. Cell 2002; 110(1):33-42.
89. Gong Z, Morales-Ruiz T, Ariza RR et al. ROS1, a repressor of transcriptional gene silencing in Arabidopsis, encodes a DNA glycosylase/lyase. Cell 2002; 111(6):803-814.
90. Nash HM, Bruner SD, Scharer OD et al. Cloning of a yeast 8-oxoguanine DNA glycosylase reveals the existence of a base-excision DNA-repair protein superfamily. Curr Biol 1996; 6(8):968-980.
91. Krokan HE, Standal R, Slupphaug G. DNA glycosylases in the base excision repair of DNA. Biochem J 1997; 325:1-16.
92. Denver DR, Swenson SL, Lynch M. An evolutionary analysis of the helix-hairpin-helix superfamily of DNA repair glycosylases. Mol Biol Evol 2003; 20(10):1603-1611.
93. Agius F, Kapoor A, Zhu JK. Role of the Arabidopsis DNA glycosylase/lyase ROS1 in active DNA demethylation. Proc Natl Acad Sci USA 2006; 103(31):11796-11801.
94. Cokus SJ, Feng S, Zhang X et al. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 2008; 452(7184):215-219.
95. Penterman J, Zilberman D, Huh JH et al. DNA demethylation in the Arabidopsis genome. Proc Natl Acad Sci USA 2007; 104:6752-6757.
96. Ortega-Galisteo AP, Morales-Ruiz T, Ariza RR et al. Arabidopsis DEMETER-LIKE proteins DML2 and DML3 are required for appropriate distribution of DNA methylation marks. Plant Mol Biol 2008; 67(6):671-681.
97. Jost JP, Oakeley EJ, Zhu B et al. 5-Methylcytosine DNA glycosylase participates in the genome-wide loss of DNA methylation occurring during mouse myoblast differentiation. Nucleic Acids Res 2001; 29(21):4452-4461.
98. Zhu B, Benjamin D, Zheng Y et al. Overexpression of 5-methylcytosine DNA glycosylase in human embryonic kidney cells EcR293 demethylates the promoter of a hormone-regulated reporter gene. Proc Natl Acad Sci USA 2001; 98(9):5031-5036.
99. Barnes DE, Lindahl T. Repair and genetic consequences of endogenous DNA base damage in mammalian cells. Annu Rev Genet 2004; 38:445-476.
100. Aravind L, Koonin EV. The alpha/beta fold uracil DNA glycosylases: a common origin with diverse fates. Genome Biol 2000; 1(4):research0007.0001-0007.0008.
101. Marcourt L, Cordier C, Couesnon T et al. Impact of C5-cytosine methylation on the solution structure of d(GAAAACGTTTTC)2. An NMR and molecular modelling investigation. Eur J Biochem 1999; 265(3):1032-1042.
102. Zhu J, Kapoor A, Sridhar VV et al. The DNA glycosylase/lyase ROS1 functions in pruning DNA methylation patterns in Arabidopsis. Curr Biol 2007; 17(1):54-59.
103. Penterman J, Uzawa R, Fischer RL. Genetic interactions between DNA demethylation and methylation in Arabidopsis. Plant Physiol 2007; 145(4):1549-1557.
104. Lister R, O’Malley RC, Tonti-Filippini J et al. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 2008; 133(3):523-536.
105. Cortazar D, Kunz C, Saito Y et al. The enigmatic thymine DNA glycosylase. DNA Repair (Amst) 2006; 6(4):489-504.
106. Morgan HD, Santos F, Green K et al. Epigenetic reprogramming in mammals. Hum Mol Genet 2005; (14 Spec No 1):R47-58.

Chapter 14

Demethylation of DNA and RNA by AlkB Proteins

Pål Ø. Falnes,* Erwin van den Born and Trine J. Meza

Abstract

The alkB gene was discovered more than two decades ago as a methylation-sensitive mutant of Escherichia coli, suggesting that the corresponding protein (EcAlkB) may be involved in removing methyl lesions from DNA. However, it took several years to establish that this was indeed the case; EcAlkB was found to be an iron- and 2-oxoglutarate-dependent enzyme capable of removing certain alkyl adducts from DNA by an oxidative mechanism. Based on protein sequence homology, eight mammalian proteins have been termed AlkB homologues (ABH1-8). Three of these, ABH1, ABH2 and ABH3, as well as the less related ninth member of the family, the obesity-associated protein FTO, have been demonstrated to possess a repair activity similar to that of EcAlkB. The function of the remaining ABH proteins is still unknown and their possible function will be addressed in this chapter. Interestingly, some AlkB proteins display a demethylase activity on RNA as well as DNA and the significance of AlkB-mediated RNA repair will be discussed. Apart from their role as repair enzymes, AlkB proteins may conceivably regulate the function of nucleic acids or proteins through removal of endogenous methyl modifications. Notably, AlkB substrates such as 1-methyladenine and 3-methylcytosine exist as natural modifications in RNA and it has recently been shown that the so-called JmjC-proteins use the AlkB mechanism for demethylation of methylated arginines and lysines in histones.

Introduction

After their initial synthesis, the polymeric cellular macromolecules DNA, RNA and proteins are frequently modified and one of the most common modifications is methylation. For example, in the genomes of higher eukaryotes, methylation of cytosine to 5-methylcytosine (m5C), usually in CpG dinucleotides, is an important signal for gene repression.1 In the case of RNA, the abundance and variety of methyl modifications is particularly high in tRNAs, where each molecule usually contains several different methylated nucleosides.2 Proteins are primarily methylated at Arg and Lys residues and protein methylation has been extensively studied in the case of histones, where methylation of specific residues in the N-terminal tails has important roles in activating and repressing gene expression.3 Enzyme-mediated methylation of macromolecules is highly important, both to provide important structural features and to modulate function. On the other hand, macromolecules are also subject to frequent, spontaneous attack by various methylating agents, causing a wide range of harmful lesions.4 Obviously, such damage is most serious in the case of DNA, where a single unrepaired lesion in principle may be sufficient to kill the cell. Living cells have developed repair systems capable of repairing all major methyl lesions on DNA, clearly illustrating the utmost importance of removing deleterious methylation damage from DNA.5 However, mechanisms for repairing RNA6 and proteins7 have also been described.

For many years, targeted, enzyme-mediated methylations of macromolecules were considered irreversible, but the recent discoveries of several enzymes capable of reversing such methylations have dramatically altered this view. Interestingly, several of the mechanisms found to remove endogenous methylations are identical to previously reported DNA repair mechanisms, demonstrating how DNA repair studies can also illuminate mechanisms for removal of endogenous methylations. For example, all living organisms possess several DNA glycosylases capable of excising a wide range of aberrant bases, including methylated ones, from DNA. In particular, a glycosylase mechanism is also involved in removing epigenetic m5C marks from DNA in plants and possibly also in vertebrates (a topic covered in the chapter by Roldan-Arjona and Ariza). Also, the Jumonji (JmjC) proteins demethylate histones by a mechanism identical to that used by AlkB proteins to remove aberrant methyl groups from DNA and RNA.8

The focus of the present chapter is the AlkB family of proteins, which are found in most organisms, i.e., in bacteria, fungi, animals and plants. We will summarize here the current state of knowledge of these proteins and discuss the likely possibility that AlkB proteins are involved in targeted macromolecular modification as well as in repair.

*Corresponding Author: Pål Ø. Falnes, Department of Molecular Biosciences, University of Oslo, P.O. Box 1041 Blindern, N-0316 Oslo, Norway. Email: [email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.

The Discovery of the AlkB Mechanism

Methylating agents are found both extracellularly and intracellularly and can react with the DNA bases at several different positions.4 That methylation damage poses a serious threat to the integrity of DNA is illustrated by the abundance of repair mechanisms devoted to eliminating such damage.5 Pioneering studies on how methylation damage is removed from DNA were performed in the 1980s, using Escherichia coli as a model organism. Several E. coli mutants displaying hypersensitivity towards methylating agents were isolated and the affected genes identified. In the case of the alkA mutant, it was quickly established that the corresponding protein was a DNA glycosylase capable of removing purine bases containing an alkyl group at the N3 or N7 position.9,10 However, in the case of the alkB mutant, which displayed hypersensitivity only towards methylating agents of the SN2 type, such as methyl methanesulfonate (MMS), initial attempts to identify the enzymatic activity of the corresponding protein were unsuccessful.11,12 Nearly two decades passed without any substantial progress in identifying the function of EcAlkB and the protein appeared to have an activity different from that of other known DNA repair proteins.13 Finally, some clues to the function of EcAlkB were provided in 2000, when it was shown that methylated, single-stranded (ss) phage DNA was inefficiently reactivated in alkB mutant bacteria relative to wild-type bacteria, whereas no such difference was observed in the case of double-stranded (ds) phage DNA.14 The N1 position of purine bases and the N3 position of pyrimidines are shielded from methylation when present in dsDNA and lesions such as 1-methyladenine (m1A) and 3-methylcytosine (m3C) are therefore much more frequent in ssDNA. 
Consequently, it was proposed that these lesions, which were not repaired by any known repair activity, may be the substrates of EcAlkB.14 In addition, this hypothesis agreed well with the observation that high amounts of m1A and m3C are introduced by treatment of DNA with SN2-type methylating agents. Although some clues about possible AlkB substrates had now been obtained, the actual mechanism was still unknown. However, an additional, very important lead was provided by Aravind and Koonin in 2001.15 Their bioinformatics study showed that the AlkB proteins were likely members of the superfamily of 2-oxoglutarate (2OG)- and Fe(II)-dependent dioxygenases, since they shared a predicted three-dimensional fold with this group of proteins and, in addition, contained several conserved amino acid residues putatively involved in coordinating the cofactor Fe2+ and the cosubstrate 2OG.15 Based on this, it was proposed that AlkB may use an oxidative mechanism to remove lesions such as m1A and m3C. These predictions were soon confirmed by two independent biochemical studies.16,17 It was found that EcAlkB had all the characteristic features of a typical 2OG-Fe(II) dioxygenase; Fe2+ was a required cofactor for the EcAlkB reaction, in which 2OG was decarboxylated to succinate and O2 was used as an oxidizing agent. Furthermore, EcAlkB catalyzed a reaction typical of this group of enzymes, namely hydroxylation. It was shown that the hydroxylation of the aberrant methyl group found in m1A and m3C was followed by a spontaneous release of the resulting hydroxymethyl moiety as formaldehyde, resulting in a lesion-free base (Fig. 1A).
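
The two steps described above can be written as a simple reaction scheme, shown here for m1A as a sketch only. The label "1-hydroxymethyl-A" is shorthand introduced here for the hydroxylated intermediate, and the CO2 term follows from the decarboxylation of 2OG to succinate; the scheme requires the amsmath package.

```latex
\begin{align*}
\text{m1A} + \text{2OG} + \mathrm{O_2}
  &\xrightarrow{\ \text{EcAlkB},\ \mathrm{Fe^{2+}}\ }
  \text{1-hydroxymethyl-A} + \text{succinate} + \mathrm{CO_2} \\
\text{1-hydroxymethyl-A}
  &\xrightarrow{\ \text{spontaneous}\ }
  \text{A} + \mathrm{HCHO}
\end{align*}
```

The same scheme applies to m3C, with cytosine as the lesion-free product.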

Figure 1. A) The AlkB mechanism. B) Reported substrates for AlkB proteins. C) Types of AlkB-catalysed reactions.

AlkB-Mediated DNA Repair

The initial observation that EcAlkB was capable of demethylating m1A and m3C lesions in DNA was later supplemented by studies showing that the structurally analogous, but less abundant, lesions 1-methylguanine (m1G) and 3-methylthymine (m3T) are also AlkB substrates (Fig. 1B).18-20 Several reports have demonstrated that, in addition to the aforementioned methyl adducts, various bulkier lesions at the same positions are also repaired by EcAlkB (Fig. 1B). For example, 1-ethyladenosine is dealkylated in a reaction where the deleterious ethyl group is released as acetaldehyde (Fig. 1C).21 Although this has not yet been verified biochemically, bases containing propyl, hydroxyethyl and hydroxypropyl adducts also appear to be AlkB substrates, since EcAlkB increases the survival of ssDNA phage M13 treated with compounds known to introduce such adducts.22 Products of lipid peroxidation, as well as metabolites of the widely studied carcinogen vinyl chloride, are able to introduce exocyclic etheno adducts on the nucleobases.23 Here, a ring nitrogen and an exocyclic amino group are bridged by an etheno group. Examples are 1,N6-ethenoadenine (εA) and 3,N4-ethenocytosine (εC), which represent widely studied etheno adducts; interestingly, the modified ring nitrogen represents a prototype EcAlkB substrate when methylated (m1A and m3C, respectively). It has indeed been demonstrated that EcAlkB repairs εA lesions by a reaction during which the etheno moiety is oxidized and then released as glyoxal, leading to reversal of the damage (Fig. 1C).24,25 Recently, it was also demonstrated that the similar lesion 1,N6-ethanoadenine (EA), which is formed when DNA is exposed to the alkylating cancer chemotherapeutic 1,3-bis(2-chloroethyl)-1-nitrosourea (BCNU),26 is an EcAlkB substrate.27 Here, the oxidative reaction does not result in reversal of the lesion, but rather in ring opening and conversion into a more innocuous N6 adduct (Fig. 1C). 
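
As a compact summary of the reactions just described, the following Python sketch tabulates only those lesions for which the text states the reaction outcome explicitly. The record type `AlkBReaction`, the dictionary `REACTIONS` and the helper `direct_reversals` are hypothetical names introduced purely for illustration; the key "epsilon-A" stands for εA.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlkBReaction:
    """One EcAlkB-catalysed reaction as described in the text (illustrative record)."""
    adduct: str              # group attached to the ring nitrogen
    released: Optional[str]  # aldehyde released, or None if nothing is released
    reverses_lesion: bool    # True if the undamaged base is restored


# Only lesions whose reaction products are stated explicitly in the text.
REACTIONS = {
    "m1A": AlkBReaction("methyl", "formaldehyde", True),
    "m3C": AlkBReaction("methyl", "formaldehyde", True),
    "1-ethyladenosine": AlkBReaction("ethyl", "acetaldehyde", True),
    "epsilon-A": AlkBReaction("etheno", "glyoxal", True),
    # EA is oxidized, but the ring opens to an N6 adduct instead of reverting.
    "EA": AlkBReaction("ethano", None, False),
}


def direct_reversals(reactions):
    """Return, in sorted order, the lesions converted back to the undamaged base."""
    return sorted(name for name, r in reactions.items() if r.reverses_lesion)


if __name__ == "__main__":
    for name in direct_reversals(REACTIONS):
        print(name, "->", REACTIONS[name].released)
```

The table deliberately omits m1G, m3T and εC, for which the text reports repair but does not name the released product.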
In summary, it has been demonstrated that EcAlkB is capable of repairing a wide range of bulky adducts on nucleobases, but the activity on these lesions is generally lower than on methylated bases. Thus, it has yet to be firmly established whether these bulky lesions are important AlkB substrates in vivo, especially since some of them, i.e., the etheno adducts, are also efficiently repaired by DNA glycosylases.23 Lesions on DNA bases can be miscoding (premutagenic) and/or blocking (cytotoxic) during replication and transcription. The effects of m1A, m3C, m1G and m3T on the fidelity and efficiency of DNA replication have been studied by transforming alkB mutant and wild-type E. coli with ssDNA phage genomes containing a single lesion.18 It was found that all these lesions represented blocks to replication, but that this block was relieved by induction of the SOS response, probably due to the expression of specific translesion polymerases capable of copying past these lesions. Under SOS conditions, m3C, m1G and m3T, but not m1A, were strongly premutagenic, indicating that the translesion polymerases are not able to accurately insert the correct nucleotide opposite these lesions. These results agreed well with the previous finding that MMS-induced mutagenesis is only slightly increased in AlkB-deficient E. coli,14 which may be explained by the low mutagenicity of m1A, the primary AlkB substrate introduced by MMS treatment of DNA. Similar experiments have also demonstrated that εA, εC and EA lesions are EcAlkB substrates in vivo and that failure to repair these lesions leads to increased cytotoxicity (εA, εC and EA) and mutagenesis (εA and εC).24,27 Thus, the AlkB function protects cells against DNA lesions that are miscoding and replication-blocking, thereby preventing both mutagenesis and cytotoxicity. 
Since AlkB substrates such as m1A and m3C are generated much more efficiently in ssDNA than in dsDNA, it has been suggested that such lesions primarily arise when DNA is transiently single-stranded, e.g., during transcription and replication. A recent study provided some experimental support for this notion.28 It was demonstrated that MMS-induced mutagenesis at cytosine residues in a plasmid-borne kanamycin gene was increased in the E. coli alkB mutant. Interestingly, mutagenesis was further increased by active transcription and this effect was observed only in the nontranscribed strand of the gene. Crystal structures of the EcAlkB protein in complex with m1A-containing ssDNA29 or dsDNA30 have been published in recent years. These structures have revealed that the protein consists of two important domains: an N-terminal “nucleotide-recognition lid”, which is a characteristic feature of the AlkB proteins with an activity similar to that of EcAlkB, and a dioxygenase domain, shared among all members of the Fe(II)-2OG dioxygenase superfamily. It was found that residues in both these domains make important contacts with the substrate.29,30 The structure of EcAlkB in complex with a dsDNA substrate also revealed how the protein gains access to the methylated substrate by an unprecedented base-flipping mechanism, where the DNA substrate is distorted so that the bases flanking the flipped-out one are stacked on one another.30

AlkB-Mediated RNA Repair

Genotoxic agents that introduce lesions on the DNA bases will cause similar lesions also in RNA. Obviously, damage poses a more serious problem for the cell when present in DNA than in RNA, which can be replenished by de novo synthesis. This is reflected in the presence of numerous DNA repair mechanisms, whereas only a few examples of RNA repair have been reported. Nevertheless, cells possess various mechanisms devoted to surveillance of RNA integrity and subsequent degradation of faulty molecules31 and RNA repair represents a logical extension of the cellular repertoire for maintaining the RNA pool in a functional state. Although repair mechanisms exist to religate strand breaks in tRNAs32 and to mend the ends of RNA molecules subjected to undesired exonucleolytic cleavage,33 the AlkB mechanism currently represents the only known example of base lesion repair in RNA. The initial discovery that EcAlkB and the human homologue hABH3 were able to reverse the methyl lesions m1A and m3C in both DNA and RNA was indeed interesting.34 On the other hand, some concerns could be raised regarding the biological relevance of the observed RNA repair, since the responsible enzymes were equally (hABH3) or more (EcAlkB) active on DNA, relative to RNA. However, subsequent studies have provided additional indications that AlkB-mediated RNA repair is indeed of biological importance. In a first set of experiments, exposure of tRNA to a methylating agent was found to severely compromise its ability to become aminoacylated and to support protein translation. However, this functional inactivation could be partially reversed by AlkB-mediated RNA repair.35 Similarly, mRNA methylation blocked protein translation, but RNA repair relieved this block.35 These experiments clearly demonstrated that AlkB-mediated RNA repair has the potential to maintain RNAs in a functional state, but it still remains to be established that such repair of tRNA and/or mRNA is biologically significant. 
A more direct demonstration that AlkB-mediated RNA repair is important came from our recent study on viral AlkB proteins from plant-infecting ssRNA viruses.36 These viruses express the AlkB domain as part of their replicase polyprotein. This is quite remarkable, since these viruses have very small RNA genomes (in some cases as small as 7 kb), indicating that the presence of an AlkB activity gives the virus a substantial selective advantage. Three different viral AlkB proteins were all found to efficiently repair m1A and m3C lesions in RNA and, more importantly, displayed a preference for RNA over DNA. The AlkB-containing viruses usually infect woody and perennial plants, where they may exist for a long period of time in a hostile environment before actually reaching a target cell.37 Thus, viral AlkB proteins may increase virus survival by removing from the viral genome methylation damage that was acquired during a long infection route. In our view, it has now been established that the AlkB mechanism is indeed involved in removing lesions from RNA as well as from DNA and that both these activities are important. Several other methyl lesions, e.g., m3A and m6G, have not yet been found to be substrates for AlkB proteins or for other repair enzymes when present in RNA and it is an intriguing question whether or not these lesions are actively removed from RNA.

Human AlkB Homologues

Sequence homologues of EcAlkB are found in a wide range of organisms and multicellular organisms possess several different proteins of this kind. The genomes of mammals encode eight different AlkB homologues (ABH1-8; hABH1-8 in humans; mABH1-8 in mice),15,38-40 that are readily identifiable by bioinformatics analysis. Most of these AlkB homologues are highly conserved throughout the entire animal kingdom, indicating fundamental and important functions.

hABH1

The first ABH protein to be discovered was hABH1, which, among the human homologues, shows the highest degree of sequence similarity to EcAlkB. In the report describing the initial characterization of hABH1, the protein was reported to complement the alkylation-sensitive phenotype of the E. coli alkB mutant, indicating that it possessed an activity similar to EcAlkB.41 However, subsequent studies failed to reproduce this finding or to detect any EcAlkB-like activity associated with the recombinant protein.21,34 Moreover, a recent study reported that mABH1 knock-out mice are defective in placental development and that the mABH1 protein associates with nuclear euchromatin, where it binds strongly to a protein involved in gene regulation, suggesting that hABH1/mABH1 is involved in regulating transcription.42 These findings are in contrast to another recent study which indicated that hABH1 is a mitochondrial protein, possessing a relatively weak activity towards m3C lesions in ssDNA and ssRNA.43 The authors of the latter article did not exclude the possibility that hABH1 may act on other substrates and, clearly, more studies are required to firmly establish the biological function of this protein.

hABH2 and hABH3

Among the human AlkB homologues, a biochemical activity was first demonstrated for hABH2 and hABH3, which like EcAlkB were found to repair m1A and m3C lesions in nucleic acids.21,34 Although these two proteins have the same enzymatic activity, they display some very interesting differences with respect to subcellular localization, substrate specificity and the phenotype of knock-out (KO) mice. hABH2 is an exclusively nuclear protein which colocalizes with the proliferating cell nuclear antigen (PCNA) in replication foci during the S phase of the cell cycle and tends to accumulate in nucleoli outside of S-phase.34 hABH3, on the other hand, is found both in the cytoplasm and in the nucleus, where it is somewhat excluded from nucleoli.34 While hABH2 is only active on DNA and displays a preference for dsDNA over ssDNA, hABH3 is equally active on DNA and RNA and displays a strong preference for single-stranded substrates.34,44 Cell-free extracts from mABH2 KO mice showed defective repair of m1A and m3C lesions in DNA and the mice accumulated m1A lesions in genomic DNA with age, whereas no DNA repair defect was detected in the mABH3 KO mice.45 Similar to EcAlkB, hABH3 was found to functionally reactivate tRNA and mRNA damaged by methylation.35 Thus, the available experimental data clearly suggest that hABH2 and hABH3 represent very distinct cellular functions. hABH2 appears to be the main DNA demethylase for removal of lesions like m1A and m3C, whereas hABH3 is likely to be an RNA repair enzyme or, alternatively, to remove DNA damage from certain subdomains of the genome, e.g., regions that are transiently single-stranded due to ongoing transcription or replication.
Recent crystal structures of hABH2 in complex with different dsDNA substrates provided explanations for its preference for dsDNA relative to ssDNA.30 The protein contacts the lesion-free, complementary strand through a positively charged RKK motif, which is not found in the ssDNA-preferring hABH3 protein.46 Also, the methylated base is flipped out of the double helix for repair and, to maintain base-stacking, a specific Phe residue, a so-called finger, intercalates in the helix.

FTO

The FTO (Fatso/fat mass and obesity associated) protein has been subject to extensive studies during recent years, due to the strong association of variants of its encoding gene with obesity.47 Through protein sequence analysis, it was recently found that this protein is a likely member of the Fe(II)-2OG dioxygenase superfamily.48,49 FTO shows the strongest sequence resemblance to the AlkB subfamily, although it is not readily identified as an AlkB homologue by conventional database searches.48,49 The initial in vitro analysis of the FTO protein demonstrated it to be an AlkB-like demethylase with activity towards 3-methylthymine (m3T), a rather minor lesion in DNA. A subsequent study reported that FTO also has activity towards the analogous RNA lesion 3-methyluracil (m3U) and that this activity is actually higher than on m3T lesions in DNA.50 These results suggest that the primary role of FTO may be in RNA repair rather than in DNA repair, but further studies are required to identify the relevant in vivo substrates for FTO and to establish how FTO deficiency causes obesity.

hABH4-8

Since the enzymatic activity of EcAlkB was unravelled in 2002, considerable insight has been gained regarding the biological function of the four human AlkB homologues described above. However, many important questions regarding these proteins have still not been answered and virtually nothing is known about the function of the five remaining proteins, hABH4-8. In the next section, we will speculate on the possible function of these proteins, with a particular emphasis on hABH8, where a role in tRNA modification may be envisioned.

Possible Regulatory Roles for AlkB Proteins

In contrast to the random, deleterious methylations introduced by methylating agents, numerous important, site-specific methylations are introduced in DNA, RNA and proteins by dedicated methyltransferases. These methylations are usually important for correct folding, or they may have regulatory roles. A decade ago, the prevailing view was that such macromolecular methylations are irreversible, but this view has dramatically changed during the last few years, primarily owing to the discovery of histone demethylases. In addition, several recent reports have shown that 5-methylcytosine marks, which represent signals for transcriptional repression in the genomes of higher eukaryotes, are also subject to reversal. Methylation of specific lysine and arginine residues found in the N-terminal tails of histones is an important regulatory mechanism to control chromatin state and gene expression in eukaryotes.3 So far, three different mechanisms for reversing methyl marks on histones have been described.8,51,52 Interestingly, what appears to be the most widely used mechanism, catalyzed by the so-called Jumonji (JmjC) group of proteins, is identical to that used by the AlkB proteins to demethylate nucleic acids: Fe2+- and 2-oxoglutarate-dependent oxidative demethylation.8 Several different roles can be envisioned for mammalian AlkB homologues of unknown function. One obvious candidate function is DNA/RNA repair, most likely of lesions other than those described so far. Alternatively, such proteins may represent novel regulatory demethylases that remove methyl modifications from proteins, such as histones, or from nucleic acids. Finally, since AlkB-mediated demethylation is in fact a consequence of a hydroxylation reaction, one may also easily imagine that some AlkB homologues catalyse reactions where the end-product is a hydroxylated, rather than a demethylated, substrate.
Cellular RNAs are subject to a wide variety of targeted methylations and the fact that the AlkB substrates m1A, m3C, m1G and m3U all exist as naturally occurring nucleosides in RNA53 makes the idea that AlkB homologues may be involved in RNA metabolism particularly attractive. The strongest indications regarding the role of mammalian AlkB proteins of unknown function exist for ABH8, where the domain architecture may actually suggest a role in tRNA modification. The AlkB domain of ABH8 is sandwiched between an N-terminal RNA recognition motif (RRM) and a C-terminal putative methyltransferase (MT) which represents a mammalian homologue of the tRNA methyltransferase Trm9 from Saccharomyces cerevisiae (ScTrm9) (Fig. 2A).54 It has been demonstrated that ScTrm9 is responsible for adding the last methyl group during the generation of the wobble uridine modification 5-methoxycarbonylmethyluridine (mcm5U), as well as the 2-thio derivative 5-methoxycarbonylmethyl-2-thiouridine (mcm5s2U), in some tRNAs (Fig. 2B).55 RRMs are frequently found in RNA-binding proteins56 and the presence of one in ABH8 further supports a role for this protein in RNA metabolism. In vertebrates, yet another Trm9 homologue, KIAA1456, exists (Fig. 2A),55 but its sequence similarity to ScTrm9 is slightly lower than in the case of ABH8 (P. Ø. Falnes, unpublished observations). The degree of sequence similarity between hABH8 and KIAA1456 is substantial and comparable to that between ABH8 and ScTrm9, suggesting that these two vertebrate proteins may recognize similar substrates. Although the MT domain of ABH8 is likely to be a functional Trm9 orthologue, the function of the AlkB domain is far from obvious and some candidate functions are discussed below.

Figure 2. A) Putative ABH8/Trm9 proteins in different organisms. RRM: RNA recognition motif; Dm: Drosophila melanogaster; Ce: Caenorhabditis elegans; At: Arabidopsis thaliana; Sc: Saccharomyces cerevisiae. B) The reaction catalysed by ScTrm9. The asterisk indicates the O-atom which is replaced by an S-atom in the 2-thiolated variant of mcm5U, mcm5s2U. SAM: S-adenosylmethionine; SAH: S-adenosylhomocysteine.

The subset of vertebrate tRNAs containing the mcm5U/mcm5s2U modification in the wobble position is similar to that found in yeast.53 However, vertebrates have one additional mcm5U-containing tRNA, namely tRNA-Sec. This specialized tRNA mediates the insertion of the 21st amino acid, selenocysteine, at UGA (normally “stop”) codons in a small number of so-called selenoproteins (∼25 in mammals).57 Interestingly, the wobble mcm5U found in tRNA-Sec is also found in a 2ʹ-O-ribose methylated form (mcm5Um) and increasing ribose methylation correlates with increased expression of a subset of the selenoproteins.57 Based on this, one may consider the intriguing possibility that the AlkB domain of ABH8 reverses the ribose methylation of mcm5Um and thereby acts as a key regulator of selenoprotein synthesis. Attractive as it is, this hypothesis still raises some concerns. Plants, like Arabidopsis thaliana, have a convincing ABH8 homologue (Fig. 2A), but do not have selenoproteins. Moreover, the worm Caenorhabditis elegans has ABH8 (Fig. 2A), but expresses only a single selenoprotein, thioredoxin reductase,58 which, at least in mammals, is not among the selenoproteins regulated by mcm5Um ribose methylation.59

A recent study found that an intact Trm9 function in S. cerevisiae is important for efficient decoding of the Arg codon AGA and the Glu codon GAA, which are decoded by mcm5U- and mcm5s2U-containing tRNAs, respectively.60 Furthermore, it was found that the translation of mRNAs abundant in these codons was suppressed in a trm9 mutant and it was proposed that the Trm9 function may act as a regulator of protein translation. Indeed, if such a regulatory mechanism is present in S. cerevisiae, which is devoid of any putative AlkB homologue, the interesting possibility exists that ABH8 in higher eukaryotes, being a fusion protein between Trm9- and AlkB-like domains, regulates translation by altering tRNA wobble modification status. Conceivably, the AlkB domain of ABH8 could be a demethylase which removes the Trm9-introduced methylation and the relative levels of these two opposing activities could be governed by a regulatory signal. Although the observation that Trm9 ablation suppresses the expression of certain proteins is very interesting, it still remains to be firmly established that modulation of cellular Trm9 activity actually represents a regulator of protein translation.

Very few of the enzymes responsible for introducing the numerous different RNA modifications have been identified. Evidently, many of these modifications will require oxidative reactions for their formation and most of the responsible enzymes remain undiscovered. Thus, the possibility clearly exists that the AlkB domain of ABH8 is not involved in demethylation, but rather in an oxidative step in the biogenesis of a tRNA modification.
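The codon-usage argument above (mRNAs rich in AGA and GAA codons being sensitive to Trm9 status) can be illustrated with a minimal Python sketch. It is not the analysis used in the cited study; the function names and the toy sequence are hypothetical and serve only to show how the fraction of AGA/GAA codons in a coding sequence might be computed.

```python
from collections import Counter

# Codons named in the text as being read by Trm9-modified tRNAs:
# AGA (Arg, mcm5U-containing tRNA) and GAA (Glu, mcm5s2U-containing tRNA).
TRM9_DEPENDENT_CODONS = ("AGA", "GAA")


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence (DNA or RNA alphabet)."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def trm9_codon_fraction(cds: str) -> float:
    """Fraction of codons in `cds` that are AGA or GAA (0.0 if empty)."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[c] for c in TRM9_DEPENDENT_CODONS) / total


# Hypothetical toy sequence, not a real gene: ATG AGA GAA AGA TTT TAA
toy_cds = "ATGAGAGAAAGATTTTAA"
print(trm9_codon_fraction(toy_cds))  # 3 of the 6 codons are AGA or GAA
```

Ranking transcripts by such a fraction would be one simple way to nominate candidates whose translation might depend on wobble modification status, although any threshold would be an arbitrary choice.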

Concluding Remarks and Future Prospects

The AlkB proteins clearly represent an interesting family of proteins which in recent years have been mediators of many important biological insights, such as the discoveries of Fe2+- and 2OG-dependent demethylation and of base lesion repair in RNA. Given that the function of numerous AlkB proteins still remains to be unravelled, we feel confident that the field of AlkB research will continue to contribute fundamental, important discoveries in the future.

References

1. Caiafa P, Zampieri M. DNA methylation and chromatin structure: the puzzling CpG islands. J Cell Biochem 2005; 94:257-265.
2. Agris PF. Decoding the genome: a modified view. Nucleic Acids Res 2004; 32:223-238.
3. Wood A, Shilatifard A. Posttranslational modifications of histones by methylation. Adv Protein Chem 2004; 67:201-222.
4. Singer B, Grunberger D. Molecular biology of mutagens and carcinogens. New York: Plenum Press, 1983.
5. Sedgwick B. Repairing DNA-methylation damage. Nat Rev Mol Cell Biol 2004; 5:148-157.
6. Feyzi E, Sundheim O, Westbye MP et al. RNA base damage and repair. Curr Pharm Biotechnol 2007; 8:326-331.
7. Clarke S. Aging as war between chemical and biochemical processes: protein methylation and the recognition of age-damaged proteins for repair. Ageing Res Rev 2003; 2:263-285.
8. Tsukada Y, Fang J, Erdjument-Bromage H et al. Histone demethylation by a family of JmjC domain-containing proteins. Nature 2006; 439:811-816.
9. Evensen G, Seeberg E. Adaptation to alkylation resistance involves the induction of a DNA glycosylase. Nature 1982; 296:773-775.
10. Karran P, Hjelmgren T, Lindahl T. Induction of a DNA glycosylase for N-methylated purines is part of the adaptive response to alkylating agents. Nature 1982; 296:770-773.
11. Kataoka H, Yamamoto Y, Sekiguchi M. A new gene (alkB) of Escherichia coli that controls sensitivity to methyl methane sulfonate. J Bacteriol 1983; 153:1301-1307.
12. Kataoka H, Sekiguchi M. Molecular cloning and characterization of the alkB gene of Escherichia coli. Mol Gen Genet 1985; 198:263-269.
13. Dinglay S, Gold B, Sedgwick B. Repair in Escherichia coli alkB mutants of abasic sites and 3-methyladenine residues in DNA. Mutat Res 1998; 407:109-116.
14. Dinglay S, Trewick SC, Lindahl T et al. Defective processing of methylated single-stranded DNA by E. coli AlkB mutants. Genes Dev 2000; 14:2097-2105.
15. Aravind L, Koonin EV. The DNA-repair protein AlkB, EGL-9 and leprecan define new families of 2-oxoglutarate- and iron-dependent dioxygenases. Genome Biol 2001; 2:RESEARCH0007.
16. Falnes PO, Johansen RF, Seeberg E. AlkB-mediated oxidative demethylation reverses DNA damage in Escherichia coli. Nature 2002; 419:178-182.
17. Trewick SC, Henshaw TF, Hausinger RP et al. Oxidative demethylation by Escherichia coli AlkB directly reverts DNA base damage. Nature 2002; 419:174-178.
18. Delaney JC, Essigmann JM. Mutagenesis, genotoxicity and repair of 1-methyladenine, 3-alkylcytosines, 1-methylguanine and 3-methylthymine in alkB Escherichia coli. Proc Natl Acad Sci USA 2004; 101:14051-14056.
19. Falnes PO. Repair of 3-methylthymine and 1-methylguanine lesions by bacterial and human AlkB proteins. Nucleic Acids Res 2004; 32:6260-6267.
20. Koivisto P, Robins P, Lindahl T et al. Demethylation of 3-methylthymine in DNA by bacterial and human DNA dioxygenases. J Biol Chem 2004; 279:40470-40474.
21. Duncan T, Trewick SC, Koivisto P et al. Reversal of DNA alkylation damage by two human dioxygenases. Proc Natl Acad Sci USA 2002; 99:16660-16665.
22. Koivisto P, Duncan T, Lindahl T et al. Minimal methylated substrate and extended substrate range of Escherichia coli AlkB protein, a 1-methyladenine-DNA dioxygenase. J Biol Chem 2003; 278:44348-44354.
23. Gros L, Ishchenko AA, Saparbaev M. Enzymology of repair of etheno-adducts. Mutat Res 2003; 531:219-229.
24. Delaney JC, Smeester L, Wong C et al. AlkB reverses etheno DNA lesions caused by lipid oxidation in vitro and in vivo. Nat Struct Mol Biol 2005; 12:855-860.
25. Mishina Y, Yang CG, He C. Direct repair of the exocyclic DNA adduct 1,N6-ethenoadenine by the DNA repair AlkB proteins. J Am Chem Soc 2005; 127:14594-14595.
26. Hang B, Chenna A, Guliaev AB et al. Miscoding properties of 1,N6-ethanoadenine, a DNA adduct derived from reaction with the antitumor agent 1,3-bis(2-chloroethyl)-1-nitrosourea. Mutat Res 2003; 531:191-203.
27. Frick LE, Delaney JC, Wong C et al. Alleviation of 1,N6-ethanoadenine genotoxicity by the Escherichia coli adaptive response protein AlkB. Proc Natl Acad Sci USA 2007; 104:755-760.
28. Fix D, Canugovi C, Bhagwat AS. Transcription increases methylmethane sulfonate-induced mutations in alkB strains of Escherichia coli. DNA Repair (Amst) 2008; 7:1289-1297.
29. Yu B, Edstrom WC, Benach J et al. Crystal structures of catalytic complexes of the oxidative DNA/RNA repair enzyme AlkB. Nature 2006; 439:879-884.
30. Yang CG, Yi C, Duguid EM et al. Crystal structures of DNA/RNA repair enzymes AlkB and ABH2 bound to dsDNA. Nature 2008; 452:961-965.
31. Isken O, Maquat LE. Quality control of eukaryotic mRNA: safeguarding cells from abnormal mRNA function. Genes Dev 2007; 21:1833-1856.
32. Amitsur M, Levitz R, Kaufmann G. Bacteriophage T4 anticodon nuclease, polynucleotide kinase and RNA ligase reprocess the host lysine tRNA. EMBO J 1987; 6:2499-2503.
33. Nagy PD, Carpenter CD, Simon AE. A novel 3ʹ-end repair mechanism in an RNA virus. Proc Natl Acad Sci USA 1997; 94:1113-1118.
34. Aas PA, Otterlei M, Falnes PO et al. Human and bacterial oxidative demethylases repair alkylation damage in both RNA and DNA. Nature 2003; 421:859-863.
35. Ougland R, Zhang CM, Liiv A et al. AlkB restores the biological function of mRNA and tRNA inactivated by chemical methylation. Mol Cell 2004; 16:107-116.
36. van den Born E, Omelchenko MV, Bekkelund A et al. Viral AlkB proteins repair RNA damage by oxidative demethylation. Nucleic Acids Res 2008; 36:5451-5461.
37. Martelli GP, Adams MJ, Kreuze JF et al. Family Flexiviridae: a case study in virion and genome plasticity. Annu Rev Phytopathol 2007; 45:73-100.
38. Drablos F, Feyzi E, Aas PA et al. Alkylation damage in DNA and RNA--repair mechanisms and medical significance. DNA Repair (Amst) 2004; 3:1389-1407.
39. Kurowski MA, Bhagwat AS, Papaj G et al. Phylogenomic identification of five new human homologs of the DNA repair enzyme AlkB. BMC Genomics 2003; 4:48.
40. Sedgwick B, Bates PA, Paik J et al. Repair of alkylated DNA: recent advances. DNA Repair (Amst) 2007; 6:429-442.
41. Wei YF, Carter KC, Wang RP et al. Molecular cloning and functional analysis of a human cDNA encoding an Escherichia coli AlkB homolog, a protein involved in DNA alkylation damage repair. Nucleic Acids Res 1996; 24:931-937.
42. Pan Z, Sikandar S, Witherspoon M et al. Impaired placental trophoblast lineage differentiation in Alkbh1(–/–) mice. Dev Dyn 2008; 237:316-327.
43. Westbye MP, Feyzi E, Aas PA et al. Human AlkB homolog 1 is a mitochondrial protein that demethylates 3-methylcytosine in DNA and RNA. J Biol Chem 2008; 283:25046-25056.
44. Falnes PO, Bjoras M, Aas PA et al. Substrate specificities of bacterial and human AlkB proteins. Nucleic Acids Res 2004; 32:3456-3461.
45. Ringvoll J, Nordstrand LM, Vagbo CB et al. Repair deficient mice reveal mABH2 as the primary oxidative demethylase for repairing 1meA and 3meC lesions in DNA. EMBO J 2006; 25:2189-2198.
46. Sundheim O, Vagbo CB, Bjoras M et al. Human ABH3 structure and key residues for oxidative demethylation to reverse DNA/RNA damage. EMBO J 2006; 25:3389-3397.
47. Frayling TM, Timpson NJ, Weedon MN et al. A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 2007; 316:889-894.
48. Gerken T, Girard CA, Tung YC et al. The obesity-associated FTO gene encodes a 2-oxoglutarate-dependent nucleic acid demethylase. Science 2007; 318:1469-1472.
49. Sanchez-Pulido L, Andrade-Navarro MA. The FTO (fat mass and obesity associated) gene codes for a novel member of the nonheme dioxygenase superfamily. BMC Biochem 2007; 8:23.
50. Jia G, Yang CG, Yang S et al. Oxidative demethylation of 3-methylthymine and 3-methyluracil in single-stranded DNA and RNA by mouse and human FTO. FEBS Lett 2008; [Epub ahead of print].
51. Shi Y, Lan F, Matson C et al. Histone demethylation mediated by the nuclear amine oxidase homolog LSD1. Cell 2004; 119:941-953.
52. Wang Y, Wysocka J, Sayegh J et al. Human PAD4 regulates histone arginine methylation levels via demethylimination. Science 2004; 306:279-283.
53. Rozenski J, Crain PF, McCloskey JA. The RNA modification database: 1999 update. Nucleic Acids Res 1999; 27:196-197.
54. Tsujikawa K, Koike K, Kitae K et al. Expression and sub-cellular localization of human ABH family molecules. J Cell Mol Med 2007; 11:1105-1116.
55. Kalhor HR, Clarke S. Novel methyltransferase for modified uridine residues at the wobble position of tRNA. Mol Cell Biol 2003; 23:9283-9292.
56. Maris C, Dominguez C, Allain FH. The RNA recognition motif, a plastic RNA-binding platform to regulate posttranscriptional gene expression. FEBS J 2005; 272:2118-2131.
57. Hatfield DL, Carlson BA, Xu XM et al. Selenocysteine incorporation machinery and the role of selenoproteins in development and health. Prog Nucleic Acid Res Mol Biol 2006; 81:97-142.
58. Buettner C, Harney JW, Berry MJ. The Caenorhabditis elegans homologue of thioredoxin reductase contains a selenocysteine insertion sequence (SECIS) element that differs from mammalian SECIS elements but directs selenocysteine incorporation. J Biol Chem 1999; 274:21598-21602.
59. Carlson BA, Moustafa ME, Sengupta A et al. Selective restoration of the selenoprotein population in a mouse hepatocyte selenoproteinless background with different mutant selenocysteine tRNAs lacking Um34. J Biol Chem 2007; 282:32591-32602.
60. Begley U, Dyavaiah M, Patil A et al. Trm9-catalyzed tRNA modifications link translation to the DNA damage response. Mol Cell 2007; 28:860-870.

Chapter 15

The APOBEC1 Paradigm for Mammalian Cytidine Deaminases That Edit DNA and RNA

Harold C. Smith*

Abstract

Proteins are classified as members of the APOBEC family based on the occurrence of a signature amino acid sequence and its characteristic three-dimensional fold known as a zinc-dependent deaminase domain (ZDD). This domain enables APOBEC proteins to bind nucleic acids and, in most cases, to deaminate cytidines. The ZDD coordinates a zinc atom necessary for hydrolytic deamination of cytosine or cytidine to form uracil or uridine. The family is named after the founding member, Apolipoprotein B mRNA Editing Catalytic Subunit 1 or APOBEC1, which was discovered as the catalytic subunit of a macromolecular complex that carries out a site-specific cytidine to uridine transition at nucleotide position 6666 in apoB mRNA. Although eleven additional members of this family have been discovered, APOBEC1 is the only one known to edit RNA. Current data suggest that the function of other members of the APOBEC family is to edit single-stranded genomic or viral DNA. However, cells may use the intrinsic RNA-binding of APOBEC proteins to suppress coding and noncoding RNAs. Binding RNA has the additional effect of inactivating APOBEC ssDNA editing activity. Within cells these interactions have been observed as the reversible formation of APOBEC homomultimeric complexes and high molecular mass complexes containing numerous other cellular or viral proteins and RNAs. The dynamics in the cell that determine active and inactive APOBEC are key to our understanding of how these enzymes can function without becoming genotoxic. This chapter will focus on factors responsible for apoB mRNA editing and their regulation and will draw parallels to systems involving other APOBEC family members. The goal of this chapter is to put into perspective mechanistic themes that continue to provide the foundation for testing new hypotheses. As such, this chapter cannot be a comprehensive review and therefore, where appropriate, the reader will be directed to other publications for details.

The APOBEC Protein Family

When APOBEC1 was discovered in 1993, there were no obvious homologous sequences listed in the human cDNA database. However, the amino acid sequences and structures of prokaryotic cytidine deaminases active on nucleosides/nucleotides were known at that time and these provided a foundation for understanding the APOBEC proteins1-5 (Fig. 1). Members of the APOBEC family of metalloenzymes coordinate, through three residues (two cysteines and a histidine), a zinc atom that serves as a Lewis acid by positioning a water molecule for hydrolytic deamination of cytidine (Fig. 1). The proximity of a conserved glutamic acid residue within the active site ensures that a proton is transferred from the water to the N3 imino group of the pyrimidine ring in the mechanism of hydrolytic cytidine deamination3,4,6-10 and a conserved proline residue ensures conformational positioning of the reacting moieties within the catalytic pocket11 (for details see ref. 5 and chapter by Wedekind et al). This zinc-dependent deaminase domain (ZDD) is a defining characteristic of all APOBEC proteins5,12-14 and of adenosine deaminases active on double-stranded RNA (Fig. 1) and tRNAs (referred to as ADAR and ADAT, respectively).15-17 Phylogenetic modeling suggests that the APOBEC family evolved from a primordial cytidine deaminase active on free nucleosides/nucleotides.1,3,12,14,18-21 A series of gene mutation events may have given rise to an APOBEC progenitor cytidine deaminase with RNA or ssDNA editing function. Gene duplication, mutation and recombination would have led to the expansion of the APOBEC family to include AID and APOBEC1 on human chromosome 12 and APOBEC2, APOBEC3 and APOBEC4 on human chromosomes 6, 22 and 1, respectively. APOBEC222 and APOBEC423 are expressed in cardiac/skeletal muscle and testis, respectively, but have not been ascribed functions. All of the other members of the APOBEC family have been characterized as having functions.

*Harold C. Smith, Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, 601 Elmwood Avenue, Rochester, New York, USA 14642. Email: [email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.

Figure 1. Examples of the functional motifs within editing factors. APOBEC1 functional motifs are represented with an expansion showing essential amino acid residues. The ‘consensus’ ZDD motif found with APOBEC homologs (see also chapter by Wedekind and chapter by Parisien in this book), ADAR/ADAT (see also chapter by Wedekind and chapter by Haele in this book) and E. coli cytidine deaminase are indicated within the central box. In addition, the RNA-binding ZDD in the N-terminal half of APOBEC3G and ssDNA binding and catalytically active ZDD in its C-terminal half are shown. Functional motifs within ACF and Vif are also indicated. Proteins are represented to scale with their respective molecular masses. Functional motifs are fill coded and keyed to the right. A color version of this image is available at www.landesbioscience.com/curie.

AID deaminase activity on ssDNA within the variable region of immunoglobulin genes results in somatic hypermutation (SHM) that is necessary to produce antibodies with different antigen recognition characteristics24,25 (see chapter by Maxwell et al). AID expression is also required for immunoglobulin class switch recombination (CSR), a nonhomologous recombination event that is necessary to produce antibodies that will have an appropriate distribution and functionality in the body,24,26 and for gene conversion (GC), in which stretches of nucleotide sequences from one of several pseudogene variable regions are recombined to generate immunoglobulin diversity in fowl, rabbits and sheep (reviewed in refs. 24, 27, 28). Evidence for the ancient origin of AID in vertebrate evolution comes from gene sequence comparisons demonstrating that immunoglobulin gene SHM emerged in cartilaginous fish.21,29-31 In contrast, immunoglobulin gene CSR is first evident in amphibians and land vertebrates.32 AID was discovered through a search for genes that participate in and regulate CSR and SHM, using subtractive hybridization of mRNAs (cDNAs) expressed in B-cell lymphomas with and without induction of CSR activity.33 AID−/− knockout mice no longer carried out CSR and were more sensitive to secondary infections, but were otherwise healthy. Patients with hyper-IgM syndrome Type 2 (HIGM2), who cannot perform CSR, were demonstrated to have mutations linked to the AID gene (see chapter by Wedekind et al for structural mapping of AID mutations associated with HIGM2).34 HIGM2 patients and AID−/− knockout mice were also deficient in SHM. Expression of catalytically active AID was shown to be necessary and sufficient to induce CSR and SHM35,36 (reviewed in refs. 25, 26).
AID expression is also required for GC.37 AID functions in CSR, GC and SHM as an ssDNA deaminase targeting the transcribed regions of the immunoglobulin locus in B-lymphocytes38-41 that participate in nonhomologous recombination for CSR and GC and in the variable region of immunoglobulin genes for SHM. The resultant deoxyuridines trigger a repair response involving the removal of uracil bases by uracil DNA glycosylase (UNG)42,43 and strand break repair of the resultant apyrimidinic sites.44-46 Although ssDNA deaminase activity of AID is essential for both CSR and SHM, targeting of AID to these specific genomic regions is independently regulated through chaperones and trafficking into the nucleus.47,48

APOBEC3 proteins are only expressed in mammals and are largely viewed as having host-defense functions that provide a post-entry block to viral replication (for those viruses with an extracellular phase) and regulate mobile DNA transposable and retrotransposable elements within the genome (reviewed in refs. 49, 50). Mice have a single APOBEC3 gene that encodes a protein with two ZDD;12,14,51 however, an expansion of the APOBEC3 gene during evolution into a tandem array of APOBEC3A, 3B, 3C, 3D/E, 3F, 3G and 3H containing either one or two ZDD (Fig. 2) is suggested by the progressive increase in number of APOBEC3 genes from cloven-hoof mammals52 to nonhuman primates and humans.20,53,54 The APOBEC3 gene cluster may have emerged through adaptive evolution in response to the rapid evolution of endogenous retroelements and retroviruses.12,20,55-60 The genetic variation within the human APOBEC3 gene cluster is extremely high.20,53,54,61 Perhaps the most overt variation is in the APOBEC3B gene, where deletions within this gene are becoming fixed in Oceanic human populations.62

The function of APOBEC3G as an anti-viral host factor was demonstrated in 2002 by Michael Malim’s laboratory through cDNA transfer experiments designed to identify a host cell suppressor of the viral accessory protein known as the virion infectivity factor or Vif.63 Viruses deficient in Vif have low infectivity if they are produced in cells known as ‘nonpermissive’, but otherwise exhibit near wild type infectivity levels when produced in cells known as ‘permissive’. Several studies have shown ≥1000-fold reduced infectivity of virions produced by Vif-deficient virus compared to wild type virus in nonpermissive cells. Heterokaryons comprising nonpermissive and permissive cells retained the nonpermissive phenotype, demonstrating expression of a dominant inhibitory factor in nonpermissive cells that could be neutralized by Vif.66,67 Transfection of permissive cells with APOBEC3G cDNA proved necessary and sufficient for conversion to the nonpermissive phenotype when challenged with Vif-deficient virus. The inhibition is due to a defect at the post-entry step of infection arising from reduced reverse transcript production and/or stability.64,65

Figure 2. Summary of activity and subcellular localization. APOBEC family members are shown with their ZDD homologies aligned and to scale with their relative primary sequence length. Whether or not each APOBEC has been characterized as having deaminase activity is indicated (+ or −) to the left and subcellular distribution (C, cytoplasmic; N, nuclear) is listed to the right. For proteins with a bipartite distribution, N/C indicates the predominant cytoplasmic localization. The * next to the ZDD in APOBEC4 indicates that this sequence is divergent from the consensus.

APOBEC3 proteins deaminate deoxycytidine (dC) to form deoxyuridine (dU) within ssDNA regions of lentiviral proviral DNA that arise during its replication.68-73 The dC-to-dU transitions produce deoxyguanosine (dG) to deoxyadenosine (dA) mutations during positive strand HIV replication, and these changes occur with a frequency similar to that observed in HIV DNA isolated from T-cells of HIV positive patients.74-76 APOBEC3G deaminase activity may not depend on additional host or viral factors,68,77,78 as is evident from the finding that most APOBEC3 proteins expressed in bacteria readily deaminate ssDNA in actively transcribed genes,79,80 although there is evidence for a cellular cofactor that facilitates the anti-viral activities of APOBEC3F and 3G.81

To identify the antiviral deaminase domain of APOBEC3G, point mutagenesis and deletion mutagenesis were conducted on the N- and C-terminal ZDD motifs. Several groups identified the C-terminal ZDD motif as the source of antiviral deaminase activity, whereas the N-terminal ZDD motif was deemed necessary for RNA binding, interaction with HIV Gag protein and packaging of APOBEC3G into budding virions51,82-87 (reviewed in ref. 49). Other groups found that mutation in either ZDD motif abolished deaminase activity but did not ablate APOBEC3G antiviral activity.85,88,89 This effect has been attributed to an APOBEC3G-dependent physical block to reverse transcription.90-92 The data remain controversial, as the antiviral effect of the catalytic mutant may be due to the experimental system, in which APOBEC3G is expressed well beyond physiological levels.93 A similar controversy exists concerning the mechanism by which APOBEC3G inhibits hepatitis B virus.94-97 For a more complete discussion of this topic the reader is directed to a recent review (ref. 49).
Long terminal repeat (LTR)-containing retrotransposons are inhibited by APOBEC3B, C, F and G through both a reduction in the number of copies of reverse transcribed cDNAs and hypermutation.98 Non-LTR retrotransposons (LINE and the L1-dependent SINE, principally Alu elements) are differentially inhibited by APOBEC3 members. APOBEC3 proteins inhibit these retroelements through several mechanisms, which include nuclear APOBEC3A, B and C blocking LINE reverse transcription and integration within the genome, and APOBEC3B, F and G sequestering essential LINE-encoded proteins, L1 RNA99 and Alu RNA100 in the cytoplasm101 (see discussion below).
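
The link between dC-to-dU deamination of nascent ssDNA and the dG-to-dA mutations read on the positive strand can be illustrated with a minimal sketch. The sequence below is hypothetical, every dC on the minus strand is edited purely for clarity (real editing is partial and context dependent), and the helper names are illustrative only.

```python
# Toy model: dC-to-dU editing of minus-strand cDNA appears as G-to-A
# changes once the plus strand is copied from the edited template.

# dU pairs like dT, so it templates dA during second-strand synthesis.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def reverse_complement(strand: str) -> str:
    """Return the strand copied from `strand`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def deaminate(strand: str, positions: set) -> str:
    """Convert dC to dU at the given 0-based positions; other bases are kept."""
    return "".join(
        "U" if (i in positions and base == "C") else base
        for i, base in enumerate(strand)
    )


plus_original = "ATGGGATTGGCA"             # hypothetical plus-strand sequence
minus = reverse_complement(plus_original)  # minus-strand cDNA (ssDNA substrate)
all_c = {i for i, base in enumerate(minus) if base == "C"}
minus_edited = deaminate(minus, all_c)     # assumption: every dC is edited
plus_mutated = reverse_complement(minus_edited)

changes = {(a, b) for a, b in zip(plus_original, plus_mutated) if a != b}
print(plus_original)
print(plus_mutated)
print(changes)
```

In this sketch every difference between the two plus strands is a G-to-A change, while the plus-strand C is untouched because its minus-strand partner is a G.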

Apolipoprotein B mRNA Editing Opens a New Field

Apolipoprotein B is an integral structural protein of lipoprotein particles that is required for the assembly of lipids into very low-density lipoproteins (VLDL) in the liver and chylomicrons in the small intestine.102 This process is essential for mammalian life.103 ApoB predominantly exists as two variants, a full-length protein (ApoB100) and a truncated protein consisting of the N-terminal 48% of ApoB (ApoB48). Hepatic secretion of lipoproteins into the blood stream and their uptake by tissues are differentially regulated through these ApoB variants. An elevated level of ApoB100 lipoproteins in circulation is positively correlated with a higher risk of developing atherosclerosis, as seen in a number of diseases such as Type II diabetes, a variety of hyperlipidemias and obesity.104-107 ApoB mRNA editing was discovered simultaneously by the laboratories of Lawrence Chan and James Scott in an effort to determine the molecular mechanism regulating the expression of ApoB100 and ApoB48.108,109 Editing occurs at nucleotide position 6666 in apoB mRNA through a posttranscriptional cytidine to uridine transition and converts a CAA glutamine codon (that enables ApoB100 to be expressed) to a UAA translation stop codon (resulting in the expression of ApoB48).
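
The effect of a single C to U edit on the reading frame can be sketched in a few lines. The mRNA below is a short hypothetical sequence, not apoB, and the codon table is deliberately limited to the codons it contains; only the CAA-to-UAA conversion itself comes from the text.

```python
# Toy model of C-to-U editing: CAA (Gln) becomes UAA (stop),
# so translation of the edited mRNA ends early.

# Partial codon table covering only the codons in this example.
CODON_TABLE = {
    "AUG": "M", "GCU": "A", "CAA": "Q", "AUA": "I",
    "UUU": "F", "GAU": "D", "UAA": "*",
}


def translate(mrna: str) -> str:
    """Translate from the first base until a stop codon or the end of the RNA."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def edit_c_to_u(mrna: str, position: int) -> str:
    """Deaminate the cytidine at a 1-based position to uridine."""
    index = position - 1
    if mrna[index] != "C":
        raise ValueError(f"position {position} is {mrna[index]}, not C")
    return mrna[:index] + "U" + mrna[index + 1:]


unedited = "AUGGCUCAAAUAUUUGAUUAA"  # hypothetical mRNA; CAA codon starts at nt 7
edited = edit_c_to_u(unedited, 7)

print(translate(unedited))  # full-length product
print(translate(edited))    # truncated product
```

Here the unedited RNA yields the full six-residue product and the edited RNA yields only the first two residues, mirroring how editing of the CAA codon switches expression from ApoB100 to ApoB48.
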
The cells that line the small intestine (enterocytes) of all mammalian species edit ∼100% of the apoB mRNA that they transcribe.110 A significant portion (40% to 70%) of apoB mRNA expressed in the liver of rodents is edited, but this is not true in other species.111 ApoB mRNA is not edited in human and nonhuman primate liver (because the catalytic subunit APOBEC1 is not expressed in this tissue112), and this results in a heightened risk of cardiovascular disease in persons consuming a western diet high in fat and high-fructose sweeteners.113,114

The discovery of APOBEC1 as the enzyme responsible for apoB mRNA editing was a significant breakthrough in the field115 and, together with the availability of the human genome sequence, proved to be important in the discovery of the APOBEC protein family1,12,14 (Fig. 1). Functional characterization of APOBEC1, and in fact its discovery, was expedited by pre-existing enabling technologies.116 Specifically, progress in the field was enabled by methods for in vitro RNA editing on short recombinant apoB RNA reporters in cell or tissue extracts and by a rapid quantitative assay for editing activity (known as ‘poisoned’ primer extension117). APOBEC1 was identified by size fractionating polyA+ mRNA from rat small intestine and microinjecting these RNAs into Xenopus oocytes115 for expression. Oocyte extracts were screened for in vitro editing in an assay containing an apoB mRNA reporter and cell extracts from chicken small intestine (that can support editing activity on human apoB RNA in vitro but do not naturally edit chicken apoB mRNA in vivo118). A cDNA encoding a 229 amino acid open reading frame for APOBEC1 was cloned and shown to induce apoB mRNA editing in transfected human liver cells. APOBEC1 was proven to be the sole cytidine deaminase responsible for apoB mRNA editing using APOBEC1−/− knockout mice. These mice no longer edited intestinal or liver apoB mRNA and produced chylomicrons and VLDL using only ApoB100.119,120 APOBEC1 gene delivery induced apoB mRNA editing activity.121-125

Identification of the Minimal Components of Editosome Assembly

The nucleotides flanking cytidine 6666 that are required for editing site recognition had been identified prior to the discovery of APOBEC1.126-129 The entire editing site consists of a tripartite motif: a 5ʹ enhancer sequence (which improves the efficiency of editing site recognition), a four nucleotide spacer 3ʹ of the editing site and an eleven nucleotide mooring sequence (reviewed in refs. 19, 116). The mooring sequence serves as the principal cis-acting element for editing site recognition. Translocation of the mooring sequence to other RNAs is typically sufficient to direct editing to 5ʹ cytidines130,131 provided that the flanking RNA sequences are AU-rich and the cells or cell extracts can support editing activity. A tripartite motif also supports editing at an additional site within apoB mRNA 3ʹ of cytidine 6666 (nt 6802), whose editing has no functional consequence because these mRNAs are typically edited at nt 6666 as well. The mRNA encoding the NF1 tumor suppressor (a G-protein regulator of Ras signaling) also contains a tripartite motif whose editing may contribute to the dysregulation
of Ras signaling seen in neurofibromas, gliomas and schwannomas.132,133 While computational methods have identified other mRNAs with mooring sequences in the annotated human, mouse and rat cDNA databases,19 none of these candidate editing sites supported editing activity when added to editing-competent extracts. Although editing of these transcripts in yet-to-be identified cell types or tissues cannot be ruled out, additional constraints in vivo may limit editing. For example, the close proximity of the tripartite motif to pre-mRNA splicing sites (a characteristic of most of the candidate editing sites) can dramatically reduce editing site utilization in the context of reporter RNAs.134-136

APOBEC1 does not selectively bind to the mooring sequence. APOBEC1 can bind AU-rich RNA nonspecifically and with low affinity137 through key residues within its ZDD (Fig. 1). Purified recombinant APOBEC1 alone cannot edit RNA unless the in vitro reaction is incubated at 45°C.138 However, ssDNA editing by most members of the family, including APOBEC1, will take place at 30°C to 37°C when purified recombinant proteins are added to DNA substrates that are partially or completely single stranded.71,72,139-142 The requirement of APOBEC1 for elevated temperatures to edit RNA stems from a requirement for a single stranded RNA substrate, which is ensured by heat denaturation of the AU-rich RNA sequence surrounding the apoB editing site.143

In this regard, the next major advance in the field was the discovery of an RNA binding protein that could recruit APOBEC1 to the mooring sequence and facilitate site-specific editing. A role for RNA binding proteins in editing activity was first suggested by glycerol gradient sedimentation studies. Reporter RNAs containing the mooring sequence assembled as 11S complexes that progressed to 27S complexes with longer incubations.
Both complexes contained RNA binding proteins that selectively bound to the mooring sequence.48,144 The 27S complexes were proposed to be C to U editosomes because: (1) they did not form on RNAs lacking the mooring sequence,145 (2) their assembly only occurred in cell or tissue extracts that supported apoB mRNA editing,144 (3) in vitro editing activity commenced following their assembly145 and (4) edited RNA and editing factors were recovered from these complexes.145,146

Donna Driscoll’s laboratory was the first to identify and clone the mooring sequence RNA binding protein responsible for site-specific editing. They used a combination of apoB RNA affinity chromatography of baboon kidney extracts and peptide sequencing to obtain a human EST clone, which was used to screen a human cDNA library.146 The newly identified clone encoded a ∼64 kDa protein, dubbed APOBEC1 Complementation Factor (ACF), that proved to be necessary and sufficient to complement APOBEC1 in site-specific apoB mRNA editing. Immunodepletion of ACF from extracts resulted in a marked inhibition of in vitro editing activity. These studies brought closure to the controversy over whether apoB mRNA editing involved more than one protein by showing that ACF interacted with APOBEC1 to form the ‘minimal editosome’, and gave credence to the proposed role of RNA binding proteins in the editosome assembly process.144,145,147,148 A number of alternatively spliced variants of ACF were subsequently identified by several labs through biochemical and bioinformatics analyses.149-152 An alternatively spliced variant of ACF (ref. 153) known as APOBEC1 Stimulatory Protein (ASP)151 was discovered in the same time frame as ACF.
Although expression of ASP in rat liver is >10-fold lower than that of ACF,153 on a per mass basis ASP is as effective as ACF in complementing APOBEC1 editing activity.151,153 Although alternatively spliced ACF variants identified subsequently19,152 contained the same three RNA Recognition Motifs (RRM) in tandem followed by the Nuclear Localization Signal (NLS) found in ACF and ASP (Fig. 1 and reviewed in refs. 19, 116, 154), they did not have the same ability to bind to APOBEC1 or the mooring sequence, nor did they complement editing with the same efficiency.19,152 In addition, these ACF variants were expressed at different levels in various tissues. The role that ACF variants play in editosome assembly and function remains to be determined.144,148,155

Historically, the search for a factor that could complement APOBEC1 led to the discovery of several RNA-binding proteins (some containing three RRMs) that had the ability to bind APOBEC1, apoB mRNA and/or ACF (refs. 156-158 and reviewed in ref. 19). In contrast to ACF, introduction of these RNA-binding proteins into cells through transfection or addition of
recombinant proteins to in vitro editing assays inhibited editing activity. It has been proposed that the function of these ‘candidate’ auxiliary proteins may be to suppress the activity of the C to U editosome by interacting with ACF and/or APOBEC1.156,157 In fact, complexes containing ACF and APOBEC1 that do not support editing in situ have been isolated from the cytoplasm of cells144,159 (see further discussion below), and immunoprecipitation analysis suggested that ACF and APOBEC1 are not directly associated with each other in these complexes.160

The ability of ACF to selectively bind to the mooring sequence and position APOBEC1 for site-specific editing has focused attention on ACF as an RNA editing factor. However, ACF is likely to have other functions because it is an essential gene product that is required at or before the time of blastocyst implantation.149 This is in contrast to APOBEC1, which is not an essential gene product,119,120 as well as ApoB, which becomes a requirement at the time of yolk sac development and thereafter.103 It is not known whether ACF binds to other APOBEC family members; however, these proteins are either not essential (e.g., APOBEC2 and APOBEC3, ref. 161) or only required later in life for a fully functional immune system (e.g., AID34,162). Structural analyses of ACF and its interactions with the mooring sequence and APOBEC1 will hopefully be forthcoming and provide insight for future studies of ACF function(s) during cell growth and tissue development.

In contrast to the sequence requirements for APOBEC1 editing of RNA, the sequence requirement for ssDNA editing by APOBEC family members is lax and editing does not require ACF. With rare exceptions (APOBEC2, ref. 161; APOBEC4, ref. 23), all members of the APOBEC family will bind to and edit several genomic sequences when transformed into E. coli.77,79,80,163 The cis-acting sequence requirement for ssDNA editing is not well characterized, but there are 5ʹ nearest neighbor preferences.
These are, for example: GTC for APOBEC1;79 (A/T)(A/G)C for AID;71,79,163,164 TTC for APOBEC3F; and GCC for APOBEC3G (refs. 68, 70, 73, 77-79, 165-167), where the edited C is the 3ʹ-most C of each motif. AID prefers to edit ssDNA within unpaired regions (bubbles) of otherwise duplex DNA,71 such as are predicted to be present in transcribed regions of the genome. APOBEC3 proteins may have similar preferences but, in general, bind and edit ssDNA as it becomes exposed during reverse transcription of viral RNA genomes.72,73,168 Once bound to a ssDNA substrate, both AID and APOBEC3G have been shown to be processive enzymes with 3ʹ to 5ʹ polarity of their catalytic activities.72,164,168
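
The nearest neighbor preferences listed above can be turned into a simple scan for candidate target cytidines. This is a minimal sketch: the motifs are taken as listed in the text, the test sequence is hypothetical, and a motif match only marks a sequence context, not a site that is actually edited.

```python
import re

# 5' nearest-neighbor contexts as listed in the text; the target C is the
# last base of each three-nucleotide motif.
PREFERENCES = {
    "APOBEC1": "GTC",
    "AID": "[AT][AG]C",
    "APOBEC3F": "TTC",
    "APOBEC3G": "GCC",
}


def candidate_sites(ssdna: str, pattern: str) -> list:
    """Return 0-based indices of the target C for every (overlapping) match."""
    # A zero-width lookahead lets overlapping motif instances all be reported.
    return [m.start() + 2 for m in re.finditer(f"(?={pattern})", ssdna)]


sequence = "ATTCAGCGTCGCC"  # hypothetical ssDNA, written 5' to 3'
for enzyme, motif in PREFERENCES.items():
    print(enzyme, candidate_sites(sequence, motif))
```

Adding 2 to each match start is valid only because every motif here is exactly three nucleotides long with the target C in the final position.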

Subcellular Distribution of Editing Factors Determines Their Access to Substrates

RNA sequence analysis by Lawrence Chan’s laboratory demonstrated that apoB mRNA editing occurred on nuclear RNA. Editing took place subsequent to polyadenylation and coincident with or immediately after pre-mRNA splicing.169 Even though APOBEC1 and ACF are distributed throughout the cell, 27S editosomes are only recovered from nuclear extracts.159 Metabolic activation of apoB mRNA editing does not require de novo protein synthesis170 but rather can be accomplished through nuclear import of pre-existing cytoplasmic ACF and APOBEC1.160 In addition, access to nuclear pre-mRNA within the time frame of transcription, processing and nuclear export requires precise timing. Localization of sufficient editing factors to ensure efficient editosome assembly must therefore involve regulation at the temporal and spatial level, as proposed in the ‘gating hypothesis’.134 Taken together, these findings underscore the importance of intracellular trafficking of editing factors in the regulation of editing activity.

APOBEC1 contains signals for both nuclear localization (NLS) and cytoplasmic retention (CRS).171 The CRS of APOBEC1 is a dominant determinant that must be masked or inhibited before APOBEC1 can enter the nucleus. Although it has not been completely resolved, the NLS within ACF may determine trafficking of both proteins to the nucleus.172,173 Metabolic regulation of hepatic ACF and APOBEC1 (e.g., through ethanol or insulin signaling pathways) promotes nuclear retention of these proteins through phosphorylation of key serine residues in ACF by protein kinase C.174 Hyperphosphorylated ACF is retained in the nucleus, but ACF nuclear import and ACF binding to APOBEC1 do not require phosphorylation. Biochemical studies have shown that the interaction of hyperphosphorylated ACF with APOBEC1 is improved and
is more efficient in complementing editing activity. Consistent with this is the finding that in vitro editing activity in hepatocyte nuclear extracts was reduced by treating them with phosphatase.160 In this regard, reduction of serum insulin concentration in fasting animals or the removal of insulin from primary hepatocyte cultures resulted in dephosphorylation of ACF, accumulation of ACF in the cytoplasm and a reduction of apoB mRNA editing activity in situ.155,175,176

Regulation of activity through protein trafficking is also seen for AID.46,177,178 In this instance, CSR in activated B-cells is dependent on an evolutionarily conserved nuclear export signal (NES) within the C-terminus of AID.29,46,177,178 In addition to regulating AID trafficking to the nucleus, interactions through the NES are proposed to target AID editing activity to select ssDNA sequences within the genome and thereby induce nonhomologous recombination for CSR and GC. Protein kinase A phosphorylation of serine within the N-terminus of AID enhances binding to replication protein A (RPA) and promotes both CSR and SHM.41,179-181 Although recombinant AID can bind to and deaminate ssDNA in vitro,141,182-184 RPA is likely to serve in vivo as a molecular chaperone for trafficking of AID and its targeting to appropriate ssDNA within chromatin.40

Protein phosphatase I-dependent dephosphorylation of ACF results in ACF nuclear export and reduced binding to APOBEC1.174 Given that phosphorylated and dephosphorylated ACF appear to bind equally well to apoB mRNA,185,186 it has been proposed that ACF remains bound to apoB mRNA during nuclear export to the cytoplasm.19,174 ACF phosphorylation (and nuclear retention) therefore may regulate not only editing activity but also the amount of apoB mRNA transported to the cytoplasm and available for translation.
Evidence suggesting that ACF is bound to apoB mRNA during translation was first presented by Edward Fisher’s lab, who showed that apoB mRNA translation complexes (polysomes) were atypically buoyant in sedimentation gradients and that this characteristic was mooring sequence dependent.187 ACF had not been discovered at that time but, by inference, the data suggest that the buoyancy of these polysomes was due to viscous drag or a ‘parachute effect’ from high molecular mass complexes containing ACF bound to the mooring sequence. The next line of evidence came from immunoelectron microscopy of rat liver thin sections demonstrating that ACF is concentrated along the exterior surface of the endoplasmic reticulum159 (the site of apoB mRNA translation). Finally, edited apoB mRNA is stabilized in the cytoplasm even though the presence of the premature UAA stop codon would otherwise subject the mRNA to rapid degradation by the nonsense-mediated decay (NMD) mechanism.173 The block to NMD on edited apoB mRNA is dependent on the mooring sequence at the editing site and the expression of ACF. Active stabilization of edited apoB mRNA relative to unedited apoB mRNA may be a contributing factor to the long-standing observation that, in species with hepatic apoB mRNA editing, VLDL containing ApoB48 are produced and secreted in greater abundance than those assembled on ApoB100.188

Stringent Control of APOBEC Proteins

APOBEC1 fidelity for editing sites is coupled to the level of its expression. Constitutive high levels of APOBEC1 ectopic expression in cell lines136,189,190 or transgenic animals191-193 led to aberrant site editing and neoplastic transformation. High levels of site-specific editing such as that observed in the small intestine in vivo are thought to be due to the interaction of APOBEC1 with ACF and their constitutive activation.148 However, APOBEC1 abundance in liver and intestine is extremely low (not readily detectable by western blotting), whereas ACF is a moderately abundant protein (estimated to be 100- to 500-fold less abundant than β actin in rat liver based on 2D PAGE, Smith unpublished data). Moreover, the bulk of both proteins is sequestered in the cytoplasm as complexes that are not active in editing (see discussion below). The underlying basis for neoplastic transformation may have been excessive amounts of APOBEC1 that aberrantly edited mRNA(s) that otherwise were not substrates, leading to the expression of a dysfunctional proteome.192 Protein overexpression leading to a cancer phenotype has also been observed with other APOBEC members such as AID182,194-198 and members of the APOBEC3 subgroup.79,199 In these
situations genotoxicity due to ssDNA editing has been proposed as the underlying transforming mechanism. It was in fact in the course of studies on AID and APOBEC3 that APOBEC1 was shown to be a very effective ssDNA editing cytidine deaminase.77,79,80,116 This finding suggested an alternative hypothesis: that excessive expression of APOBEC1 can become genotoxic when its abundance exceeds a threshold that cellular factors can regulate. In this hypothesis, APOBEC1 is free to diffuse to the nucleus and, once there, binds to and mutates ssDNA within actively transcribed regions of the genome.

Regulation of protein expression and restricted access to the cell nucleus are in fact characteristics found for many APOBEC family members (Fig. 2). Although the abundance of AID can become higher than that of APOBEC1 (AID is readily detected by western blotting and immunocytochemical staining of B-cells; http://www.lsbio.com/Products/GeneDetail.aspx?LSID=170008), it is acutely expressed during B-cell activation40,200,201 and rapidly eliminated by ubiquitination-dependent degradation.202 AID deaminase activity on ssDNA can be inactivated through its interaction with RNA.71 By analogy to other family members, it is likely that the ZDD of AID binds to RNA and that this inhibits ssDNA binding or displaces ssDNA from the active site.203 AID also can be regulated by restricting its access to the cell nucleus46,177 through interactions with auxiliary proteins181,204 and phosphorylation.179,180

APOBEC3G and APOBEC3F are more abundant than APOBEC1 and AID. APOBEC3G is estimated to be 200- to 700-fold less abundant than β actin in human peripheral blood mononuclear cells and APOBEC3F is estimated to be 5- to 10-fold less abundant than APOBEC3G (Leonard and Smith, unpublished ELISA data).
APOBEC3G is restricted to the cytoplasm by its own CRS located immediately C-terminal to the N-terminal ZDD.205,206 The CRS is likely to restrict APOBEC3G to the cytoplasm through protein-protein interactions, although APOBEC3G interactions with several cytoplasmic RNAs through its N-terminal ZDD86,207-216 also would contribute to cytoplasmic retention. APOBEC3G is expressed at different basal levels in the various white blood cell types.210,217-220 APOBEC3G expression can be transcriptionally activated by various mitogens and cytokines;210,218,221 however, this does not necessarily lead to increased abundance of catalytically active enzyme. APOBEC3G ssDNA deaminase activity and function as a host defense factor can be suppressed through the formation of high molecular mass (HMM) ribonucleoprotein complexes with a variety of cytoplasmic RNAs.208,218,219 Cells that are most resistant to HIV infection maintain cytoplasmic APOBEC3G in low molecular mass (LMM) complexes that have little or no bound RNA (reviewed in ref. 49).

Regulation of APOBEC3 abundance is also important for viral infectivity. Upon HIV infection, APOBEC3G (and APOBEC3F) is rapidly polyubiquitinated and degraded through the proteasomal protein degradation pathway (reviewed in ref. 49). It is not certain whether ubiquitination-dependent degradation of APOBEC3G/3F is a normal cellular mechanism for turnover; however, polyubiquitination of HIV Vif is required for rapid degradation of APOBEC3G.63,222-227 Several residues within the N-terminus of Vif are essential for binding to APOBEC3G and/or APOBEC3F,228-232 and the C-terminus contains residues that bind to Cullin 5 and Elongin C of the cellular ubiquitination machinery230,233-237 (Fig. 1). APOBEC3G interacts with human Vif through key residues within its N-terminal half, one of which (D128) determines species-specific Vif-APOBEC3G interactions69,231,238-243 (Fig. 1).
Through these interactions Vif chaperones APOBEC3F and 3G to the proteasome for degradation, thereby eliminating these proteins, and in the process is itself degraded222 (reviewed in ref. 49). In the absence of a Vif viral defense mechanism, newly synthesized APOBEC3 proteins219 assemble with HIV virions through interactions with HIV RNA genomes, viral Gag protein and cellular RNAs.86,92,208,214,239,244-248 Following infection, APOBEC3F/3G in the viral core interferes with viral replication and hypermutates nascent proviral ssDNA (reviewed in ref. 49). This is possible because Vif is not expressed until late stages of infection and therefore cannot block APOBEC3 coming in with virions. This is why HIV virions that do not contain APOBEC3F/G could still be arrested if APOBEC3F/3G was maintained in cells as LMM complexes (such as
was the case in resting T-lymphocytes) but were fully infectious in cells when APOBEC3F/3G was inactivated in HMM complexes (as is the case in activated T-lymphocytes).210,219

Regulation through Macromolecular Complex Formation

The current hypothesis is that a dimer of APOBEC1 (refs. 4, 116, 249, 250) binds to ACF as the minimal in vitro C to U editosome (118 kDa) and that this complex binds to the mooring sequence for site-specific editing.116 The composition of C to U editosomes in situ remains an open question, and evidence from yeast two hybrid analysis suggested that ACF can homodimerize.116 Glycerol gradient sedimentation of functional C to U editosomes isolated from rat liver nuclear extracts155,160 or assembled on an apoB RNA reporter (490 nt long) in vitro116,144,145 suggested these complexes were 27S (>500 kDa). The kinetics of in vitro C to U editosome assembly suggested that protein complexes with apoB reporter RNA proceeded through an 11S intermediate complex (∼250 kDa).116,144,145 Atomic force microscopy of affinity purified, catalytically active C to U editosomes assembled in vitro in McArdle hepatoma cell extracts with recombinant 6His tagged APOBEC1 (ref. 251) suggested complexes equivalent to 650 kDa, consistent with glycerol gradient sedimentation studies (http://dbb.urmc.rochester.edu/labs/smith/photo_gallery.htm). Taken together, the data suggested that the C to U editosome in cells has a higher-order state that is more complex than the minimally functional editosome.

Atomic force microscopy,168 size exclusion chromatography139,217,218 and small angle X-ray scattering139 also have suggested higher order complexes of APOBEC3G as homodimers, tetramers and hexamers.
The oligomeric state of APOBEC3G has been suggested to be essential for 3ʹ to 5ʹ processivity of deaminase activity along ssDNA and for the orientation of the APOBEC3G catalytic domain relative to the cytidines in the ssDNA.168 However, the catalytic domain of APOBEC3G could be expressed as a soluble, monomeric C-terminal fragment following selective mutagenesis, and this construct retained catalytic activity despite being unable to dimerize.252 NMR analysis showed that the fragment largely conformed to the structure of known cytidine deaminases (see chapter by Wedekind et al), and chemical shifts indicated select residues in the catalytic pocket that interacted with ssDNA oligonucleotides.253 These findings have fueled a controversy over whether monomers or multimers of APOBEC3G are catalytically active, despite the knowledge that all known cytosine/cytidine deaminases function as homo- or heteromultimers (see chapter by Wedekind et al).

The higher order organization of AID is also controversial. Co-immunoprecipitation of mutant and wild type AID coupled with activity analyses suggested that AID dimers formed through the N-terminal 60 amino acids and that dimerization was required for activity.254 The crystal structure of an N-terminal truncated form of APOBEC2 (which is the approximate size of AID) has been determined as an elongated N-terminal dimer.255 Modeling of AID upon this structure suggested a good fit with an N-terminal dimeric interface.
Conflicting with these conclusions were data from atomic force microscopy coupled with functional analyses suggesting that AID is active as a monomer.256 Although the controversy has centered on whether APOBEC proteins can be active as monomers or must form homomultimers for activity, it is important not to lose track of the consistent finding that APOBEC family members reside in higher-order complexes within cells and that their association with cellular proteins (such as ACF for APOBEC1) is likely to have important regulatory roles in the cell.87,101,139,168,181,209,255

Among the largest of these complexes, mentioned earlier in this chapter, are the HMM ribonucleoprotein particles (RNP) containing APOBEC3F and 3G, which range from 5 to 15 megadaltons. These complexes are held together through RNA-bridged interactions with proteins associated with cytoplasmic stress granules and RNA-processing bodies (p-bodies).208,209,211 Not only are these complexes instrumental in dynamically regulating active and inactive APOBEC host-defense factors (described above), but their assembly with various retroviral/retroelement RNAs, micro RNAs207 and cellular RNAs50,87,208,209,211 is also proving to be important in regulating translation and other RNA functions in the cell (reviewed in refs. 49, 257). The composition of macromolecular complexes
he APOBEC1 Paradigm for Mammalian Cytidine Deaminases hat Edit DNA and RNA

191

regulating the function of other APOBEC family members is likely to be an important focus of future research in this ield.

Conclusions and Prospects

Research on apolipoprotein B (apoB) mRNA editing over the past twenty years has led to the discovery of APOBEC1, its complementing factors and the physiological and cellular dynamics that regulate editosomal complexes. Although these discoveries occurred in the context of research on cardiovascular disease, the identification of the APOBEC family comprising twelve structural homologs within the past ten years has led to new discoveries demonstrating the diverse functions these proteins have and their broad impact on human health and disease (Fig. 3). Examples of systems affected by APOBEC proteins include: the control of retroelements, DNA recombination, cell signaling, genome mutation, intracellular trafficking of proteins, cytoplasmic ribonucleoprotein function, lipoprotein metabolism, neoplastic transformation, proteome diversification, proteasomal function, regulation of siRNA in the control of translation, RNA turnover and viral infectivity. The field needs to continue to progress in the area of structural analysis of APOBEC proteins and their interactions with nucleic acids and other cellular or viral proteins. High-resolution structures of APOBEC proteins in complex with RNA and ssDNA will not only further our understanding of the catalytic mechanism but also address key issues of regulation such as substrate specificity and processivity. Knowledge of the amino acid residues necessary for nucleic acid binding and deaminase activity will also facilitate experiments to determine why RNA binding to the deaminase domain of AID, for example, or to the N-terminal noncatalytic ZDD of enzymes such as APOBEC3G inhibits ssDNA deaminase activity. High-resolution structure-function analyses of interacting proteins such as ACF, RPA and Vif will be important for understanding how these proteins regulate APOBEC and target binding to RNA or DNA. The open question of whether APOBEC proteins are functional in biological systems as subunits or multimers must be addressed through structure-guided functional assays (see chapter by Wedekind et al). Future experiments also need to focus on understanding regulation of APOBEC proteins in the cell. Cell signal transduction, cell cycle progression, the differentiated phenotype of cells, embryogenesis, neoplastic transformation and viral life cycle have now all been linked to the expression of APOBEC proteins and the macromolecular interactions that regulate deaminase activity. We currently do not fully understand the molecular basis for these linkages. Future studies need to address transcriptional and translational regulation of APOBEC protein expression and determine how posttranslational modifications regulate APOBEC protein abundance, activity and intracellular trafficking. The unifying theme that activity is regulated through the formation of higher-order complexes tells us that there are dynamic protein-protein and protein-RNA interactions that cells use in the acute and long-term control of APOBEC functions. These areas of research are likely to become the major focus for the next two decades as they address the central question of the mechanisms that cells and viruses use to manage the activities of potentially genotoxic proteins. A major translational research problem that lies before this field is whether we can use the knowledge of APOBEC protein structure, function and cell/viral regulation to understand human health and disease.
Beyond this, the next generation of research will have new gene delivery systems and stem cells that will enable biotechnology and the development of therapeutics that target APOBEC proteins to improve healthcare.

Figure 3. Biological systems impacted by the function of editing enzymes. The APOBEC family of C to U editing enzymes (12 proteins) is structurally related to the ADAR family of A to I editing enzymes (3 proteins) active on dsRNA and the ADAT family of A to I editing enzymes (3 proteins) active on tRNA. Research over the past 20 years has revealed that the expression of these enzymes is essential for the function, or in some cases dysfunction, of a broad array of mammalian physiology (discussed throughout this chapter). Shown in Venn diagram format are the APOBEC and ADAR/ADAT families of enzymes. Members in each family play critical roles in various physiological systems or disease states, as represented by overlapping spheres and ovals (the sizes of which are arbitrary). For more information see chapters by Wedekind, Parisien and Haele in this book. A color version of this figure is available at www.landesbioscience.com/curie.

Acknowledgements

The author thanks Jenny M.L. Smith for the preparation of figures and Drs. Andrea Bottaro and Ryan Bennett as well as Chad Galloway and Jason Salter for critical reading and discussions. The author has sought to reference contributions to the discovery process on APOBEC proteins. Due to the restrictions of page limits, a comprehensive recognition of all of the contributions was not possible. References were selected based on their data content and priority in discovery. It is hoped that this review will encourage the reader to pursue more broadly the literature in topics of interest. This chapter was written while the author was on sabbatical leave and its preparation was not supported through extramural funding agencies.

References

1. Anant S, Yu H, Davidson NO. Evolutionary origins of the mammalian apolipoprotein B RNA editing enzyme, apobec-1: structural homology inferred from analysis of a cloned chicken small intestinal cytidine deaminase. Biol Chem 1998; 379:1075-1081.
2. Nagahara H, Vocero-Akbani AM, Snyder EL et al. Transduction of full-length TAT fusion proteins into mammalian cells: TAT-p27Kip1 induces cell migration. Nat Med 1998; 4:1449-1452.
3. Navaratnam N, Bhattacharya S, Fujino T et al. Evolutionary origins of apoB mRNA editing: catalysis by a cytidine deaminase that has acquired a novel RNA-binding motif at its active site. Cell 1995; 81:187-195.
4. Navaratnam N, Fujino T, Bayliss J et al. Escherichia coli cytidine deaminase provides a molecular model for ApoB RNA editing and a mechanism for RNA substrate recognition. J Mol Biol 1998; 275(4):695-714.
5. MacElrevey CA, Wedekind JE. Chemistry, phylogeny and three-dimensional structure of the APOBEC protein family. In: RNA and DNA Editing: Molecular Mechanisms and Their Integration into Biological Systems. (HC Smith, ed) Hoboken, NJ: Wiley and Sons 2008; 16:369-420.
6. MacGinnitie AJ, Anant S, Davidson NO. Mutagenesis of APOBEC-1, the catalytic subunit of the mammalian apolipoprotein B mRNA editing enzyme, reveals distinct domains that mediate cytosine nucleoside deaminase, RNA-binding and RNA editing activity. J Biol Chem 1995; 270:14768-14775.

7. Yamanaka S, Poksay KS, Balestra ME et al. Cloning and mutagenesis of the rabbit ApoB mRNA editing protein. A zinc motif is essential for catalytic activity and noncatalytic auxiliary factor(s) of the editing complex are widely distributed. J Biol Chem 1994; 269:21725-21734.
8. Barnes C, Smith HC. Apolipoprotein B mRNA editing in vitro is a zinc-dependent process. Biochem Biophys Res Commun 1993; 197:1410-1414.
9. Johnson DF, Poksay KS, Innerarity TL. The mechanism for apo-B mRNA editing is deamination. Biochem Biophys Res Commun 1993; 195:1204-1210.
10. Navaratnam N, Morrison JR, Bhattacharya S et al. The p27 catalytic subunit of the apolipoprotein B mRNA editing enzyme is a cytidine deaminase. J Biol Chem 1993; 268:20709-20712.
11. Smith AA, Carlow DC, Wolfenden R et al. Mutations affecting transition-state stabilization by residues coordinating zinc at the active site of cytidine deaminase. Biochemistry 1994; 33:6468-6474.
12. Jarmuz A, Chester A, Bayliss J et al. An Anthropoid-specific locus of orphan C to U RNA-Editing enzymes on chromosome 22. Genomics 2002; 79:285-296.
13. Mian IS, Moser MJ, Holley WR et al. Statistical modelling and phylogenetic analysis of a deaminase domain. J Comput Biol 1998; 5:57-72.
14. Wedekind JE, Dance GS, Sowden MP et al. Messenger RNA editing in mammals: new members of the APOBEC family seeking roles in the family business. Trends Genet 2003; 19:207-216.
15. Maas S, Rich A, Nishikura K. A-to-I RNA editing: recent news and residual mysteries. J Biol Chem 2003; 278:1391-1394.
16. Keegan LP, Leroy A, Sproul D et al. Adenosine deaminases acting on RNA (ADARs): RNA-editing enzymes. Genome Biol 2004; 5:209.
17. Reenan RA. The RNA world meets behavior: A→I pre-mRNA editing in animals. Trends Genet 2001; 17:53-56.
18. Smith HC. Editing informational content of expressed DNA sequences and their transcripts. In: The Implicit Genome. (LH Caporale, ed) NY, NY: Oxford University Press 2006; 14:248-265.
19. Smith HC, Wedekind JE, Xie K et al. Mammalian C to U editing. Topics in Current Genetics. (H Grosjean, ed) Germany: Springer-Verlag 2005; 12:365-400.
20. Sawyer SL, Emerman M, Malik HS. Ancient adaptive evolution of the primate antiviral DNA-editing enzyme APOBEC3G. PLoS Biol 2004; 2:E275.
21. Conticello SG, Thomas CJ, Petersen-Mahrt SK et al. Evolution of the AID/APOBEC family of polynucleotide (deoxy)cytidine deaminases. Mol Biol Evol 2005; 22:367-377.
22. Liao W, Hong SH, Chan BH et al. APOBEC-2, a cardiac- and skeletal muscle-specific member of the cytidine deaminase supergene family. Biochem Biophys Res Commun 1999; 260:398-404.
23. Rogozin IB, Basu MK, Jordan IK et al. APOBEC4, a new member of the AID/APOBEC family of polynucleotide (deoxy)cytidine deaminases predicted by computational analysis. Cell Cycle 2005; 4:1281-1285.
24. MacDuff DA, Offer SM, Demorest ZL et al. Antibody gene diversification by AID-Catalyzed DNA Editing. In: RNA and DNA Editing: Molecular Mechanisms and Their Integration into Biological Systems. (HC Smith, ed) John Wiley and Sons 2008; 2:31-70.
25. Peled JU, Kuang FL, Iglesias-Ussel MD et al. The biochemistry of somatic hypermutation. Annu Rev Immunol 2008; 26:481-511.
26. Stavnezer J, Guikema JE, Schrader CE. Mechanism and regulation of class switch recombination. Annu Rev Immunol 2008; 26:261-292.
27. Fugmann SD, Schatz DG. Immunology. One AID to unite them all. Science 2002; 295:1244-1245.
28. Honjo T, Muramatsu M, Fagarasan S. AID: how does it aid antibody diversity? Immunity 2004; 20:659-668.
29. Ichikawa HT, Sowden MP, Torelli AT et al. Structural phylogenetic analysis of activation-induced deaminase function. J Immunol 2006; 177:355-361.
30. Zhao Y, Pan-Hammarström Q, Zhao Z et al. Identification of the activation-induced cytidine deaminase gene from zebrafish: an evolutionary analysis. Dev Comp Immunol 2005; 29:61-71.
31. Barreto VM, Pan-Hammarstrom Q, Zhao Y et al. AID from bony fish catalyzes class switch recombination. J Exp Med 2005; 202:733-738.
32. Hinds-Frey KR, Nishikata H, Litman RT et al. Somatic variation precedes extensive diversification of germline sequences and combinatorial joining in the evolution of immunoglobulin heavy chain diversity. J Exp Med 1993; 178:815-824.
33. Muramatsu M, Kinoshita K, Fagarasan S et al. Class switch recombination and hypermutation require activation-induced cytidine deaminase (AID), a potential RNA editing enzyme. Cell 2000; 102:553-563.
34. Revy P, Muto T, Levy Y et al. Activation-induced cytidine deaminase (AID) deficiency causes the autosomal recessive form of the Hyper-IgM syndrome (HIGM2). Cell 2000; 102:565-575.

35. Martin A, Bardwell PD, Woo CJ et al. Activation-induced cytidine deaminase turns on somatic hypermutation in hybridomas. Nature 2002; 415:802-806.
36. Okazaki IM, Kinoshita K, Muramatsu M et al. The AID enzyme induces class switch recombination in fibroblasts. Nature 2002; 416:340-345.
37. Arakawa H, Saribasak H, Buerstedde JM. Activation-induced cytidine deaminase initiates immunoglobulin gene conversion and hypermutation by a common intermediate. PLoS Biol 2004; 2:E179.
38. Yu K, Roy D, Bayramyan M et al. Fine-structure analysis of activation-induced deaminase accessibility to class switch region R-loops. Mol Cell Biol 2005; 25:1730-1736.
39. Larson ED, Maizels N. Transcription-coupled mutagenesis by the DNA deaminase AID. Genome Biol 2004; 5:211.
40. Nambu Y, Sugai M, Gonda H et al. Transcription-coupled events associating with immunoglobulin switch region chromatin. Science 2003; 302:2137-2140.
41. Chaudhuri J, Tian M, Khuong C et al. Transcription-targeted DNA deamination by the AID antibody diversification enzyme. Nature 2003; 422:726-730.
42. Rada C, Williams GT, Nilsen H et al. Immunoglobulin isotype switching is inhibited and somatic hypermutation perturbed in UNG-deficient mice. Curr Biol 2002; 12:1748-1755.
43. Imai K, Slupphaug G, Lee WI et al. Human uracil-DNA glycosylase deficiency associated with profoundly impaired immunoglobulin class-switch recombination. Nat Immunol 2003; 4:1023-1028.
44. Bross L, Muramatsu M, Kinoshita K et al. DNA Double-Strand Breaks: Prior to but not Sufficient in Targeting Hypermutation. J Exp Med 2002; 195:1187-1192.
45. Papavasiliou FN, Schatz DG. Cell-cycle-regulated DNA double-stranded breaks in somatic hypermutation of immunoglobulin genes. Nature 2000; 408:216-221.
46. Brar SS, Watson M, Diaz M. Activation-induced cytosine deaminase (AID) is actively exported out of the nucleus but retained by the induction of DNA breaks. J Biol Chem 2004; 279:26395-26401.
47. Barreto V, Reina-San-Martin B, Ramiro AR et al. C-terminal deletion of AID uncouples class switch recombination from somatic hypermutation and gene conversion. Mol Cell 2003; 12:501-508.
48. Shinkura R, Ito S, Begum NA et al. Separate domains of AID are required for somatic hypermutation and class-switch recombination. Nat Immunol 2004; 5:707-712.
49. Chiu YL, Greene WC. The APOBEC3 cytidine deaminases: an innate defensive network opposing exogenous retroviruses and endogenous retroelements. Annu Rev Immunol 2008; 26:317-353.
50. Strebel K, Khan MA. APOBEC3G encapsidation into HIV-1 virions: which RNA is it? Retrovirology 2008; 5:55.
51. Hakata Y, Landau NR. Reversed functional organization of mouse and human apobec3 cytidine deaminase domains. J Biol Chem 2006; 281:36624-36631.
52. Jonsson SR, Hache G, Stenglein MD et al. Evolutionarily conserved and nonconserved retrovirus restriction activities of artiodactyl APOBEC3F proteins. Nucleic Acids Res 2006; 34:5683-5694.
53. OhAinle M, Kerns JA, Malik HS et al. Adaptive evolution and antiviral activity of the conserved mammalian cytidine deaminase APOBEC3H. J Virol 2006; 80:3853-3862.
54. Zhang J, Webb DM. Rapid evolution of primate antiviral enzyme APOBEC3G. Hum Mol Genet 2004; 13:1785-1791.
55. Kinomoto M, Kanno T, Shimura M et al. All APOBEC3 family proteins differentially inhibit LINE-1 retrotransposition. Nucleic Acids Res 2007; 35:2955-2964.
56. Turelli P, Vianin S, Trono D. The innate antiretroviral factor APOBEC3G does not affect human LINE-1 retrotransposition in a cell culture assay. J Biol Chem 2004; 279:43371-43373.
57. Muckenfuss H, Hamdorf M, Held U et al. APOBEC3 proteins inhibit human LINE-1 retrotransposition. J Biol Chem 2006; 281:22161-22172.
58. Bogerd HP, Wiegand HL, Doehle BP et al. APOBEC3A and APOBEC3B are potent inhibitors of LTR-retrotransposon function in human cells. Nucleic Acids Res 2006; 34:89-95.
59. Jonsson SR, LaRue RS, Stenglein MD et al. The restriction of zoonotic PERV transmission by human APOBEC3G. PLoS ONE 2007; 2:e893.
60. Esnault C, Heidmann O, Delebecque F et al. APOBEC3G cytidine deaminase inhibits retrotransposition of endogenous retroviruses. Nature 2005; 433(7024):430-433.
61. Ortiz M, Bleiber G, Martinez R et al. Patterns of evolution of host proteins involved in retroviral pathogenesis. Retrovirology 2006; 3:11.
62. Kidd JM, Newman TL, Tuzun E et al. Population stratification of a common APOBEC gene deletion polymorphism. PLoS Genet 2007; 3:e63.
63. Sheehy AM, Gaddis NC, Choi JD et al. Isolation of a human gene that inhibits HIV-1 infection and is suppressed by the viral Vif protein. Nature 2002; 418:646-650.
64. von Schwedler U, Song J, Aiken C et al. Vif is crucial for human immunodeficiency virus type 1 proviral DNA synthesis in infected cells. J Virol 1993; 67:4945-4955.

65. Simon JH, Malim MH. The human immunodeficiency virus type 1 Vif protein modulates the postpenetration stability of viral nucleoprotein complexes. J Virol 1996; 70:5297-5305.
66. Madani N, Kabat D. An endogenous inhibitor of human immunodeficiency virus in human lymphocytes is overcome by the viral Vif protein. J Virol 1998; 72:10251-10255.
67. Simon JH, Gaddis NC, Fouchier RA et al. Evidence for a newly discovered cellular anti-HIV-1 phenotype. Nat Med 1998; 4:1397-1400.
68. Harris RS, Bishop KN, Sheehy AM et al. DNA deamination mediates innate immunity to retroviral infection. Cell 2003; 113:803-809.
69. Yu Q, Konig R, Pillai S et al. Single-strand specificity of APOBEC3G accounts for minus-strand deamination of the HIV genome. Nat Struct Mol Biol 2004; 11:435-442.
70. Zhang H, Yang B, Pomerantz RJ et al. The cytidine deaminase CEM15 induces hypermutation in newly synthesized HIV-1 DNA. Nature 2003; 424:94-98.
71. Bransteitter R, Pham P, Scharff MD et al. Activation-induced cytidine deaminase deaminates deoxycytidine on single-stranded DNA but requires the action of RNase. Proc Natl Acad Sci USA 2003; 100:4102-4107.
72. Chelico L, Pham P, Calabrese P et al. APOBEC3G DNA deaminase acts processively 3' → 5' on single-stranded DNA. Nat Struct Mol Biol 2006; 13:392-399.
73. Suspene R, Rusniok C, Vartanian JP et al. Twin gradients in APOBEC3 edited HIV-1 DNA reflect the dynamics of lentiviral replication. Nucleic Acids Res 2006; 34:4677-4684.
74. Janini M, Rogers M, Birx DR et al. Human immunodeficiency virus type 1 DNA sequences genetically damaged by hypermutation are often abundant in patient peripheral blood mononuclear cells and may be generated during near-simultaneous infection and activation of CD4(+) T-cells. J Virol 2001; 75:7973-7986.
75. Pace C, Keller J, Nolan D et al. Population level analysis of human immunodeficiency virus type 1 hypermutation and its relationship with APOBEC3G and vif genetic variation. J Virol 2006; 80:9259-9269.
76. Simon V, Zennou V, Murray D et al. Natural variation in Vif: differential impact on APOBEC3G/3F and a potential role in HIV-1 diversification. PLoS Pathog 2005; 1:e6.
77. Beale RC, Petersen-Mahrt SK, Watt IN et al. Comparison of the differential context-dependence of DNA deamination by APOBEC enzymes: correlation with mutation spectra in vivo. J Mol Biol 2004; 337:585-596.
78. Bishop KN, Holmes RK, Sheehy AM et al. Cytidine deamination of retroviral DNA by diverse APOBEC proteins. Curr Biol 2004; 14:1392-1396.
79. Harris RS, Petersen-Mahrt SK, Neuberger MS. RNA editing enzyme APOBEC1 and some of its homologs can act as DNA mutators. Mol Cell 2002; 10:1247-1253.
80. Petersen-Mahrt SK, Neuberger MS. In vitro deamination of cytosine to uracil in single-stranded DNA by apolipoprotein B editing complex catalytic subunit 1 (APOBEC1). J Biol Chem 2003; 278:19583-19586.
81. Han Y, Wang X, Dang Y et al. APOBEC3G and APOBEC3F require an endogenous cofactor to block HIV-1 replication. PLoS Pathog 2008; 4:e1000095.
82. Hache G, Liddament MT, Harris RS. The retroviral hypermutation specificity of APOBEC3F and APOBEC3G is governed by the C-terminal DNA cytosine deaminase domain. J Biol Chem 2005; 280:10920-10924.
83. Iwatani Y, Takeuchi H, Strebel K et al. Biochemical activities of highly purified, catalytically active human APOBEC3G: correlation with antiviral effect. J Virol 2006; 80:5992-6002.
84. Navarro F, Bollman B, Chen H et al. Complementary function of the two catalytic domains of APOBEC3G. Virology 2005; 333:374-386.
85. Shindo K, Takaori-Kondo A, Kobayashi M et al. The enzymatic activity of CEM15/Apobec-3G is essential for the regulation of the infectivity of HIV-1 virion but not a sole determinant of its antiviral activity. J Biol Chem 2003; 278:44412-44416.
86. Bogerd HP, Cullen BR. Single-stranded RNA facilitates nucleocapsid: APOBEC3G complex formation. RNA 2008; 14:1228-1236.
87. Svarovskaia ES, Xu H, Mbisa JL et al. Human apolipoprotein B mRNA-editing enzyme-catalytic polypeptide-like 3G (APOBEC3G) is incorporated into HIV-1 virions through interactions with viral and nonviral RNAs. J Biol Chem 2004; 279:35822-35828.
88. Bishop KN, Holmes RK, Malim MH. Antiviral potency of APOBEC proteins does not correlate with cytidine deamination. J Virol 2006; 80:8450-8458.
89. Newman EN, Holmes RK, Craig HM et al. Antiviral function of APOBEC3G can be dissociated from cytidine deaminase activity. Curr Biol 2005; 15:166-170.
90. Guo F, Cen S, Niu M et al. Inhibition of tRNA3Lys-primed reverse transcription by human APOBEC3G during human immunodeficiency virus type 1 replication. J Virol 2006; 80:11710-11722.

91. Gaddis NC, Chertova E, Sheehy AM et al. Comprehensive investigation of the molecular defect in vif-deficient human immunodeficiency virus type 1 virions. J Virol 2003; 77:5810-5820.
92. Guo F, Cen S, Niu M et al. The interaction of APOBEC3G with human immunodeficiency virus type 1 nucleocapsid inhibits tRNA3Lys annealing to viral RNA. J Virol 2007; 81:11322-11331.
93. Schumacher AJ, Hache G, MacDuff DA et al. The DNA deaminase activity of human APOBEC3G is required for Ty1, MusD and human immunodeficiency virus type 1 restriction. J Virol 2008; 82:2652-2660.
94. Noguchi C, Ishino H, Tsuge M et al. G to A hypermutation of hepatitis B virus. Hepatology 2005; 41:626-633.
95. Rosler C, Kock J, Kann M et al. APOBEC-mediated interference with hepadnavirus production. Hepatology 2005; 42:301-309.
96. Suspene R, Guetard D, Henry M et al. Extensive editing of both hepatitis B virus DNA strands by APOBEC3 cytidine deaminases in vitro and in vivo. Proc Natl Acad Sci USA 2005; 102:8321-8326.
97. Turelli P, Mangeat B, Jost S et al. Inhibition of hepatitis B virus replication by APOBEC3G. Science 2004; 303:1829.
98. Esnault C, Millet J, Schwartz O et al. Dual inhibitory effects of APOBEC family proteins on retrotransposition of mammalian endogenous retroviruses. Nucleic Acids Res 2006; 34:1522-1531.
99. Stenglein MD, Harris RS. APOBEC3B and APOBEC3F inhibit L1 retrotransposition by a DNA deamination-independent mechanism. J Biol Chem 2006; 281:16837-16841.
100. Hulme AE, Bogerd HP, Cullen BR et al. Selective inhibition of Alu retrotransposition by APOBEC3G. Gene 2007; 390:199-205.
101. Chiu YL, Witkowska HE, Hall SC et al. High-molecular-mass APOBEC3G complexes restrict Alu retrotransposition. Proc Natl Acad Sci USA 2006; 103:15588-15593.
102. Chan L. Apolipoprotein B, the major protein component of triglyceride-rich and low density lipoproteins. J Biol Chem 1992; 267:25621-25624.
103. Farese RV Jr, Ruland SL, Flynn LM et al. Knockout of the mouse apolipoprotein B gene results in embryonic lethality in homozygotes and protection against diet-induced hypercholesterolemia in heterozygotes. Proc Natl Acad Sci USA 1995; 92:1774-1778.
104. Olofsson SO, Wiklund O, Boren J. Apolipoproteins A-I and B: biosynthesis, role in the development of atherosclerosis and targets for intervention against cardiovascular disease. Vasc Health Risk Manag 2007; 3:491-502.
105. Carmena R, Duriez P, Fruchart JC. Atherogenic lipoprotein particles in atherosclerosis. Circulation 2004; 109(23 Suppl 1):III2-7.
106. Bamba V, Rader DJ. Obesity and atherogenic dyslipidemia. Gastroenterology 2007; 132:2181-2190.
107. Sniderman AD, Faraj M. Apolipoprotein B, apolipoprotein A-I, insulin resistance and the metabolic syndrome. Curr Opin Lipidol 2007; 18:633-637.
108. Chen SH, Habib G, Yang CY et al. Apolipoprotein B-48 is the product of a messenger RNA with an organ-specific in-frame stop codon. Science 1987; 238:363-366.
109. Powell LM, Wallis SC, Pease RJ et al. A novel form of tissue-specific RNA processing produces apolipoprotein-B48 in intestine. Cell 1987; 50:831-840.
110. Backus JW, Eagleton MJ, Harris SG et al. Quantitation of endogenous liver apolipoprotein B mRNA editing. Biochem Biophys Res Commun 1990; 170:513-518.
111. Greeve J, Altkemper I, Dieterich JH et al. Apolipoprotein B mRNA editing in 12 different mammalian species: hepatic expression is reflected in low concentrations of apoB-containing plasma lipoproteins. J Lipid Res 1993; 34:1367-1383.
112. Greeve J, Axelos D, Welker S et al. Distinct promoters induce APOBEC-1 expression in rat liver and intestine. Arterioscler Thromb Vasc Biol 1998; 18:1079-1092.
113. Ding EL, Malik VS. Convergence of obesity and high glycemic diet on compounding diabetes and cardiovascular risks in modernizing China: An emerging public health dilemma. Global Health 2008; 4:4.
114. Yach D, Stuckler D, Brownell KD. Epidemiologic and economic consequences of the global epidemics of obesity and diabetes. Nat Med 2006; 12:62-66.
115. Teng B, Burant CF, Davidson NO. Molecular cloning of an apolipoprotein B messenger RNA editing protein. Science 1993; 260:1816-1819.
116. Smith HC. Measuring editing activity and identifying cytidine-to-uridine mRNA editing factors in cells and biochemical isolates. Methods Enzymol 2007; 424:389-416.
117. Driscoll DM, Wynne JK, Wallis SC et al. An in vitro system for the editing of apolipoprotein B mRNA. Cell 1989; 58:519-525.
118. Teng B, Davidson NO. Evolution of intestinal apolipoprotein B mRNA editing. Chicken apolipoprotein B mRNA is not edited, but chicken enterocytes contain in vitro editing enhancement factor(s). J Biol Chem 1992; 267:21265-21272.

119. Hirano K, Young SG, Farese RV Jr et al. Targeted disruption of the mouse apobec-1 gene abolishes apolipoprotein B mRNA editing and eliminates apolipoprotein B-48. J Biol Chem 1996; 271:9887-9890.
120. Xie Y, Nassir F, Luo J et al. Intestinal lipoprotein assembly in apobec-1−/− mice reveals subtle alterations in triglyceride secretion coupled with a shift to larger lipoproteins. Am J Physiol Gastrointest Liver Physiol 2003; 285:G735-746.
121. Giannoni F, Bonen DK, Funahashi T et al. Complementation of apolipoprotein B mRNA editing by human liver accompanied by secretion of apolipoprotein B48. J Biol Chem 1994; 269:5932-5936.
122. Hughes SD, Rouy D, Navaratnam N et al. Gene transfer of cytidine deaminase APOBEC-1 lowers lipoprotein(a) in transgenic mice and induces apolipoprotein B mRNA editing in rabbits. Hum Gene Ther 1996; 7:39-49.
123. Kozarsky KF, Bonen DK, Giannoni F et al. Hepatic expression of the catalytic subunit of the apolipoprotein B mRNA editing enzyme ameliorates hypercholesterolemia in LDL receptor-deficient rabbits. Hum Gene Ther 1996; 7:943-957.
124. Qian X, Balestra ME, Yamanaka S et al. Low expression of the apolipoprotein B mRNA editing transgene in mice reduces LDL but does not cause liver dysplasia or tumors. Arterioscler Thromb Vasc Biol 1998; 18:1013-1020.
125. Teng B, Blumenthal S, Forte T et al. Adenovirus-mediated gene transfer of rat apolipoprotein B mRNA editing protein in mice virtually eliminates apolipoprotein B-100 and normal low density lipoprotein production. J Biol Chem 1994; 269:29395-29404.
126. Backus JW, Smith HC. Three distinct RNA sequence elements are required for efficient apolipoprotein B (apoB) RNA editing in vitro. Nucleic Acids Res 1992; 20:6007-6014.
127. Backus JW, Smith HC. Specific 3ʹ sequences flanking a minimal apolipoprotein B (apoB) mRNA editing ‘cassette’ are critical for efficient editing in vitro. Biochim Biophys Acta 1994; 1217:65-73.
128. Shah RR, Knott TJ, Legros JE et al. Sequence requirements for the editing of apolipoprotein B mRNA. J Biol Chem 1991; 266:16301-16304.
129. Smith HC, Gott JM, Hanson MR. A guide to RNA editing. RNA 1997; 3(10):1105-1123.
130. Backus JW, Smith HC. Apolipoprotein B mRNA sequences 3ʹ of the editing site are necessary and sufficient for editing and editosome assembly. Nucleic Acids Res 1991; 19:6781-6786.
131. Driscoll DM, Lakhe-Reddy S, Oleksa LM et al. Induction of RNA editing at heterologous sites by sequences in apolipoprotein B mRNA. Mol Cell Biol 1993; 13:7288-7294.
132. Cappione AJ, French BL, Skuse GR. A potential role for NF1 mRNA editing in the pathogenesis of NF1 tumors. Am J Hum Genet 1997; 60:305-312.
133. Mukhopadhyay D, Anant S, Lee RM et al. C→U editing of neurofibromatosis 1 mRNA occurs in tumors that express both the type II transcript and apobec-1, the catalytic subunit of the apolipoprotein B mRNA-editing enzyme. Am J Hum Genet 2002; 70:38-50.
134. Sowden M, Hamm JK, Spinelli S et al. Determinants involved in regulating the proportion of edited apolipoprotein B RNAs. RNA 1996; 2:274-288.
135. Sowden MP, Smith HC. Commitment of apolipoprotein B RNA to the splicing pathway regulates cytidine-to-uridine editing-site utilization. Biochem J 2001; 359(Pt 3):697-705.
136. Yang Y, Sowden MP, Smith HC. Induction of cytidine to uridine editing on cytoplasmic apolipoprotein B mRNA by overexpressing APOBEC-1. J Biol Chem 2000; 275:22663-22669.
137. Anant S, MacGinnitie AJ, Davidson NO. Apobec-1, the catalytic subunit of the mammalian apolipoprotein B mRNA editing enzyme, is a novel RNA-binding protein. J Biol Chem 1995; 270:14762-14767.
138. Chester A, Weinreb V, Carter CW Jr et al. Optimization of apolipoprotein B mRNA editing by APOBEC1 apoenzyme and the role of its auxiliary factor, ACF. RNA 2004; 10:1399-1411.
139. Wedekind JE, Gillilan R, Janda A et al. Nanostructures of APOBEC3G support a hierarchical assembly model of high molecular mass ribonucleoprotein particles from dimeric subunits. J Biol Chem 2006; 281:38122-38126.
140. Opi S, Takeuchi H, Kao S et al. Monomeric APOBEC3G is catalytically active and has antiviral activity. J Virol 2006; 80:4673-4682.
141. Shen HM, Ratnam S, Storb U. Targeting of the activation-induced cytosine deaminase is strongly influenced by the sequence and structure of the targeted DNA. Mol Cell Biol 2005; 25:10815-10821.
142. Yu K, Huang FT, Lieber MR. DNA substrate length and surrounding sequence affect the activation induced deaminase activity at cytidine. J Biol Chem 2004; 279:6496-6500.
143. Maris C, Masse J, Chester A et al. NMR structure of the apoB mRNA stem-loop and its interaction with the C to U editing APOBEC1 complementary factor. RNA 2005; 11:173-186.
144. Harris SG, Sabio I, Mayer E et al. Extract-specific heterogeneity in high-order complexes containing apolipoprotein B mRNA editing activity and RNA-binding proteins. J Biol Chem 1993; 268:7382-7392.
145. Smith HC, Kuo SR, Backus JW et al. In vitro apolipoprotein B mRNA editing: identification of a 27S editing complex. Proc Natl Acad Sci USA 1991; 88:1489-1493.

146. Mehta A, Kinter MT, Sherman NE et al. Molecular cloning of apobec-1 complementation factor, a novel RNA-binding protein involved in the editing of apolipoprotein B mRNA. Mol Cell Biol 2000; 20:1846-1854.
147. Navaratnam N, Shah R, Patel D et al. Apolipoprotein B mRNA editing is associated with UV crosslinking of proteins to the editing site. Proc Natl Acad Sci USA 1993; 90:222-226.
148. Smith HC. Analysis of protein complexes assembled on apolipoprotein B mRNA for mooring sequence-dependent RNA editing. Methods 1998; 15(1):27-39.
149. Blanc V, Henderson JO, Newberry EP et al. Targeted deletion of the murine apobec-1 complementation factor (acf) gene results in embryonic lethality. Mol Cell Biol 2005; 25:7260-7269.
150. Dance GSC, Sowden MP, Cartegni L et al. Two proteins essential for apolipoprotein B mRNA editing are expressed from a single gene through alternative splicing. J Biol Chem 2002; 277:12703-12709.
151. Lellek H, Kirsten R, Diehl I et al. Purification and molecular cloning of a novel essential component of the apolipoprotein B mRNA editing enzyme-complex. J Biol Chem 2000; 275:19848-19856.
152. Sowden MP, Lehmann DM, Lin X et al. Identification of novel alternative splice variants of APOBEC-1 complementation factor with different capacities to support apolipoprotein B mRNA editing. J Biol Chem 2004; 279:197-206.
153. Dance GS, Sowden MP, Cartegni L et al. Two proteins essential for apolipoprotein B mRNA editing are expressed from a single gene through alternative splicing. J Biol Chem 2002; 277:12703-12709.
154. Blanc V, Davidson NO. Biological Implications and Broader-Range Functions for APOBEC-1 and APOBEC-1 Complementation Factor (ACF). In: RNA and DNA Editing: Molecular Mechanisms and Their Integration into Biological Systems. (HC Smith, ed). Hoboken, NJ: John Wiley and Sons Inc 2008; 10:203-230.
155. Yang Y, Kovalski K, Smith HC. Partial characterization of the auxiliary factors involved in apolipoprotein B mRNA editing through APOBEC-1 affinity chromatography. J Biol Chem 1997; 272:27700-27706.
156. Anant S, Henderson JO, Mukhopadhyay D et al. Novel role for RNA-binding protein CUGBP2 in mammalian RNA editing. J Biol Chem 2001; 276:47338-47351.
157. Blanc V, Navaratnam N, Henderson JO et al. Identification of GRY-RBP as an apolipoprotein B RNA-binding protein that interacts with both apobec-1 and apobec-1 complementation factor to modulate C to U editing. J Biol Chem 2001; 276:10272-10283.
158. Lau PP, Zhu HJ, Nakamuta M et al. Cloning of an Apobec-1-binding protein that also interacts with apolipoprotein B mRNA and evidence for its involvement in RNA editing. J Biol Chem 1997; 272:1452-1455.
159. Sowden MP, Ballatori N, Jensen KL et al. The editosome for cytidine to uridine mRNA editing has a native complexity of 27S: identification of intracellular domains containing active and inactive editing factors. J Cell Sci 2002; 115(Pt 5):1027-1039.
160. Lehmann DM, Galloway CA, Sowden MP et al. Metabolic regulation of apoB mRNA editing is associated with phosphorylation of APOBEC-1 complementation factor. Nucleic Acids Res 2006; 34:3299-3308.
161. Mikl MC, Watt IN, Lu M et al. Mice deficient in APOBEC2 and APOBEC3. Mol Cell Biol 2005; 25:7270-7277.
162. Minegishi Y, Lavoie A, Cunningham-Rundles C et al. Mutations in activation-induced cytidine deaminase in patients with hyper IgM syndrome. Clin Immunol 2000; 97:203-210.
163. Petersen-Mahrt SK, Harris RS, Neuberger MS. AID mutates E. coli suggesting a DNA deamination mechanism for antibody diversification. Nature 2002; 418:99-103.
164. Pham P, Bransteitter R, Petruska J et al. Processive AID-catalysed cytosine deamination on single-stranded DNA simulates somatic hypermutation. Nature 2003; 424:103-107.
165. Langlois MA, Beale RC, Conticello SG et al. Mutational comparison of the single-domained APOBEC3C and double-domained APOBEC3F/G anti-retroviral cytidine deaminases provides insight into their DNA target site specificities. Nucleic Acids Res 2005; 33:1913-1923.
166. Liddament MT, Brown WL, Schumacher AJ et al. APOBEC3F properties and hypermutation preferences indicate activity against HIV-1 in vivo. Curr Biol 2004; 14:1385-1391.
167. Wiegand HL, Doehle BP, Bogerd HP et al. A second human antiretroviral factor, APOBEC3F, is suppressed by the HIV-1 and HIV-2 Vif proteins. EMBO J 2004; 23:2451-2458.
168. Chelico L, Sacho EJ, Erie DA et al. A model for oligomeric regulation of APOBEC3G cytosine deaminase-dependent restriction of HIV. J Biol Chem 2008; 283:13780-13791.
169. Lau PP, Xiong WJ, Zhu HJ et al. Apolipoprotein B mRNA editing is an intranuclear event that occurs posttranscriptionally coincident with splicing and polyadenylation. J Biol Chem 1991; 266:20550-20554.
170. Giangreco A, Sowden MP, Mikityansky I et al. Ethanol stimulates apolipoprotein B mRNA editing in the absence of de novo RNA or protein synthesis. Biochem Biophys Res Commun 2001; 289:1162-1167.

he APOBEC1 Paradigm for Mammalian Cytidine Deaminases hat Edit DNA and RNA

199

171. Yang Y, Smith HC. Multiple protein domains determine the cell type-speciic nuclear distribution of the catalytic subunit required for apolipoprotein B mRNA editing. Proc Natl Acad Sci USA 1997; 94:13075-13080. 172. Blanc V, Kennedy S, Davidson NO. A novel nuclear localization signal in the auxiliary domain of apobec-1 complementation factor regulates nucleocytoplasmic import and shuttling. J Biol Chem 2003; 278:41198-41204. 173. Chester A, Somasekaram A, Tzimina M et al. he apolipoprotein B mRNA editing complex performs a multifunctional cycle and suppresses nonsense-mediated decay. EMBO J 2003; 22:3971-3982. 174. Lehmann DM, Galloway CA, Macelrevey C et al. Functional characterization of APOBEC-1 complementation factor phosphorylation sites. Biochim Biophys Acta 2007; 1773:408-418. 175. Sowden MP, Lehmann DM, Lin X et al. Identiication of novel alternative splice variants of apobec-1 complementation factor with diferent capacities to support ApoB mRNA editing. J Biol Chem 2004; 278:197-206. 176. Harris SG, Smith HC. In vitro apolipoprotein B mRNA editing activity can be modulated by fasting and refeeding rats with a high carbohydrate diet. Biochem Biophys Res Commun 1992; 183:899-903. 177. Ito S, Nagaoka H, Shinkura R et al. Activation-induced cytidine deaminase shuttles between nucleus and cytoplasm like apolipoprotein B mRNA editing catalytic polypeptide 1. Proc Natl Acad Sci USA 2004; 101:1975-1980. 178. McBride KM, Barreto V, Ramiro AR et al. Somatic hypermutation is limited by CRM1-dependent nuclear export of activation-induced deaminase. J Exp Med 2004; 199:1235-1244. 179. Basu U, Chaudhuri J, Alpert C et al. he AID antibody diversiication enzyme is regulated by protein kinase A phosphorylation. Nature 2005; 438(7067):508-511. 180. Chatterji M, Unniraman S, McBride KM et al. Role of activation-induced deaminase protein kinase A phosphorylation sites in Ig gene conversion and somatic hypermutation. J Immunol 2007; 179:5274-5280. 181. 
Chaudhuri J, Khuong C, Alt FW. Replication protein A interacts with AID to promote deamination of somatic hypermutation targets. Nature 2004; 430:992-998. 182. Duquette ML, Pham P, Goodman MF et al. AID binds to transcription-induced structures in c-MYC that map to regions associated with translocation and hypermutation. Oncogene 2005; 24:5791-5798. 183. Ramiro AR, Stavropoulos P, Jankovic M et al. Transcription enhances AID-mediated cytidine deamination by exposing single-stranded DNA on the nontemplate strand. Nat Immunol 2003; 4:452-456. 184. Shen HM, Storb U. Activation-induced cytidine deaminase (AID) can target both DNA strands when the DNA is supercoiled. Proc Natl Acad Sci USA 2004; 101:12997-13002. 185. Mehta AaD DM. Identiication of domains in APOBEC-1 complementation factor required for RNA binding and apolipoprotein B mRNA editing. RNA 2002; 8:69-82. 186. Blanc V, Henderson JO, Kennedy S et al. Mutagenesis of apobec-1 complementation factor (ACF) reveals distinct domains that modulate RNA binding, protein-protein interaction with apobec-1 and complementation of C to U RNA editing activity. J Biol Chem 2001; 276:46386-93. 187. Chen X, Sparks JD, Yao Z et al. Hepatic polysomes that contain apoprotein B mRNA have unusual physical properties. J Biol Chem 1993; 268:21007-21013. 188. Sparks JD, Sparks CE. Insulin modulation of hepatic synthesis and secretion of apoB by rat hepatocytes. J Biol Chem 1990; 265:8854-8862. 189. Siddiqui JF, Van Mater D, Sowden MP et al. Disproportionate relationship between APOBEC-1 expression and apolipoprotein B mRNA editing activity. Exp Cell Res 1999; 252:154-164. 190. Sowden M, Hamm JK, Smith HC. Overexpression of APOBEC-1 results in mooring sequence-dependent promiscuous RNA editing. J Biol Chem 1996; 271:3011-3017. 191. Yamanaka S, Balestra ME, Ferrell LD et al. Apolipoprotein B mRNA-editing protein induces hepatocellular carcinoma and dysplasia in transgenic animals. Proc Natl Acad Sci USA 1995; 92:8483-8487. 192. 
Yamanaka S, Poksay KS, Arnold KS et al. A novel translational repressor mRNA is edited extensively in livers containing tumors caused by the transgene expression of the apoB mRNA- editing enzyme. Genes Dev 1997; 11:321-333. 193. Yamanaka S, Poksay KS, Driscoll DM et al. Hyperediting of multiple cytidines of apolipoprotein B mRNA by APOBEC-1 requires auxiliary protein(s) but not a mooring sequence motif. J Biol Chem 1996; 271:11506-11510. 194. Babbage G, Ottensmeier CH, Blaydes J et al. Immunoglobulin heavy chain locus events and expression of activation-induced cytidine deaminase in epithelial breast cancer cell lines. Cancer Res 2006; 66:3996-4000. 195. Okazaki IM, Hiai H, Kakazu N et al. Constitutive expression of AID leads to tumorigenesis. J Exp Med 2003; 197:1173-1181.

200

DNA and RNA Modii cation Enzymes

196. Oppezzo P, Vuillier F, Vasconcelos Y et al. Chronic lymphocytic leukemia B-cells expressing AID display dissociation between class switch recombination and somatic hypermutation. Blood 2003; 101:4029-4032. 197. Ramiro AR, Jankovic M, Eisenreich T et al. AID is required for c-myc/IgH chromosome translocations in vivo. Cell 2004; 118:431-438. 198. Ramiro AR, Jankovic M, Callen E et al. Role of genomic instability and p53 in AID-induced c-myc-Igh translocations. Nature 2006; 440:105-109. 199. Doehle BP, Schafer A, Cullen BR. Human APOBEC3B is a potent inhibitor of HIV-1 infectivity and is resistant to HIV-1 Vif. Virology 2005; 339:281-288. 200. Nagaoka H, Ito S, Muramatsu M et al. DNA cleavage in immunoglobulin somatic hypermutation depends on de novo protein synthesis but not on uracil DNA glycosylase. Proc Natl Acad Sci USA 2005; 102:2022-2027. 201. Endo Y, Marusawa H, Kinoshita K et al. Expression of activation-induced cytidine deaminase in human hepatocytes via NF-kappaB signaling. Oncogene 2007; 26:5587-5595. 202. Aoufouchi S, Faili A, Zober C et al. Proteasomal degradation restricts the nuclear lifespan of AID. J Exp Med 2008; 205:1357-1368. 203. Xie K, Sowden MP, Dance GS et al. he structure of a yeast RNA-editing deaminase provides insight into the fold and function of activation-induced deaminase and APOBEC-1. Proc Natl Acad Sci USA 2004; 101:8114-8119. 204. Muto T, Muramatsu M, Taniwaki M et al. Isolation, tissue distribution and chromosomal localization of the human activation-induced cytidine deaminase (AID) gene. Genomics 2000; 68:85-88. 205. Bennett RP, Diner E, Sowden MP et al. APOBEC-1 and AID are nucleo-cytoplasmic traicking proteins but APOBEC3G cannot traic. Biochem Biophys Res Commun 2006; 350:214-219. 206. Bennett RP, Presnyak V, Wedekind JE et al. Nuclear Exclusion of the HIV-1 host defense factor APOBEC3G requires a novel cytoplasmic retention signal and is not dependent on RNA binding. J Biol Chem 2008; 283:7320-7327. 207. 
Huang J, Liang Z, Yang B et al. Derepression of microRNA-mediated protein translation inhibition by apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like 3G (APOBEC3G) and its family members. J Biol Chem 2007; 282:33632-33640. 208. Kozak SL, Marin M, Rose KM et al. he anti-HIV-1 editing enzyme APOBEC3G binds HIV-1 RNA and messenger RNAs that shuttle between polysomes and stress granules. J Biol Chem 2006; 281:29105-29119. 209. Gallois-Montbrun S, Holmes RK, Swanson CM et al. Comparison of cellular ribonucleoprotein complexes associated with the APOBEC3F and APOBEC3G antiviral proteins. J Virol 2008; 82:5636-5642. 210. Stopak KS, Chiu YL, Kropp J et al. Distinct patterns of cytokine regulation of APOBEC3G expression and activity in primary lymphocytes, macrophages, and dendritic cells. J Biol Chem 2006; 282:3539-3546. 211. Wichroski MJ, Robb GB, Rana TM. Human retroviral host restriction factors APOBEC3G and APOBEC3F localize to mRNA processing bodies. PLoS Pathog 2006; 2(5):e41. 212. Wichroski MJ, Ichiyama K, Rana TM. Analysis of HIV-1 viral infectivity factor-mediated proteasome-dependent depletion of APOBEC3G: correlating function and subcellular localization. J Biol Chem 2005; 280:8387-8396. 213. Gallois-Montbrun S, Kramer B, Swanson CM et al. Antiviral protein APOBEC3G localizes to ribonucleoprotein complexes found in P bodies and stress granules. J Virol 2007; 81:2165-2178. 214. Khan MA, Goila-Gaur R, Opi S et al. Analysis of the contribution of cellular and viral RNA to the packaging of APOBEC3G into HIV-1 virions. Retrovirology 2007; 4:48. 215. Bach D, Peddi S, Mangeat B et al. Characterization of APOBEC3G binding to 7SL RNA. Retrovirology 2008; 5(1):54. 216. Wang T, Zhang W, Tian C et al. Distinct viral determinants for the packaging of human cytidine deaminases APOBEC3G and APOBEC3C. Virology 2008; 377:71-79. 217. Chiu YL, Soros VB, Kreisberg JF et al. Cellular APOBEC3G restricts HIV-1 infection in resting CD4+ T-cells. Nature 2005; 435:108-114. 
218. Kreisberg JF, Yonemoto W, Greene WC. Endogenous factors enhance HIV infection of tissue naive CD4 T-cells by stimulating high molecular mass APOBEC3G complex formation. J Exp Med 2006; 203:865-870. 219. Soros VB, Yonemoto W, Greene WC. Newly synthesized APOBEC3G is incorporated into HIV virions, inhibited by HIV RNA and subsequently activated by RNase H. PLoS Pathog 2007; 3(2):e15. 220. Muckenfuss H, Kaiser JK, Krebil E et al. Sp1 and Sp3 regulate basal transcription of the human APOBEC3G gene. Nucleic Acids Res 2007; 35:3784-3796. 221. Rose KM, Marin M, Kozak SL et al. Transcriptional regulation of APOBEC3G, a cytidine deaminase that hypermutates human immunodeiciency virus. J Biol Chem 2004; 279:41744-41749.

he APOBEC1 Paradigm for Mammalian Cytidine Deaminases hat Edit DNA and RNA

201

222. Dang Y, Siew LM, Zheng YH. APOBEC3G is degraded by the proteasomal pathway in a Vif-dependent manner without being polyubiquitylated. J Biol Chem 2008; 283:13124-13131. 223. Mehle A, Strack B, Ancuta P et al. Vif overcomes the innate antiviral activity of APOBEC3G by promoting its degradation in the ubiquitin-proteasome pathway. J Biol Chem 2004; 279:7792-7798. 224. Sheehy AM, Gaddis NC, Malim MH. he antiretroviral enzyme APOBEC3G is degraded by the proteasome in response to HIV-1 Vif. Nat Med 2003; 9:1404-1407. 225. Liu B, Yu X, Luo K et al. Inluence of primate lentiviral Vif and proteasome inhibitors on human immunodeiciency virus type 1 virion packaging of APOBEC3G. J Virol 2004; 78:2072-2081. 226. Stopak K, De Noronha C, Yonemoto W et al. HIV-1 Vif Blocks the Antiviral Activity of APOBEC3G by Impairing both Its Translation and Intracellular Stability. Mol Cell 2003; 12:591-601. 227. Conticello SG, Harris RS, Neuberger MS. he Vif protein of HIV triggers degradation of the human antiretroviral DNA deaminase APOBEC3G. Curr Biol 2003; 13:2009-2013. 228. Tian C, Yu X, Zhang W et al. Diferential requirement for conserved tryptophans in human immunodeiciency virus type 1 Vif for the selective suppression of APOBEC3G and APOBEC3F. J Virol 2006; 80:3112-3115. 229. Russell RA, Pathak VK. Identiication of two distinct human immunodeiciency virus type 1 Vif determinants critical for interactions with human APOBEC3G and APOBEC3F. J Virol 2007; 81:8201-8210. 230. Yamashita T, Kamada K, Hatcho K et al. Identiication of amino acid residues in HIV-1 Vif critical for binding and exclusion of APOBEC3G/F. Microbes Infect 2008; Epub Ahead of Print. 231. Mehle A, Wilson H, Zhang C et al. Identiication of an APOBEC3G binding site in human immunodeiciency virus type 1 Vif and inhibitors of Vif-APOBEC3G binding. J Virol 2007; 81:13235-13241. 232. He Z, Zhang W, Chen G, Xu R et al. Characterization of conserved motifs in HIV-1 Vif required for APOBEC3G and APOBEC3F interaction. 
J Mol Biol 2008; In press. 233. Kobayashi M, Takaori-Kondo A, Miyauchi Y et al. Ubiquitination of APOBEC3G by an HIV-1 Vif-Cullin5-Elongin B-Elongin C complex is essential for Vif function. J Biol Chem 2005; 280:18573-18578. 234. Mehle A, Goncalves J, Santa-Marta M et al. Phosphorylation of a novel SOCS-box regulates assembly of the HIV-1 Vif-Cul5 complex that promotes APOBEC3G degradation. Genes Dev 2004; 18:2861-2866. 235. Stanley BJ, Ehrlich ES, Short L et al. Structural insight into the HIV Vif SOCS box and its role in human E3 ubiquitin ligase assembly. J Virol 2008; In press. 236. Yu X, Yu Y, Liu B et al. Induction of APOBEC3G ubiquitination and degradation by an HIV-1 Vif-Cul5-SCF complex. Science 2003; 302:1056-1060. 237. Yu Y, Xiao Z, Ehrlich ES et al. Selective assembly of HIV-1 Vif-Cul5-ElonginB-ElonginC E3 ubiquitin ligase complex through a novel SOCS box and upstream cysteines. Genes Dev 2004; 18:2867-2872. 238. Bogerd HP, Doehle BP, Wiegand HL et al. A single amino acid diference in the host APOBEC3G protein controls the primate species speciicity of HIV type 1 virion infectivity factor. Proc Natl Acad Sci USA 2004; 101:3770-3774. 239. Huthof H, Malim MH. Identiication of amino acid residues in APOBEC3G required for regulation by human immunodeiciency virus type 1 Vif and Virion encapsidation. J Virol 2007; 81:3807-3815. 240. Xu H, Svarovskaia ES, Barr R et al. A single amino acid substitution in human APOBEC3G antiretroviral enzyme confers resistance to HIV-1 virion infectivity factor-induced depletion. Proc Natl Acad Sci USA 2004; 101:5652-5657. 241. Santa-Marta M, da Silva FA, Fonseca AM et al. HIV-1 Vif can directly inhibit apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like 3G-mediated cytidine deamination by using a single amino acid interaction and without protein degradation. J Biol Chem 2005; 280:8765-8775. 242. Mangeat B, Turelli P, Liao S et al. 
A single amino acid determinant governs the species-speciic sensitivity of APOBEC3G to Vif action. J Biol Chem 2004; 279:14481-14483. 243. Zhang L, Saadatmand J, Li X et al. Function analysis of sequences in human APOBEC3G involved in Vif-mediated degradation. Virology 2008; 370:113-121. 244. Alce TM, Popik W. APOBEC3G is incorporated into virus-like particles by a direct interaction with HIV-1 Gag nucleocapsid protein. J Biol Chem 2004; 279:34083-34086. 245. Cen S, Guo F, Niu M et al. he interaction between HIV-1 Gag and APOBEC3G. J Biol Chem 2004; 279:33177-33184. 246. Schafer A, Bogerd HP, Cullen BR. Speciic packaging of APOBEC3G into HIV-1 virions is mediated by the nucleocapsid domain of the gag polyprotein precursor. Virology 2004; 328:163-168. 247. Xu H, Chertova E, Chen J et al. Stoichiometry of the antiviral protein APOBEC3G in HIV-1 virions. Virology 2007; 360:247-256. 248. Zennou V, Perez-Caballero D, Gottlinger H et al. APOBEC3G incorporation into human immunodeiciency virus type 1 particles. J Virol 2004; 78:12058-12061.

202

DNA and RNA Modii cation Enzymes

249. Lau PP, Zhu H-J, Baldini HA et al. Dimeric structure of a human apo B mRNA editing protein and cloning and chromosomal localization of its gene. Proc Natl Acad Sci USA 1994; 91:8522-8526. 250. Oka K, Kobayashi K, Sullivan M et al. Tissue-speciic inhibition of apolipoprotein B mRNA editing in the liver by adenovirus-mediated transfer of a dominant negative mutant APOBEC-1 leads to increased low density lipoprotein in mice. J Biol Chem 1997; 272:1456-1460. 251. Yang Y, Smith HC. In vitro reconstitution of apolipoprotein B RNA editing activity from recombinant APOBEC-1 and McArdle cell extracts. Biochem Biophys Res Commun 1996; 218:797-801. 252. Chen KM, Martemyanova N, Lu Y et al. Extensive mutagenesis experiments corroborate a structural model for the DNA deaminase domain of APOBEC3G. FEBS Lett 2007; 581:4761-4766. 253. Chen KM, Harjes E, Gross PJ et al. Structure of the DNA deaminase domain of the HIV-1 restriction factor APOBEC3G. Nature 2008; 452:116-119. 254. Wang J, Shinkura R, Muramatsu M et al. Identiication of a speciic domain required for dimerization of activation-induced cytidine deaminase. J Biol Chem 2006; 281:19115-19123. 255. Prochnow C, Bransteitter R, Klein MG et al. he APOBEC-2 crystal structure and functional implications for the deaminase AID. Nature 2007; 445:447-451. 256. Brar SS, Sacho EJ, Tessmer I et al. Activation-induced deaminase, AID, is catalytically active as a monomer on single-stranded DNA. DNA Repair (Amst) 2008; 7:77-87. 257. Goila-Gaur R, Strebel K. HIV-1 Vif, APOBEC and intrinsic immunity. Retrovirology 2008; 5:51.

Chapter 16

Mechanism of Action and Structural Aspects of ADARS (A-to-I) and APOBEC-Related (C-to-U) Deaminases

Joseph E. Wedekind* and Peter A. Beal*

Abstract

Nucleoside deaminases that act on RNA and DNA play important roles in proteome diversification, mRNA stability and innate immunity. Adenosine deaminases that act on RNA (ADARs) or tRNA (ADATs) fall into one branch of a phylogenetic tree whose members catalyze the hydrolytic deamination of adenosine (A) to inosine (I) in the context of folded substrates. A distant but related class of cytidine deaminases (CDAs) converts cytidine (C) to uridine (U) in the context of RNA or DNA. The latter CDAR/D enzymes belong to the APOBEC protein family, whose founding member, APOBEC-1, is a bona fide RNA ‘editing’ enzyme that deaminates C-to-U in mammalian mRNA. Two related proteins, activation induced deaminase (AID) and APOBEC3G (A3G), deaminate deoxy (d)C-to-dU within single-stranded DNA substrates. The goal of this chapter is to provide an overview of ADAR/T and CDAR/D family members from the perspective of how their shared reaction chemistry arises from a common molecular architecture that entails Zn²⁺ binding for functionality. As such, this work is intended to provide the reader with a broader perspective on the commonalities of A-to-I and C-to-U polynucleotide deaminases, which should be considered a divergent protein family of common ancestry, rather than isolated specialty molecules separated by evolution.

Introduction

Deamination of adenosine (A) residues within ribonucleic acids results in inosine (I) at the site of modification (Fig. 1A). Similarly, cytidine (C) can undergo enzymatic conversion to uridine (U) in the context of ribo- or deoxyribo-nucleic acids (Fig. 1B).ᵃ These apparently modest changes can profoundly affect the molecular recognition properties of the resulting ‘edited’ nucleobases. For example, inosine can be considered a guanosine analog, differing only by its lack of guanosine’s C2 amino group. As such, inosine has hydrogen-bonding properties similar to guanosine, forming Watson-Crick- and Hoogsteen-like base pairs with cytidine. Likewise, uridine is a thymidine analog that possesses an identical Watson-Crick face that prefers to pair with adenosine. In the context of RNA, the change from C to U creates another common RNA nucleotide; however, a dC to dU conversion is restricted in DNA and is the target of natural glycosylases that recognize and repair this modification, which is sensed by the absence of the major groove CH₃ group at C5 (chapter by Parisien and Bhagwat in this volume). Overall, the changes in base recognition properties resulting from deamination of A to I in RNA, or C to U in RNA as well as DNA, can have an enduring biological effect on the characteristics of nucleic acid species harboring these modifications due to the covalent nature of the modification.

Such deamination reactions are used to regulate the structure, function and stability of RNA or DNA via enzymes that catalyze adenosine or cytidine deamination reactions at specific locations in their cognate substrates. Messenger RNA, tRNA and a variety of other noncoding, regulatory RNAs are known substrates for adenosine deaminase enzymes (Fig. 1C). Whereas mRNA of apolipoprotein B is the known target of the cytidine deaminase APOBEC-1 (the apolipoprotein B mRNA editing catalytic subunit-1), the closely related family members activation induced deaminase (AID) and APOBEC3G act on single-stranded (ss)DNA substrates (Fig. 1C, right panels). In this chapter, the activities of these enzymes are considered in relation to structure and chemistry. A complementary description of the biological aspects of family members is provided in chapters by Smith, by Parisien and Bhagwat, and by Haele and O’Connell in this volume. A structural perspective on auxiliary domains and protein cofactors is provided in the chapter by Maris and Allain in this volume.

ᵃ It has been suggested that the nomenclature for APOBEC1-related proteins be adjusted to reflect their deaminase activity on cytosine bases in the context of DNA or RNA substrates (e.g., chapter by Parisien and Bhagwat in this volume). Because such a change would necessitate alteration of the ADAR/T definitions1,2 to RNA adenine deaminases, we stipulate to the reader that our nomenclature involving cytidine implies that these enzymes act on polynucleotide substrates.

*Corresponding Authors: Joseph E. Wedekind, Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, 601 Elmwood Avenue Box 712, Rochester, New York 14642 USA. Email: [email protected]; Peter A. Beal, Department of Chemistry, University of California-Davis, One Shields Avenue, Davis, California 95616 USA. Email: [email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution edited by Henri Grosjean. ©2009 Landes Bioscience.

Figure 1. Reactions of the RNA/DNA-dependent deaminases. A) The ADAR/ADAT A-to-I reaction. B) The APOBEC family reaction. C) Examples of nucleic acid substrates for the RNA/DNA-dependent deaminases.
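The base changes described in this Introduction can be illustrated with a short sketch. It is only a toy model of the bookkeeping: it applies the A-to-I or C-to-U substitution at a chosen position and then reports how the edited sequence would be read by a pairing partner, treating inosine as a guanosine analog. The names `deaminate`, `decoded`, `DEAMINATION_PRODUCT` and `READ_AS`, and the three-letter example sequences, are assumptions made for illustration and do not come from the chapter.

```python
# Illustrative sketch only: names and toy sequences are assumptions.

# Hydrolytic deamination products named in the text: A -> I and C -> U
# (dC -> dU in DNA is the same base change on a deoxyribose sugar).
DEAMINATION_PRODUCT = {"A": "I", "C": "U"}

# Base that an edited position resembles in Watson-Crick pairing:
# inosine behaves as a guanosine analog. Uridine needs no entry because
# it is already a standard RNA base that pairs with adenosine.
READ_AS = {"I": "G"}


def deaminate(seq, position):
    """Return seq with the base at a 0-based position deaminated."""
    base = seq[position]
    if base not in DEAMINATION_PRODUCT:
        raise ValueError(f"{base!r} at position {position} is not A or C")
    return seq[:position] + DEAMINATION_PRODUCT[base] + seq[position + 1:]


def decoded(seq):
    """Return the sequence as a base-pairing partner would read it."""
    return "".join(READ_AS.get(base, base) for base in seq)


print(deaminate("CAA", 0))           # C-to-U at the first position -> UAA
print(decoded(deaminate("CAG", 1)))  # A-to-I, then read as G -> CGG
```

In the standard genetic code CAA encodes glutamine and UAA is a stop codon, while CAG (glutamine) read as CGG encodes arginine, so a single deamination can change the meaning of a codon even though only one exocyclic amine is lost.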

The Zinc-Dependent Deaminase (ZDD) Signature Motif

Members of the adenosine and cytidine deaminase superfamily are defined by the presence of a conserved amino acid signature motif (H/C)xEx₂₅₋₃₀PCxxC (Fig. 2), where x is any amino acid.3 This pattern has been dubbed the Zinc Dependent Deaminase (ZDD) amino acid sequence motif, which implies a common chemistry and three-dimensional architecture within its constituency.4-6 The ZDD motif is a defining characteristic of numerous pyrimidine metabolism enzymes that act on free nucleosides,7,8 as well as the adenosine deaminases and APOBEC family members5,9 of this review that act only on polymeric RNA or DNA substrates. Importantly, the presence of the ZDD motif implies zinc binding by a conserved helix-strand-helix tertiary structural element that spatially positions conserved residues for Zn²⁺ coordination and proton transfer (Fig. 3).

In the ZDD motif, a His or Cys residue at the amino-terminal end of a conserved α-helix provides the first metal ligand to coordinate the essential Zn²⁺ ion, which serves to activate a water molecule for nucleophilic attack. The second conserved residue is a Glu, which interacts with the Watson-Crick face of the target nucleobase by reading out the imino and exocyclic amine groups.10 A β-strand links the first helical motif to a second conserved α-helical element that features the PCxxC substrate/Zn²⁺-binding element. The Pro residue resides in a loop between the β-strand and the second α-helix. Because Pro is an imino acid, it restricts the dihedral angle of the preceding residue such that its carbonyl oxygen is poised to interact with the exocyclic amine that leaves the nucleobase ring as free ammonia.10 The conserved ZDD motif concludes with two Cys residues whose thiolate (ionized) sulfhydryl groups complete the protein contribution to the coordination sphere. It is notable that the Zn²⁺ ion prefers tetrahedral geometry and thus requires an additional nonprotein ligand. In the active members of the deaminase family, the fourth ligand is water. The significance of this solvent molecule is described later. However, any variation of the active site that displaces this water, for example by an amino acid mutation, is expected to abolish catalytic activity. Overall, the fact that the ADAR/T and CDAR/D deaminases target different nucleobases does not diminish the importance of the conserved ZDD motif, which is evidence for a common ancestor.3 Moreover, the observation that the ZDD motif is part of a larger, conserved core fold implies that the adenosine and cytidine deaminase architectures arose by divergent, rather than convergent evolution.4,11

Figure 2. Amino acid sequence alignments for the RNA/DNA-dependent deaminases. Residues of the conserved deaminase active site motif are highlighted. The abbreviations are: (h) human, (d) Drosophila, (sc) Saccharomyces cerevisiae and (ec) Escherichia coli.

Figure 3. The Zinc Dependent Deaminase (ZDD) amino acid signature motif and cartoon diagram of its helix-strand-helix polypeptide fold based on crystallography.
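Because the ZDD signature is stated as a sequence pattern, it can be written directly as a regular expression. The following is a minimal sketch, assuming one-letter amino acid codes and a literal reading of the consensus, with an (H/C)xE segment, a spacer of 25 to 30 residues, and then PCxxC. The names `ZDD_PATTERN` and `find_zdd_motifs` and the toy sequence in the usage lines are hypothetical and are not taken from the chapter or from any real protein.

```python
import re

# Regular-expression form of the ZDD signature (H/C)xEx(25-30)PCxxC,
# where "." stands for x (any amino acid). Wrapping the pattern in a
# lookahead lets overlapping candidate matches be reported.
ZDD_PATTERN = re.compile(r"(?=([HC].E.{25,30}PC..C))")


def find_zdd_motifs(protein):
    """Return (0-based start, matched segment) for each ZDD-like match."""
    return [(m.start(), m.group(1))
            for m in ZDD_PATTERN.finditer(protein.upper())]


# Hypothetical toy sequence: HAE, a 27-residue spacer, then PCGAC.
toy = "MKT" + "HAE" + "A" * 27 + "PCGAC" + "LLK"
for start, segment in find_zdd_motifs(toy):
    print(start, len(segment), segment[:3], segment[-5:])
```

A match of this kind is at best a screening step. The text ties function to the three-dimensional helix-strand-helix element and to Zn²⁺ coordination, neither of which a sequence pattern can confirm, and a strict consensus may miss family members whose spacing deviates from it.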

The Conserved ‘Cytidine Deaminase’ or CDA Architecture

Several crystallographic and a single NMR spectroscopy analysis have provided detailed molecular-level insight into the conserved three-dimensional fold of the ZDD motif in the context of the fundamental deaminase domain from prokaryotic and eukaryotic sources.8,10,12-21 In general, the deaminase domain features ive mixed parallel/antiparallel β-strands lanked by α-helices (Fig. 4, central panel). he ZDD β-strand is a component of a central β-sheet (Fig. 4, pink β3) that is broad and lat, oten adopting a triangular shape.8 Flanking α-helices reside on either face of the sheet and provide topological connection points between the respective β-strands. he helices comprising the ZDD motif (Fig. 4, cyan) invariantly reside on a single face of the β-sheet with their amino-terminal ends pointing toward the Zn2+ atom, which may be a means to utilize the positive helix dipole for phosphate binding. A series of pairwise, least-squares superpositions between backbone-Cα coordinates of various ADAR/T and CDAR/D subunits reveals a considerable degree of spatial conservation among the deaminase cores (Fig. 4). Historically, the bacterial and yeast enzymes of pyrimidine metabolism were used as structural models for APOBEC1 and its related proteins.8,22 Particular enthusiasm for use of the yeast protein Cdd1 was based on the observation that it exhibited a limited RNA editing capability in a cell-based reporter analysis,8,23 as well as DNA deaminase activity in a commonly used bacterial rifampicin resistance assay,24 which is a characteristic of several bona ide RNA/DNA cytidine deaminases.25-27 In hindsight, Cdd1 is most likely a pyrimidine metabolism

Mechanism of Action and Structural Aspects of ADARS

207

Figure 4. Ribbon diagrams of known RNA/DNA-dependent deaminase subunits. Each fundamental deaminase domain is oriented similarly with the central ZDD colored cyan (helix) and pink (β -strand 3); the spatial conservation of the latter helix-strand-helix super secondary element is a conserved feature shared by family members. Zn2+ atoms are yellow spheres. Arrows indicate pairwise superpositions of Cα coordinates with the number of common coordinates found in parentheses. Superpositions were performed using the “brute force” option in LSQMAN.104 A color version of this image is available at www.landesbioscience.com/curie.

enzyme28 whose ability to edit nucleic acids is analogous to APOBEC-1ʹs adeptness in mutating bacterial DNA, which has no established biological signiicance. Another notable diference between Cdd1 and other classical R/DNA deaminases is that strand β5 of the yeast enzyme is reversed in direction (Fig. 4), suggesting that such enzymes diverged earlier from CDAR/Ds than ADAR/Ts. his topological diference does not inluence the ZDD motif, but has implications for positioning C-terminal helical elements that lank the active site and provide a motif for intersubunit association.6,8 As illustrated, scCdd1 superimposes well on human (h)A3G, human APOBEC2 (hA2) and hADAR2 (root-mean-square distance diference or rmsd from 1.86 Å to 2.21 Å), although the level of structural similarity is limited to approximately 40 amino acids in the conserved core. From this perspective, Cdd1 can be considered a ‘minimal’ cytidine deaminase fold, comprising ~130 amino acids and is representative of numerous related enzymes dubbed “free nucleoside deaminases” (reviewed in ref. 4). Other precedents for spatial agreement between the deaminases of lower and higher organisms are demonstrated by the superposition of bacterial ADAT/TadA with hA2, which produced an rmsd of 1.70 Å for 68 Cα positions. Like scCdd1, the position and orientation of the ZDD

208

DNA and RNA Modii cation Enzymes

helix-strand-helix element is highly similar to that of hA2, although the location varies for peripheral helical elements. Significantly, hA2 and the C-terminal deaminase domain of hA3G are spatially analogous on numerous levels and produced an rmsd of 1.88 Å for 95 Cα atoms (Fig. 4). Beyond the preservation of the ZDD motif, one striking similarity is the presence of all three flanking nonZDD helices in both the hA2 and hA3G structures (Fig. 4, dark blue). Two of these helices juxtapose the ZDD helices and reside at the β5-end of the β-sheet. The third helix sits underneath the β-sheet surface opposite to the ZDD helices. ADAR2 also exhibits the latter helix, which runs along the ‘amino’ end of the mostly parallel sheet. Although the C-terminal hA3G domain is missing strand β2, this segment is likely to exist in the full-length molecule. The rationale for this supposition is described below, but is supported by the strict evolutionary conservation of β2 in the fundamental deaminase architecture.4 Inspection of the human ADAR2 crystal structure (Fig. 4) suggests that the family of ADAR enzymes underwent the most substantial evolutionary divergence relative to other polynucleotide deaminases. However, the ADAR core β-sheet organization still shows clear similarity to a broad range of deaminases ranging from yeast (e.g., scCdd1) to humans (i.e., hA2), as supported by rmsd values of 2.02 Å for 41 and 31 Cα atoms, respectively. A general difference between the ADARs and related enzymes is that the ZDD motif of the former is not limited to a two amino acid ‘xx’ spacer that typically separates the Zn2+ ligands in the PCxxC motif (Fig. 2). Instead, the first PC signature sequence localizes to a short loop in ADARs that precedes the second α-helix of the ZDD secondary structure (Fig. 4).
The overall effect is that the ADAR2 active site appears somewhat sequestered compared to other deaminases such as hA2, which necessitates the observed substrate base-flipping mechanism (described below). Notably, the sequestration of deaminase active sites is relevant to their activity on polymeric RNA and DNA substrates. For RNA substrates, the editing deaminases can act on either single-stranded or duplex structures. We now consider such activity on a case-by-case basis to illustrate how enzymes with a common catalytic signature motif accomplish specialized activities.
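
The rmsd values quoted above summarize how closely equivalent Cα atoms from two structures coincide after superposition. As a minimal sketch, assuming the two coordinate lists are already superimposed and paired atom by atom (the function name and the toy coordinates are illustrative and are not taken from any of the structures discussed), the quantity can be computed as follows:

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, pre-superimposed points.

    coords_a, coords_b: sequences of (x, y, z) tuples in angstroms,
    one per equivalent C-alpha atom. Hypothetical helper for illustration.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two atoms of this pair
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy example: three paired atoms, each displaced by 2 angstroms along one axis.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
b = [(2.0, 0.0, 0.0), (1.0, 2.0, 0.0), (0.0, 1.0, 2.0)]
print(round(rmsd(a, b), 2))  # every pair is 2 angstroms apart, so this prints 2.0
```

A real comparison would first determine the set of equivalent residues and the optimal rigid-body superposition (for example with the Kabsch algorithm); only the final averaging step is shown here. This is why an rmsd is meaningful only together with the number of Cα atoms over which it was calculated.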

Adenosine Deaminases That Act on tRNAs (ADATs/Tads)

Adenosine is deaminated to inosine in the anticodon loops of several tRNAs from both eukaryotes and prokaryotes. 1-Methylinosine is present at position 37 in eukaryotic tRNAAla and arises from conversion of a genomically encoded adenosine to inosine, catalyzed by an RNA-dependent deaminase, followed by methylation at N1 by a different enzyme. The enzyme responsible for deamination at position 37 in the tRNA anticodon loop is ADAT1 (adenosine deaminase that acts on tRNA).1 Inosine is also present at position 34 in anticodon loops of several eukaryotic tRNAs as well as in tRNAArg2 in prokaryotes. In eukaryotes, deamination at this position results from the action of the heterodimeric ADAT2/ADAT3 protein complex.29,29b In bacteria, a homodimer of the enzyme TadA (tRNA-dependent adenosine deaminase) deaminates position 34 in tRNAArg2 (ref. 30). The importance of adenosine deamination in tRNA is emphasized by the fact that TadA is essential for viability of E. coli and ADAT2 and ADAT3 are essential in S. cerevisiae.29,30

Details of the ADAT/Tad Structure

The tRNA-modifying adenosine deaminase family that includes ADATs 1, 2 and 3 in eukaryotes and TadA in prokaryotes (as well as the duplex RNA-dependent adenosine deaminases, next section) shares sequence homology and spatial conservation with the cytidine deaminases (Figs. 2 and 4). Each of these deaminases possesses the ZDD signature motif in which conserved residues are involved in formation of the Zn2+-containing active site (Fig. 3). The Zn2+-bound water molecule, alluded to previously, serves as a nucleophile in the reaction (i.e., a catalytic water) and is the source of the oxygen atom in the inosine product. In addition to conservation of the Zn2+-binding residues, the conserved Glu residue that hydrogen bonds the nucleobase Watson-Crick face also serves as a proton shuttle in the reaction (Fig. 5 and described below). TadA enzymes from several bacteria have been crystallized and their structures solved by X-ray diffraction techniques.13,17-19 In addition, TadA from S. aureus bound to a mini-helix of the


Figure 5. Proposed deamination mechanism for RNA/DNA-dependent deaminases based on TadA. A color version of this figure is available at www.landesbioscience.com/curie.

anticodon stem loop of tRNAArg2 has been crystallized and the structure of the complex reported, providing insight into the basis for substrate selectivity in the deamination reaction (Fig. 6).19 The tRNAArg2 anticodon stem RNA used in the structure determination terminates in an unusual C:A pair followed by a 5ʹ-UNCGG-3ʹ loop where N is nebularine (purine ribonucleoside) (Fig. 6). Nebularine lacks the C6 amino group of the bona fide adenosine 34 deamination target and is thus unable to support the deamination process. As such, the structure provides a glimpse into the mode of substrate binding, rather than the product-bound state. Like numerous other deaminases,4 TadA functions as a dimer of identical subunits that binds the C:A pair and flanking loop nucleotides to induce a significant change in the conformation of the loop. This ‘induced fit’ process exposes the N (presumably A), inserting it into the zinc-containing active site (Fig. 6). Specificity in the deamination reaction arises from interactions between the splayed loop nucleotides and nucleotide-specific pockets in TadA. In particular, the nebularine is bound at the active site with an extensive network of hydrogen bonding and van der Waals interactions. Indeed, each of the purine ring nitrogens available for hydrogen bonding (N1, N3 and N7) appears to engage in a hydrogen bond in the TadA active site. However, in contrast to adenosine or AMP deaminases, as well as cytidine deaminases that process free nucleosides or mononucleotides, TadA does not bind nebularine in its active site as the covalent hydrate.31 The catalytic water is too far removed from C6 for nebularine to be in the hydrated form. In addition, the conserved Glu55 proton shuttle is not hydrogen bonded to N1 of nebularine as expected given its role in other adenosine or cytidine deaminases (i.e., Fig. 3).
Thus, one could consider this a “predeamination” ground-state structure with the base poised to fully engage the active site, but not yet at the point of bond formation with the zinc-bound water. Additional movement into the active site is required to allow Glu55 to interact with N1 and the oxygen atom of the catalytic water to form a bond to C6.

The TadA Mechanism as a Paradigm for Adenosine and Cytidine Editing Enzymes

Measurements of kinetic isotope effects with a battery of differentially isotopically labeled anticodon stem loop RNAs with E. coli TadA provide us with a model for the transition state for the TadA reaction.32 These studies involved heavy isotope labeling at different atoms of the position 34 adenosine including: 3H at the 5ʹ and 5ʹʹ positions and 3H at the 1ʹ position of the ribose


Figure 6. Recognition of a tRNA substrate analog by TadA as described in reference 19. A color version of this figure is available at www.landesbioscience.com/curie.

along with 13C at C6, 15N at N1 and 15N at the N6-NH2 group of the base. Isotope effects on the reaction rate were then measured and used to determine a transition-state structure for the reaction with quantum-chemical calculations. The results are consistent with a late nucleophilic aromatic substitution (SNAr) transition state with complete hydroxyl-C6 bond formation, nearly complete N1 protonation and partial N6 amino group dissociation. The late SNAr transition state is similar to that observed previously for cytidine deaminase, underscoring the mechanistic link between these deaminases, as implied by their structural similarity (Fig. 4). In addition, a kinetic isotope effect observed with 5ʹ, 5ʹʹ 3H labeled substrate suggests ribosyl/backbone conformational changes occur on the path to the transition state. This is consistent with the observed changes in loop conformation when TadA binds its substrate (see above). A catalytic mechanism for TadA that summarizes the structural and mechanistic work described above is shown in Figure 5. Features of this reaction scheme are likely to be preserved for cytidine deaminases that act on R/DNA, although the latter enzymes have not been characterized as extensively. Significantly, a nucleotide extrusion step may not be necessary for cytidine deaminases that act on R/DNA, because they prefer single-stranded substrates rather than duplex species such as those recognized by ADAR/ADATs.

Adenosine Deaminases That Act on Duplex RNA (ADARs)

A related yet distinct group of RNA-dependent adenosine deaminases are those that react at sites within double-helical segments found in a variety of different RNAs including mRNAs and pre-miRNAs.33 These enzymes have been given the name ADAR for adenosine deaminase that acts on duplex RNA.2 Because ADARs convert adenosine to inosine within coding sequences, they are also RNA editing enzymes. Since inosine is decoded as guanosine during translation, this reaction can lead to codon changes and the introduction of amino acids into a protein not encoded in the gene. ADARs are known to alter codons in many different mRNAs including those encoding proteins important for nervous system function like glutamate and serotonin receptors.34,35 (For a more complete discussion, see chapter by Haele and O’Connell in this volume).
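
Because inosine is read as guanosine, the coding consequence of an editing event can be worked out by substituting G for the edited A and translating the codon again. The sketch below does this with the standard genetic code; the helper name and the two example codons are chosen for illustration only and are not drawn from the experiments cited here.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def decode_edited_codon(codon, position):
    """Return (residue before editing, residue after A-to-I editing).

    `position` is the 0-based index (0, 1 or 2) of the edited adenosine.
    Inosine is treated as guanosine, as it is read during translation.
    """
    if codon[position] != "A":
        raise ValueError("only adenosine can be deaminated to inosine")
    read_as = codon[:position] + "G" + codon[position + 1:]
    return CODON_TABLE[codon], CODON_TABLE[read_as]

print(decode_edited_codon("CAG", 1))  # glutamine codon read as arginine
print(decode_edited_codon("AGA", 0))  # arginine codon read as glycine
```

Not every edit recodes: an A in the third codon position often falls in a synonymous codon family, in which case the two returned residues are identical.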

ADAR Function and Structure

The first ADAR to be discovered (ADAR1) was originally identified as a duplex RNA unwinding enzyme in Xenopus embryos.36 It was later shown that the unwinding activity arose


Figure 7. Domain structures for ADARs and APOBEC/AID deaminases based on primary sequence. A color version of this figure is available at www.landesbioscience.com/curie.

from the enzyme’s ability to deaminate adenosine in base-paired RNAs. This reaction creates an I:U mismatch that destabilizes the double-helical structure and, thus, “unwinds” the duplex. We now know that ADAR1 is expressed in two forms in human cells: a long form (p150) that is interferon-induced and found in the nucleus and cytoplasm, and a short form (p110) found exclusively in the cell nucleus.37 The long form of ADAR1 is believed to play an antiviral role in the cell by nonselective deamination of viral duplex RNAs found in the cytoplasm. ADAR1 also has an essential function in mammals beyond the nervous system (see chapter by Haele and O’Connell in this volume). The failure of in vitro ADAR1 deamination assays to reveal editing at RNA sites known to be processed in vivo prompted the search for new RNA-editing adenosine deaminases. This work led to the discovery of ADAR2, an ∼80 kDa protein smaller than ADAR1 that harbors a different N-terminal domain organization (Fig. 7).38 Deletion of the ADAR2 gene in mice is lethal, with homozygotes dying between postnatal days 0 and 20.39 Consistent with an important role for ADAR2 in the nervous system, ADAR2 null mice become progressively seizure prone after postnatal day 12. ADARs 1 and 2 are expressed in several different tissues whereas a related protein referred to as ADAR3 is expressed exclusively in the brain.40 To date, no editing substrate has been identified for ADAR3. ADARs are modular in their makeup with multiple independently folded domains that work in concert to achieve efficient and selective RNA editing (Fig. 7). RNA binding is controlled by sequence motifs known as double-stranded RNA binding motifs (dsRBMs) present in multiple copies (see chapter by Maris and Allain in this volume). In addition, ADAR1 has an N-terminal Z-domain similar to known Z-DNA binding domains.41 The C-terminal segment of ADARs contains the deaminase domain with the catalytic ZDD motif that is necessary to convert adenosine to inosine.
As with the ADATs and TadA, the ADAR catalytic domain shares sequence similarity with cytidine deaminases (CDAs) (Fig. 2). The structure of the C-terminal deaminase domain of human ADAR2 (amino acids 306-700) has been solved by X-ray diffraction methods (Fig. 8A).16 As expected from sequence similarities to CDA and TadA, ADAR2 ligates a Zn2+ ion with residues H394, C451 and C516, which are conserved in the ZDD motif (Figs. 2 and 3) and characteristic of the CDA protein family (Fig. 8B). The fourth ligand to zinc is a water molecule that also hydrogen bonds to E396, another conserved


Figure 8. Crystal structure of the deaminase domain of human ADAR2.16 A) Fold of deaminase domain indicating the locations of the active site (AMP) and the IP6 pocket. B) Zn-containing active site. C) Residues of the IP6 site. A color version of this figure is available at www.landesbioscience.com/curie.

catalytic residue as described above. However, one key difference between the structure of the ADAR2 active site versus those of CDAR/Ds and TadA is the presence of a loop in the former enzyme, which harbors T375 (Fig. 8B). Modeling of AMP into the ADAR2 active site suggests T375 is in close proximity to the ribose.16 Indeed, modeling CMP in lieu of AMP produces a clash between the cytidine ribose and T375. This obstruction arises because the larger purine ring of adenine reaches farther into the active site than its more diminutive pyrimidine counterpart and thus accesses the zinc-bound water. As such, the presence of the T375 loop in ADARs provides a plausible discrimination element that dictates substrate selectivity for an enzyme that is otherwise related to the CDAR/D architecture. Aside from its substrate selectivity, the ADAR2 catalytic domain revealed a structural feature not seen before among deaminases. An inositol hexakisphosphate (IP6) molecule was found buried in the core of the protein, hydrogen bonded to numerous polar residues that appear to be conserved in the ADARs and ADAT1 (Fig. 8C). IP6 was not added to purification buffers or during crystallization so the protein must have sequestered it during expression in yeast. An important role for IP6 in ADAR function is implied by the fact that active ADAR2 is isolable from overexpression in S. cerevisiae only when the biosynthetic pathway for IP6 formation is intact. It seems likely that IP6 is required for ADAR and ADAT1 folding, since it is difficult to imagine ADAR2 maintaining the fold identified in the crystal structure without IP6 present. A network of hydrogen bonds between the phosphate groups of the IP6 molecule and basic residues at the active site has been noted, suggesting that the metabolite may fine-tune the enzyme’s catalytic ability.


Figure 9. Synthetic RNA duplex used for substrate analog studies of ADAR2. A color version of this figure is available at www.landesbioscience.com/curie.

The ADAR2 Mechanism

One of the consequences of the double-helical nature of the ADAR substrate is the requirement for conformational changes in the RNA prior to deamination. It is clear from the structure of the catalytic domain of ADAR2 that the reactive nucleotide must adopt a conformation that removes the edited base from the helical stack before it can access the zinc-containing active site. The issue of conformational changes in the ADAR substrate was addressed using RNAs bearing the fluorescent base 2-aminopurine (2AP) at different positions, including at a known editing site (Fig. 9).42,43 Stacking into a duplex quenches the fluorescence of 2AP. Thus, 2AP can be used as a probe of the stacking environment of a nucleotide under different experimental conditions. These studies demonstrated that ADAR2 causes a conformational change in an RNA substrate consistent with flipping the reactive base from the helix into the enzyme active site. Molecular dynamics simulations were also used to study base-flipping processes for adenosines in different duplex RNA sequence environments.44 These efforts demonstrated that an adenosine at a known editing site (R/G of GluR-B) is more prone to move out of the helical stack than other adenosines present in the simulated duplex. Thus, the local structure of the RNA may facilitate the base-flipping step in the editing reaction. The extent to which an increased propensity to base flip affects the rate of the deaminase reaction at different editing sites remains to be determined.
Protein conformational changes were studied by monitoring differences in the tryptophan fluorescence of ADAR2 when RNA binds.42 The results point to a coupling of RNA substrate binding and conformational rearrangements in the ADAR2 catalytic domain, consistent with a report suggesting that ADAR2 exists in an autoinhibited conformation until it binds an RNA substrate capable of engaging both of its dsRBMs.45 The presence of multiple dsRBMs in ADARs led to the question of whether self-association of subunits is necessary for catalytic activity. Several groups have addressed this topic46-51 with early work suggesting that ADARs function as dimers. However, recent analytical gel filtration and equilibrium sedimentation studies with highly purified ADAR2 samples indicate the enzyme exists as a monomer in the absence of RNA.51 Given the ability of ADARs to bind nonselectively to duplex RNAs of sufficient length, the oligomerization observed likely arises from multiple enzyme molecules associating with a given RNA. This type of RNA-mediated oligomerization may be important for editing activity and continues to be a controversial issue. Details of CDAR/CDAD self-association are discussed in the ensuing sections. As described in the preceding sections, the ADAT and TadA enzymes function as subunit dimers. The spatial organization and ZDD conservation of the ADAR active site imply a deamination mechanism similar to that of TadA, regardless of the extensive amino acid insert in the PCxxC motif (Figs. 2 and 4, i.e., pink elements in ADAR2 of the latter figure). As described above, TadA uses a zinc-bound water molecule to carry out hydrolytic deamination with a conserved glutamic acid (E70 in E. coli) available for proton transfer. Mutation of the ADAR active site residues involved in zinc


binding (H394, C451 and C516 in ADAR2) causes loss of activity, as does mutation of the conserved glutamic acid (E396 in ADAR2). The reactivity of substrate analogs, determined with a duplex RNA mimic of the structure surrounding the editing site of glutamate receptor B subunit pre-mRNA, also generally supports the proposed hydrolytic deamination mechanism (Fig. 5). ADAR2 does not absolutely require the 2ʹ-hydroxyl group at the editing site (2ʹ-deoxyadenosine is deaminated at a moderately reduced rate), but it deaminates 2ʹ-O-methyladenosine very slowly.52 Interestingly, snoRNA-directed 2ʹ-O-methylation at an editing site in a serotonin receptor pre-mRNA is used naturally for regulating editing at that site.53 Also, a large rate acceleration is realized for the ADAR2 reaction when adenosine is replaced with 8-azaadenosine.54 The intrinsic difference in hydration free energies of purine vs 8-azapurine has been estimated to be as much as 7 kcal/mole.55 This is largely a result of the difference in resonance energy, with the purine ring system significantly more stabilized by resonance. Thus, 8-aza substitution makes hydration of the purine ring a more favorable process. The fact that this substitution accelerated the ADAR reaction rate suggests that the covalent hydration step is rate limiting for the substrates tested (see Fig. 9). Furthermore, the observation that 8-azaadenosine was an excellent substrate for ADAR2 led to a method for mechanism-based trapping of the enzyme bound to RNA bearing 8-azanebularine at the site of editing.56 Given these results and the similarities between ADARs and TadA, a mechanism for ADAR2 can be proposed that is analogous to TadA (Fig. 5). Initially, the dsRBM domain binds certain sites on a duplex RNA substrate. If enough recognition surface is present, allowing both dsRBMs to bind simultaneously, the deaminase domain is relieved of autoinhibition and contacts the RNA.
The reactive adenosine is flipped out of the helix and occupies the active site. The zinc-bound hydroxide attacks the C6 position of the purine ring with protonation at N1 forming the high-energy Meisenheimer intermediate. Proton transfer from the C6 hydroxyl to N6 mediated by E396 followed by departure of ammonia yields the inosine product. In the TadA reaction, proton transfer from the C6 hydroxyl to the leaving group appears to be rate limiting given the observed kinetic isotope effects and calculated transition-state structure.32 However, this approach has not yet been applied to the study of the ADAR reaction. Data with substrate analogs support formation of the Meisenheimer intermediate as rate determining for ADAR2 (Fig. 5).54,56 Furthermore, since ADARs deaminate adenosine in a variety of different RNAs, the rate-determining step could be context dependent.54 For instance, in particularly stable duplex structures, base flipping may be slow and rate determining overall. At present, there are no comparably detailed mechanistic analyses of APOBEC-related enzymes, mostly due to difficulties in their purification. As such, it is largely accepted, but untested, that these enzymes function similarly to free-nucleoside cytidine deaminases,4 although ssDNA substrates harboring zebularine did not confer tighter binding to pure hA3G (Wedekind and Smith, unpublished results), which parallels observations with ADARs and TadA in the use of nebularine. Until further experimental details are forthcoming, the proposed mechanism of ADARs and ADAT/Tads is likely to be a reasonable approximation for APOBEC-related proteins.
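
To put the quoted estimate of up to 7 kcal/mole in perspective, a free-energy difference can be converted into the corresponding fold change in an equilibrium constant through exp(ΔΔG/RT). The sketch below performs this arithmetic. The temperature of 298.15 K and the function name are assumptions made for illustration, and the result describes the hydration equilibrium at the upper bound of the estimate, not a measured enzymatic rate enhancement.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def equilibrium_ratio(delta_delta_g_kcal, temperature_k=298.15):
    """Fold change in an equilibrium constant for a free-energy difference.

    delta_delta_g_kcal: free-energy difference in kcal/mol (positive value
    means the second species is favored by that amount).
    temperature_k: absolute temperature; 298.15 K is an assumed default.
    """
    return math.exp(delta_delta_g_kcal / (R_KCAL * temperature_k))

ratio = equilibrium_ratio(7.0)
print(f"{ratio:.1e}")  # on the order of 10^5
```

Each 1.4 kcal/mole corresponds to roughly one order of magnitude at this temperature, so a smaller actual difference would still leave a large shift toward the covalent hydrate for the 8-aza analog.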

APOBEC-1, AID and APOBEC2 Cytidine Deaminases

APOBEC-1 is the founding member of a cytidine deaminase family whose members catalyze the C-to-U deamination of single-stranded RNA or DNA substrates. A cellular and molecular perspective of APOBEC-1, its related proteins as well as their biology is provided in the chapter by Smith in this volume and elsewhere.5,6,57-60 Although knockouts of APOBEC-1 have no apparent phenotype,61-63 ablation of its obligate complementation factor ACF, which is required for cognate apoB mRNA recognition, is embryonic lethal in mice at a nascent stage.64 Despite APOBEC-1’s early discovery relative to other family members, little information is available regarding its three-dimensional structure at the molecular level, although the enzyme continues to provide a paradigm for ‘editosome’ (auxiliary factor) mediated RNA editing, cellular trafficking and molecular regulation through hierarchical assembly with cofactors (chapter by Smith in this volume). AID was the second member of the APOBEC-1 protein family to be discovered and is essential for production of high-affinity antibodies in vertebrates (chapter by Smith and chapter by Parisien and Bhagwat in this volume). Its isolation by Tasuku Honjo and colleagues65 led to the breakthrough that AID−/− mice were deficient in class switch recombination and somatic


hypermutation of immunoglobulin genes.66 AID knockout mice developed enlarged lymphoid organs, which provided clues that AID is the causative factor in the well-known human disease hyper-IgM syndrome type II (HIGM2).67 Unlike APOBEC-1, which targets apoB mRNA, AID in the absence of cofactors deaminates dC to dU within WRC ‘hot spot’ sequences of ssDNA, but does not act on dsDNA, hybrids or ssRNA.68,69 The observation that AID targets dC in the nontemplate strand of actively transcribed genes70-72 suggested that AID prefers ssDNA as its biological substrate, although an RNA editing role has not been excluded at this time.57,73 Structural analysis of AID would provide a direct means of visualizing the mode of substrate binding in the enzyme active site. Although modeling of substrates into a comparative model for AID has been reported,8 few conclusions could be drawn about substrate specificity due to a lack of experimental restraints describing how APOBEC family members actually bind DNA (or RNA) at the molecular level. General difficulties in producing sufficient quantities of AID for structural investigations prompted Chen and Goodman to collaborate on the crystallographic structure determination of a homologous human protein, hA2,20 which provided the first structural insight in the APOBEC family at the molecular level (Fig. 10A). hA2 does not support apoB mRNA editing activity despite apparent similarity to APOBEC-1. Reports of hA2 deamination activity on free nucleosides74,75 are tempered by contradictory observations76 and knockout of A2 in mice resulted in no phenotypic changes.76 Nonetheless, the overall primary-structure organization of hA2 is similar to APOBEC-1 and AID, although hA2 has a distinctly longer N-terminus, but a shorter C-terminus (Fig. 7). Sequence homology of hA2 with APOBEC-1 and AID is 41% and 44%, respectively, over 224 amino acids.
This level of similarity makes the hA2 crystal structure appropriate to model the APOBEC-1 and AID folds, in spite of the absence of detectable A2 activity or function.
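
The WRC ‘hot spot’ mentioned above is written in IUPAC nucleotide codes, where W stands for A or T and R for A or G, with the final C being the deamination target. As a rough illustration, the sketch below lists every candidate target cytosine in a DNA string. The function name and the example sequence are made up, and a motif match says nothing about whether a given site is actually deaminated in vivo.

```python
import re

# W = A or T, R = A or G (IUPAC codes); the lookahead allows overlapping motifs.
WRC = re.compile(r"(?=([AT][AG]C))")

def wrc_hotspots(ssdna):
    """Return 0-based positions of the C in every WRC motif of a DNA string."""
    seq = ssdna.upper()
    # m.start() is the position of W, so the target C sits two bases downstream.
    return [m.start() + 2 for m in WRC.finditer(seq)]

example = "GGAGCTTACAACGT"  # made-up sequence for illustration
print(wrc_hotspots(example))  # [4, 8, 11]
```

Here the three motifs found are AGC, TAC and AAC. Scanning the reverse complement in the same way would identify hot spots on the opposite strand.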

hA2 and AID Intersubunit Interactions: A Comparative Modeling Approach

One noteworthy aspect of the hA2 crystal structure was the observation that the enzyme crystallized as an elongated tetramer, 127 Å in length20 (Fig. 10A). This result suggested that the APOBEC lineage may be distinctly structured compared to other deaminases that prefer a more compact oligomeric organization of subunits.4 Thus, although the fundamental deaminase fold is preserved (Fig. 4), the manner by which subunits self-associate appears different among proteins of the CDA superfamily.4 Efforts to correlate the crystallographic observation of an hA2 tetramer with solution analysis were inconclusive. Gel filtration suggested that the molecule was a dimer,20 although the molecular masses of rod-shaped particles are notoriously difficult to characterize by this method since they elute more rapidly than their globular counterparts.77-79 Thus, the question of whether hA2 is dimeric or tetrameric in solution requires independent confirmation by a more sensitive method. Knowledge of AID’s oligomeric state has implications for function, but as with hA2, a strong consensus has not been forthcoming among investigators. Use of atomic force microscopy (AFM) by Diaz and coworkers led to the conclusion that AID functions as an isolated subunit.80 The Papavasiliou lab reported an AID tetramer,81 which agrees with the hA2 crystal structure. Further support for AID intersubunit association comes from the Honjo lab, who presented evidence that AID’s oligomeric state contributes to disease. Specifically, patients carrying the heterozygous R190X AID mutation exhibited HIGM2, suggesting a dominant-negative effect resulting from the combination of wild type and mutant AID molecules.82 To interrogate AID’s intersubunit interactions using a rational approach, Chen, Goodman and coworkers made a molecular model derived from the hA2 crystal structure and then generated point mutants to assess effects on ssDNA deaminase activity in the context of recombinant AID expressed in E.
coli.20 The results provide compelling evidence for two distinct AID intersubunit contact interfaces. One interface corresponds to a central “dimer-of-dimers” interface that features α-helical interactions and includes amino acids K16, R19, R24, R112, Y114, F115 and C116 (Fig. 10B). In the AID model, these residues map to the subunit interface between dimers (i.e., the tetrameric interface) and each point mutant or combinations thereof results in a loss of


Figure 10. Proposed AID model and subunit interfaces. A) A putative tetrameric arrangement of AID subunits as described in reference 20. Each subunit is colored differently. B) The model tetramer interface with mutations mapped in red ball-and-stick models from reference 20. C) The model dimer interface with point mutants mapped from reference 20. The model was constructed as described in reference 105. The red and gold polypeptide chains are presumed modes of dominant-negative dimerization derived from heterozygous individuals with HIGM2.65,82-84 The gold coil represents a putative polypeptide from a frameshift mutation. A color version of this image is available at www.landesbioscience.com/curie.

deamination activity,20 which suggests protein unfolding or a requirement for oligomerization in C-to-U deamination. Notably, mutants R24W, R112H and R112C, which reside at this putative interface, are associated with HIGM2 and each results in dramatic losses in SHM and CSR activities,82 possibly consistent with disruption of a subunit interface. The elongated β2-strand of hA2 contributes to backbone dimerization with a neighboring subunit, suggesting a second mode of intermolecular contact (Fig. 10C, red and blue strand interface). Like the aforementioned “dimer-of-dimers” interface, AID amino acid point mutants were


prepared that led to loss of deaminase activity.20 These included F46A and Y48A, which reside on a 2-fold axis of symmetry between subunits in the hA2-based AID model. These loss-of-function mutants imply an important role for maintenance of the dimer interface in AID activity. Also of interest are AID truncations that arise in HIGM2 patients. One set of patients exhibited heterozygous AID variants comprising a W68X (stop) truncation accompanied by a downstream L59F(Δ60 to 61) deletion mutant.65,83 These defective genes were verified for expression65 and suggested the possibility of dominant-negative heterodimers whose putative arrangement is depicted in Figure 10C (blue and red polypeptides). However, it is also conceivable that the L59F(Δ60 to 61) AID subunit alone, or its dimeric form, is sufficient to produce the abnormally low levels of SHM and CSR activity observed in patients. The combined L59F mutant and (Δ60 to 61) deletion occur in the first, conserved ZDD helix (Fig. 10C, dark blue patch with Δ symbol), which is expected to destabilize the active site. In contrast, patients homozygous for the W68X variant exhibited neither SHM nor CSR activity,65 consistent with loss of active site formation. A more compelling argument for a dominant-negative AID subunit interaction arises from HIGM2 patients who express one polypeptide comprising amino acids 1-75 followed by a frameshift that prematurely stops at 116X.84 This fragment contains the red and gold regions in Figure 10C and is heterozygously paired with an AID subunit harboring an F11V point mutant. Although the latter amino acid is conserved among species, it resides in a short α-helix at the N-terminus of the AID model, distant from the active site. Moreover, an F to V change is relatively modest and deleterious effects from homozygous combination of the F11V mutation have not been reported, unlike numerous other point substitutions that lead to HIGM2.
As such, one interpretation of the latter heterozygous pair is that a dominant-negative effect arises through subunit dimerization via β2 (Fig. 10C, blue, gold and red polypeptides), thus leading to AID inactivation. Such observations, when coupled with directed mutagenesis in vitro, provide evidence that AID requires dimerization via an hA2-like dimer interface. These findings parallel prior reports on APOBEC-1 that support its dimerization,22,85,86 which is required for activity.22 We now turn our attention to hA3G, which has been reported in a variety of oligomeric states as well.

APOBEC3G Domain Organization and Evidence for Subunit Oligomerization

Like APOBEC-1, the activity of hA3G appears to be modulated through formation of high molecular mass (HMM) assemblies. Smith and coworkers first demonstrated that 60S particles comprising dormant APOBEC-1 reside in the cytoplasm, but become active as 27S variants in the nucleus where apoB mRNA editing occurs87 (see chapter by Smith in this volume). Likewise, hA3G forms high molecular mass complexes that lead to its inactivation as an anti-viral factor in vivo.88-91 In contrast, low molecular mass forms, consistent with less complex subunit organization, demonstrate anti-viral activity.88,92 Therefore, the factors that influence hA3G oligomerization represent an important means to promote antiviral function. Towards this goal, several labs have produced recombinant hA3G to investigate its fundamental biochemical and biophysical properties. Evidence for hA3G functioning as a dimer in vitro came from the Goodman lab, who showed the purified enzyme processively edited ssDNA substrates;93 work by Levin and Strebel and colleagues also established the ssDNA binding preference of the pure enzyme.94 Subsequently, the Smith and Wedekind labs investigated the molecular shape and volume of hA3G by small angle X-ray scattering (SAXS).95 The results suggested hA3G forms an elongated, 140 Å dimer with a tail-to-tail subunit interface. Notably, this model was derived without knowledge of the hA2 crystal structure, whose subsequent structure determination corroborated the elongated shape and tail-to-tail subunit features of hA3G.96 The hA3G polypeptide chain is notably longer than that of a single hA2 subunit (Fig. 7) and appears to have evolved by gene duplication giving rise to two deaminase domains, each with a characteristic ZDD motif.5 As such, it is tempting to speculate that hA3G forms a pseudo-dimeric subunit interface between the β2-strand-equivalents from the respective N- and C-terminal deaminase regions, akin to hA2 or AID dimers (e.g., Fig. 10C).
If such an interaction were present, it would bolster support for intermolecular dimerization of hA3G subunits by means of the equivalent tetrameric (“dimer-of-dimers”) interface of hA2 or AID (Fig. 10B). Two-hybrid analysis of hA3G supports self-, as well as heterosubunit, oligomerization among some hA3 family members.9 More recently, AFM analysis by the Goodman lab showed that hA3G forms dimeric, tetrameric and higher order complexes dependent on DNA.97 Indeed, one of the contributing factors to hA3G’s oligomeric complexity has been noted to be “bridging” nucleic acid,92 which cannot be dismissed as a source of intersubunit stability in dimeric or higher order assemblies.95 This theme appears analogous to ADAR2, as mentioned above. Using a deletion-mutation approach, Harris and Matsuo identified a wild type sequence of the C-terminal half of hA3G (amino acids 198-384) that eluted as a monomer in gel filtration.98 Sixty-nine Ala point mutants were made and mapped onto the hA2 crystal structure, suggesting that its subunit structure was a reasonable approximation of the hA3G fold. Subsequently, a series of point mutations to remove Cys (C243A, C321A and C356A), as well as hydrophobic-to-hydrophilic solubilizing changes to Lys (L234K and F310K), were incorporated into the hA3G C-terminal construct to confer stability for NMR analysis21 (Fig. 4). The resulting construct appeared monomeric by analytical ultracentrifugation and exhibited a circular dichroism spectrum consistent with that of the full-length protein reported elsewhere.95 Importantly, the modified C-terminal hA3G domain was sufficient for DNA deaminase activity in a bacterial reporter assay,21 suggesting proper folding, although its antiviral properties were not described. The overall fold and topology of the hA3G C-terminal domain were remarkably similar to hA2 (Fig. 4) with one exception. Specifically, the hA3G NMR structure is devoid of the otherwise strictly conserved β2-strand, which is replaced by a somewhat disordered coil (Fig. 4).
The a posteriori mapping of Ala point mutations onto this coil, from W232 to C243, reveals that 4 of 7 hydrophobic mutations resulted in impaired deaminase activity,98 which is not per se characteristic of surface residues. This observation is heightened in importance by the fact that the L234K and C243A mutations employed for NMR stabilization map to the β2-strand and appear spatially reminiscent of AID point mutations that abolish activity20 (Fig. 10C). Whether the N- and C-terminal hA3G deaminase domains evolved to form a pseudo-dimeric interaction via a β2-strand interface by analogy to hA2,20 or possibly AID, remains an open question that will be resolved best in the context of the full-length enzyme. Regardless, the NMR structure of hA3G’s C-terminal region has provided an important, incremental advance in understanding the structure and function of this antiviral factor.
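
The residue numbering used in this section can be checked with a short script. The following is a minimal Python sketch, not part of the original analysis: it parses the stabilizing substitutions named above and reports which of them fall within the C-terminal construct (amino acids 198-384) and within the W232-C243 coil that replaces the β2-strand. All function and variable names are hypothetical and chosen only for illustration.

```python
import re

# Values taken from the text: the NMR construct spans residues 198-384 and
# the coil that replaces the beta2-strand runs from W232 to C243.
CONSTRUCT = (198, 384)
BETA2_COIL = (232, 243)

# Stabilizing substitutions listed in the text, sorted by position.
STABILIZING = ["L234K", "C243A", "F310K", "C321A", "C356A"]


def parse_mutation(label):
    """Split a label such as 'L234K' into (wild_type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def mutations_in_region(labels, region):
    """Return the labels whose position lies inside region (inclusive)."""
    low, high = region
    return [m for m in labels if low <= parse_mutation(m)[1] <= high]


if __name__ == "__main__":
    print("within construct:", mutations_in_region(STABILIZING, CONSTRUCT))
    print("within beta2 coil:", mutations_in_region(STABILIZING, BETA2_COIL))
```

Run as a script, this lists all five substitutions as lying within the construct and only L234K and C243A as lying within the coil, in agreement with the description above.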

Future Directions

In closing, it is appropriate to reflect on the words of Samuel Karlin, who stated, “the purpose of models is not to fit the data but to sharpen the questions”.99 In this respect, the structure-function analyses described herein have led to new heights in molecular understanding, but have also revealed significant gaps in our knowledge. A major take-home message is that the core folds of adenosine and cytidine deaminases that act on RNA and DNA are highly similar, although their peripheral elements have been diversified to achieve specialized substrate and presumably regulatory factor binding. Such diversification has proven challenging when it comes to homology modeling and demands that any model be heavily tested by experiments that relate structure to function. In the latter regard, hA2 appears to be a suitable model for the AID subunit interactions, although more work is required to be certain. A second important lesson is that proteins derived from higher organisms have proven complicated to purify, which has necessitated a ‘divide-and-conquer’ strategy that has been decidedly successful for ADAR2, hA2 and hA3G, yet major questions remain. If the field is to make advances, it must work to experimentally define the properties of intact molecules, especially with regard to substrate binding. Thus far, only TadA has been crystallized in the presence of a substrate analog, although the results suggest it is a precatalytic ground-state. Notably, the NMR analysis of the hA3G C-terminus provides a tantalizing glimpse regarding the mode of ssDNA binding by chemical shift perturbation, although this approach does not permit the construction of a detailed molecular model. Existing ADAR, TadA, hA2 and hA3G structures also suggest these molecules are likely to exhibit significant plasticity as they undertake their biological functions. As such, it will be necessary to investigate the molecular conformations of the molecules in multiple states along their respective reaction coordinates. Such conformations will undoubtedly be influenced by auxiliary domains, or binding by trans-acting factors, such as the dsRBD of ADAR2 (chapter by Maris and Allain in this volume) or ACF in the case of APOBEC-1 (chapter by Smith in this volume). It is also unclear how binding to RNA or DNA influences the oligomeric state of these enzymes, although several APOBEC proteins require treatment with RNase to produce low molecular mass and/or enzymatically active species. A large step toward characterization of ADAR/T and CDAR/D enzymes in homogeneous molecular states for structural analysis will be to simultaneously investigate their chemical mechanisms. Elegant analysis of TadA revealed important details of the transition state and studies with ADAR2 substrates containing purine analogs aided in the generation of a mechanism-based inhibitor. Such tight-binding inhibitors will benefit future structural studies and progress must be made for APOBEC family members. This work may be of considerable practical significance. AID has been linked to certain lymphomas100,101 (reviewed in ref. 102) including follicular lymphoma, which undergoes active mutation with AID expression.103 This observation suggests that the selective use of mechanism-based inhibitors in the context of ssDNA could prove to be a potent and selective means to target disease agents, without the adverse pleiotropic side effects characteristic of free nucleoside analogs.

Acknowledgements

We thank H.C. Smith, J. Alfonso, A. Bhagwat, M. O’Connell, R. Spitale and J. Salter for critical comments and suggestions. Support for this work was provided in part by PHS NIH grants AI076085 to J.E.W. and GM061115 to P.A.B.

References

1. Gerber A, Grosjean H, Melcher T et al. Tad1p, a yeast tRNA-specific adenosine deaminase, is related to the mammalian pre-mRNA editing enzymes ADAR1 and ADAR2. EMBO J 1998; 17:4780-4789.
2. Bass BL, Nishikura K, Keller W et al. A standardized nomenclature for adenosine deaminases that act on RNA. RNA 1997; 3:947-949.
3. Mian IS, Moser MJ, Holley WR et al. Statistical modelling and phylogenetic analysis of a deaminase domain. J Comput Biol 1998; 5:57-72.
4. MacElrevey C, Wedekind JE. Chemistry, phylogeny and structure of the APOBEC family. In: Smith HC, ed. RNA and DNA Editing: Molecular mechanisms and their integration into biological systems. New Jersey: Wiley-Interscience, 2008:369-419.
5. Wedekind JE, Dance GS, Sowden MP et al. Messenger RNA editing in mammals: new members of the APOBEC family seeking roles in the family business. Trends Genet 2003; 19:207-216.
6. Smith HC, Wedekind JE, Xie K et al. Fine-tuning of RNA functions by modification and editing in mammalian C to U editing. In: Grosjean H, ed. Topics in Current Genetics. Berlin: Springer-Verlag, 2005:1610-2096.
7. Carter CW Jr. Nucleoside deaminase for cytidine and adenosine: Comparison with Deaminases Acting on RNA. In: Grosjean H, Benne R, eds. Modification and Editing of RNA. Washington, D.C.: ASM Press, 1998:1-596.
8. Xie K, Sowden MP, Dance GS et al. The structure of a yeast RNA-editing deaminase provides insight into the fold and function of activation-induced deaminase and APOBEC-1. Proc Natl Acad Sci USA 2004; 101:8114-8119.
9. Jarmuz A, Chester A, Bayliss J et al. An anthropoid-specific locus of orphan C to U RNA-editing enzymes on chromosome 22. Genomics 2002; 79:285-296.
10. Betts L, Xiang S, Short SA et al. Cytidine deaminase. The 2.3 Å crystal structure of an enzyme: transition-state analog complex. J Mol Biol 1994; 235:635-656.
11. Conticello SG, Thomas CJ, Petersen-Mahrt SK et al. Evolution of the AID/APOBEC family of polynucleotide (deoxy)cytidine deaminases. Mol Biol Evol 2005; 22:367-377.
12. Johansson E, Mejlhede N, Neuhard J et al. Crystal structure of the tetrameric cytidine deaminase from Bacillus subtilis at 2.0 Å resolution. Biochemistry 2002; 41:2563-2570.
13. Elias Y, Huang RH. Biochemical and structural studies of A-to-I editing by tRNA:A34 deaminases at the wobble position of transfer RNA. Biochemistry 2005; 44:12057-12065.
14. Teh AH, Kimura M, Yamamoto M et al. The 1.48 Å resolution crystal structure of the homotetrameric cytidine deaminase from mouse. Biochemistry 2006; 45:7825-7833.
15. Chung SJ, Fromme JC, Verdine GL. Structure of human cytidine deaminase bound to a potent inhibitor. J Med Chem 2005; 48:658-660.
16. MacBeth MR, Schubert HL, Vandemark AP et al. Inositol hexakisphosphate is bound in the ADAR2 core and required for RNA editing. Science 2005; 309:1534-1539.
17. Kim J, Malashkevich V, Roday S et al. Structural and kinetic characterization of Escherichia coli TadA, the wobble-specific tRNA deaminase. Biochemistry 2006; 45:6407-6416.
18. Kuratani M, Ishii R, Bessho Y et al. Crystal structure of tRNA adenosine deaminase (TadA) from Aquifex aeolicus. J Biol Chem 2005; 280:16002-16008.
19. Losey HC, Ruthenburg AJ, Verdine GL. Crystal structure of Staphylococcus aureus tRNA adenosine deaminase TadA in complex with RNA. Nat Struct Mol Biol 2006; 13:153-159.
20. Prochnow C, Bransteitter R, Klein MG et al. The APOBEC-2 crystal structure and functional implications for the deaminase AID. Nature 2007; 445:447-451.
21. Chen KM, Harjes E, Gross PJ et al. Structure of the DNA deaminase domain of the HIV-1 restriction factor APOBEC3G. Nature 2008; 452:116-119.
22. Navaratnam N, Fujino T, Bayliss J et al. Escherichia coli cytidine deaminase provides a molecular model for ApoB RNA editing and a mechanism for RNA substrate recognition. J Mol Biol 1998; 275:695-714.
23. Dance GS, Beemiller P, Yang Y et al. Identification of the yeast cytidine deaminase CDD1 as an orphan C→U RNA editase. Nucleic Acids Res 2001; 29:1772-1780.
24. Smith HC. Measuring editing activity and identifying cytidine-to-uridine mRNA editing factors in cells and biochemical isolates. Methods Enzymol 2007; 424:389-416.
25. Petersen-Mahrt SK, Harris RS, Neuberger MS. AID mutates E. coli suggesting a DNA deamination mechanism for antibody diversification. Nature 2002; 418:99-103.
26. Petersen-Mahrt SK, Neuberger MS. In vitro deamination of cytosine to uracil in single-stranded DNA by apolipoprotein B editing complex catalytic subunit 1 (APOBEC1). J Biol Chem 2003; 278:19583-19586.
27. Harris RS, Petersen-Mahrt SK, Neuberger MS. RNA editing enzyme APOBEC1 and some of its homologs can act as DNA mutators. Mol Cell 2002; 10:1247-1253.
28. Kurtz JE, Exinger F, Erbs P et al. New insights into the pyrimidine salvage pathway of Saccharomyces cerevisiae: requirement of six genes for cytidine metabolism. Curr Genet 1999; 36:130-136.
29. Gerber AP, Keller W. An adenosine deaminase that generates inosine at the wobble position of tRNAs. Science 1999; 286:1146-1149.
29b. Auxilien, Crain, Trewyn et al. Mechanism, specificity and general properties of the yeast enzyme catalysing the formation of inosine 34 in the anticodon of transfer RNA. J Mol Biol 1996; 262:437-458.
30. Wolf J, Gerber A, Keller W. tadA, an essential tRNA-specific adenosine deaminase from Escherichia coli. EMBO J 2002; 21:3841-3851.
31. Wilson DK, Rudolph FB, Quiocho FA. Atomic structure of adenosine deaminase complexed with a transition-state analog: understanding catalysis and immunodeficiency mutations. Science 1991; 252:1278-1284.
32. Luo M, Schramm VL. Transition state structure of E. coli tRNA-specific adenosine deaminase. J Am Chem Soc 2008; 130:2649-2655.
33. Maydanovych O, Beal PA. Breaking the central dogma by RNA editing. Chem Rev 2006; 106:3397-3411.
34. Higuchi M, Single FN, Kohler M et al. RNA editing of AMPA receptor subunit GluR-B: a base-paired intron-exon structure determines position and efficiency. Cell 1993; 75:1361-1370.
35. Burns CM, Chu H, Rueter SM et al. Regulation of serotonin-2C receptor G-protein coupling by RNA editing. Nature 1997; 387:303-308.
36. Bass BL, Weintraub H. A developmentally regulated activity that unwinds RNA duplexes. Cell 1987; 48:607-613.
37. Patterson JP, Samuel CE. Expression and regulation by interferon of a double-stranded-RNA-specific adenosine deaminase from human cells: evidence for two forms of the deaminase. Mol Cell Biol 1995; 15:5376-5388.
38. Melcher T, Maas S, Herb A et al. A mammalian RNA editing enzyme. Nature 1996; 379:460-464.
39. Higuchi M, Maas S, Single FN et al. Point mutation in an AMPA receptor gene rescues lethality in mice deficient in the RNA-editing enzyme ADAR2. Nature 2000; 406:78-81.
40. Melcher T, Maas S, Herb A et al. RED2, a brain-specific member of the RNA-specific adenosine deaminase family. J Biol Chem 1996; 271:31795-31798.
41. Schwartz T, Rould MA, Lowenhaupt K et al. Crystal structure of the Z-alpha domain of the human editing enzyme ADAR1 bound to left-handed Z-DNA. Science 1999; 284:1841-1845.
42. Yi-Brunozzi H-Y, Stephens OM, Beal PA. Conformational changes that occur during an RNA-editing adenosine deamination reaction. J Biol Chem 2001; 276:37827-37833.
43. Stephens OM, Yi-Brunozzi HY, Beal PA. Analysis of the RNA-editing reaction of ADAR2 with structural and fluorescent analogues of the GluR-B R/G editing site. Biochemistry 2000; 39:12243-12251.
44. Hart K, Nystrom B, Ohman M et al. Molecular dynamics simulations and free energy calculations of base flipping in dsRNA. RNA 2005; 11:609-618.
45. MacBeth MR, Lingam AT, Bass BL. Evidence for auto-inhibition by the N-terminus of hADAR2 and activation by dsRNA binding. RNA 2004; 10:1563-1571.
46. Cho DS, Yang W, Lee JT et al. Requirement of dimerization for RNA editing activity of adenosine deaminases acting on RNA. J Biol Chem 2003; 278:17093-17102.
47. Jaikaran DC, Collins CH, MacMillan AM. Adenosine to inosine editing by ADAR2 requires formation of a ternary complex on the GluR-B R/G site. J Biol Chem 2002; 277:37624-37629.
48. Gallo A, Keegan LP, Ring GM et al. An ADAR that edits transcripts encoding ion channel subunits functions as a dimer. EMBO J 2003; 22:3421-3430.
49. Chilibeck KA, Wu T, Liang C et al. FRET analysis of in vivo dimerization by RNA-editing enzymes. J Biol Chem 2006; 281:16530-16535.
50. Valente L, Nishikura K. RNA binding-independent dimerization of adenosine deaminases acting on RNA and dominant negative effects of nonfunctional subunits on dimer functions. J Biol Chem 2007; 282:16054-16061.
51. MacBeth MR, Bass BL. Large-scale overexpression and purification of ADARs from Saccharomyces cerevisiae for biophysical and biochemical studies. Methods Enzymol 2007; 424:319-331.
52. Yi-Brunozzi H-Y, Easterwood LM, Kamilar GM et al. Synthetic substrate analogs for the RNA-editing adenosine deaminase ADAR-2. Nucleic Acids Res 1999; 27:2912-2917.
53. Vitali P, Basyuk E, LeMeur E et al. ADAR2-mediated editing of RNA substrates in the nucleolus is inhibited by C/D small nucleolar RNAs. J Cell Biol 2005; 169:745-753.
54. Veliz EA, Easterwood LM, Beal PA. Substrate analogues for an RNA-editing adenosine deaminase: mechanistic investigation and inhibitor design. J Am Chem Soc 2003; 125:10867-10876.
55. Erion MD, Reddy MR. Calculation of relative hydration free energy differences for heteroaromatic compounds: use in the design of adenosine deaminase and cytidine deaminase inhibitors. J Am Chem Soc 1998; 120:3295-3304.
56. Haudenschild BL, Maydanovych O, Veliz EA et al. A transition state analogue for an RNA-editing reaction. J Am Chem Soc 2004; 126:11213-11219.
57. Honjo T. A memoir of AID, which engraves antibody memory on DNA. Nat Immunol 2008; 9:335-337.
58. Chiu YL, Greene WC. The APOBEC3 cytidine deaminases: an innate defensive network opposing exogenous retroviruses and endogenous retroelements. Annu Rev Immunol 2008; 26:317-353.
59. Goila-Gaur R, Strebel K. HIV-1 Vif, APOBEC and intrinsic immunity. Retrovirology 2008; 5:51.
60. Peled JU, Kuang FL, Iglesias-Ussel MD et al. The biochemistry of somatic hypermutation. Annu Rev Immunol 2008; 26:481-511.
61. Hirano K, Young SG, Farese RV Jr et al. Targeted disruption of the mouse apobec-1 gene abolishes apolipoprotein B mRNA editing and eliminates apolipoprotein B48. J Biol Chem 1996; 271:9887-9890.
62. Nakamuta M, Chang BH, Zsigmond E et al. Complete phenotypic characterization of apobec-1 knockout mice with a wild-type genetic background and a human apolipoprotein B transgenic background and restoration of apolipoprotein B mRNA editing by somatic gene transfer of Apobec-1. J Biol Chem 1996; 271:25981-25988.
63. Morrison JR, Paszty C, Stevens ME et al. Apolipoprotein B RNA editing enzyme-deficient mice are viable despite alterations in lipoprotein metabolism. Proc Natl Acad Sci USA 1996; 93:7154-7159.
64. Blanc V, Henderson JO, Newberry EP et al. Targeted deletion of the murine apobec-1 complementation factor (acf) gene results in embryonic lethality. Mol Cell Biol 2005; 25:7260-7269.
65. Muramatsu M, Sankaranand VS, Anant S et al. Specific expression of activation-induced cytidine deaminase (AID), a novel member of the RNA-editing deaminase family in germinal center B-cells. J Biol Chem 1999; 274:18470-18476.
66. Muramatsu M, Kinoshita K, Fagarasan S et al. Class switch recombination and hypermutation require activation-induced cytidine deaminase (AID), a potential RNA editing enzyme. Cell 2000; 102:553-563.
67. Revy P, Muto T, Levy Y et al. Activation-induced cytidine deaminase (AID) deficiency causes the autosomal recessive form of the hyper-IgM syndrome (HIGM2). Cell 2000; 102:565-575.
68. Bransteitter R, Pham P, Scharff MD et al. Activation-induced cytidine deaminase deaminates deoxycytidine on single-stranded DNA but requires the action of RNase. Proc Natl Acad Sci USA 2003; 100:4102-4107.
69. Pham P, Bransteitter R, Petruska J et al. Processive AID-catalysed cytosine deamination on single-stranded DNA simulates somatic hypermutation. Nature 2003; 424:103-107.
70. Martin A, Scharff MD. Somatic hypermutation of the AID transgene in B and non B-cells. Proc Natl Acad Sci USA 2002; 99:12304-12308.
71. Ramiro AR, Stavropoulos P, Jankovic M et al. Transcription enhances AID-mediated cytidine deamination by exposing single-stranded DNA on the nontemplate strand. Nat Immunol 2003; 4:452-456.
72. Sohail A, Klapacz J, Samaranayake M et al. Human activation-induced cytidine deaminase causes transcription-dependent, strand-biased C to U deaminations. Nucleic Acids Res 2003; 31:2990-2994.
73. Smith HC, Bottaro A, Sowden MP et al. Activation induced deaminase: the importance of being specific. Trends Genet 2004; 20:224-227.
74. Liao W, Hong SH, Chan BH et al. APOBEC-2, a cardiac- and skeletal muscle-specific member of the cytidine deaminase supergene family. Biochem Biophys Res Commun 1999; 260:398-404.
75. Anant S, Mukhopadhyay D, Sankaranand V et al. ARCD-1, an apobec-1-related cytidine deaminase, exerts a dominant negative effect on C to U RNA editing. Am J Physiol Cell Physiol 2001; 281:C1904-1916.
76. Mikl MC, Watt IN, Lu M et al. Mice deficient in APOBEC2 and APOBEC3. Mol Cell Biol 2005; 25:7270-7277.
77. Ross NT, Mace CR, Miller BL. Biophysical analysis of the EPEC translocated intimin receptor-binding domain. Biochem Biophys Res Commun 2007; 362:1073-1078.
78. Ackers GK. Molecular exclusion and restricted diffusion processes in molecular-sieve chromatography. Biochemistry 1964; 3:723-730.
79. Andrews P. The gel-filtration behaviour of proteins related to their molecular weights over a wide range. Biochem J 1965; 96:595-606.
80. Brar SS, Sacho EJ, Tessmer I et al. Activation-induced deaminase, AID, is catalytically active as a monomer on single-stranded DNA. DNA Repair (Amst) 2008; 7:77-87.
81. Dickerson SK, Market E, Besmer E et al. AID mediates hypermutation by deaminating single stranded DNA. J Exp Med 2003; 197:1291-1296.
82. Ta VT, Nagaoka H, Catalan N et al. AID mutant analyses indicate requirement for class-switch-specific cofactors. Nat Immunol 2003; 4:843-848.
83. Quartier P, Bustamante J, Sanal O et al. Clinical, immunologic and genetic analysis of 29 patients with autosomal recessive hyper-IgM syndrome due to activation-induced cytidine deaminase deficiency. Clin Immunol 2004; 110:22-29.
84. Zhu Y, Nonoyama S, Morio T et al. Type two hyper-IgM syndrome caused by mutation in activation-induced cytidine deaminase. J Med Dent Sci 2003; 50:41-46.
85. Teng BB, Ochsner S, Zhang Q et al. Mutational analysis of apolipoprotein B mRNA editing enzyme (APOBEC1). Structure-function relationships of RNA editing and dimerization. J Lipid Res 1999; 40:623-635.
86. Lau PP, Zhu HJ, Baldini A et al. Dimeric structure of a human apolipoprotein B mRNA editing protein and cloning and chromosomal localization of its gene. Proc Natl Acad Sci USA 1994; 91:8522-8526.
87. Sowden MP, Ballatori N, Jensen KL et al. The editosome for cytidine to uridine mRNA editing has a native complexity of 27S: identification of intracellular domains containing active and inactive editing factors. J Cell Sci 2002; 115:1027-1039.
88. Chiu YL, Soros VB, Kreisberg JF et al. Cellular APOBEC3G restricts HIV-1 infection in resting CD4+ T-cells. Nature 2005; 435:108-114.
89. Kozak SL, Marin M, Rose KM et al. The anti-HIV-1 editing enzyme APOBEC3G binds HIV-1 RNA and messenger RNAs that shuttle between polysomes and stress granules. J Biol Chem 2006; 281(39):29105-29119.
90. Opi S, Kao S, Goila-Gaur R et al. Human immunodeficiency virus type 1 Vif inhibits packaging and antiviral activity of a degradation-resistant APOBEC3G variant. J Virol 2007; 81:8236-8246.
91. Goila-Gaur R, Khan MA, Miyagi E et al. HIV-1 Vif promotes the formation of high molecular mass APOBEC3G complexes. Virology 2008; 372:136-146.
92. Opi S, Takeuchi H, Kao S et al. Monomeric APOBEC3G is catalytically active and has antiviral activity. J Virol 2006; 80:4673-4682.
93. Chelico L, Pham P, Calabrese P et al. APOBEC3G DNA deaminase acts processively 3ʹ→5ʹ on single-stranded DNA. Nat Struct Mol Biol 2006; 13:392-399.
94. Iwatani Y, Takeuchi H, Strebel K et al. Biochemical activities of highly purified, catalytically active human APOBEC3G: correlation with antiviral effect. J Virol 2006; 80:5992-6002.
95. Wedekind JE, Gillilan R, Janda A et al. Nanostructures of APOBEC3G support a hierarchical assembly model of high molecular mass ribonucleoprotein particles from dimeric subunits. J Biol Chem 2006; 281:38122-38126.
96. Conticello SG, Langlois MA, Neuberger MS. Insights into DNA deaminases. Nat Struct Mol Biol 2007; 14:7-9.
97. Chelico L, Sacho EJ, Erie DA et al. A model for oligomeric regulation of APOBEC3G cytosine deaminase-dependent restriction of HIV. J Biol Chem 2008; 283:13780-13791.
98. Chen KM, Martemyanova N, Lu Y et al. Extensive mutagenesis experiments corroborate a structural model for the DNA deaminase domain of APOBEC3G. FEBS Lett 2007; 581:4761-4766.
99. Karlin S. Presented at the Eleventh R.A. Fisher Memorial Lecture, 1983 (unpublished).
100. Okazaki IM, Hiai H, Kakazu N et al. Constitutive expression of AID leads to tumorigenesis. J Exp Med 2003; 197:1173-1181.
101. Ramiro AR, Jankovic M, Eisenreich T et al. AID is required for c-myc/IgH chromosome translocations in vivo. Cell 2004; 118:431-438.
102. Okazaki IM, Kotani A, Honjo T. Role of AID in tumorigenesis. Adv Immunol 2007; 94:245-273.
103. Hardianti MS, Tatsumi E, Syampurnawati M et al. Activation-induced cytidine deaminase expression in follicular lymphoma: association between AID expression and ongoing mutation in FL. Leukemia 2004; 18:826-831.
104. Kleywegt GJ. Use of noncrystallographic symmetry in protein structure refinement. Acta Crystallogr D Biol Crystallogr 1996; 52:842-857.
105. Torelli AT. Doctoral Dissertation, University of Rochester School of Medicine and Dentistry 2008.

Chapter 17

Structure of RNA Editing Substrates and Their Recognition by RNA Base Deaminase

Christophe Maris and Frédéric H.-T. Allain*

Abstract

RNA editing occurs in humans by single base deamination, Cytidine-to-Uridine or Adenosine-to-Inosine. These changes create codons for a different amino acid, stop codons or even new splice-sites, allowing protein diversity to be generated from a single gene. Despite the abundance of these modifications (especially A-to-I editing) and their importance for the regulation of gene expression, very little is known at present about the mechanism of RNA editing and the protein factors implicated, and structural information in the field has not been forthcoming. We review here the current structural and molecular knowledge of A-to-I editing by the ADAR family and C-to-U editing by APOBEC1 and ACF. We focus on the structures of the RNA substrates and how these structures are recognized specifically by the deaminases and their complementation factors. The modes of recognition of the two enzymatic systems are completely different. While ACF and APOBEC1 recognize the sequence of the RNA surrounding the editing site of ApoB mRNA, ADARs recognize primarily the shape of the RNA rather than its sequence. This difference originates mainly from the different types of RNA binding domains used in the respective substrate recognition processes. Specifically, ACF contains three RRMs whereas ADARs contain two or three dsRBMs and, in the case of ADAR1, two additional Z-DNA/RNA binding domains.

Introduction: RNA Editing

The published sequences of the human, mouse and rat genomes1 revealed a surprisingly small number of genes, estimated to be around 26000. Such a small number cannot fully account for the expected molecular complexity of these species and it is now well appreciated that this complexity is likely to come from the multitude of protein variants created by alternative splicing and editing of pre-mRNA.2 For example, the sole paralytic gene (a Drosophila sodium channel) can generate up to 1 million mRNA isoforms by combining its 13 alternative exons and its 11 known RNA editing sites.3 Moreover, alternatively spliced and edited mRNAs are particularly abundant in neurons. The finely regulated populations of the different isoforms of most neurotransmitter receptors, ion channels, neuronal cell-surface receptors and adhesion molecules ensure proper brain function. Any imbalance in gene expression can impair neurological functions and lead to severe diseases such as brain cancer, schizophrenia or neuromuscular and neurodegenerative syndromes.2

*Corresponding Author: Frédéric H.-T. Allain, ETH Zurich, Institute of Molecular Biology and Biophysics, Schafmattstr.20, ETH Zurich, HPK G18, CH-8093 Zürich, Switzerland. Email: [email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.

RNA editing is a posttranscriptional modification of pre-mRNA.4 Editing occurs via insertion or deletion of poly-U sequences (seen in Trypanosome mitochondria), or via a single base conversion by deamination, Cytidine to Uridine or Adenosine to Inosine (seen from protozoa to man). These changes can create a codon for a different amino acid, a stop codon or even a new splice-site, allowing protein diversity to be created from a single gene. We review here the current structural and molecular knowledge of RNA editing by base deamination, namely A-to-I editing by the ADAR family of enzymes and C-to-U editing by APOBEC1 and ACF (Apobec1 complementation factor). We focus on the structures of the RNA substrates and how these structures are recognized specifically by the RNA binding domains present in these deaminases or their complementation factors. For detailed information regarding the function of APOBEC1 and ADARs, please refer to the chapter by Smith and the chapter by Heale and O’Connell, respectively, and for the structural, kinetic and mechanistic aspects of deaminases, please refer to the chapter by Wedekind and Beal.
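
The way a single deamination changes the meaning of a codon can be illustrated with a few lines of Python. The sketch below is illustrative only and uses a deliberately partial codon table. The AGA-to-IGA example is the GluR-B R/G site described later in this chapter; the other two codons (CAG for the GluR-B Q/R site and CAA for the apoB editing site) are not spelled out in this section and are supplied here as assumptions from general knowledge of these sites. Function names are hypothetical.

```python
# Partial standard codon table: only the codons needed for the examples.
CODON_TABLE = {
    "AGA": "Arg", "GGA": "Gly",
    "CAG": "Gln", "CGG": "Arg",
    "CAA": "Gln", "UAA": "Stop",
}

# Each editing type maps the substrate base to the base read by the ribosome.
# Inosine is read as guanosine, so A-to-I editing is treated as A -> G.
EDIT_TYPES = {"A-to-I": ("A", "G"), "C-to-U": ("C", "U")}


def edit_codon(codon, index, kind):
    """Return the codon as decoded after one editing event at codon[index]."""
    substrate, read_as = EDIT_TYPES[kind]
    if codon[index] != substrate:
        raise ValueError(f"{kind} editing needs {substrate} at position {index}")
    return codon[:index] + read_as + codon[index + 1:]


def recode(codon, index, kind):
    """Return (meaning before editing, meaning after editing)."""
    return CODON_TABLE[codon], CODON_TABLE[edit_codon(codon, index, kind)]


if __name__ == "__main__":
    print(recode("AGA", 0, "A-to-I"))  # R/G site: new amino acid
    print(recode("CAG", 1, "A-to-I"))  # Q/R site (assumed codon)
    print(recode("CAA", 0, "C-to-U"))  # apoB site (assumed codon): stop codon
```

The three calls cover two of the outcomes named above, a codon for a different amino acid and a stop codon; creation of a splice site depends on intron sequence context and is not modeled here.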

Adenosine to Inosine Editing by ADARs: Mechanism of Substrate Recognition

RNA Editing Substrate: Secondary and Tertiary Structures

RNA Editing Substrate: What Makes a Good RNA Editing Substrate?

Adenosine deaminases that act on RNA (ADARs) convert adenosine to inosine (A-to-I) by hydrolytic deamination in cellular and viral RNA transcripts containing either perfect or imperfect regions of double-stranded RNA (dsRNA).5,6 ADARs are present from worm to man. In mammals, two functional enzymes (ADAR1 and ADAR2; refs. 7 and 8-10, respectively) and one inactive enzyme (ADAR3; refs. 11,12) have been characterized. In C. elegans, two active ADARs (Ce ADAR1 and ADAR2; ref. 13) have been found while in D. melanogaster, only one (dADAR; ref. 14) was found. A-to-I modification is nonspecific within perfect dsRNA substrates, deaminating up to 50% of the adenosine residues.15,16 The nonspecific reaction occurs as long as the double-stranded architecture of the RNA substrate is maintained since ADARs unwind dsRNA by changing A⋅U base-pairs to I⋅U mismatches.17,18 Such modifications can modulate gene silencing triggered by intramolecular structures in mRNA,19 nuclear retention of RNA transcripts,20 or antiviral responses by extensive modification of viral transcripts.21 The majority of nonselective editing occurs in untranslated regions (UTRs) and introns where large regular duplexes are formed between inverted repeats of ALU and LINE (Long Interspersed Nuclear Elements in primates) or SINE domains (Short Interspersed Nuclear Elements found in mouse).22,23 It is estimated that this constitutes about 15000 editing events in about 2000 human genes. The biological function of this major A-to-I editing event is not fully understood yet.24 A-to-I editing can also be highly specific within imperfect dsRNA regions, modifying a single or limited set of adenosine residues.5,6 Selective editing within pre-mRNAs has been shown to affect the primary amino acid sequence of the resultant protein, therefore producing multiple isoforms from a single gene.
For example, editing by ADARs produced functionally important isoforms of numerous proteins involved in synaptic neurotransmission, including ligand and voltage-gated ion channels and G-protein coupled receptors (see chapter by Heale and O’Connell et al). he pre-mRNA encoding the B-subunit of the α-amino-3-hydroxy-5-methyl-4-isoxazole propionic acid (AMPA) subtype of glutamate receptor (GluR-B) is probably the most extensively studied mRNA editing substrate.25 It is edited at multiple sites and one of these locations is the R/G site, where a genomically-encoded AGA is modiied to IGA, resulting in an arginine-to-glycine change (the ribosome interprets I as G due to its similar base-pairing properties). he R/G site of the GluR-B pre-mRNA is oten used as a model system for A-to-I editing studies as it forms a small and well conserved 70 nucleotide stem-loop containing three mismatches,26 referred to as the R/G stem-loop (Fig. 1A). More recently, speciic editing of many pri-miRNAs, pre-miRNA and miRNAs have been discovered suggesting a crosstalk between the RNA editing and RNA interference machineries.24,27 MicroRNA editing can regulate miRNA expression by afecting pri-microRNA processing and pre-miRNA.28,29 MiRNA editing can also afect gene targeting when the seed sequence


Figure 1. A) Secondary structures of various ADAR editing substrates.8,30,36,38,42,119 B) Structure of the Zα domain of ADAR1 in complex with a Z-RNA (CG)3.51 C) Structure of the Zβ domain from ADAR1 in its free state.52 D) Structures of the two dsRBMs of ADAR2 in their free state.43 Note the difference in the orientation of α-helix 1 in the two dsRBMs.


of the miRNA is edited. This latter editing event extends the number of genes targeted by the miRNAs.30 What characterizes a specific A-to-I RNA editing site is a major and long-standing question in the field. It is clear that the targeted adenines must be embedded in an RNA stem and that the sequence around the adenine has a major effect on the level of editing. ADAR1 and ADAR2 have a preference of A = U > C > G for the nucleotide 5ʹ of the edited adenosine,15,31 and ADAR2 also has a preference of G = U > C = A for the nucleotide 3ʹ of the adenosine.31 These initial preference rules were further confirmed and optimized in subsequently discovered targets.28,32 The nucleotide base-pairing with the adenosine can drastically influence editing, with a preference for a cytidine (creating an A⋅C mismatch as in the GluR-B R/G site, Fig. 1)22,23,32-34 over a uridine (as in the GluR-B Q/R site, Fig. 1). Purines are not favored, and a guanosine can in some cases severely impair editing.34-36 More generally, RNA mismatches, bulges, internal loops or hairpin loops are very frequent in RNA substrates with a specific editing site (Fig. 1). These secondary structure elements are highly conserved during evolution,26,37,38 indicating that the RNA structure as much as the sequence determines the RNA editing specificity.26,35,37,39-41 Clear similarities in secondary structure between several RNA editing substrates can already be observed (Fig. 1), although no clear rules can yet be inferred to distinguish unambiguously an editing site from a non-editing site.
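The neighbor and opposite-base preferences quoted above can be written down as a small lookup, which makes it easier to compare candidate sites. The following Python sketch is illustrative only: the rank tables transcribe the qualitative orderings given in the text (0 is most preferred), the names are hypothetical, and the ranks are ordinal labels rather than measured editing efficiencies. Giving both purines the same rank opposite the edited adenosine is an assumption made here for simplicity.

```python
# Ordinal preference ranks (0 = most preferred) transcribed from the text.
# These are qualitative labels, not measured editing efficiencies.
FIVE_PRIME_RANK = {"A": 0, "U": 0, "C": 1, "G": 2}         # A = U > C > G (ADAR1, ADAR2)
THREE_PRIME_RANK_ADAR2 = {"G": 0, "U": 0, "C": 1, "A": 1}  # G = U > C = A (ADAR2)
# C is preferred over U; purines are disfavored.
# Ranking A and G equally is an assumption of this sketch.
OPPOSITE_RANK = {"C": 0, "U": 1, "A": 2, "G": 2}


def context_ranks(five_prime: str, three_prime: str, opposite: str) -> tuple[int, int, int]:
    """Return the (5' neighbor, 3' neighbor, opposite base) ranks for one adenosine."""
    return (
        FIVE_PRIME_RANK[five_prime.upper()],
        THREE_PRIME_RANK_ADAR2[three_prime.upper()],
        OPPOSITE_RANK[opposite.upper()],
    )


if __name__ == "__main__":
    # A site with a 5' A, a 3' G and a C opposite the edited adenosine.
    print(context_ranks("A", "G", "C"))
    # A site with the least favored base at every position.
    print(context_ranks("G", "A", "G"))
```

A result of (0, 0, 0) only says that none of the three positions is disfavored; as the text stresses, sequence context alone does not distinguish an editing site from a non-editing site.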

Structure of RNA Editing Substrate

Structural information on A-to-I RNA editing substrates has been limited so far to the GluR-B R/G site. The solution structure of the central region of the human R/G stem-loop pre-mRNA, which contains a GCU(A/C)A pentaloop, was determined.42 Quite surprisingly, the loop showed a rigid structure and revealed a pentaloop fold that was novel at the time. The fold is stabilized by a complex interplay of hydrogen bonds and stacking interactions (Fig. 1). The structure of the GCUAA pentaloop explains well the phylogenetic conservation of GCUMA (where M is A/C).26 The UNCG tetraloops and the GCUAA pentaloop are structurally similar. This is particularly interesting considering that the pre-mRNA encoding the R/G site of subunit C of the glutamate receptor, which is also specifically edited by ADAR2, has a UCCR tetraloop.26 When the size of the GCUAA pentaloop is changed or the loop is deleted, the level of editing is reduced,43 indicating that this structural element plays an important role in the recognition process of ADAR2 (see below). The role of the loop was subsequently confirmed using a high-throughput method.44

ADARs: RNA Binding Domains and Substrate Recognition

Like many RNA-binding proteins, ADARs display a modular domain organization, containing from one to three tandem copies of dsRBMs.45,46 Specifically, ADARs harbor these dsRBMs in their N-terminal region, with the adenosine deaminase domain in the C-terminal portion.47 Mammalian ADAR1 is unique because it also contains two copies of a Z-DNA binding domain in its N-terminal part.48 Both types of RNA binding domains are likely to play a role in target selectivity; their structures and their modes of RNA recognition are described below.

The Structure of ADAR1 Z-DNA Binding Domains and Their Substrate Recognition

Among the members of the ADAR protein family, mammalian ADAR1 is unique in containing two copies of a Z-DNA/RNA binding domain, Zα and Zβ. Z-DNA binding domains are found in several proteins that participate in the interferon response pathway or in viral proteins that inhibit this pathway.48 Although the role of the Z-DNA binding domains in RNA editing in vivo is still not clear, it was shown that A-to-I editing by ADAR1 is substantially increased in a dsRNA substrate containing a Z-RNA compared to a dsRNA without a Z-RNA (80% versus 60%; ref. 49). Moreover, a recent cocrystal structure of the Zα domain of ADAR1 in complex with Z-RNA (CG)3 revealed the molecular basis of this recognition (Fig. 1B).50 The Zα domain recognizes Z-RNA in a manner similar to Z-DNA.51 Two copies of Zα are bound to the Z-RNA, with each domain contacting one strand of the RNA. It is the unusual sugar-phosphate conformation of the left-handed helix of the Z-RNA that is recognized by ADAR1 Zα, since all protein-RNA interactions are directed toward


the sugar-phosphate backbone of the Z-RNA. The crystal structure of the Zβ domain of ADAR1 was also determined in its free state (Fig. 1C).52 It reveals an αβ-fold like that of the Zα domain. However, Zβ is not identical to Zα because it contains an additional α-helix at its C-terminus. Some structural differences could explain why Zβ does not bind Z-DNA/RNA. As such, the functional role of Zβ remains to be elucidated.

The Structure of ADAR2 dsRBMs and Their Substrate Recognition

The dsRBMs of ADARs appear to play an important role in modulating the editing selectivity of ADARs.53-55 The dsRBM is a 70-75 amino acid domain found in many eukaryotic proteins with diverse functions including RNA interference, microRNA biogenesis and gene regulation, RNA transport, RNA processing and, of course, RNA editing.45,46,56 The structures of several dsRBMs have been determined,56,57 revealing a highly conserved αβββα protein topology in which the two α-helices are packed along a face of a three-stranded anti-parallel β-sheet. Furthermore, structures of the dsRBMs from Xenopus laevis RNA-binding protein A (Xlrbpa2),58 Drosophila Staufen protein,59 and Aquifex aeolicus RNase III,60 in complex with nonnatural synthetic dsRNA substrates have been determined. One dsRBM structure, that of Rnt1p (an RNase III homologue from budding yeast), was determined in complex with its natural RNA substrate (dsRNA capped by an AGAA tetraloop).61 These structures revealed not only how dsRBMs can bind any dsRNA, regardless of base composition, but also how structure-specific recognition of RNA hairpins is achieved. While the enzymatic activity of ADARs and their biological role(s) have been extensively studied,5,6 the determinants that control site-selective RNA modification are poorly understood. Swapping the dsRBMs between ADAR1 and ADAR2 (ref. 34) does not change the ability of the enzymes to efficiently and accurately process their RNA substrates, implying that the editing site specificity comes from the catalytic domains. In contrast, several biochemical studies suggested that ADAR dsRBMs possess not only dsRNA-binding affinity but also RNA-binding specificity. 
Indeed, when the dsRBMs of PKR were fused with the ADAR1 deaminase domain, the chimeric protein was able to edit a perfect dsRNA but none of the well-characterized editing substrates such as the GluR-B R/G site, indicating that the dsRBMs of ADAR1, and more particularly dsRBM3, are essential to edit RNA selectively.62 In addition, a study using footprinting techniques indicated that the dsRBMs of ADAR2 and PKR bind the GluR-B Q/R editing site in a different manner, suggesting that dsRBMs from two different proteins might have different binding specificities.53 The structure of the two ADAR2 dsRBMs was determined using NMR spectroscopy.43 As expected, both dsRBMs adopt the αβββα topology in which the two α-helices are packed along the face of a three-stranded anti-parallel β-sheet (Fig. 1D), and they are separated by an unstructured interdomain linker. However, the structures of the domains are not identical, particularly at their RNA binding surfaces. The α-helices 1 of the respective dsRBMs have different lengths and solvent-exposed residues and are positioned slightly differently relative to the other secondary structure elements. Compared with other dsRBMs, ADAR2 dsRBM1 and dsRBM2 differ from the canonical dsRBM fold, such as those of Xlrbpa2 (ref. 58) and Aquifex aeolicus RNase III.60 Interestingly, ADAR2 dsRBM1 resembles the dsRBM of Rnt1p,63 although it lacks α-helix 3, an additional element that imposes the conformation of the “recognition” α-helix 1 in the dsRBM of Rnt1p. ADAR2 dsRBM2 appears to be unique among members of the dsRBM family. 
This structural difference in the relative orientation of α-helix 1 may be functionally important, as this helix is often a key element that modulates the RNA-binding specificity of dsRBMs.46,59,61 To understand the role of ADAR2 dsRBMs in editing of the GluR-B stem-loop, NMR footprint experiments were performed, showing that ADAR2 dsRBM1 contacts the RNA pentaloop, whereas dsRBM2 recognizes the stem containing the two A⋅C mismatches.43 Based on these findings, a molecular model of the ADAR2 dsRBMs bound to the R/G stem-loop was generated (Fig. 2A).43 The binding preference of ADAR2 dsRBM1 for the stable GCU(A/C)A pentaloop is reminiscent of the structure-specific recognition of an AGNN tetraloop by the Rnt1p dsRBM (ref. 61) and, to a lesser extent, of Staufen dsRBM3 bound to a stem-loop capped by a UUCG tetraloop.59 Interestingly, all three RNA loops have common structural features,64 suggesting that dsRBMs prefer RNA stem-loops over regular RNA duplexes more generally than previously expected. These structural findings were


Figure 2. A) Atomic model of ADAR2 in complex with the GluR-B R/G site based on the NMR study of the ADAR2 dsRBMs in complex with this RNA target.43 DsRBM1 recognizes the RNA pentaloop while dsRBM2 binds near the editing site. The deaminase domain structure is shown as well in blue.47 B) Scheme of the sequence of recognition events leading to the editing of the GluR-B R/G site by ADAR2, based on structural and molecular biology studies on this system (see the text for details).


consistent with several biochemical experiments showing that ADAR2 forms multiple nonspecific complexes when bound to the R/G stem-loop lacking mismatches,39 resulting in a dramatically reduced editing efficiency and selectivity at the R/G site.35 More generally, this structural study43 suggests that the dsRBMs of ADAR2 preferentially recognize certain structural elements of the R/G stem-loop (the pentaloop and the mismatches) rather than its sequence, explaining why the secondary structure of the R/G stem-loop is very well conserved.26 The molecular basis of these two recognition events needs to be further elucidated to understand the role played by the dsRBMs in selecting RNA editing sites.

A Mechanism for ADAR2 Editing?

It has been reported that ADAR activation involves RNA-dependent dimerization.65-69 However, it is still controversial whether this dimerization is RNA-dependent or not, and which parts of the protein would be responsible for it.68-70 Furthermore, it was reported that the dsRBM1 of ADAR2 might “auto inhibit” editing if the substrate is too small to accommodate the binding of both dsRBMs.71 This observation suggests that a good ADAR2 RNA editing substrate needs to be sufficiently large to allow two binding steps: a first step in which both dsRBMs bind, alleviating dsRBM1 inhibition, and a second step in which the second monomer of ADAR2 binds. Altogether, an ADAR editing substrate must obey four rules: the first is a sequence preference at the 5ʹ and 3ʹ positions and at the base opposite the edited adenosine, probably to accommodate the deaminase domain; the second is a secondary structure containing mismatches, bulges, internal loops or hairpin loops that can be recognized by the dsRBMs; the third is to be sufficiently long to allow all dsRBMs to bind, preventing autoinhibition of the enzyme by dsRBM1 (in the case of ADAR2); and the fourth is to be sufficiently long to allow binding of a second monomer of ADAR2. These different constraints help explain why only a small subset of adenosines is selectively edited by ADARs. The scheme in Figure 2B recapitulates the sequence of events that could be envisaged for this target recognition in the case of ADAR2 editing of the GluR-B R/G site, the molecular basis of which still remains to be discovered. 
DsRBM1 would bind the apical pentaloop of the RNA stem-loop,42,43 as dsRBM1 provides most of the RNA binding affinity for this substrate.69,71 In this manner, dsRBM2 and the deaminase domain are liberated to bind the substrate.71 DsRBM2 would then position itself near the editing site by recognizing the two A⋅C mismatches.35,39,43 This would destabilize the mismatch at the editing site but would not be sufficient to open it.72 This positioning of dsRBM2 would then bring the deaminase domain near the editing site.73 Productive editing would then depend on dimerization66-70 and on the 5ʹ and 3ʹ sequence around the adenosine.31 The GluR-B R/G site allows such dimerization65 and has an ideal sequence around the editing site, with a 5ʹ A, a 3ʹ G and a C opposite the edited adenosine, explaining the high level of editing in vivo and in vitro at this target. The ensuing enzymatic mechanism of deamination is discussed in detail in the chapter by Wedekind and Beal. The sequence of events proposed here for editing of the GluR-B R/G site by ADAR2 is likely to be different for ADAR1, considering that this enzyme binds this substrate more promiscuously,35 and also for ADAR2 on different substrates, since the relative importance of each ADAR2 dsRBM for editing appears to vary widely.55

Cytidine-to-Uridine Editing of apoB mRNA

The editing of apoB100 apolipoprotein mRNA is a highly specific reaction that selectively deaminates one cytidine, at position 6666, to a uridine in a sequence of >14000 nucleotides (see chapter by Smith). This RNA modification transforms the genomically encoded glutamine codon 2153 (CAA) into a premature termination codon (UAA), leading to translation of the carboxy-terminally truncated apoB48 protein.74,75 This event occurs in the nucleus, coincident with and/or subsequent to pre-mRNA splicing and polyadenylation.76 The editing of apoB mRNA is catalyzed by a large 27S macromolecular complex, or editosome, that recognizes specific cis-acting elements close to the editing site. In vitro, the minimal functional core comprises the homodimeric enzyme APOBEC1 (apoB editing catalytic subunit 1) and its binding partner, the APOBEC1 complementation factor


(ACF).77 Other proteins containing RNA binding domains or functioning as cochaperones have been identified as components of the editosome.77,78 Little structural information is available on ACF and none on APOBEC1. However, structures of related proteins have been solved and provide important insights into APOBEC1 and ACF functions. We review here the structural and biochemical information that has been obtained on the cis-acting elements of apoB mRNA and on the main trans-acting factors identified, with particular emphasis on ACF and APOBEC1 (a member of the zinc-dependent deaminase family described in the chapter by Wedekind and Beal). We conclude with a potential model explaining how APOBEC1, in concert with ACF, recognizes and edits its RNA substrate.
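The two recoding events described in this chapter, A-to-I at the GluR-B R/G site and C-to-U at C6666 of apoB mRNA, can be checked with a few lines of Python. This is a minimal sketch: the function name and the four-entry codon table are introduced here for illustration (the table holds only the codons mentioned in the text), and inosine is simply read as guanosine, as the text states the ribosome does.

```python
# Only the four codons discussed in the text; a full genetic code table is not needed.
CODON_TABLE = {"AGA": "Arg", "GGA": "Gly", "CAA": "Gln", "UAA": "Stop"}

# Deamination products: adenosine -> inosine, cytidine -> uridine.
DEAMINATION = {"A": "I", "C": "U"}


def edit_and_decode(codon: str, position: int) -> tuple[str, str]:
    """Deaminate the base at `position` (0-based); return (edited codon, decoded residue)."""
    base = codon[position]
    if base not in DEAMINATION:
        raise ValueError(f"{base} is not a substrate for A-to-I or C-to-U editing")
    edited = codon[:position] + DEAMINATION[base] + codon[position + 1:]
    # Inosine is read as guanosine during translation.
    as_read = edited.replace("I", "G")
    return edited, CODON_TABLE[as_read]


if __name__ == "__main__":
    print(edit_and_decode("AGA", 0))  # GluR-B R/G site: arginine codon
    print(edit_and_decode("CAA", 0))  # apoB codon 2153: glutamine codon
```

Because the table is deliberately minimal, editing any other position or codon raises a `KeyError` rather than guessing an amino acid.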

The Structure of the ApoB mRNA Stem-Loop Substrate: Cis-Acting Element Features

The high specificity of apoB mRNA editing is associated with the presence of an important cis-acting element called the “mooring sequence”, located a few nucleotides downstream of the editing site.79-82 The other cis-acting elements are the “spacer element” just upstream of the mooring sequence, the AU-rich “efficiency sequence” and the two distant 5ʹ (6609-6628) and 3ʹ (6717-6747) sequences, all of which modulate the yield of the editing reaction.83 The minimal human apoB mRNA sequence competent for editing (from 6656 to 6682) contains the four main cis-acting elements: the editing site, the mooring sequence, part of the efficiency sequence and the spacer (Fig. 3A).81,84 This short RNA stretch folds into a stem-loop whose NMR structure was determined.85 In the structure, the target cytidine at position 6666 is sandwiched between two adenosines at the 5ʹ side of an octa-loop (Fig. 3B). This stacking would prevent direct access of APOBEC1 to the C6666 amino group, which is buried in the loop at physiological temperature. The uridine resulting from deamination adopts the same conformation.85 The mooring sequence, although mostly base-paired with the efficiency sequence, shows some flexibility (Fig. 3C): its first two nucleotides, U and G, which close the 3ʹ end of the octa-loop, are disordered and flexible (the guanosine being in exchange between syn and anti conformations).85 The guanosine G6677 positioned in the middle of the mooring sequence disrupts the A-form helix, resulting in a dynamic internal loop, which confers moderate flexibility to both the efficiency and mooring sequences. The last element is the AAUU spacer, which is part of the consensus binding site of APOBEC1 (UUUN(A/U)U) that partially overlaps the 5ʹ end of the mooring sequence.86 The two adenosines stack over the editing site, whereas the uridines are highly dynamic. This overall flexibility might play a critical role in the process of RNA recognition by APOBEC1 and ACF.
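The APOBEC1 binding consensus UUUN(A/U)U quoted above translates directly into a regular expression. The sketch below is illustrative: the helper name is hypothetical, and the example RNA string is an assumption, built to contain a target C followed by an AAUU spacer and then a U and a G as the text describes. It should not be read as the verified apoB sequence.

```python
import re

# UUUN(A/U)U: three uridines, any nucleotide, A or U, then U.
# The lookahead lets overlapping occurrences be reported as well.
APOBEC1_CONSENSUS = re.compile(r"(?=(UUU[ACGU][AU]U))")


def find_consensus(rna: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched hexamer) for every consensus occurrence."""
    return [(m.start(), m.group(1)) for m in APOBEC1_CONSENSUS.finditer(rna.upper())]


if __name__ == "__main__":
    # Assumed example: C + AAUU spacer + start of a mooring-like sequence.
    example = "CAAUUUGAUCAG"
    print(find_consensus(example))
```

In this assumed example the match begins inside the spacer and runs into the following nucleotides, which is the kind of partial overlap with the 5ʹ end of the mooring sequence that the text describes.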

APOBEC1 Complementation Factor, a Modular hnRNP Trans-Acting Factor

APOBEC1 complementation factor is a 64.3 kDa protein (ACF64, isoform 1 or ACF) that contains in its N-terminus three RNA recognition motifs (RRMs), followed by an arginine-glycine rich region (RG) and a C-terminal double-stranded RNA binding motif (dsRBM) (Fig. 3D). ACF is widely expressed in several tissues and plays a crucial role in cell survival. ACF is predominantly nuclear but shuttles to the cytoplasm upon metabolic changes.87 ACF (isoform 1) has four other alternatively spliced variants called ACF65 (isoform 2), ACF64 (isoform 3), ACF45 and ACF43. Isoform 3 is distinguished from isoform 1 only by its first 42 residues at the N-terminus. ACF65 (also called ASP, for APOBEC1 Stimulating Protein) differs by only eight additional amino acids from residue 381 (EIYMNVPV), just before the RG-rich region. This additional sequence constitutes a tyrosine phosphorylation site that might alter the complementation activity and/or subcellular localization of ACF65.88 Both ACF45 and ACF43 are C-terminally truncated variants lacking the dsRBD region. ACF45 differs from ACF43 in containing the RG-rich region (they extend to residue 405 and residue 383, respectively). ACF45 and ACF43, which down-regulate editing activity, are expressed only in liver and small intestine cells, whereas ACF65 and ACF64 are present in multiple tissues.89 The dsRBM is important but not essential, since its deletion does not completely abolish complementing activity or binding to apoB mRNA.90,91 The RG-rich region confers RNA binding affinity but not specificity.92,93 It is the presence of the RRMs that gives rise to highly specific editing of apoB mRNA through recognition of the mooring sequence. Below, we review the binding properties of RRMs and suggest a role for the three RRMs of ACF in the editing reaction.


Figure 3. A) Secondary structure of the 31 nucleotide sequence of the human apoB mRNA containing the editing site C6666.85 B) Ensemble of the 20 conformers of the stem-loop of apoB mRNA containing the editing site C6666.85 C) NMR structure that suggests the flexibility of the internal loop G77-A60-U78 in apoB mRNA. D) Domain composition of ACF and indication of the regions interacting with APOBEC1 and apoB mRNA.90,91 E) RNP1 and RNP2 motifs and structure of hnRNPA1 RRM2 (a typical RRM fold120). Scheme of the β-sheet annotated with the conserved RNP1 and RNP2 aromatic residue positions, numbered according to each RNP sequence numbering. The conserved aromatic residues are highlighted by a green circle. F) Predicted structures of RRM1, RRM2 and RRM3 of ACF using the online PHYRE software;96 the N-terminal proline-rich region potentially interacting with APOBEC1 is shown in bold. The aromatic residues of RNP1 and RNP2 are displayed in green, and additional solvent-exposed aromatic residues are highlighted in blue.


RNP Fold Features and Implications for ACF as a Trans-Acting Factor

The RRM fold is an αβ sandwich structure with a β1α1β2β3α2β4 topology (Fig. 3E). The RNP1 and RNP2 (RiboNucleoParticle) motifs are the sequence signature of this domain and are located in the central strands of the β-sheet, namely β3 and β1, respectively. The β-sheet is the primary RNA binding surface, while the N- and C-termini and the loops (1, 3 and 5) confer RNA sequence specificity.94,95 In order to visualize the RRM features of ACF, we ran the online PHYRE software,96 which proposed a structural prediction for each RRM (Fig. 3F). The structure of ACF RRM3 has been solved by the RIKEN Structural Genomics/Proteomics Initiative (RSGI) and is similar to the predicted one (PDB ID 2cpd). All four conserved residues, namely RNP1 positions 1, 3 and 5 and RNP2 position 2 (Fig. 3E), are present in each of the RRMs of ACF. The conservation of these canonical positions suggests that RNA is likely to bind the β-sheet (Fig. 3E and 3F). Each RRM has one additional solvent-exposed aromatic residue that could contribute to RNA binding as well, namely His272 in β3 of RRM3, Tyr169 in β4 of RRM2 and Phe91 in loop 3 of RRM1. Loops 3 of RRM1 and RRM2 are unusually long (8 and 10 residues, respectively, instead of the usual 4-6 residues) and could provide an extended RNA binding surface. Another unusual feature is that RRM2 contains a tryptophan (Trp207) located in loop 5. Interestingly, Blanc et al97 mentioned that the APOBEC1-interacting domain of ACF contains at least RRM2 and that the mutant ACFΔ55, which lacks residues 203 to 257, fails to bind APOBEC1. Trp207, which is absent in this mutant, could mediate this interaction with APOBEC1, as observed in the structural model of PTB RRM2 in complex with a Raver1 peptide, where a tyrosine instead of a tryptophan mediates part of the contacts with the proline-rich region of Raver1. The three domains can form a large RNA binding platform, as they are separated by relatively short interdomain linkers (Fig. 3F). In summary, typical features for RNA binding are present in the three RRMs of ACF. The topology of three consecutive RRMs potentially allows a large RNA recognition surface for the mooring sequence and additional cis-acting elements. Additionally, RRM2 could mediate part of the interaction of ACF with APOBEC1 via Trp207 in loop 5.

ACF, a Single Stranded RNA Binder of the Mooring Sequence

Full-length ACF binds single-stranded apoB mRNA (280 nts) with a low dissociation constant (Kd) of 8 nM. The N-terminal part, including the RRMs and the RG-rich region, contributes most of this high affinity (Kd of 30 nM). These two regions might work cooperatively when bound to RNA, since the affinity drops drastically when one of them is mutated or deleted (Kd > 1000 nM). However, only the N-terminal region containing the RRMs can bind RNA specifically. Indeed, a single point mutation to alanine of the aromatic residue at either position 2 of RNP2 or position 5 of RNP1 in each domain reduces the affinity for a 280 nt segment of apoB mRNA by two orders of magnitude, except in RRM3, where both mutants show only a 13-fold reduction in affinity (Fig. 3E and 3F).91 These results also suggest that RRM1 and RRM2 could work as a tandem unit to recognize a long RNA stretch, like Hpr1, which recognizes six consecutive nucleotides (AUAUAU),98 whereas RRM3 might bind another RNA stretch independently. In this configuration, the interdomain linker of RRM1 and RRM2 and the C-terminal extension of RRM3 might be important for RNA binding. Among the cis-acting elements of apoB mRNA, the mooring sequence and its complementary sequence have been identified as the primary ACF binding sites.85,99 This RNA recognition does not occur within a regular A-form helix but rather in the context of a flexible stem containing a bulge and/or internal loop.85,91 An NMR study showed that ACF RRMs can bind and denature the apoB mRNA stem-loop harboring the embedded editing site, the efficiency sequence and the mooring sequence.85 However, the exact contribution of each RRM to the recognition of both the mooring sequence and its complementary strand remains to be determined. By melting the RNA stem-loop, ACF would make the amino group of C6666, which is otherwise stacked between two adenosines in the protein-free form, accessible for deamination by APOBEC1.85
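To put the affinity changes quoted above on an energy scale, a fold change in Kd can be converted to a binding free energy difference with ΔΔG = RT ln(fold change), and a Kd can be turned into an expected occupancy with the single-site isotherm. The Python sketch below assumes simple 1:1 binding at 25 °C, with the free protein concentration approximately equal to the total. These conditions and the function names are assumptions for illustration and are not taken from the cited experiments.

```python
import math

GAS_CONSTANT_KCAL = 1.987e-3  # kcal / (mol * K)


def delta_delta_g(fold_change_in_kd: float, temperature_k: float = 298.15) -> float:
    """Free energy penalty (kcal/mol) for a given fold increase in Kd."""
    return GAS_CONSTANT_KCAL * temperature_k * math.log(fold_change_in_kd)


def fraction_bound(free_protein_nm: float, kd_nm: float) -> float:
    """Fraction of RNA bound for 1:1 binding, given free protein and Kd in nM."""
    return free_protein_nm / (free_protein_nm + kd_nm)


if __name__ == "__main__":
    # Two orders of magnitude (RRM1 or RRM2 mutants) versus 13-fold (RRM3 mutants).
    print(f"100-fold: {delta_delta_g(100):.2f} kcal/mol")
    print(f"13-fold:  {delta_delta_g(13):.2f} kcal/mol")
    # Occupancy at 8 nM free protein for Kd = 8 nM and for Kd = 1000 nM.
    print(f"Kd 8 nM:    {fraction_bound(8, 8):.3f}")
    print(f"Kd 1000 nM: {fraction_bound(8, 1000):.3f}")
```

Under these assumptions, a 100-fold loss of affinity costs roughly 2.7 kcal/mol and a 13-fold loss roughly 1.5 kcal/mol, which gives a sense of how much less the RRM3 aromatic residues contribute to binding of this RNA segment.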


Figure 4. A) Domain organization of the main trans-acting factors of the C-to-U editosome. The binding properties of the different proteins for the mRNA, APOBEC1 or ACF are indicated in the second column.78,99,101,103,107,121,122 B) Sequence alignment between APOBEC2 and APOBEC1. Identical residues are shown in red and similar ones are shown in green. The prolines unique to APOBEC1 are indicated in bold. A star highlights the residues of the NLS (nuclear localization signal) and NES (nuclear export signal) sequences. C) Mapping of the conserved residues, colored in red, on the APOBEC2 structure.


Other Trans-Acting Factors

Besides ACF, which confers specificity to APOBEC1, several other proteins regulate apoB mRNA editing activity through multiple protein-protein and/or protein-RNA interactions. Some contain RNA binding domains of the RRM or KH types, suggesting an RNA binding role, while others could act as cochaperones in the assembly of the editosome (Fig. 4A). GRY-RBP (glycine-arginine-tyrosine-rich RNA binding protein) is an alternatively spliced form of NSAP1 belonging to the hnRNP Q family.100 It contains three RRMs that are closely related to those of ACF (51% identity in the RRM region) and are preceded by an acidic domain (AcD). It has been shown that the AcD interacts with APOBEC1 and that this interaction is inhibited when the AcD is phosphorylated. CUGBP2 (CUG triplet repeat RNA binding protein 2) contains exclusively RRMs. There is a short sequence between RRM1 and RRM2 and a long linker between RRM2 and RRM3, suggesting that RRM3 might bind independently of the others (Fig. 4A). GRY-RBP and CUGBP2 are two inhibitors of apoB mRNA editing that can bind apoB mRNA, ACF and APOBEC1.99,101 CUGBP2 binds apoB mRNA specifically at the AU-rich sequence located immediately upstream of the edited cytidine, whereas GRY-RBP shows low binding specificity for the regions flanking the editing site. GRY-RBP and CUGBP2 have different RNA binding specificities, recognizing U-rich and AU-rich sequences, respectively. Interestingly, both down-regulate C-to-U RNA editing in a dose-dependent manner, with one difference: in an in vitro editing assay, the inhibitory effect of CUGBP2 is abolished by adding more of either ACF or APOBEC1, whereas ACF alone can rescue the editing reaction in the presence of GRY-RBP, which binds to and sequesters ACF.99 APOBEC1 and ACF control the inhibitory effect of GRY-RBP and CUGBP2 through their nucleo-cytoplasmic distribution. 
Indeed, GRY-RBP, when cotransfected with either ACF or APOBEC1, colocalizes with both in the nucleus, whereas CUGBP2 colocalizes either in the nucleus with ACF or in the cytoplasm with APOBEC1.99,101 ABBP1 (APOBEC1-binding protein 1) and hnRNP C1 also contain RRMs but do not interact with ACF (Fig. 4A). Both are nuclear proteins and could be components of the 27S editosome particle. ABBP1 seems to enhance apoB mRNA editing. It contains two RRMs, separated by only one amino acid, that bind apoB mRNA.102 Its C-terminal part contains glycine- and glycine-tyrosine-rich regions that interact with APOBEC1. In contrast, hnRNP C1, which binds to both APOBEC1 and apoB mRNA, is a strong inhibitor of editing.103 HnRNP C1 contains one RRM that recognizes poly-U stretches and a tetramerization domain.103 KSRP (KH-type splicing regulatory protein), which was first discovered as a splicing activator, was identified at the same time as ASP (equivalent of ACF).104 It cross-linked strongly with apoB mRNA, but its function in editing has not yet been clarified (Fig. 4A). The editosome is present in the cytoplasm predominantly as an inactive 60S particle that can dissociate under metabolic stimuli into an active 27S complex competent for apoB mRNA editing.88,105 The assembly and disassembly of the editosome also appear to be under the control of protein chaperones. Two such chaperones have been identified, namely ABBP2/HEDJ (APOBEC1-binding protein 2/human endoplasmic reticulum associated DNAJ) and BAG4/SODD (Bcl-2-associated athanogene 4/silencer of death domains) (Fig. 4A). ABBP2 stimulates editing whereas BAG4 represses it.106 ABBP2 binds APOBEC1 via its DnaJ domain and its neighboring G/F-rich region.107 BAG4 interacts with APOBEC1 via its N-terminal proline/glycine-rich region, independently of the α-helical BAG domain.78 This ensemble of trans-acting factors identified to date shows the multiple regulatory facets of the apoB mRNA editosome. 
The editing machinery is controlled either directly, by favoring or disrupting one of the interactions between apoB mRNA, ACF and APOBEC1, or indirectly, by modifying the subcellular distribution of the different components of the editosome.

APOBEC1 and ACF, a Tandem Unit to Specifically Edit apoB mRNA

APOBEC1, which stands for apoB-editing catalytic subunit 1, is a zinc-dependent cytidine deaminase of 27 kDa that is highly conserved in mammals and is catalytically active as a dimer (chapters by Smith and by Wedekind and Beal). APOBEC1 is a low-abundance protein expressed only in the


small and large intestines. It is an important determinant of lipoprotein metabolism via apoB48/apoB100 expression and shuttles between the nucleus and the cytoplasm via its nuclear localization (NLS) and nuclear export (NES) signals, located at the N- and C-termini, respectively.108 It has been shown that APOBEC1 could also transport ACF, therefore restricting the access of APOBEC1 to specific sites of apoB mRNA. The recently solved structures of the two homologues APOBEC2 and APOBEC3G provide a significant advance in understanding the function of APOBEC1.109-111 APOBEC1 shares 21% sequence identity and 35% similarity with its APOBEC2 homologue and should fold similarly, except for the dimerization domain unique to APOBEC1 (Fig. 4B).112-114 When we mapped the APOBEC1 conserved residues onto the APOBEC2 structure (Fig. 4C), we found them located, as expected, around the catalytic pocket but also, more surprisingly, in the α5 helix, which packs against the β-sheet. APOBEC1 is composed of a cytidine deaminase domain (CDA) followed by a C-terminal part (APOBEC_C), which extends the fold of the CDA domain. The CDA domain folds into an αβ sandwich with the topology α1β1β2α2β3α3β4, as shown in blue in the APOBEC2 structure (Fig. 4C). The conserved motif H(AV)E-X(24-36)-PCxxC (where x is any amino acid) is the signature of a zinc-dependent deaminase domain.115 The N-terminal parts of the two helices α2 and α3 hold the catalytic pocket. The APOBEC_C extension of 60 amino acids folds into an α4β5α5α6 topology extending the β-sheet surface, as shown in the APOBEC2 structure in Fig. 4C.109-111 APOBEC1 has unique features compared to its homologues that confer specific binding properties for RNA and protein. As discussed previously, APOBEC1 can bind a multitude of cofactors and in particular RRM2 of ACF. It might fulfill this function via the unusually high number of prolines located at its N- and C-termini (12 in total, Fig. 4B). 
Prolines can be structurally important to induce hinges like the prolines P190 and P191 shown to be essential for proper function,114 but also are known to mediate protein-protein interactions.116 Even though ACF guides APOBEC1 close to the editing site, APOBEC1 recognizes also speciically the RNA sequence located downstream of the edited cytidine that contains its binding consensus (UUUN(A/U)U).86 his RNA stretch (around ten nucleotides) requires a large RNA binding platform for APOBEC1. he regions involved in RNA binding are the N-terminus that contains several positively charged residues essential for editing (R15R16R17 and R33K34), the C terminus and the region surrounding the catalytic pocket.114,117 Additional regions of APOBEC1 or the other monomer might be required to accommodate such a long RNA stretch.
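As a rough illustration of how the zinc-dependent deaminase signature quoted above might be used in practice, the following sketch searches a protein sequence for the pattern with a regular expression. The function name `find_zinc_deaminase_motif` and the toy sequence are hypothetical and are not taken from APOBEC1; a real analysis would use actual sequences and would normally rely on curated domain profiles rather than a single pattern.

```python
import re

# Signature quoted in the text: H(A/V)E, a spacer of 24-36 residues, then PCxxC.
ZINC_DEAMINASE_MOTIF = re.compile(r"H[AV]E(.{24,36})PC..C")


def find_zinc_deaminase_motif(protein):
    """Return (start, end, spacer_length) of the first match, or None."""
    match = ZINC_DEAMINASE_MOTIF.search(protein)
    if match is None:
        return None
    return match.start(), match.end(), len(match.group(1))


# Hypothetical toy sequence, built only to exercise the pattern.
toy_protein = "MK" + "HAE" + "G" * 28 + "PCAAC" + "KL"
print(find_zinc_deaminase_motif(toy_protein))
```

The spacer length is returned alongside the coordinates because the 24-36 residue window is the part of the signature most likely to need adjusting when comparing family members.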

A Mechanistic Model for ApoB mRNA Editing

APOBEC1 and ACF represent the minimal complex that is sufficient to specifically edit the apoB mRNA substrate in vitro. In vivo, they belong to a large particle whose sedimentation coefficient ranges from 27S to 60S. Several trans-acting factors control the nucleo-cytoplasmic distribution and the catalytic activity of the editosome. A model describing the main steps of the nucleo-cytoplasmic translocation of the editosome and the key interactions between apoB mRNA, ACF and APOBEC1 is shown in Figure 5. The tandem APOBEC1/ACF unit assembles into a large 60S particle localized in the cytoplasm that maintains the editosome in an inactive state (Fig. 5A). Under various metabolic stimuli, the complex dissociates into an active 27S particle that is readily imported into the nucleus for apoB mRNA editing. APOBEC1 and ACF act in concert to specifically deaminate the cytidine at position 6666, which is embedded in a stem-loop containing the indispensable cis-acting elements (Fig. 5B). The mooring sequence anchors the RRMs of ACF downstream of the editing site and triggers the melting of the stem-loop (Fig. 5). The efficiency sequence might contribute to this event by binding the RRMs of ACF or of other trans-acting factors. The RG region and the dsRBD of ACF increase the RNA affinity, probably by interacting with the 5ʹ and 3ʹ efficiency elements. ACF RRM2 could potentially recruit APOBEC1 by interacting with its N terminus, bringing it near the editing site. APOBEC1 recognizes the spacer element that contains its binding consensus and docks the targeted cytidine in the active catalytic pocket (Fig. 5B). After the editing reaction, the editosome transports the edited substrate to the cytoplasm for translation (Fig. 5A).
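The sequence-based part of this model can be sketched in a few lines: given an RNA and the index of a candidate cytidine, look downstream for the APOBEC1 binding consensus UUUN(A/U)U. This is only a minimal sketch. The function name `downstream_sites`, the 20-nucleotide search window and the toy RNA are assumptions made for illustration; the toy RNA is not the apoB mRNA sequence, and the sketch ignores the stem-loop structure and the role of ACF.

```python
import re

# APOBEC1 binding consensus from the text, UUUN(A/U)U; the lookahead
# allows overlapping occurrences to be reported.
CONSENSUS = re.compile(r"(?=(UUU[ACGU][AU]U))")


def downstream_sites(rna, target_index, max_distance=20):
    """List (offset, motif) for consensus matches 3' of the target cytidine.

    offset counts nucleotides from the target C to the first motif base.
    max_distance is an assumed search window, not a measured value.
    """
    if rna[target_index] != "C":
        raise ValueError("target_index must point at a cytidine")
    hits = []
    for match in CONSENSUS.finditer(rna, target_index + 1):
        offset = match.start() - target_index
        if offset > max_distance:
            break
        hits.append((offset, match.group(1)))
    return hits


# Hypothetical toy RNA (not the real apoB sequence); the target C is at index 6.
toy_rna = "GCGAUACAAUUUGAUCAGCG"
print(downstream_sites(toy_rna, 6))
```

Reporting the offset rather than the absolute position makes it easy to compare the spacing between the candidate cytidine and the consensus across different substrates.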

Figure 5. Model of apoB mRNA editing. A) Nucleo-cytoplasmic exchange of the editosome. B) Schematic representation of the interactions between APOBEC1, ACF and apoB mRNA.

Conclusions and Future Prospects

In reviewing the structural and biochemical knowledge obtained in recent years on A-to-I and C-to-U RNA editing of mRNA by ADARs and APOBEC1, respectively, one can see common features and clear differences. Common to these two RNA modification systems are the deamination reactions themselves, which in both cases involve a zinc atom at the catalytic site, and, perhaps more surprisingly, a common need for both enzymes to be active as a dimer, although only one of the two active sites is used for catalysis. The main differences lie in how the substrates of the two enzymes are recognized. Whereas APOBEC1 requires a protein cofactor, ACF, containing three single-stranded RNA recognition motifs (RRMs), ADARs use two to three dsRBMs that are present in the protein itself, N-terminal to the deaminase domain. While the RRMs of ACF recognize the RNA sequence around the targeted cytidine of apoB mRNA, the dsRBMs of ADARs seem to recognize the structure of the target RNA (its mismatches and a rigid loop) to anchor the protein near the editing site. Unlike the RRMs of ACF, the dsRBMs of ADAR2 destabilize the editing site but do not unfold the RNA. These two very different modes of RNA recognition explain, on the one hand, the extreme selectivity of APOBEC1 (together with its cofactor ACF) for C6666 of apoB mRNA (based on the uniqueness of the RNA sequence around C6666) and, on the other hand, the more promiscuous but still selective mode of RNA binding by ADARs, which allows editing of many more targets. In both systems, although considerable progress has been made over the last few years in understanding the structural basis of mRNA editing using X-ray crystallography,118 NMR spectroscopy43 or other methods, atomic-resolution structures of both enzymes in complex with RNA are still lacking. Such structures are urgently needed in order to fully understand the target selectivity of these disease-associated enzymes, to find additional targets and possibly to help develop artificial modifying enzymes that could be used in the future for therapeutic purposes.

Acknowledgements

Support for this work comes from the Swiss National Science Foundation (Nr. 3100A0-118118) and the SNF-NCCR Structural Biology. The authors are grateful to Joseph Wedekind and Peter Beal for critical reading of the manuscript.

References

1. Venter JC, Adams MD, Myers EW et al. The sequence of the human genome. Science 2001; 291(5507):1304-1351.
2. Wang Q, Zhang Z, Blackwell K et al. Vigilins bind to promiscuously A-to-I-edited RNAs and are involved in the formation of heterochromatin. Curr Biol 2005; 15(4):384-391.
3. Hanrahan CJ, Palladino MJ, Ganetzky B et al. RNA editing of the Drosophila para Na(+) channel transcript. Evolutionary conservation and developmental regulation. Genetics 2000; 155(3):1149-1160.
4. Gott JM, Emeson RB. Functions and mechanisms of RNA editing. Annu Rev Genet 2000; 34:499-U434.
5. Bass BL. RNA editing by adenosine deaminases that act on RNA. Annu Rev Biochem 2002; 71:817-846.
6. Emeson RB, Singh M. Adenosine to inosine RNA editing: substrates and consequences. In: Bass BL, ed. RNA Editing: Frontiers in Molecular Biology. London: Oxford University Press, 2000:109-138.
7. Kim U, Wang Y, Sanford T et al. Molecular cloning of cDNA for double-stranded RNA adenosine deaminase, a candidate enzyme for nuclear RNA editing. Proc Natl Acad Sci USA 1994; 91(24):11457-11461.
8. Melcher T, Maas S, Herb A et al. A mammalian RNA editing enzyme. Nature 1996; 379(6564):460-464.
9. Lai F, Chen CX, Carter KC et al. Editing of glutamate receptor B subunit ion channel RNAs by four alternatively spliced DRADA2 double-stranded RNA adenosine deaminases. Mol Cell Biol 1997; 17(5):2413-2424.
10. Gerber A, O’Connell MA, Keller W. Two forms of human double-stranded RNA-specific editase 1 (hRED1) generated by the insertion of an Alu cassette. RNA 1997; 3(5):453-463.
11. Melcher T, Maas S, Herb A et al. RED2, a brain-specific member of the RNA-specific adenosine deaminase family. J Biol Chem 1996; 271(50):31795-31798.
12. Chen CX, Cho DS, Wang Q et al. A third member of the RNA-specific adenosine deaminase gene family, ADAR3, contains both single- and double-stranded RNA binding domains. RNA 2000; 6(5):755-767.
13. Tonkin LA, Saccomanno L, Morse DP et al. RNA editing by ADARs is important for normal behavior in Caenorhabditis elegans. EMBO J 2002; 21(22):6025-6035.
14. Palladino MJ, Keegan LP, O’Connell MA et al. A-to-I pre-mRNA editing in Drosophila is primarily involved in adult nervous system function and integrity. Cell 2000; 102(4):437-449.
15. Polson AG, Bass BL. Preferential selection of adenosines for modification by double-stranded RNA adenosine deaminase. EMBO J 1994; 13(23):5701-5711.
16. Nishikura K, Yoo C, Kim U et al. Substrate specificity of the dsRNA unwinding/modifying activity. EMBO J 1991; 10(11):3523-3532.

17. Bass BL, Weintraub H. An unwinding activity that covalently modifies its double-stranded RNA substrate. Cell 1988; 55(6):1089-1098.
18. Wagner RW, Smith JE, Cooperman BS et al. A double-stranded RNA unwinding activity introduces structural alterations by means of adenosine to inosine conversions in mammalian cells and Xenopus eggs. Proc Natl Acad Sci USA 1989; 86(8):2647-2651.
19. Tonkin LA, Bass BL. Mutations in RNAi rescue aberrant chemotaxis of ADAR mutants. Science 2003; 302(5651):1725.
20. Zhang Z, Carmichael GG. The fate of dsRNA in the nucleus: a p54(nrb)-containing complex mediates the nuclear retention of promiscuously A-to-I edited RNAs. Cell 2001; 106(4):465-475.
21. Wong TC, Ayata M, Ueda S et al. Role of biased hypermutation in evolution of subacute sclerosing panencephalitis virus from progenitor acute measles virus. J Virol 1991; 65(5):2191-2199.
22. Levanon EY, Eisenberg E, Yelin R et al. Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nat Biotechnol 2004; 22(8):1001-1005.
23. Athanasiadis A, Rich A, Maas S. Widespread A-to-I RNA editing of Alu-containing mRNAs in the human transcriptome. PLoS Biol 2004; 2(12):e391.
24. Nishikura K. Editor meets silencer: crosstalk between RNA editing and RNA interference. Nat Rev Mol Cell Biol 2006; 7(12):919-931.
25. Seeburg PH, Higuchi M, Sprengel R. RNA editing of brain glutamate receptor channels: mechanism and physiology. Brain Res Brain Res Rev 1998; 26(2-3):217-229.
26. Aruscavage PJ, Bass BL. A phylogenetic analysis reveals an unusual sequence conservation within introns involved in RNA editing. RNA 2000; 6(2):257-269.
27. Ohman M. A-to-I editing challenger or ally to the microRNA process. Biochimie 2007; 89(10):1171-1176.
28. Kawahara Y, Megraw M, Kreider E et al. Frequency and fate of microRNA editing in human brain. Nucleic Acids Res 2008; 36(16):5270-5280.
29. Kawahara Y, Zinshteyn B, Chendrimada TP et al. RNA editing of the microRNA-151 precursor blocks cleavage by the Dicer-TRBP complex. EMBO Rep 2007; 8(8):763-769.
30. Kawahara Y, Zinshteyn B, Sethupathy P et al. Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science 2007; 315(5815):1137-1140.
31. Lehmann KA, Bass BL. Double-stranded RNA adenosine deaminases ADAR1 and ADAR2 have overlapping specificities. Biochemistry 2000; 39(42):12875-12884.
32. Riedmann EM, Schopoff S, Hartner JC et al. Specificity of ADAR-mediated RNA editing in newly identified targets. RNA 2008; 14(6):1110-1118.
33. Blow M, Futreal PA, Wooster R et al. A survey of RNA editing in human brain. Genome Res 2004; 14(12):2379-2387.
34. Wong SK, Sato S, Lazinski DW. Substrate recognition by ADAR1 and ADAR2. RNA 2001; 7(6):846-858.
35. Kallman AM, Sahlin M, Ohman M. ADAR2 A→I editing: site selectivity and editing efficiency are separate events. Nucleic Acids Res 2003; 31(16):4874-4881.
36. Ohlson J, Pedersen JS, Haussler D et al. Editing modifies the GABA(A) receptor subunit alpha3. RNA 2007; 13(5):698-703.
37. Dawson TR, Sansam CL, Emeson RB. Structure and sequence determinants required for the RNA editing of ADAR2 substrates. J Biol Chem 2004; 279(6):4941-4951.
38. Reenan RA. Molecular determinants and guided evolution of species-specific RNA editing. Nature 2005; 434(7031):409-413.
39. Ohman M, Kallman AM, Bass BL. In vitro analysis of the binding of ADAR2 to the pre-mRNA encoding the GluR-B R/G site. RNA 2000; 6(5):687-697.
40. Lehmann KA, Bass BL. The importance of internal loops within RNA substrates of ADAR1. J Mol Biol 1999; 291(1):1-13.
41. Klaue Y, Kallman AM, Bonin M et al. Biochemical analysis and scanning force microscopy reveal productive and nonproductive ADAR2 binding to RNA substrates. RNA 2003; 9(7):839-846.
42. Stefl R, Allain FH. A novel RNA pentaloop fold involved in targeting ADAR2. RNA 2005; 11(5):592-597.
43. Stefl R, Xu M, Skrisovska L et al. Structure and specific RNA binding of ADAR2 double-stranded RNA binding motifs. Structure 2006; 14(2):345-355.
44. Pokharel S, Beal PA. High-throughput screening for functional adenosine to inosine RNA editing systems. ACS Chem Biol 2006; 1(12):761-765.
45. Chang KY, Ramos A. The double-stranded RNA-binding motif, a versatile macromolecular docking platform. FEBS J 2005; 272(9):2109-2117.
46. Stefl R, Skrisovska L, Allain FH. RNA sequence- and shape-dependent recognition by proteins in the ribonucleoprotein particle. EMBO Rep 2005; 6(1):33-38.

47. Macbeth MR, Schubert HL, Vandemark AP et al. Inositol hexakisphosphate is bound in the ADAR2 core and required for RNA editing. Science 2005; 309(5740):1534-1539.
48. Rich A, Zhang S. Timeline: Z-DNA: the long road to biological function. Nat Rev Genet 2003; 4(7):566-572.
49. Koeris M, Funke L, Shrestha J et al. Modulation of ADAR1 editing activity by Z-RNA in vitro. Nucleic Acids Res 2005; 33(16):5362-5370.
50. Placido D, Brown BA 2nd, Lowenhaupt K et al. A left-handed RNA double helix bound by the Z alpha domain of the RNA-editing enzyme ADAR1. Structure 2007; 15(4):395-404.
51. Schwartz T, Rould MA, Lowenhaupt K et al. Crystal structure of the Zalpha domain of the human editing enzyme ADAR1 bound to left-handed Z-DNA. Science 1999; 284(5421):1841-1845.
52. Athanasiadis A, Placido D, Maas S et al. The crystal structure of the Zbeta domain of the RNA-editing enzyme ADAR1 reveals distinct conserved surfaces among Z-domains. J Mol Biol 2005; 351(3):496-507.
53. Stephens OM, Haudenschild BL, Beal PA. The binding selectivity of ADAR2's dsRBMs contributes to RNA-editing selectivity. Chem Biol 2004; 11(9):1239-1250.
54. Doyle M, Jantsch MF. New and old roles of the double-stranded RNA-binding domain. J Struct Biol 2002; 140(1-3):147-153.
55. Xu M, Wells KS, Emeson RB. Substrate-dependent contribution of double-stranded RNA-binding motifs to ADAR2 function. Mol Biol Cell 2006; 17(7):3211-3220.
56. Fierro-Monti I, Mathews MB. Proteins binding to duplexed RNA: one motif, multiple functions. Trends Biochem Sci 2000; 25(5):241-246.
57. Bycroft M, Grunert S, Murzin AG et al. NMR solution structure of a dsRNA binding domain from Drosophila staufen protein reveals homology to the N-terminal domain of ribosomal-protein S5. EMBO J 1995; 14(14):3563-3571.
58. Ryter JM, Schultz SC. Molecular basis of double-stranded RNA-protein interactions: structure of a dsRNA-binding domain complexed with dsRNA. EMBO J 1998; 17(24):7505-7513.
59. Ramos A, Grunert S, Adams J et al. RNA recognition by a staufen double-stranded RNA-binding domain. EMBO J 2000; 19(5):997-1009.
60. Blaszczyk J, Gan J, Tropea JE et al. Noncatalytic assembly of ribonuclease III with double-stranded RNA. Structure (Camb) 2004; 12(3):457-466.
61. Wu H, Henras A, Chanfreau G et al. Structural basis for recognition of the AGNN tetraloop RNA fold by the double-stranded RNA-binding domain of Rnt1p RNase III. Proc Natl Acad Sci USA 2004; 101(22):8307-8312.
62. Liu Y, Lei M, Samuel CE. Chimeric double-stranded RNA-specific adenosine deaminase ADAR1 proteins reveal functional selectivity of double-stranded RNA-binding domains from ADAR1 and protein kinase PKR. Proc Natl Acad Sci USA 2000; 97(23):12541-12546.
63. Leulliot N, Quevillon-Cheruel S, Graille M et al. A new alpha-helical extension promotes RNA binding by the dsRBD of Rnt1p RNAse III. EMBO J 2004; 23(13):2468-2477.
64. Stefl R, Allain FHT. A novel RNA pentaloop fold involved in targeting ADAR2. RNA 2005; in press.
65. Jaikaran DC, Collins CH, MacMillan AM. Adenosine to inosine editing by ADAR2 requires formation of a ternary complex on the GluR-B R/G site. J Biol Chem 2002; 277(40):37624-37629.
66. Cho DS, Yang W, Lee JT et al. Requirement of dimerization for RNA editing activity of adenosine deaminases acting on RNA. J Biol Chem 2003; 278(19):17093-17102.
67. Gallo A, Keegan LP, Ring GM et al. An ADAR that edits transcripts encoding ion channel subunits functions as a dimer. EMBO J 2003; 22(13):3421-3430.
68. Chilibeck KA, Wu T, Liang C et al. FRET analysis of in vivo dimerization by RNA-editing enzymes. J Biol Chem 2006; 281(24):16530-16535.
69. Poulsen H, Jorgensen R, Heding A et al. Dimerization of ADAR2 is mediated by the double-stranded RNA binding domain. RNA 2006; 12(7):1350-1360.
70. Valente L, Nishikura K. RNA binding-independent dimerization of adenosine deaminases acting on RNA and dominant negative effects of nonfunctional subunits on dimer functions. J Biol Chem 2007; 282(22):16054-16061.
71. Macbeth MR, Lingam AT, Bass BL. Evidence for auto-inhibition by the N terminus of hADAR2 and activation by dsRNA binding. RNA 2004; 10(10):1563-1571.
72. Yi-Brunozzi HY, Stephens OM, Beal PA. Conformational changes that occur during an RNA-editing adenosine deamination reaction. J Biol Chem 2001; 276(41):37827-37833.
73. Haudenschild BL, Maydanovych O, Veliz EA et al. A transition state analogue for an RNA-editing reaction. J Am Chem Soc 2004; 126(36):11213-11219.
74. Chen SH, Habib G, Yang CY et al. Apolipoprotein B-48 is the product of a messenger RNA with an organ-specific in-frame stop codon. Science 1987; 238(4825):363-366.

75. Powell LM, Wallis SC, Pease RJ et al. A novel form of tissue-specific RNA processing produces apolipoprotein-B48 in intestine. Cell 1987; 50(6):831-840.
76. Lau PP, Xiong WJ, Zhu HJ et al. Apolipoprotein B mRNA editing is an intranuclear event that occurs posttranscriptionally coincident with splicing and polyadenylation. J Biol Chem 1991; 266(30):20550-20554.
77. Chester A, Scott J, Anant S et al. RNA editing: cytidine to uridine conversion in apolipoprotein B mRNA. Biochim Biophys Acta 2000; 1494(1-2):1-13.
78. Lau PP, Chan L. Involvement of a chaperone regulator, Bcl2-associated athanogene-4, in apolipoprotein B mRNA editing. J Biol Chem 2003; 278(52):52988-52996.
79. Davies MS, Wallis SC, Driscoll DM et al. Sequence requirements for apolipoprotein B RNA editing in transfected rat hepatoma cells. J Biol Chem 1989; 264(23):13395-13398.
80. Chen SH, Li XX, Liao WS et al. RNA editing of apolipoprotein B mRNA. Sequence specificity determined by in vitro coupled transcription editing. J Biol Chem 1990; 265(12):6811-6816.
81. Shah RR, Knott TJ, Legros JE et al. Sequence requirements for the editing of apolipoprotein B mRNA. J Biol Chem 1991; 266(25):16301-16304.
82. Driscoll DM, Wynne JK, Wallis SC et al. An in vitro system for the editing of apolipoprotein B mRNA. Cell 1989; 58(3):519-525.
83. Driscoll DM, Lakhe-Reddy S, Oleksa LM et al. Induction of RNA editing at heterologous sites by sequences in apolipoprotein B mRNA. Mol Cell Biol 1993; 13(12):7288-7294.
84. Backus JW, Smith HC. Three distinct RNA sequence elements are required for efficient apolipoprotein B (apoB) RNA editing in vitro. Nucleic Acids Res 1992; 20(22):6007-6014.
85. Maris C, Masse J, Chester A et al. NMR structure of the apoB mRNA stem-loop and its interaction with the C to U editing APOBEC1 complementary factor. RNA 2005; 11(2):173-186.
86. Anant S, Davidson NO. An AU-rich sequence element (UUUN(A/U)U) downstream of the edited C in apolipoprotein B mRNA is a high-affinity binding site for apobec-1: binding of apobec-1 to this motif in the 3ʹ untranslated region of c-myc increases mRNA stability. Mol Cell Biol 2000; 20(6):1982-1992.
87. Blanc V, Kennedy S, Davidson NO. A novel nuclear localization signal in the auxiliary domain of apobec-1 complementation factor regulates nucleocytoplasmic import and shuttling. J Biol Chem 2003; 278(42):41198-41204.
88. Dance GS, Sowden MP, Cartegni L et al. Two proteins essential for apolipoprotein B mRNA editing are expressed from a single gene through alternative splicing. J Biol Chem 2002; 277(15):12703-12709.
89. Sowden MP, Lehmann DM, Lin X et al. Identification of novel alternative splice variants of APOBEC-1 complementation factor with different capacities to support apolipoprotein B mRNA editing. J Biol Chem 2004; 279(1):197-206.
90. Blanc V, Henderson JO, Kennedy S et al. Mutagenesis of apobec-1 complementation factor reveals distinct domains that modulate RNA binding, protein-protein interaction with apobec-1 and complementation of C to U RNA-editing activity. J Biol Chem 2001; 276(49):46386-46393.
91. Mehta A, Driscoll DM. Identification of domains in apobec-1 complementation factor required for RNA binding and apolipoprotein-B mRNA editing. RNA 2002; 8(1):69-82.
92. Kiledjian M, Dreyfuss G. Primary structure and binding activity of the hnRNP U protein: binding RNA through RGG box. EMBO J 1992; 11(7):2655-2664.
93. Zanotti KJ, Lackey PE, Evans GL et al. Thermodynamics of the fragile X mental retardation protein RGG box interactions with G quartet forming RNA. Biochemistry 2006; 45(27):8319-8330.
94. Maris C, Dominguez C, Allain FH. The RNA recognition motif, a plastic RNA-binding platform to regulate posttranscriptional gene expression. FEBS J 2005; 272(9):2118-2131.
95. Clery A, Blatter M, Allain FH. RNA recognition motifs: boring? Not quite. Curr Opin Struct Biol 2008; 18(3):290-298.
96. Bennett-Lovsey RM, Herbert AD, Sternberg MJ et al. Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre. Proteins 2008; 70(3):611-625.
97. Blanc V, Henderson JO, Kennedy S et al. Mutagenesis of apobec-1 complementation factor (ACF) reveals distinct domains that modulate RNA binding, protein-protein interaction with apobec-1 and complementation of C to U RNA editing activity. J Biol Chem 2001; 24:24.
98. Perez-Canadillas JM. Grabbing the message: structural basis of mRNA 3ʹUTR recognition by Hrp1. EMBO J 2006; 25(13):3167-3178.
99. Blanc V, Navaratnam N, Henderson JO et al. Identification of GRY-RBP as an apolipoprotein B RNA-binding protein that interacts with both apobec-1 and apobec-1 complementation factor to modulate C to U editing. J Biol Chem 2001; 276(13):10272-10283.
100. Lau PP, Chang BH, Chan L. Two-hybrid cloning identifies an RNA-binding protein, GRY-RBP, as a component of apobec-1 editosome. Biochem Biophys Res Commun 2001; 282(4):977-983.

101. Anant S, Henderson JO, Mukhopadhyay D et al. Novel role for RNA-binding protein CUGBP2 in mammalian RNA editing. CUGBP2 modulates C to U editing of apolipoprotein B mRNA by interacting with apobec-1 and ACF, the apobec-1 complementation factor. J Biol Chem 2001; 276(50):47338-47351.
102. Lau PP, Zhu HJ, Nakamuta M et al. Cloning of an Apobec-1-binding protein that also interacts with apolipoprotein B mRNA and evidence for its involvement in RNA editing. J Biol Chem 1997; 272(3):1452-1455.
103. Greeve J, Lellek H, Rautenberg P et al. Inhibition of the apolipoprotein B mRNA editing enzyme-complex by hnRNP C1 protein and 40S hnRNP complexes. Biol Chem 1998; 379(8-9):1063-1073.
104. Lellek H, Kirsten R, Diehl I et al. Purification and molecular cloning of a novel essential component of the apolipoprotein B mRNA editing enzyme-complex. J Biol Chem 2000; 275(26):19848-19856.
105. Yang Y, Sowden MP, Smith HC. Induction of cytidine to uridine editing on cytoplasmic apolipoprotein B mRNA by overexpressing APOBEC-1. J Biol Chem 2000; 275(30):22663-22669.
106. Polier S, Dragovic Z, Hartl FU et al. Structural basis for the cooperation of hsp70 and hsp110 chaperones in protein folding. Cell 2008; 133(6):1068-1079.
107. Lau PP, Villanueva H, Kobayashi K et al. A DnaJ protein, apobec-1-binding protein-2, modulates apolipoprotein B mRNA editing. J Biol Chem 2001; 276(49):46445-46452.
108. Chester A, Somasekaram A, Tzimina M et al. The apolipoprotein B mRNA editing complex performs a multifunctional cycle and suppresses nonsense-mediated decay. EMBO J 2003; 22(15):3971-3982.
109. Prochnow C, Bransteitter R, Klein MG et al. The APOBEC-2 crystal structure and functional implications for the deaminase AID. Nature 2007; 445(7126):447-451.
110. Chen KM, Harjes E, Gross PJ et al. Structure of the DNA deaminase domain of the HIV-1 restriction factor APOBEC3G. Nature 2008; 452(7183):116-119.
111. Holden LG, Prochnow C, Chang YP et al. Crystal structure of the anti-viral APOBEC3G catalytic domain and functional implications. Nature 2008; 456(7218):121-124.
112. Conticello SG. The AID/APOBEC family of nucleic acid mutators. Genome Biol 2008; 9(6):229.
113. Navaratnam N, Fujino T, Bayliss J et al. Escherichia coli cytidine deaminase provides a molecular model for ApoB RNA editing and a mechanism for RNA substrate recognition. J Mol Biol 1998; 275(4):695-714.
114. Teng BB, Ochsner S, Zhang Q et al. Mutational analysis of apolipoprotein B mRNA editing enzyme (APOBEC1). Structure-function relationships of RNA editing and dimerization. J Lipid Res 1999; 40(4):623-635.
115. Carter CW Jr. The nucleoside deaminases for cytidine and adenosine: structure, transition state stabilization, mechanism and evolution. Biochimie 1995; 77(1-2):92-98.
116. Kay BK, Williamson MP, Sudol M. The importance of being proline: the interaction of proline-rich motifs in signaling proteins with their cognate domains. FASEB J 2000; 14(2):231-241.
117. Scott J, Navaratnam N, Carter C. Molecular modelling of the biosynthesis of the RNA-editing enzyme APOBEC-1, responsible for generating the alternative forms of apolipoprotein B. Exp Physiol 1999; 84(4):791-800.
118. Xie K, Sowden MP, Dance GS et al. The structure of a yeast RNA-editing deaminase provides insight into the fold and function of activation-induced deaminase and APOBEC-1. Proc Natl Acad Sci USA 2004; 101(21):8114-8119.
119. Lomeli H, Mosbacher J, Melcher T et al. Control of kinetic properties of AMPA receptor channels by nuclear RNA editing. Science 1994; 266(5191):1709-1713.
120. Ding J, Hayashi MK, Zhang Y et al. Crystal structure of the two-RRM domain of hnRNP A1 (UP1) complexed with single-stranded telomeric DNA. Genes Dev 1999; 13(9):1102-1115.
121. Lau PP, Zhu HJ, Nakamuta M et al. Cloning of an Apobec-1-binding protein that also interacts with apolipoprotein B mRNA and evidence for its involvement in RNA editing. J Biol Chem 1997; 272(3):1452-1455.
122. Schaal TD, Maniatis T. Selection and characterization of pre-mRNA splicing enhancers: identification of novel SR protein-specific enhancer sequences. Mol Cell Biol 1999; 19(3):1705-1719.

Chapter 18

Biological Roles of ADARs

Bret S.E. Heale and Mary A. O’Connell*

Abstract

RNA editing is widespread throughout the human transcriptome. The major editing event is the deamination of adenosine to inosine. The enzymes responsible are ADARs, which deaminate specific adenosines in double-stranded RNA. When editing occurs within a coding region, it can result in a different amino acid being inserted at the edited position, since inosine is read as guanosine by the translation machinery. Classically, alteration of neurotransmitter receptors has been demonstrated to be the biological role of editing. Recently, additional roles for editing enzymes have been proposed in relation to Alu repeats, siRNA and miRNA function, cancer and the innate immune response. Overall, ADAR activity is an important contributor to many biological pathways.

Introduction

RNA editing by ADARs (adenosine deaminases acting on RNA) was first discovered in Xenopus laevis.1,2 Originally the enzyme was thought to have helicase activity, as it altered the mobility of double-stranded (ds) RNA when electrophoresed on a native polyacrylamide gel. Further investigation revealed it to be an editing or modifying enzyme that converts adenosines to inosines within dsRNA. The observed change of mobility in the gel arose because inosine does not base-pair with uracil, so that the RNA became increasingly single-stranded.3 This enzymatic activity is widespread, present not only in different mammalian cell lines but also in all Metazoa.4 The ADAR enzymes convert adenosine to inosine in dsRNA via hydrolytic deamination and do not require energy or any cofactor in vitro.5 Inosine base-pairs with cytosine and is read as guanosine by the translational machinery,6 so if editing occurs within exons it can result in a different amino acid being incorporated at the edited position. Most of these recoding editing events occur within transcripts that are expressed in the central nervous system (CNS); however, it is not understood why ADARs specifically target CNS transcripts. One possibility is that the CNS requires protein diversity for proper functioning. Alternatively, expression of protein variants through RNA editing may be better tolerated in the CNS because the blood-brain barrier prevents the entry of polyclonal antibodies that would otherwise recognize the modified protein as foreign. One of the main reasons why ADARs have been studied so intensively is the profound effects that editing can have on the properties of the encoded protein; these are discussed below. RNA editing can be found at any location within a pre-mRNA.
In mammals and Drosophila most editing has been observed in exons that base-pair with neighboring introns to form a duplex; the complementary sequence in the intron is termed the editing site complementary sequence (ECS).7 Sometimes the duplex can be entirely contained within an exon, as in the case of the Gabra-3 and Kv1.1 transcripts, where editing can occur after splicing.8,9 Some C. elegans mRNAs contain long hairpins within their UTRs and editing has been found in both the 5ʹ and 3ʹ UTRs.10 RNA editing can also affect splicing, in that it can generate a 3ʹ acceptor site, as in the case of the ADAR2 transcript,11 or edit a putative branch point sequence, as found in the PTPN6 transcript.12 Hepatitis delta virus (HDV) is edited at an amber stop codon to generate the longer isoform of the hepatitis delta antigen.13 ADARs can also edit noncoding RNAs and can affect the processing of miRNAs, as for pri-miR-142,14 or affect the ‘seed’ sequence, as occurs in miR-376,15 so that the miRNA is redirected to silence a different set of transcripts.

*Corresponding Author: Mary A. O’Connell, MRC Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK. Email: mary.o’[email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.

There are four ADAR proteins in mammals: ADAR1, ADAR2, ADAR3 and TENR (testis nuclear RNA binding protein). All ADAR proteins comprise dsRNA binding domains at the amino terminus and a catalytic deaminase domain at the carboxy terminus, with each protein having some unique features16-21 (Fig. 1). All have a deaminase domain, which defines them as ADAR proteins; however, the number of dsRNA binding domains can vary from one to three (for more information on deaminase domains see the chapter by Smith and the chapter by Wedekind and Beal in this book, and for dsRNA binding domains see the chapter by Maris and Allain in this book).

Figure 1. Protein Domains of ADARs. ADARs are characterized by containing double-stranded RNA binding domains (box with diagonal gray lines) and a deaminase domain (gray box). The deaminase domain is well conserved, even in tRNA modification enzymes such as ADAT1. ADAR3 contains a domain rich in arginine at the amino terminus (dark gray box). ADAR1 as well as vaccinia virus E3L have a Z-DNA binding domain at the amino terminus (box with diagonal black lines).

The ADAR1 gene can generate two isoforms, an interferon-inducible protein of 150 kDa and a nuclear isoform of 110 kDa.22 Both have three dsRNA binding domains; however, ADAR1p150 also has two Z-DNA binding domains at its amino terminus.23 The nuclear localization signal (NLS) overlaps the third dsRNA binding domain,24 whereas the nuclear export signal (NES) overlaps the first Z-DNA binding domain, Zα.25 ADAR1 is modified by SUMO-1 at lysine 418, which is located between the Z-DNA binding domains and the dsRNA binding domains.26 Sumoylation of ADAR1 occurs in the nucleolus and, although it is not required for localization, it reduces the editing activity in vitro.26 ADAR2 is a nuclear protein that has two dsRNA binding domains; it has a noncanonical NLS within the first 64 amino acids and does not contain an NES.27 It is also targeted to the nucleolus by a signal that overlaps the first dsRNA binding domain and by another amino acid sequence that has not yet been identified. A crystal structure of the ADAR2 deaminase domain has been obtained at 1.7 Å resolution28 (see the chapter by Wedekind and Beal in this book for structural details). As expected, it contains a zinc ion in its active site. A surprising revelation of this structure was that inositol hexakisphosphate (IP6) is buried within the enzyme core and is required for protein folding. Analysis of other ADAR and ADAT1 (adenosine deaminase that acts on tRNA) proteins revealed that the amino acids necessary for coordinating IP6 are conserved and that ADARs and ADAT1 require IP6 as a cofactor.
ADAR3 is a nuclear protein with two dsRNA binding domains.29 It is enzymatically inactive, although it is very similar in sequence to ADAR2, and has an arginine-rich region at the amino terminus capable of binding single-stranded RNA.30 TENR is expressed only in the testis, has one dsRNA binding domain and is enzymatically inactive as it lacks the cysteines in the deaminase domain that are thought to chelate the catalytic zinc ion.31,32 One question that is still controversial is whether ADARs act as monomers or must form dimers for enzymatic activity. All cytidine deaminases (CDA) including ADATs form either dimers or tetramers (see chapter by Wedekind and Beal in this book). Four groups have independently shown that ADAR2 or dADAR forms dimers.33-36 Two groups report that ADAR2 and dADAR require binding to RNA for dimerization35,36 whereas other groups have shown that dimerization is RNA independent.34,37 It has been reported that ADARs can only form homodimers34 whereas another group has published that heterodimers between ADAR1 and ADAR2 can be formed.37 To add to the controversy, one group has shown that ADAR2 acts as a monomer and that the two dsRNA binding domains can dimerize on short duplex RNA and inhibit RNA editing.38 They propose that in the presence of duplex RNA of sufficient length both dsRNA binding domains can bind to the RNA and editing occurs. Further experiments are required to clarify the issue of dimerization.

Classical Editing Substrates of ADARs: Mammalian GluR-B and Serotonin (5-HT2c) Transcripts

One of the first edited transcripts to be found in mammals encodes subunit B of the glutamate-gated ion channel receptor (GluR-B)39 (Fig. 2). Editing by ADAR2 was found at the Q/R site, which lies in the channel pore. The consequence of editing is that the glutamine codon is converted into that of arginine, and this occurs at a frequency of 100% except in the white matter in human brain where it varies from 81-93%.40 Arginine at this position has a dramatic effect on glutamate receptor function, as it controls the permeability of the ion channel, making it impermeable to calcium ions. Therefore any receptor that does not contain the GluR-B subunit or contains the unedited subunit is permeable to calcium ions. This position also controls the rate at which the GluR-B subunits form tetramers and traffic through the ER to the synapse.41 The edited form is retained in the ER whereas the unedited isoform forms tetramers more rapidly and traffics to the synapses. Thus this one edited position plays a vital role in regulating the properties of the glutamate receptor. Loss of editing at this position by deletion of the ECS in the intron leads to seizures, and the mice die by 3 weeks of age.42 In GluR-B, -C and -D subunits editing also occurs at


Figure 2. Alu repeats potentially form hairpins. Comparison of the structures of the GluR-B Q/R editing site, an Alu monomer (folded with Michael Zuker’s mfold) and an inverted Alu repeat. An Alu monomer can form a hairpin of approximately 22 nucleotides that is disrupted by loops and is therefore not an ideal editing substrate for ADARs. However, if two Alu elements are inserted in opposite orientation less than 2 kb apart, they can base pair and be edited by ADARs.

another position termed the R/G site, where an arginine codon is converted into one for glycine.43 Editing at this position in the GluR-B transcript is not 100% and increases with brain development. The consequence of editing at this R/G position is that the channel has a faster recovery rate from desensitization.44 Editing also occurs at the Q/R site in the pre-mRNA encoding the kainate subunits GluR5 and GluR645 as well as in the first transmembrane domain of GluR6 at positions I/V and Y/C.46 Transcripts encoding the G-protein coupled serotonin 5-HT2C receptor are edited by both ADAR1 and ADAR2 at five closely spaced positions in the second intracellular loop so that potentially 24 different protein isoforms can be generated.47 The result of editing this transcript is to modify the ligand affinity and the efficacy of G protein coupling. The unedited isoform has the highest constitutive activity and the most efficient G-protein coupling while the fully edited isoform displays the least, with the partially edited isoforms displaying an intermediate effect. As the serotonin receptor is important for the regulation of mood, much effort has been invested in determining whether editing levels vary significantly in patients with mood disorders such as depression and schizophrenia and in suicide victims (for review see refs. 48,49). Unfortunately no clear-cut answer has emerged, owing to the variation in editing levels in the control groups and because many patients in these studies have a history of taking medication that can influence editing. For this reason research has now focused on animal models, but again natural variations such as differences in mouse strains and behavioral tests can influence the results so that no definite conclusion can be drawn.
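The figure of 24 protein isoforms from five editing sites can be checked by enumeration, since inosine is read as guanosine during translation. The Python sketch below assumes, from the general literature rather than from this chapter, that the five edited adenosines lie in three codons (AUA, AAU and AUU, encoding Ile, Asn and Ile). The codon sequences, the edited positions and all function names are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, built from the conventional UCAG ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# ASSUMPTION: unedited codons and the 0-based positions of the edited
# adenosines within each codon (five sites in total).
EDITED_CODONS = [("AUA", (0, 2)), ("AAU", (0, 1)), ("AUU", (0,))]


def apply_edits(codon, positions, flags):
    """Replace A by G (inosine is read as G) at each flagged position."""
    bases = list(codon)
    for pos, edited in zip(positions, flags):
        if edited:
            bases[pos] = "G"
    return "".join(bases)


def enumerate_isoforms():
    """Return the sets of distinct mRNA and protein variants."""
    n_sites = sum(len(positions) for _, positions in EDITED_CODONS)
    mrnas, proteins = set(), set()
    for flags in product((False, True), repeat=n_sites):
        remaining = iter(flags)
        codons = [
            apply_edits(codon, positions, [next(remaining) for _ in positions])
            for codon, positions in EDITED_CODONS
        ]
        mrnas.add("".join(codons))
        proteins.add("".join(CODON_TABLE[c] for c in codons))
    return mrnas, proteins


mrnas, proteins = enumerate_isoforms()
print(len(mrnas), len(proteins))  # 32 24
```

Under these assumptions the 32 possible edited mRNAs collapse to 24 distinct tripeptides, because two of the edited versions of the first codon (GUA and GUG) both encode valine.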

ADAR Activity in Model Organisms: Mice, Flies, Worms

ADAR activity was first discovered in Xenopus laevis;1,2 however, as it is not a genetically tractable organism, efforts switched to other model organisms such as mice, Drosophila and C. elegans, each of which exhibits differences in the effect of editing on its biology. As mentioned previously, in mammals there are four members of the ADAR family: ADAR1, ADAR2, ADAR3 and TENR, and no activity has been demonstrated for either TENR or ADAR3. TENR is expressed solely in the germ cells of the testis and plays a role in spermatid morphogenesis32


while ADAR3 is expressed only in the brain, with highest levels in the olfactory bulb and thalamus.29,30 The TENR protein lacks some of the conserved amino acids required to chelate zinc at the active site in the deaminase domain; however, it is not clear why ADAR3 is enzymatically inactive. Both proteins are phylogenetically conserved and, as they contain dsRNA binding domains, it is possible that they influence RNA editing either by sequestering transcripts that are normally edited or by forming heterodimers with the active ADARs and modulating their activity. The biological role of ADAR1 remains elusive despite the endeavors of many groups studying it. As most of the edited transcripts that result in recoding events are expressed in the CNS, it was anticipated that an Adar1−/− mouse would display a phenotype with defects in the nervous system. However, the Adar1−/− mice die at day E12.5 with severe defects in hematopoiesis and display stress-induced apoptosis in embryonic fibroblasts cultured from the embryos.50,51 This phenotype has not been attributed to lack of editing of any particular transcript or noncoding RNA. In cultured neurons from the Adar1−/− mice, no editing was observed at the A and B sites in the serotonin 5-HT2c transcript, which are normally edited to 80 and 90% respectively. Intriguingly, exon 5, which contains the edited sites, was spliced out in the ADAR1-deficient mice, suggesting a close link between RNA editing and splicing.50 The phenotype of the Adar2 null mice is better understood and can be attributed to lack of RNA editing at the Q/R site in the GluR-B transcript.52 The Adar2−/− mice become progressively prone to seizures and die by day P20. This phenotype was rescued by generating transgenic mice Adar2−/−GluR-BR/R in which the edited version of GluR-B(R) replaces the unedited version GluR-B(Q). This elegant experiment indicates that the critical site edited by ADAR2 is the GluR-B Q/R site.
Transcripts encoding the kainate receptor subunits GluR5 and GluR6 are also edited at a Q/R site in the second transmembrane domain. Editing at these positions increases during development in rat brain.45 Transgenic mice were generated that encoded either an arginine or a glutamine at the edited position in GluR5 to determine the consequence of editing on receptor function.53 Surprisingly, editing at the Q/R site in GluR5 was not important for viability, development of the brain, spatial learning or nociceptive transmission in the mice. Transgenic mice were also generated that were unable to edit the Q/R site in GluR6 as they lacked the ECS.54 The unedited receptor mediated synaptic plasticity and the mice were more vulnerable to kainate-induced seizures. Transgenic mice over-expressing ADAR2 displayed mature-onset obesity in both male and female animals.55 Unexpectedly, catalytically inactive ADAR2 gave the same phenotype, suggesting that this phenotype is independent of editing activity. One hypothesis to explain this result is that the inactive protein can still bind dsRNA and can compete with ADAR1 or other dsRNA binding proteins for binding sites. Insects take advantage of RNA editing by ADARs to generate protein diversity.56 There is one Adar gene in Drosophila, and deletion of this gene results in loss of locomotion, infertility and age-related neurodegeneration.57,58 This deletion is not lethal; however, good husbandry is essential to maintain the viability of the mutant flies. The Adar gene is located within an ecdysone puff and transcription of the gene is up-regulated at metamorphosis.57 The Adar transcript is edited at one position near the active site with a conversion of serine to glycine (S/G). The consequence of editing is that a protein is generated that has less enzymatic activity than the genomically encoded protein.
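Because inosine is read as guanosine by the ribosome, each recoding event named in this chapter (Q/R, R/G, I/V, Y/C and S/G) should correspond to a single A-to-G change within a codon. The following Python sketch uses the standard genetic code and a hypothetical helper, `a_to_g_recodings`, to list the codon pairs compatible with each named site. It is only a consistency check and does not claim which codon is actually used in any given transcript.

```python
# Standard genetic code, built from the conventional UCAG ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def a_to_g_recodings(aa_from, aa_to):
    """List (unedited, edited) codon pairs that differ by one A->G change
    and recode amino acid aa_from into aa_to (one-letter codes)."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != aa_from:
            continue
        for pos, base in enumerate(codon):
            if base != "A":
                continue
            edited = codon[:pos] + "G" + codon[pos + 1:]
            if CODON_TABLE[edited] == aa_to:
                hits.append((codon, edited))
    return hits


# Sites named in the text, written as unedited/edited amino acid.
for site in ("Q/R", "R/G", "I/V", "Y/C", "S/G"):
    unedited_aa, edited_aa = site.split("/")
    print(site, a_to_g_recodings(unedited_aa, edited_aa))
```

Every named site has at least one compatible codon pair. By contrast, `a_to_g_recodings("E", "R")` returns an empty list: no single A-to-G change turns a glutamate codon into an arginine codon, which is consistent with Q in Q/R denoting glutamine.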
The ADAR(S) isoform but not the ADAR(G) isoform is lethal when expressed as a transgene with the UAS-GAL4 binary system under the control of a strong driver such as actin 5C-GAL4.59 Therefore, even though Adar is not essential in Drosophila, if it is expressed the transcript has to be regulated by editing. As in mammals, editing events in Drosophila lead to recoding of transcripts expressed in the CNS. To date over 57 transcripts that undergo editing have been reported, with an average of 4 sites/transcript. Many of these editing sites, found by various methods, have been verified by sequencing.60-62 One common feature in Drosophila is that editing levels rise through development; however, some sites are constitutively edited to 100%.60,63 Using the strong Mef2-GAL4 driver to express the more active ADAR isoforms in muscles and heart in embryos and larvae results in


lethality, and analysis of cDNA isolated from dying larvae indicates that some transcripts are edited inappropriately early.59 Therefore it has been proposed that Drosophila uses RNA editing temporally to help generate the adult nervous system and that lethality results when editing occurs too early in the embryo and larvae. The use of gene sparing strategies such as multiple promoters and alternative splicing is a common theme in Drosophila. In C. elegans there are two ADAR genes, adr-1 and adr-2; the names do not reflect a relationship with the mammalian genes. adr-1 is required for editing at some sites whereas deletion of adr-2 completely abolishes RNA editing.64 The adr-2 gene encodes one dsRNA binding domain and has a canonical deaminase motif that coordinates zinc in the active site. This sequence is highly divergent in adr-1, and one possibility is that ADR-1 is catalytically inactive but must form heterodimers with ADR-2 for editing activity. In addition, adr-1 but not adr-2 plays a role in vulva development whereas both genes are necessary for normal chemotaxis.64 Editing in C. elegans has only been found in noncoding regions such as 5ʹ and 3ʹ UTRs65 and as yet no recoding editing event has been observed. Therefore the question arises whether there is some evolutionary pressure on C. elegans to avoid editing of codons. These adr genes are also involved in the RNA interference (RNAi) pathway.66,67 Transgene-induced silencing in somatic tissues is abolished in the adr double mutant; however, the adr genes do not antagonize the pathway if dsRNA is injected.

Disorders Associated with Lack of RNA Editing

Lack of RNA editing has been implicated in many disorders; however, the number of disorders in which the association has been firmly proven is small. Point mutations have been found in ADAR1 in Chinese and Japanese patients with dyschromatosis symmetrica hereditaria (DSH). This is a rare autosomal dominant inherited dermatosis characterized by a mixture of hyperpigmented and hypopigmented macules on the back of hands and feet.68-70 Seventy mutations have been identified in the ADAR1 gene, with 30 missense mutations in the deaminase domain, which is thought to be a mutational hot spot. No mutation has been found in the human ADAR2 gene; this is expected since Adar2-deficient mice suffer seizures and die within three weeks of birth.52 To date there is no evidence that ADAR2 is associated with seizures in humans, although not many human seizure genes have been identified yet. A decrease in editing at the Q/R site in GLUR-B has been found in motor neurons in 5 Japanese patients suffering from sporadic ALS.71 A significant decrease was observed in editing of the GLUR-B Q/R site in individual motor neurons in sporadic ALS patients compared to controls. There was no detectable change in the level of the mature GLUR-B transcript in the affected motor neurons, and editing levels were 99% in the cerebellar Purkinje cells of these ALS patients.71 This supports the hypothesis that the reduction in editing of the Q/R site in GLUR-B in motor neurons contributes to the selective motor neuron death that is observed in ALS patients. No decrease in editing was observed in patients with familial ALS such as spinal and bulbar muscular atrophy (SBMA) or in rats transgenic for mutant human Cu/Zn-superoxide dismutase (SOD1).72 AMPA receptors in motor neurons contain less GLUR-B subunit relative to AMPA receptors in other neuron types; therefore motor neurons would be more sensitive to the loss of editing at the Q/R site.
Transient forebrain ischemia in adult rats resulted in a reduction of editing at the Q/R site in GLUR-B transcripts isolated from single CA1 pyramidal neurons in the hippocampus.73 This could be directly attributed to reduction in RNA editing, as silencing of ADAR2 caused degeneration of these neurons whereas CREB-induced expression of ADAR2 protected vulnerable neurons in the rat hippocampus from forebrain ischemic insult.73 This result could have significant clinical implications, as these calcium-permeable AMPA receptors containing GLUR-B could be a good target for drugs to combat the effect of stroke.

RNA Editing of Alu Repeats

Isolation, total nuclease digestion and 2D chromatography of mRNA from rat brain led to the estimation that 1 in 17,000 nucleotides is inosine.74 Despite an intense search to identify these


edited transcripts, they remained elusive. Subsequently it was discovered in silico that Alu repeats in humans are highly edited. Four groups performed bioinformatic searches to find transcripts that are edited by looking for discrepancies between the genomic and corresponding cDNA sequences.75-78 A hallmark of editing by ADARs is that A in the genomic sequence is G in the cDNA. The largest study found 30,085 A to G discrepancies in 2674 transcripts with the other groups finding similar results.76 Alu repeats belong to the SINE family (short interspersed nuclear elements) and arose during the separation of primates from other mammals. Even today, they are estimated to transpose at a rate of one insertion per three thousand births. The highest level of editing is observed when two Alu elements are in close proximity, [...]

[...] 55%) synthesize queuosine or GluQ de novo such as B. subtilis and E. coli, respectively. It was known from sequencing of mature tRNA that Mycoplasma capricolum106 did not contain queuosine and indeed no genes encoding queuosine biosynthetic enzymes can be identified in its genome, including tgt, the signature gene of the queuosine pathway. The absence of tgt can be generalized to all Mollicutes (most of these organisms are intracellular pathogens that underwent drastic genome reductions). The absence of queuosine seems to be more widespread, as many Actinomycetes, such as Mycobacterium tuberculosis, and a few Lactobacilli also lack tgt genes, indicating that free-living bacteria can survive without queuosine, as suggested by the viability of the E. coli Δtgt strain. However, there must exist strong selective pressure to keep the modification since the number of sequenced bacteria that have lost the pathway is low ( [...]

[...] (S) of tRNA confers ribosome binding. RNA 1999; 5:188-94.
27. Sylvers LA, Rogers KC, Shimizu M et al. A 2-thiouridine derivative in tRNAGlu is a positive determinant for aminoacylation by Escherichia coli glutamyl-tRNA synthetase. Biochemistry 1993; 32:3836-41.
28. Yasukawa T, Suzuki T, Ishii N et al. Wobble modification defect in tRNA disturbs codon-anticodon interaction in a mitochondrial disease. EMBO J 2001; 20:4794-802.
29. Yasukawa T, Suzuki T, Ishii N et al. Defect in modification at the anticodon wobble nucleotide of mitochondrial tRNA(Lys) with the MERRF encephalomyopathy pathogenic mutation. FEBS Lett 2000; 467:175-8.
30. Kaneko T, Suzuki T, Kapushoc ST et al. Wobble modification differences and subcellular localization of tRNAs in Leishmania tarentolae: implication for tRNA sorting mechanism. EMBO J 2003; 22:657-67.
31. Nakai Y, Umeda N, Suzuki T et al. Yeast Nfs1p is involved in thio-modification of both mitochondrial and cytoplasmic tRNAs. J Biol Chem 2004; 279:12363-8.
32. Frazzon J, Dean DR. Formation of iron-sulfur clusters in bacteria: an emerging field in bioinorganic chemistry. Curr Opin Chem Biol 2003; 7:166-73.
33. Lill R, Muhlenhoff U. Maturation of iron-sulfur proteins in eukaryotes: mechanisms, connected processes and diseases. Annu Rev Biochem 2008; 77:669-700.
34. Kambampati R, Lauhon CT. MnmA and IscS are required for in vitro 2-thiouridine biosynthesis in Escherichia coli. Biochemistry 2003; 42:1109-17.
35. Numata T, Fukai S, Ikeuchi Y et al. Structural basis for sulfur relay to RNA mediated by heterohexameric TusBCD complex. Structure 2006; 14:357-66.
36. Numata T, Ikeuchi Y, Fukai S et al. Snapshots of tRNA sulphuration via an adenylated intermediate. Nature 2006; 442:419-24.
37. Hagervall TG, Pomerantz SC, McCloskey JA. Reduced misreading of asparagine codons by Escherichia coli tRNALys with hypomodified derivatives of 5-methylaminomethyl-2-thiouridine in the wobble position. J Mol Biol 1998; 284:33-42.
38. Noma A, Sakaguchi Y, Suzuki T. Mechanistic characterization of the sulfur-relay system for eukaryotic 2-thiouridine biogenesis at tRNA wobble positions. Nucleic Acids Res 2009; in press.
39. Bjork GR, Huang B, Persson OP et al. A conserved modified wobble nucleoside (mcm5s2U) in lysyl-tRNA is required for viability in yeast. RNA 2007; 13:1245-55.
40. Huang B, Lu J, Bystrom AS. A genome-wide screen identifies genes required for formation of the wobble nucleoside 5-methoxycarbonylmethyl-2-thiouridine in Saccharomyces cerevisiae. RNA 2008; 14:2183-94.


41. Schlieker CD, Van der Veen AG, Damon JR et al. A functional proteomics approach links the ubiquitin-related modifier Urm1 to a tRNA modification pathway. Proc Natl Acad Sci USA 2008; 105:18255-60.
42. Bordo D, Bork P. The rhodanese/Cdc25 phosphatase superfamily. Sequence-structure-function relations. EMBO Rep 2002; 3:741-6.
43. Furukawa K, Mizushima N, Noda T et al. A protein conjugation system in yeast with homology to biosynthetic enzyme reaction of prokaryotes. J Biol Chem 2000; 275:7462-5.
44. Goehring AS, Rivers DM, Sprague GF. Urmylation: a ubiquitin-like pathway that functions during invasive growth and budding in yeast. Mol Biol Cell 2003; 14:4329-41.
45. Rubio-Texeira M. Urmylation controls Nil1p and Gln3p-dependent expression of nitrogen-catabolite repressed genes in Saccharomyces cerevisiae. FEBS Lett 2007; 581:541-50.
46. Goehring AS, Rivers DM, Sprague GF. Attachment of the ubiquitin-related protein Urm1p to the antioxidant protein Ahp1p. Eukaryot Cell 2003; 2:930-6.
47. Jeong JS, Kwon SJ, Kang SW et al. Purification and characterization of a second type thioredoxin peroxidase (type II TPx) from Saccharomyces cerevisiae. Biochemistry 1999; 38:776-83.
48. Park SG, Cha MK, Jeong W et al. Distinct physiological functions of thiol peroxidase isoenzymes in Saccharomyces cerevisiae. J Biol Chem 2000; 275:5723-32.
49. Begley U, Dyavaiah M, Patil A et al. Trm9-catalyzed tRNA modifications link translation to the DNA damage response. Mol Cell 2007; 28:860-70.
50. Kispal G, Csere P, Prohl C et al. The mitochondrial proteins Atm1p and Nfs1p are essential for biogenesis of cytosolic Fe/S proteins. EMBO J 1999; 18:3981-9.
51. Nakai Y, Yoshihara Y, Hayashi H et al. cDNA cloning and characterization of mouse nifS-like protein, m-Nfs1: mitochondrial localization of eukaryotic NifS-like proteins. FEBS Lett 1998; 433:143-8.
52. Li J, Kogan M, Knight SA et al. Yeast mitochondrial protein, Nfs1p, coordinately regulates iron-sulfur cluster proteins, cellular iron uptake and iron distribution. J Biol Chem 1999; 274:33025-34.
53. Huh WK, Falvo JV, Gerke LC et al. Global analysis of protein localization in budding yeast. Nature 2003; 425:686-91.
54. Kumar A, Agarwal S, Heyman JA et al. Subcellular localization of the yeast proteome. Genes Dev 2002; 16:707-19.
55. Sickmann A, Reinders J, Wagner Y et al. The proteome of Saccharomyces cerevisiae mitochondria. Proc Natl Acad Sci USA 2003; 100:13207-12.
56. Kowalak JA, Dalluge JJ, McCloskey JA et al. The role of posttranscriptional modification in stabilization of transfer RNA from hyperthermophiles. Biochemistry 1994; 33:7869-76.
57. Watanabe K, Oshima T, Saneyoshi M et al. Replacement of ribothymidine by 5-methyl-2-thiouridine in sequence GT psi C in tRNA of an extreme thermophile. FEBS Lett 1974; 43:59-63.
58. Watanabe K, Shinma M, Oshima T et al. Heat-induced stability of tRNA from an extreme thermophile, Thermus thermophilus. Biochem Biophys Res Commun 1976; 72:1137-44.
59. Yokoyama S, Watanabe K, Miyazawa T. Dynamic structures and functions of transfer ribonucleic acids from extreme thermophiles. Adv Biophys 1987; 23:115-47.
60. Shigi N, Sakaguchi Y, Suzuki T et al. Identification of two tRNA thiolation genes required for cell growth at extremely high temperatures. J Biol Chem 2006; 281:14296-306.
61. Shigi N, Suzuki T, Terada T et al. Temperature-dependent biosynthesis of 2-thioribothymidine of Thermus thermophilus tRNA. J Biol Chem 2006; 281:2104-13.
62. Shigi N, Sakaguchi Y, Asai S et al. Common thiolation mechanism in the biosynthesis of tRNA thiouridine and sulphur-containing cofactors. EMBO J 2008; 27:3267-78.
63. Ikeuchi Y, Soma A, Ote T et al. Molecular mechanism of lysidine synthesis that determines tRNA identity and codon recognition. Mol Cell 2005; 19:235-46.
64. Dewez M, Bauer F, Dieu M et al. The conserved Wobble uridine tRNA thiolase Ctu1-Ctu2 is required to maintain genome integrity. Proc Natl Acad Sci USA 2008; 105:5459-64.
65. Schindelin H, Kisker C, Rajagopalan KV. Molybdopterin from molybdenum and tungsten enzymes. Adv Protein Chem 2001; 58:47-94.
66. Pitterle DM, Rajagopalan KV. The biosynthesis of molybdopterin in Escherichia coli. Purification and characterization of the converting factor. J Biol Chem 1993; 268:13499-505.
67. Taylor SV, Kelleher NL, Kinsland C et al. Thiamin biosynthesis in Escherichia coli. Identification of ThiS thiocarboxylate as the immediate sulfur donor in the thiazole formation. J Biol Chem 1998; 273:16555-60.
68. Hochstrasser M. Evolution and function of ubiquitin-like protein-conjugation systems. Nat Cell Biol 2000; 2:E153-7.
69. Favre A, Yaniv M, Michelson AM. The photochemistry of 4-thiouridine in Escherichia coli t-RNA Val1. Biochem Biophys Res Commun 1969; 37:266-71.

Biogenesis and Functions of Thio-Compounds in Transfer RNA


70. Carre DS, Thomas G, Favre A. Conformation and functioning of tRNAs: cross-linked tRNAs as substrate for tRNA nucleotidyl-transferase and aminoacyl synthetases. Biochimie 1974; 56:1089-101.
71. Ryals J, Hsu RY, Lipsett MN et al. Isolation of single-site Escherichia coli mutants deficient in thiamine and 4-thiouridine syntheses: identification of a nuvC mutant. J Bacteriol 1982; 151:899-904.
72. Kambampati R, Lauhon CT. Evidence for the transfer of sulfane sulfur from IscS to ThiI during the in vitro biosynthesis of 4-thiouridine in Escherichia coli tRNA. J Biol Chem 2000; 275:10727-30.
73. Mueller EG, Palenchar PM, Buck CJ et al. The role of the cysteine residues of ThiI in the generation of 4-thiouridine in tRNA: Evidence that ThiI, an enzyme shared between thiamin and 4-thiouridine biosynthesis, may be a sulfurtransferase that proceeds through a persulfide intermediate. J Biol Chem 2001; 276:33588-95.
74. Webb E, Claas K, Downs DM. Characterization of thiI, a new gene involved in thiazole biosynthesis in Salmonella typhimurium. J Bacteriol 1997; 179:4399-402.
75. Waterman DG, Ortiz-Lombardia M, Fogg MJ et al. Crystal structure of Bacillus anthracis ThiI, a tRNA-modifying enzyme containing the predicted RNA-binding THUMP domain. J Mol Biol 2006; 356:97-110.
76. Soma A, Ikeuchi Y, Kanemasa S et al. An RNA-modifying enzyme that governs both the codon and amino acid specificities of isoleucine tRNA. Mol Cell 2003; 12:689-98.
77. Nakanishi K, Fukai S, Ikeuchi Y et al. Structural basis for lysidine formation by ATP pyrophosphatase accompanied by a lysine-specific loop and a tRNA-recognition domain. Proc Natl Acad Sci USA 2005; 102:7487-92.
78. Nishimura S. Minor components in transfer RNA: their characterization, location and function. Prog Nucleic Acid Res Mol Biol 1972; 12:49-85.
79. Vacher J, Grosjean H, Houssier C et al. The effect of point mutations affecting Escherichia coli tryptophan tRNA on anticodon-anticodon interactions and on UGA suppression. J Mol Biol 1984; 177:329-42.
80. Moore JA, Poulter CD. Escherichia coli dimethylallyl diphosphate:tRNA dimethylallyltransferase: a binding mechanism for recombinant enzyme. Biochemistry 1997; 36:604-14.
81. Leung HC, Chen Y, Winkler ME. Regulation of substrate recognition by the MiaA tRNA prenyltransferase modification enzyme of Escherichia coli K-12. J Biol Chem 1997; 272:13073-83.
82. Esberg B, Leung HC, Tsui HC et al. Identification of the miaB gene, involved in methylthiolation of isopentenylated A37 derivatives in the tRNA of Salmonella typhimurium and Escherichia coli. J Bacteriol 1999; 181:7256-65.
83. Hernandez HL, Pierrel F, Elleingand E et al. MiaB, a bifunctional radical-S-adenosylmethionine enzyme involved in the thiolation and methylation of tRNA, contains two essential (4Fe-4S) clusters. Biochemistry 2007; 46:5140-7.
84. Eckstein F. Phosphorothioation of DNA in bacteria. Nat Chem Biol 2007; 3:689-90.
85. Wang L, Chen S, Xu T et al. Phosphorothioation of DNA in bacteria by dnd genes. Nat Chem Biol 2007; 3:709-10.
86. Liang J, Wang Z, He X et al. DNA modification by sulfur: analysis of the sequence recognition specificity surrounding the modification sites. Nucleic Acids Res 2007; 35:2944-54.

Chapter 28

Enzymatic Formation of 5-Aminomethyl-Uridine Derivatives in tRNA: Functional and Evolutionary Implications

Yoshitaka Bessho* and Shigeyuki Yokoyama

Abstract

Posttranscriptional modification of the wobble uridine at position 34 in the anticodon of tRNA allows accurate and efficient decoding of the genetic code. In particular, decoding of the synonymous two-codon sets specific for Leu, Gln, Lys, Glu and Arg primarily depends on the presence of a methylene carbon on the C-5 atom of U34 (xm5U), combined with thiolation at position 2 (xm5s2U) in the cases of Gln, Lys and Glu, or methylation of the 2ʹ-hydroxyl of ribose-34 (xm5Um) in the case of Leu. Together with other structural parameters of the anticodon arm, including the type of modification of the purine nucleotide at position 37, adjacent to the anticodon xm5UNN, these xm5U34-containing tRNAs are able to decode efficiently and accurately only the purine-ending codons in the correct reading frame (no frameshift). The various enzymes in Bacteria (MnmE, GidA and MnmC) involved in the formation of these wobble xm5U34 derivatives have been identified. In this chapter, we will summarize in structural terms what is known about these enzymes. Their relationships with other modification enzymes that also act on carbon-5 of uridine in other positions of tRNA (mainly position 54) and their evolutionary interrelationships will also be discussed.

Introduction: Properties of 5-Substituents of tRNA Wobble Uridines

Modifications of nucleotides in the anticodon loop are important for tRNA recognition by cognate aminoacyl-tRNA synthetases and for accurate mRNA decoding. In particular, to ensure that tRNA accurately decodes the two-codon sets ending with purines (NNA/NNG of the two degenerate codon boxes) of the bacterial genetic code (Fig. 1A), the wobble uridines at position 34 have to be modified into 5-methyluridine derivatives (xm5U), possibly combined with an additional U-modification, such as 2-thiolation (xm5s2U) or 2ʹ-O-methylation (xm5Um), with their synergy and redundancy effects.1 These three types of modifications of uridine 34 favor the formation of the C3ʹ-endo form of the sugar pucker (Fig. 1B).2-5 In this way, and in combination with the types of nucleotides at positions 32 and 35 and the types of modifications at position 37 within the anticodon loop, these uridine-34 modifications restrict and facilitate the codon recognition to NNA/NNG in the ribosomal A-site.1,6 The xm5U modification especially contributes to increasing

*Corresponding Author: Yoshitaka Bessho, RIKEN Systems and Structural Biology Center, Yokohama Institute, and SPring-8 Center, Harima Institute. 1-7-22 Suehiro-cho, Tsurumi, Yokohama 230-0045. Email: [email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.


the codon interaction for NNG, while 2-thiolation favors the interaction with NNA.7-10 Note that in the four-codon family boxes of the genetic code, unmodified U34 can recognize all four codons, which has been explained by the “4-way wobbling” or “two out of three” hypothesis.11,12 However, in most organisms, the wobble U34 of tRNAs of the four-codon family boxes is generally modified to a 5-hydroxyuridine derivative (xo5U), as in mo5U34 or cmo5U34 in E. coli, to increase the efficiency of the codon-anticodon interactions. In this case, the 5-substituent of xo5U lies in a coplanar conformation with the uracil base, which allows the sugar pucker of U34 to adopt the C2ʹ-endo form, hence favoring the recognition of NNU and NNG, in addition to the standard NNA recognition in the C3ʹ-endo form (for details see the chapter by Weixlbaumer and Murphy in this book).1,3 Considering only the physicochemical contributions of the modifications to the efficiency and accuracy of codon recognition in the ribosomal decoding site is not sufficient. Modifications could order the anticodon loop,13 reducing the entropic energy barrier to codon binding.14,15 Kinetic parameters, such as proofreading, certainly play an important role.16,17 Either way, the type of C-5 modification influences the fine arrangement of the U34 base in the ribosomal decoding site and controls the codon recognition patterns. Furthermore, hypermodification of the C-5 atom of U34 helps to prevent frameshifts during translation, providing more evidence that the modification helps to augment the codon-anticodon affinity.18,19 In bacteria and mitochondria, as well as some archaea,20 the x of xm5U34 can be an amino (nm5U), methylamino (mnm5U), or carboxymethylamino (cmnm5U) group and their enzymatic formations are described in this

Figure 1. tRNA-U34 wobble modification: A) Codon-recognition pattern in the four-codon (family box) and two-codon sets of the genetic code. aa: amino acid, anti: anticodon. B) Conformational rigidity and flexibility of modified U34.1 The modified U34 nucleotides in the left panel exist in the two-codon sets. The xo5U residue, in the center and right panels, retains conformational flexibility to recognize the codons of the four-codon sets. See details in the text.


chapter. In Eukarya, a 5-methoxycarbonylmethyl (mcm5U) group is generally found in U34 of the cytosolic tRNA for the two-codon sets. Although the enzymes catalyzing the eukaryotic modifications are still poorly characterized,21,22 the properties of the 5-substituents are proposed to be basically similar to those of Bacteria.23,24
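The codon-recognition patterns summarized above can be written down as a small lookup table. The following Python sketch is only an illustration of the text: the class labels ("U", "xo5U", "xm5U") and the function name are hypothetical, and the assignment of xm5U-type residues to the purine-ending (NNA/NNG) codons of the two-codon sets is taken from the summary in this chapter rather than from any measured decoding data.

```python
# Third-position codon bases read by each class of wobble U34, following the
# summary in the text. Class labels are illustrative, not a standard nomenclature API.
WOBBLE_READS = {
    "U": {"U", "C", "A", "G"},  # unmodified U34 in family boxes ("4-way wobbling")
    "xo5U": {"U", "A", "G"},    # e.g., mo5U or cmo5U: NNA plus NNU and NNG
    "xm5U": {"A", "G"},         # e.g., mnm5U or cmnm5U: two-codon sets (assumption from the text)
}


def codons_read(u34_class, codon_prefix):
    """Return the sorted codons (prefix + third base) read by the given U34 class."""
    return sorted(codon_prefix + base for base in WOBBLE_READS[u34_class])


if __name__ == "__main__":
    for u34_class in WOBBLE_READS:
        print(u34_class, codons_read(u34_class, "AA"))
```

The table deliberately ignores the kinetic and entropic effects mentioned above; it captures only the qualitative pattern of Figure 1A.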

Biosynthesis of 5-Aminomethyl-Uridine Derivatives

The xm5 modifications at U34 in tRNA result from a cascade of enzymatic reactions, involving several distinct enzymes (Fig. 2). First, MnmE and GidA are involved in the conversion of unmodified uridine to 5-carboxymethylaminomethyl-uridine (cmnm5U), using a tetrahydrofolate (THF) derivative and glycine as donor cofactors.25-27 In the mitochondrial tRNA of higher animals, taurine is incorporated instead of glycine, thus producing 5-taurinomethyluridine (τm5U34, also named tm5U34).28 This cmnm5U (or τm5U) modification is one of the rare examples in which an amino acid is incorporated into a nucleotide modification. Among the more than one hundred kinds of modifications in nucleic acids, only three examples of aminoacyl-modified bases have been found thus far, besides xm5U34. These are all hypermodifications in the anticodon loop of tRNA and they include k2C34 (lysine),29-31 GluQ34 (glutamate, see the chapter by Giegé and Lapointe in this book)32,33 and N6-carbamoyladenosine derivatives (t6A37: threonine, g6A37: glycine and hn6A37: 3-hydroxynorvaline).34-36 Similarly, methionine is often incorporated within tRNA, via the α-aminobutyric acid moiety of the S-adenosyl-L-methionine (AdoMet or SAM) cofactor (for example, yW37 and acp3U47, see the chapter by Urbonavičius et al).37,38 These findings suggest a relationship between amino acids and tRNA within the genetic codes in early life. The detailed reaction mechanisms of MnmE and GidA have not been characterized yet, since the reaction has not been reconstituted in vitro. It is also unclear how many steps precede the formation of cmnm5U34. At this point, it is only known that the modification at position 5 of the base occurs independently of the thiolation at position 2 and the 2ʹ-O-methylation.25,39 Despite the importance of the C-5 modification for the genetic code, the null mnmE and gidA mutations are not lethal (although they slow growth) in some E. coli strains. 
However, this conclusion depends on the genetic background and the tRNA content, because of synthetic lethalities.25-27,39,40 The tRNAs from mutants carrying mnmE or gidA mutations were shown to contain hypomodified s2U34 instead of fully modified mnm5s2U34 and therefore, no intermediates were detected.25,41,42 The methylene group adjacent to the C-5 atom of uracil arises from a THF derivative, since only one of the two carbon atoms present in the fully modified mnm5s2U34 originates from AdoMet and the first step in the synthesis of the mnm5 side chain is not an AdoMet-dependent methylation.25,43,44 However, the type of THF cofactor is still unclear. Two conflicting ideas have been proposed for the reaction mechanisms, which are based on the results of null mutagenesis of enzymes. The first is that the MnmE activity precedes that of GidA in the MnmE/GidA pathway.27,45 This idea comes from differences in the growth rates between E. coli mnmE and gidA null mutants. The frameshift frequency of tRNAArg, which has the mnm5UCU anticodon in the wild type, in the double (null gidA and mnmE) mutant is the same as that in the single mnmE mutant, but is significantly lower than the frequency in the single gidA mutant.27 For this reaction model, 5-formyl-THF in MnmE is proposed as the C1 (one-carbon) donor, followed by the incorporation of glycine and the reduction, by GidA with FAD, of a Schiff's base intermediate.45 The other proposal is that MnmE and GidA form a functional complex in which both proteins are interdependent, based on the conflicting observation that no differences were detected between the growth rates of mnmE and gidA mutants in minimal medium.46 This idea is also supported by an experiment where no intermediate with 5-substituents was observed in the gidA null mutant. However, one reason for the lack of detectable intermediates might be that such intermediates are toxic in the cells and are immediately eliminated. 
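The epistasis argument behind the first proposal (the double mutant behaves like the single mnmE mutant, not like the single gidA mutant) can be sketched as a simple comparison. The numbers below are hypothetical placeholders in arbitrary units, chosen only to mirror the qualitative pattern reported in the text; the function name and the tolerance are likewise assumptions made for illustration.

```python
def resembles(double, single_a, single_b, tolerance):
    """Return 'a' or 'b' for the single mutant that the double mutant matches, else None."""
    match_a = abs(double - single_a) <= tolerance
    match_b = abs(double - single_b) <= tolerance
    if match_a and not match_b:
        return "a"
    if match_b and not match_a:
        return "b"
    return None  # ambiguous or no match: no ordering can be inferred


# Hypothetical frameshift frequencies (arbitrary units), NOT measured values.
frameshift = {"mnmE": 1.0, "gidA": 3.0, "mnmE gidA": 1.1}

closest = resembles(frameshift["mnmE gidA"], frameshift["mnmE"], frameshift["gidA"], 0.2)
if closest == "a":
    print("double mutant resembles mnmE: consistent with MnmE acting before GidA")
```

Such a comparison only formalizes the genetic reasoning; it cannot distinguish a sequential pathway from the interdependent-complex model when the two single mutants behave identically, which is the case for the growth-rate data cited for the second proposal.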
The cmnm5U34 functions in the translation system in the recognition of the two-codon sets and in frameshift prevention. However, in the tRNAs specific for glutamine, lysine, glutamate and arginine, the cmnm5 group is further modified to an mnm5 group, probably for enhanced stabilization of the modified group and for further efficiency in translation.8 The bifunctional

Enzymatic Formation of 5-Aminomethyl-Uridine Derivatives in tRNA


Figure 2. Cascade enzymatic reactions leading to mnm5U formation in Bacteria: A) mnm5U cascade in Bacteria. B,C,D) Schematic organization of the functional domains in MnmE and Ras (B), GidA and TrmFO (C) and MnmC and DUF752 (D). Each domain is represented by light and medium gray patches, respectively. Conserved-sequence motifs are shown in dark gray. THF in (B) shows the tetrahydrofolate-binding domain. G1 to G4 in (B) are the four conserved sequence motifs in the G-domain. CXGK and CAAX in (B) are the conserved C-terminal motifs in MnmE and small GTP-binding proteins, respectively. DBM in (C) is the dinucleotide binding motif in the Rossmann fold (Rf in the figure). For MnmC2 and MnmC1 in (D), refer to the text.

enzyme MnmC, in E. coli, catalyzes the final two steps in mnm5U biosynthesis from cmnm5U.44 The intermediate cmnm5U is first transformed to 5-aminomethyl uridine (nm5U) with FAD, by removal of the carboxymethyl (acetic acid) group and subsequently is methylated to mnm5U, using the methyl group of an AdoMet cofactor. Methyl-deficient tRNA from methionine-starved cells contains the undermodified derivative nm5U in addition to cmnm5U, suggesting that the two reaction steps occur independently (see Fig. 2A). The intermediate nm5U can be detected in E. coli and thus it is not toxic. The nm5 group might partially contribute to preventing frameshifts, because no mnmC mutants were detected in the frameshift reporter system.27,47 The mechanism of the MnmC enzyme has been well characterized, since the successful in vitro reaction was reported, using purified MnmC and its mutants.44,48 The details of the enzymatic characteristics of MnmE, GidA and MnmC will be described in the following sections.
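The cascade of Figure 2A can be summarized as an ordered list of steps, each requiring particular enzymatic activities. The sketch below is a hedged illustration, not a kinetic model: the data structure and function name are hypothetical, "MnmC1" and "MnmC2" are used here for the FAD-dependent and AdoMet-dependent activities of MnmC as named later in this chapter, and the MnmE/GidA step is treated as a single unit because no intermediate has been detected.

```python
# Each step: (substrate, product, activities required). Illustrative only.
CASCADE = [
    ("U", "cmnm5U", {"MnmE", "GidA"}),  # THF derivative and glycine as donors
    ("cmnm5U", "nm5U", {"MnmC1"}),      # FAD-dependent removal of the carboxymethyl group
    ("nm5U", "mnm5U", {"MnmC2"}),       # AdoMet-dependent methylation
]


def final_c5_state(active):
    """Follow the cascade from unmodified U34 and return the last C-5 state reached."""
    state = "U"
    available = set(active)
    for substrate, product, required in CASCADE:
        if state != substrate or not required <= available:
            break  # the cascade stalls at the first step lacking its activities
        state = product
    return state


if __name__ == "__main__":
    print(final_c5_state({"MnmE", "GidA", "MnmC1", "MnmC2"}))  # full pathway
    print(final_c5_state({"GidA", "MnmC1", "MnmC2"}))          # no MnmE
```

Under these assumptions, the predicted end states agree with the observations cited in the text: loss of MnmE or GidA leaves C-5 unmodified, loss of the FAD-dependent activity leaves cmnm5U, and loss of the methyltransferase activity leaves nm5U.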

Structure and Mechanism of the MnmE Enzyme

The o454 gene, encoding MnmE in E. coli, the main protein participating in cmnm5U formation, was first assigned to the trmE gene, involved in tRNA modification.25 This gene is allelic


Figure 3. Structure of MnmE from Thermotoga maritima. A) The crystal structure of MnmE from T. maritima at 2.3 Å, comprising the three-domain structure represented in ribbon (left) and surface (right) models (PDB code, 1XZP).45 The disordered domains in molecule B were replaced with the corresponding domains of molecule A. B) Model for the activation of MnmE.67 The potassium-dependent dimerization of the G-domains during the GTP hydrolysis transitional state could influence the helical domains. A color version of this image is available at www.landesbioscience.com/curie.

with thdF (thiophene degradation), which was cloned independently and proposed as an E. coli gene involved in thiophene and furan oxidation.49 To avoid confusion, the genetic symbol was changed to mnmE.50 The crystal structure of MnmE from Thermotoga maritima revealed that MnmE is a three-domain protein, composed of an N-terminal α/β domain, a central exclusively helical domain and a G-domain inserted into the helical domain (Fig. 2B and Fig. 3A).45 The N-terminal α/β domain induces dimerization and is homologous to the tetrahydrofolate-binding domain. The central helical domain is poorly conserved, except for the C-terminal motif (CXGK). Mutagenesis experiments revealed that the cysteine residue in this motif, which is the only cysteine in MnmE, is essential for the tRNA modification activity.51 In the structure, the G-domain, which is responsible for GTP binding and hydrolysis, is loosely connected to the other domains of MnmE. The nuclear-encoded mitochondrial MSS1 and GTPBP3 proteins are the MnmE homologues in yeast and human, respectively.52,53 GTPBP3 malfunction has been implicated in human mitochondrial diseases, such as MELAS and MERRF.54-56 The MSS1 mutants have the same C-5 modification defect of the wobble U34 in mitochondrial tRNAs, as in the null MnmE mutants in E. coli.57,58 Therefore, the mitochondrial proteins are evolutionarily conserved with bacterial MnmE in both the sequence and function. Based on their sequences, the proteins may also be GTPases.

G-Domain

MnmE is one of 11 universally conserved GTPases in bacteria (EF-G, EF-Tu, IF-2, LepA, Era, Obg, MnmE, Ffh, FtsY, EngA and YchF), with functions elicited by interactions with RNA and/or ribosomes.59 Eukaryotes, on the other hand, have large families of GTPases that are important regulators of membrane signaling pathways.60 The common property shared by the GTPases is the presence of a structural module, the G-domain, which mainly functions as a molecular switch between GTP-bound and GDP-bound conformations.61,62 This conformational switch is crucial for the functions of all GTPases.


The bacterial GTPases can be assigned to four main ancestral groups: the elongation factor subfamily, the Era subfamily, the FtsY/Ffh subfamily and the Obg subfamily.59 Among them, MnmE belongs to the Era subfamily, as judged from the sequence homology. Era is an essential small G protein that binds to the 30S ribosomal subunit in E. coli.63 EngA, another member of the Era subfamily, interacts with the ribosomal protein S7, suggesting its involvement in ribosomal maturation.64 MnmE, the last member of the Era subfamily, is the only enzyme with GTPase activity, among all of the RNA modification enzymes identified thus far. It was reported recently that Salmonella YhbZ (ObgE/Obg in E. coli) specifically interacts with the ribosomal pseudouridine synthase RluD and may deliver the rRNA modification enzyme to the appropriate region of the ribosome.64 Therefore, Obg might be a helper protein for an RNA modification enzyme. However, MnmE and Obg may independently participate in RNA modification functions in phylogeny, since they are affiliated with different GTPase subfamilies. There are four conserved sequence motifs (G1-G4) in the G-domain (Fig. 2B). The G1, G3 and G4 motifs are important for the GTP-binding activities.65 The G2 motif is in the switch-I region, which is highly conserved within each GTPase subfamily but not between different subfamilies and is involved in GTP hydrolysis, rather than GTP binding.59,61,65 A threonine residue is the only invariant G2 residue between the subfamilies. The mechanistic role of the G-domain of MnmE in cmnm5U formation is unknown. 
However, it was proposed that the conformational change occurring with GTP hydrolysis promotes the tRNA modification reaction and is probably important for turnover, since mutational analyses of the GTPase domain indicated that MnmE is more similar to the classical GTPases than the GTP-specific metabolic enzymes.51 The mutational analyses also revealed that effective GTP hydrolysis by MnmE, and not simply GTP binding, is necessary for tRNA modification.42,51 In the dimer form of the full-length MnmE protein, the G-domains are close together, with the putative nucleotide-binding sites facing each other.45 The pocket for GTP binding shares the conserved three-dimensional distribution observed in other GTPases. The G-domain of MnmE exhibits the same level of GTPase activity as the full-length, intact protein.40 The crystal structure of the truncated G-domain with GDP and aluminum fluoride (a γ-phosphate analogue),66 revealed that the G-domain is associated with another G-domain molecule.67 The G-domain could be a dimer in the transitional state (Fig. 3B) and this probably influences the conformational change of the entire protein. The G2 motif in the switch-I region plays a role in stabilizing the transition state of MnmE. Mutations in the G2 motif, leading to a minor loss of the GTPase activity, resulted in a nonfunctional MnmE protein.42 Therefore, the conformational change of the switch-I region associated with GTP hydrolysis seems to be crucial for the function of MnmE. The invariant threonine of the G2 motif would be essential for such a change, because it cannot be substituted by serine. The structures of the MnmEs from E. coli and T. 
maritima revealed that the G-domain has the canonical Ras-like fold, with no insertion or deletion of secondary structural elements.45,67,68 However, MnmE differs extensively from the Ras proteins as well as from other GTPases, such as translation factors, by its very high intrinsic GTP hydrolysis rate, rather low affinity for GTP and extremely low affinity for GDP.40,51,69 The lower GTP affinity of MnmE may be a consequence of the shorter P-loop in the G1 motif than that in Ras21.68 MnmE binds GTP/GDP with micromolar affinity and therefore, the recombinant protein is usually obtained as the apo-form.45 In contrast to other GTPases, MnmE does not require auxiliary factors such as GAP (GTPase-activating protein), despite its high hydrolase activity.40 In addition, MnmE, unlike other GTPases, does not use an “arginine finger” to drive catalysis, which was previously thought to be the case for all GTP-binding proteins. Instead, an arginine in MnmE, corresponding to the arginine finger of the normal GTPases, may play a role in stabilizing the transition state.67,68 The hydrolysis reaction by MnmE from T. maritima is stimulated by potassium ions, which is a special characteristic of MnmE among all GTPases.69 The G-domains of MnmE dimerize in a potassium-dependent manner and induce GTP hydrolysis.67 Potassium provides a positive charge in the catalytic site, in a position analogous to the arginine finger. This is the reason for the high intrinsic activity of MnmE.


Tetrahydrofolate-Binding Fold

One-carbon (C1) metabolism by folate coenzymes plays an essential role in various cellular processes.70 The N-terminal domain of MnmE is involved in dimerization and is structurally homologous to the THF-binding domains of DMGO (N,N-dimethylglycine oxidase), T-protein (aminomethyltransferase) and YgfZ (an enzyme involved in ms2i6A37 formation in tRNA), although the primary structures share no homology with each other.71-74 The THF molecule is located at the center of the 2-layer β-sandwich composed of two Greek key motifs within these THF-binding folds. In the case of DMGO as well as T-protein and YgfZ, the two β-sheets are composed of the two domains of a single chain.71-73 However, for MnmE, the N-terminal domain of the second molecule corresponds to the second β-sheet of the sandwich structure, which forms a tight dimer with the first molecule (Fig. 3A). Therefore, homodimerization of MnmE would be required to retain the THF cofactor(s). Folinic acid (5-formyl-THF), which was introduced into the crystal by soaking, resides at the periphery of the dimer interface of the MnmE crystal structure.45 The MnmE dimer binds two molecules of the THF cofactor, although the stoichiometry determined from a solution assay is less than unity. Solution studies have demonstrated that MnmE has submicromolar-binding affinity for 5-formyl-THF.45 This is rather weak binding, as compared to that of other enzymes with the same fold.73 Folates are based on pteroic acid (PTA) conjugated to one or several glutamate units. The ligand is bound between the two β-sheets, with the pterin group of PTA perpendicular to the β-sheets. The pterin group is stabilized by double hydrogen bonds with the conserved glutamic acid, as in T-protein and partially in DMGO. A conserved arginine in MnmE directly stabilizes the carbonyl group of the pteridine ring, whereas in DMGO a glutamate, instead of the arginine, indirectly binds to the carbonyl position via a water bridge. 
This THF-binding fold family enhances the nucleophilic character of the THF N10 position.70 The catalytic mechanism within the THF domain of MnmE is unknown, but the N10 position closely contacts an acidic or amide amino acid conserved in the MnmE family, which might assume the role of the catalytic aspartate in DMGO. The need for some conformational change was suggested, since the donor C1 group in the N5 position is oriented toward the inside of the rigid body.45 On the other hand, the glutamate portion of THF is close to the surface of the enzyme. This configuration enables MnmE to accept a variable length (1-8) of glutamate residues of the THF cofactors in cells, without steric hindrance.

Structure and Mechanism of the GidA Enzyme

The gene encoding GidA was first isolated in association with a glucose-inhibited division phenotype of E. coli.75,76 Disruption of gidA (gid at first) in E. coli delays cell division, but only when cells are grown on glucose. This may result from a pleiotropic phenotype due to translational control through hypomodified tRNAs. The gene gidA is allelic with trmF,27 which is the gene involved in the mnm5s2U34 modification in E. coli.25 The gene symbol ‘gidA’ was also designated as mnmG, as in trmE for mnmE, but researchers still use the symbol gidA, to avoid confusion. The “G” of mnmG is inconsistent with trmF. It is not a G protein like MnmE, while MnmA is a different modification enzyme involved in the thiolation of the O2 group of U34 (for details, see the chapter by Noma et al in this book). The gidA genes, like the mnmE genes, are well conserved among a wide range of bacteria. Human and yeast possess a GidA homologue, MTO1 (mitochondrial translation optimization protein 1).77 MTO1 functions in the biosynthesis of the cmnm5 or τm5 group in the wobble U34 of mitochondrial tRNA,58 in connection with MSS1/GTPBP3, mitochondrial orthologues of MnmE. A shorter GidA-related protein (GidR, in Fig. 2C; also designated as GidAsmall) has been identified in the genomes of bacteria belonging to the Deinococcus-Thermus phylum. GidR is ca. 230 amino acids (aa), as compared to the approximately 650 aa of GidA. Although the structure of GidR revealed an evolutionary relationship with GidA, the enzymatic function of GidR is still unknown.78 The mechanistic function of GidA is still elusive, but some indications in the scientific literature may help to clarify its function. GidA is an FAD-binding flavoprotein, and disruption of the


Figure 4. Structure of GidA from Aquifex aeolicus. A) The crystal structure of GidA from A. aeolicus at 2.9 Å. B) Schematic view of the GidA dimer attached to tRNA together with FAD, in the vicinity of U34 of the anticodon loop of docked tRNA. A color version of this image is available at www.landesbioscience.com/curie.

N-terminal dinucleotide binding motif (DBM, GXGXXG) reduces the ability of the enzyme to bind FAD and modify tRNA.46 GidA behaves as a homodimer in solution and physically interacts with MnmE, suggesting that MnmE and GidA form an α2β2 heterotetrameric complex.46 The three crystal structures of GidA (E. coli, Chlorobium and Aquifex) revealed that the noncovalently, tightly-bound FAD is a genuine cofactor (ref. 79 and Bessho et al unpublished data) (Fig. 4). The overall fold of GidA is consistent with a global structure encompassing three domains. The main domain belongs to an FAD-binding domain with the classical Rossmann fold, which is characteristic of a dinucleotide-binding fold. The second α/β domain is inserted between two strands of the Rossmann fold (Fig. 2C). The C-terminal domain is organized as an all-helical domain. A large-scale sequence and structural analysis classified the FAD-containing proteins into four different FAD-family folds, exemplified by glutathione reductase (GR), ferredoxin reductase (FR), p-cresol methylhydroxylase (PCMH) and pyruvate oxidase (PO).80 Among them, the FAD domain of GidA can be categorized in the GR family (especially, the GR2 subfamily), which is characterized as proteins with the DBM mainly at the N-terminus. Since the insertion domain, in the Rossmann fold of the FAD domain, shows some similarities to the presumed NADH-binding domain of phenol hydroxylase, also a member of the GR2 subfamily, it was proposed that this domain in GidA is an NADH-binding domain.79 GidA actually binds the NADH cofactor with high specificity, suggesting that it works as an initial donor of electrons.79 These features of GidA suggest that this protein catalyzes an oxidation-reduction reaction.

Relationship between GidA and TrmFO

Many bacteria encode a rather short GidA homologue (about 450 aa), designated as Gid/GidAs/GidA2, which caused misannotation with GidA (about 600 aa) in many genomic projects.81 This shorter GidA homologue has been renamed TrmFO, based on the observation that the protein is a folate-dependent methyltransferase involved in forming m5U54 (rT54), which is a ubiquitous modification in the T loop of tRNA82-84 (see also the chapter by Myllykallio et al in this book). TrmFO differs from TrmA (Trm2p in Eukarya), which also catalyzes the C-5 methylation of U54, but uses AdoMet as the methyl donor. TrmFO and TrmA have mutually exclusive phylogenetic


distributions. Indeed, trmFO is never found in bacterial genomes containing trmA and vice versa.85 A phylogenetic analysis indicated that the GidA and TrmFO protein families evolved from a common ancestor, but acquired different, non-overlapping cellular functions during evolution.85 Both enzymes have the GIDA (PF01134) domain in the Pfam database (release 23.0). The length differences between GidA and TrmFO arise from the additional C-terminal sequence of GidA, which corresponds to the C-terminal helical domain (Fig. 2C). This GidA-specific domain might function to interact with MnmE.46 TrmFO is also an FAD-binding protein, which has the DBM in the same N-terminal region as GidA.84,86 The purified TrmFO from Bacillus subtilis reportedly catalyzed the methyltransferase reaction in vitro, using N5,N10-methylene-THF (CH2=THF) as the C1 donor and NAD(P)H/FAD as the reductant.84 As far as the chemical reaction is concerned, TrmFO and ThyX (thymidylate synthase, dUMP to dTMP) catalyze a very similar type of methylation reaction, although they evolutionarily originated from completely different families of flavoproteins.87 Indeed, ThyX also uses CH2=THF as a C1 donor and NADH/FAD as a reductant.88 The detailed mechanistic enzyme reaction of TrmFO has not been characterized yet and neither mutant experiments nor a crystal structure of TrmFO have been reported. TrmFO should have a THF-binding fold, but the region is still uncharacterized. The THF-binding fold in ThyX lacks structural similarity with the THF-binding domain of MnmE.89 It is interesting that some similarities exist between the GidA/MnmE and TrmFO reactions. Both use a THF derivative as a carbon donor and NAD(P)H/FAD for reduction or oxidation (see also the discussion in the chapter by Myllykallio et al in this book).

Enzymatic Interdependence between MnmE and GidA

As indicated above, a functional link exists between MnmE and GidA. In many diverse bacteria, the genes encoding MnmE and GidA are in the same operon and usually in the order mnmE-gidA. In E. coli, the gidA gene is just 40 kb away from mnmE, with a linkage by an inverted sequence of the chromosome.59 The purified MnmE and GidA proteins interact in vitro, suggesting that they form a functional complex (heterotetramer) that performs the modification reaction.46 The C-terminal domain of GidA lacks structural similarity to any fold of known proteins in the database, suggesting that this domain was obtained in the GidA phylogeny for its functional association with MnmE. Although it had been suggested that THF might be a C1 unit donor in the modification reaction, the question as to which oxidation state of THF is used still remains. In human mitochondria, taurine, instead of glycine, is incorporated by GTPBP3 and MTO1, which are the orthologous enzymes of bacterial MnmE and GidA.28 Therefore, it was proposed that both catalyze the formation of an unknown intermediate and the subsequent activity of a taurine or glycine transferase is responsible for the construction of the τm5 group in humans, or the cmnm5 group in bacteria.58 Although no such intermediates have been found in the tRNAs of any mutants, they might be toxic and decomposed rapidly, or covalently bound to the enzyme during the reaction. 
Wittinghofer and coworkers postulated that MnmE catalyzes the transfer of the C1 unit from 5-formyl-THF to position C-5 of uracil.43,45,51 They found 5-formyl-THF in the crystal structure of MnmE, although the cofactor was introduced by soaking.45 Since the pteridine ring binding pockets are conserved between DMGO and MnmE (see above), the addition of the cmnm group to C-5 of U34 could occur through a mechanism similar to that proposed for the known pyrimidine C-5 modifying enzymes, such as ThyA (thymidylate synthase A), TrmA and RlmD (formerly RumA), which use a catalytic cysteine to activate pyrimidine C-5, by forming an enolate intermediate, for nucleophilic attack.90-92 The C-6 atom of the target uridine is covalently attached to the catalytic sulfhydryl group of a cysteine residue.93,94 The essential conserved cysteine, located close to the 5-formyl-THF in the crystal structure, might form a covalent adduct via the C-6 position of uracil by a nucleophilic attack.45,51 However, this is different from the case of ThyX, in which the C-6 activating residue is a serine. The substitution of the only cysteine by serine in MnmE results in the absolute null modification in the C-5 position of U34 in vivo, although the GTPase activity of the mutant retains that of the wild-type MnmE protein.51


The C(I/L/V)GK sequence at the C-terminus of MnmE matches the CAAX motif (where A represents an aliphatic residue and X represents any residue, see also Fig. 2B) characteristic of the Ras proteins and of the isoprenylation that anchors small GTP-binding proteins to cell membranes in eukaryotic cells.61,95 Subcellular fractionation followed by immunoblotting, as well as immunoelectron microscopy, indicated that MnmE is localized in the cytoplasm, with a significant amount at the inner membrane.40,51 Therefore, MnmE could be multifunctional in vivo. The C-terminal cysteine might be involved in the membrane association of MnmE, as occurs with other GTPases containing C-terminal cysteines.51 Purified recombinant TrmFO catalyzes the CH2=THF-dependent formation of 5-methyluridine at position 54 of tRNA in vitro. In many thermophilic organisms, m5U54-containing tRNAs are further thiolated to produce m5s2U54 (or s2T54), while in certain Eukarya, a 2ʹ-O-methyl-derivative is occasionally found (m5Um54). This situation is also similar to the hypermodification at the C-5 position of U34, as U54 is located in a loop region of tRNA (see the chapter by Noma et al in this book). However, GidA must be associated with MnmE to catalyze the formation of cmnm5U at position 34. Thus, the GidA complexed with MnmE seems to represent a new evolutionary way, as compared to TrmFO, to use a folate derivative and FAD for a different arm of the tRNA. GidA might have lost the THF-binding domain because of its evolved cooperation with MnmE, or alternatively, TrmFO might have gained the THF-binding domain. Whatever the evolutionary solution was, the reason why glycine/taurine became incorporated into uracil by MnmE/GidA, instead of a simple methyl group as in the case of ThyX/TrmFO, remains an interesting question. The development of an in vitro assay using a recombinant MnmE/GidA complex will hopefully clarify the molecular mechanism of the MnmE/GidA-dependent pathway.
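The C-terminal C(I/L/V)GK motif discussed in this section is simple enough to screen for with a regular expression. The following Python sketch is a minimal illustration under stated assumptions: the function name is hypothetical, the example sequences are invented fragments and not real MnmE sequences, and the pattern encodes only the four residues given in the text, anchored at the C-terminus.

```python
import re

# C, then I/L/V, then G, then K, at the very end of the sequence (as described in the text).
MNME_CTERM_MOTIF = re.compile(r"C[ILV]GK$")


def has_mnme_cterm_motif(protein_seq):
    """Return True if a one-letter protein sequence ends with the C(I/L/V)GK motif."""
    return MNME_CTERM_MOTIF.search(protein_seq.strip().upper()) is not None


if __name__ == "__main__":
    # Invented fragments for illustration only.
    print(has_mnme_cterm_motif("MSTNDKAAACIGK"))  # motif present at the C-terminus
    print(has_mnme_cterm_motif("MSTNDKAAASIGK"))  # cysteine replaced by serine
```

A match says nothing about function by itself; as noted above, the role of the cysteine was established by mutagenesis, not by sequence inspection.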

Mechanistic Features of the Bifunctional Enzyme, MnmC

The enzymatic activity of the MnmC bifunctional enzyme (Fig. 2A,D) was first identified in an E. coli cell extract, using methyl-deficient tRNAGlu as a substrate.43,96 The two E. coli mutants, trmC1 and trmC2, which have no effect on the U34-thiolation reaction, are both defective in the synthesis of the mnm5U modification.97-99 tRNA purified from the trmC2 mutant has nm5s2U nucleosides (see Fig. 2A) and incorporates a methyl group, as determined by using the purified enzyme and AdoMet as a donor, suggesting that trmC2 is a methyltransferase for the conversion of nm5s2U34 to mnm5s2U34.44,100 On the other hand, the trmC1 mutant has cmnm5s2U, instead of mnm5s2U, which is converted to nm5s2U by the purified enzyme in the absence of AdoMet, suggesting that the enzymatic activity of trmC1 precedes that of trmC2.44 In E. coli, the genes corresponding to trmC1 and trmC2 are located on the same region of the genome and it was subsequently found that they correspond to a single ORF expressed as a fusion protein. Therefore, in E. coli, trmC1 and trmC2 have been combined into trmC and renamed mnmC, as in other mnm genes.50 The yfcK ORF of E. coli was assigned to mnmC by a computational analysis.101 The sequence of E. coli MnmC shows similarity to the AdoMet-binding sites of MTases in the N-terminal domain and to the FAD-dependent oxidoreductases in the C-terminal domain (Fig. 2D).101,102 The C-terminal domain of MnmC is closely related to the FAD-linked oxidoreductases, especially the glycine/D-amino acid oxidases (GO/DAAO).48,80 The individual domains in MnmC retain independence as enzymes. The enzymatic conversion of cmnm5U to nm5U occurs without the participation of any cofactor and does not require any external energy.44 These observations suggested an enzymatic mechanism for the FAD-dependent demodification, in which a glyoxylic acid, an intermediate of the glyoxylate cycle, is eliminated (Fig. 
5A).101,103 The N-terminal domain of MnmC, leading to mnm5U from nm5U, is an AdoMet-dependent MTase. The enzymatic activities are stimulated by NH4+, but are severely inhibited by Mg2+ ions.43,44 Most tRNA-modifying enzymes are either stimulated by or do not respond to this ion. Thus, the MTase activity of MnmC is unique in its sensitivity toward Mg2+ ions.44 The fused MnmC in E. coli, with its oxidoreductase and MTase domains, is conserved only in γ-proteobacteria, with a few additional members.101 The truncated N-terminal homolog protein (DUF752) is widely conserved in bacteria, but its enzymatic function is still unknown. The crystal


Figure 5. Cascade enzymatic reactions leading to mnm5U formation from cmnm5U. A) The proposed reaction mechanism of MnmC1,101 based on the FAD-dependent oxidoreductases and the glycine/D-amino acid oxidases.102,103 B) Structure of MnmC2 (DUF752) from Aquifex aeolicus complexed with AdoMet cofactor at 2.5 Å. A color version of this image is available at www.landesbioscience.com/curie.

structure of the DUF752 protein from Aquifex aeolicus revealed that this enzyme resembles a typical Rossmann-fold methyltransferase (RFM), especially an N-MTase (Fig. 5B, Bessho et al, unpublished), as in Trm1p m22G26 methyltransferase.104 In contrast, the closest homolog of MnmC1 seems to be very highly diverged and probably corresponds to a paralogue with a different function.101 The tertiary structure of the MnmC1 domain awaits an annotation based on structural similarity. The E. coli fusion MnmC may have a functional advantage, due to the spatial proximity of both domains. The N-terminal MnmC2 domain is capable of independent folding; however, the folding of the C-terminal MnmC1 domain requires the N-terminal domain, in E. coli.48 Although nm5U34 partially functions in translation (decoding, avoiding frameshifts) and it is not toxic in vivo, the mnm5U modification is much more efficient for both translational events.8

Evolutionary Aspects of the U34-Modification Metabolism

An analysis of the phylogenetic distribution of the genes in 5-aminomethyl-uridine biosynthesis in the completely sequenced prokaryotic genomes revealed significant diversity in the structure of the pathway. In E. coli, the thiolated derivative, mnm5s2U, is present in the tRNAs specific for Gln, Lys and Glu, whereas the nonthiolated derivative, cmnm5Um34, is present in tRNALeu and mnm5U34 is found in tRNAArg.105-107 This shows that the 2-thiolation of U34 occurs in tRNAs with U35 and the 2ʹ-O-methylation of U34, as well as C34, is performed in tRNA with A36. However, 5-carboxymethylaminomethylation (cmnm) by MnmE/GidA occurs in tRNAs with various patterns of anticodons. The discrimination of tRNAs by MnmE/GidA enzymes remains a puzzling problem. Since tRNALeu has the cmnm5 group in U34, MnmC seems to discriminate tRNA at U35 or C35. The mnm5U derivative appears to increase the pairing stability with G, as compared to the cmnm5U modification in pyrimidine 35 of the anticodon.8 Gram-positive bacteria (such as B. subtilis) lack the mnmC1 and mnmC2 genes and the cmnm5s2U and cmnm5U(m) nucleotides are found in the tRNAs of these organisms.108,109 This may reflect the unique properties of the corresponding tRNAs during translation on the ribosome. In addition, mitochondria have no MnmC-related enzyme. Yeast mitochondrial MSS1p and MTO1p act together, as in the case of the bacterial MnmE/GidA pathway.57 They have been incorporated into the nuclear genome from an ancient mitochondrial gene derived from α-proteobacteria (endosymbiosis). Notably, fungal and animal mitochondria, as well as Mycoplasma (Gram-positive bacteria), have an abnormal tRNATrp with cmnm5U(m)34 (or τm5U34), which recognizes UGA as a tryptophan codon, instead of a stop codon.108,110-113 This is the 6th purine-ending two-codon set sharing near-cognate codons with Cys. To prevent misrecognition of cysteine codons, this U34-containing tRNATrp, newly evolved from the duplication of the gene encoding C34-containing tRNATrp, has to be recognized by MnmE/GidA enzymes. Higher animal mitochondria have 5-taurinomethyl-uridine, instead of 5-carboxymethylaminomethyl-uridine.28 In this case, during evolution, MSS1/MTO1 had to simply change its specificity for a new, small-molecule substrate (glycine to taurine); it did not need to change its tRNA specificity. In E. coli, tRNAGlyUCC reportedly recognizes only GGA and GGG codons.114 The modified U34 base in tRNAGly is mnm5U34 (in E. coli) and cmnm5U34 (in Bacillus).115,116 Thus, this codon box is poised to evolve into the 7th two-codon set, governed by the MnmE/GidA system, for the eventual incorporation of an amino acid distinct from Gly in the present-day four-codon family box, based on the codon-capture theory in genetic codes.117-119 As far as archaeal U34-containing tRNAs are concerned, the compound mnm5s2U is reportedly present in tRNA from archaeal Methanococci species,20 although Archaea lack orthologues of mnmE and gidA.46,120 Thus, it seems that if ancient mnmE and gidA homologues existed in Archaea, then divergent evolution between Bacteria and Archaea has produced proteins with very low homology.46 Instead, a gene encoding an MnmC2-type of AdoMet-dependent methyltransferase seems to exist in Methanococci, although MnmC1 is missing, as in some bacteria harboring a truncated MnmC2. A halophilic archaeon, Haloferax volcanii, lacks the bacterial type of mnm5 modification, but it may have an as-yet unidentified, new derivative of the eukaryal mcm5 modification, as indicated by genome analyses of related enzymes.121,122 The archaeal U34 modification of the two-codon sets is phylogenetically a mosaic of the bacterial and eukaryal types, which might have originated from horizontal gene transfer of related genes.20 Clarification awaits the phylogenetic approach of modomics in the Archaeal domain. Last but not least, the mnm5 group (but not the s2 group) is a positive determinant for some aminoacyl tRNA synthetases (ARS) from E. coli (see the chapter by Giegé and Lapointe in this volume).107 The ARS identities for xm5U34 should have coevolved with the discrimination of tRNA by modification enzymes for proper decoding.

Conclusions and Future Prospects

The detailed reaction mechanism of the enzyme complex MnmE/GidA awaits complete elucidation, including the types and functions of cofactors, such as FAD and THF, as well as how the MnmE-GTPase is utilized for the modification reaction. In addition, the manner by which MnmE/GidA selects, from the tRNA population, the cognate U34-tRNA substrates belonging to the two-codon sets needs more systematic biochemical, genetic and structural investigation. It is important for solving the profound puzzle of the genetic codes, as well as for understanding how the modification machinery emerged during the early evolution of life. The discrimination of tRNA for cmnm5U formation is complicated, in terms of the various patterns of anticodons, including tRNALeu(UAA), Gln(UUG), Lys(UUU), Glu(UUC), Trp(UCA), Arg(UCU), Gly(UCC) and suppressor tRNAs(UUA). GidA is certainly an ancient protein and its similarity to the paralogue TrmFO raises interesting questions about which one emerged first, with both using THF, but apparently for distinct purposes. Many bacteria and archaea lack MnmC1. MnmC2 is a typical AdoMet-dependent MTase. The mechanism that generates hypomodified nm5U is still unknown, but it might be related to that of the oxidoreductases.
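The relationship between the U34 anticodons listed above and the purine-ending two-codon sets they decode can be made explicit with a short calculation. The Python sketch below is an illustration under simplifying assumptions: positions 35 and 36 pair by Watson-Crick rules with codon positions 2 and 1, and a modified U34 reads only A or G at the third codon position. The function and dictionary names are hypothetical.

```python
# Watson-Crick complements for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Anticodons (5'->3', positions 34-36) as listed in the text.
ANTICODONS = {
    "Leu": "UAA", "Gln": "UUG", "Lys": "UUU", "Glu": "UUC",
    "Trp": "UCA", "Arg": "UCU", "Gly": "UCC", "suppressor": "UUA",
}


def codons_read(anticodon):
    """Return the two purine-ending codons (5'->3') read by an
    anticodon whose wobble position 34 is a modified uridine."""
    n34, n35, n36 = anticodon
    if n34 != "U":
        raise ValueError("this sketch only covers U34 anticodons")
    # Codon position 1 pairs with 36, position 2 with 35;
    # position 3 (opposite U34) is assumed to be A or G only.
    prefix = COMPLEMENT[n36] + COMPLEMENT[n35]
    return [prefix + "A", prefix + "G"]


if __name__ == "__main__":
    for name, anticodon in ANTICODONS.items():
        print(name, anticodon, codons_read(anticodon))
```

For example, the UCA anticodon of the U34-containing tRNATrp maps to UGA and UGG, and the UCC anticodon of tRNAGly maps to GGA and GGG, consistent with the codon assignments discussed in the preceding section.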

Acknowledgements

We would like to thank Drs. H. Myllykallio, D. Brégeon, S. Osawa, G. Björk and H. Grosjean for valuable discussions. This work was supported by the Targeted Proteins Research Program (TPRP) from the Ministry of Education, Culture, Sports, Science, and Technology of Japan.

References

1. Yokoyama S, Nishimura S. Modified nucleosides and codon recognition. In: Söll D, Rajbhandary UL, eds. tRNA: Structure, Biosynthesis and Function. Washington, DC: American Society for Microbiology, 1995:207-223.
2. Yokoyama S, Yamaizumi Z, Nishimura S et al. 1H NMR studies on the conformational characteristics of 2-thiopyrimidine nucleotides found in transfer RNAs. Nucleic Acids Res 1979; 6(7):2611-2626.
3. Yokoyama S, Watanabe T, Murao K et al. Molecular mechanism of codon recognition by tRNA species with modified uridine in the first position of the anticodon. Proc Natl Acad Sci USA 1985; 82(15):4905-4909.
4. Kawai G, Yamamoto Y, Kamimura T et al. Conformational rigidity of specific pyrimidine residues in tRNA arises from posttranscriptional modifications that enhance steric interaction between the base and the 2ʹ-hydroxyl group. Biochemistry 1992; 31(4):1040-1046.
5. Sakamoto K, Kawai G, Watanabe S et al. NMR studies of the effects of the 5ʹ-phosphate group on conformational properties of 5-methylaminomethyluridine found in the first position of the anticodon of Escherichia coli tRNA4Arg. Biochemistry 1996; 35(21):6533-6538.
6. Agris PF. Decoding the genome: a modified view. Nucleic Acids Res 2004; 32(1):223-238.
7. Ohashi Z, Saneyoshi M, Harada F et al. Presumed anticodon structure of glutamic acid tRNA from E. coli: a possible location of a 2-thiouridine derivative in the first position of the anticodon. Biochem Biophys Res Commun 1970; 40(4):866-872.
8. Hagervall TG, Björk GR. Undermodification in the first position of the anticodon of supG-tRNA reduces translational efficiency. Mol Gen Genet 1984; 196(2):194-200.
9. Krüger MK, Pedersen S, Hagervall TG et al. The modification of the wobble base of tRNAGlu modulates the translation rate of glutamic acid codons in vivo. J Mol Biol 1998; 284(3):621-631.
10. Kurata S, Weixlbaumer A, Ohtsuki T et al. Modified uridines with C5-methylene substituents at the first position of the tRNA anticodon stabilize U-G wobble pairing during decoding. J Biol Chem 2008; 283(27):18801-18811.
11. Lustig F, Elias P, Axberg T et al. Codon reading and translational error. Reading of the glutamine and lysine codons during protein synthesis in vitro. J Biol Chem 1981; 256(6):2635-2643.
12. Inagaki Y, Kojima A, Bessho Y et al. Translation of synonymous codons in family boxes by Mycoplasma capricolum tRNAs with unmodified uridine or adenosine at the first anticodon position. J Mol Biol 1995; 251(4):486-492.
13. Durant PC, Bajji AC, Sundaram M et al. Structural effects of hypermodified nucleosides in the Escherichia coli and human tRNALys anticodon loop: the effect of nucleosides s2U, mcm5U, mcm5s2U, mnm5s2U, t6A and ms2t6A. Biochemistry 2005; 44(22):8078-8089.
14. Vendeix FAP, Dziergowska A, Gustilo EM et al. Anticodon domain modifications contribute order to tRNA for ribosome-mediated codon binding. Biochemistry 2008; 47(23):6117-6129.
15. Gustilo EM, Vendeix FAP, Agris PF. tRNA’s modifications bring order to gene expression. Curr Opin Microbiol 2008; 11(2):134-140.
16. Thompson RC. EFTu provides an internal kinetic standard for translational accuracy. Trends Biochem Sci 1988; 13(3):91-93.
17. Ninio J. Multiple stages in codon-anticodon recognition: double-trigger mechanisms and geometric constraints. Biochimie 2006; 88(8):963-992.
18. Farabaugh PJ, Björk GR. How translational accuracy influences reading frame maintenance. EMBO J 1999; 18(6):1427-1434.
19. Urbonavičius J, Qian Q, Durand JMB et al. Improvement of reading frame maintenance is a common function for several tRNA modifications. EMBO J 2001; 20(17):4863-4873.
20. McCloskey JA, Graham DE, Zhou S et al. Post-transcriptional modification in archaeal tRNAs: identities and phylogenetic relations of nucleotides from mesophilic and hyperthermophilic Methanococcales. Nucleic Acids Res 2001; 29(22):4699-4706.
21. Huang B, Johansson MJO, Byström AS. An early step in wobble uridine tRNA modification requires the elongator complex. RNA 2005; 11(4):424-436.
22. Kalhor HR, Clarke S. Novel methyltransferase for modified uridine residues at the wobble position of tRNA. Mol Cell Biol 2003; 23(24):9283-9292.
23. Björk GR, Huang B, Persson OP et al. A conserved modified wobble nucleoside (mcm5s2U) in lysyl-tRNA is required for viability in yeast. RNA 2007; 13(8):1245-1255.
24. Johansson MJO, Esberg A, Huang B et al. Eukaryotic wobble uridine modifications promote a functionally redundant decoding system. Mol Cell Biol 2008; 28(10):3301-3312.
25. Elseviers D, Petrullo LA, Gallagher PJ. Novel E. coli mutants deficient in biosynthesis of 5-methylaminomethyl-2-thiouridine. Nucleic Acids Res 1984; 12(8):3521-3534.
26. Nakayashiki T, Inokuchi H. Novel temperature-sensitive mutants of Escherichia coli that are unable to grow in the absence of wild-type tRNA6Leu. J Bacteriol 1998; 180(11):2931-2935.

27. Brégeon D, Colot V, Radman M et al. Translational misreading: a tRNA modification counteracts a +2 ribosomal frameshift. Genes Dev 2001; 15(17):2295-2306.
28. Suzuki T, Suzuki T, Wada T et al. Taurine as a constituent of mitochondrial tRNAs: new insights into the functions of taurine and human mitochondrial diseases. EMBO J 2002; 21(23):6581-6589.
29. Muramatsu T, Nishikawa K, Nemoto F et al. Codon and amino-acid specificities of a transfer RNA are both converted by a single post-transcriptional modification. Nature 1988; 336(6195):179-181.
30. Muramatsu T, Yokoyama S, Horie N et al. A novel lysine-substituted nucleoside in the first position of the anticodon of minor isoleucine tRNA from Escherichia coli. J Biol Chem 1988; 263(19):9261-9267.
31. Kuratani M, Yoshikawa Y, Bessho Y et al. Structural basis of the initial binding of tRNA(Ile) lysidine synthetase TilS with ATP and L-lysine. Structure 2007; 15(12):1642-1653.
32. Salazar JC, Ambrogelly A, Crain PF et al. A truncated aminoacyl-tRNA synthetase modifies RNA. Proc Natl Acad Sci USA 2004; 101(20):7536-7541.
33. Blaise M, Becker HD, Keith G et al. A minimalist glutamyl-tRNA synthetase dedicated to aminoacylation of the tRNAAsp QUC anticodon. Nucleic Acids Res 2004; 32(9):2768-2775.
34. Elkins BN, Keller EB. The enzymatic synthesis of N-(purin-6-ylcarbamoyl)threonine, an anticodon-adjacent base in transfer ribonucleic acid. Biochemistry 1974; 13(22):4622-4628.
35. Körner A, Söll D. N-(purin-6-ylcarbamoyl)threonine: biosynthesis in vitro in transfer RNA by an enzyme purified from Escherichia coli. FEBS Lett 1974; 39(3):301-306.
36. Reddy DM, Crain PF, Edmonds CG et al. Structure determination of two new amino acid-containing derivatives of adenosine from tRNA of thermophilic bacteria and archaea. Nucleic Acids Res 1992; 20(21):5607-5615.
37. Münch HJ, Thiebe R. Biosynthesis of the nucleoside Y in yeast tRNAPhe: incorporation of the 3-amino-3-carboxypropyl-group from methionine. FEBS Lett 1975; 51(1):257-258.
38. Nishimura S, Taya Y, Kuchino Y et al. Enzymatic synthesis of 3-(3-amino-3-carboxypropyl)uridine in Escherichia coli phenylalanine transfer RNA: transfer of the 3-amino-3-carboxypropyl group from S-adenosylmethionine. Biochem Biophys Res Commun 1974; 57(3):702-708.
39. Sullivan MA, Cannon JF, Webb FH et al. Antisuppressor mutation in Escherichia coli defective in biosynthesis of 5-methylaminomethyl-2-thiouridine. J Bacteriol 1985; 161(1):368-376.
40. Cabedo H, Macián F, Villarroya M et al. The Escherichia coli trmE (mnmE) gene, involved in tRNA modification, codes for an evolutionarily conserved GTPase with unusual biochemical properties. EMBO J 1999; 18(24):7063-7076.
41. Hagervall TG, Pomerantz SC, McCloskey JA. Reduced misreading of asparagine codons by Escherichia coli tRNALys with hypomodified derivatives of 5-methylaminomethyl-2-thiouridine in the wobble position. J Mol Biol 1998; 284(1):33-42.
42. Martínez-Vicente M, Yim L, Villarroya M et al. Effects of mutagenesis in the switch I region and conserved arginines of Escherichia coli MnmE protein, a GTPase involved in tRNA modification. J Biol Chem 2005; 280(35):30660-30670.
43. Taya Y, Nishimura S. Purification and properties of the tRNA methylase specific for synthesis of 5-methylaminomethyl-2-thiouridine. In: Salvatore F, Borek E, Zappia V et al, eds. The Biochemistry of Adenosylmethionine. New York: Columbia University Press, 1977:251-257.
44. Hagervall TG, Edmonds CG, McCloskey JA et al. Transfer RNA(5-methylaminomethyl-2-thiouridine)-methyltransferase from Escherichia coli K-12 has two enzymatic activities. J Biol Chem 1987; 262(18):8488-8495.
45. Scrima A, Vetter IR, Armengod ME et al. The structure of the TrmE GTP-binding protein and its implications for tRNA modification. EMBO J 2005; 24(1):23-33.
46. Yim L, Moukadiri I, Björk GR et al. Further insights into the tRNA modification process controlled by proteins MnmE and GidA of Escherichia coli. Nucleic Acids Res 2006; 34(20):5892-5905.
47. Brierley I, Meredith MR, Bloys AJ et al. Expression of a coronavirus ribosomal frameshift signal in Escherichia coli: influence of tRNA anticodon modification on frameshifting. J Mol Biol 1997; 270(3):360-373.
48. Roovers M, Oudjama Y, Kaminska KH et al. Sequence-structure-function analysis of the bifunctional enzyme MnmC that catalyses the last two steps in the biosynthesis of hypermodified nucleoside mnm5s2U in tRNA. Proteins 2008; 71(4):2076-2085.
49. Alam KY, Clark DP. Molecular cloning and sequence of the thdF gene, which is involved in thiophene and furan oxidation by Escherichia coli. J Bacteriol 1991; 173(19):6018-6024.
50. Leung H-CE, Hagervall TG, Björk GR et al. Genetic locations and database accession numbers of RNA-modifying and -editing enzymes. In: Grosjean H, Benne R, eds. Modification and Editing of RNA. Washington, DC: American Society for Microbiology, 1998:561-567.
51. Yim L, Martínez-Vicente M, Villarroya M et al. The GTPase activity and C-terminal cysteine of the Escherichia coli MnmE protein are essential for its tRNA modifying function. J Biol Chem 2003; 278(31):28378-28387.

52. Decoster E, Vassal A, Faye G. MSS1, a nuclear-encoded mitochondrial GTPase involved in the expression of COX1 subunit of cytochrome c oxidase. J Mol Biol 1993; 232(1):79-88.
53. Li X, Guan MX. A human mitochondrial GTP binding protein related to tRNA modification may modulate phenotypic expression of the deafness-associated mitochondrial 12S rRNA mutation. Mol Cell Biol 2002; 22(21):7701-7711.
54. Yasukawa T, Suzuki T, Suzuki T et al. Modification defect at anticodon wobble nucleotide of mitochondrial tRNAsLeu(UUR) with pathogenic mutations of mitochondrial myopathy, encephalopathy, lactic acidosis and stroke-like episodes. J Biol Chem 2000; 275(6):4251-4257.
55. Yasukawa T, Suzuki T, Ishii N et al. Wobble modification defect in tRNA disturbs codon-anticodon interaction in a mitochondrial disease. EMBO J 2001; 20(17):4794-4802.
56. Kirino Y, Suzuki T. Human mitochondrial diseases associated with tRNA wobble modification deficiency. RNA Biol 2005; 2(2):41-44.
57. Colby G, Wu M, Tzagoloff A. MTO1 codes for a mitochondrial protein required for respiration in paromomycin-resistant mutants of Saccharomyces cerevisiae. J Biol Chem 1998; 273(43):27945-27952.
58. Umeda N, Suzuki T, Yukawa M et al. Mitochondria-specific RNA-modifying enzymes responsible for the biosynthesis of the wobble base in mitochondrial tRNAs. Implications for the molecular pathogenesis of human mitochondrial diseases. J Biol Chem 2005; 280(2):1613-1624.
59. Caldon CE, Yoong P, March PE. Evolution of a molecular switch: universal bacterial GTPases regulate ribosome function. Mol Microbiol 2001; 41(2):289-297.
60. Caldon CE, March PE. Function of the universally conserved bacterial GTPases. Curr Opin Microbiol 2003; 6(2):135-139.
61. Bourne HR, Sanders DA, McCormick F. The GTPase superfamily: a conserved switch for diverse cell functions. Nature 1990; 348(6297):125-132.
62. Kjeldgaard M, Nyborg J, Clark BFC. The GTP binding motif: variations on a theme. FASEB J 1996; 10(12):1347-1368.
63. Sayed A, Matsuyama S, Inouye M. Era, an essential Escherichia coli small G-protein, binds to the 30S ribosomal subunit. Biochem Biophys Res Commun 1999; 264(1):51-54.
64. Lamb HK, Thompson P, Elliott C et al. Functional analysis of the GTPases EngA and YhbZ encoded by Salmonella typhimurium. Protein Sci 2007; 16(11):2391-2402.
65. Bourne HR, Sanders DA, McCormick F. The GTPase superfamily: conserved structure and molecular mechanism. Nature 1991; 349(6305):117-127.
66. Wittinghofer A. Signaling mechanistics: aluminum fluoride for molecule of the year. Curr Biol 1997; 7(11):R682-R685.
67. Scrima A, Wittinghofer A. Dimerisation-dependent GTPase reaction of MnmE: how potassium acts as GTPase-activating element. EMBO J 2006; 25(12):2940-2951.
68. Monleón D, Martínez-Vicente M, Esteve V et al. Structural insights into the GTPase domain of Escherichia coli MnmE protein. Proteins 2007; 66(3):726-739.
69. Yamanaka K, Hwang J, Inouye M. Characterization of GTPase activity of TrmE, a member of a novel GTPase superfamily, from Thermotoga maritima. J Bacteriol 2000; 182(24):7078-7082.
70. Scrutton NS, Leys D. Crystal structure of DMGO provides a prototype for a new tetrahydrofolate-binding fold. Biochem Soc Trans 2005; 33(Pt 4):776-779.
71. Leys D, Basran J, Scrutton NS. Channelling and formation of ‘active’ formaldehyde in dimethylglycine oxidase. EMBO J 2003; 22(16):4038-4048.
72. Lee HH, Kim DJ, Ahn HJ et al. Crystal structure of T-protein of the glycine cleavage system. Cofactor binding, insights into H-protein recognition and molecular basis for understanding nonketotic hyperglycinemia. J Biol Chem 2004; 279(48):50514-50523.
73. Teplyakov A, Obmolova G, Sarikaya E et al. Crystal structure of the YgfZ protein from Escherichia coli suggests a folate-dependent regulatory role in one-carbon metabolism. J Bacteriol 2004; 186(21):7134-7140.
74. Ote T, Hashimoto M, Ikeuchi Y et al. Involvement of the Escherichia coli folate-binding protein YgfZ in RNA modification and regulation of chromosomal replication initiation. Mol Microbiol 2006; 59(1):265-275.
75. von Meyenburg K, Hansen FG. The origin of replication, oriC, of the Escherichia coli chromosome: Genes near oriC and construction of oriC deletion mutations. Mechanistic studies of DNA replication and genetic recombination. ICN-UCLA Symp Mol Cell Biol 1980;137-159.
76. von Meyenburg K, Jørgensen BB, Nielsen J et al. Promoters of the atp operon coding for the membrane-bound ATP synthase of Escherichia coli mapped by Tn10 insertion mutations. Mol Gen Genet 1982; 188(2):240-248.

77. Li X, Li R, Lin X et al. Isolation and characterization of the putative nuclear modifier gene MTO1 involved in the pathogenesis of deafness-associated mitochondrial 12 S rRNA A1555G mutation. J Biol Chem 2002; 277(30):27256-27264.
78. Iwasaki W, Miyatake H, Miki K. Crystal structure of the small form of glucose-inhibited division protein A from Thermus thermophilus HB8. Proteins 2005; 61(4):1121-1126.
79. Meyer S, Scrima A, Versées W et al. Crystal structures of the conserved tRNA-modifying enzyme GidA: implications for its interaction with MnmE and substrate. J Mol Biol 2008; 380(3):532-547.
80. Dym O, Eisenberg D. Sequence-structure analysis of FAD-containing proteins. Protein Sci 2001; 10(9):1712-1728.
81. White DJ, Merod R, Thomasson B et al. GidA is an FAD-binding protein involved in development of Myxococcus xanthus. Mol Microbiol 2001; 42(2):503-517.
82. Delk AS, Rabinowitz JC. Biosynthesis of ribosylthymine in the transfer RNA of Streptococcus faecalis: a folate-dependent methylation not involving S-adenosylmethionine. Proc Natl Acad Sci USA 1975; 72(2):528-530.
83. Delk AS, Romeo JM, Nagle DP Jr et al. Biosynthesis of ribothymidine in the transfer RNA of Streptococcus faecalis and Bacillus subtilis. A methylation of RNA involving 5,10-methylenetetrahydrofolate. J Biol Chem 1976; 251(23):7649-7656.
84. Urbonavičius J, Skouloubris S, Myllykallio H et al. Identification of a novel gene encoding a flavin-dependent tRNA:m5U methyltransferase in bacteria - evolutionary implications. Nucleic Acids Res 2005; 33(13):3955-3964.
85. Urbonavičius J, Brochier-Armanet C, Skouloubris S et al. In vitro detection of the enzymatic activity of folate-dependent tRNA (Uracil-54,-C5)-methyltransferase: evolutionary implications. Methods Enzymol 2007; 425:103-119.
86. Delk AS, Nagle DP Jr, Rabinowitz JC. Methylenetetrahydrofolate-dependent biosynthesis of ribothymidine in transfer RNA of Streptococcus faecalis. Evidence for reduction of the 1-carbon unit by FADH2. J Biol Chem 1980; 255(10):4387-4390.
87. Myllykallio H, Lipowski G, Leduc D et al. An alternative flavin-dependent mechanism for thymidylate synthesis. Science 2002; 297(5578):105-107.
88. Graziani S, Xia Y, Gurnon JR et al. Functional analysis of FAD-dependent thymidylate synthase ThyX from Paramecium bursaria Chlorella virus-1. J Biol Chem 2004; 279(52):54340-54347.
89. Mathews II, Deacon AM, Canaves JM et al. Functional analysis of substrate and cofactor complex structures of a thymidylate synthase-complementing protein. Structure 2003; 11(6):677-690.
90. Carreras CW, Santi DV. The catalytic mechanism and structure of thymidylate synthase. Annu Rev Biochem 1995; 64:721-762.
91. Lee TT, Agarwalla S, Stroud RM. A unique RNA Fold in the RumA-RNA-cofactor ternary complex contributes to substrate selectivity and enzymatic function. Cell 2005; 120(5):599-611.
92. Alian A, Lee TT, Griner SL et al. Structure of a TrmA-RNA complex: A consensus RNA fold contributes to substrate selectivity and catalysis in m5U methyltransferases. Proc Natl Acad Sci USA 2008; 105(19):6876-6881.
93. Kealey JT, Gu X, Santi DV. Enzymatic mechanism of tRNA (m5U54)methyltransferase. Biochimie 1994; 76(12):1133-1142.
94. Garcia GA, Goodenough-Lashua DM. Mechanisms of RNA-modifying and -editing enzymes. In: Grosjean H, Benne R, eds. Modification and Editing of RNA. Washington, DC: American Society for Microbiology, 1998:135-168.
95. Clarke S. Protein isoprenylation and methylation at carboxyl-terminal cysteine residues. Annu Rev Biochem 1992; 61:355-386.
96. Taya Y, Nishimura S. Biosynthesis of 5-methylaminomethyl-2-thiouridylate. I. Isolation of a new tRNA-methylase specific for 5-methylaminomethyl-2-thiouridylate. Biochem Biophys Res Commun 1973; 51(4):1062-1068.
97. Marinus MG, Morris NR, Söll D et al. Isolation and partial characterization of three Escherichia coli mutants with altered transfer ribonucleic acid methylases. J Bacteriol 1975; 122(1):257-265.
98. Björk GR, Kjellin-Stråby K. General screening procedure for RNA modificationless mutants: isolation of Escherichia coli strains with specific defects in RNA methylation. J Bacteriol 1978; 133(2):499-507.
99. Björk GR, Kjellin-Stråby K. Escherichia coli mutants with defects in the biosynthesis of 5-methylaminomethyl-2-thio-uridine or 1-methylguanosine in their tRNA. J Bacteriol 1978; 133(2):508-517.
100. Hagervall TG, Björk GR. Genetic mapping and cloning of the gene (trmC) responsible for the synthesis of tRNA (mnm5s2U)methyltransferase in Escherichia coli K12. Mol Gen Genet 1984; 196(2):201-207.

101. Bujnicki JM, Oudjama Y, Roovers M et al. Identification of a bifunctional enzyme MnmC involved in the biosynthesis of a hypermodified uridine in the wobble position of tRNA. RNA 2004; 10(8):1236-1242.
102. Settembre EC, Dorrestein PC, Park JH et al. Structural and mechanistic studies on ThiO, a glycine oxidase essential for thiamine biosynthesis in Bacillus subtilis. Biochemistry 2003; 42(10):2971-2981.
103. Todone F, Vanoni MA, Mozzarelli A et al. Active site plasticity in D-amino acid oxidase: a crystallographic analysis. Biochemistry 1997; 36(19):5853-5860.
104. Ihsanawati, Nishimoto M, Higashijima K et al. Crystal structure of tRNA N2,N2-guanosine dimethyltransferase Trm1 from Pyrococcus horikoshii. J Mol Biol 2008; 383(4):871-884.
105. Sakamoto K, Kawai G, Niimi T et al. A modified uridine in the first position of the anticodon of a minor species of arginine tRNA, the argU gene product, from Escherichia coli. Eur J Biochem 1993; 216(2):369-375.
106. Takai K, Horie N, Yamaizumi Z et al. Recognition of UUN codons by two leucine tRNA species from Escherichia coli. FEBS Lett 1994; 344(1):31-34.
107. Björk GR. Stable RNA modification. In: Neidhardt FC, ed. Escherichia coli and Salmonella: Cellular and Molecular Biology. Washington, DC: American Society for Microbiology, 1996:861-886.
108. Andachi Y, Yamao F, Muto A et al. Codon recognition patterns as deduced from sequences of the complete set of transfer RNA species in Mycoplasma capricolum. Resemblance to mitochondria. J Mol Biol 1989; 209(1):37-54.
109. Sprinzl M, Horn C, Brown M et al. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res 1998; 26(1):148-153.
110. Yamao F, Muto A, Kawauchi Y et al. UGA is read as tryptophan in Mycoplasma capricolum. Proc Natl Acad Sci USA 1985; 82(8):2306-2309.
111. Martin RP, Sibler AP, Gehrke CW et al. 5-[[(carboxymethyl)amino]methyl]uridine is found in the anticodon of yeast mitochondrial tRNAs recognizing two-codon families ending in a purine. Biochemistry 1990; 29(4):956-959.
112. Watanabe K, Osawa S. tRNA sequences and variations in the genetic code. In: Söll D, Rajbhandary UL, eds. tRNA: Structure, Biosynthesis and Function. Washington, DC: American Society for Microbiology, 1995:225-250.
113. de Crécy-Lagard V, Marck C, Brochier-Armanet C et al. Comparative RNomics and modomics in Mollicutes: prediction of gene function and evolutionary implications. IUBMB Life 2007; 59(10):634-658.
114. Lustig F, Borén T, Guindy YS et al. Codon discrimination and anticodon structural context. Proc Natl Acad Sci USA 1989; 86(18):6873-6877.
115. Murao K, Ishikura H. A new uridine derivative located in the anticodon of tRNA1Gly from Bacillus subtilis. Nucleic Acids Res 1978; 1:s333-s338.
116. Björk GR, Hagervall TG. Transfer RNA modification. In: Böck A, Curtiss III R, Kaper JB et al, eds. EcoSal - Escherichia coli and Salmonella: Cellular and Molecular Biology. Washington, DC: American Society for Microbiology, 2008:4.6.2.
117. Osawa S, Muto A, Ohama T et al. Prokaryotic genetic code. Experientia 1990; 46(11-12):1097-1106.
118. Osawa S. Evolution of the genetic code. New York, Tokyo: Oxford science publications, 1995.
119. Ohama T, Inagaki Y, Bessho Y et al. Evolving genetic code. Proc Jpn Acad Ser B Phys Biol Sci 2008; 84(2):58-74.
120. Mittenhuber G. Comparative genomics of prokaryotic GTP-binding proteins (the Era, Obg, EngA, ThdF (TrmE), YchF and YihA families) and their relationship to eukaryotic GTP-binding proteins (the DRG, ARF, RAB, RAN, RAS and RHO families). J Mol Microbiol Biotechnol 2001; 3(1):21-35.
121. Gupta R. Halobacterium volcanii tRNAs. Identification of 41 tRNAs covering all amino acids and the sequences of 33 class I tRNAs. J Biol Chem 1984; 259(15):9461-9471.
122. Grosjean H, Gaspin C, Marck C et al. RNomics and Modomics in the halophilic archaea Haloferax volcanii: identification of RNA modification genes. BMC Genomics 2008; 9:470-495.

Chapter 29

Deciphering the Complex Enzymatic Pathway for Biosynthesis of Wyosine Derivatives in Anticodon of tRNAPhe

Jaunius Urbonavičius,* Louis Droogmans, Jean Armengaud and Henri Grosjean

Abstract

Wyosine derivatives are tricyclic ribonucleosides present exclusively at position 37 of tRNA specific for phenylalanine in most Eukarya (cytoplasmic only) and Archaea, but not Bacteria. They occur by posttranscriptional modification of encoded guanosine in the tRNA precursor. Depending on the organism examined, eight wyosine derivatives have been identified in naturally occurring tRNAPhe, eleven if one also includes biosynthetic intermediates. The large diversity of wyosine derivatives attests to the existence of complex biosynthetic routes that differ from one organism to another. In this review, we describe the chemical structures of these hypermodified guanosine derivatives and the biosynthetic pathway of the wyosine derivative found in Saccharomyces cerevisiae. Not surprisingly given their location at position 37, 3ʹ adjacent to the GAA anticodon, wyosine derivatives have been demonstrated to play an essential role in translation, particularly in the regulation of frameshifting.

Introduction

All transfer RNA genes sequenced so far harbor a purine (A or G) at position 37, adjacent to the 3ʹ side of the anticodon.1,2 After transcription this purine is often enzymatically modified into more complex derivatives, the nature of which depends on the sequence of the anticodon and the organism3,4 (recently reviewed in refs. 5 and 6). In the case of tRNAPhe (anticodon GAA, 31 sequences deposited in the tRNA databank, http://trnadb.bioinf.uni-leipzig.de/), two sets of unrelated compounds are found: i) isopentenyladenylate derivatives, such as isopentenyladenosine (i6A) in the cytoplasmic tRNAPhe of some eukaryotes and methylthiolated i6A and/or hydroxylated derivatives (io6A, ms2i6A, ms2io6A) in tRNAPhe of Bacteria and eukaryotic organelles; ii) a simple N1-methylguanine (m1G) in tRNAPhe of some Bacteria, Archaea and Eukaryotes and more complex derivatives of the hypermodified nucleoside wyosine (in fact of demethylwyosine, the minimalist derivative in this series, see below) in the cytoplasmic tRNAPhe of other Eukarya7 and Archaea (Fig. 1A and below). Recently, Bujnicki and coworkers have analyzed in great detail the occurrence and biosynthetic pathway of the isopentenylated adenosine derivatives in 63 organisms.8 Here, we focus on the structure and the biosynthetic pathway of the other family of purine-37 derivatives, wyosines.

*Corresponding Author: Jaunius Urbonavičius, Université Libre de Bruxelles, Laboratoire de Microbiologie, Institut de Recherches Microbiologiques J.-M. Wiame, Avenue E. Gryson 1, B-1070 Bruxelles, Belgium.

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.


Figure 1. Wyosine derivatives identified in naturally occurring tRNAPhe of Eukarya (from C to G) and Archaea (from G to J). Names and symbols are the conventional ones adopted from scientific literature. Numbering of atoms depends on whether it is used for the purines (conventional IUPAC numbering) or imidazopurines (as proposed by ref. 24). On the top right of the figure, part of tRNAPhe with its anticodon GAA and purine-37 is represented schematically. Inside the small boxes, various types of modified nucleotides found at position 37, 3ʹ-adjacent to anticodon in tRNAPhe of three Domains of Life are represented: Eukarya, Archaea (in bold) and in Bacteria.

Discovery of the So-Called ‘Y’ Base

In the flurry of excitement over the purification and sequencing of tRNAs, scientists in the 1960s observed some peculiarities in one fraction of nucleic acids containing a tRNA specific for phenylalanine and isolated from baker’s yeast or animal liver. This eukaryotic tRNAPhe was found to contain a highly fluorescent nucleoside.9-11 The presence of this peculiar base accounts for the remarkable hydrophobic behavior of tRNAPhe during column chromatography12 and has been shown to be essential for proper codon-anticodon interaction.11,13 Also, owing to the lability of its glycosidic bond, this compound was easily excised from tRNA by incubation under mildly acidic conditions (pH 2-4, 37˚C for a few hours) without breaking the tRNA phosphodiester backbone or affecting any other bases of the tRNAPhe.14 The structure of the acid-liberated ‘Y’ base of baker’s yeast was subsequently determined to be a tricyclic purine (imidazopurine) now designated wybutine.14,15 Since the acid treatment might have altered the chemistry of the isolated Y base, its structure was independently confirmed by total synthesis.16,17 The position of the glycosidic linkage was established by chemical analysis of the nucleoside obtained after mild enzymatic hydrolysis of purified S. cerevisiae tRNAPhe 18 and much later verified by chemical synthesis.19

Wybutosine (yW, Fig. 1C and Table 1), the nucleoside of wybutine, has UV absorption maxima at 234, 263 and 310 nm and a fluorescence emission maximum at 443 nm with excitation maxima at 239 and 318 nm at pH 7.5.20 In view of the steric hindrance between the N4-methyl group and the ribose moiety, it is likely that yW exists in tRNA exclusively in the anti conformation (not in the syn conformation as it is drawn in Fig. 1C), a conformation that has been confirmed in solution for the nucleoside by NMR21 and in the solid state by crystallography.22 Wyosine derivatives are highly photoreactive,23 a characteristic that has to be taken into account when testing the function of tRNAPhe. In this chapter, we use the numbering of the atoms as defined by Blobstein and coworkers24 for the tricyclic compounds when the imidazole is fused to the purine. This numbering differs from the IUPAC convention used for purines (compare the numberings in Figs. 1A and 1C).

Other Members of the Wyosine Families in Eukaryota

Experiments aimed at identifying the structures of the modified purine-37 in tRNAPhe were carried out on a large panel of eukaryotes, including the yeast Torulopsis utilis, animal liver (bovine, rat, chicken, rabbit), several plants (wheat germ, yellow lupine and maize seeds), insects (Drosophila melanogaster and Bombyx mori), the firmicute Mycoplasma kid and the aquatic fungus Geotrichum candidum. While a fluorescent tricyclic imidazopurine was detected in most cases, it was clear that the precise chemical structure depended on the organism and the method of isolation. Interestingly, the wyosine derivatives isolated from plants or animal liver were first reported to be β-hydroperoxywybutosine (o2yW, Fig. 1D and Table 1), distinguished by a hydroperoxy group within the lateral amino acid chain of wybutine.24-29 The same structure was later reported in tRNAPhe isolated from the aquatic fungus Geotrichum candidum.30 The presence of a hydroperoxide in tRNAPhe from L. luteus and wheat germ was supported by a positive Fe(SCN)2 coloring test.28,29 However, based on a negative Fe(SCN)2 coloring test and mass spectrometry data, Itaya and coworkers suggested that a hydroxyl group (OHyW, Fig. 1E and Table 1) instead of a hydroperoxide group was present in wybutosine of rat liver tRNAPhe 31 and later synthesized OHyW.32,33 However, because hydroperoxywybutosine (o2yW) can be formed during storage or manipulation of hydroxywybutosine (OHyW)33 and decomposition of o2yW leads to the formation of hydroxywybutosine (OHyW) and wybutosine (yW),29 the ultimate resolution of the relevant structure(s) may require the identification of the corresponding metabolic enzyme(s) forming one or the other of these two experimentally identified wyosine derivatives.

A simpler wyosine derivative, designated wyosine (imG), has been isolated and characterized from tRNAPhe of the yeast Torulopsis utilis (alternative names Candida utilis, Torula utilis, Pichia jadinii).34,35 Its structure (Fig. 1G and Table 1) has been unambiguously verified by several methods, including comparison with the chemically synthesized nucleoside.19 Like the wyosine derivatives described above, imG is characterized by the susceptibility of the glycosidic bond to acid-catalyzed hydrolysis34 and remarkable fluorescence when illuminated with a UV lamp.35

Finally, intermediates of wybutosine biosynthesis have been isolated from a variety of cell types. For example, when Vero cells (a cell line derived from African green monkey kidney) are grown in medium deprived of methionine (Met starvation), the yW (normally present in an important fraction of the cellular tRNAPhe) is absent; addition of methionine to the growth medium leads to reappearance of tRNAPhe fully modified with wybutosine.36 The tRNAPhe isolated from various tumor cells (e.g., Ehrlich ascites, neuroblastoma and Novikoff hepatoma) has been shown to be hypomodified at the lateral chain (OHyW*, Fig. 1F and Table 1) and to contain a fraction of tRNAPhe with 1-methylguanosine at position 37 (m1G, Fig. 1B).37-39 The presence of m1G in tRNAPhe has also been observed in a number of other cell types and organisms, including rat liver hepatomas,40 the previtellogenic oocyte of the fish Tinca tinca,41 the posterior silk gland of Bombyx mori,42 Mycoplasma kid43 and Drosophila melanogaster.44,45

Table 1. Wybutine and wyosine derivatives found at position 37, 3ʹ-adjacent to anticodon GAA of tRNAPhe in Eukarya (E) and Archaea (A)

| Common Name | Symbol | Full Name | Mr | Fig. |
|---|---|---|---|---|
| Wybutine (base) | Y | 1H-imidazo[1,2-a]purine core | 376.42 | - |
| Wybutosine (nucleoside); in E only | yW | (αS)-α-[(methoxy-carbonyl)amino]-4,6-dimethyl-9-oxo-3-β-D-ribofuranosyl-4,9-dihydro-3H-imidazo[1,2-α]-purine-7-butanoic acid methyl ester | 508.49 | 1C |
| HydroperoxyWybutosine; in E only | o2yW | (αS, βS)-β-hydroperoxy-α-[(methoxy-carbonyl)amino]-4,6-dimethyl-9-oxo-3-β-D-ribofuranosyl-4,9-dihydro-3H-imidazo[1,2-α]-purine-7-butanoic acid methyl ester | 540.48 | 1D |
| HydroxyWybutosine; in E only | OHyW | (αS, βS)-β-hydroxy-α-[(methoxy-carbonyl)amino]-4,6-dimethyl-9-oxo-3-β-D-ribofuranosyl-4,9-dihydro-3H-imidazo[1,2-α]-purine-7-butanoic acid methyl ester | 524.49 | 1E |
| Undermodified hydroxy-wybutosine; in E only | OHyW* | α-amino-β-hydroxy-4,6-dimethyl-9-oxo-3-β-D-ribofuranosyl-4,9-dihydro-3H-imidazo[1,2-α]-purine-7-butanoic acid | 446.45 | 1F |
| Wyosine; in E and A | imG | 4,6-dimethyl-3-β-D-ribofuranosyl-3,4-dihydro-9H-imidazo[1,2-α]-purine-9-one | 335.23 | 1G |
| 7-methyl-wyosine; in A only | mimG | 4,6,7-trimethyl-3-β-D-ribofuranosyl-3,4-dihydro-9H-imidazo[1,2-α]-purine-9-one | 349.34 | 1H |
| Iso-wyosine; in A only | imG2 | 6,7-dimethyl-3-β-D-ribofuranosyl-3,4-dihydro-9H-imidazo[1,2-α]-purine-9-one | 335.32 | 1I |
| 4-Demethyl-wyosine; in E and A | imG-14 | 6-methyl-3-β-D-ribofuranosyl-3,4-dihydro-9H-imidazo[1,2-α]-purine-9-one | 321.23 | 1J |

Wyosine Derivatives also Exist in Archaea

To date, the only tRNAPhe that has been sequenced from an archaeal organism originates from the euryarchaeon H. volcanii: it harbors an m1G37.46 However, wyosine derivatives have been identified in the tRNAPhe of other Archaea by analysis of tRNA enzymatic digests with combined liquid chromatography-mass spectrometry (LC-MS). The first wyosine derivative was found in digests of the hyperthermophilic Crenarchaeota Sulfolobus solfataricus, Thermoproteus neutrophilus and Pyrodictium occultum. Based on the comparison of the UV absorption spectrum, fluorescence properties and mass spectrometry with those of eukaryotic wyosine derivatives and of synthetic bases, a new wyosine derivative was identified47 as 7-methylwyosine (mimG, Fig. 1H). However, further analysis of eleven additional thermophilic Archaea, including phylogenetically diverse representatives of thermophilic methanogens and sulfur-metabolizing hyperthermophiles of the Euryarchaeota group (e.g., Methanobacterium thermoautotrophicum, Thermoplasma acidophilum and Archaeoglobus fulgidus), revealed that only a few Archaea (mainly the Crenarchaeota) contain mimG in their tRNAs,48 although demethylwyosine, which lacks the methyl group at N4, can also be detected. In enzymatic digests of tRNA from Sulfolobus solfataricus P2, Methanococcus thermolithotrophicus and Stetteria hydrogenophila, two new compounds have been identified: one corresponds to an isomer of wyosine (imG) and is designated isowyosine (imG2, Fig. 1I and Table 1), while the other is a minimalist form of imG/imG2 with a molecular mass of 321, designated imG-14 because its molecular mass is 14 Da less than that of wyosine imG (Mr = 335, Fig. 1J and Table 1).49 Pyrolobus fumarii, a submarine crenarchaeote which grows optimally at 106°C, has been shown to contain several wyosine derivatives, including mimG, imG2, imG-14 and imG (the same as in T. utilis, Fig. 1G).50

The imG-14 derivative (but not mimG or imG/imG2) has also been detected in the psychrotolerant archaeon Methanococcoides burtonii, which grows at 4-23˚C,51 demonstrating that imG-14 derivatives in Archaea are not exclusively synthesized in thermophilic or hyperthermophilic organisms. A compound of as yet unknown structure with Mr = 422, exhibiting a UV absorption spectrum characteristic of wyosine derivatives (designated N422), has been found in M. maripaludis and M. vannielii52 but not in the other Archaea analyzed so far. Thus, depending on the archaeon, at least four imG-14 derivatives can be identified in hydrolysates of bulk tRNA (imG, imG2, mimG (see Fig. 1, G-J) and possibly N422). Their structures appear to be unique to Archaea because of the presence of a simple methyl group (or none) at C7 of the imidazopurine (Figs. 1H and 1I) instead of the α-amino-α-carboxypropyl side chain found in eukaryal wybutosines (Fig. 1, C-F). Unlike all eukaryal wyosine derivatives identified so far, archaeal derivatives do not always harbor a methyl group at N4.
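The mass-offset naming used above (imG-14 is the compound 14 Da lighter than imG) can be illustrated with a short Python sketch. The Mr values are copied from Table 1; the helper name `nominal_mass_from_symbol`, the truncation to integer nominal masses and the assumption that any symbol of the form "parent-N" follows the same convention are ours, introduced only for illustration.

```python
# Relative molecular masses (Mr) copied from Table 1 of this chapter.
TABLE1_MR = {
    "yW": 508.49,
    "o2yW": 540.48,
    "OHyW": 524.49,
    "OHyW*": 446.45,
    "imG": 335.23,
    "mimG": 349.34,
    "imG2": 335.32,
    "imG-14": 321.23,
}


def nominal_mass_from_symbol(symbol, table=TABLE1_MR):
    """Return an integer nominal mass for a nucleoside symbol.

    Assumption: a symbol written 'parent-N' denotes a compound that is
    N Da lighter than 'parent' (the reading given in the text for imG-14).
    Symbols without such a suffix are looked up directly in the table.
    """
    parent, sep, offset = symbol.rpartition("-")
    if sep and offset.isdigit() and parent in table:
        return int(table[parent]) - int(offset)
    return int(table[symbol])


if __name__ == "__main__":
    # The name-derived mass agrees with the tabulated Mr of imG-14.
    print(nominal_mass_from_symbol("imG-14"), int(TABLE1_MR["imG-14"]))
```

Deriving the mass from the name rather than storing it separately makes it easy to check that a symbol and a tabulated Mr are mutually consistent.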

Biosynthesis of Wyosine Derivatives in Eukarya

Origin of Various Carbon Atoms in Wybutosine

The biosynthetic pathway of imG-14 derivatives remained largely unknown for many years. Early work with S. cerevisiae demonstrated that wybutosine (yW) is derived from posttranscriptional modification of the encoded guanosine of the tRNAPhe precursor,53,54 a conclusion that appears evident today with the many sequences of tRNAPhe genes available. Subsequent work demonstrated incorporation of the α-amino-α-carboxypropyl group (symbol acp) from methionine into wybutosine.55 NMR analysis of tRNAPhe isolated from yeast grown in the presence of (methyl-13C)-S-adenosylmethionine (AdoMet) demonstrated that each of the two methyl groups of the acp side chain, one of the two carbons at position 6 or 7 of the imidazo ring and the methyl group at position N4 of the guanine moiety are all derived from methionine, but the methyl group attached to the C6 atom is not.56 Identical results were observed in the biosynthesis of methylwyosine (mimG) in Sulfolobus acidocaldarius grown at 65˚C in the presence of (methyl-13C)-AdoMet.57

Understanding how the imidazo ring is formed has been (and still is) a challenge. Early studies demonstrated incorporation of radioactivity into tRNAPhe of Vero cells (a monkey kidney cell line) incubated with labeled lysine.58 This observation was interpreted in the context of formation of the third ring and it was proposed that lysine is converted into α-aminoadipic acid semialdehyde (a C3 precursor), which then reacts with the guanosine. Demonstration that the C7 atom of the imidazo ring in fact originates from N1-methylation of guanosine-37 came from experiments in which a chimeric yeast tRNAPhe harboring an unmodified G37 in place of yW37 was microinjected into the cytoplasm of Xenopus laevis oocytes. After microinjection, rapid formation of N1-methylguanosine was observed (Fig. 1B), followed by its successive transformation into two wyosine derivatives of unknown structure.59 Later, the gene encoding the AdoMet-dependent tRNA:m1G37 methyltransferase Trm5p was identified in S. cerevisiae (see below) and deletion of the TRM5 gene was shown to abolish not only the formation of m1G37 in many cellular tRNAs and m1I37 in tRNAAla, but also the formation of yW37 in tRNAPhe, consistent with the first step of wybutine biosynthesis being the formation of m1G37.60 While the hypothesis of Pergolizzi concerning the origin of the atoms C6, C7 and the attached methyl group of the imidazo ring is therefore not valid, the possibility remains that the C2 precursor used for the imidazo ring formation is connected to lysine metabolism. Identification of the next steps of wyosine metabolism came only recently from a comparative genomics approach followed by genetic/biochemical verification.

Figure 2. Wybutosine biosynthesis pathway. The various steps leading to the formation of wybutosine (yW) in the yeast S. cerevisiae (taken from ref. 67), yeast Torula utilis and plants/higher eukaryotes are shown. For the enzymatic steps where the cofactors involved still have to be discovered, they are symbolized by a question mark. AdoMet means S-Adenosyl-L-Methionine; FMN means flavin-mononucleotide; (4Fe-4S) corresponds to the iron-sulfur cluster (see text).

Identification of Genes Coding for Enzymes of the yW Metabolism

The first gene encoding a protein involved in wybutosine biosynthesis, trmD, which encodes tRNA:m1G37 methyltransferase, was identified in Archaea (M. jannaschii and M. vannielii) by Björk and coworkers in the course of long-term studies on m1G37 modification in tRNA.60 The yeast ortholog TRM5 was identified by sequence similarity to the archaeal genes. Interestingly, bacterial TrmD is evolutionarily unrelated to the archaeal/eukaryal enzymes.61,62 These enzymes belong to the large Rossmann fold-containing superfamily of methyltransferases, while TrmD and all bacterial homologues belong to the SPOUT fold-containing superfamily of methyltransferases63 (see also chapter by Czerwoniec et al in this volume).

The discovery of additional genes in wyosine biosynthesis has been achieved in recent years through the application of various genome mining techniques. For example, comparison of orphan genes in the genomes of organisms that synthesize wyosine derivatives (e.g., M. jannaschii, S. cerevisiae, S. pombe, H. sapiens and A. thaliana) with those of organisms that do not (D. melanogaster, E. coli and B. subtilis) led to the identification of a single gene family belonging to the Cluster of Orthologous Genes COG073. Deletion of the S. cerevisiae gene belonging to this COG (YPL207w) and nucleotide analysis of the tRNAPhe isolated from the mutant strain demonstrated the presence of m1G37 instead of wyosine, leading to the conclusion that YPL207w catalyzes the second step of wyosine biosynthesis64 (Fig. 2).

In another approach, yeast genes encoding putative AdoMet-dependent methyltransferases harboring the characteristic seven β-strand structures were systematically deleted and the nucleotide content of the tRNA isolated from the gene-deleted strains was analyzed. One of the candidate mutant strains (ΔYML005w) was shown to lack the yW derivative while accumulating the minimalist demethylwyosine (imG-14)65 (Fig. 1J), demonstrating that imG-14, initially discovered in archaeal tRNAs (see above), is a genuine intermediate of wybutine biosynthesis in S. cerevisiae. The newly discovered enzyme (YML005w, renamed Trm12p) was proposed to be involved in one of the steps following the ring cyclization, possibly the addition of the α-amino-α-carboxypropyl group (acp) to the C7 atom of imG-14 or the methylation of the N4 atom of the guanine ring.

Finally, in an exhaustive, systematic analysis of bulk RNAs isolated from 351 different S. cerevisiae deletion mutants by high performance liquid chromatography coupled to mass spectrometry (LC-MS), termed ‘ribonucleome analysis’,66 only four strains were found to lack the yW modification. In addition to the YPL207w and YML005w genes, previously shown to be involved in the formation of wybutosine, deletion of the YGL050w and YOL141w genes was also shown to disrupt the pathway leading to yW formation. In the corresponding mutant strains additional wyosine derivatives were observed: yW-86 in the tRNA from the ΔYGL050w strain and yW-72 in the ΔYOL141w strain (see explanation below and Fig. 2, also ref. 67). Using the corresponding recombinant proteins Yml005w, Ygl050w and Yol141w (renamed Tyw2p, Tyw3p and Tyw4p respectively), three steps of the wybutosine (yW) biosynthetic pathway were reconstituted in vitro.67
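The comparative genomics argument described above (keep only gene families present in every wyosine-producing organism and absent from every organism that lacks wyosine) amounts to simple set arithmetic. The following Python sketch illustrates the idea under stated assumptions: the family labels (`famA`, `famW`, and so on) and the presence/absence data are hypothetical toy values, not real annotations, and the function name `candidate_families` is ours.

```python
def candidate_families(genomes, producers, non_producers):
    """Gene families found in all producers and in no non-producer."""
    shared = set.intersection(*(genomes[name] for name in producers))
    excluded = set.union(*(genomes[name] for name in non_producers))
    return shared - excluded


# Hypothetical presence/absence data: each organism maps to the set of
# gene-family labels detected in its genome (labels are placeholders).
toy_genomes = {
    "M. jannaschii": {"famA", "famB", "famW"},
    "S. cerevisiae": {"famA", "famC", "famW"},
    "S. pombe": {"famA", "famC", "famW"},
    "H. sapiens": {"famA", "famC", "famD", "famW"},
    "A. thaliana": {"famA", "famC", "famW"},
    "D. melanogaster": {"famA", "famC", "famD"},
    "E. coli": {"famA", "famE"},
    "B. subtilis": {"famA", "famE"},
}

# Grouping of organisms as given in the text.
producers = ["M. jannaschii", "S. cerevisiae", "S. pombe",
             "H. sapiens", "A. thaliana"]
non_producers = ["D. melanogaster", "E. coli", "B. subtilis"]

if __name__ == "__main__":
    print(candidate_families(toy_genomes, producers, non_producers))
```

With the toy data, only the placeholder family `famW` survives the filter. In practice such a strict filter is sensitive to annotation gaps, so candidates obtained this way still need the genetic and biochemical verification described in the text.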

The Biosynthetic Route of Wybutosine in S. cerevisiae

Integration of all the above information allows the following scheme to be proposed for the multistep enzymatic formation of wybutosine derivatives (Fig. 2). The complete biosynthetic route encompasses 7 steps (most taken from ref. 67):

(1) Formation of m1G37 catalyzed by tRNA:m1G methyltransferase (Trm5p). The crystal structure of only the archaeal homolog (Methanocaldococcus jannaschii62) is known and raises interesting questions about the mechanism of action of this AdoMet-dependent methyltransferase (for the human homologue Trm5p, see also ref. 68).

(2) Addition of a C2 unit followed by cyclization of the third ring, catalyzed by Tyw1p. This enzyme contains a flavodoxin domain in its N-terminus and a radical-AdoMet domain including one (or possibly two) (4Fe-4S) cluster(s) in its C-terminus. The FMN cofactor in the flavodoxin domain probably serves as an electron storage unit. Mutations in one of the (4Fe-4S) cluster motifs abolished yW biosynthesis in vivo, demonstrating that the cluster is essential for catalytic activity.67 Crystal structures of the archaeal Tyw1p homologues lacking the N-terminal flavodoxin domain have recently been solved from M. jannaschii and Pyrococcus horikoshii.69,70 Their overall structures are similar to those of other radical-AdoMet enzymes71,72 and are consistent with the predicted enzymatic activity for imidazo ring formation (for details see chapter by Atta et al in this volume). The identity of the two-carbon donor in this reaction remains to be determined.

(3) Transfer of the α-amino-α-carboxypropyl group (acp) from the methionine moiety of AdoMet to the C7 atom of imG-14, catalyzed by Tyw2p. This is consistent with the earlier observation that the acp moiety of yW originates from methionine.55

(4) AdoMet-dependent methylation of yW-86 (7-aminocarboxypropyl-demethylwyosine) to give yW-72 (7-aminocarboxypropylwyosine) by Tyw3p. In S. cerevisiae it appears that the Tyw2p activity occurs before Tyw3p and not vice versa, while in T. utilis imG is the apparent final modified nucleoside, demonstrating that in some organisms Tyw3p can catalyze the AdoMet-dependent conversion of imG-14 to imG (Fig. 2, dashed arrow).

(5,6) The AdoMet-dependent conversion of yW-72 (aminocarboxypropylwyosine) to yW-58 (aminocarboxypropylwyosine methyl ester) and further to yW (methoxycarbonyl-aminocarboxypropylwyosine methyl ester), catalyzed by Tyw4p. Only a scant amount of the yW-58 intermediate is formed in vitro. Still, it demonstrates that recombinant Tyw4p has an apparent AdoMet-dependent methylation activity. The fact that yW is a major product of the Tyw4p-catalyzed reaction, together with the absence of detectable yW-14 or other possible intermediates, indicates that the multistep enzymatic formation of yW from yW-72 is most probably triggered by AdoMet-dependent methylation. Since a small quantity of another protein involved in methoxycarbonylation of yW-58 might copurify with the his-tagged Tyw4p protein on a Ni-chelating column, Tyw4p was further purified by anion exchange chromatography. The protein purified in this way directly converts yW-72 to yW with production of only a low amount of the yW-58 intermediate, suggesting that Tyw4p is a bifunctional enzyme catalyzing both methylation of the α-carboxyl group and methoxycarbonylation of the α-amine group in the lateral chain of wyosine.

(7) Hydro(pero)xylation of wybutosines. The enzymes catalyzing such reactions in plants and higher eukaryotes are unknown.

Altogether, at least five molecules of AdoMet are consumed in the biosynthesis, making wybutosine perhaps the most costly modified nucleotide formed in a single cellular tRNA.73 To conclude this part of the chapter, Figure 3 summarizes the present knowledge about the origin of various ‘building blocks’ of wyosine derivatives in Eukarya and Archaea.

Figure 3. Origins of ‘building blocks’ in wybutosine. This figure complements Figure 2 concerning the biosynthetic pathway. Various ‘building blocks’ of wyosine derivatives in Eukarya and Archaea are presented.
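The first six steps of the route described above can be written down as an ordered list, which makes two things easy to check: that the mass-offset names of the intermediates are mutually consistent, and which nucleoside is expected to accumulate when a given enzyme is missing. The Python sketch below is an illustration only. The nominal mass shifts per step (+14 for a methyl group, +24 for a C2 unit, +101 for the acp group, +58 for a methoxycarbonyl group) and the nominal mass of 283 for guanosine are our assumptions based on standard nominal atomic masses, not values given in this chapter, and the names `Step`, `PATHWAY`, `nominal_masses` and `accumulated_in_deletion` are hypothetical.

```python
from typing import NamedTuple


class Step(NamedTuple):
    enzyme: str
    product: str
    mass_shift: int  # nominal Da gained by the nucleoside (assumed value)


# Assumption: nominal mass of unmodified guanosine.
GUANOSINE_NOMINAL_MASS = 283

# Steps (1) to (6) of the S. cerevisiae route as listed in the text.
PATHWAY = [
    Step("Trm5p", "m1G", 14),     # (1) N1-methylation of G37
    Step("Tyw1p", "imG-14", 24),  # (2) C2 unit added, third ring closed
    Step("Tyw2p", "yW-86", 101),  # (3) acp group transferred from AdoMet
    Step("Tyw3p", "yW-72", 14),   # (4) methylation of yW-86
    Step("Tyw4p", "yW-58", 14),   # (5) methyl ester formation
    Step("Tyw4p", "yW", 58),      # (6) methoxycarbonylation
]


def nominal_masses(pathway=PATHWAY, start=GUANOSINE_NOMINAL_MASS):
    """Accumulate the assumed mass shifts along the pathway."""
    masses, mass = {}, start
    for step in pathway:
        mass += step.mass_shift
        masses[step.product] = mass
    return masses


def accumulated_in_deletion(enzyme, pathway=PATHWAY):
    """Nucleoside expected at position 37 when `enzyme` is absent:
    the product of the last step before the first blocked step."""
    previous = "G"
    for step in pathway:
        if step.enzyme == enzyme:
            return previous
        previous = step.product
    raise ValueError(f"{enzyme} is not part of the pathway")


if __name__ == "__main__":
    print(nominal_masses())
    for name in ("Trm5p", "Tyw1p", "Tyw2p", "Tyw3p", "Tyw4p"):
        print(name, "->", accumulated_in_deletion(name))
```

Under these assumptions the accumulated masses reproduce the nominal Mr of imG-14 (321) and yW (508) from Table 1, and each intermediate named yW-N comes out N Da lighter than yW. The predicted accumulation products (m1G without Tyw1p, imG-14 without Tyw2p, yW-86 without Tyw3p, yW-72 without Tyw4p) match the deletion-mutant observations reported in the text.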

Role of Wyosine Derivatives During Translation Process

The high energetic cost and the requirement of many specific enzymes for its biogenesis beg the question of the cellular function of wybutosine in tRNAPhe. Its location exclusively at position 37, 3ʹ-adjacent to the GAA anticodon, suggests a function in the decoding process on the ribosome. However, since there is significant structural diversity in the wyosine derivatives found in Eukarya and Archaea (and Bacteria completely lack wyosine derivatives), how these nucleosides might participate in decoding is not immediately obvious. It appears from many studies that, in general, the modification status of tRNAPhe, especially at purine-37, plays a role in modulating the stability of the codon-anticodon interaction by dangling end-type base stacking (stabilization function74). Also, because most modified purines at position 37 cannot base pair in a Watson-Crick mode, their presence 3ʹ-adjacent to the anticodon restricts the tRNA to base pairing with the in-frame codon triplet, thus limiting (but not completely avoiding, see below) the risk of frameshifting during translation (‘antislip’ function, reviewed in refs. 6,75,76). In the special case of tRNAPhe (anticodon GAA), which has to read ‘potentially weakly binding’ UUU/C codons, sometimes in a run of ‘slippy U-rich’ codon contexts, there may be a particular need for the ‘stabilization and antislip’ functions of highly hydrophobic bases like the wyosine derivatives.

Notably, the level of modification at purine-37, and especially of the wybutosine derivatives in eukaryal tRNAPhe, depends on cell growth or stress conditions and the availability of the cofactor(s) needed for their enzymatic formation (see references above). Thus, the presence or absence of wybutosine derivatives in tRNAPhe may affect, and possibly regulate, the expression level of certain proteins. For example, in several retroviruses the production of gag-pol or pro-pol fusion proteins requires -1 frameshifting at the so-called ‘programmed frameshifting’ site U-UUU-UUA (the slippery sequence read by tRNAPhe is underlined).77 Interestingly, the tRNAPhe isolated from HIV-infected H9 cells lacks the wyosine derivatives.78 Also, using a rabbit reticulocyte in vitro translation system and the ‘shifty’ A-AAU-UUU sequence, Hatfield and collaborators demonstrated that the rabbit reticulocyte tRNAPhe bearing m1G37 instead of yW37 stimulates -1 frameshifting fourfold.79,80 However, when tRNAPhe from the S. cerevisiae trm5 mutant, which contains G37, was tested in vivo with another slippery sequence, U-UUU-UUA, no significant difference in the level of frameshifting was observed when compared with the wild type tRNAPhe containing yW37.81 These contradictory observations indicate that either the consequences of a lack of yW37 depend on the position of the U-UUU sequence in the heptameric shifty sequence, or the two biological systems (in vitro or in vivo) are not comparable.

Recently, Schimmel and coworkers demonstrated a clear correlation between the degree of modification of G37 in yeast tRNAPhe and the efficiency of -1 frameshifting.82 Using an in vivo system based on the S. cerevisiae mutants ΔTYW1 and ΔTYW2, which accumulate tRNAPhe bearing m1G or imG-14, respectively (see Fig. 2), they probed all six naturally occurring shifty sequences in SCV-LA virus (G-G(G,A,U) U-UU(U,C)), followed by a characteristic pseudoknot that was shown to favor the frameshift event. The results were very compelling: tRNAPhe bearing m1G37 was more prone to frameshifting (up to 35% in some cases) than the one bearing imG-14 at position 37 (about 25%), which itself was more prone to frameshifting than the wild type tRNAPhe containing yW37 (around 15%). It now remains to explain these data at the molecular level and, importantly, to understand why some types of cells do not contain wyosine derivatives and have m1G37 instead.
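The heptameric shifty sequence quoted above, G-G(G,A,U) U-UU(U,C), can be turned into a regular expression to locate candidate sites in an mRNA and to show which codons are read before and after a -1 slip. The Python sketch below is a minimal illustration, assuming the pattern is read as GG[GAU]UUU[UC]; the function names are ours, and the scan ignores reading frame and the downstream pseudoknot, both of which matter for real frameshifting.

```python
import re

# Heptamer transcribed from the text: G-G(G,A,U) U-UU(U,C).
# A lookahead is used so that overlapping candidates are also reported.
SLIPPERY_HEPTAMER = re.compile(r"(?=(GG[GAU]UUU[UC]))")


def find_slippery_sites(mrna):
    """Return (0-based position, heptamer) for each candidate site.
    DNA input is accepted and converted to RNA (T -> U)."""
    seq = mrna.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in SLIPPERY_HEPTAMER.finditer(seq)]


def codons_before_and_after_slip(heptamer):
    """For a heptamer X-XXY-YYZ, return the two codons read in the
    0 frame (XXY, YYZ) and after a -1 slip (XXX, YYY)."""
    zero_frame = (heptamer[1:4], heptamer[4:7])
    minus_one = (heptamer[0:3], heptamer[3:6])
    return zero_frame, minus_one


if __name__ == "__main__":
    for position, site in find_slippery_sites("AUGGGAUUUCAA"):
        print(position, site, codons_before_and_after_slip(site))
```

In the 0 frame the second codon of each heptamer is UUU or UUC, the codons read by tRNAPhe; after the slip the same tRNA faces UUU, which is why the strength of the codon-anticodon interaction at this site, and hence the modification at position 37, is expected to influence the frequency of the event.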

Conclusion and Future Prospects

Wyosine and related derivatives are found exclusively at position 37, 3ʹ-adjacent to anticodon GAA in tRNAPhe of many Eukarya and Archaea. They are absent in tRNAPhe of Bacteria, which instead harbor m1G37. In Figure 4, arrows indicate the direction of chemical complexity of the different derivatives found in the three phylogenetic domains, from simple m1G to the more elaborate imG-14, 7-methylwyosine (mimG) in Archaea and wybutosine derivatives (yW and o2yW/OHyW) in Eukarya. Biosynthesis of these guanosine derivatives, especially in higher eukaryotes and plants, involves many enzymatic steps and is energetically costly in terms of AdoMet consumption. Their function is clearly to facilitate and regulate the production of proteins at the translation level (stabilization of codon-anticodon interaction and avoidance of slippage out of the reading frame). However, there is also evidence that wyosine biosynthetic enzymes might be associated with other functions. For example, overexpression of the TRM12 gene, which encodes the human homologue of Tyw2p (step 3 in Fig. 2), has been observed in human breast cancer cells.83 Since it has been previously observed that in some tumors a hypomodified tRNAPhe (bearing m1G37) is utilized in translation instead of the fully modified isoacceptor,84 overproduction of this tRNA modification enzyme may trap a fraction of the imG-14 harbouring tRNAPhe, leaving free the m1G37-bearing variant. Thus, exploring the origin of the overexpression of TRM12 in cancer cells may reveal novel pathways of tumorigenesis.

While considerable progress has been made in recent years in elucidating the wyosine biosynthetic pathway, much remains to be done. For example, the two-carbon donor for the radical-AdoMet enzyme Tyw1p (step 2 in Fig. 2) has yet to be identified, and the donor of the methoxycarbonyl group (step 6 in Fig. 2) and the enzyme(s) involved in the hydro(pero)xylation of wybutosine (step 7 in Fig. 2) in plants and higher eukaryotes are still unknown. Moreover, the potential exists for the organization of wyosine biosynthetic enzymes into a multiprotein complex (metabolon).67 At least in plants, genes coding for homologues of Tyw2p, Tyw3p and the C-terminal domain of Tyw4p are fused and code for a large ‘TYW3-4C-2’ protein.67 The close proximity of enzyme active sites and substrates within multi-enzyme complexes generally enables the reactions to proceed more efficiently and without accumulation of intermediates. Crystallization of such a complex should shed light on how the successive reactions are performed in a sequential manner. The three-dimensional structures of several wyosine and wybutosine biosynthetic enzymes have been solved, but so far only for archaeal homologues.62,69,70 Presently, the set of genes encoding enzymes of the wybutosine biosynthetic pathway is known only in S. cerevisiae. There are several different imG-14 derivatives in eukaryal and archaeal tRNA, and searching for homologous genes in fully sequenced genomes (53 archaeal and 21 eukaryal ones are available at http://img.jgi.doe.gov, August 2008) and correlating them with the presence of imG-14 derivatives in each of these organisms may reveal the existence of alternative metabolic routes and provide insight into the emergence and evolution of this huge family of genes.

Figure 4. Distribution of modified purines found in tRNAPhe in organisms of the three domains of life. Symbols are explained in the text. Arrows indicate the direction of chemical complexity. The first basic transition, common to both eukaryal and archaeal organisms, is the formation of imG-14 from m1G. Then, depending on the domain of life, different types of wyosine derivatives are found: imG2 and mimG are found only in Archaea, whereas the wybutosine derivatives (yW, OHyW, OHyW* and o2yW) are found in Eukarya (cytoplasmic tRNAPhe only). Presence of isopentenylate derivatives is characteristic of Bacteria and some Eukarya. In Archaea, there is no tRNAPhe harboring A37; only G37 derivatives are found.

Acknowledgements

We thank Jef Rozenski (Katholieke Universiteit Leuven, Belgium) for suggestions regarding the nomenclature of wybutosine derivatives and Dirk Iwata-Reuyl (Portland State University, Portland, Oregon, USA) for advice and considerable improvements to the text. Current work on wyosine derivatives in the LD laboratory is financed by the Fonds pour la Recherche Fondamentale Collective (FRFC), Fonds Jean Brachet Recherche, Fonds E. Defay and Fonds D. et A. Van Buuren. JA thanks the Commissariat à lʹEnergie Atomique for financial support. JU was supported by a postdoctoral fellowship from the FRFC and by a FEBS Distinguished Young Investigator Award. HG (emeritus scientist) thanks Prof. Jean-Pierre Rousset from Université Paris-Sud for providing facilities to continue working in his laboratory.

References

1. Marck C, Grosjean H. tRNomics: analysis of tRNA genes from 50 genomes of Eukarya, Archaea and Bacteria reveals anticodon-sparing strategies and domain-specific features. RNA 2002; 8(10):1189-1232.
2. Juhling F, Morl M, Hartmann RK et al. tRNAdb 2009: compilation of tRNA sequences and tRNA genes. Nucleic Acids Res 2009; 37(Database issue):D159-62. Epub 2008 Oct 28.
3. Nishimura S. Minor components in transfer RNA: their characterization, location and function. Prog Nucleic Acid Res Mol Biol 1972; 12:49-85.
4. Grosjean H, Cedergren RJ, McKay W. Structure in tRNA data. Biochimie 1982; 64(6):387-397.
5. Bjork GR, Hagervall T. Transfer RNA Modification. In: Curtis R. I, ed. Escherichia coli and Salmonella: Cellular and molecular biology. Washington, D.C.: ASM Press, 2005:Module 4.6.2.
6. Agris PF. Decoding the genome: a modified view. Nucleic Acids Res 2004; 32(1):223-238.
7. Fairfield SA, Barnett WE. On the similarity between the tRNAs of organelles and prokaryotes. Proc Natl Acad Sci USA 1971; 68(12):2972-2976.
8. Kaminska KH, Baraniak U, Boniecki M et al. Structural bioinformatics analysis of enzymes involved in the biosynthesis pathway of the hypermodified nucleoside ms(2)io(6)A37 in tRNA. Proteins 2008; 70(1):1-18.
9. Rajbhandary UL, Chang SH, Stuart A et al. Studies on polynucleotides, LXVIII. The primary structure of yeast phenylalanine transfer RNA. Proc Natl Acad Sci USA 1967; 57(3):751-758.
10. Yoshikami D, Katz G, Keller EB et al. A fluorescence assay for phenylalanine transfer RNA. Biochim Biophys Acta 1968; 166(3):714-717.
11. Thiebe R, Zachau HG. A specific modification next to the anticodon of phenylalanine transfer ribonucleic acid. Eur J Biochem 1968; 5(4):546-555.
12. Wimmer E, Maxwell IH, Tener GM. A simple method for isolating highly purified yeast phenylalanine transfer ribonucleic acid. Biochemistry 1968; 7(7):2623-2628.
13. Ghosh K, Ghosh HP. Role of modified nucleoside adjacent to 3ʹ-end of anticodon in codon-anticodon interaction. Biochem Biophys Res Commun 1970; 40(1):135-143.
14. Thiebe R, Zachau HG, Baczynskyj L et al. Study on the properties and structure of the modified base Y+ of yeast tRNAPhe. Biochim Biophys Acta 1971; 240(2):163-169.
15. Nakanishi K, Furutachi N, Funamizu M et al. Structure of the fluorescent Y base from yeast phenylalanine transfer ribonucleic acid. J Am Chem Soc 1970; 92(26):7617-7619.
16. Funamizu M, Terahara A, Feinberg AM et al. Total synthesis of dl-Y base from yeast phenylalanine transfer ribonucleic acid and determination of its absolute configuration. J Am Chem Soc 1971; 93(24):6706-6708.
17. Itaya T, Mizutani A. Studies on the synthesis of the fluorescent bases from phenylalanine transfer ribonucleic acids. Nucleic Acids Symp Ser 1984; (15):13-15.
18. Blobstein SH, Gebert R, Grunberger D et al. Structure of the fluorescent nucleoside of yeast phenylalanine transfer ribonucleic acid. Arch Biochem Biophys 1975; 167(2):668-673.
19. Itaya T, Kanai T, Iida T. Practical synthesis of wybutosine, the hypermodified nucleoside of yeast phenylalanine transfer ribonucleic acid. Chem Pharm Bull (Tokyo) 2002; 50(4):530-533.
20. Maelicke A, von der Haar F, Sprinzl M et al. The structure of the anticodon loop of tRNAPhe from yeast as deduced from spectroscopic studies on oligonucleotides. Biopolymers 1975; 14(1):155-171.
21. Kan LS, Ts’o PO, von der Haar F et al. Proton magnetic resonance studies on the conformation of the hexanucleotide, GmpApApYpApsiP and related fragments from the anticodon loop of baker’s yeast phenylalanine transfer ribonucleic acid. Biochemistry 1975; 14(14):3278-3291.
22. Suddath FL, Quigley GJ, McPherson A et al. Three-dimensional structure of yeast phenylalanine transfer RNA at 3.0 angstroms resolution. Nature 1974; 248(443):20-24.
23. Paszyc S, Rafalska M. Photochemical properties of Yt base in aqueous solution. Nucleic Acids Res 1979; 6(1):385-397.
24. Blobstein SH, Grunberger D, Weinstein IB et al. Isolation and structure determination of the fluorescent base from bovine liver phenylalanine transfer ribonucleic acid. Biochemistry 1973; 12(2):188-193.
25. Dudock BS, Katz G, Taylor EK et al. Primary structure of wheat germ phenylalanine transfer RNA. Proc Natl Acad Sci USA 1969; 62(3):941-945.
26. Nakanishi K, Blobstein S, Funamizu M et al. Structure of the “peroxy-Y base” from liver tRNAPhe. Nat New Biol 1971; 234(47):107-109.
27. Fink LM, Lanks KW, Goto T et al. Comparative studies on mammalian and yeast phenylalanine transfer ribonucleic acids. Biochemistry 1971; 10(10):1873-1878.
28. Feinberg AM, Nakanishi K, Barciszewski J et al. Isolation and characterization of peroxy-Y base from phenylalanine transfer ribonucleic acid of the plant, Lupinus luteus. J Am Chem Soc 1974; 96(25):7797-7780.
29. Barciszewska M, Kaminek M, Barciszewski J et al. Lack of cytokinin activity of Y-type bases isolated from phenylalanine specific tRNAs. Plant Science Letters 1981; 20:387-392.

434

DNA and RNA Modii cation Enzymes

30. Moshizuki A, Omata Y, Miyazawa Y. Structure determination of the luorescent base from Geotrichum candidum phenylalanine tRNA. Bull Chem Soc Jpn 1980; 53:813-814. 31. Kasai H, Yamaizumi Z, Kuchino Y et al. Isolation of hydroxy-Y base from rat liver tRNAPhe. Nucleic Acids Res 1979; 6(3):993-999. 32. Itaya T, Watanabe N, Mizutani A. Studies on the synthesis of the hypermodiied base isolated from rat liver phenylalanine transfer ribonucleic acid. Nucleic Acids Symp Ser 1986(17):49-51. 33. Itaya T, Kanai T. Synthesis and structure of the hypermodiied nucleoside of rat liver phenylalanine transfer ribonucleic Acid. Chem Pharm Bull (Tokyo) 2002; 50(10):1318-1326. 34. Takemura S, Kasai H, Goto M. Nucleotide sequence of the anticodon region of Torulopsis phenylalanine transfer RNA. J Biochem 1974; 75(5):1169-1172. 35. Kasai H, Goto M, Ikeda K et al. Structure of wye (Yt base) and wyosine (Yt) from Torulopsis utilis phenylalanine transfer ribonucleic acid. Biochemistry 1976; 15(4):898-904. 36. Pergolizzi RG, Engelhardt DL, Grunberger D. Formation of phenylalanine transfer RNA lacking the wye base in Vero cells during methionine starvation. J Biol Chem 1978; 253(18):6341-6343. 37. Salomon R, Giveon D, Kimhi Y et al. Abundance of tRNAPhe lacking the peroxy Y-base in mouse neuroblastoma. Biochemistry 1976; 15(24):5258-5262. 38. Kuchino Y, Kasai H, Yamaizumi Z et al. Under-modiied Y base in a tRHAPhe isoacceptor observed in tumor cells. Biochim Biophys Acta 1979; 565(1):215-218. 39. Kuchino Y, Borek E, Grunberger D et al. Changes of posttranscriptional modiication of wye base in tumor-speciic tRNAPhe. Nucleic Acids Res 1982; 10(20):6421-6432. 40. Grunberger D, Weinstein IB, Mushinski JF. Deiciency of the Y base in a hepatoma phenylalanine tRNA. Nature 1975; 253(5486):66-67. 41. Mazabraud A. Deficiency of the peroxy-Y base in oocyte phenylalanine tRNA. FEBS Lett 1979; 100(2):235-240. 42. Keith G, Dirheimer G. Primary structure of Bombyx mori posterior silkgland tRNAPhe. 
Biochem Biophys Res Commun 1980; 92(1):109-115. 43. Kimball ME, Szeto KS, Soll D. he nucleotide sequence of phenylalanine tRNA from Mycoplasma sp. (Kid). Nucleic Acids Res 1974; 1(12):1721-1732. 44. White BN, Tener GM. Properties of tRNA Phe from Drosophila. Biochim Biophys Acta 1973; 312(2):267-275. 45. Altwegg M, Kubli E. he nucleotide sequence of phenylalanine tRNA2 of Drosophila melanogaster: four isoacceptors with one basic sequence. Nucleic Acids Res 1979; 7(1):93-105. 46. Gupta R. Halobacterium volcanii tRNAs. Identiication of 41 tRNAs covering all amino acids and the sequences of 33 class I tRNAs. J Biol Chem 1984; 259(15):9461-9471. 47. McCloskey JA, Crain PF, Edmonds CG et al. Structure determination of a new luorescent tricyclic nucleoside from archaebacterial tRNA. Nucleic Acids Res 1987; 15(2):683-693. 48. Edmonds CG, Crain PF, Gupta R et al. Posttranscriptional modiication of tRNA in thermophilic archaea (Archaebacteria). J Bacteriol 1991; 173(10):3138-3148. 49. Zhou S, Sitaramaiah D, Noon KR et al. Structures of two new “minimalist” modiied nucleosides from archaeal tRNA. Bioorg Chem 2004; 32(2):82-91. 50. McCloskey JA, Liu XH, Crain PF et al. Posttranscriptional modiication of transfer RNA in the submarine hyperthermophile Pyrolobus fumarii. Nucleic Acids Symp Ser 2000(44):267-268. 51. Noon KR, Guymon R, Crain PF et al. Inluence of temperature on tRNA modiication in archaea: Methanococcoides burtonii (optimum growth temperature (Topt), 23 degrees C) and Stetteria hydrogenophila (Topt, 95 degrees C). J Bacteriol 2003; 185(18):5483-5490. 52. McCloskey JA, Graham DE, Zhou S et al. Post-transcriptional modiication in archaeal tRNAs: identities and phylogenetic relations of nucleotides from mesophilic and hyperthermophilic Methanococcales. Nucleic Acids Res 2001; 29(22):4699-4706. 53. Li HJ, Nakanishi K, Grunberger D et al. Biosynthetic studies of the Y base in yeast phenylalanine tRNA. Incorporation of guanine. 
Biochem Biophys Res Commun 1973; 55(3):818-823. 54. hiebe R, Poralla K. Origin of the nucleoside Y in yeast tRNAPhe. FEBS Lett 1973; 38(1):27-28. 55. Munch HJ, hiebe R. Biosynthesis of the nucleoside Y in yeast tRNAPhe: incorporation of the 3-amino-3-carboxypropyl-group from methionine. FEBS Lett 1975; 51(1):257-258. 56. Smith C, Schmidt PG, Petsch J et al. Nuclear magnetic resonance signal assignments of puriied (13C) methyl-enriched yeast phenylalanine transfer ribonucleic acid. Biochemistry 1985; 24(6):1434-1440. 57. McCloskey JA BG, Lindstrom EB, Peltier JM. Methylation of tRNA by S-adenosylmethionine in archaeal hyperthermophiles. Nucleic Acids Symposium Series 1996; 35:277-278. 58. Pergolizzi RG, Engelhardt DL, Grunberger D. Incorporation of lysine into Y base of phenylalanine tRNA in Vero cells. Nucleic Acids Res 1979; 6(6):2209-2216.

Deciphering the Complex Enzymatic Pathway for Biosynthesis of Wyosine Derivatives

435

59. Droogmans L, Grosjean H. Enzymatic conversion of guanosine 3ʹ adjacent to the anticodon of yeast tRNAPhe to N1-methylguanosine and the wye nucleoside: dependence on the anticodon sequence. EMBO J 1987; 6(2):477-483. 60. Bjork GR, Jacobsson K, Nilsson K et al. A primordial tRNA modiication required for the evolution of life? EMBO J 2001; 20(1-2):231-239. 61. Christian T, Hou YM. Distinct determinants of tRNA recognition by the TrmD and Trm5 methyl transferases. J Mol Biol 2007; 373(3):623-632. 62. Goto-Ito S, Ito T, Ishii R et al. Crystal structure of archaeal tRNA(m(1)G37)methyltransferase aTrm5. Proteins 2008; 72(4):1274-1289. 63. Elkins PA, Watts JM, Zalacain M et al. Insights into catalysis by a knotted TrmD tRNA methyltransferase. J Mol Biol 2003; 333(5):931-949. 64. Waas WF, de Crecy-Lagard V, Schimmel P. Discovery of a gene family critical to wyosine base formation in a subset of phenylalanine-speciic transfer RNAs. J Biol Chem 2005; 280(45):37616-37622. 65. Kalhor HR, Penjwini M, Clarke S. A novel methyltransferase required for the formation of the hypermodiied nucleoside wybutosine in eucaryotic tRNA. Biochem Biophys Res Commun 2005; 334(2):433-440. 66. Noma A, Suzuki T. Ribonucleome analysis identiied enzyme genes responsible for wybutosine synthesis. Nucleic Acids Symp Ser (Oxf ) 2006(50):65-66. 67. Noma A, Kirino Y, Ikeuchi Y et al. Biosynthesis of wybutosine, a hyper-modiied nucleoside in eukaryotic phenylalanine tRNA. EMBO J 2006; 25(10):2142-2154. 68. Brule H, Elliott M, Redlak M et al. Isolation and characterization of the human tRNA-(N1G37) methyltransferase (TRM5) and comparison to the Escherichia coli TrmD protein. Biochemistry 2004; 43(28):9243-9255. 69. Suzuki Y, Noma A, Suzuki T et al. Crystal structure of the radical SAM enzyme catalyzing tricyclic modiied base formation in tRNA. J Mol Biol 2007; 372(5):1204-1214. 70. Goto-Ito S, Ishii R, Ito T et al. 
Structure of an archaeal TYW1, the enzyme catalyzing the second step of wye-base biosynthesis. Acta Crystallogr D Biol Crystallogr 2007; 63(Pt 10):1059-1068. 71. Marsh EN, Patwardhan A, Huhta MS. S-adenosylmethionine radical enzymes. Bioorg Chem 2004; 32(5):326-340. 72. Frey PA, Hegeman AD, Ruzicka FJ. he Radical SAM Superfamily. Crit Rev Biochem Mol Biol 2008; 43(1):63-88. 73. Grosjean H, Marck C, de Crecy-Lagard V. he various strategies of codon decoding in organisms of the three domains of life: evolutionary implications. Nucleic Acids Symp Ser (Oxf ) 2007(51):15-16. 74. Grosjean H, Soll DG, Crothers DM. Studies of the complex between transfer RNAs with complementary anticodons. I. Origins of enhanced ainity between complementary triplets. J Mol Biol 1976; 103(3):499-519. 75. Grosjean H, Houssier, C, Romby, P et al. Modulatory role of modiied nucleotides in RNA loop-loop interaction In: Grosjean H, Benne R, ed. Modiication and editing of RNA. Washington, DC: ASM press; 1998:113-133. 76. Gustilo EM, Vendeix FA, Agris PF. tRNA’s modiications bring order to gene expression. Curr Opin Microbiol 2008; 11(2):134-140. 77. Jacks T, Madhani HD, Masiarz FR et al. Signals for ribosomal frameshiting in the Rous sarcoma virus gag-pol region. Cell 1988; 55(3):447-458. 78. Hatield D, Feng YX, Lee BJ et al. Chromatographic analysis of the aminoacyl-tRNAs which are required for translation of codons at and around the ribosomal frameshit sites of HIV, HTLV-1 and BLV. Virology 1989; 173(2):736-742. 79. Carlson BA, Mushinski JF, Henderson DW et al 1-Methylguanosine in place of Y base at position 37 in phenylalanine tRNA is responsible for its shitiness in retroviral ribosomal frameshiting. Virology 2001; 279(1):130-135. 80. Carlson BA, Lee BJ, Hatield DL. Ribosomal frameshiting in response to hypomodiied tRNAs in Xenopus oocytes. Biochem Biophys Res Commun 2008; 375(1):86-90. 81. Urbonavicius J, Stahl G, Durand JM et al. 
Transfer RNA modiications that alter +1 frameshiting in general fail to afect -1 frameshiting. RNA 2003; 9(6):760-768. 82. Waas WF, Druzina Z, Hanan M et al. Role of a tRNA base modiication and its precursors in frameshiting in eukaryotes. J Biol Chem 2007; 282(36):26026-26034. 83. Rodriguez V, Chen, Y, Elkahloun, A et al. Chromosome 8 BAC array comperative genomic hybridization and expression analysis identify ampliication and overexpression of TRM12 in breast cancer. Genes, chromosomes and cancer 2007; 46:694-707. 84. Smith DW, McNamara AL, Mushinski JF et al. Tumor-speciic, hypomodiied phenylalanyl-tRNA is utilized in translation in preference to the fully modiied isoacceptor of normal cells. J Biol Chem 1985; 260(1):147-151.

Chapter 30

Multicomponent 2ʹ-O-Ribose Methylation Machines: Evolving Box C/D RNP Structure and Function

Keith T. Gagnon, Guosheng Qu and E. Stuart Maxwell*

Abstract

Methylation at the 2ʹ-O-ribose position is an abundant nucleotide modification of both eukaryal and archaeal RNAs. The methyltransferase responsible for this modification is frequently a ribonucleoprotein (RNP) complex consisting of a box C/D guide RNA and associated core proteins. These RNP “machines” are responsible for the modification of numerous cellular RNAs including ribosomal RNA, spliceosomal snRNAs and transfer RNAs. This chapter will review the structure and function of both eukaryotic and archaeal box C/D RNPs. A particular focus of our discussion will be the evolving components of the box C/D RNPs and the resultant consequences upon box C/D RNP structure and function.

Introduction

Guide RNAs for nucleotide modification were first described in the eukaryotic nucleolus where they were shown to modify ribosomal RNA. Based upon conserved sequence elements, these small nucleolar RNAs (snoRNAs) were classified into two major families. The box C/D snoRNAs guide nucleotide 2ʹ-O-methylation whereas the H/ACA snoRNAs isomerize uridine to pseudouridine. Subsequent investigations revealed that box C/D and H/ACA guide RNAs are also found in Archaea. Further characterization of both eukaryotic and archaeal guide RNAs has demonstrated that they are bound by core proteins to form ribonucleoprotein (RNP) complexes. Both RNP families accomplish nucleotide modification using a similar mechanism. Guide RNAs utilize complementary sequences to base pair with specific target RNAs, thus designating a specific nucleotide for modification. The RNA-bound core proteins catalyze the 2ʹ-O-methyl transfer and pseudouridylation reactions. The focus of this chapter is the evolving structure and function of the box C/D RNPs. For a detailed discussion of the H/ACA RNP structure and function, the reader is referred to the chapter by Grozdanov and Meier entitled “Multicomponent Machines in RNA Modification: the H/ACA Ribonucleoproteins”.

Ribonucleotide Methylation and Methylation Function

Key features of ribose 2ʹ-O-methylation indicate that this abundant nucleotide modification plays an important role in RNA folding and stability. Methylation at the ribose 2ʹ position stabilizes an RNA chain by inhibiting backbone cleavage and increasing the stability of base pairing and stacking interactions, thus potentially affecting the RNA’s structure and ultimately function.1,2 A number of important cellular RNAs are 2ʹ-O-methylated by box C/D RNPs. Although the function of rRNA modification is not fully understood, disrupting box C/D snoRNA-directed 2ʹ-O-methylation results in slowed cell growth and reduced ability of cells to adapt to environmental changes, with adverse effects on ribosome biogenesis and function.3-6 When mapped on the ribosome, these modifications cluster around functionally significant regions like the peptidyl transferase center.4 Eukaryotic spliceosomal RNAs (snRNAs) are also methylated by a variety of box C/D snoRNAs and small Cajal RNAs (scaRNAs), RNAs localized to nuclear Cajal bodies that can contain both box C/D and H/ACA motifs.7 Box C/D snoRNA-directed methylation of select mRNAs has been implicated in regulating RNA editing and splicing of brain mRNAs.8,9 Computational analyses have recently revealed that alternative splice junctions may also be targets for snoRNA-guided modification as they are often complementary to a number of “orphan” box C/D snoRNA guide regions.10 Unique to Archaea, box C/D RNAs guide methylation of tRNAs, thus potentially affecting not only tRNA folding and structure but also tRNA function in translation.11,12

*Corresponding Author: E. Stuart Maxwell, Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, North Carolina, USA, 27695-7622. Email: [email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.

Box C/D RNP Function

The primary function of eukaryotic and archaeal box C/D RNPs is nucleotide methylation of diverse cellular RNAs. However, other functions in RNA metabolism have been demonstrated. In eukaryotes, box C/D snoRNAs function in pre-rRNA processing. Select box C/D snoRNAs are essential for specific endonucleolytic cleavage events in pre-rRNA maturation, likely functioning as “organizers” for a trans-acting RNase.13-15 Several box C/D snoRNAs also play roles in pre-rRNA folding.15-17 For both functions, the box C/D snoRNA utilizes complementary sequences to base pair with the pre-rRNA. Notably, these additional functions have not yet been observed for archaeal box C/D sRNAs.18 This may reflect a more limited examination of archaeal box C/D sRNA populations and functions or perhaps evolving functional roles of the box C/D snoRNP in eukaryotes.

Box C/D RNAs: Diversity of Sequence and Structure

Large populations of box C/D RNAs are found in eukaryotic and archaeal organisms. In various archaeal organisms, scores of box C/D sRNAs have been identified using bioinformatic approaches and many have been experimentally verified.19-22 However, the list of sRNAs remains small and is still limited to a handful of organisms. It appears that Archaea do not share box C/D RNA homologs with Eukarya, indicating an evolutionarily ancient divergence of eukaryotic and archaeal RNAs.19,23,24 Box C/D snoRNA populations are better defined in eukaryotes, although not nearly complete. In the unicellular eukaryote yeast, the defined box C/D snoRNA population consists of 46 species.25,26 In humans, a larger population of over 100 box C/D snoRNAs has been identified and this number is likely to grow significantly.24 Interestingly, the identification of brain-specific species in mammals suggests an expanding complexity of tissue-specific RNAs and perhaps snoRNA function in metazoan organisms.9,27,28 Even more numerous may be the plant box C/D RNAs whose populations are predicted to be in the hundreds.29 Although box C/D snoRNAs from different eukaryotic organisms can guide evolutionarily conserved modifications, most nucleotides targeted for modification are unique to a given organism, reflecting the general lack of snoRNA species conservation among eukaryotes.

The hallmarks of box C/D guide RNAs are the box C (RUGAUGA) and box D (CUGA) sequence elements located at the 5ʹ and 3ʹ RNA termini, respectively (Fig. 1A). Frequently present are internal box C’ and D’ elements which are well conserved in archaeal sRNAs but often difficult to discern in eukaryotic snoRNAs.30 These terminal and internal boxes establish the box C/D and C’/D’ motifs, respectively. Both motifs fold into RNA elements known as kink-turns (K-turns), first revealed in U4 snRNA and archaeal ribosomes.31,32 K-turns are characterized by an asymmetric bulge flanked by two stems and stabilized by tandem, sheared G:A pairs. The G:A pairs hydrogen bond across the bulge to generate a sharp, archetypical bend, or kink, of approximately 60° in the RNA backbone.31 Importantly, internal C’/D’ motifs fold into a modified K-turn structure where canonical stem I is replaced by a loop. These modified K-turns have been designated “K-loops”.33 K-turns have also been observed in mRNAs, archaeal H/ACA sRNAs and even the SAM riboswitch.34-36 K-turn and K-loop motifs are typically protein binding platforms, important for stabilizing tertiary RNA and RNP structures.

Figure 1. Archaeal and Eukaryal Box C/D RNPs. A) Secondary structural elements of box C/D RNAs and tertiary structure of the K-turn (box C/D) and K-loop (box C’/D’) motifs. Conserved box C, D, C’ and D’ nucleotides are indicated. B) RNP structure with protein distribution based upon current experimental evidence. Archaeal sRNP protein distribution based upon in vitro RNP assembly. Eukaryotic snoRNP protein distribution based upon in vivo crosslinking and in vitro protein binding. See text for specific experiments.

Individual box C/D RNA species are defined by their unique guide sequences located upstream of boxes D and D’. Guide sequences are 10-21 nucleotides long and complementary to their respective target RNA. It is the target RNA nucleotide which is base paired to the fifth nucleotide of the guide sequence that is specifically 2ʹ-O-methylated by the RNP complex.37

Differences in size and structure between the archaeal and eukaryotic box C/D RNAs have contributed to structural and perhaps functional diversity. Archaeal sRNAs are smaller (50-70 nucleotides) and possess terminal box C/D and internal C’/D’ motifs separated by minimal guide regions. Guide region length is highly conserved at 12 nucleotides in archaeal box C/D sRNAs and thus box C/D and C’/D’ motif spacing is conserved.20,38 Interestingly, circular box C/D sRNAs have been reported in some archaeal organisms.39 In contrast, eukaryotic snoRNAs are larger in size (most often greater than 75 nucleotides) with significantly larger guide regions and associated spacer sequences between the two motifs. For those box C/D snoRNAs with hard-to-define or missing C’/D’ motifs, the D guide sequence and associated spacer region can be quite large.24,25,29,38 Some eukaryotic box C/D snoRNAs utilize their guide regions for pre-rRNA processing steps. The larger RNA size and correspondingly larger guide regions may have contributed to and even facilitated the functional diversity of box C/D snoRNPs in Eukarya.
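The guide rule described in this section (box C and box D consensus elements, a guide sequence immediately upstream of box D, and methylation of the target nucleotide paired to the fifth guide nucleotide) can be illustrated with a short Python sketch. It is only a toy model under stated assumptions: the fifth nucleotide is counted upstream from box D, pairing is restricted to perfect Watson-Crick duplexes with no G:U wobble or mismatches, the default guide length of 12 nucleotides is taken from the archaeal sRNAs, and the function names and sequences are hypothetical and invented for illustration.

```python
import re

# Consensus elements quoted in the text; R stands for a purine (A or G).
BOX_C = re.compile(r"[AG]UGAUGA")
BOX_D = re.compile(r"CUGA")

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the Watson-Crick reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def d_guide(srna, guide_len=12):
    """Return the guide_len nucleotides immediately 5' of the last CUGA match.

    Returns None if no CUGA is found or too little sequence precedes it.
    """
    starts = [m.start() for m in BOX_D.finditer(srna)]
    if not starts or starts[-1] < guide_len:
        return None
    return srna[starts[-1] - guide_len:starts[-1]]


def predicted_methylation_site(guide, target, offset=5):
    """Return the 0-based target index paired to the offset-th guide
    nucleotide counted upstream from box D, or None if the target holds
    no perfect antiparallel match (assumption: no wobble, no mismatches)."""
    if len(guide) < offset:
        raise ValueError("guide shorter than offset")
    start = target.find(reverse_complement(guide))
    if start == -1:
        return None
    # The 3' end of the guide pairs with the 5' end of the target segment.
    return start + offset - 1


# Toy sequences, invented for illustration only.
toy_srna = "GG" + "AUGAUGA" + "CC" + "AUGCUAGCUAGC" + "CUGA" + "GG"
toy_target = "AAAA" + "GCUAGCUAGCAU" + "CCCC"

guide = d_guide(toy_srna)
site = predicted_methylation_site(guide, toy_target)
print(BOX_C.search(toy_srna).start(), guide, site)
```

Real box C/D RNAs tolerate imperfect boxes and non-canonical pairs, so a practical search would need scoring rather than exact matching; the sketch is meant only to make the positional logic of the rule concrete.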

Box C/D RNP Structure and Assembly

Mature box C/D RNAs are assembled as ribonucleoprotein complexes bound with a limited number of highly conserved core proteins (Fig. 1B). Eukaryotic box C/D snoRNPs contain four conserved core proteins: the 15.5kD protein, nucleolar proteins Nop56 and Nop58 and the methyltransferase enzyme fibrillarin.40-42 Three highly homologous proteins, ribosomal protein L7Ae, Nop56/58 and fibrillarin, bind the archaeal box C/D sRNAs to assemble a simpler and what could be considered minimal box C/D sRNP complex.43,44 In vitro reconstitution of catalytically active archaeal box C/D sRNPs has revealed an order of core protein binding.44,45 L7Ae initiates sRNP assembly by binding the K-turn and K-loop motifs of the terminal box C/D and internal C’/D’ motifs, respectively.43-45 Nop56/58 and then fibrillarin bind both the terminal box C/D and internal C’/D’ motifs to assemble a “symmetric” sRNP with all three core proteins bound to both motifs.44,45 The assembly of a symmetric RNP is essential for efficient nucleotide methylation.45,46 Initial binding of L7Ae core protein stabilizes K-turn and K-loop structures and remodels the box C/D RNA to facilitate subsequent binding of the Nop56/58 and fibrillarin proteins.47,48 Remodeling of the sRNA continues with binding of Nop56/58 while fibrillarin has no significant effect on RNA structure.48 For the archaeal complex, RNA remodeling requires elevated temperature to increase RNA structure dynamics, thus facilitating core protein binding. Notably, in vitro assembly of archaeal box C/D sRNPs does not require accessory proteins for either RNA remodeling or hierarchical core protein binding.44,45

In contrast to the symmetric archaeal sRNP, the eukaryotic box C/D snoRNPs assemble an apparently “asymmetric” complex.49 The 15.5kD protein initiates snoRNP assembly but appears to bind only the K-turn of the terminal box C/D motif.50 Core proteins Nop58 and Nop56 have been predicted to bind the C/D and C’/D’ motifs, respectively, based upon in vivo crosslinking experiments.49 Only fibrillarin appears to be associated with both motifs. Unfortunately, the lack of a functional in vitro assembly system for the eukaryotic complex has hampered a more detailed analysis of box C/D snoRNP assembly and structure.

Limited knowledge of eukaryotic box C/D snoRNP assembly has nonetheless revealed a highly complex and dynamic process requiring accessory factors. Assembly of the mammalian pre-snoRNP requires two trans-acting AAA+ ATPases, TIP48 and TIP49.51,52 Additional processing/assembly factors for the U3 snoRNP include TGS1, La, LSm proteins and the exosome as well as nucleocytoplasmic transport factors such as PHAX, CRM1, CBC, Ran and Nopp140.51,53 Four novel human biogenesis factors (BCD1, NOP17, NUFIP and TAF9), which are likely to be involved in the formation of the U8 pre-snoRNP, have also been identified.54 Most recently, the heat shock protein Hsp90 has been implicated in orchestrating assembly of the eukaryotic complex.55,56 Whereas archaeal sRNPs require elevated temperature (accessory factors in vivo?) to facilitate RNA remodeling required for in vitro sRNP assembly, the eukaryotic snoRNPs require multiple accessory factors for in vitro and in vivo assembly. These accessory factors are presumed to promote RNA remodeling and facilitate sequential core protein binding, an apparently common theme of both archaeal and eukaryotic box C/D RNP biogenesis.

The vast majority of higher eukaryotic snoRNA coding sequences are positioned within introns of RNA Polymerase II protein-coding host genes. A second genomic organization, prevalent in yeast and plants, is box C/D snoRNA genes transcribed from independent RNA Pol II (infrequently Pol III) promoters.57 Archaeal box C/D sRNA genes, although not well characterized, appear to be intergenic and transcribed from independent promoters.11,20 Transcription of intronic box C/D snoRNAs is coupled with the transcription of the host pre-mRNA and linked to splicing.58 Box C/D snoRNP assembly is also coupled with posttranscriptional processing, maturation and transport to the nucleolus.59 The differences in genomic organization for the eukaryotic box C/D snoRNAs versus archaeal sRNA coding sequences perhaps reflect an evolution of gene structure for purposes of regulated expression.

Structure, Function and Evolution of the L7Ae/15.5kD Core Protein

Archaeal core protein L7Ae initiates sRNP assembly by binding the terminal box C/D (K-turn) and internal C’/D’ (K-loop) motifs.44,45 L7Ae binding remodels sRNA structure and establishes a platform for subsequent box C/D sRNP core protein binding.47,48,60 Eukaryotic 15.5kD protein similarly initiates snoRNP assembly by binding the terminal box C/D core motif’s K-turn.43,45,50 The differential binding of L7Ae and 15.5kD proteins to K-turn and K-loop motifs in vitro is striking as the crystal structures of both proteins are nearly superimposable and their RNA-binding domains are well conserved across both domains of life31,61,62 (Fig. 2). L7Ae and 15.5kD are members of the L7Ae/L30 protein family31,61,62 (Fig. 2). Additional members of this family include rpL30e in Archaea and Rpp38, rpL30, rpL7a, SBP2 and Nhp2p proteins in Eukarya. Proteins in this closely related family are typically small and composed of an internal beta sheet surrounded by several alpha helices, a three-layer topology fold known as an alpha-beta-alpha sandwich.32,34 They possess conserved RNA binding domains, almost uniformly recognize K-turn motifs and play critical roles in RNA stabilization and RNP assembly.31,32,34,42,55,61-63

Figure 2. Conserved Sequence and Structure of the L7Ae/L30 Protein Family. A) Sequence alignment of the conserved RNA-binding domain of known L7Ae/L30 protein family members. Conservation is indicated by shaded amino acids and the bar graph alignment below. B) Superimposed structures of M. jannaschii L7Ae and human 15.5kD bound to their respective K-turn RNAs. M. jannaschii L7Ae (1RLG) is shown in black and human 15.5kD (1E7K) is shown in white.

Each family member is interesting from a functional standpoint. Family members in both Archaea and Eukarya function as ribosomal proteins of the large subunit. Eukaryotic ribosomal protein rpL30 is also capable of binding its own mRNA to regulate translation and ribosomal protein Rpp38 is a constituent protein of the MRP complex.34,64 SBP2 is another mRNA-binding protein, recognizing those mRNAs possessing the SECIS RNA element important for selenocysteine incorporation into selenoproteins. It consists of multiple domains including one very similar to that found in the L7Ae/15.5kD protein.65 An L7Ae/L30 sequence appears to have been inserted during genomic shuffling, thus conferring K-turn RNA binding capability upon SBP2.35 The 15.5kD protein is not only a box C/D snoRNP core protein but also a component of the spliceosomal U4 snRNP where it also binds a K-turn motif in U4 and functions in snRNP assembly.32,42,60 Eukaryotic nonhistone chromosomal protein 2 (Nhp2p) is a core protein of the eukaryotic H/ACA snoRNPs and highly homologous to both archaeal L7Ae and eukaryotic 15.5kD proteins. Nhp2p binds a stem loop of the box H/ACA snoRNAs and is essential for H/ACA snoRNP assembly. Notably, Nhp2p stands out as being the sole L7Ae/L30 family member without clear RNA-binding specificity. Specific recruitment of Nhp2p to the assembling snoRNP requires interaction with the RNA and other core proteins.66 Its functional equivalent in the archaeal H/ACA sRNP is L7Ae, the only guide RNA core protein of both domains to be found in both the box C/D and H/ACA RNPs. Despite great similarities in sequence and folded structure, each L7Ae/L30 family member has sufficiently diverged such that its binding is specific for the K-turn of its respective cognate RNA.31,34,35,45,50,66 The recurring theme of L7Ae/L30 protein function is RNP formation via recognition of the K-turn motif.

The binding of L7Ae/L30 proteins to a variety of RNAs provides insight into the evolutionary emergence of the L7Ae/L30 protein family and even evolution of the box C/D RNPs. The limited number of L7Ae/L30 proteins in Archaea (two) and expansion of family members in eukaryotes (six) suggest a continuing evolution and diversity of protein structure and function, particularly in eukaryotic organisms. L7Ae is a component of three separate RNPs in Archaea whereas in eukaryotes these same functions are carried out by three separate but closely related family members (ribosomal protein L7a, 15.5kD, Nhp2p). This would suggest that L7Ae is the progenitor of the L7Ae/L30 protein family.31,43,45,50,62 We have previously proposed that L7Ae or an L7Ae-like protein binding a K-turn motif in a primitive RNP translational apparatus may be the ancestral RNP complex for this protein family.23 The utilization of a single archaeal core protein to bind K-turns in both archaeal box C/D and H/ACA RNPs suggests a common RNP origin for both guide RNP families early in evolution. The absence of L7Ae/L30 RNP complexes in Eubacteria implies an emergence and evolution of the protein family after divergence of Archaea and Eukarya from Eubacteria.


Evolution of L7Ae/15.5kD RNA-binding capabilities may well have facilitated evolution of box C/D RNP structure and hence function. The minimal archaeal box C/D sRNP makes use of L7Ae for both K-turn and K-loop binding, thus assembling a symmetric sRNP whose box C/D and C’/D’ RNPs are spatially constrained and functionally coupled.38,45 In contrast, the evolved binding capability of eukaryotic 15.5kD to recognize only the box C/D K-turn could have allowed greater structural and consequently functional snoRNP diversity. Lack of sequence conservation in the C’/D’ motif of eukaryotic snoRNAs may reflect a concomitant loss of 15.5kD binding, resulting in spatial decoupling of the internal and terminal motifs.38,45 Thus, modern day eukaryotic box C/D snoRNAs are less conserved in sequence and larger in size. This flexibility in snoRNA structure may have allowed the eukaryotic complexes to drift further, acquiring new functions such as chaperoning pre-rRNA processing events.

Structure, Function and Evolution of the NOP56 and NOP58 Core Proteins

The Nop core proteins play essential structural and functional support roles in the box C/D RNPs. A single Nop56/58 is found in Archaea while two homologs, presumably arising from gene duplication and designated Nop56 and Nop58, are present in Eukarya.41,44 Their roles include bridging protein interactions within box C/D RNPs, RNA remodeling during RNP assembly, fibrillarin recruitment and assisting the methyltransferase reaction. Archaeal Nop56/58 helps to remodel RNA structure during in vitro box C/D sRNP assembly by restructuring guide regions and box elements after initial remodeling by L7Ae.48 Nop56/58 interactions with other core proteins may affect RNA remodeling, perhaps helping to establish bridging interactions between the box C/D and C’/D’ RNPs.48,67 Archaeal Nop56/58 and eukaryotic Nop56 and Nop58 proteins interact with fibrillarin.45,67,68 Evidence from in vitro assembly of the archaeal sRNP suggests that Nop56/58 and fibrillarin may bind the assembling complex as a dimer.45,48,67 While the methyltransferase fibrillarin clearly interacts with guide and target RNAs, its binding in the archaeal sRNP is primarily through interaction with Nop56/58.45,48,67,69 Archaeal Nop56/58 may also assist in catalysis of the methyltransferase reaction as critical Nop56/58 amino acids are positioned adjacent to the S-adenosyl-L-methionine binding site of fibrillarin.70

Only a few members of the Nop protein superfamily have been well characterized. They include the box C/D RNP core proteins Nop56/58 in Archaea, Nop56 and Nop58 in Eukarya and eukaryotic Prp31 (pre-mRNA processing factor 31).41,45,60,67-69 Nop proteins are composed of an N-terminal domain, a central coiled-coil domain, a Nop domain and a variable lysine-rich C-terminal tail (Fig. 3A). The N-terminal domain is not well characterized in eukaryotes but is responsible for dimerization with fibrillarin in Archaea.67 The coiled-coil domain may mediate protein interactions with other core proteins or regulatory factors. Crystal structures of the Nop56/58-fibrillarin dimer from Archaea show that the coiled-coil domain can dimerize with itself, leading to the suggestion that this interaction could mediate protein-protein or crosstalk interactions between the box C/D and C’/D’ RNPs.67,69

Best understood is the Nop domain, the defining feature of the Nop superfamily, which comprises most of the C-terminal region. A recent U4-15.5kD-Prp31 RNP crystal structure has provided new insight into the role of this domain in RNP assembly60 (Fig. 3B). The Prp31 Nop domain makes nearly equal contact with both U4 RNA and the 15.5kD protein, thus explaining a need for 15.5kD to be bound to U4 for Prp31 interaction.71 In a similar manner, archaeal Nop56/58 binds a box C/D RNA only after L7Ae has first bound the K-turn or K-loop motif.45 Thus, the Nop proteins may serve as checkpoints in RNP assembly, ensuring that the K-turn recognition protein has first bound RNA. Deletion of the Nop domain completely disrupts binding to the box C/D RNA-L7Ae complex, indicating that it is the necessary RNP assembly module of Nop protein family members.67 The highly charged, lysine-rich C-terminal tail, also called a KKE/D repeat, remains an enigma. It is poorly conserved in sequence and length and appears to be dispensable for Nop protein function in both Eukarya and Archaea.68,69


Figure 3. Nop Protein Structure and RNP Interaction. A) Crystal structure of the Archaeoglobus fulgidus Nop56/58 core protein (1NT2). N-terminal, coiled-coil and C-terminal (Nop) domains are shown in black, gray and white, respectively. B) Crystal structure of the human Prp31 Nop domain protein bound to the 15.5kD-U4 snRNA RNP (2OZB) through its C-terminal (Nop) domain. Prp31 is shown in black, 15.5kD in gray and the U4 K-turn in white.

Eukaryotic box C/D snoRNPs may owe much of their structural and functional diversity to evolution of the Nop56/58 core protein. In archaeal box C/D sRNPs, the Nop56/58 protein binds both box C/D and C’/D’ motifs.45 In contrast, crosslinking experiments indicate that eukaryotic Nop56 and Nop58 may differentially bind the C’/D’ and C/D motifs, respectively.49 Nop56 and Nop58 are highly related, with the mouse proteins having 43% identity and 63% similarity.41 Archaeal Nop56/58 from Methanocaldococcus jannaschii is 57% and 59% similar to mouse Nop56 and Nop58, respectively. Thus, gene duplication of the Nop56/58 coding sequence followed by co-evolution of the two eukaryotic proteins and the box C/D RNA could contribute to the apparent asymmetric structure of eukaryotic box C/D snoRNPs.23 As 15.5kD does not recognize the K-loop, association of Nop56 with the C’/D’ motif could suggest that this Nop protein has acquired the ability to bind RNA independently of 15.5kD.49,50 In vitro assembly of the archaeal sRNP has also shown that archaeal Nop56/58 along with fibrillarin can specifically, albeit weakly, bind the K-loop motif in the absence of L7Ae.45 The possible differential recognition of K-loop and K-turn motifs by the Nop56 and Nop58 proteins, respectively, as well as the K-turn specificity of the 15.5kD protein, could also contribute to the uncoupling of the eukaryotic box C/D and C’/D’ RNP complexes.
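The identity and similarity percentages quoted above are derived from pairwise sequence alignments. As a rough illustration only, the following sketch shows one way such percentages could be computed from two pre-aligned sequences. The function name, the toy sequences and the conservative-substitution groups are assumptions made for this example; the cited studies may have used different alignment tools, scoring matrices and denominators.

```python
# Hypothetical conservative-substitution groups (an assumption for this sketch);
# published analyses typically rely on a scoring matrix instead.
SIMILAR_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"),
    set("DE"), set("NQ"), set("ST"), set("AG"),
]


def identity_similarity(aligned_a: str, aligned_b: str) -> tuple[float, float]:
    """Return (% identity, % similarity) over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which neither sequence has a gap ('-').
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0, 0.0
    identical = sum(x == y for x, y in columns)
    # A column counts as similar if identical or if both residues share a group.
    similar = sum(
        x == y or any(x in g and y in g for g in SIMILAR_GROUPS)
        for x, y in columns
    )
    n = len(columns)
    return 100.0 * identical / n, 100.0 * similar / n


if __name__ == "__main__":
    # Toy, made-up aligned fragments; not real Nop56 or Nop58 sequences.
    ident, simil = identity_similarity("MKVLDE-A", "MRVIDQTA")
    print(f"identity {ident:.1f}%, similarity {simil:.1f}%")
```

Because the denominator here is the number of gap-free columns, the same alignment can yield somewhat different percentages under other conventions (for example, dividing by the length of the shorter sequence).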


Structure, Function and Evolution of Fibrillarin

Fibrillarin is the catalytic protein of the box C/D RNPs, yet it plays only a minor role in RNP assembly. In Archaea, fibrillarin is recruited to the complex primarily through protein-protein interaction with the Nop56/58 protein.45,48,67 In eukaryotes, fibrillarin may play a more active role in assembly. Eukaryotic fibrillarin contacts the box C/D snoRNAs and association of Nop56 requires the presence of fibrillarin.49,72,73 Fibrillarin is recruited to the RNP at a late stage of assembly.44,45,48,51 Fibrillarin was originally predicted to be the methyltransferase enzyme based on its sequence similarity to other S-adenosyl-L-methionine (SAM)-dependent methylases74 (Fig. 4A). Subsequent in vitro reconstitution of box C/D RNPs44,45 and crystallographic analyses of archaeal fibrillarins66,67,75 provided further evidence of the methyltransferase function of fibrillarin. Despite this progress, it is still unknown exactly how fibrillarin interacts with guide and target RNAs to accurately methylate the target nucleotide.

Eukaryotic and archaeal fibrillarin proteins have both common and unique features. They all share a highly conserved alpha-beta carboxy-terminal domain (CTD) in which is nested a short consensus sequence, the SAM-binding motif.67,75,76 The CTD of M. jannaschii fibrillarin (MjFib) is approximately 60% identical and 80% similar to vertebrate fibrillarins between residues 25 and 95 of the CTD, which harbors the SAM-binding motif. Even in poorly related regions outside this segment (MjFib residues 95-227), archaeal and eukaryotic fibrillarins are about 40% identical and 65% similar76 (Fig. 4A). In contrast to the CTD, fibrillarin proteins have variable sequence and structure in their N-terminal domains (NTD). Eukaryotic fibrillarins often contain a glycine-arginine-rich (GAR) domain which is necessary and sufficient for nucleolar localization of eukaryotic box C/D snoRNPs.77 However, archaeal fibrillarins lack this domain and their N-terminal regions are much shorter67,75,76 (Fig. 4A). Moreover, the fibrillarin NTD varies within archaeal species and may confer different protein binding properties upon them.75,78 For example, the MjFib NTD was reported to facilitate dimerization of fibrillarin molecules through specific β-strand interactions.76 In contrast, available evidence indicates that fibrillarins from both Archaeoglobus fulgidus and Pyrococcus furiosus exist as monomers in solution and in the crystalline state.67,75

Despite a lack of significant sequence homology, the archaeal fibrillarin CTD is structurally similar to other SAM-dependent methylases. The consensus topology for the methyltransferase catalytic domain is a seven-stranded β-sheet flanked by three α-helices on each side76 (Fig. 4B). The CTD of MjFib forms a Rossmann fold like other methyltransferases and only differs from the consensus topology by the addition of a minihelix (α5). Fibrillarin is most closely related to other SAM-dependent RNA methyltransferases, such as RrmJ from E. coli, which catalyzes site-specific 2ʹ-O-methylation of rRNAs, tRNAs and mRNAs independent of a guide RNA.79 The site-specific RNA methyltransferases (MTases) related to RrmJ and snoRNA-directed RNA MTases related to fibrillarin form a closely related monophyletic clade. They possess a spatially superimposable tetrad of conserved residues localized in the heart of the substrate-binding pocket, three of which (K-D-K) are essential for activity79,80 (Fig. 4C). This invariant triad is considered a synapomorphy, an ancient feature derived from a common ancestor that might have possessed ribose 2ʹ-O-MTase activity. Collectively, these observations suggest that methyltransferase enzymes evolved from a common ancestor to acquire substrate-specific activities.

Fibrillarin relies upon a guide RNA and other core proteins in an assembled box C/D RNP to catalyze nucleotide-specific 2ʹ-O-methylation.44,45 Most other methyltransferases utilize accessory domains for substrate specificity. For example, the DNA methylase HhaI recognizes and binds its double-stranded DNA substrate by utilizing a large peripheral domain which binds the DNA and flips the target base out of the duplex for modification (for details see chapter by Klimasauskas and Liutkeviciute in this book).81 Evolution of fibrillarin appears to have occurred within the box C/D RNPs as well. Archaeal fibrillarins possess organism-specific NTDs while eukaryotic fibrillarins have related GAR domains.78 Aside from affecting nucleolar localization, the GAR domain serves


Figure 4. Conserved Sequence and Structure of Fibrillarin. A) Sequence alignment of three eukaryotic and three archaeal fibrillarins with the E. coli RrmJ methyltransferase. Degree of conservation is indicated by shades of gray. The highly conserved SAM-binding motif is boxed. B) Crystal structure of M. jannaschii fibrillarin (1FBN). The variable N-terminal domain is shown in light gray, the SAM-binding motif is circled and the highly conserved catalytic residues are shown as black sticks. C) Spatial superposition of the E. coli RrmJ catalytic residues (black) (1EIZ) with those of M. jannaschii fibrillarin (light gray). The invariant catalytic triad (K-D-K) is labeled and peptide backbones are illustrated with lines.

as an interaction domain with the SMN protein which is transiently associated with nascent box C/D snoRNPs and important for assembly.82


Interestingly, eukaryotic fibrillarin may have other roles in addition to ribose methylation. Most eukaryotic box C/D snoRNPs appear to direct only one ribose methylation per snoRNA, even though fibrillarin is believed to bind both box C/D and C’/D’ motifs. Notably, box C/D snoRNPs involved only in pre-rRNA processing or folding, such as U3 and U8, also contain the fibrillarin core protein.54,83 These observations suggest that eukaryotic fibrillarin may have acquired a more structural role in some RNPs and may possess other functions aside from strictly catalyzing the methyltransfer reaction.

The Evolving Box C/D RNP Machinery

RNA-guided nucleotide modification complexes are ancient RNA:protein enzymes found in both Eukarya and Archaea. Despite their conservation in these two domains of life, the box C/D RNPs exhibit domain-specific structural and functional features indicating an evolving RNP over time. The archaeal sRNP complex can well be considered a minimal RNP composed of smaller RNAs, three core proteins, with spatially and functionally coupled box C/D and C’/D’ RNPs. Known RNA targets are confined to ribosomal and transfer RNAs and its only function appears to be nucleotide modification. The sRNAs are directly transcribed from intergenic genes and assembly of the sRNP does not require, at least in vitro, accessory proteins. In contrast, the eukaryotic snoRNP is more complex both structurally and functionally. It is composed of larger RNAs, one additional core protein resulting from gene duplication, with poorly conserved C’/D’ RNPs that do not appear to be spatially linked to the box C/D RNP. SnoRNP target RNAs are more diverse and RNP functions include rRNA folding and processing as well as nucleotide modification. The snoRNA genes are varied in genomic organization, often transcribed as introns, and snoRNA processing is essential with RNP assembly requiring numerous assembly factors.

Future Directions

In this chapter, we have presented the current state of knowledge concerning the structure and function of the box C/D RNPs. Our focus has been comparison of the archaeal and eukaryotic complexes, detailing their differences to provide the reader with an overview of the evolving box C/D RNP complexes. However, much remains to be learned about box C/D RNA and RNP evolution. Computational approaches with improved bioinformatic tools to mine ever-growing genome and transcriptome databases will further define box C/D RNA populations. Biochemical approaches coupled with deep sequencing will also contribute to our understanding of box C/D RNA diversity, particularly with respect to tissue-specific populations. These approaches will not only define new box C/D RNAs but also reveal how RNA populations have evolved and target RNAs have expanded. Novel functions are likely to emerge as a consequence of expanding box C/D RNA and target RNA databases. Establishing the core protein composition of the C/D and C’/D’ RNP sub-complexes will reveal how the eukaryotic complex has retained structural aspects of the minimal archaeal sRNP core structure while evolving to accommodate or even facilitate new box C/D snoRNA functions. Of particular importance will be the identification of additional snoRNP proteins that could potentially play important roles in the more structurally complex snoRNP and its expanded functions.

Of particular interest will be a better understanding of box C/D RNA genomic organization and RNP assembly. How have the box C/D snoRNA genes evolved to become predominantly intronic and often clustered? What are the implications of this organization and does it imply gene movement during evolution? Why is the expression of the intronic box C/D snoRNAs coordinated with that of their host genes and what are the functional implications of coordinated expression? What role do specific transcription, RNA processing, and/or RNP assembly factors play in coordinating and/or regulating the potential differential expression of these RNAs? Clearly, more remains to be learned about the evolution of these ancient RNA:protein enzymes and the coming years are certain to yield exciting and unexpected findings.


Acknowledgements

The authors would like to thank Skip Fournier, Mike and Becky Terns, Tom Meier and Yi-Tao Yu for helpful comments on our chapter. This work was supported by NSF Grant MCB 0543741 to ESM.

References

1. Helm M. Post-transcriptional nucleotide modification and alternative folding of RNA. Nucl Acids Res 2006; 34:721-733.
2. Chow C, Lamichhane TN, Mahto SK. Expanding the nucleotide repertoire of the ribosome with posttranscriptional modifications. ACS Chem Biol 2007; 2:610-619.
3. Tollervey D, Lehtonen H, Jansen R et al. Temperature-sensitive mutations demonstrate roles for yeast fibrillarin in pre-rRNA processing, pre-rRNA methylation and ribosome assembly. Cell 1993; 72:443-457.
4. Decatur W, Fournier MJ. rRNA modifications and ribosome function. Trends Biochem Sci 2002; 27:344-351.
5. Liang X, Hury A, Hoze E et al. Genome-wide analysis of C/D and H/ACA-like small nucleolar RNAs in Leishmania major indicates conservation among trypanosomatids in the repertoire and in their rRNA targets. Eukaryot Cell 2007; 6:361-377.
6. Esguerra J, Warringer J, Blomberg A. Functional importance of individual rRNA 2ʹ-O-ribose methylations revealed by high-resolution phenotyping. RNA 2008; 14:649-656.
7. Darzacq X, Jady BE, Verheggen C et al. Cajal body-specific small nuclear RNAs: a novel class of 2ʹ-O-methylation and pseudouridylation guide RNAs. EMBO J 2002; 21:2746-2756.
8. Beal P, Maydanovych O, Pokharel S. The chemistry and biology of RNA editing by adenosine deaminases. Nucl Acids Symp Ser 2007; 51:83-84.
9. Kishore S, Stamm S. The snoRNA HBII-52 regulates alternative splicing of the serotonin receptor 2C. Science 2006; 311:230-232.
10. Bazeley P, Shepelev V, Talebizadeh Z et al. SnoTARGET shows that human orphan snoRNA targets locate close to alternative splice junctions. Gene 2008; 408:172-179.
11. Dennis PP, Omer A, Lowe T. A guided tour: small RNA function in Archaea. Mol Microbiol 2001; 40:509-519.
12. Singh SK, Gurha P, Tran EJ et al. Sequential 2ʹ-O-methylation of archaeal pre-tRNATrp nucleotides is guided by the intron-encoded but trans-acting box C/D ribonucleoprotein of pre-tRNA. J Biol Chem 2004; 279:47661-47671.
13. Hughes JM, Ares M Jr. Depletion of U3 small nucleolar RNA inhibits cleavage in the 5ʹ external transcribed spacer of yeast pre-ribosomal RNA and impairs formation of 18S ribosomal RNA. EMBO J 1991; 10:4231-4239.
14. Morrissey JP, Tollervey D. Yeast snR30 is a small nucleolar RNA required for 18S rRNA synthesis. Mol Cell Biol 1993; 13:2469-2477.
15. Liang WQ, Fournier MJ. U14 base-pairs with 18S rRNA: a novel snoRNA interaction required for rRNA processing. Genes Dev 1995; 9:2433-2443.
16. Peculis B, Steitz J. Disruption of U8 nucleolar snRNA inhibits 5.8S and 28S rRNA processing in the Xenopus oocyte. Cell 1993; 73:1233-1245.
17. Beltrame M, Tollervey D. Base pairing between U3 and the pre-ribosomal RNA is required for 18S rRNA synthesis. EMBO J 1995; 14:4350-4356.
18. Schoemaker RJ, Gultyaev AP. Computer simulation of chaperone effects of archaeal C/D box sRNA binding on rRNA folding. Nucl Acids Res 2006; 34:2015-2026.
19. Omer AD, Lowe TM, Russell AG et al. Homologs of small nucleolar RNAs in Archaea. Science 2000; 288:517-522.
20. Gaspin C, Cavaille J, Erauso G et al. Archaeal homologs of eukaryotic methylation guide small nucleolar RNAs: lessons from the Pyrococcus genomes. J Mol Biol 2000; 297:895-906.
21. Huttenhofer A, Cavaille J, Bachellerie JP. Experimental RNomics: a global approach to identifying small nuclear RNAs and their targets in different model organisms. Methods Mol Biol 2004; 265:409-428.
22. Schattner P, Brooks AN, Lowe TM. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucl Acids Res 2005; 33:W686-W689.
23. Tran E, Brown J, Maxwell ES. Evolutionary origins of the RNA-guided nucleotide-modification complexes: from the primitive translation apparatus? Trends Biochem Sci 2004; 29:343-350.
24. Lestrade L, Weber MJ. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucl Acids Res 2006; 34:D158-D162.
25. Samarsky DA, Fournier MJ. A comprehensive database for the small nucleolar RNAs from Saccharomyces cerevisiae. Nucl Acids Res 1999; 27:161-164.


26. Piekna-Przybylska D, Decatur WA, Fournier MJ. New bioinformatic tools for analysis of nucleotide modifications in eukaryotic rRNA. RNA 2007; 13:305-312.
27. Cavaille J, Buiting K, Kiefmann M et al. Identification of brain-specific and imprinted small nucleolar RNA genes exhibiting an unusual genomic organization. Proc Natl Acad Sci USA 2000; 97:14311-14316.
28. Nahkuri S, Taft RJ, Korbie DJ et al. Molecular evolution of the HBII-52 snoRNA cluster. J Mol Biol 2008; 381:810-815.
29. Brown JW, Echeverria M, Qu LH et al. Plant snoRNA database. Nucl Acids Res 2003; 31:432-435.
30. Kiss-Laszlo Z, Henry Y, Kiss T. Sequence and structural elements of methylation guide snoRNAs essential for site-specific ribose methylation of pre-rRNA. EMBO J 1998; 17:797-807.
31. Klein DJ, Schmeing TM, Moore PB et al. The kink-turn: a new RNA secondary structure motif. EMBO J 2001; 20:4214-4221.
32. Vidovic I, Nottrot S, Hartmuth K et al. Crystal structure of the spliceosomal 15.5kD protein bound to a U4 snRNA fragment. Mol Cell 2000; 6:1331-1342.
33. Nolivos S, Carpousis AJ, Clouet-d’Orval B. The K-loop, a general feature of the Pyrococcus C/D guide RNAs, is an RNA structural motif related to the K-turn. Nucl Acids Res 2005; 33:6507-6514.
34. Mao H, White SA, Williamson JR. A novel loop-loop recognition motif in the yeast ribosomal protein L30 autoregulatory RNA complex. Nat Struct Biol 1999; 6:1139-1147.
35. Clery A, Bourguignon-Igel V, Allmang C et al. An improved definition of the RNA-binding specificity of SECIS-binding protein 2, an essential component of the selenocysteine incorporation machinery. Nucl Acids Res 2007; 35:1868-1884.
36. Montange RK, Batey RT. Structure of the S-adenosylmethionine riboswitch regulatory mRNA element. Nature 2006; 441:1172-1175.
37. Kiss-Laszlo Z, Henry Y, Bachellerie M et al. Site-specific ribose methylation of pre-ribosomal RNA: a novel function for small nucleolar RNAs. Cell 1996; 85:1077-1088.
38. Tran EJ, Zhang X, Lackey L et al. Conserved spacing between the box C/D and C’/D’ RNPs of the archaeal box C/D sRNP complex is required for efficient 2ʹ-O-methylation of target RNAs. RNA 2005; 11:285-293.
39. Starostina NG, Marshburn S, Johnson LS et al. Circular box C/D RNAs in Pyrococcus furiosus. Proc Natl Acad Sci USA 2004; 101:14097-14101.
40. Wu P, Brockenbrough JS, Metcalfe AC et al. Nop5p is a small nucleolar ribonucleoprotein component required for pre-18s rRNA processing in yeast. J Biol Chem 1998; 273:16453-16463.
41. Newman DR, Kuhn JF, Shanab GM et al. Box C/D snoRNA-associated proteins: two pairs of evolutionarily ancient proteins and possible links to replication and transcription. RNA 2000; 6:861-879.
42. Watkins NJ, Segault V, Charpentier B et al. A common core RNP structure shared between the small nucleolar box C/D RNPs and the spliceosomal U4 snRNP. Cell 2000; 103:457-466.
43. Kuhn J, Tran E, Maxwell ES. Archaeal ribosomal protein L7 is a functional homolog of the eukaryotic 15.5kD/Snu13p snoRNP core protein. Nucl Acids Res 2002; 30:931-941.
44. Omer A, Ziesche S, Ebhardt H et al. In vitro reconstitution and activity of a C/D box methylation guide ribonucleoprotein complex. Proc Natl Acad Sci USA 2002; 99:5289-5294.
45. Tran EJ, Zhang X, Maxwell ES. Efficient RNA 2ʹ-O-methylation requires juxtaposed and symmetrically assembled archaeal box C/D and C’/D’ RNPs. EMBO J 2003; 22:3930-3940.
46. Hardin JW, Batey RT. The bipartite architecture of the sRNA in an archaeal box C/D complex is a primary determinant of specificity. Nucl Acids Res 2006; 34:5039-5051.
47. Turner B, Melcher SA, Wilson TJ et al. Induced fit of RNA on binding the L7Ae protein to the kink-turn motif. RNA 2005; 11:1192-1200.
48. Gagnon KT, Zhang X, Agris PF et al. Assembly of the archaeal box C/D sRNP can occur via alternative pathways and requires temperature-facilitated sRNA remodeling. J Mol Biol 2006; 362:1025-1042.
49. Cahill NM, Friend K, Speckman W et al. Site-specific cross-linking analyses reveal an asymmetric protein distribution for a box C/D snoRNP. EMBO J 2002; 21:3816-3828.
50. Szewczak LB, DeGregorio SJ, Strobel SA et al. Exclusive interaction of the 15.5 kD protein with the terminal box C/D motif of a methylation guide snoRNP. Chem Biol 2002; 9:1095-1107.
51. Watkins NJ, Lemm I, Ingelinger D et al. Assembly and maturation of the U3 snoRNP in the nucleoplasm in a large dynamic multiprotein complex. Mol Cell 2004; 16:789-798.
52. King T, Decatur WA, Bertrand E et al. A well-connected and conserved nucleoplasmic helicase is required for production of box C/D and H/ACA snoRNAs and localization of snoRNP proteins. Mol Cell Biol 2001; 21:7731-7746.
53. Boulon S, Verheggen C, Jady BE et al. PHAX and CRM1 are required sequentially to transport U3 snoRNA to nucleoli. Mol Cell 2004; 16:777-787.
54. McKeegan KS, Debieux CM, Boulon S et al. A dynamic scaffold of pre-snoRNP factors facilitates human box C/D snoRNP assembly. Mol Cell Biol 2007; 27:6782-6793.


55. Boulon S, Marmier-Gourrier N, Pradet-Balade B et al. The Hsp90 chaperone controls the biogenesis of L7Ae RNPs through conserved machinery. J Cell Biol 2008; 180:579-595.
56. Zhao R, Kakihara Y, Gribun A et al. Molecular chaperone Hsp90 stabilizes Pih1/Nop17 to maintain R2TP complex activity that regulates snoRNA accumulation. J Cell Biol 2008; 180:563-578.
57. Bachellerie JP, Cavaille J, Huttenhofer A. The expanding snoRNA world. Biochimie 2002; 84:775-790.
58. Hirose T, Shu MD, Steitz JA. Splicing-dependent and -independent modes of assembly for intron-encoded box C/D snoRNPs in mammalian cells. Mol Cell 2003; 12:113-123.
59. Kiss T, Fayet E, Jady BE et al. Biogenesis and intranuclear trafficking of human box C/D and H/ACA RNPs. Cold Spring Harb Symp Quant Biol 2006; 71:407-417.
60. Liu S, Li P, Dybkov O et al. Binding of the human Prp31 Nop domain to a composite RNA-protein platform in U4 snRNP. Science 2007; 316:115-120.
61. Charron C, Manival X, Clery A et al. The archaeal sRNA binding protein L7Ae has a 3D structure very similar to that of its eukaryal counterpart while having a broader RNA-binding specificity. J Mol Biol 2004; 342:757-773.
62. Koonin EV, Bork P, Sander C. A novel RNA-binding motif in omnipotent suppressors of translation termination, ribosomal proteins and a ribosome modification enzyme? Nucl Acids Res 1994; 22:2166-2167.
63. Moore T, Zhang Y, Fenley MO et al. Molecular basis of box C/D RNA-protein interactions: cocrystal structure of archaeal L7Ae and a box C/D RNA. Structure 2004; 12:807-818.
64. Welting TJM, van Venrooij WJ, Pruijn GJM. Mutual interactions between subunits of the human RNase MRP ribonucleoprotein complex. Nucl Acids Res 2004; 32:2138-2146.
65. Allmang C, Carbon P, Krol A. The SBP2 and 15.5 kD/Snu13p proteins share the same RNA-binding domain: identification of SBP2 amino acids important to SECIS RNA binding. RNA 2002; 8:1308-1318.
66. Wang C, Meier UT. Architecture and assembly of mammalian H/ACA small nucleolar and telomerase ribonucleoproteins. EMBO J 2004; 23:1857-1867.
67. Aittaleb M, Rashid R, Chen Q et al. Structure and function of archaeal box C/D sRNP core proteins. Nat Struct Biol 2003; 10:256-263.
68. Gautier T, Berges T, Tollervey D et al. Nucleolar KKE/D repeat proteins Nop56p and Nop58p interact with Nop1p and are required for ribosome biogenesis. Mol Cell Biol 1997; 17:7088-7098.
69. Oruganti S, Zhang Y, Li H et al. Alternative conformations of the archaeal Nop56/58-fibrillarin complex imply flexibility in box C/D RNPs. J Mol Biol 2007; 371:1141-1150.
70. Aittaleb M, Visone T, Fenley MO et al. Structural and thermodynamic evidence for a stabilizing role of Nop5p in S-adenosyl-L-methionine binding to fibrillarin. J Biol Chem 2004; 279:41822-41829.
71. Liu S, Rauhut R, Vornlocher H-P et al. The network of protein-protein interactions within the human U4/U6.U5 tri-snRNP. RNA 2006; 12:1418-1430.
72. Fatica A, Galardi S, Altieri F et al. Fibrillarin binds directly and specifically to U16 box C/D snoRNA. RNA 2000; 6:88-95.
73. Lafontaine DL, Tollervey D. Synthesis and assembly of the box C+D small nucleolar RNPs. Mol Cell Biol 2000; 20:2650-2659.
74. Niewmierzycka A, Clarke S. S-Adenosylmethionine-dependent methylation in Saccharomyces cerevisiae. Identification of a novel protein arginine methyltransferase. J Biol Chem 1999; 274:814-824.
75. Deng L, Starostina NG, Liu ZJ et al. Structure determination of fibrillarin from the hyperthermophilic archaeon Pyrococcus furiosus. Biochem Biophys Res Comm 2004; 315:726-732.
76. Wang H, Boisvert D, Kim K et al. Crystal structure of a fibrillarin homologue from Methanococcus jannaschii, a hyperthermophile, at 1.6 Å resolution. EMBO J 2000; 19:317-323.
77. Snaar S, Wiesmeijer K, Jochemsen AG et al. Mutational analysis of fibrillarin and its mobility in living human cells. J Cell Biol 2000; 151:653-662.
78. Amiri KA. Fibrillarin-like proteins occur in the domain Archaea. J Bacteriol 1994; 176:2124-2127.
79. Feder M, Pas J, Wyrwicz LS et al. Molecular phylogenetics of the RrmJ/fibrillarin superfamily of ribose 2ʹ-O-methyltransferases. Gene 2003; 302:129-138.
80. Hager J, Staker BL, Bugl H et al. Active site in RrmJ, a heat shock-induced methyltransferase. J Biol Chem 2002; 277:41978-41986.
81. Klimasauskas S, Kumar S, Roberts RJ et al. HhaI methyltransferase flips its target base out of the DNA helix. Cell 1994; 76:357-369.
82. Jones KW, Gorzynski K, Hales CM et al. Direct interaction of the spinal muscular atrophy disease protein SMN with the small nucleolar RNA-associated protein fibrillarin. J Biol Chem 2001; 276:38645-38651.
83. Watkins NJ, Dickmanns A, Luhrmann R. Conserved stem II of the box C/D motif is essential for nucleolar localization and is required, along with the 15.5K protein, for the hierarchical assembly of the box C/D snoRNP. Mol Cell Biol 2002; 22:8342-8352.

Chapter 31

Multicomponent Machines in RNA Modification: H/ACA Ribonucleoproteins

Petar Grozdanov and U. Thomas Meier*

Abstract

Pseudouridylation, the isomerization of uridine to pseudouridine, is the most frequent posttranscriptional modification of RNA, such that pseudouridine has even been termed the fifth nucleotide. Whereas eubacteria employ single protein enzymes to identify and modify target uridines, archaebacteria and eukaryotes additionally evolved more complex modification machines, H/ACA ribonucleoproteins (RNPs). Each H/ACA RNP consists of a short RNA and the same four core proteins, one of which is the pseudouridine synthase related to the bacterial single protein enzymes. In this chapter, we will give an overview of these multicomponent machines with emphasis on the eukaryal systems that have acquired additional functions and that are the subject of the inherited bone marrow failure syndrome dyskeratosis congenita.

Introduction

Nuclei of metazoans harbor several hundred individual small nucleolar ribonucleoproteins (snoRNPs) that predominantly function in RNA modification. They are divided into two major classes according to their function-defining snoRNAs, box H/ACA and box C/D snoRNPs, which pseudouridylate and 2ʹ-O-methylate their target RNAs, respectively. SnoRNAs guide the modification by site-specific base pairing while an enzyme (which is one of four core proteins of each RNP) catalyzes the reaction. Collectively, the snoRNAs account for one of the largest families of noncoding RNAs. In this overview, we will focus on the H/ACA class of RNPs (see chapter by Gagnon et al for C/D RNPs).

H/ACA RNAs

H/ACA RNAs are generally noncoding, trans-acting molecules 60-150 ribonucleotides in length (for reviews, see refs. 1-9). Defining features of H/ACA RNAs are two hairpins separated by a short single-stranded sequence (hinge), which includes an ANANNA consensus hexanucleotide, and an ACA triplet exactly three nucleotides from their 3ʹ-end (Fig. 1A).10,11 Although the number of hairpins can vary, H/ACA RNAs are conserved from archaea to mammals. The hairpins contain internal bulges and can differ in size and organization of stems and loops (Fig. 1A). The vast majority of H/ACA RNAs contain in their bulges two 3-10 ribonucleotide long stretches (3ʹ and 5ʹ of the upper stem) that are complementary to the sequences flanking their target uridines (Fig. 1A, arrows).12,13 Hence, these internal loops are also known as pseudouridylation pockets. So

*Corresponding Author: U. Thomas Meier, Department of Anatomy and Structural Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, New York 10461, USA. Email: [email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.


Figure 1. A) Schematic of an H/ACA RNA (black) with two hairpins separated by the hinge region containing the conserved ANANNA sequence and ending in ACA exactly three nucleotides the 3ʹ end. A substrate RNA (gray) is modeled into the bulge (pseudouridylation pocket) of the 3ʹ hairpin placing the target uridine (bold) and an unpaired nucleotide at the bottom of the upper stem while base pairing with the guide RNA on either side (arrows). B) Schematic of the four core proteins and their arrangement in the complex. The positions of the central catalytic domain of NAP57 (upper half) and of its PUA domain, with the C-terminus and N-terminus wrapped around (N-//PUA-C, lower half), are indicated. C) 3D structure of a fragment of human U65 H/ACA RNA (black) base pairing with a piece of 28S ribosomal RNA (gray).81 The flipped-out target uridine (U) is indicated (arrowhead) and the marked helices (arrows) correspond to those in (A). The structure is based on coordinates deposited in the Protein Data Bank (ID code 2P89)81 and was rendered using MacPyMol software (http:// www.pymol.org).

far, targets of these so-called guide RNAs are ribosomal RNAs (rRNAs) and spliceosomal small nuclear RNAs (snRNAs).12-15 Although originally and together with C/D RNAs identiied in nucleoli as snoRNAs, H/ACA RNAs are now subdivided into guide and nonguide RNAs (that function in pseudouridylation or not). he former are further categorized into snoRNAs (located in nucleoli and functioning in the pseudouridylation of rRNA) and small Cajal body RNAs (scaRNAs, located in Cajal bodies and functioning in the pseudouridylation of snRNAs). Cajal bodies are approximately one micron sized structures numbering one to ive in most nuclei and serving as locale of snRNA modiication.16-18 ScaRNAs contain a Cajal body-localizing element, the CAB box (5ʹ-ugAG-3ʹ), in the terminal loop of one or both hairpins (see chapter by Yu et al).19 Two types of scaRNAs are unique, one, in combining features of H/ACA and C/D RNAs yielding hybrids and, two, in forming a twin H/ACA RNA with four hairpins.7,15,16,20 In mammals, H/ACA RNAs target the modiication of ∼100 uridines in rRNAs and 27 in snRNAs. However, it should be noted that not all pseudouridines are speciied by H/ACA RNAs. For example, the pseudouridines of eukaryotic tRNAs and yeast 5S rRNA21 are generated by protein-only enzymes that recognize the uridine and catalyze its modiication and yeast U2 snRNA is the target of both H/ACA RNPs and stand-alone pseudouridylases (see chapter by Karijolich et al).
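The sequence features that define H/ACA RNAs lend themselves to a simple computational screen. The following Python sketch is only an illustration and is not one of the published search programs cited in this chapter. It looks for ANANNA hexanucleotides (candidates for the hinge motif, the H box) and for an ACA triplet followed by exactly three nucleotides at the 3ʹ end. The function name `find_box_elements`, the dictionary keys and the toy sequence are hypothetical, and secondary structure (the hairpins) is not evaluated.

```python
import re

# ANANNA hinge consensus; a lookahead is used so overlapping candidates are all reported.
H_BOX_LOOKAHEAD = re.compile(r"(?=A[ACGU]A[ACGU]{2}A)")
# ACA triplet followed by exactly three nucleotides and then the 3' end.
ACA_BOX = re.compile(r"ACA[ACGU]{3}$")


def find_box_elements(rna):
    """Return candidate H-box positions and the ACA-box position (0-based).

    Hypothetical helper: accepts RNA or DNA letters in either case.
    """
    seq = rna.strip().upper().replace("T", "U")
    h_positions = [m.start() for m in H_BOX_LOOKAHEAD.finditer(seq)]
    aca = ACA_BOX.search(seq)
    return {
        "h_box_candidates": h_positions,
        "aca_box_start": aca.start() if aca else None,
    }


if __name__ == "__main__":
    # Toy sequence (not a real H/ACA RNA): an AGAUUA hinge motif and ACA + 3 nt tail.
    toy = "GCGCAGAUUAGCGCACAUGU"
    print(find_box_elements(toy))
```

Because short motifs such as ANANNA occur frequently by chance, a realistic screen would also have to fold the sequence and test for the hairpin-hinge-hairpin-tail arrangement before calling a candidate.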

H/ACA Core Proteins

All H/ACA RNAs associate with four conserved core proteins that are responsible for the metabolic stability of the RNAs and catalyze the isomerization of uridine to pseudouridine. These proteins are the mammalian pseudouridine synthase NAP57 (also known as dyskerin; Cbf5p in yeast and Cbf5 in archaea), NOP10, NHP2 (L7Ae in archaea) and GAR1 (Fig. 1B). NAP57 was identified in the immunoprecipitate of the highly phosphorylated nucleolar protein Nopp140 and termed Nopp140-associated protein with a relative molecular mass of 57 kDa.22 NAP57 localizes to nucleoli and Cajal bodies and is 70% identical to yeast Cbf5p, which was previously identified as a low-affinity centromeric DNA binding protein.23 The central part of NAP57 (later identified as the catalytic domain) showed 34% identity to a bacterial protein that was subsequently purified based on its pseudouridylase activity.24 Analysis of the primary amino acid sequence of NAP57 reveals several distinct domains. One lysine-rich motif at the amino terminus and three at the carboxyl terminus are separated by the catalytic and the pseudouridine synthase and archaeosine transglycosylase (PUA) domains (see chapter by Mueller and Ferre-D’Amare). The catalytic domain contains a conserved aspartate that is important for catalysis (see chapter by Mueller and Ferre-D’Amare).25-28 The PUA domain is an RNA binding motif29-32 and the lysine-rich stretches can function as nuclear localization signals.33,34 NOP10 is the smallest polypeptide of the RNP, with only 64 amino acids in mammals and a molecular mass of 7.7 kDa.35 In the complex, it lines the catalytic domain of NAP57, stabilizing it and providing a docking site for NHP2.36,37 NHP2 was discovered as a nonhistone protein with a molecular mass of 17 kDa.38 It is homologous to the ribosomal protein L30 and to 15.5K/NHP2L1/NHPX (Snu13p in yeast), which is part of C/D RNPs and the snRNP U4.39-42 The archaeal ortholog L7Ae is part of both archaeal H/ACA and C/D RNPs.43 L7Ae (and 15.5K) binds specifically to a kink-turn motif in RNA, whereas NHP2 binds RNA secondary structures in a nonspecific manner (see chapter by Gagnon et al for more details).35,37,44,45 GAR1 is a protein with a molecular mass of 22 kDa and consists of a central domain flanked by glycine-arginine rich (GAR) domains.46 GAR1 is an integral part of the active RNP complex and binds directly to NAP57.37,47-50 According to the crystal structure of an archaeal H/ACA RNP and to cryoelectron microscopic studies of purified H/ACA particles, each of the normally two hairpins of H/ACA RNAs associates with its own set of four core proteins, placing the catalytic core at the pseudouridylation pocket.40,48,51 Therefore, H/ACA RNPs consist of one RNA and two each of the four core proteins.

Beyond Formation of Pseudouridines

Although most H/ACA RNAs guide the modification of RNA, their most prominent members do not. They are the only essential H/ACA RNA, U17/E1 (snR30 in yeast), required for ribosomal RNA processing, and the mammalian telomerase RNA, required for telomere maintenance.52,53 Of additional interest are tissue-specific and orphan H/ACA RNAs (without complementarity to any stable RNAs).

Ribosomal RNA Processing

The H/ACA RNA U17/E1 is required for a processing event in the formation of 18S rRNA.54 Thus, U17/E1 is essential for ribosome biogenesis and cell viability. Specifically, short stretches of highly conserved nucleotides in the bulge of the 3ʹ hairpin are engaged in the early cleavage steps of 35S pre-rRNA in yeast.55 The importance of these sequences is illustrated by their high degree of evolutionary conservation in budding and fission yeasts and in all vertebrates.53,56 In addition to the H/ACA core proteins, U17/E1 associates with the DEAD box helicase Has1p, which is required for snoRNP release from pre-rRNA.57 Additional interacting but as yet uncharacterized proteins have been identified.51,58 These may testify to the specialized function of U17/E1.

Telomerase

Maintenance of chromosome ends (telomeres), which plays a crucial role in cellular senescence and cancer, is mediated by telomerase, an H/ACA RNP.52 Specifically, human telomerase consists of a 451 nucleotide long RNA (hTR) whose 3ʹ end is an H/ACA domain.59 Like all H/ACA RNAs, hTR associates with all four core proteins, which are important for its accumulation and stability.59,60 Activity of telomerase is dependent on the template region in the 5ʹ half of hTR and on the reverse transcriptase TERT. Although hTR and its H/ACA core proteins are expressed in all cells, TERT, and with it telomerase activity, is mostly restricted to stem and cancer cells. Not only is hTR an H/ACA RNA, but it is also a scaRNA with a CAB box that localizes telomerase to Cajal bodies in a cell cycle and TERT dependent manner.61-65

Additional Functions

New H/ACA RNAs are still being identified using a combination of biochemical and in silico approaches.14,66-70 These approaches unearthed novel H/ACA RNAs that lack complementarity to any of the stable RNAs. These so-called orphan H/ACA RNAs appear either to guide the pseudouridylation of yet to be identified RNAs (e.g., mRNAs) or to exhibit separate functions (like U17/E1 and hTR). One of these orphan H/ACA RNAs, HBI-36, is of particular interest because, unlike all other H/ACA RNAs, it is expressed in a tissue-specific manner.71 Specifically, HBI-36 is expressed from an intron of the serotonin 2C receptor gene only in the choroid plexus of the brain, suggesting a developmentally regulated function. In another case, scaRNA U100 possesses complementarity to a target RNA; however, the uridine that it specifies in U6 snRNA is apparently not modified.72 Therefore, even apparent guide RNAs may serve different purposes.

Architecture of H/ACA RNPs

Overview

Recent years have produced a detailed view of H/ACA RNPs. Biochemical analyses revealed intra-RNP protein-protein and protein-RNA interactions of eukaryal particles, and X-ray crystallographic studies provided the details of partially and fully reconstituted archaeal RNPs.37,47,48,73-77 The major difference between archaeal and eukaryal H/ACA RNPs is between the homologous proteins L7Ae and NHP2, respectively. Whereas L7Ae recognizes and binds archaeal H/ACA RNAs independently, NHP2 does so only when complexed with NAP57 via NOP10.37,43-45 H/ACA RNPs appear unique among RNA-protein complexes. In place of the usual intertwined structures of proteins and RNA, e.g., in the cases of the U1 snRNP78 and C/D RNPs (see chapter by Gagnon et al), the four H/ACA core proteins form a planar, coherent surface accommodating individual H/ACA RNAs and their targets like a slice of bread being buttered. This arrangement may allow the accommodation of the 150 or so different H/ACA RNAs by the same protein complex.79,80

Intra-RNP Interactions

In eukaryotes, the four core proteins can form an independent complex (archaeal L7Ae is held in place by the RNA) that resembles an equilateral triangle (Fig. 1B).47,48,75-77 Its corners are formed by GAR1, NHP2 and the C-terminal PUA domain of NAP57 (which also associates with the N-terminus). The body consists of the catalytic domain of NAP57, which is lined by NOP10 (that in turn binds NHP2) and which binds GAR1. One hairpin of an H/ACA RNA stretches across the NAP57-NOP10-NHP2 axis. The PUA domain of NAP57 anchors the ACA triplet on one end and NHP2 anchors the terminal loop of a hairpin on the other, thereby placing the pseudouridylation pocket over the catalytic domain of NAP57. The confinement of the ACA triplet to the PUA domain of NAP57 explains the constraint of 14 nucleotides between the ACA and the top of the pseudouridylation pocket (where the target uridine will be situated) for placement of the latter near the active site of NAP57.12,13,48 GAR1 is not required for RNA binding, and the three proteins NAP57, NOP10 and NHP2 form an independent complex (the core trimer) that provides the specificity for H/ACA RNA recognition. Despite this separation of GAR1 from the core trimer, UV-crosslinking experiments suggest that all eukaryal core proteins contact the H/ACA RNA in some fashion, whereas only NAP57 and GAR1 crosslink to the target uridine.37,73

RNP-Substrate Interactions

How an H/ACA guide RNA accommodates its target RNA has been visualized in solution and in the context of three core proteins.75,81,82 The pseudouridylation pocket of the guide RNA (Fig. 1C, in black) forms a more or less straight opening that base pairs on one side with the 5ʹ half of the target RNA (gray) (extending the bottom helix of the hairpin) and on the other with the 3ʹ half (extending the top helix of the hairpin) (arrows). This unique conformation forces the substrate RNA into a tight turn at the two unpaired nucleotides, flipping out the target uridine (Fig. 1C, gray U and arrowhead), which becomes accessible to the active site of NAP57. Additionally, this arrangement of the H/ACA guide-target RNA complex obviates the necessity of a helicase for loading and release of target RNAs.81,82
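The guide-target geometry described above can be expressed as a small check. The following Python sketch is a simplified illustration under stated assumptions, not a published prediction method: the function names `pairs_antiparallel` and `pocket_accepts` are hypothetical, G-U wobble pairs are tolerated by default, and the second unpaired nucleotide is assumed to lie immediately 3ʹ of the target uridine. Which strand of the pocket each guide stretch occupies and the distance to the ACA triplet are not modeled.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairs_antiparallel(strand_a, strand_b, allow_wobble=True):
    """True if two equal-length RNA strands (both written 5'->3') can pair antiparallel."""
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    if len(strand_a) != len(strand_b):
        return False
    return all((a, b) in allowed for a, b in zip(strand_a, reversed(strand_b)))


def pocket_accepts(target, u_index, guide_for_5p_flank, guide_for_3p_flank):
    """Check whether two guide stretches pair with the sequences flanking a target U.

    Assumption: target[u_index] (the U) and target[u_index + 1] stay unpaired.
    """
    n5, n3 = len(guide_for_5p_flank), len(guide_for_3p_flank)
    # The chapter gives 3-10 nucleotides for each complementary stretch.
    if not (3 <= n5 <= 10 and 3 <= n3 <= 10):
        return False
    if not (0 <= u_index < len(target)) or target[u_index] != "U":
        return False
    flank_5p = target[max(0, u_index - n5):u_index]
    flank_3p = target[u_index + 2:u_index + 2 + n3]
    return (pairs_antiparallel(flank_5p, guide_for_5p_flank)
            and pairs_antiparallel(flank_3p, guide_for_3p_flank))


if __name__ == "__main__":
    # Toy target with a uridine at index 6; guide stretches are reverse complements
    # of the four nucleotides on either side of the two unpaired positions.
    print(pocket_accepts("CCAGGCUAGGACC", 6, "GCCU", "GUCC"))
```

Treating the two flanks independently mirrors the description of the pocket as two separate helices on either side of the flipped-out uridine; a fuller model would score pairing energies rather than return a yes/no answer.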

RNP Stability

Each of the proteins of the core trimer, but not GAR1, is essential for cell viability and for metabolic stability of all H/ACA RNAs and of each other.35,40,60,83,84 Consistent with these observations in yeast, mammalian RNP complexes of the core trimer and an H/ACA RNA, once assembled, do not exchange their RNA.37 In particular, NAP57 remains stably associated with its H/ACA RNA in cell extracts, whereas NOP10 and NHP2 exchange to some extent and GAR1 more readily.85 In conclusion, H/ACA RNPs are stable complexes and formation of new particles requires de novo synthesis and assembly of their individual components.

Biogenesis of H/ACA RNPs

Despite the simple five-component composition of H/ACA RNPs, eukaryal particles rely on accessory factors for their assembly. In particular, two factors, Naf1p and Shq1p, have been identified in yeast as essential for the stable accumulation of H/ACA RNPs.86-88 Both proteins have homologs in mammals, NAF1 and SHQ1. NAF1 is recruited cotranscriptionally to the site of H/ACA RNA transcription and is also required for the assembly of human H/ACA RNPs including telomerase.89-92 NAF1 binds NAP57 at the same site as GAR1, indicating a sequential assembly.37,87,93 Although less is known about Shq1p, it also binds Cbf5p (the yeast NAP57) without being part of mature H/ACA RNPs.88 Consistent with these findings, both proteins are excluded from nucleoli and Cajal bodies, the sites of mature particles, and localize to the nucleoplasm. In contrast to eukaryotes, archaea lack recognizable homologs of these assembly factors and their H/ACA RNPs can be functionally reconstituted with just the five core components alone.49,50 Two additional proteins, Nopp140 and SMN, have been implicated in H/ACA RNP biogenesis and/or function due to their ability to interact with the particles. In fact, NAP57 was identified in immunoprecipitates of the highly phosphorylated nucleolar protein Nopp140,94 whereas the survival of motor neuron protein (SMN), which is affected in spinal muscular atrophy, binds GAR1.95-97 Although SMN is clearly involved in the assembly of spliceosomal snRNPs, evidence for a similar function in H/ACA RNP biogenesis is lacking. Therefore, NAF1 and SHQ1 are to date the only bona fide H/ACA RNP assembly factors. Finally, factors that may be involved in the biogenesis of both H/ACA and C/D RNPs have been identified. These include AAA+ helicases and chaperone proteins, e.g., the helicases Rvb1 (Tih1, TIP48, pontin, etc.) and Rvb2 (Tih2, TIP49, reptin, etc.) and the heat shock protein HSP90.98-102 These factors may be more generally required for RNP biogenesis and, like that of the other assembly factors, their precise mechanism of action remains to be determined.

Dyskeratosis Congenita

Overview

H/ACA RNPs have gained significant attention due to their association with the bone marrow failure syndrome dyskeratosis congenita (DC). DC is a rare but often fatal inherited disease leading to stem cell loss, particularly in rapidly proliferating tissues such as the bone marrow, skin and intestine.103,104 It is mainly characterized by bone marrow failure and the mucocutaneous triad of abnormal skin pigmentation, nail dystrophy and mucosal leukoplakia, but also causes a predisposition to malignant tumor formation.105 DC is inherited in three patterns, X-linked recessive (accounting for ∼45% of cases), autosomal recessive (∼50%) and autosomal dominant (∼5%). The X-linked and autosomal recessive forms usually are most severe, with extreme cases of intrauterine growth retardation, whereas the autosomal dominant form is milder and can go unnoticed until the fourth or fifth decade of life. The X-linked form is caused exclusively by mutations in NAP57, which is hence also referred to as dyskerin.106,107 The autosomal recessive form is genetically heterogeneous. Although families with mutations in NOP10, NHP2 and the telomeric factor TIN2 have been identified, the affected gene(s) of most families remain to be discovered.108-111 The autosomal dominant form is due to mutations in the telomerase RNA and reverse transcriptase genes.112,113

Pathogenesis

Although DC patients of all inheritance patterns exhibit shortened telomeres in peripheral blood, the degree to which the other functions of H/ACA RNPs contribute to the pathogenesis, and whether and how certain classes of H/ACA RNPs are preferentially impaired in the recessive forms, remain to be established. The autosomal dominant form is due to haploinsufficiency of telomerase and shows disease anticipation, i.e., shorter telomeres and earlier onset in subsequent generations.112,114 The recessive forms are more complex, and mouse models point to a mixture of affected H/ACA RNP functions with telomerase featured prominently.107,115-118 The level of understanding, or lack thereof, is perhaps best illustrated by the absence of an explanation for the molecular impact of the many mutations in NAP57 in X-linked DC.

NAP57 Mutations

The forty or so DC mutations identified in NAP57 cluster to its PUA domain, including the C-terminus, and to its N-terminus, mostly avoiding the catalytic domain.119 In a model of the 3D structure (based on those from archaea), most of these mutations come together on one solvent-accessible surface (at the bottom of the molecule in Fig. 1B).47,48 Despite their location in the PUA domain, the mutations apparently fail to impact the binding of the ACA triplet of the H/ACA RNAs. Moreover, except for potential allosteric effects, the DC mutations do not impact intra-RNP protein-protein interactions. Therefore, the mutation cluster may impair the interaction of the RNP with one or more yet to be identified factors. Such a factor could be RNP-specific and thus explain a preferential impact on, e.g., telomerase.

Conclusions and Anticipated Developments

The main function of H/ACA RNPs is the modification of target RNAs, and based on genetic, biochemical and, more recently, structural studies we have gained detailed insight into their structure and function. Some specialized aspects, such as their catalytic mechanism (see chapter by Mueller and Ferre-D’Amare) and their action on spliceosomal snRNPs (see chapter by Karijolich et al), are discussed in separate chapters of this book. In particular, two aspects have boosted research into H/ACA RNPs: first, their involvement in an inherited disease (DC) and, second, their forming part of mammalian telomerase. Despite the wealth of information accumulated on these five-component particles, many questions remain. Although it is clear that overall and partial pseudouridylation of ribosomal RNA is important for ribosome biogenesis and function,120-122 we are far from understanding the importance of individual modifications, e.g., is it really the modification that matters or is it the action (hybridization) of the respective H/ACA RNP on (to) the target site? In the future, the targets and functions of orphan H/ACA RNAs will undoubtedly be unraveled, potentially opening entire new areas of H/ACA RNP research. The differences between archaeal and eukaryal H/ACA RNPs have hampered extending findings from one to the other. Although archaeal RNPs can be functionally reconstituted from recombinant components and crystallized, mammalian RNPs require assembly factors. Moreover, the structures of mammalian RNPs can be modeled based on those of the archaeal ones, but about one-third of their entire RNP structure is still missing due to N- and C-terminal extensions of the individual proteins. In the future, mammalian H/ACA RNPs will need to be functionally reconstituted and crystallized from recombinant components and the action of their assembly factors determined in more detail.123 Eventually, the analysis of RNPs reconstituted from proteins with and without DC mutations and their impact on individual particles will provide insight into the molecular mechanism underlying DC.

Acknowledgements

We thank Sujayita Roy for critical reading of the manuscript. The work in the authors’ laboratory is supported by grant HL079566 (to U.T.M.) from the National Institutes of Health.

References

1. Decatur WA, Fournier MJ. RNA-guided nucleotide modification of ribosomal and other RNAs. J Biol Chem 2003; 278:695-698.
2. Bachellerie JP, Cavaille J, Hüttenhofer A. The expanding snoRNA world. Biochimie 2002; 84:775-790.
3. Matera AG, Terns RM, Terns MP. Noncoding RNAs: lessons from the small nuclear and small nucleolar RNAs. Nat Rev Mol Cell Biol 2007; 8:209-220.
4. Meier UT. The many facets of H/ACA ribonucleoproteins. Chromosoma 2005; 114:1-14.
5. Filipowicz W, Pogacic V. Biogenesis of small nucleolar ribonucleoproteins. Curr Opin Cell Biol 2002; 14:319-327.
6. Henras AK, Dez C, Henry Y. RNA structure and function in C/D and H/ACA s(no)RNPs. Curr Opin Struct Biol 2004; 14:335-343.
7. Kiss T. Small nucleolar RNAs: an abundant group of noncoding RNAs with diverse cellular functions. Cell 2002; 109:145-148.
8. Lafontaine DL, Tollervey D. Birth of the snoRNPs: the evolution of the modification-guide snoRNAs. Trends Biochem Sci 1998; 23:383-388.
9. Smith CM, Steitz JA. Sno storm in the nucleolus: new roles for myriad small RNPs. Cell 1997; 89:669-672.
10. Ganot P, Caizergues-Ferrer M, Kiss T. The family of box ACA small nucleolar RNAs is defined by an evolutionarily conserved secondary structure and ubiquitous sequence elements essential for RNA accumulation. Genes Dev 1997; 11:941-956.
11. Balakin AG, Smith L, Fournier MJ. The RNA world of the nucleolus: two major families of small RNAs defined by different box elements with related functions. Cell 1996; 86:823-834.
12. Ganot P, Bortolin ML, Kiss T. Site-specific pseudouridine formation in preribosomal RNA is guided by small nucleolar RNAs. Cell 1997; 89:799-809.
13. Ni J, Tien AL, Fournier MJ. Small nucleolar RNAs direct site-specific synthesis of pseudouridine in ribosomal RNA. Cell 1997; 89:565-573.
14. Hüttenhofer A, Kiefmann M, Meier-Ewert S et al. RNomics: an experimental approach that identifies 201 candidates for novel, small, nonmessenger RNAs in mouse. EMBO J 2001; 20:2943-2953.
15. Jady BE, Kiss T. A small nucleolar guide RNA functions both in 2ʹ-O-ribose methylation and pseudouridylation of the U5 spliceosomal RNA. EMBO J 2001; 20:541-551.
16. Darzacq X, Jady BE, Verheggen C et al. Cajal body-specific small nuclear RNAs: a novel class of 2ʹ-O-methylation and pseudouridylation guide RNAs. EMBO J 2002; 21:2746-2756.
17. Cioce M, Lamond AI. Cajal bodies: a long history of discovery. Annu Rev Cell Dev Biol 2005; 21:105-131.
18. Handwerger KE, Gall JG. Subnuclear organelles: new insights into form and function. Trends Cell Biol 2006; 16:19-26.
19. Richard P, Darzacq X, Bertrand E et al. A common sequence motif determines the Cajal body-specific localization of box H/ACA scaRNAs. EMBO J 2003; 22:4283-4293.
20. Kiss T. Biogenesis of small nuclear RNPs. J Cell Sci 2004; 117:5949-5951.
21. Decatur WA, Schnare MN. Different mechanisms for pseudouridine formation in yeast 5S and 5.8S rRNAs. Mol Cell Biol 2008; 28:3089-3100.
22. Meier UT, Blobel G. NAP57, a mammalian nucleolar protein with a putative homolog in yeast and bacteria. J Cell Biol 1994; 127:1505-1514.
23. Jiang W, Middleton K, Yoon H-J et al. An essential yeast protein, CBF5p, binds in vitro to centromeres and microtubules. Mol Cell Biol 1993; 13:4884-4893.
24. Nurse K, Wrzesinski J, Bakin A et al. Purification, cloning and properties of the tRNA Ψ55 synthase from Escherichia coli. RNA 1995; 1:102-112.
25. Gu X, Liu Y, Santi DV. The mechanism of pseudouridine synthase I as deduced from its interaction with 5-fluorouracil-tRNA. Proc Natl Acad Sci USA 1999; 96:14270-14275.
26. Hoang C, Ferre-D'Amare AR. Cocrystal structure of a tRNA Psi55 pseudouridine synthase: nucleotide flipping by an RNA-modifying enzyme. Cell 2001; 107:929-939.
27. Huang L, Pookanjanatavip M, Gu X et al. A conserved aspartate of tRNA pseudouridine synthase is essential for activity and a probable nucleophilic catalyst. Biochemistry 1998; 37:344-351.
28. Spedaliere CJ, Ginter JM, Johnston MV et al. The pseudouridine synthases: revisiting a mechanism that seemed settled. J Am Chem Soc 2004; 126:12758-12759.

29. Gustafsson C, Reid R, Greene PJ et al. Identification of new RNA modifying enzymes by iterative genome search using known modifying enzymes as probes. Nucleic Acids Res 1996; 24:3756-3762.
30. Aravind L, Koonin EV. Novel predicted RNA-binding domains associated with the translation machinery. J Mol Evol 1999; 48:291-302.
31. Koonin EV. Pseudouridine synthases: four families of enzymes containing a putative uridine-binding motif also conserved in dUTPases and dCTP deaminases. Nucleic Acids Res 1996; 24:2411-2415.
32. Perez-Arellano I, Gallego J, Cervera J. The PUA domain - a structural and functional overview. FEBS J 2007; 274:4972-4984.
33. Heiss NS, Girod A, Salowsky R et al. Dyskerin localizes to the nucleolus and its mislocalization is unlikely to play a role in the pathogenesis of dyskeratosis congenita. Hum Mol Genet 1999; 8:2515-2524.
34. Youssoufian H, Gharibyan V, Qatanani M. Analysis of epitope-tagged forms of the dyskeratosis congenital protein (dyskerin): identification of a nuclear localization signal. Blood Cells Mol Dis 1999; 25:305-309.
35. Henras A, Henry Y, Bousquet-Antonelli C et al. Nhp2p and Nop10p are essential for the function of H/ACA snoRNPs. EMBO J 1998; 17:7078-7090.
36. Reichow SL, Varani G. Nop10 is a conserved H/ACA snoRNP molecular adaptor. Biochemistry 2008; 47:6148-6156.
37. Wang C, Meier UT. Architecture and assembly of mammalian H/ACA small nucleolar and telomerase ribonucleoproteins. EMBO J 2004; 23:1857-1867.
38. Kolodrubetz D, Haggren W, Burgum A. Amino-terminal sequence of a Saccharomyces cerevisiae nuclear protein, NHP6, shows significant identity to bovine HMG1. FEBS Lett 1988; 238:175-179.
39. Leung AK, Lamond AI. In vivo analysis of NHPX reveals a novel nucleolar localization pathway involving a transient accumulation in splicing speckles. J Cell Biol 2002; 157:615-629.
40. Watkins NJ, Gottschalk A, Neubauer G et al. Cbf5p, a potential pseudouridine synthase and Nhp2p, a putative RNA-binding protein, are present together with Gar1p in all box H/ACA-motif snoRNPs and constitute a common bipartite structure. RNA 1998; 4:1549-1568.
41. Watkins NJ, Segault V, Charpentier B et al. A common core RNP structure shared between the small nucleolar box C/D RNPs and the spliceosomal U4 snRNP. Cell 2000; 103:457-466.
42. Nottrott S, Hartmuth K, Fabrizio P et al. Functional interaction of a novel 15.5 kD [U4/U6.U5] tri-snRNP protein with the 5ʹ stem-loop of U4 snRNA. EMBO J 1999; 18:6119-6133.
43. Rozhdestvensky TS, Tang TH, Tchirkova IV et al. Binding of L7Ae protein to the K-turn of archaeal snoRNAs: a shared RNA binding motif for C/D and H/ACA box snoRNAs in Archaea. Nucleic Acids Res 2003; 31:869-877.
44. Henras A, Dez C, Noaillac-Depeyre J et al. Accumulation of H/ACA snoRNPs depends on the integrity of the conserved central domain of the RNA-binding protein Nhp2p. Nucleic Acids Res 2001; 29:2733-2746.
45. Vidovic I, Nottrott S, Hartmuth K et al. Crystal structure of the spliceosomal 15.5 kD protein bound to a U4 snRNA fragment. Mol Cell 2000; 6:1331-1342.
46. Girard JP, Lehtonen H, Caizergues-Ferrer M et al. GAR1 is an essential small nucleolar RNP protein required for pre-rRNA processing in yeast. EMBO J 1992; 11:673-682.
47. Rashid R, Liang B, Baker DL et al. Crystal structure of a Cbf5-Nop10-Gar1 complex and implications in RNA-guided pseudouridylation and dyskeratosis congenita. Mol Cell 2006; 21:249-260.
48. Li L, Ye K. Crystal structure of an H/ACA box ribonucleoprotein particle. Nature 2006; 443:302-307.
49. Baker DL, Youssef OA, Chastkofsky MI et al. RNA-guided RNA modification: functional organization of the archaeal H/ACA RNP. Genes Dev 2005; 19:1238-1248.
50. Charpentier B, Muller S, Branlant C. Reconstitution of archaeal H/ACA small ribonucleoprotein complexes active in pseudouridylation. Nucleic Acids Res 2005; 33:3133-3144.
51. Lübben B, Fabrizio P, Kastner B et al. Isolation and characterization of the small nucleolar ribonucleoprotein particle snR30 from Saccharomyces cerevisiae. J Biol Chem 1995; 270:11549-11554.
52. Collins K, Mitchell JR. Telomerase in the human organism. Oncogene 2002; 21:564-579.
53. Eliceiri GL. The vertebrate E1/U17 small nucleolar ribonucleoprotein particle. J Cell Biochem 2006; 98:486-495.
54. Morrissey JP, Tollervey D. Yeast snR30 is a small nucleolar RNA required for 18S rRNA synthesis. Mol Cell Biol 1993; 13:2469-2477.
55. Atzorn V, Fragapane P, Kiss T. U17/snR30 is a ubiquitous snoRNA with two conserved sequence motifs essential for 18S rRNA production. Mol Cell Biol 2004; 24:1769-1778.
56. Cervelli M, Oliverio M, Bellini A et al. Structural and sequence evolution of U17 small nucleolar RNA (snoRNA) and its phylogenetic congruence in chelonians. J Mol Evol 2003; 57:73-84.
57. Liang XH, Fournier MJ. The helicase Has1p is required for snoRNA release from pre-rRNA. Mol Cell Biol 2006; 26:7437-7450.

58. Smith JL, Walton AH, Eliceiri GL. UV-crosslinking of E1 small nucleolar RNA to proteins in frog oocytes. J Cell Physiol 2005; 203:202-208.
59. Mitchell JR, Cheng J, Collins K. A box H/ACA small nucleolar RNA-like domain at the human telomerase RNA 3ʹ end. Mol Cell Biol 1999; 19:567-576.
60. Dez C, Henras A, Faucon B et al. Stable expression in yeast of the mature form of human telomerase RNA depends on its association with the box H/ACA small nucleolar RNP proteins Cbf5p, Nhp2p and Nop10p. Nucleic Acids Res 2001; 29:598-603.
61. Theimer CA, Jady BE, Chim N et al. Structural and functional characterization of human telomerase RNA processing and cajal body localization signals. Mol Cell 2007; 27:869-881.
62. Tomlinson RL, Abreu EB, Ziegler T et al. Telomerase reverse transcriptase is required for the localization of telomerase RNA to Cajal bodies and telomeres in human cancer cells. Mol Biol Cell 2008; 19:3793-3800.
63. Jady BE, Bertrand E, Kiss T. Human telomerase RNA and box H/ACA scaRNAs share a common Cajal body-specific localization signal. J Cell Biol 2004; 164:647-652.
64. Zhu Y, Tomlinson RL, Lukowiak AA et al. Telomerase RNA accumulates in Cajal bodies in human cancer cells. Mol Biol Cell 2004; 15:81-90.
65. Cristofari G, Adolf E, Reichenbach P et al. Human telomerase RNA accumulation in Cajal bodies facilitates telomerase recruitment to telomeres and telomere elongation. Mol Cell 2007; 27:882-889.
66. Freyhult E, Edvardsson S, Tamas I et al. Fisher: a program for the detection of H/ACA snoRNAs using MFE secondary structure prediction and comparative genomics - assessment and update. BMC Res Notes 2008; 1:49.
67. Kiss AM, Jady BE, Bertrand E et al. Human box H/ACA pseudouridylation guide RNA machinery. Mol Cell Biol 2004; 24:5797-5807.
68. Luo Y, Li S. Genome-wide analyses of retrogenes derived from the human box H/ACA snoRNAs. Nucleic Acids Res 2007; 35:559-571.
69. Schattner P, Barberan-Soler S, Lowe TM. A computational screen for mammalian pseudouridylation guide H/ACA RNAs. RNA 2006; 12:15-25.
70. Yang JH, Zhang XC, Huang ZP et al. SnoSeeker: an advanced computational package for screening of guide and orphan snoRNA genes in the human genome. Nucleic Acids Res 2006; 34:5112-5123.
71. Cavaille J, Buiting K, Kiefmann M et al. Identification of brain-specific and imprinted small nucleolar RNA genes exhibiting an unusual genomic organization. Proc Natl Acad Sci USA 2000; 97:14311-14316.
72. Vitali P, Royo H, Seitz H et al. Identification of 13 novel human modification guide RNAs. Nucleic Acids Res 2003; 31:6543-6551.
73. Dragon F, Pogacic V, Filipowicz W. In vitro assembly of human H/ACA small nucleolar RNPs reveals unique features of U17 and telomerase RNAs. Mol Cell Biol 2000; 20:3037-3048.
74. Henras AK, Capeyrou R, Henry Y et al. Cbf5p, the putative pseudouridine synthase of H/ACA-type snoRNPs, can form a complex with Gar1p and Nop10p in absence of Nhp2p and box H/ACA snoRNAs. RNA 2004; 10:1704-1712.
75. Liang B, Xue S, Terns RM et al. Substrate RNA positioning in the archaeal H/ACA ribonucleoprotein complex. Nat Struct Mol Biol 2007; 14:1189-1195.
76. Hamma T, Reichow SL, Varani G et al. The Cbf5-Nop10 complex is a molecular bracket that organizes box H/ACA RNPs. Nat Struct Mol Biol 2005; 12:1101-1107.
77. Manival X, Charron C, Fourmann JB et al. Crystal structure determination and site-directed mutagenesis of the Pyrococcus abyssi aCBF5-aNOP10 complex reveal crucial roles of the C-terminal domains of both proteins in H/ACA sRNP activity. Nucleic Acids Res 2006; 34:826-839.
78. Stark H, Dube P, Luhrmann R et al. Arrangement of RNA and proteins in the spliceosomal U1 small nuclear ribonucleoprotein particle. Nature 2001; 409:539-542.
79. Li H. Unveiling substrate RNA binding to H/ACA RNPs: one side fits all. Curr Opin Struct Biol 2008; 18:78-85.
80. Meier UT. How a single protein complex accommodates many different H/ACA RNAs. Trends Biochem Sci 2006; 31:311-315.
81. Wu H, Feigon J. H/ACA small nucleolar RNA pseudouridylation pockets bind substrate RNA to form three-way junctions that position the target U for modification. Proc Natl Acad Sci USA 2007; 104:6655-6660.
82. Jin H, Loria JP, Moore PB. Solution structure of an rRNA substrate bound to the pseudouridylation pocket of a box H/ACA snoRNA. Mol Cell 2007; 26:205-215.
83. Lafontaine DLJ, Bousquet-Antonelli C, Henry Y et al. The box H+ACA snoRNAs carry Cbf5p, the putative rRNA pseudouridine synthase. Genes Dev 1998; 12:527-537.
84. Bousquet-Antonelli C, Henry Y, Gélugne J-P et al. A small nucleolar RNP protein is required for pseudouridylation of eukaryotic ribosomal RNAs. EMBO J 1997; 16:4770-4776.


85. Kittur N, Darzacq X, Roy S et al. Dynamic association and localization of human H/ACA RNP proteins. RNA 2006; 12:2057-2062.
86. Dez C, Noaillac-Depeyre J, Caizergues-Ferrer M et al. Naf1p, an essential nucleoplasmic factor specifically required for accumulation of box H/ACA small nucleolar RNPs. Mol Cell Biol 2002; 22:7053-7065.
87. Fatica A, Dlakic M, Tollervey D. Naf1p is a box H/ACA snoRNP assembly factor. RNA 2002; 8:1502-1514.
88. Yang PK, Rotondo G, Porras T et al. The Shq1p.Naf1p complex is required for box H/ACA small nucleolar ribonucleoprotein particle biogenesis. J Biol Chem 2002; 277:45235-45242.
89. Ballarino M, Morlando M, Pagano F et al. The cotranscriptional assembly of snoRNPs controls the biosynthesis of H/ACA snoRNAs in Saccharomyces cerevisiae. Mol Cell Biol 2005; 25:5396-5403.
90. Darzacq X, Kittur N, Roy S et al. Stepwise RNP assembly at the site of H/ACA RNA transcription in human cells. J Cell Biol 2006; 173:207-218.
91. Hoareau-Aveilla C, Bonoli M, Caizergues-Ferrer M et al. hNaf1 is required for accumulation of human box H/ACA snoRNPs, scaRNPs and telomerase. RNA 2006; 12:832-840.
92. Yang PK, Hoareau C, Froment C et al. Cotranscriptional recruitment of the pseudouridylsynthetase Cbf5p and of the RNA binding protein Naf1p during H/ACA snoRNP assembly. Mol Cell Biol 2005; 25:3295-3304.
93. Leulliot N, Godin KS, Hoareau-Aveilla C et al. The box H/ACA RNP assembly factor Naf1p contains a domain homologous to Gar1p mediating its interaction with Cbf5p. J Mol Biol 2007; 371:1338-1353.
94. Meier UT, Blobel G. NAP57, a mammalian nucleolar protein with a putative homolog in yeast and bacteria. J Cell Biol (correction appeared in 140: 447) 1994; 127:1505-1514.
95. Jones KW, Gorzynski K, Hales CM et al. Direct interaction of the spinal muscular atrophy disease protein SMN with the small nucleolar RNA-associated protein fibrillarin. J Biol Chem 2001; 276:38645-38651.
96. Pellizzoni L, Baccon J, Charroux B et al. The survival of motor neurons (SMN) protein interacts with the snoRNP proteins fibrillarin and GAR1. Curr Biol 2001; 11:1079-1088.
97. Whitehead SE, Jones KW, Zhang X et al. Determinants of the interaction of the spinal muscular atrophy disease protein SMN with the dimethylarginine-modified box H/ACA small nucleolar ribonucleoprotein GAR1. J Biol Chem 2002; 277:48087-48093.
98. Boulon S, Marmier-Gourrier N, Pradet-Balade B et al. The Hsp90 chaperone controls the biogenesis of L7Ae RNPs through conserved machinery. J Cell Biol 2008; 180:579-595.
99. Watkins NJ, Dickmanns A, Luhrmann R. Conserved stem II of the box C/D motif is essential for nucleolar localization and is required, along with the 15.5K protein, for the hierarchical assembly of the box C/D snoRNP. Mol Cell Biol 2002; 22:8342-8352.
100. King TH, Decatur WA, Bertrand E et al. A well-connected and conserved nucleoplasmic helicase is required for production of box C/D and H/ACA snoRNAs and localization of snoRNP proteins. Mol Cell Biol 2001; 21:7731-7746.
101. Venteicher AS, Meng Z, Mason PJ et al. Identification of ATPases pontin and reptin as telomerase components essential for holoenzyme assembly. Cell 2008; 132:945-957.
102. Zhao R, Kakihara Y, Gribun A et al. Molecular chaperone Hsp90 stabilizes Pih1/Nop17 to maintain R2TP complex activity that regulates snoRNA accumulation. J Cell Biol 2008; 180:563-578.
103. Marsh JC, Will AJ, Hows JM et al. “Stem cell” origin of the hematopoietic defect in dyskeratosis congenita. Blood 1992; 79:3138-3144.
104. Walne AJ, Dokal I. Dyskeratosis Congenita: a historical perspective. Mechanisms of Ageing and Development 2008; 129:48-59.
105. Kirwan M, Dokal I. Dyskeratosis congenita: a genetic disorder of many faces. Clin Genet 2008; 73:103-112.
106. Heiss NS, Knight SW, Vulliamy TJ et al. X-linked dyskeratosis congenita is caused by mutations in a highly conserved gene with putative nucleolar functions. Nat Genet 1998; 19:32-38.
107. Mitchell JR, Wood E, Collins K. A telomerase component is defective in the human disease dyskeratosis congenita. Nature 1999; 402:551-555.
108. Savage SA, Giri N, Baerlocher GM et al. TINF2, a component of the shelterin telomere protection complex, is mutated in dyskeratosis congenita. Am J Hum Genet 2008; 82:501-509.
109. Vulliamy T, Beswick R, Kirwan M et al. Mutations in the telomerase component NHP2 cause the premature ageing syndrome dyskeratosis congenita. Proc Natl Acad Sci USA 2008; 105:8073-8078.
110. Walne AJ, Vulliamy T, Marrone A et al. Genetic heterogeneity in autosomal recessive dyskeratosis congenita with one subtype due to mutations in the telomerase-associated protein NOP10. Hum Mol Genet 2007; 16:1619-1629.


111. Walne AJ, Vulliamy TJ, Beswick R et al. TINF2 mutations result in very short telomeres: Analysis of a large cohort of patients with dyskeratosis congenita and related bone marrow failure syndromes. Blood 2008; 112:3594-3600.
112. Armanios M, Chen JL, Chang YP et al. Haploinsufficiency of telomerase reverse transcriptase leads to anticipation in autosomal dominant dyskeratosis congenita. Proc Natl Acad Sci USA 2005; 102:15960-15964.
113. Vulliamy T, Marrone A, Goldman F et al. The RNA component of telomerase is mutated in autosomal dominant dyskeratosis congenita. Nature 2001; 413:432-435.
114. Goldman F, Bouarich R, Kulkarni S et al. The effect of TERC haploinsufficiency on the inheritance of telomere length. Proc Natl Acad Sci USA 2005; 102:17119-17124.
115. Gu BW, Bessler M, Mason PJ. A pathogenic dyskerin mutation impairs proliferation and activates a DNA damage response independent of telomere length in mice. Proc Natl Acad Sci USA 2008; 105:10173-10178.
116. Mochizuki Y, He J, Kulkarni S et al. Mouse dyskerin mutations affect accumulation of telomerase RNA and small nucleolar RNA, telomerase activity and ribosomal RNA processing. Proc Natl Acad Sci USA 2004; 101:10756-10761.
117. Ruggero D, Grisendi S, Piazza F et al. Dyskeratosis congenita and cancer in mice deficient in ribosomal RNA modification. Science 2003; 299:259-262.
118. Yoon A, Peng G, Brandenburger Y et al. Impaired control of IRES-mediated translation in X-linked dyskeratosis congenita. Science 2006; 312:902-906.
119. Marrone A, Dokal I. Dyskeratosis congenita: molecular insights into telomerase function, ageing and cancer. Expert Rev Mol Med 2004; 6:1-23.
120. King TH, Liu B, McCully RR et al. Ribosome structure and activity are altered in cells lacking snoRNPs that form pseudouridines in the peptidyl transferase center. Mol Cell 2003; 11:425-435.
121. Liang XH, Liu Q, Fournier MJ. rRNA modifications in an intersubunit bridge of the ribosome strongly affect both ribosome biogenesis and activity. Mol Cell 2007; 28:965-977.
122. Zebarjadian Y, King T, Fournier MJ et al. Point mutations in yeast CBF5 can abolish in vivo pseudouridylation of rRNA. Mol Cell Biol 1999; 19:7461-7472.
123. Meier UT. In: Smith HC, ed. RNA and DNA Editing: Molecular Mechanisms and Their Integration into Biological Systems. Hoboken: Wiley and Sons, Inc., 2008:162-174.

Chapter 32

Spliceosomal snRNA Pseudouridylation

John Karijolich, Chao Huang and Yi-Tao Yu*

Abstract

The spliceosomal U snRNAs, which are essential for pre-mRNA splicing, contain a number of posttranscriptionally modified nucleotides, in particular pseudouridine. The location of many of the pseudouridine residues has been conserved throughout evolution. The pseudouridylation of spliceosomal snRNAs can be catalyzed by both RNA-independent (protein-only) and RNA-dependent mechanisms. This chapter discusses our current understanding of the mechanism of snRNA pseudouridylation in both lower and higher eukaryotes, as well as the molecular function that snRNA pseudouridylation plays in pre-mRNA splicing.

Introduction

In eukaryotic organisms, messenger RNAs (mRNAs) are generally transcribed as precursor mRNAs (pre-mRNAs). Thus, before an mRNA can be exported to the cytoplasm where it directs the translation of protein, the pre-mRNA must undergo several processing reactions. Included in these reactions are the excision of noncoding sequences, the introns, and the ligation of the exons (the coding sequences together with the 5ʹ and 3ʹ untranslated sequences). In eukaryotes, removal of introns is catalyzed by a large and highly dynamic RNA-protein complex termed the spliceosome. The activity of the spliceosome is dependent on five uridyl-rich small nuclear RNAs (U snRNAs), namely U1, U2, U4, U5 and U6, and a large number of protein components.1-4 The U snRNAs that participate in the splicing reaction do so in the form of a small nuclear ribonucleoprotein (snRNP) complex, which includes a single U snRNA in complex with a number of proteins. Spliceosome assembly is a multi-step process involving an intricate and dynamic network of RNA-RNA interactions among the snRNAs and pre-mRNA (Fig. 1).2,5-9 The rearrangement of RNA-RNA interactions during spliceosome assembly facilitates two specific transesterification reactions that result in the removal of intronic sequences. The first step in the assembly of the spliceosome is formation of the commitment complex (E complex), which involves the recognition of the 5ʹ splice site by the U1 snRNP and various protein factors.10-15 The second step is ATP-dependent: the U2 snRNP interacts with the branch site sequence (BPS) through complementary base-pairing interactions, thus converting the commitment complex to a presplicing complex, namely complex A.15-23 Subsequent to the generation of complex A, the U4/U6.U5 tri-snRNP joins the U1-U2-pre-mRNA complex to form complex B.19,22,23 A series of RNA-RNA rearrangements then takes place, resulting in the destabilization and release of the U1 and U4 snRNPs.24,25 The result of these structural rearrangements is the formation of complex C and concomitant activation of the spliceosome. It is complex C that catalyzes two successive transesterification reactions, also known as the two chemical steps of splicing. Following the second transesterification reaction, the postspliceosomal complex, containing the excised introns and the U2, U5 and U6 snRNPs, disassembles and the snRNPs are recycled for further rounds of splicing.22,23

*Corresponding Author: Yi-Tao Yu, Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, New York 14642, USA. Email: [email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.

Figure 1. Major spliceosome assembly and catalysis of pre-mRNA splicing. The thick lines represent the intron and the boxes are exons. The 5ʹ splice site (5ʹ-SS), the 3ʹ splice site (3ʹ-SS) and the branch point adenosine (BP) are indicated in the pre-mRNA. The conserved residues at the 5ʹ and 3ʹ splice sites and the branch site are shown. The headed thin lines are snRNAs with their names in the ellipses. The short thick lines between RNA strands represent Watson-Crick base-pairing interactions. The lightning symbols depict non-Watson-Crick base-pairing interactions. The 2ʹ-OH groups of the branch point adenosine and the cut-off 5ʹ exon are pictured in the activated spliceosome. The small arrows near these 2ʹ-OH groups indicate the nucleophilic chemical reactions, also known as transesterification reactions.

A shared feature of spliceosomal snRNAs is their large content of posttranscriptional modifications, in particular pseudouridylation.26,27 Pseudouridylation is a uridine-specific modification which results in the formation of the 5-ribosyl isomer of uridine, pseudouridine (Fig. 2). Over the past decade or so, numerous laboratories, including our own, have begun to unravel the molecular function of spliceosomal snRNA pseudouridylation. It has become increasingly clear that pseudouridine residues are not just bystanders in the process of pre-mRNA splicing, but actually participate in and perhaps directly influence this catalytic reaction. Aside from an understanding of the function of snRNA pseudouridylation, the last decade has ushered in an understanding of the mechanism by which this modification is introduced. It is now clear that two distinct molecular mechanisms exist that are capable of site-specifically introducing pseudouridine residues within spliceosomal snRNAs. Here, we discuss spliceosomal snRNA pseudouridylation with an emphasis on the mechanistic strategies employed to carry out this modification, and we review the importance of this modification in pre-mRNA splicing.
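The assembly pathway outlined in the Introduction (complexes E, A, B and C) can be summarized as a small bookkeeping sketch. This is only an illustration of the order of events as described in the text, not a model of the spliceosome; the names `ASSEMBLY_STEPS` and `snrnps_present` are hypothetical and introduced here for the example.

```python
# Illustrative bookkeeping of major spliceosome assembly, following the text.
# Each entry: (complex name, snRNPs that join, snRNPs that are released).
# The data layout and function name are assumptions made for this sketch.
ASSEMBLY_STEPS = [
    ("E", {"U1"}, set()),              # commitment complex: U1 recognizes the 5' splice site
    ("A", {"U2"}, set()),              # ATP-dependent: U2 pairs with the branch site
    ("B", {"U4", "U5", "U6"}, set()),  # U4/U6.U5 tri-snRNP joins
    ("C", set(), {"U1", "U4"}),        # rearrangements release U1 and U4
]


def snrnps_present(complex_name):
    """Return the set of snRNPs present once the named complex has formed."""
    present = set()
    for name, joins, leaves in ASSEMBLY_STEPS:
        present |= joins
        present -= leaves
        if name == complex_name:
            return present
    raise ValueError(f"unknown complex: {complex_name}")


if __name__ == "__main__":
    for name, _, _ in ASSEMBLY_STEPS:
        print(name, sorted(snrnps_present(name)))
```

The set obtained for complex C (U2, U5 and U6) matches the snRNP content the text gives for the postspliceosomal complex.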


Figure 2. Pseudouridine is the 5-ribosyl isomer of uridine. Pseudouridine is formed from uridine by breakage of the glycosidic bond between N1 and C1’, 180˚ rotation of the base along the C6-N3 axis and reformation of the glycosidic bond between C5 and C1’. Pseudouridine has one more hydrogen bond donor (d) than uridine, while hydrogen bond acceptors (a) remain the same.

Discovery of U snRNA Pseudouridylation

The discovery of modified nucleotides as constituents of RNA molecules first occurred in the early 1950s. In 1951, Cohn and Volkin performed anion exchange chromatographic analysis of RNA hydrolysates and uncovered, in addition to the four classical nucleotides of RNA, a minute amount of an unidentified material designated “?”.28 In 1956, the nucleoside designated “?” was identified as 5-ribosyluracil, the 5-ribosyl isomer of 1-ribosyluracil (uridine).29 In addition to being known as 5-ribosyluracil, this novel nucleoside was given the name pseudouridine, abbreviated as Ψ.30 Shortly following the discovery of noncanonical ribonucleotides, the abundant spliceosomal snRNAs U1 and U2 were identified.31-33 Surprisingly, sequencing analysis demonstrated that these snRNAs contained a fairly large amount of pseudouridine26 (Fig. 3). The U4, U5 and U6 snRNAs, which were identified later, were also shown to contain pseudouridine residues once sequenced26,34 (Fig. 3).

Pseudouridylation of snRNA in Vertebrates

While the spliceosomal snRNAs were known to contain a large number of pseudouridine residues, research into the mechanism by which this modification is introduced did not yield fruitful results until the discovery of snoRNA-guided ribosomal RNA (rRNA) modification. In the snoRNA-guided modification scheme, small noncoding RNAs, namely Box C/D and Box H/ACA RNAs, are responsible for directing site-specific 2ʹ-O-methylation and pseudouridylation (Fig. 4), respectively (see the chapter by Grozdanov and Meier regarding Box H/ACA RNAs and the chapter by Maxwell and colleagues regarding Box C/D RNAs). Each class of RNA assembles with an evolutionarily conserved, yet distinct, set of four core proteins (C/D RNAs: Nop1p, Nop56p, Nop58p and Snu13p; H/ACA RNAs: Cbf5p, Nhp2p, Nop10p and Gar1p).35-51 While the RNA component is responsible for dictating site-specificity through complementary base-pairing interactions with the substrate RNA, the catalytic (modification) activity is provided by one of the core protein components (Nop1p for 2ʹ-O-methylation and Cbf5p for pseudouridylation).52-55


Figure 3. Pseudouridine residues in human spliceosomal snRNAs. Primary and secondary structures of human major spliceosomal snRNAs (U1, U2, U4, U5 and U6) are shown. Pseudouridines (Ψ) are surrounded by rectangles. The thick lines indicate the nucleotides participating in RNA-RNA interactions or involved in catalysis during pre-mRNA splicing. The gray boxes highlight the Sm-binding sites. The 5ʹ caps (2,2,7 trimethylated guanosine cap for U1, U2, U4 and U5, and γ-methylated guanosine cap for U6) are also depicted. 2ʹ-O-methylated residues are omitted for clarity. Pseudouridines which are also found in S. cerevisiae have a star below them.


Figure 4. Schematic representation of a Box H/ACA RNA. The minimal components of Box H/ACA RNAs are a lower stem, internal loop, upper stem, apical loop and a Box H and Box ACA. The Box ACA is typically located 3 nucleotides upstream of the 3ʹ end. The internal loop is capable of base-pairing with complementary sequences within the substrate RNA. The uridine residue targeted for pseudouridylation, as well as the adjacent downstream nucleotide, are positioned at the base of the upper stem approximately 14-16 nucleotides upstream of either Box H or Box ACA and are left unpaired so as to remain accessible for isomerization. Also shown is the CAB box (5ʹ-UGAG-3ʹ), which is depicted in the apical loop of the first hairpin. The CAB box is required for the retention of certain Box H/ACA RNPs within the Cajal body.
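The sequence landmarks named in Figure 4 lend themselves to a simple string-level check. The sketch below is a minimal illustration under stated assumptions: it looks only at primary sequence (no secondary structure), treats any UGAG as a candidate CAB box regardless of whether it lies in an apical loop, and counts the 14-16 nucleotide spacing from the first nucleotide of the box, which is one possible reading of the caption. All function names and the toy sequence are hypothetical.

```python
# Minimal sequence-level sketch of the Box H/ACA landmarks in Figure 4.
# Assumptions: primary sequence only; spacing is measured to the first
# nucleotide of the box; the toy sequence below is NOT a real guide RNA.

def normalize(rna):
    """Upper-case the sequence and write it in the RNA alphabet."""
    return rna.upper().replace("T", "U")


def has_3prime_aca_box(rna, tail=3):
    """True if 'ACA' is followed by exactly `tail` nucleotides at the 3' end."""
    rna = normalize(rna)
    end = len(rna) - tail
    return end >= 3 and rna[end - 3:end] == "ACA"


def find_cab_boxes(rna):
    """Return 0-based start positions of every UGAG (candidate CAB box)."""
    rna = normalize(rna)
    return [i for i in range(len(rna) - 3) if rna[i:i + 4] == "UGAG"]


def candidate_target_window(box_start, min_dist=14, max_dist=16):
    """0-based positions lying min_dist..max_dist nucleotides upstream of a box."""
    return [box_start - d
            for d in range(max_dist, min_dist - 1, -1)
            if box_start - d >= 0]


if __name__ == "__main__":
    toy = "G" * 20 + "UGAG" + "C" * 10 + "ACA" + "UUU"  # artificial example
    print(has_3prime_aca_box(toy))      # ACA box 3 nt from the 3' end?
    print(find_cab_boxes(toy))          # candidate CAB box positions
    print(candidate_target_window(34))  # positions 14-16 nt upstream of the ACA box
```

A real guide RNA search would also have to evaluate the hairpin structure and the complementarity of the internal loop to the substrate, which this sketch deliberately leaves out.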

As both rRNAs and snRNAs contain a number of posttranscriptional modifications, in particular 2ʹ-O-methylation and pseudouridylation, it was hypothesized that similar mechanisms would be utilized to carry out both rRNA and snRNA modification. Indeed, in 1998 U6 snRNA provided the first glimpses into the mechanism of metazoan U snRNA modification.56 Taking advantage of conserved elements identified in RNAs that direct 2ʹ-O-methylation of rRNA, Tycowski et al56 searched available databases and identified two Box C/D RNAs that might be responsible for U6 snRNA 2ʹ-O-methylation. Using Xenopus oocytes, they were able to show that depletion of the two endogenous Box C/D RNAs abolished 2ʹ-O-methylation of U6 snRNA at the predicted sites. Furthermore, 2ʹ-O-methylation of U6 snRNA was restored upon injection of the two in vitro transcribed Box C/D RNAs. The following year, Box C/D RNA-directed 2ʹ-O-methylation was also demonstrated in mammalian cells.57 Though it was clear that 2ʹ-O-methylation of U6 snRNA was directed by snoRNPs, whether the RNA-guided mechanism applied to the pseudouridylation of U6 was still unclear. Furthermore, given that U6 snRNA differs from the other spliceosomal snRNAs (U1, U2, U4 and U5) in various ways, whether the RNA-guided mechanism applied to the other spliceosomal snRNAs also remained unknown.2,58 For instance, while U6 snRNA is transcribed by RNA polymerase III (Pol III), all other U snRNAs are transcribed by RNA polymerase II (Pol II).59-62 Furthermore, U1, U2, U4 and U5 all possess a tri-methyl guanosine cap and tightly bind Sm core proteins, while U6 snRNA carries a γ-methyl cap and does not bind Sm core proteins.63-67 Lastly, the biogenesis of U6 snRNA differs from that of the other snRNAs. While the other U snRNAs shuttle between the nucleus and cytoplasm during their biogenesis, U6 snRNA is believed to remain nuclear.


The first indications that Pol II-derived spliceosomal snRNAs were modified in an RNA-dependent manner came from the identification of a number of mammalian small RNAs containing either Box C/D or Box H/ACA motifs, along with guide sequence(s) that could potentially target U2 and U4 snRNAs.68 However, the definitive experimental proof came from the analysis of a novel RNA (U85) in human and Drosophila that contained both Box C/D and Box H/ACA motifs.69 Careful inspection of U85 revealed sequence complementarity to U5 snRNA, with the potential to direct 2ʹ-O-methylation of C45 and pseudouridylation of U46. Subsequent analysis confirmed U85 as the guide RNA responsible for directing both of these modifications of U5 snRNA. Shortly thereafter, three additional “hybrid” guide RNAs (U87, U88 and U89) were identified that were predicted to guide 2ʹ-O-methylation of U4 and U5 snRNA as well as pseudouridylation of U5 snRNA.70 As research continued on the mechanism of U snRNA modification, it was soon realized that not all guide RNAs directing U snRNA modification were “hybrids”; most fell into either the Box C/D or Box H/ACA RNA class.71 In fact, one Box H/ACA RNA, pugU2-34/44, was experimentally shown to direct pseudouridylation at two different sites within the branch point recognition region of Xenopus U2 snRNA.72

Analysis of the subnuclear localization of guide RNAs directing spliceosomal snRNA pseudouridylation demonstrated that they reside primarily in Cajal bodies. Cajal bodies are subnuclear structures that serve as sites for snRNA modification.70,73 Thus, these guide RNAs have been designated scaRNAs, for small Cajal body-specific RNAs.70 It has recently been demonstrated that the retention of Box H/ACA RNPs within the Cajal body requires a sequence element referred to as the CAB box (5ʹ-UGAG-3ʹ) located in the apical loop of either hairpin74 (Fig. 4). Furthermore, the Sm proteins SmB and SmD3 are necessary for Cajal body retention and specifically interact with scaRNAs through the CAB box.75 Interestingly, however, pugU2-34/44, a Xenopus Box H/ACA RNA that directs U2 pseudouridylation, appears to reside within the nucleoplasm of Xenopus oocytes.72

As recognition grew that spliceosomal snRNA pseudouridylation is catalyzed by scaRNPs, so did the effort to identify all of the Box H/ACA RNAs. Through size fractionation of RNAs and co-immunoprecipitation with antibodies against Box H/ACA core proteins, as well as bioinformatic approaches, numerous small RNAs have been identified as potential Box H/ACA guide RNAs.68,71,76 To date, 16 of the 24 known sites of pseudouridylation within the major spliceosomal snRNAs (U1, U2, U4, U5 and U6) have been proven or predicted to be catalyzed by Box H/ACA RNPs.68-70,77-80 Thus, the challenge remains to identify the remaining spliceosomal snRNA pseudouridylation guide RNAs. It is also possible, however, that pseudouridylation at some of these sites is catalyzed by an RNA-independent (protein-only) mechanism.

Pseudouridylation of snRNA in Saccharomyces cerevisiae

The pre-mRNA splicing machineries of HeLa cells and S. cerevisiae have been the most extensively studied systems with respect to spliceosome assembly and the catalysis of pre-mRNA splicing. However, whether the U snRNAs of S. cerevisiae were pseudouridylated remained unexplored until the late 1990s. In 1999, Massenet et al81 identified six pseudouridine residues in the spliceosomal snRNAs of S. cerevisiae: two within U1, three within U2 and one within U5. Furthermore, by screening yeast deletion strains of all previously identified pseudouridine synthases, they determined that the formation of Ψ44, one of the three pseudouridine residues in the branch site recognition region of U2 snRNA, was catalyzed by a single polypeptide enzyme known as Pus1p.81 Intriguingly, Pus1p had already been shown to be responsible for the formation of at least eight different pseudouridine residues in tRNA.82a,83a As the search continued to define the mechanisms involved in the pseudouridylation of spliceosomal snRNA in S. cerevisiae, it appeared that pseudouridine formation in yeast spliceosomal snRNA was mechanistically distinct from pseudouridine formation in higher eukaryotes. In 2003, Ma et al82b bolstered this hypothesis when they utilized a yeast GST-ORF genomic library to show that the previously uncharacterized ORF YOR243c catalyzed the formation of Ψ35 of U2 snRNA. ORF YOR243c was subsequently renamed PUS7.82b Surprisingly, when the amino acid sequence of Pus7p was compared with those of other known pseudouridine synthases, namely those of the TruA, TruB, RluA and RsuA families, no significant homology was identified. Thus, Pus7p represented a family of pseudouridine synthases whose other members were yet to be identified. Furthermore, a BLAST search of all available databases indicated the presence of Pus7p homologs in several different organisms, including Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster and humans. Shortly following the identification of Pus7p, an E. coli pseudouridine synthase, TruD, was identified that showed homology to Pus7p.83b Pus7p has since been classified as a member of the TruD family of pseudouridine synthases.

Though mammalian spliceosomal snRNAs are pseudouridylated in a Box H/ACA guide RNA-dependent manner, all experimental evidence initially indicated that yeast spliceosomal snRNAs were pseudouridylated by a protein-only mechanism. Consistent with this idea, extensive yeast database searches failed to identify any Box H/ACA RNAs capable of directing pseudouridylation of yeast spliceosomal snRNA. However, in 2005, Ma et al experimentally determined that the pseudouridylation activity responsible for Ψ42 formation within U2 snRNA was pulled down by a GST-Nhp2p fusion protein.84 As Nhp2p is one of the core proteins of the Box H/ACA RNP, Ma et al expressed tandem affinity purification (TAP)-tagged Cbf5p or Gar1p and further showed that these proteins were associated with the pseudouridylation activity responsible for Ψ42 formation. Subsequently, they created an RNA library from RNAs pulled down with TAP-tagged Gar1p. As a result, the Box H/ACA RNA snR81 was shown to direct formation of Ψ42.84 Interestingly, snR81 also directs the formation of Ψ1051 within the 25S rRNA using its other pseudouridylation pocket. This raises two interesting implications for RNA-guided pseudouridylation. First, as a single guide RNA is responsible for the pseudouridylation of two distinct types of RNA (spliceosomal snRNA and rRNA), which are known to be present in separate subnuclear compartments, it suggests that functional Box H/ACA RNPs may be capable of intranuclear transport, or that the division of substrates into distinct compartments is not necessary for their modification. Second, if fully functional Box H/ACA RNPs are capable of intranuclear transport, it raises the possibility that other RNAs, in particular mRNAs, may be substrates for RNA-guided pseudouridylation as well.
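The yeast assignments described in this section can be collected into a small lookup table, which makes the mix of mechanisms in U2 snRNA easy to see. This is only a summary of what the text states; the variable and function names (`YEAST_U2_PSI`, `YEAST_PSI_COUNTS`, `positions_by_mechanism`) are hypothetical and chosen for the example.

```python
# Summary of yeast spliceosomal snRNA pseudouridylation as stated in the text.
# Names and data layout are assumptions made for this sketch.

# U2 snRNA position -> (factor responsible, mechanism)
YEAST_U2_PSI = {
    35: ("Pus7p", "protein-only"),
    42: ("snR81 Box H/ACA RNP", "RNA-guided"),
    44: ("Pus1p", "protein-only"),
}

# Pseudouridine residues per yeast snRNA (Massenet et al, ref. 81 in the text)
YEAST_PSI_COUNTS = {"U1": 2, "U2": 3, "U5": 1}


def positions_by_mechanism(table):
    """Group pseudouridine positions by the mechanism that introduces them."""
    grouped = {}
    for position, (_factor, mechanism) in sorted(table.items()):
        grouped.setdefault(mechanism, []).append(position)
    return grouped


if __name__ == "__main__":
    print(positions_by_mechanism(YEAST_U2_PSI))
    print(sum(YEAST_PSI_COUNTS.values()), "pseudouridines in total")
```

All three U2 residues are accounted for, whereas the text does not assign factors to the U1 and U5 residues, so these are left as counts only.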

Spliceosomal snRNA Pseudouridylation Afects Pre-mRNA Splicing

While it has long been known that the spliceosomal snRNAs contain pseudouridine, research directed at understanding its in vivo function remained lacking until the 1990s. A strong clue that pseudouridylation plays an important role in pre-mRNA splicing comes from the fact that this modification is found clustered within the snRNAs, particularly in regions of known functional importance, such as sites of RNA-RNA interactions (Fig. 3).26 Furthermore, pseudouridylation within functionally important regions is conserved among various organisms.

One of the earliest recognized functions of snRNA pseudouridylation was its role in U snRNP biogenesis and spliceosome assembly. In vitro analysis using HeLa cell S100 and nuclear extracts depleted of endogenous U2 snRNA demonstrated that in vitro transcribed U2 snRNA (which lacks posttranscriptional modifications) was unable to form stable U2 snRNP when analyzed by cesium-sulfate buoyant density gradient centrifugation.85,86 Further analyses in Xenopus oocytes using anti-snRNP immunoprecipitation in conjunction with glycerol gradient sedimentation demonstrated that while U2 snRNA lacking pseudouridine residues within its 5ʹ half is able to form nonfunctional 12S U2 snRNP particles, it is unable to detectably form functional 17S particles.87 A role for pseudouridylation in spliceosome assembly was further demonstrated by native gel analysis, which indicated that U2 snRNA lacking pseudouridine residues is incompetent in forming splicing complexes A, B and C.87,88 Furthermore, Zhao and Yu (2004) were able to show that pseudouridine residues within the branch site recognition region of Xenopus U2 snRNA are essential for U2 snRNP assembly and spliceosome assembly.88 Interestingly, when in vitro transcribed U2 snRNA is injected into U2-depleted Xenopus oocytes, it is modified significantly faster within the branch site recognition region than within the 5ʹ region.89

Besides a role in the assembly of catalytically competent snRNPs and splicing complexes, U snRNA pseudouridylation, in particular pseudouridylation within the branch site recognition region of U2 snRNA, has been demonstrated to influence, directly or indirectly, the catalytic process of pre-mRNA splicing. Data regarding this aspect of U snRNA pseudouridylation have come primarily from the use of the genetically tractable yeast S. cerevisiae, in which it is rather easy to construct clean deletions of enzymes responsible for specific U snRNA pseudouridine residues. A strain lacking the gene encoding Pus7p, the pseudouridine synthase responsible for Ψ35 formation in yeast U2 snRNA, is viable but displays reduced fitness under conditions of high salt or when in competition with a wild-type strain.82b Further analysis demonstrated that loss of Ψ35 in conjunction with U40G or U40Δ mutations in U2 snRNA severely reduced the organism’s fitness.89 Analysis of pre-mRNA splicing by semi-quantitative RT-PCR indicated an accumulation of pre-mRNA in the pus7Δ U2-U40G and pus7Δ U2-U40Δ strains, while any single mutation resulted in minimal, if any, accumulation of pre-mRNA.90 The degree of pre-mRNA accumulation varied depending on the transcript analyzed, with fold increases ranging from 0 to >10.89 In line with the notion that pseudouridylation within the branch site recognition region of U2 snRNA affects the catalytic potential of the spliceosome, the change of a single uridine (U35) to pseudouridine (Ψ35) significantly enhances the production of X-RNA, a product generated by a splicing-related reaction in a cell- and protein-free system.91,92 Furthermore, both crystallographic and NMR data suggest that Ψ35 is important for stabilizing the pre-mRNA/U2 snRNA duplex as well as for maintaining the bulged-out branch point adenosine for nucleophilic attack (the first transesterification reaction) during pre-mRNA splicing.93-95

Recently it has also been suggested that the potent anti-cancer drug 5-fluorouracil (5-FU), which is known to inhibit pseudouridine synthases, partially exerts its toxicity through the inhibition of pre-mRNA splicing.96 In line with this notion, it was demonstrated that treatment of HeLa cells with 5-FU results in the incorporation of 5-FU into natural sites of U2 snRNA pseudouridylation, almost completely blocking the formation of pseudouridines at these sites. Consequently, blockage of pseudouridylation results in an accumulation of pre-mRNA. Further analyses indicate that U2 snRNA purified from 5-FU-treated HeLa cells is unable to reconstitute splicing in U2-depleted Xenopus oocytes,96 thus conclusively showing that treatment of cells with 5-FU does have a potent and detrimental effect on U2 pseudouridylation and its function in pre-mRNA splicing.

While the molecular function of U2 snRNA pseudouridylation has been the most extensively characterized, data from the Krainer lab suggest that U1 snRNA pseudouridylation functions in 5ʹ splice site selection.97 However, the function of pseudouridylation in the other U snRNAs (U4, U5 and U6) has yet to be investigated.

Minor Spliceosomal snRNAs Are Pseudouridylated

While the majority of intronic sequences are removed by the aforementioned spliceosome (the major spliceosome), within metazoans there exists a rare class of introns (∼1-300) that are removed by a functionally similar, yet structurally distinct, spliceosome, whose components are of much lower abundance (∼10⁴ copies per cell) relative to components of the major spliceosome.98,99 Thus, this spliceosome is referred to as the minor spliceosome. The activity of the minor spliceosome requires four distinct U snRNAs, namely U11, U12, U4atac and U6atac, while sharing the U5 snRNA with the major spliceosome (Fig. 5).98 Analysis of minor spliceosomal snRNAs from HeLa cells has demonstrated that they too contain pseudouridine residues (Fig. 5).100 To date, four pseudouridines have been identified in the minor spliceosomal snRNAs: two within U12 and one each within U4atac and U6atac. However, pseudouridine formation within this class of snRNAs has yet to be mechanistically defined. Although fewer pseudouridine residues are present in the minor spliceosomal snRNAs than in the major spliceosomal snRNAs, the positions of pseudouridylation within U12 and U4atac are homologous to those within U2 and U4, respectively (Fig. 5), suggesting that these pseudouridines are important for the splicing of minor introns. Interestingly, introns removed by the minor spliceosome contain more constrained consensus sequences at the 5ʹ end of the intron and at the BPS.101-103 Thus, it is reasonable to hypothesize that the larger number of pseudouridine residues present in the major spliceosomal snRNAs is necessitated by the fact that the major class (U2-type) introns contain less conserved consensus splice site sequences than the minor class (U12-type) introns. In support of this hypothesis, the introns of S. cerevisiae contain highly conserved consensus splice site sequences, while the yeast spliceosomal snRNAs contain relatively few pseudouridine residues.

Figure 5. Shown are primary and secondary structures of human minor spliceosomal snRNAs, U11, U12, U4atac and U6atac. U5 snRNA is shared by both the major and minor spliceosomes. Pseudouridines within U12 and U4atac are believed to function analogously to their homologous modifications within U2 and U4 snRNAs, respectively (for comparison and detailed legend, see Fig. 3).

Concluding Remarks and Future Prospect

Throughout the last decade research into the mechanism of spliceosomal snRNA pseudouridylation has rapidly expanded. It appears that yeast exists as a sort of transitional fossil with regard to snRNA pseudouridylation, utilizing both a protein-only mechanism and an RNA-guided mechanism to carry out pseudouridylation, while higher eukaryotes appear to predominantly (probably only) utilize the RNA-guided mechanism (Table 1). Although the mechanistic generalities (i.e., protein-only vs RNA-guided) have been deduced, a more detailed picture is lacking. For instance, Box H/ACA RNAs have yet to be identified for more than half of the pseudouridine residues in the human spliceosomal snRNAs (major and minor). Furthermore, yeast spliceosomal snRNAs also contain several pseudouridine residues whose mechanism of formation is yet to be elucidated. Identification of the enzymes responsible for the remaining pseudouridylations will provide a means to carry out a systematic analysis of their function in pre-mRNA splicing.


Table 1. Sites of pseudouridylation within yeast and human U snRNAs

Organism  snRNA    Position  Mechanism     Enzyme    Verified/Predicted  Reference
Yeast     U1       Ψ5        NR            NR        NR                  81
Yeast     U1       Ψ6        NR            NR        NR                  81
Yeast     U2       Ψ35       Protein only  Pus 7     Verified            81,82b
Yeast     U2       Ψ42       H/ACA RNP     snR81     Verified            81,84
Yeast     U2       Ψ44       Protein only  Pus 1     Verified            81
Yeast     U5       Ψ99       NR            NR        NR                  81
Human     U1       Ψ5        H/ACA RNP     ACA47     Predicted           78,104
Human     U1       Ψ6        H/ACA RNP     U109      Predicted           79,104
Human     U2       Ψ6        NR            NR        NR                  105,26
Human     U2       Ψ7        H/ACA RNP     U100      Predicted           68,80,105,26
Human     U2       Ψ15       NR            NR        NR                  105,26
Human     U2       Ψ34       H/ACA RNP     U92       Predicted           70,105,26
Human     U2       Ψ37       H/ACA RNP     ACA45     Predicted           78,105,26
Human     U2       Ψ39       H/ACA RNP     ACA26     Predicted           78,105,26
Human     U2       Ψ41       H/ACA RNP     ACA26     Predicted           78,105,26
Human     U2       Ψ43       NR            NR        NR                  105,26
Human     U2       Ψ44       H/ACA RNP     U92       Predicted           70,105,26
Human     U2       Ψ54       H/ACA RNP     U93       Predicted           77,80,105,26
Human     U2       Ψ58       NR            NR        NR                  105,26
Human     U2       Ψ89       H/ACA RNP     ACA35     Predicted           78,105,26
Human     U2       Ψ91       NR            NR        NR                  105,26
Human     U4       Ψ4        NR            NR        NR                  107,106
Human     U4       Ψ72       NR            NR        NR                  106
Human     U4       Ψ79       NR            NR        NR                  107,106
Human     U5       Ψ43       H/ACA RNP     ACA57     Predicted           78,108
Human     U5       Ψ46       H/ACA RNP     U85       Verified            69,108
Human     U5       Ψ46       H/ACA RNP     U89       Predicted           70,108
Human     U5       Ψ53       H/ACA RNP     U93       Predicted           77,80,108
Human     U6       Ψ31       H/ACA RNP     ACA65     Predicted           80,109
Human     U6       Ψ40       H/ACA RNP     ACA12     Predicted           78,109
Human     U6       Ψ40       H/ACA RNP     HBI-100   Predicted           68,109
Human     U6       Ψ86       H/ACA RNP     ACA65     Predicted           68,109
Human     U4atac   Ψ12       NR            NR        NR                  100
Human     U6atac   Ψ83       NR            NR        NR                  100
Human     U12      Ψ19       H/ACA RNP     ACA68     Predicted           80,100
Human     U12      Ψ28       H/ACA RNP     ACA66     Predicted           80,100

Note: NR is for not reported.


Although the pseudouridylation of U2 snRNA has been extensively investigated, many questions remain unaddressed. For instance, does spliceosomal snRNA pseudouridylation function during splice site selection? Is the large number of spliceosomal snRNA pseudouridylations present in higher eukaryotes necessitated by the lack of strong consensus sequences in pre-mRNA? Does the presence of snRNA pseudouridylation increase the catalytic efficiency and accuracy of pre-mRNA splicing? With the growing attention that has been given to spliceosomal snRNA pseudouridylation, we expect that the answers to these as well as other mysteries concerning the mechanisms and functions of snRNA pseudouridylation will soon emerge.

Acknowledgements

We would like to thank the members of the Yu lab for insightful discussions regarding spliceosomal snRNA pseudouridylation. Our work was supported by grant GM62937 (to Yi-Tao Yu) from the National Institutes of Health. J.K. was supported by an NIH Institutional Ruth L. Kirschstein National Research Service Award GM068411.

References

1. Staley JP, Guthrie C. Mechanical devices of the spliceosome: motors, clocks, springs, and things. Cell 1998; 92(3):315-326.
2. Yu YT, Scharl EC, Smith CM et al. The growing world of small nuclear ribonucleoproteins. In: Gesteland RF, Cech TR, Atkins JF, eds. The RNA World, 2nd ed. Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 1999:487-524.
3. Jurica MS, Moore MJ. Pre-mRNA splicing: awash in a sea of proteins. Mol Cell 2003; 12(1):5-14.
4. Valadkhan S. snRNAs as the catalysts of pre-mRNA splicing. Curr Opin Chem Biol 2005; 9(6):603-608.
5. Nilsen TW. RNA-RNA interactions in the spliceosome: unraveling the ties that bind. Cell 1994; 78(1):1-4.
6. Madhani HD, Guthrie C. Dynamic RNA-RNA interactions in the spliceosome. Annu Rev Genet 1994; 28:1-26.
7. Nilsen TW. RNA-RNA interactions in nuclear pre-mRNA splicing. In: Simons RW, Grunberg-Manago M, eds. RNA Structure and Function. Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 1998:279-307.
8. Konarska MM, Query CC. Insights into the mechanisms of splicing: more lessons from the ribosome. Genes Dev 2005; 19(19):2255-2260.
9. Smith DJ, Query CC, Konarska MM. "Nought may endure but mutability": spliceosome dynamics and the regulation of splicing. Mol Cell 2008; 30(6):657-666.
10. Bindereif A, Green MR. An ordered pathway of snRNP binding during mammalian pre-mRNA splicing complex assembly. EMBO J 1987; 6(8):2415-2424.
11. Ruby SW, Abelson J. An early hierarchic role of U1 small nuclear ribonucleoprotein in spliceosome assembly. Science 1988; 242(4881):1028-1035.
12. Legrain P, Seraphin B, Rosbash M. Early commitment of yeast pre-mRNA to the spliceosome pathway. Mol Cell Biol 1988; 8(9):3755-3760.
13. Seraphin B, Rosbash M. Identification of functional U1 snRNA-pre-mRNA complexes committed to spliceosome assembly and splicing. Cell 1989; 59(2):349-358.
14. Seraphin B, Rosbash M. The yeast branchpoint sequence is not required for the formation of a stable U1 snRNA-pre-mRNA complex and is recognized in the absence of U2 snRNA. EMBO J 1991; 10(5):1209-1216.
15. Michaud S, Reed R. An ATP-independent complex commits pre-mRNA to the mammalian spliceosome assembly pathway. Genes Dev 1991; 5(12B):2534-2546.
16. Konarska MM, Sharp PA. Electrophoretic separation of complexes involved in the splicing of precursors to mRNAs. Cell 1986; 46(6):845-855.
17. Pikielny CW, Rymond BC, Rosbash M. Electrophoresis of ribonucleoproteins reveals an ordered assembly pathway of yeast splicing complexes. Nature 1986; 324(6095):341-345.
18. Frendewey D, Kramer A, Keller W. Different small nuclear ribonucleoprotein particles are involved in different steps of splicing complex formation. Cold Spring Harb Symp Quant Biol 1987; 52:287-298.
19. Cheng SC, Abelson J. Spliceosome assembly in yeast. Genes Dev 1987; 1(9):1014-1027.
20. Konarska MM, Sharp PA. Interactions between small nuclear ribonucleoprotein particles in formation of spliceosomes. Cell 1987; 49(6):763-774.
21. Lamond AI, Konarska MM, Grabowski PJ et al. Spliceosome assembly involves the binding and release of U4 small nuclear ribonucleoprotein. Proc Natl Acad Sci U S A 1988; 85(2):411-415.
22. Ruby SW, Abelson J. Pre-mRNA splicing in yeast. Trends Genet 1991; 7(3):79-85.


23. Moore MJ, Query CC, Sharp PA. Splicing of precursors to mRNAs by the spliceosome. In: Gesteland RF, Atkins JF, eds. The RNA World, 1st ed. Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 1993:303-358.
24. Sawa H, Abelson J. Evidence for a base-pairing interaction between U6 small nuclear RNA and 5' splice site during the splicing reaction in yeast. Proc Natl Acad Sci U S A 1992; 89(23):11269-11273.
25. Lesser CF, Guthrie C. Mutations in U6 snRNA that alter splice site specificity: implications for the active site. Science 1993; 262(5142):1982-1988.
26. Reddy R, Busch H. Small nuclear RNAs: RNA sequences, structure, and modifications. In: Birnstiel ML, ed. Structure and Function of Major and Minor Small Nuclear Ribonucleoprotein Particles. Heidelberg: Springer-Verlag Press, 1988:1-37.
27. Massenet S, Mougin A, Branlant C. Posttranscriptional modifications in the U small nuclear RNAs. In: Grosjean H, ed. Modification and Editing of RNA. Washington DC: ASM Press, 1998:201-228.
28. Cohn WE. Nucleoside-5'-phosphates from ribonucleic acid. Nature 1951; 167:483-484.
29. Davis FF, Allen FW. Ribonucleic acids from yeast which contain a fifth nucleotide. J Biol Chem 1957; 227(2):907-915.
30. Cohn WE. Pseudouridine, a carbon-carbon linked ribonucleoside in ribonucleic acids: isolation, structure, and chemical characteristics. J Biol Chem 1960; 235:1488-1498.
31. Muramatsu M, Busch H. Studies on the nuclear and nucleolar ribonucleic acid of regenerating rat liver. J Biol Chem 1965; 240(10):3960-3966.
32. Hodnett JL, Busch H. Isolation and characterization of uridylic acid-rich 7 S ribonucleic acid of rat liver nuclei. J Biol Chem 1968; 243(24):6334-6342.
33. Weinberg RA, Penman S. Small molecular weight monodisperse nuclear RNA. J Mol Biol 1968; 38(3):289-304.
34. Lerner MR, Steitz JA. Antibodies to small nuclear RNAs complexed with proteins are produced by patients with systemic lupus erythematosus. Proc Natl Acad Sci U S A 1979; 76(11):5495-5499.
35. Ochs RL, Lischwe MA, Spohn WH et al. Fibrillarin: a new protein of the nucleolus identified by autoimmune sera. Biol Cell 1985; 54(2):123-133.
36. Gautier T, Berges T, Tollervey D et al. Nucleolar KKE/D repeat proteins Nop56p and Nop58p interact with Nop1p and are required for ribosome biogenesis. Mol Cell Biol 1997; 17(12):7088-7098.
37. Henras A, Henry Y, Bousquet-Antonelli C et al. Nhp2p and Nop10p are essential for the function of H/ACA snoRNPs. EMBO J 1998; 17(23):7078-7090.
38. Watkins NJ, Gottschalk A, Neubauer G et al. Cbf5p, a potential pseudouridine synthase, and Nhp2p, a putative RNA-binding protein, are present together with Gar1p in all H BOX/ACA-motif snoRNPs and constitute a common bipartite structure. RNA 1998; 4(12):1549-1568.
39. Lafontaine DL, Tollervey D. Nop58p is a common component of the box C+D snoRNPs that is required for snoRNA stability. RNA 1999; 5(3):455-467.
40. Pogacic V, Dragon F, Filipowicz W. Human H/ACA small nucleolar RNPs and telomerase share evolutionarily conserved proteins NHP2 and NOP10. Mol Cell Biol 2000; 20(23):9028-9040.
41. Watkins NJ, Segault V, Charpentier B et al. A common core RNP structure shared between the small nucleolar box C/D RNPs and the spliceosomal U4 snRNP. Cell 2000; 103(3):457-466.
42. Watanabe Y, Gray MW. Evolutionary appearance of genes encoding proteins associated with box H/ACA snoRNAs: cbf5p in Euglena gracilis, an early diverging eukaryote, and candidate Gar1p and Nop10p homologs in archaebacteria. Nucleic Acids Res 2000; 28(12):2342-2352.
43. Lafontaine DL, Tollervey D. Synthesis and assembly of the box C+D small nucleolar RNPs. Mol Cell Biol 2000; 20(8):2650-2659.
44. Dragon F, Pogacic V, Filipowicz W. In vitro assembly of human H/ACA small nucleolar RNPs reveals unique features of U17 and telomerase RNAs. Mol Cell Biol 2000; 20(9):3037-3048.
45. Klein DJ, Schmeing TM, Moore PB et al. The kink-turn: a new RNA secondary structure motif. EMBO J 2001; 20(15):4214-4221.
46. Kuhn JF, Tran EJ, Maxwell ES. Archaeal ribosomal protein L7 is a functional homolog of the eukaryotic 15.5kD/Snu13p snoRNP core protein. Nucleic Acids Res 2002; 30(4):931-941.
47. Galardi S, Fatica A, Bachi A et al. Purified box C/D snoRNPs are able to reproduce site-specific 2'-O-methylation of target RNA in vitro. Mol Cell Biol 2002; 22(19):6663-6668.
48. Omer AD, Ziesche S, Ebhardt H et al. In vitro reconstitution and activity of a C/D box methylation guide ribonucleoprotein complex. Proc Natl Acad Sci U S A 2002; 99(8):5289-5294.
49. Rozhdestvensky TS, Tang TH, Tchirkova IV et al. Binding of L7Ae protein to the K-turn of archaeal snoRNAs: a shared RNA binding motif for C/D and H/ACA box snoRNAs in Archaea. Nucleic Acids Res 2003; 31(3):869-877.
50. Wang C, Meier UT. Architecture and assembly of mammalian H/ACA small nucleolar and telomerase ribonucleoproteins. EMBO J 2004; 23(8):1857-1867.
51. Yu YT, Terns RM, Terns MP. Mechanisms and Functions of RNA-guided RNA Modification. In: Grosjean H, ed. Topics in current genetics, vol 12. Berlin-Heidelberg: Springer-Verlag, 2005:223-262.


52. Schimmang T, Tollervey D, Kern H et al. A yeast nucleolar protein related to mammalian fibrillarin is associated with small nucleolar RNA and is essential for viability. EMBO J 1989; 8(13):4015-4024.
53. Tollervey D, Lehtonen H, Jansen R et al. Temperature-sensitive mutations demonstrate roles for yeast fibrillarin in pre-rRNA processing, pre-rRNA methylation, and ribosome assembly. Cell 1993; 72(3):443-457.
54. Lafontaine DL, Bousquet-Antonelli C, Henry Y et al. The box H + ACA snoRNAs carry Cbf5p, the putative rRNA pseudouridine synthase. Genes Dev 1998; 12(4):527-537.
55. Zebarjadian Y, King T, Fournier MJ et al. Point mutations in yeast CBF5 can abolish in vivo pseudouridylation of rRNA. Mol Cell Biol 1999; 19(11):7461-7472.
56. Tycowski KT, You ZH, Graham PJ et al. Modification of U6 spliceosomal RNA is guided by other small RNAs. Mol Cell 1998; 2(5):629-638.
57. Ganot P, Jady BE, Bortolin ML et al. Nucleolar factors direct the 2'-O-ribose methylation and pseudouridylation of U6 spliceosomal RNA. Mol Cell Biol 1999; 19(10):6906-6917.
58. Matera AG, Terns RM, Terns MP. Non-coding RNAs: lessons from the small nuclear and small nucleolar RNAs. Nat Rev Mol Cell Biol 2007; 8(3):209-220.
59. Ro-Choi TS, Raj NB, Pike LM et al. Effects of alpha-amanitin, cycloheximide, and thioacetamide on low molecular weight nuclear RNA. Biochemistry 1976; 15(17):3823-3828.
60. Frederiksen S, Hellung-Larsen P, Gram Jensen E. The differential inhibitory effect of alpha-amanitin on the synthesis of low molecular weight RNA components in BHK cells. FEBS Lett 1978; 87(2):227-231.
61. Kunkel GR, Maser RL, Calvet JP et al. U6 small nuclear RNA is transcribed by RNA polymerase III. Proc Natl Acad Sci U S A 1986; 83(22):8575-8579.
62. Reddy R, Henning D, Das G et al. The capped U6 small nuclear RNA is transcribed by RNA polymerase III. J Biol Chem 1987; 262(1):75-81.
63. Bringmann P, Reuter R, Rinke J et al. 5'-terminal caps of snRNAs are accessible for reaction with 2,2,7-trimethylguanosine-specific antibody in intact snRNPs. J Biol Chem 1983; 258(5):2745-2747.
64. Bringmann P, Rinke J, Appel B et al. Purification of snRNPs U1, U2, U4, U5 and U6 with 2,2,7-trimethylguanosine-specific antibody and definition of their constituent proteins reacting with anti-Sm and anti-(U1)RNP antisera. EMBO J 1983; 2(7):1129-1135.
65. Singh R, Reddy R. Gamma-monomethyl phosphate: a cap structure in spliceosomal U6 small nuclear RNA. Proc Natl Acad Sci U S A 1989; 86(21):8280-8283.
66. Seraphin B. Sm and Sm-like proteins belong to a large family: identification of proteins of the U6 as well as the U1, U2, U4 and U5 snRNPs. EMBO J 1995; 14(9):2089-2098.
67. Mayes AE, Verdone L, Legrain P et al. Characterization of Sm-like proteins in yeast and their association with U6 snRNA. EMBO J 1999; 18(15):4321-4331.
68. Huttenhofer A, Kiefmann M, Meier-Ewert S et al. RNomics: an experimental approach that identifies 201 candidates for novel, small, non-messenger RNAs in mouse. EMBO J 2001; 20(11):2943-2953.
69. Jady BE, Kiss T. A small nucleolar guide RNA functions both in 2'-O-ribose methylation and pseudouridylation of the U5 spliceosomal RNA. EMBO J 2001; 20(3):541-551.
70. Darzacq X, Jady BE, Verheggen C et al. Cajal body-specific small nuclear RNAs: a novel class of 2'-O-methylation and pseudouridylation guide RNAs. EMBO J 2002; 21(11):2746-2756.
71. Kiss T, Jady BE. Functional characterization of 2'-O-methylation and pseudouridylation guide RNAs. Methods Mol Biol 2004; 265:393-408.
72. Zhao X, Li ZH, Terns RM et al. An H/ACA guide RNA directs U2 pseudouridylation at two different sites in the branchpoint recognition region in Xenopus oocytes. RNA 2002; 8(12):1515-1525.
73. Jady BE, Darzacq X, Tucker KE et al. Modification of Sm small nuclear RNAs occurs in the nucleoplasmic Cajal body following import from the cytoplasm. EMBO J 2003; 22(8):1878-1888.
74. Richard P, Darzacq X, Bertrand E et al. A common sequence motif determines the Cajal body-specific localization of box H/ACA scaRNAs. EMBO J 2003; 22(16):4283-4293.
75. Fu D, Collins K. Human telomerase and Cajal body ribonucleoproteins share a unique specificity of Sm protein association. Genes Dev 2006; 20(5):531-536.
76. Vitali P, Royo H, Seitz H et al. Identification of 13 novel human modification guide RNAs. Nucleic Acids Res 2003; 31(22):6543-6551.
77. Kiss AM, Jady BE, Darzacq X et al. A Cajal body-specific pseudouridylation guide RNA is composed of two box H/ACA snoRNA-like domains. Nucleic Acids Res 2002; 30(21):4643-4649.
78. Kiss AM, Jady BE, Bertrand E et al. Human box H/ACA pseudouridylation guide RNA machinery. Mol Cell Biol 2004; 24(13):5797-5807.
79. Gu AD, Zhou H, Yu CH et al. A novel experimental approach for systematic identification of box H/ACA snoRNAs from eukaryotes. Nucleic Acids Res 2005; 33(22):e194.
80. Schattner P, Barberan-Soler S, Lowe TM. A computational screen for mammalian pseudouridylation guide H/ACA RNAs. RNA 2006; 12(1):15-25.


81. Massenet S, Motorin Y, Lafontaine DL et al. Pseudouridine mapping in the Saccharomyces cerevisiae spliceosomal U small nuclear RNAs (snRNAs) reveals that pseudouridine synthase pus1p exhibits a dual substrate specificity for U2 snRNA and tRNA. Mol Cell Biol 1999; 19(3):2142-2154.
82a. Motorin Y, Keith G, Simon C et al. The yeast tRNA:pseudouridine synthase Pus1p displays a multisite substrate specificity. RNA 1998; 4(7):856-869.
83a. Behm-Ansmant I, Massenet S, Immel F et al. A previously unidentified activity of yeast and mouse RNA:pseudouridine synthases 1 (Pus1p) on tRNAs. RNA 2006; 12(8):1583-1593.
82b. Ma X, Zhao X, Yu YT. Pseudouridylation (Psi) of U2 snRNA in S. cerevisiae is catalyzed by an RNA-independent mechanism. EMBO J 2003; 22(8):1889-1897.
83b. Kaya Y, Ofengand J. A novel unanticipated type of pseudouridine synthase with homologs in bacteria, archaea, and eukarya. RNA 2003; 9(6):711-721.
84. Ma X, Yang C, Alexandrov A et al. Pseudouridylation of yeast U2 snRNA is catalyzed by either an RNA-guided or RNA-independent mechanism. EMBO J 2005; 24(13):2403-2413.
85. Patton JR. Multiple pseudouridine synthase activities for small nuclear RNAs. Biochem J 1993; 290(Pt 2):595-600.
86. Patton JR. Ribonucleoprotein particle assembly and modification of U2 small nuclear RNA containing 5-fluorouridine. Biochemistry 1993; 32(34):8939-8944.
87. Yu YT, Shu MD, Steitz JA. Modifications of U2 snRNA are required for snRNP assembly and pre-mRNA splicing. EMBO J 1998; 17(19):5783-5795.
88. Donmez G, Hartmuth K, Luhrmann R. Modified nucleotides at the 5' end of human U2 snRNA are required for spliceosomal E-complex formation. RNA 2004; 10(12):1925-1933.
89. Zhao X, Yu YT. Pseudouridines in and near the branch site recognition region of U2 snRNA are required for snRNP biogenesis and pre-mRNA splicing in Xenopus oocytes. RNA 2004; 10(4):681-690.
90. Yang C, McPheeters DS, Yu YT. Psi35 in the branch site recognition region of U2 small nuclear RNA is important for pre-mRNA splicing in Saccharomyces cerevisiae. J Biol Chem 2005; 280(8):6655-6662.
91. Valadkhan S, Manley JL. Splicing-related catalysis by protein-free snRNAs. Nature 2001; 413(6857):701-707.
92. Valadkhan S, Manley JL. Characterization of the catalytic activity of U2 and U6 snRNAs. RNA 2003; 9(7):892-904.
93. Newby MI, Greenbaum NL. A conserved pseudouridine modification in eukaryotic U2 snRNA induces a change in branch-site architecture. RNA 2001; 7(6):833-845.
94. Newby MI, Greenbaum NL. Sculpting of the spliceosomal branch site recognition motif by a conserved pseudouridine. Nat Struct Biol 2002; 9(12):958-965.
95. Lin Y, Kielkopf CL. X-ray structures of U2 snRNA-branchpoint duplexes containing conserved pseudouridines. Biochemistry 2008; 47(20):5503-5514.
96. Zhao X, Yu YT. Incorporation of 5-fluorouracil into U2 snRNA blocks pseudouridylation and pre-mRNA splicing in vivo. Nucleic Acids Res 2007; 35(2):550-558.
97. Roca X, Sachidanandam R, Krainer AR. Determinants of the inherent strength of human 5' splice sites. RNA 2005; 11(5):683-698.
98. Tarn WY, Steitz JA. A novel spliceosome containing U11, U12, and U5 snRNPs excises a minor class (AT-AC) intron in vitro. Cell 1996; 84(5):801-811.
99. Patel AA, Steitz JA. Splicing double: insights from the second spliceosome. Nat Rev Mol Cell Biol 2003; 4(12):960-970.
100. Massenet S, Branlant C. A limited number of pseudouridine residues in the human atac spliceosomal UsnRNAs as compared to human major spliceosomal UsnRNAs. RNA 1999; 5(11):1495-1503.
101. Dietrich RC, Incorvaia R, Padgett RA. Terminal intron dinucleotide sequences do not distinguish between U2- and U12-dependent introns. Mol Cell 1997; 1(1):151-160.
102. Sharp PA, Burge CB. Classification of introns: U2-type or U12-type. Cell 1997; 91(7):875-879.
103. Burge CB, Padgett RA, Sharp PA. Evolutionary fates and origins of U12-type introns. Mol Cell 1998; 2(6):773-785.
104. Reddy R, Henning D, Busch H. Pseudouridine residues in the 5'-terminus of uridine-rich nuclear RNA I (U1 RNA). Biochem Biophys Res Commun 1981; 98(4):1076-1083.
105. Shibata H, Ro-Choi TS, Reddy R et al. The primary nucleotide sequence of nuclear U-2 ribonucleic acid. The 5'-terminal portion of the molecule. J Biol Chem 1975; 250(10):3909-3920.
106. Krol A, Branlant C, Lazar E et al. Primary and secondary structures of chicken, rat and man nuclear U4 RNAs. Homologies with U1 and U5 RNAs. Nucleic Acids Res 1981; 9(12):2699-2716.
107. Reddy R, Henning D, Busch H. The primary nucleotide sequence of U4 RNA. J Biol Chem 1981; 256(7):3532-3538.
108. Krol A, Gallinaro H, Lazar E et al. The nuclear 5S RNAs from chicken, rat and man. U5 RNAs are encoded by multiple genes. Nucleic Acids Res 1981; 9(4):769-787.
109. Epstein P, Reddy R, Henning D et al. The nucleotide sequence of nuclear U6 (4.7 S) RNA. J Biol Chem 1980; 255(18):8901-8906.

Chapter 33

Transfer RNA Aminoacylation and Modified Nucleosides

Richard Giegé* and Jacques Lapointe

Abstract

Among RNAs, the transfer RNAs are those showing the highest level of posttranscriptional modifications. After an overview of early data, the chapter discusses the present knowledge on the role modified nucleosides have in tRNA structure and function, with emphasis on tRNA aminoacylation. The concept of tRNA aminoacylation identity will be outlined and the cases discussed where individual modified nucleosides act either as positive determinants (for recognition by the cognate synthetases) or as negative antideterminants (preventing recognition by a noncognate synthetase). Furthermore, the collective participation of the ensemble of modified nucleosides in a given tRNA will also be analyzed. Evolutionary aspects will be illustrated by the unprecedented property of a paralog of bacterial glutamyl-tRNA synthetase, restricted to the catalytic module of the synthetase, that aminoacylates the Q-base of bacterial tRNAAsp. This has evolutionary implications, suggesting that modern tRNA originated by duplication of an ancestral minihelix, and finds support in the existence of sequence similarities between the anticodon stem-loop of tRNAAsp and the accepting end of tRNAGlu. Altogether, and contrary to a common belief, posttranscriptional modifications in tRNA play an active role in a majority of aminoacylation systems, although in many cases through indirect structure-dependent effects.

Introduction

Aminoacylation of tRNAs is a cardinal process in all forms of life. It dictates faithful protein synthesis since tRNA mischarging can lead to incorporation of incorrect amino acids into the growing protein chains.1 Aminoacylation reactions are catalyzed by aminoacyl-tRNA synthetases (aaRSs), a family of enzymes with great structural variability ranked in two classes according to the architecture of their catalytic domain (reviewed in ref. 2). In general there is one aaRS specific for each of the twenty amino acids of the genetic code, although AsnRS and GlnRS can be missing in archaeal and bacterial organisms as well as in organelles within eukaryotes (reviewed in ref. 3). It is common sense to say that specificity of tRNA aminoacylation relies on structural features in tRNA molecules. As anticipated, it is indeed governed by sequence features that are identity determinants and antideterminants (for tRNA sequences, see reference 4 and http://trnadb.bioinf.uni-leipzig.de). Determinants are nucleotides that interact with aaRSs, such as the G3−U70 base pair for alanine identity,5,6 and antideterminants are other nucleotides that prevent false tRNA interactions with noncognate aaRSs, such as A36 in Escherichia coli tRNATrp that prevents its recognition by ArgRS.7 The finding that partially modified tRNA transcripts8 and even completely unmodified tRNA transcripts can be efficient aaRS substrates9,10 argued against a role of modified nucleosides during tRNA aminoacylation. Conversely, because tRNAs are the RNAs containing the highest level of posttranscriptional modifications, with about a hundred such residues so far characterized (http://medlib.med.utah.edu/RNAmods/), it can also be conjectured that some of these modified nucleosides play a role in tRNA aminoacylation. This statement finds support in the demonstration of the antideterminant role of m1G37 in yeast tRNAAsp, which prevents recognition by yeast ArgRS,11,12 and in the discovery of glutamylated Q34 in the anticodon loop of bacterial tRNAAsp, with the cyclopentenediol ring of Q being the amino acid acceptor for efficient charging by a GluRS paralog (reviewed in refs. 13-15) (for further details see below). On the other hand, it was found that aminoacylation is sensitive to subtle conformational features in tRNA (discussed in refs. 16,17) and that tRNA conformation is tuned by the presence of modified nucleosides (see e.g., refs. 18-21). These observations likewise suggest a functional role of modified residues during aminoacylation, although this role would be of an indirect nature. This chapter outlines the idiosyncratic features in tRNA identity brought by the presence of modified nucleosides. The discussion will be based on the present understanding of tRNA identities, taking into account observed subtleties in their expression and the role of architectural features in tRNA.

*Corresponding Author: Richard Giegé, Architecture et Réactivité de l’ARN, Université Louis Pasteur de Strasbourg, CNRS, IBMC, 15 rue René Descartes, 76084 Strasbourg, France. Email: [email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.

Role of Modified Nucleosides for tRNA Structure

In contrast to DNA, which is chemically robust, RNA is intrinsically fragile due to the presence of the 2ʹ-OH group of ribose, and this holds true for tRNA (reviewed in ref. 22). This 2ʹ-OH group plays the key role in the hydrolytic and enzymatic cleavage mechanisms of RNA when forming a cyclic intermediate with the neighboring 3ʹ-phosphate. These chemical characteristics have biological significance since integrity of genetic information within DNA has to be maintained, whereas fragility of RNA is compatible with its short transient lifetime within cells and could even facilitate its physiological recycling. Nevertheless, to ensure an improved chemical stability of RNA, nature developed two strategies to protect it against hydrolysis. They consist in preventing cyclic phosphate formation, either by reducing the conformational flexibility of the ribophosphate chain or by modifying the 2ʹ-OH group of ribose. In the case of tRNA, its rather compact structure as such is a good protection against hydrolysis, but does not completely prevent cleavages in the more flexible loop regions and the terminal -NCCAOH sequences.22 Interestingly, D-loops, which are the most mobile regions, very often contain nucleotides with 2ʹ-O-methylated ribose moieties4,23 and it was explicitly shown that such methylation protects against hydrolytic cleavages.22 It was further found that modified tRNAs have more rigid structures than unmodified transcripts, as first demonstrated by thermal melting,24 NMR25 and chemical probing.26 On the functional side it was shown that cuts in a tRNA sequence do not necessarily lead to inactivation since fragment-reconstituted tRNA molecules with cleaved anticodon-, T- and D-loops can keep aminoacylation capacity.27,28 This indicates that structural integrity of tRNA is not a prerequisite for aminoacylation.
Also, destabilization of the native tRNA conformation, for instance by organic solvents, favors mischarging, in other words recognition of tRNA by noncognate aaRSs (reviewed in ref. 20). hese properties are in line with the capacity of many tRNA transcripts with relaxed speciicity to be aminoacylated and suggest that tRNA modiications improve the overall speciicity of the aminoacylation process. On the other hand biochemical and computational studies have shown that RNA including tRNA can fold in alternative structures (reviewed in ref. 29). Interestingly, the presence of modiied nucleotides restricts the conformation space and in a few cases it was shown that modiications are required to reach the canonical cloverleaf-conformation.21 How individual modiied residues in tRNA, diferent from modiied riboses, exert their action on tRNA stability is essentially unknown. A puzzle of results, however, brings partial answers, such as the correlation between thermostability and the presence of thiolation of T54 in the T-loop of tRNAs from thermophilic organisms,30,31 the preferential binding of water molecules to tRNA helices containing Ψ-residues,32 the folding of human mitochondrial tRNALys mediated by the methyl group of m1A9 that controls equilibrium

477

Transfer RNA Aminoacylation and Modiied Nucleosides

between cloverleaf and hairpin conformations,33,34 the influence of the highly conserved nucleoside modifications in the T-arm (m1C49, T54 and Ψ55) on Mg2+-dependent tRNA folding,35 or the need for m1A9 in T-armless tRNASer from nematode mitochondria for high levels of aminoacylation.36 The general trend suggests that structural stabilization of tRNA is accomplished by subtle cooperative effects of the ensemble of modified residues.21 A typical example is E. coli tRNAGlu (ref. 37). Here the overall content of modified residues was modulated by overproducing the tRNA in its homologous host. As a result, several distinctly modified forms of tRNAGlu, named modivariants, could be isolated (Table 1) that revealed discrete conformational changes in their loops and variable regions, as shown by chemical probing experiments. Thus the predominant tRNAGlu modivariant (n˚2), which is also the one most efficiently aminoacylated, contains five modified residues (Ψ13, mnm5s2U34, m2A37, T54 and Ψ55), while the four others differ from it by the lack of modification at positions 13, 37 or 65, a partial modification at position 34, or the presence of Ψ65. This additional modification in the T-stem of modivariant n˚1 decreases its stability and its specificity for GluRS (see below). Probing these molecules revealed that modivariant n˚2 is the least reactive one in the anticodon loop. Moreover, removal of the mnm5s2U34 and m2A37 modifications, as in modivariant n˚3, creates additional perturbations in the D- and T-arms and variable region (Table 1). These structural perturbations influence the aminoacylation properties of the tRNA (see below).

Table 1. Effect of modified nucleosides on E. coli tRNAGlu structure and on its aminoacylation by cognate GluRS

Modivariant (n˚) | 1 | 2 | 3 | 4 | 5

Location of modified nucleosides (a)
13 | Ψ | Ψ | Ψ | Ψ | U
34 | mnm5s2U | mnm5s2U | U | s2U | s2U
37 | m2A | m2A | A | A | m2A
54 | T | T | T | T | T
55 | Ψ | Ψ | Ψ | Ψ | Ψ
65 | Ψ | U | U | U | U

Accessibility to structural probes (modivariant n˚2 as reference)
Ethylnitrosourea (phosphate specific)
Increased (b) | U35 G50 | Reference | U35 | U35 | U35
Decreased (b) | G23 A24 | Reference | A21 A46 C48 | - | -
Pb2+ (probes flexibility of ribophosphate backbone)
Increased (b) | C36 | Reference | C12 Ψ13 C20b A49 A59 | - | C36
Decreased (b) | - | Reference | A37 | U35 A37 | -

Kinetic parameters
kcat (sec−1) | 6.5 | 8.5 | 1.5 | 6.2 | 9.6
ATP L (c) | 1.8 | 1.0 | 25 | 3.6 | 0.77
Glutamate L (c) | 2.2 | 1.0 | 3.4 | 5.6 | 7.6
tRNAGlu L (c) | 0.6 | 1.0 | 520 | 0.76 | 0.82

(a) Modified nucleosides that are not present in all modivariants are in bold; (b) accessibility of modivariants to structural probes (− means same accessibility as in reference modivariant n˚2); (c) loss of catalytic efficiencies (L) expressed as the ratio (kcat/Km)modified tRNA/(kcat/Km)unmodified transcript. Differential accessibilities to structural probes in the anticodon loop and remarkable kinetic features are emphasized in bold. Adapted from reference 37.

Summarizing, analysis of available data indicates that nucleoside modifications play a role in tRNA folding and restrict its conformational plasticity. On the other hand, and as far as tRNA aminoacylation is concerned, structural biology of tRNA:aaRS complexes and functional studies2,38 indicate that conformational changes in tRNA occur during the aminoacylation process. This duality needs to be clarified. A few ideas are suggested. It is plausible that functional necessity dictates specific local conformations, e.g., higher local stiffness for codon:anticodon interaction39 and higher overall plasticity for tRNA:aaRS interaction.17 In a broader perspective, the high content of modified residues in tRNA not only would enlarge the repertoire of chemical groups for specific recognition of its many macromolecular partners,17 but also would prevent alternative folding (L-shaped versus rod-like structures) without perturbing local flexibility. In this context it is worth mentioning that RNA domains requiring folding plasticity and present in mRNAs, where they participate in regulation of gene expression, in particular riboswitches40 and tRNA-mimics,41,42 are structures where no modifications have been found to date.

Idiosyncratic Involvement of Modified Nucleosides in tRNA Identity

Basic and Refined Definition of Identity Rules

Identity rules16,38,43,44 account for the specificity of tRNA aminoacylation in ribosome-dependent protein synthesis and are presently understood in broad outline. These rules are referred to as the second genetic code.45 This code relies on a small number of determinants that can interact with aaRSs, but also on less well-known antideterminants that prevent false tRNA interactions with noncognate aaRSs. Determinants and antideterminants are defined as nucleotides and more precisely as chemical groups on these nucleotides (such chemical groups that act as determinants or antideterminants, however, are not well known); they are located mainly at the two distal ends of the tRNA17,38 and in most cases contact identity amino acids on synthetases. Three main technical approaches were used to find identity determinants, namely computer search of specific sequence differences between tRNA molecules,46,47 in vitro aminoacylation studies of appropriate tRNA variants, either derived from modified tRNA prepared by molecular microsurgery procedures using chemical and/or enzymatic methods48-50 or unmodified transcripts prepared in vitro from artificial genes,24,51 and in vivo studies of suppressor tRNAs with a reporter system based on the reading by engineered suppressors of an amber mutation at position 10 of a dihydrofolate reductase gene.43,52 The strength of an identity determinant is given by the functional effect produced by its mutation (the kcat/Km ratio of aminoacylation for in vitro methods or the strength of suppression for in vivo methods). The strongest determinants are located mainly at both extremities of the L-shaped tertiary structure of tRNA and are essentially conserved in evolution. Completion of an identity set is verified by transplantation experiments based on the introduction of the putative identity set into the background of a tRNA with another identity.

This rather simple theoretical framework made it possible to characterize most of the strongest determinants unambiguously and to gain a rough understanding of the role of architecture, but it did not unravel the idiosyncrasies and mechanistic subtleties underlying expression of identities.16,17,20 The reasons are multiple and are both methodological and conceptual: the in vivo method does not give access to anticodon identity elements nor to mechanistic aspects of identity expression; in contrast, the in vitro approach can in principle screen the entire sequence space of a tRNA. However, because of technical difficulties compared to the easier preparation of active transcripts, production of mutants of native and modified tRNA by enzymatic and chemical methods12,53-55 was only sparsely used. As a consequence, studying the influence of tRNA posttranscriptional modifications on aminoacylation was neglected, if not abandoned.

Global Investigations

One straightforward way to characterize residues important for tRNA aminoacylation is to replace them by analogs and to find whether the replacement affects the function of the tRNA. In a few cases, such replacement can be achieved by in vivo incorporation of the analog into the newly synthesized tRNA. The U derivative 5-fluorouracil can substitute for U in growing yeast and E. coli cells and is efficiently incorporated in all cellular RNAs. In tRNA it substitutes not only for U but also for T-, D-, Ψ- and thiolated U-residues, while keeping tRNA conformation56,57 and aminoacylation capacities58,59 essentially unaffected. For purified E. coli 5-fluorouracil-substituted tRNAfMet, more detailed aminoacylation studies yielded a kcat/Km value similar to that for the wild-type control tRNA.60 These early studies indicate that T-, D-, Ψ- and s2U-residues are not mandatory for tRNA aminoacylation identity. Chemical reagents that selectively alter modified nucleosides are of particular interest for structure-function studies of tRNA (reviewed in refs. 61-63). This is the case, e.g., for reagents specific for thiopyrimidine nucleosides,64,65 for sodium borohydride, which is specific for D residues,66 and for aqueous iodine, which modifies N6-(Δ2-isopentenyl)-A or i6A (ref. 67). The overall outcome of these early studies is an absence of drastic effect on tRNA aminoacylation after chemical alteration of modified residues, except for the thiol modifications in bacterial tRNAGlu (see below), although treatment of anticodon loop modified nucleosides often interferes with tRNA binding to mRNA, as e.g., for the i6A modification.67 This conclusion does not exclude subtle effects of modifications, either direct (contacts with aaRS) or indirect (structural effects), that tune expression of aminoacylation identities in cellular environments.

Other studies aimed at evaluating the role of the ensemble of modified nucleosides in a given tRNA in the aminoacylation process, by comparing the aminoacylation efficiency of native tRNAs with that of their unmodified transcripts (Table 2). Data can be ranked in three categories: (i) when the transcript is as active as the fully modified native tRNA molecule (e.g., yeast tRNAAsp, tRNALeu, tRNATyr or E. coli tRNAAla),11,68-70 (ii) when the transcript is moderately less active than the native tRNA (e.g., yeast tRNAPhe and E. coli tRNACys)24,71 and (iii) when the transcript is much less efficient than the native tRNA, with L = 10- to >1000-fold (e.g., E. coli tRNAGlu, tRNALys, tRNA1Ile, tRNAPhe or yeast tRNAIle).7,72-75 Note that the strongest effects mainly result from a kcat decrease, although in some cases the decreased efficiency can be Km-dependent.71,73
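The three-way ranking can be illustrated with a short calculation. The sketch below is only illustrative: it takes L as the ratio of the native tRNA's kcat/Km to that of its unmodified transcript (the definition given with Table 1), uses the L values listed in Table 2, and applies two cutoffs (3 and 50) that are assumptions chosen here so that the grouping matches the examples named in the text. The source does not state numerical boundaries, and all function and variable names are hypothetical.

```python
def loss_of_efficiency(kcat_native, km_native, kcat_transcript, km_transcript):
    """Return L: (kcat/Km) of the native tRNA over (kcat/Km) of its transcript."""
    return (kcat_native / km_native) / (kcat_transcript / km_transcript)


# L values as listed in Table 2 (Sc: S. cerevisiae, Ec: E. coli).
TABLE_2_L = {
    "Ec tRNAPhe": 1000, "Sc tRNAIle": 410, "Ec tRNA1Ile": 400,
    "Ec tRNALys": 140, "Ec tRNAGlu": 100, "Sc tRNAPhe": 14,
    "Ec tRNACys": 5.5, "Sc tRNALeu": 3.0, "Ec tRNAAla": 1.3,
    "Sc tRNAAsp": 1.1, "Sc tRNATyr": 0.6,
}

# Assumed cutoffs (not from the source), picked to reproduce the text's grouping.
AS_ACTIVE_MAX = 3.0
MODERATE_MAX = 50.0


def categorize(l_value):
    """Map an L value onto category (i), (ii) or (iii) of the text."""
    if l_value <= AS_ACTIVE_MAX:
        return "as_active"      # (i) transcript as active as native tRNA
    if l_value <= MODERATE_MAX:
        return "moderate"       # (ii) transcript moderately less active
    return "strong"             # (iii) transcript much less efficient


def group_by_category(l_values):
    """Group tRNA names by category, keeping the input order within a group."""
    groups = {"as_active": [], "moderate": [], "strong": []}
    for name, l_value in l_values.items():
        groups[categorize(l_value)].append(name)
    return groups


if __name__ == "__main__":
    for category, names in group_by_category(TABLE_2_L).items():
        print(category, names)
```

With these assumed cutoffs, yeast tRNAPhe (L = 14) falls in the moderate group, as in the text, even though the text quotes a lower bound of 10-fold for the third category; the boundary between categories (ii) and (iii) is therefore a matter of judgment rather than a fixed threshold.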

Identity Determinants

Ten modified nucleosides were explicitly characterized as “identity” determinants. They play an active role, either direct or indirect, in tRNA aminoacylation by E. coli IleRS, GluRS, GlnRS, LysRS, TyrRS, by yeast IleRS, TyrRS, PheRS and by archaeal SepRS (an atypical aaRS that charges phosphoserine on tRNACys, see below) (Table 3). These determinants are all located in anticodon loops and more precisely within anticodons at positions 34 to 36 and at neighboring position 37. This limited panel does not mean that only a few aaRSs use posttranscriptional modifications of tRNA as recognition/identity signals but more likely reflects the fact that only few investigations covered this research field. In what follows, data are displayed according to the class ranking of aaRSs.

Isoleucine system (with class Ia IleRS): Minor E. coli tRNA2Ile specific for the codon AUA has a lysidine (or k2C) in the first position of its anticodon. Enzymatic replacement of this k2C34 modified C residue with unmodified C34 results in a marked reduction of the isoleucine-accepting capacity of the mutant tRNA, with initial aminoacylation velocities less than 1/10 of that for wild-type tRNA2Ile and a much decreased isoleucylation plateau.76 This indicates that k2C34 plays a pronounced role in isoleucine identity. However, this role could be indirect, since major tRNA1Ile, which is also efficiently isoleucylated, has a G at position 34 that is structurally totally different from k2C, which could mean that E. coli IleRS does not recognize the nucleoside in position 34. Surprisingly, the tRNA2Ile modivariant with an unmodified C34, while having its isoleucylation

Table 2. A few typical examples of the effect of the ensemble of modified nucleosides in a tRNA on its aminoacylation capacity

tRNA | Modified nucleoside content | L (x-fold) | Main effect on kcat | Main effect on Km | Refs.
Ec tRNAPhe | s4U8, D16,20, Ψ32,39,55, ms2i6A37, m7G46, acp3U47, T54 | 1000 | ++ (a) | + | 72
Sc tRNAIle | m1G9, m2G10, D16,17,20,20a,47, I34, Ψ36,55, t6A37, m5C48, T54, m1A58 | 410 | ++ | | 75
Ec tRNA1Ile | D17,20,20a, t6A37, m7G46, acp3U47, T54, Ψ55,65 | 400 | ++ | | 74
Ec tRNALys | D16,17,20, mnm5s2U34, t6A37, Ψ39,55, m7G46, acp3U47, T54 | 140 | ++ | + | 7
Ec tRNAGlu | Ψ13,55, mnm5s2U34, m2A37, T54 | 100 | + | ++ | 73
Sc tRNAPhe | m2G10, D16,17, m22G26, Cm32, Gm34, yW37, Ψ39,55, m5C40,49, m7G46, T54, m1A58 | 14 | ++ (a) | + | 24
Ec tRNACys | s4U8, D20,20a, ms2i6A37, T54, Ψ55 | 5.5 | | + | 71
Sc tRNALeu | m2G10, ac4C12, Gm18, D20,20a, m22G26, m1G37, m5C48, Ψ39,55, T54 (c) | 3.0 | | | 69
Ec tRNAAla | D16, ac5U34, m7G46, T54, Ψ55 | 1.3 | | | 68
Sc tRNAAsp | Ψ13,32,55, D16,20, m1G37, m5C50, T54 | 1.1 | | | 11
Sc tRNATyr | m2G10, D16,17,20,20a,20b,47, m22G26, Ψ35,39,55, m5C48, T54, m1A58 | 0.6 | + | − (b) | 70

Sc: Saccharomyces cerevisiae; Ec: Escherichia coli. Those residues that were shown to act individually are in bold or in italic for yW37 in yeast tRNAPhe (see text and Table 3). L is the loss of catalytic efficiency (see Table 1 for definition). (+, ++) Moderate or strong effect on kcat (decrease) or Km (increase), (−) no effect on kinetic parameters. (a) Dependent on Mg2+ concentration; (b) Km is two-fold decreased; (c) conserved modified residues within the isoacceptor tRNALeu species. For sequence see tRNA database.4

capacity decreased, acquired significant methionine-accepting capacity76 (see below). Moreover, the drastic reduction of the kcat-dependent isoleucylation capacity of E. coli tRNA1Ile (with GAU anticodon) after replacement of t6A37 by A37 shows that t6A37 is an isoleucine identity determinant and is probably involved in a direct interaction with E. coli IleRS.74 Yeast possesses two isoleucine isoacceptor tRNAs, with an IAU anticodon for major tRNAIle and a ΨAΨ anticodon for minor tRNAIle. The role of I34 in tRNAIle(IAU) was directly evaluated by in vitro aminoacylation of a set of unmodified transcripts with NAU anticodon (N being A, G, U or C) and a hybrid transcript with I34 but deprived of all other modified nucleosides.75 The hybrid transcript has a kcat/Km catalytic efficiency which is reduced 26-fold when compared to the fully modified molecule. This loss in efficiency is further increased about 3-, 16- and >120-fold for the transcripts with G34, A34, U34 and C34, respectively. Altogether, this demonstrates that I34 is a strong isoleucine determinant in yeast and suggests that the functional groups in I34 common with G (N1 and O6) are the actual determinants that would contact IleRS.75 The role of the Ψ-residues

Table 3. Modified nucleosides explicitly involved in tRNA identity for aminoacylation

Nucleoside (a) | In tRNAx | Determinant for aaRSx | Antideterminant against aaRSy | Refs.
k2C34 (L) | tRNA2Ile (E. coli) | IleRS (E. coli) | MetRS (E. coli) | 76
s2U34 | tRNA1Gln (E. coli) | GlnRS (E. coli) | - | 83
s2U34 | tRNAGlu (E. coli) | GluRS (E. coli) | - | 37,73
mnm5s2U34* | tRNALys (E. coli) | LysRS (E. coli) | - | 7,90
Q34 | tRNATyr (E. coli) | TyrRS (E. coli) | - | 88
I34 | tRNAIle (yeast) | IleRS (yeast) | - | 75
Ψ35 | tRNATyr (yeast) | TyrRS (yeast) | - | 50
Ψ36 (b) | tRNAIle (yeast) | IleRS (yeast) | - | 75
yW37 | tRNAPhe (yeast) | PheRS (yeast) | - | 53
t6A37 | tRNAIle (E. coli) | IleRS (E. coli) | - | 74
m1G37 | tRNAAsp (yeast) | - | ArgRS (yeast) | 11,12
m1G37 | tRNACys (Mja) | SepRS (Mja) | - | 104

(a) Nucleosides in bold have a strong contribution to identity; (b) could act together with Ψ34 for isoleucine identity of minor yeast tRNAIle(ΨAΨ), since Ψ36 is associated with Ψ34, while in major tRNAIle(IAU), I34 acts alone; *only strong in vitro; Mja for Methanocaldococcus jannaschii.

in the anticodon of minor tRNAIle(ΨAΨ) could not be tested directly but was deduced from the 40-fold decrease in aminoacylation activity of the unmodified transcript with UAU anticodon. Since U34 is a strong antideterminant for yeast MetRS,77 it can be concluded that Ψ34 (perhaps in combination with Ψ36) is an isoleucine determinant in yeast, but of weak strength as compared to the strong I34 determinant in the major tRNAIle isoacceptor.75

Methionine system (with class Ia MetRS): E. coli initiator tRNAfMet has a CAU anticodon, replaced in elongator tRNAmMet by ac4CAU. The fact that wobble position 34 is important for methionylation78 could imply that ac4C34 is a methionine identity determinant. This possibility, however, is invalidated since methionylation capacity is fully maintained in a bisulfite-treated elongator tRNA with ac4C34 converted to C34 (ref. 79).

Glutamate system (with class Ib GluRS): A direct role of modified nucleosides in tRNA aminoacylation came from the functional study of partially modified E. coli tRNAGlu molecules.37,73,80 Studying the glutamylation of a set of tRNAGlu variants prepared by recombinant RNA technology and differing in the content of modified nucleosides made it possible to show that mnm5s2U34 in tRNAGlu is required for efficient aminoacylation by E. coli GluRS.73 This conclusion was refined by the determination of the kinetic parameters of the overall aminoacylation process of 5 tRNAGlu modivariants (Table 1), obtained by overexpressing the tRNAGlu gene.37 Partial hypo-modification of some tRNAs, extreme under the above-mentioned artificial conditions, may be observed to a lesser degree under physiological conditions that induce an imbalance between the levels of tRNA modifying enzymes and the levels of their tRNA substrates (reviewed in ref. 81). Thus, the strongest kinetic effects were found with modivariant n˚3, deprived of modification in the anticodon loop, with a 25-fold loss of the Km for ATP and a 520-fold loss of tRNA glutamylation efficiency.
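The size of this effect relative to the other modivariants can be checked with a few lines of arithmetic. The sketch below is illustrative only: the L values are those read from the tRNAGlu row of Table 1 (modivariant n˚2 is the reference, L = 1.0), and the function names are hypothetical. It compares modivariant n˚3 with modivariant n˚4, which, according to the text, also lacks mnm5s2U34 and m2A37 but retains the s2-group at U34.

```python
# Loss of catalytic efficiency (L) for the tRNA substrate, per modivariant,
# as read from the tRNAGlu row of Table 1 (modivariant 2 is the reference).
TRNA_GLU_L = {1: 0.6, 2: 1.0, 3: 520.0, 4: 0.76, 5: 0.82}


def relative_efficiency(l_value):
    """Catalytic efficiency relative to the reference, i.e. 1 / L."""
    return 1.0 / l_value


def fold_difference(variant_a, variant_b, l_values=TRNA_GLU_L):
    """Return how many times less efficiently variant_a is charged than variant_b."""
    return l_values[variant_a] / l_values[variant_b]


def weakest_substrate(l_values=TRNA_GLU_L):
    """Return the modivariant number with the largest loss of efficiency."""
    return max(l_values, key=l_values.get)


if __name__ == "__main__":
    print("weakest substrate: modivariant", weakest_substrate())
    print("n3 versus n4, fold difference:", round(fold_difference(3, 4)))
```

Under this reading of the table, restoring only the 2-thio group (n˚4 versus n˚3) recovers several hundred-fold in kcat/Km, which is the quantitative basis for treating the s2-group as the decisive element.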
The facts that modivariant n˚3, lacking mnm5s2U34 and m2A37, is inactive but that modivariant n˚4, also lacking these two modifications but containing s2U34 instead, is efficiently aminoacylated, conclusively indicate that the s2-group in mnm5s2U34 is a strong identity determinant for E. coli GluRS.37 As observed for most major identity determinants that are mainly conserved through evolution, the s2-group


at position 34 is indeed conserved in the mnm5s2U34 anticodon residue, but only in bacterial tRNAGlu species.23 This conclusion was already suggested by earlier findings that showed that treatment of E. coli bulk tRNA or of purified tRNAGlu by the thiol-specific reactant cyanogen bromide decreases glutamate acceptance65,82,83 and perturbs the aminoacylation kinetics with a displacement of the rate-determining step of the overall glutamylation process.80 However, earlier work characterized an E. coli mutant strain lacking mnm5s2U (ref. 84) and an amber suppressor tRNAGlu(CUA) with mainly glutamate identity,52 suggesting that the U34 modification is not essential in vivo. A recent genome screening combining reverse genetics and mass spectrometry, which detected E. coli strains in which mnm5s2U is absent, supports this conclusion.85 How cells overcome the hypo-modification of tRNAGlu and its concomitant poor aminoacylation remains to be elucidated.

Glutamine system (with class Ib GlnRS): Treatment of E. coli tRNA1Gln (with s2U34) with the thiol reagent BrCN produces an important decrease in glutamine acceptance, while tRNA2Gln (with C34) and other tRNAs not possessing a reactive thiol group in their anticodon were not affected by the BrCN treatment.83 This suggests a determinant role of s2U34 in glutamine identity.

Tyrosine system (with class Ic TyrRS): This is the only system that was thoroughly investigated by the RNA microsurgery approach that can replace in yeast tRNATyr anticodon loop residues 33-35 with any desired residues.53 The approach is based on the in vitro annealing of appropriately engineered half-molecules and sealing the reconstituted tRNA with RNA ligase. Thus the putative role of Ψ35 in tyrosylation could be evaluated. Tyrosylation of Ψ35 variants showed different sensitivities: substitution by U decreased the apparent kcat/Km of tyrosylation only 2-fold, while substitution by C or A decreased it 9-15-fold. This clearly establishes a determinant function of position 35 for tyrosylation, but with a weak contribution of the U to Ψ change.50 Interestingly, replacement of Ψ35 by several modified residues (f5U, D, m3U or 3-deazaU) affects tyrosylation differently and suggests a functional role of the N1 and N3 hydrogens of Ψ35 in tyrosylation.50 This observation is partly consistent with the crystal structure of the yeast tRNATyr:TyrRS complex showing that only the N3 atom of Ψ35 forms a specific hydrogen bond with TyrRS (at Cys255).86 The distribution of modified nucleosides in E. coli tRNATyr (s4U8, Gm18, Q34, ms2i6A37, Ψ39,55 and T54) differs significantly from that found in yeast tRNATyr, in particular in the anticodon loop, with Q34 absent in yeast and Ψ35 only present in yeast. This difference likely contributes to the species-specific tRNA recognition by TyrRSs,87 in particular between yeast and E. coli, where Ψ35 is a determinant in yeast (see above) and Q34 probably in E. coli, as suggested by the poor tyrosylation activity of a Q34->C mutant.88 Finally, notice that removal of the D16D17Gm18G19 tetranucleotide from the D-loop of tRNATyr from the yeast Torulopsis utilis decreases the aminoacylation level of the reconstituted tRNATyr by about 55% (ref. 89), suggesting an indirect role of the modified tetranucleotide in the tyrosylation process. Most likely the deletion perturbs the overall conformation of the tRNA and thus its interaction with the TyrRS. This agrees with the crystallographic structures of tRNA:TyrRS complexes that clearly show that the D-loop of tyrosine tRNAs does not contact the TyrRSs (reviewed in ref. 86).

Lysine system (with class IIb and class Ib LysRSs): Native E. coli tRNALys is heavily modified, contains 10 modified nucleosides (Table 2) and is 140-fold more efficiently aminoacylated than its unmodified transcript, indicating that the modifications play an important role in lysylation.7 Taking into account this result, together with an extensive mutational study of the tRNALys transcript,7 the finding that the conversion of the mnm5s2U34U35U36 anticodon of E. coli tRNALys to UCA results in an inactive opal tRNALys suppressor90 and an earlier finding after chemical modification of native tRNALys (ref. 65), would suggest that the mnm5s2 modification at U34 plays a role in lysine identity.7 On the other hand, however, in vivo studies and recent structure-function investigations led to opposite conclusions. Thus, the in vivo aminoacylation level of E. coli hypo-modified tRNALys lacking the mnm5-group at U34 is not affected.91,92 Further, mass spectrometric analysis of overproduced and highly active E. coli tRNALys revealed the absence of mnm5s2U34 and t6A37 and indicated that the recognition of tRNALys by class IIb LysRS would not depend on the modifications in the anticodon loop.93 Likewise archaeal tRNALys containing mnm5Se2U34 (with the thio-group replaced by a seleno-group at position 2 in U34) retains strong aminoacylation capacity with class Ib


LysRS from Methanococcus maripaludis after cyanogen bromide treatment of the tRNA that removes the mnm5-group from modified U34 (ref. 94). In addition, it appears that the mnm5- and s2-groups of U34 as well as the other modifications in anticodon loops of tRNAs recognized by class IIb aaRSs (namely AspRS, AsnRS and LysRS) do not contact the synthetases (reviewed in ref. 95). How to reconcile these seemingly contradictory conclusions? Closer examination of the experimental data in fact reveals faint functional differences when comparing the aminoacylation activities of tRNALys species produced in wild-type and mutant strains. Thus, the lysylation level of a tRNA lacking the s2-group in U34 is reduced,92 although less than in the case of tRNAGlu deprived of this modification (see above). More precisely, a 5-fold loss in catalytic efficiency (L), due to an increased Km, was found for an E. coli overproduced U34UUA37 tRNALys variant.93 Likewise, tRNA lysylation by class Ib LysRS from M. maripaludis senses the removal of the mnm5-group from U34 in E. coli or M. maripaludis tRNALys by faint Km- and kcat-effects, respectively.94 Altogether, this suggests that modification of U34 plays a role in lysine identity of E. coli and archaeal tRNALys that is solely due to the s2- or Se2-groups. This role is moderate, but is significantly increased in completely unmodified tRNA. This suggests further that the concerted action of the ensemble of tRNALys modifications is essential for the structural tuning of productive tRNALys:LysRS complexes. Noteworthy is the finding of 5-taurinomethyl-2-thio-U (τm5s2U) at the first anticodon position 34 in human and bovine and likely in all vertebrate and protochordate mitochondrial tRNALys species.96 As in bacterial tRNALys, where modified mnm5s2U34 participates in lysine identity, it is tempting to suggest that the related τm5s2U34 could play a similar role in mitochondria, in particular via its s2-group. However, comparative lysylation of wild-type mitochondrial tRNALys and a MERRF variant where U34 is unmodified did not reveal a strong effect on aminoacylation, in contrast to codon reading, which is strongly perturbed.97 Whether the moderate kinetic effect on lysylation has functional significance remains to be further investigated.

Phenylalanine system (with class IIc PheRS): Extensive studies on yeast tRNAPhe have not revealed strong modification-dependent effects on aminoacylation after chemical alteration or replacement of its modified nucleosides (early work reviewed in ref. 61). Nevertheless, substitution of hypermodified yW37 (also known as Y-residue) by G decreases the level of aminoacylation at high ionic strength and has a discernible effect on the Km and Vmax of aminoacylation at low ionic strength.53 Since yW37 most likely does not contact PheRS, as implied by chemical probing98 and the crystal structure of the Thermus thermophilus tRNAPhe:PheRS complex,99 this suggests that the bulky yW residue strengthens the determinant role of the neighboring GAA anticodon by indirect structural effects that facilitate the competent positioning of tRNAPhe on PheRS and thus optimal phenylalanylation efficiency. Two other results on the aminoacylation capacity of tRNAPhe deserve a mention in the present context. The first is the observation that conversion of G10 to m2G10 in E. coli tRNAPhe with a purified rabbit m2G methylase markedly increases the Vmax for phenylalanylation by E. coli PheRS.100 The second observation concerns the phenylalanylation activity of a modified three-quarter tRNAPhe molecule deprived of its 5ʹ-extremity and thus with a single-stranded amino acid accepting stem. This molecule, obtained by tRNA cleavage with Pb2+ in the D-loop at the D18 residue,101 is inactive but acquires significant phenylalanylation capacity after removal of m7G46 in the variable region.102 This does not mean that m7G46 is an identity determinant but is explained by a relaxation of the three-quarter fragment minus the m7G-base, facilitating its adaptation on yeast PheRS.

Cysteine system (with class IIc SepRS): Phosphoserine (Sep) is an amino acid that is excluded from the present genetic code, but that can be charged to tRNACys in methanogenic Archaea by SepRS. These organisms lack CysRS and use an indirect pathway to synthesize cysteinyl-tRNA where a SepCys synthase converts the tRNA-bound phosphoserine to cysteine.103 Steady-state and single-turnover kinetics combined with a mutational analysis of tRNACys aminoacylation by Methanocaldococcus jannaschii SepRS conclusively showed that m1G37 in M. jannaschii tRNACys is a cysteine identity determinant.104 Interestingly, the modified residue has little effect on the binding of tRNACys to SepRS, but enhances the discrimination against tRNA


mutations at conserved cysteine identity nucleotides and kinetically affects the overall process of cysteinyl-tRNACys formation. Note that the archaeal SepRS differs from E. coli CysRS, which does not require m1G37 for tRNA aminoacylation.104

Identity Antideterminants

So far only 2 modified residues were explicitly characterized as antideterminants11,12,76 for the rejection of noncognate tRNAs by aaRSs (Table 3). They are k2C34 (also known as lysidine) (refs. 76,105) and m1G37 (refs. 11,12) and were discovered in E. coli tRNAIle and yeast tRNAAsp, respectively. These negative signals prevent the two tRNAs from being mischarged by E. coli MetRS and yeast ArgRS, respectively. As for determinants, they are located in the anticodon loop. Interestingly, these two antideterminants can also act as positive identity determinants (see above).

Isoleucine/Methionine system (with class Ia IleRS and MetRS): When wobble nucleoside k2C34 in the minor E. coli tRNA2Ile is replaced by C, the mutant tRNA has a significantly reduced isoleucylation activity but acquires a marked methionine-accepting activity, while native tRNA with k2C34 is not recognized by E. coli MetRS.76 This indicates that k2C34 in tRNA2Ile is both an identity determinant for isoleucylation by IleRS (see above) and an antideterminant preventing its methionylation by MetRS. Importantly, k2C34 is also essential for specific recognition of the AUA codon,76 so that this modified nucleoside has a triple function in the biology of its carrier tRNA in E. coli.

Aspartate/Arginine system (with class IIb AspRS and class Ia ArgRS): Unexpectedly, the yeast tRNAAsp transcript was found to be an efficient arginine acceptor in the presence of yeast ArgRS, while keeping its capacity to be efficiently aspartylated by yeast AspRS.11 This property was already suggested by the weak mischarging potential of modified tRNAAsp by ArgRS.106,107 This indicates that a nucleoside modification is responsible for the protection of tRNAAsp against ArgRS recognition. Comparison of the sequences of yeast tRNAAsp and tRNAArg shows that only three modified residues, Ψ13, Ψ32 and m1G37, are specific to tRNAAsp and thus could be responsible for the antideterminant effect. Engineering of a tRNAAsp molecule with only m1G37 led to the conclusion that the single methyl group of m1G37 is sufficient to prevent the mischarging of tRNAAsp by ArgRS.12 In contrast to k2C34, which is also an isoleucine identity determinant,76 m1G37 does not participate in Asp identity in yeast.108
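The determinant and antideterminant assignments summarized in Table 3 lend themselves to a small lookup structure. The sketch below is illustrative only: the class and function names are hypothetical, and the rows are transcribed from Table 3 as read here, with references and footnotes omitted. It separates nucleosides that appear in both roles anywhere in the table from those that hold both roles within a single tRNA.

```python
import re
from typing import NamedTuple, Optional


class IdentityRole(NamedTuple):
    nucleoside: str                          # modified nucleoside and position
    trna: str                                # tRNA that carries it
    determinant_for: Optional[str]           # cognate aaRS, if a determinant
    antideterminant_against: Optional[str]   # noncognate aaRS, if an antideterminant


# Rows transcribed from Table 3 (Mja: Methanocaldococcus jannaschii).
TABLE_3 = [
    IdentityRole("k2C34", "tRNA2Ile (E. coli)", "IleRS (E. coli)", "MetRS (E. coli)"),
    IdentityRole("s2U34", "tRNA1Gln (E. coli)", "GlnRS (E. coli)", None),
    IdentityRole("s2U34", "tRNAGlu (E. coli)", "GluRS (E. coli)", None),
    IdentityRole("mnm5s2U34", "tRNALys (E. coli)", "LysRS (E. coli)", None),
    IdentityRole("Q34", "tRNATyr (E. coli)", "TyrRS (E. coli)", None),
    IdentityRole("I34", "tRNAIle (yeast)", "IleRS (yeast)", None),
    IdentityRole("Ψ35", "tRNATyr (yeast)", "TyrRS (yeast)", None),
    IdentityRole("Ψ36", "tRNAIle (yeast)", "IleRS (yeast)", None),
    IdentityRole("yW37", "tRNAPhe (yeast)", "PheRS (yeast)", None),
    IdentityRole("t6A37", "tRNAIle (E. coli)", "IleRS (E. coli)", None),
    IdentityRole("m1G37", "tRNAAsp (yeast)", None, "ArgRS (yeast)"),
    IdentityRole("m1G37", "tRNACys (Mja)", "SepRS (Mja)", None),
]


def position(nucleoside: str) -> int:
    """Extract the tRNA position from a name such as 'mnm5s2U34'."""
    return int(re.search(r"(\d+)$", nucleoside).group(1))


def dual_role_nucleosides(rows):
    """Nucleosides listed somewhere as determinant and somewhere as antideterminant."""
    determinants = {r.nucleoside for r in rows if r.determinant_for}
    antideterminants = {r.nucleoside for r in rows if r.antideterminant_against}
    return sorted(determinants & antideterminants)


def dual_role_within_one_trna(rows):
    """Nucleosides that hold both roles in the same tRNA."""
    return [r.nucleoside for r in rows
            if r.determinant_for and r.antideterminant_against]


if __name__ == "__main__":
    print(dual_role_nucleosides(TABLE_3))
    print(dual_role_within_one_trna(TABLE_3))
```

Keeping one row per tRNA, rather than one per nucleoside, preserves the distinction made in the text: m1G37 is an antideterminant in yeast tRNAAsp and a determinant in M. jannaschii tRNACys, whereas k2C34 carries both roles in the same E. coli tRNA2Ile.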

Structure-Based Understanding of the Role of Modified Nucleosides in Identity

The obvious question one would like to answer is how modified nucleosides exert an identity role, either as determinants or as antideterminants. A comprehensive answer is presently not possible because of the lack of crystallographic knowledge. Indeed, among the 59 structures of tRNA:aaRS complexes hosted (in July 2008) by the RCSB Protein Data Bank,109 only that of the yeast tRNATyr:TyrRS complex86 is directly related to the present problem. Twelve other structures are indirectly related, namely the E. coli tRNAGln:GlnRS complex,110 where tRNAGln is the isoacceptor with C34 instead of s2U34, and the 11 structures from organisms closely related to those listed in Table 3. These structures are from T. thermophilus (the Glu111, Phe99 and Tyr112 complexes), from Staphylococcus aureus (3 Ile heterologous complexes with E. coli tRNAIle transcripts113) and from Archaeoglobus fulgidus (4 Sep complexes114). Notice that the E. coli Gln110, the T. thermophilus Phe99 and Tyr112 and the yeast Tyr86 complexes were solved with modified tRNA. Analysis of the structures reveals a direct contact of Ψ35 with yeast TyrRS, with N3 of Ψ hydrogen-bonding with a conserved Cys-residue of the TyrRS;86 but this does not explain the exact role of Ψ, since the same contact would be possible with U35. Contacts or proximities are suggested in 6 other systems. In the E. coli Gln complex,110 exocyclic N4, N3 and exocyclic O2 of C34 make hydrogen bonds with GlnRS, and in the T. thermophilus Glu complex, anticodon nucleoside C34 or U34 similarly hydrogen-bonds with GluRS.111 We anticipate that the same interaction scheme involving N3 and the s2-group of s2U34 could exist in both tRNAGln and tRNAGlu when interacting with GlnRS or GluRS. Proximity with E. coli TyrRS of Q34, the hypermodified G derivative in cognate tRNATyr, is suggested by the T. thermophilus Tyr complex,


showing that G34 is base-specifically recognized.112 Finally, proximities with aaRSs of identity modified residues at position 37 are suggested by the T. thermophilus Phe,99 S. aureus Ile113 and A. fulgidus Sep114 complexes. In this last complex, N1 and N2 of G37 hydrogen-bond with conserved Gly- and Asp-residues in SepRS. This could mean that N1 methylation of G37 would remove one interaction, thereby favoring a better functional adaptation of tRNACys to SepRS. An intriguing case is brought by the conserved anticodon identity position G34 in tRNAAsp (ref. 16). This position, often occupied by the G analog Q in bacterial tRNAAsp or its hypermodified derivatives in higher eukarya, is unmodified in yeast.4 Crystallography shows contacts between AspRS and the O6 and N7 atoms of G34 or Q34 in all tRNAAsp:AspRS complexes solved so far (reviewed in ref. 115), but no contact with the bulky extension of Q34 in the E. coli complex.116 These data and the activity of an E. coli tRNAAsp variant just lacking Q34 exclude a determinant role of the Q-modification in Asp identity, but on the other hand suggest an antideterminant role of the extension, preventing mischarging of tRNAAsp by noncognate E. coli aaRSs.117 Further work is needed to test this suggestion. As to the m1G37 antideterminant in yeast tRNAAsp that mediates its rejection by yeast ArgRS,12 the yeast Arg complex118 provides a robust basis for a structural understanding. This structure shows an intricate network of contacts between ArgRS and anticodon loop residues of tRNAArg, with A37 in close interaction with the synthetase. This interaction would be hampered by a bulkier residue at position 37, as is the case with m1G37, the antideterminant that prevents efficient arginylation of tRNAAsp. Altogether the data on archaeal Sep, bacterial Ile, Gln, Glu, Tyr and eukaryal Arg, Phe, Tyr systems are in agreement with the general trend that identity determinants make specific interactions with aaRSs and that antideterminants act by steric clash. However, these data do not inform about the mechanisms by which the modifications per se express their role during tRNA aminoacylation, e.g., how contacts or proximities of the modified nucleosides with aaRS anticodon binding domains trigger identity expression. For a deeper understanding, more structural and functional work is needed.

Considerations on Evolution

Background

Assuming that present life on earth finds its origin in the RNA world implies rather sophisticated RNA chemistry for self-replication and metabolic purposes in LUCA, our Last Universal Common Ancestor, and likely the occurrence of modified nucleosides in the prebiotic and RNA worlds. In agreement with this hypothesis, analysis of the distribution of modern modified nucleosides suggests that 15 such residues would have been present in tRNA when the eukaryal-archaeal branch diverged from the bacterial lineage.119 These modified nucleosides are m1A, m62A, I, t6A, ms2t6A, Um, m5U, D, Ψ, Gm, m1G, m7G, Q, Cm and ac4C. Notice that some of them contribute to the chemical stability of tRNA (e.g., Um, Gm, Cm), some are aminoacylated with threonine (e.g., t6A, ms2t6A) and others are involved in aminoacylation identity (e.g., m1G) (see above). On the other hand, tRNAs and aaRSs are ancient molecules present in all domains of life and thus have an origin in the prebiotic world, likely as simplified progenitors.120 Thus it can be conjectured that modern genomes contain cryptic imprints of this ancient world (for details see ref. 121); in agreement with this view, genes encoding small proteins homologous to one or several domains of extant aaRSs have been found and the functions or catalytic activities of some of them have been identified (reviewed in refs. 122-124).

An RNA Modifying Enzyme Paralog of an Aminoacyl-tRNA Synthetase

One aaRS paralog is the product of the E. coli yadB gene. This protein displays 34.5% amino acid sequence identity with domains 1 and 2 and a part of domain 3 of E. coli GluRS, and its 3D structure superimposes perfectly with that of GluRS.125 However, the paralog does not recognize tRNAGlu, nor does it have regions corresponding to the anticodon-binding domains 4 and 5 of GluRS.125,126 Biochemical studies, however, conclusively showed that the YadB protein aminoacylates a tRNA.

This tRNA, surprisingly, is E. coli tRNAAsp, as unambiguously shown by RNA sequencing.126 Even more surprising, YadB efficiently glutamylates the anticodon Q-base of tRNAAsp, through a labile ester linkage with the cis-diol group of the queuine base.15,127,128 The reaction is specific since the other E. coli tRNAs containing a Q-base at the wobble position of the anticodon, namely tRNAAsn, tRNATyr and tRNAHis, are not glutamylated.15 This paralog of GluRS, initially referred to as YadB, is now named glutamyl-queuosine tRNAAsp synthetase (Glu-Q-RS). Its specific recognition of tRNAAsp may be due to structural mimicry of the anticodon arm of tRNAAsp with the acceptor arm of tRNAGlu (ref. 127). As does GluRS, Glu-Q-RS activates glutamate by forming glutamyl-AMP (Glu-AMP), as evidenced by its inhibition by glutamol-AMP, a stable analog of Glu-AMP126 and a competitive inhibitor of GluRS with respect to glutamate and ATP.129 On the other hand, its mechanism differs from that of GluRS in that it can activate glutamate in the absence of tRNA,125 whereas this activation reaction is catalyzed by GluRS only in the presence of tRNAGlu, which switches ATP binding to a productive mode.130 A comparison of the crystallographic structures of the GluRS:glutamate and Glu-Q-RS:glutamate complexes revealed that a restricted number of residues determine the distinct catalytic properties of amino acid recognition and activation by the two enzymes.131 For more details on Q derivatives see reference 132.
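The 34.5% sequence identity quoted for YadB and GluRS is a statistic computed over an alignment. The sketch below shows one common way such a percentage can be obtained; it is only an illustration. The source does not state which convention was used, so the choice of denominator (columns where neither sequence has a gap) is an assumption, and the function name and the two toy sequences are hypothetical and unrelated to the real proteins.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from the denominator. Other
    conventions (e.g. dividing by the shorter sequence length) also exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0   # columns with a residue in both sequences
    identical = 0  # compared columns where the residues match
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment, not taken from YadB or GluRS.
toy_a = "MKT-AYIAK"
toy_b = "MKSQAYLAK"
toy_identity = percent_identity(toy_a, toy_b)  # 6 matches over 8 compared columns
```

Because the result depends on the alignment and on the denominator convention, identity values reported by different tools for the same pair of proteins can differ slightly.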

Evolutionary Implications

The finding of the aminoacylation of a tRNA anticodon by a paralog of an aaRS was a surprise13,14 and renewed interest in those modified nucleosides that have an amino acid derivative in their structure. Today seven such nucleosides are known, namely t6A and related hn6A, k2C, GluQ, g6A, τm5s2U and acp3U, with the amino acids being respectively threonine, valine, lysine, glutamate, glycine, taurine and a 3-amino-3-carboxypropyl derivative. All these modified residues are located in anticodon loops. The fact that Glu-Q-RS from E. coli (and likely from other bacteria coding for a YadB protein) efficiently aminoacylates the Q-base in the anticodon of bacterial tRNAAsp shows that catalytic sites in aaRSs responsible for tRNA aminoacylation have memorized structural features from the tRNA anticodon region, and remarkably this memory remains imprinted in modern tRNAAsp and tRNAGlu sequences.127 Thus the 5ʹ-C38GCAGG43-3ʹ sequence in the glutamate-accepting anticodon stem-loop of E. coli tRNAAsp is found in reverse orientation, 3ʹ-C74GCAGG69-5ʹ, within the amino acid accepting domain of E. coli tRNAGlu, a sequence mimicry globally conserved among tRNAAsp and tRNAGlu species from bacteria with the yadB gene.127 This remarkable finding, together with the two-fold symmetry in L-shaped tRNA and the activity of tRNA minihelices, are experimental arguments for an origin of tRNA by RNA duplication.120
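The "reverse orientation" of the two hexanucleotides can be made concrete with a few lines of code. The following is a minimal sketch that uses only the two segments and position numbers quoted in the text; the variable and function names are hypothetical and introduced purely for illustration.

```python
def read_5_to_3(segment_3_to_5: str) -> str:
    """Rewrite a segment given 3'->5' in the conventional 5'->3' direction."""
    return segment_3_to_5[::-1]


# E. coli tRNAAsp anticodon stem-loop, positions 38 to 43, written 5'->3'.
asp_38_43 = "CGCAGG"
# E. coli tRNAGlu acceptor domain, positions 74 down to 69, written 3'->5'.
glu_74_69_3to5 = "CGCAGG"

# The mimicry: identical letters when the two segments are read in opposite directions.
same_in_opposite_orientation = asp_38_43 == glu_74_69_3to5

# Written 5'->3' (positions 69 to 74), the tRNAGlu segment is the reversed string.
glu_69_74 = read_5_to_3(glu_74_69_3to5)

# Position-to-base maps for each segment.
asp_positions = dict(zip(range(38, 44), asp_38_43))
glu_positions = dict(zip(range(74, 68, -1), glu_74_69_3to5))

# Pair up positions that carry the same base in the two tRNAs (38 with 74, and so on).
matched_positions = [
    (38 + i, 74 - i, asp_positions[38 + i])
    for i in range(6)
    if asp_positions[38 + i] == glu_positions[74 - i]
]
```

Here the comparison is a plain string reversal, not a reverse complement: the two segments carry the same bases, only the 5ʹ to 3ʹ direction differs.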

Conclusions and Perspectives

Multifunctional tRNA (ref. 17) contains sequence elements specifying its structure and different functions, some of these elements having multiple roles. This is the case of modified residues that participate both in tRNA architecture and functions. Due to their essential role in mRNA decoding,39 investigation of other possible roles of modified residues was somewhat neglected. This certainly concerns the role of modified nucleosides in tRNA aminoacylation and the related issue of tRNA identity. Because tRNA aminoacylation can occur on unmodified transcripts, it was misleadingly generalized that modified nucleosides are not required, despite a few early discovered exceptions.7,80,83,89 This view is an oversimplification and, despite a lack of systematic studies, modified residues have been explicitly characterized as actors in tRNA aminoacylation. Residues that are individual actors are all located in anticodon loops, the others acting collectively being found in the core of the tRNA architecture (Fig. 1). Most of the 10 residues acting individually in anticodon loops are positive determinants; two are antideterminants, with lysidine 34 (k2C) having the dual function of determinant and antideterminant. This panel is markedly enlarged if one adds the 18 residues found in tRNAs with decreased aminoacylation efficiency when unmodified (Table 2). These additional residues are preferentially located in the tRNA core (Fig. 1) and collectively tune the competent configuration of individual tRNAs for optimal interaction with their cognate aaRSs. Altogether, this means that at least 33% of the modified nucleosides found to date in bacterial and eukaryal tRNAs are involved in tRNA aminoacylation (data on archaeal tRNAs are missing). However, most mechanistic details accounting for the functionality of the modified nucleosides are lacking. Understanding these details will be essential for a deeper insight into tRNA identity, which presently is only globally understood from the viewpoint of phenomenology.17 Finally, we anticipate the existence of subtle effects of modified nucleosides involved in kingdom, taxon and species specificities of tRNA aminoacylation, as well as in dysfunctions of tRNA aminoacylation in disease.

Figure 1. Three-dimensional structure of tRNA with modified nucleosides participating in tRNA identity. The L-shaped backbone of yeast tRNAPhe (PDB entry: 1evv) is displayed on a bipartite background, grey (top) and white (bottom), delineating respectively the two tRNA domains where participation of modified nucleosides in identity is either collective or individual. Positions of modified nucleosides are numbered in bold or in italics when participation in identity is individual or collective, respectively (see Table 3 for the nature of the nucleosides; asterisks at positions 34 and 37 indicate a dual participation). (Top) Aminoacylation systems are ranked according to the effects produced on identity by the collective lack of modified nucleosides, either by increase of Km (left box) or decrease of kcat (right box) (the total number of modified residues involved is given; the digit between brackets indicates the number of residues that act individually). Notice that T54 and Ψ55 (positions underlined) are conserved in the 7 aminoacylation systems dependent on modified residues. (Bottom) The four boxes list modified nucleosides explicitly characterized in tRNA as identity determinants (D) or as antideterminants (AD), together with the cognate or noncognate aaRSs they contribute to recognize or to reject (AD residues k2C34 and m1G37 were found in E. coli tRNAIle2 and yeast tRNAAsp, respectively; weak determinants are in grey; a: only the s2 group of mnm5s2U34 participates in identity). Notice that positions 34 and 37 are the most populated and that participation in identity of the listed modified nucleosides can be direct or indirect (see text for details).

Acknowledgements

The authors wish to acknowledge the support of Centre National de la Recherche Scientifique (CNRS), Université Louis Pasteur, the French Ministry for Research (ACI BCMS “Code génétique: mieux connaître ses déviations pour comprendre son évolution”), the Natural Sciences and Engineering Research Council (NSERC) of Canada and Commission permanente de coopération franco-québécoise (projet 61-103), together with the PICS program from CNRS. We thank C. Florentz for advice and comments, all our Strasbourg and Québec colleagues for their contributions over the years, H. Grosjean (Orsay), M. Helm (Heidelberg), M. Ibba (Columbus), P. Romby (Strasbourg) and T. Suzuki (Tokyo) for comments on the manuscript and C. Cambillau (Marseille), who introduced us to the structural biology of YadB via his high throughput structural genomics project.

References

1. Chapeville F, Lipmann F, Ehrenstein GV et al. On the role of soluble ribonucleic acid in coding for amino acids. Proc Natl Acad Sci USA 1962; 48:1086-1092.
2. Ibba M, Francklyn C, Cusack S. The Aminoacyl-tRNA Synthetases. Georgetown: Landes Bioscience 2005.
3. Sheppard K, Yuan J, Hohn MJ et al. From one amino acid to another: tRNA-dependent amino acid biosynthesis. Nucleic Acids Res 2008; 36:1813-1825.
4. Jühling F, Mörl M, Hartmann R et al. tRNAdb 2009: compilation of tRNA sequences and tRNA genes. Nucleic Acids Res 2009; 37(Database issue):in press.
5. Hou Y-M, Schimmel P. A simple structural feature is a major determinant of the identity of a transfer RNA. Nature 1988; 333:140-145.
6. McClain WH, Foss K. Changing the identity of a tRNA by introducing a G-U wobble pair near the 3ʹ acceptor end. Science 1988; 240:793-796.
7. Tamura K, Himeno H, Asahara H et al. In vitro study of E. coli tRNAArg and tRNALys identity. Nucleic Acids Res 1992; 20:2335-2339.
8. Zeevi M, Daniel V. Aminoacylation and nucleoside modification of in vitro synthesised transfer RNA. Nature 1975; 260:72-74.
9. Samuelsson T, Boren T, Johansen TI et al. Properties of a transfer RNA lacking modified nucleosides. J Biol Chem 1988; 27:13692-13699.
10. Sampson JR, DiRenzo AB, Behlen LS et al. Nucleotides in yeast tRNAPhe required for the specific recognition by its cognate synthetase. Science 1989; 243:1363-1366.
11. Perret V, Garcia A, Grosjean H et al. Relaxation of transfer RNA specificity by removal of modified nucleotides. Nature 1990; 344:787-789.
12. Pütz J, Florentz C, Benseler F et al. A single methyl group prevents the mischarging of a tRNA. Nature Struct Mol Biol 1994; 1:580-582.
13. Grosjean H, de Crécy-Lagard V, Björk GR. Aminoacylation of the anticodon stem by a tRNA-synthetase paralog: Relic of an ancient code? Trends Biochem Sci 2004; 29:519-522.
14. Ibba M, Francklyn C. Turning tRNA upside down: When aminoacylation is not a prerequisite to protein synthesis. Proc Natl Acad Sci USA 2004; 101:7493-7494.
15. Blaise M, Becker HD, Lapointe J et al. Glu-Q-tRNAAsp synthetase coded by the yadB gene, a new paralog of aminoacyl-tRNA synthetase that glutamylates tRNAAsp anticodon. Biochimie 2005; 87:847-861.
16. Giegé R, Sissler M, Florentz C. Universal rules and idiosyncratic features in tRNA identity. Nucleic Acids Res 1998; 26:5017-5035.
17. Giegé R. Toward a more complete view of tRNA biology. Nat Struct Mol Biol 2008; 15:1007-1014.
18. Giegé R, Florentz C, Garcia A et al. Exploring the aminoacylation function of transfer RNA by macromolecular engineering approaches. Involvement of conformational features in the charging process of yeast tRNAAsp. Biochimie 1990; 72:453-461.
19. Perret V, Florentz C, Puglisi JD et al. Effect of conformational features on the aminoacylation of tRNAs and consequences on the permutation of tRNA specificities. J Mol Biol 1992; 226:323-333.
20. Giegé R, Puglisi JD, Florentz C. tRNA structure and aminoacylation efficiency. Prog Nucleic Acid Res Mol Biol 1993; 45:129-206.
21. Hayrapetyan A, Seidu-Larry S, Helm M. Function of modified nucleosides in RNA stabilization. In: Grosjean H, ed. DNA and RNA Modification Enzymes: Structure, Mechanism, Function, and Evolution. Austin: Landes Bioscience, 2009:550-563.
22. Giegé R, Helm M, Florentz C. Classical and novel chemical tools for RNA structure probing. In: Söll D, Nishimura S, Moore P, eds. RNA. Amsterdam: Elsevier Science B.V., 2001:71-89.
23. Dunin-Horkawicz S, Czerwoniec A, Gajda MJ et al. MODOMICS: a database of RNA modification pathways. Nucleic Acids Res 2006; 34:D145-149.

24. Sampson JR, Uhlenbeck OC. Biochemical and physical characterization of an unmodified yeast phenylalanine transfer RNA transcribed in vitro. Proc Natl Acad Sci USA 1988; 85:1033-1037.
25. Hall KB, Sampson JR, Uhlenbeck OC et al. Structure of an unmodified tRNA molecule. Biochemistry 1989; 28:5794-5801.
26. Perret V, Garcia A, Puglisi JD et al. Conformation in solution of yeast tRNAAsp transcripts deprived of modified nucleotides. Biochimie 1990; 72:735-744.
27. Mirzabekov AD, Lastity D, Levina ES et al. Self-assembly of transfer RNA fragments. FEBS Lett 1970; 7:95-98.
28. Wübbeler W, Lossow C, Fittler F et al. Amino acid incorporation into tRNA fragments and into heterologous combinations of fragments. Eur J Biochem 1975; 59:405-413.
29. Helm M. Post-transcriptional nucleotide modification and alternative folding of RNA. Nucleic Acids Res 2006; 34:721-733.
30. Horie N, Hara-Yokoyama M, Yokoyama S et al. Two tRNAIle1 species from an extreme thermophile, Thermus thermophilus HB8: Effect of 2-thiolation of ribothymidine on the thermostability of tRNA. Biochemistry 1985; 24:5711-5715.
31. Shigi N, Sakaguchi Y, Suzuki T et al. Identification of two tRNA-thiolation genes required for cell growth at extremely high temperatures. J Biol Chem 2006; 281:14296-14306.
32. Auffinger P, Westhof E. RNA hydration: three nanoseconds of multiple molecular dynamics simulations of the solvated tRNAAsp anticodon hairpin. J Mol Biol 1997; 269:326-341.
33. Helm M, Brulé H, Degoul F et al. The presence of modified nucleotides is required for cloverleaf folding of a human mitochondrial tRNA. Nucleic Acids Res 1998; 26:1636-1643.
34. Voigts-Hoffmann F, Hengesbach M, Kobitski AY et al. A methyl group controls conformational equilibrium in human mitochondrial tRNALys. J Am Chem Soc 2007; 129:13382-13383.
35. Nobles KN, Yarian CS, Liu G et al. Highly conserved modified nucleosides influence Mg2+-dependent tRNA folding. Nucleic Acids Res 2002; 30:4571-4760.
36. Sakurai M, Ohtsuki T, Watanabe K. Modification at position 9 with 1-methyladenosine is crucial for structure and function of nematode mitochondrial tRNAs lacking the entire T-arm. Nucleic Acids Res 2005; 33:1653-1661.
37. Madore E, Florentz C, Giegé R et al. Effect of modified nucleotides on Escherichia coli tRNAGlu structure and on its aminoacylation by glutamyl-tRNA synthetase. Predominant and distinct roles of the mnm5 and s2 modifications of U34. Eur J Biochem 1999; 266:1128-1135.
38. Giegé R, Frugier M. Transfer RNA structure and identity. In: Lapointe J, Brakier-Gingras L, eds. Translation Mechanisms. Georgetown: Landes Bioscience, 2003:1-24.
39. Agris PF, Vendeix FA, Graham WD. tRNA’s wobble decoding of the genome: 40 years of modification. J Mol Biol 2007; 366:1-13.
40. Montange RK, Batey RT. Riboswitches: emerging themes in RNA structure and function. Annu Rev Biophys 2008; 37:117-133.
41. Romby P, Springer M. Bacterial translational control at atomic resolution. Trends Genet 2003; 19:155-161.
42. Ryckelynck M, Masquida B, Giegé R et al. An intricate RNA structure with two tRNA-derived motives directs complex formation between yeast aspartyl-tRNA synthetase and its mRNA. J Mol Biol 2005; 354:614-629.
43. McClain WH. Transfer RNA identity. FASEB J 1993; 7:72-78.
44. Beuning PJ, Musier-Forsyth K. Transfer RNA recognition by aminoacyl-tRNA synthetases. Biopolymers 1999; 52:1-28.
45. de Duve C. The second genetic code. Nature 1988; 333:117-118.
46. McClain WH, Nicholas HBJ. Differences between transfer RNA molecules. J Mol Biol 1987; 194:635-642.
47. Freyhult E, Moulton V, Ardell DH. Visualizing bacterial tRNA identity determinants and antideterminants using function logos and inverse function logos. Nucleic Acids Res 2006; 34:905-916.
48. Bruce AG, Uhlenbeck OC. Specific interaction of anticodon loop residues with yeast phenylalanyl-tRNA synthetase. Biochemistry 1982; 21:3921-3926.
49. Bare L, Uhlenbeck OC. Aminoacylation of anticodon loop substituted yeast tyrosine transfer RNA. Biochemistry 1985; 24:2354-2360.
50. Bare LA, Uhlenbeck OC. Specific substitution into the anticodon loop of yeast tyrosine transfer RNA. Biochemistry 1986; 25:5825-5830.
51. Lowary P, Sampson J, Milligan J et al. A better way to make RNA for physical studies. In: van Knippenberg PH, Hilbers CW, eds. Structure and dynamics of RNA. New York, London: Plenum Press, 1986:69-76.
52. Normanly J, Kleina LG, Masson JM et al. Construction of Escherichia coli amber suppressor tRNA genes. III. Determination of tRNA specificity. J Mol Biol 1990; 213:719-726.

53. Bruce AG, Uhlenbeck OC. Enzymatic replacement of the anticodon of yeast phenylalanine transfer ribonucleic acid. Biochemistry 1982; 21:855-861.
54. Carbon P, Haumont E, De Henau S et al. Enzymatic replacement in vitro of the first anticodon base of yeast tRNAAsp: application to the study of tRNA maturation in vivo after microinjection into frog oocytes. Nucleic Acids Res 1982; 10:3715-3732.
55. Helm M, Giegé R, Florentz C. A Watson-Crick base-pair disrupting methyl group (m1A9) is sufficient for cloverleaf folding of human mitochondrial tRNALys. Biochemistry 1999; 38:13338-13346.
56. Giegé R, Heinrich J, Weil J-H et al. Etude de l’incorporation du 5-fluorouracile dans les acides ribonucléiques de transfert et ribosomique de levure. Biochim Biophys Acta 1969; 174:43-52.
57. Kaiser II. Structural properties of 5-fluorouracil-containing ribonucleic acid from Escherichia coli. Biochemistry 1971; 10:1540-1545.
58. Giegé R, Heinrich J, Weil J-H et al. Etude des propriétés biologiques des acides ribonucléiques de transfert de levure ayant incorporé du 5-fluorouracile. Biochim Biophys Acta 1969; 174:53-70.
59. Kaiser II. Studies on 5-fluorouracil-containing ribonucleic acid. I. Separation and partial characterization of fluorouracil-containing transfer ribonucleic acids from Escherichia coli. Biochemistry 1969; 8:231-238.
60. Hills DC, Cotten ML, Horowitz J. Isolation and characterization of two 5-fluorouracil-substituted Escherichia coli initiator methionine transfer ribonucleic acids. Biochemistry 1983; 22:1113-1122.
61. Goddard JP. The structure and function of transfer RNA. Prog Biophys Mol Biol 1977; 32:233-308.
62. Ofengand J. Structure and function of tRNA and aminoacyl-tRNA synthetases in eukaryotes. In: Pérez-Bercoff R, ed. Protein biosynthesis in eukaryotes. New York: Plenum Publishing Corporation, 1982:1-67.
63. Giegé R. The early history of tRNA recognition by aminoacyl-tRNA synthetases. J Biosci 2006; 31:477-488.
64. Saneyoshi M, Nishimura S. Selective modification of 4-thiouridylate residue in Escherichia coli transfer RNA with cyanogen bromide. Biochim Biophys Acta 1970; 204:389-399.
65. Saneyoshi M, Nishimura S. Selective inactivation of amino acid acceptor and ribosome binding activities of Escherichia coli tRNA by modification with cyanogen bromide. Biochim Biophys Acta 1971; 246:123-131.
66. Molinaro M, Sheiner LB, Neelon FA et al. Effect of chemical modification of dihydrouridine in yeast transfer ribonucleic acid on amino acid acceptor activity and ribosomal binding. J Biol Chem 1968; 243:1277-1282.
67. Fittler F, Hall RH. Selective modification of yeast seryl-t-RNA and its effect on the acceptance and binding functions. Biochem Biophys Res Commun 1966; 25:441-446.
68. Tamura K, Asahara H, Himeno H et al. Identity elements of Escherichia coli tRNAAla. J Mol Recog 1991; 4:129-132.
69. Soma A, Kumagai R, Nishikawa K et al. The anticodon loop is a major identity determinant of Saccharomyces cerevisiae tRNALeu. J Mol Biol 1996; 263:707-714.
70. Fechter P, Rudinger-Thirion J, Théobald-Dietrich A et al. Identity of tRNA for yeast tyrosyl-tRNA synthetase: Tyrosylation is more sensitive to identity nucleotides than to structural features. Biochemistry 2000; 39:1725-1733.
71. Komatsoulis GA, Abelson J. Recognition of tRNACys by Escherichia coli cysteinyl-tRNA synthetase. Biochemistry 1993; 32:7435-7444.
72. Tinkle Peterson E, Uhlenbeck OC. Determination of recognition nucleotides for Escherichia coli phenylalanyl-tRNA synthetase. Biochemistry 1992; 31:10380-10389.
73. Sylvers LA, Rogers KC, Shimizu M et al. A 2-thiouridine derivative in tRNAGlu is a positive determinant for aminoacylation by Escherichia coli glutamyl-tRNA synthetase. Biochemistry 1993; 32:3836-3841.
74. Nureki O, Niimi T, Muramatsu T et al. Molecular recognition of the identity-determinant set of isoleucine transfer RNA from Escherichia coli. J Mol Biol 1994; 236:710-724.
75. Senger B, Auxilien S, Englisch U et al. The modified wobble base Inosine in yeast tRNAIle is a positive determinant for aminoacylation by isoleucyl-tRNA synthetase. Biochemistry 1997; 36:8269-8275.
76. Muramatsu T, Nishikawa K, Nemoto F et al. Codon and amino-acid specificities of a transfer RNA are both converted by a single posttranscriptional modification. Nature 1988; 336:179-181.
77. Senger B, Despons L, Walter P et al. The anticodon triplet is not sufficient to confer methionine acceptance to a transfer RNA. Proc Natl Acad Sci USA 1992; 89:10768-10771.
78. Schulman LH, Pelka H, Susani M. Base substitutions in the wobble position of the anticodon inhibit aminoacylation of E. coli tRNAfMet by E. coli Met-tRNA synthetase. Nucleic Acids Res 1983; 11:1439-1455.
79. Stern L, Schulman LH. The role of the minor base N4-acetylcytidine in the function of the Escherichia coli noninitiator methionine transfer RNA. J Biol Chem 1978; 253:6132-6139.

80. Kern D, Lapointe J. Glutamyl-tRNA synthetase of Escherichia coli. Effect of alteration of the 5-(methylaminomethyl)-2-thiouridine in the anticodon of glutamic tRNA on the catalytic mechanism. Biochemistry 1979; 18:5819-5826.
81. Björk GR, Rasmuson T. Links between tRNA modification and metabolism and modified nucleosides as tumor markers. In: Grosjean H, Benne R, eds. Modification and Editing of RNA. Washington DC: Am Soc Microbiol Press, 1998:471-492.
82. Agris PF, Söll D, Seno T. Biological function of 2-thiouridine in Escherichia coli glutamic acid transfer ribonucleic acid. Biochemistry 1973; 12:4331-4337.
83. Seno T, Agris PF, Söll D. Involvement of the anticodon region of Escherichia coli tRNAGln and tRNAGlu in the specific interaction with cognate aminoacyl-tRNA synthetase. Alteration of the 2-thiouridine derivatives located in the anticodon of the tRNAs by BrCN or sulfur deprivation. Biochim Biophys Acta 1974; 349:328-338.
84. Marinus MG, Morris NR, Söll D et al. Isolation and partial characterization of three Escherichia coli mutants with altered transfer ribonucleic acid methylases. J Bacteriol 1975; 122:257-265.
85. Ikeuchi Y, Shigi N, Kato J et al. Mechanistic insights into sulfur relay by multiple sulfur mediators involved in thiouridine biosynthesis at tRNA wobble positions. Mol Cell 2006; 21:97-108.
86. Tsunoda M, Kusakabe Y, Tanaka N et al. Structural basis for recognition of cognate tRNA by tyrosyl-tRNA synthetase from three kingdoms. Nucleic Acids Res 2007; 35:4289-4300.
87. Bonnefond L, Giegé R, Rudinger-Thirion J. Evolution of the tRNATyr/TyrRS aminoacylation systems. Biochimie 2005; 87:873-883.
88. Sherman JM, Rogers MJ, Söll D. Competition of aminoacyl-tRNA synthetases for tRNA ensures the accuracy of aminoacylation. Nucleic Acids Res 1992; 20:2847-2852.
89. Ohyama T, Nishikawa K, Takemura S. Studies on T. utilis tRNATyr variants with enzymatically altered D-loop sequences. I. Deletion of the conserved sequence Gm-G and its effect on aminoacylation and conformation. J Biochem 1985; 97:29-36.
90. McClain WH, Foss K, Jenkins RA et al. Nucleotides that determine Escherichia coli tRNAArg and tRNALys acceptor identities revealed by analyses of mutant opal and amber suppressor tRNAs. Proc Natl Acad Sci USA 1990; 87:9260-9264.
91. Hagervall TG, Pomerantz SC, McCloskey JA. Reduced misreading of asparagine codons by Escherichia coli tRNALys with hypomodified derivatives of 5-methylaminomethyl-2-thiouridine in the wobble position. J Mol Biol 1998; 284:33-42.
92. Kruger MK, Sorensen MA. Aminoacylation of hypomodified tRNAGlu in vivo. J Mol Biol 1998; 284:609-620.
93. Commans S, Lazard M, Delort F et al. tRNA anticodon recognition and specification within subclass IIb aminoacyl-tRNA synthetases. J Mol Biol 1998; 278:801-813.
94. Ibba M, Losey HC, Kawarabayasi Y et al. Substrate recognition by class I lysyl-tRNA synthetases: a molecular basis for gene displacement. Proc Natl Acad Sci USA 1999; 96:418-423.
95. Brevet A, Chen J, Commans S et al. Anticodon recognition in evolution: switching tRNA specificity of an aminoacyl-tRNA synthetase by site-directed peptide transplantation. J Biol Chem 2003; 278:30927-30935.
96. Suzuki T, Suzuki T, Wada T et al. Taurine as a constituent of mitochondrial tRNAs: new insights into the functions of taurine and human mitochondrial diseases. EMBO J 2002; 21:6581-6589.
97. Yasukawa T, Suzuki T, Ishii N et al. Wobble modification defect in tRNA disturbs codon-anticodon interaction in a mitochondrial disease. EMBO J 2001; 20:4794-802.
98. Romby P, Moras D, Bergdoll M et al. Yeast tRNAAsp tertiary structure in solution and areas of interaction of the tRNA with aspartyl-tRNA synthetase. A comparative study of the yeast phenylalanine system by phosphate alkylation experiments with ethylnitrosourea. J Mol Biol 1985; 184:455-471.
99. Goldgur Y, Mosyak L, Reshetnikova L et al. The crystal structure of phenylalanyl-tRNA synthetase from Thermus thermophilus complexed with cognate tRNAPhe. Structure 1997; 5:59-68.
100. Roe B, Michael M, Dudock B. Function of N2 methylguanine in phenylalanine transfer RNA. Nature 1973; 246:135-138.
101. Werner C, Krebs B, Keith G et al. Specific cleavage of pure tRNAs by plombous ions. Biochim Biophys Acta 1976; 432:161-175.
102. Renaud M, Ehrlich R, Bonnet J et al. Lack of correlation between affinity of the tRNA for the aminoacyl-tRNA synthetase and aminoacylation capacity as studied with modified tRNAPhe. Eur J Biochem 1979; 100:157-164.
103. Sauerwald A, Zhu W, Major TA et al. RNA-dependent cysteine biosynthesis in Archaea. Science 2005; 307:1969-1972.
104. Zhang CM, Liu C, Slater S et al. Aminoacylation of tRNA with phosphoserine for synthesis of cysteinyl-tRNACys. Nat Struct Mol Biol 2008; 15:507-514.

105. Muramatsu T, Yokoyama S, Horie N et al. A novel lysine-substituted nucleoside in the first position of the anticodon of minor isoleucine tRNA from Escherichia coli. J Biol Chem 1988; 263:9261-9267.
106. Ebel J-P, Giegé R, Bonnet J et al. Factors determining the specificity of the tRNA aminoacylation reaction. Biochimie 1973; 55:547-557.
107. Gangloff J, Ebel J-P, Dirheimer G. Isolation of a complex between yeast arginyl-tRNA synthetase and yeast tRNAAsp and mischarging of tRNAAsp with arginine. Intern Res Communication System 1973; 12:8.
108. Pütz J, Puglisi JD, Florentz C et al. Identity elements for specific aminoacylation of yeast tRNAAsp by cognate aspartyl-tRNA synthetase. Science 1991; 252:1696-1699.
109. Giegé R, Touzé E, Lorber B et al. Crystallogenesis trends of free and liganded aminoacyl-tRNA synthetases. Crystal Growth and Design 2008; in press.
110. Rould MA, Perona JJ, Steitz TA. Structural basis of anticodon loop recognition by glutaminyl-tRNA synthetase. Nature 1991; 352:213-218.
111. Sekine S, Nureki O, Shimada A et al. Structural basis for anticodon recognition by discriminating glutamyl-tRNA synthetase. Nat Struct Mol Biol 2001; 8:203-206.
112. Yaremchuk A, Kriklivyi I, Tukalo M et al. Class I tyrosyl-tRNA synthetase has a class II mode of cognate tRNA recognition. EMBO J 2002; 21:3829-3840.
113. Silvian LF, Wang J, Steitz TA. Insights into editing from an ile-tRNA synthetase structure with tRNAIle and mupirocin. Science 1999; 285:1074-1077.
114. Fukunaga R, Yokoyama S. Structural insights into the first step of RNA-dependent cysteine biosynthesis in archaea. Nat Struct Mol Biol 2007; 14:272-279.
115. Giegé R, Rees B. Aspartyl-tRNA synthetases. In: Ibba M, Francklyn C, Cusack S, eds. The Aminoacyl-tRNA Synthetases. Georgetown, TX: Landes Bioscience, 2005:210-226.
116. Eiler S, Dock-Bregeon AC, Moulinier L et al. Synthesis of aspartyl-tRNAAsp in Escherichia coli: a snapshot of the second step. EMBO J 1999; 18:6532-6541.
117. Martin F, Eriani G, Eiler S et al. Overproduction and purification of native and queuine-lacking Escherichia coli tRNAAsp. Role of the wobble base in tRNAAsp acylation. J Mol Biol 1993; 234:965-974.
118. Delagoutte B, Moras D, Cavarelli J. tRNA aminoacylation by arginyl-tRNA synthetase: induced conformations during substrates binding. EMBO J 2000; 19:5599-5610.
119. Cermakian N, Cedergren R. Modified nucleotides always were: an evolutionary model. In: Grosjean H, Benne R, eds. Modification and Editing of RNA. Washington DC: Am Soc Microbiol Press, 1998:535-541.
120. Schimmel P, Giegé R, Moras D et al. An operational RNA code for amino acids and possible relationship to genetic code. Proc Natl Acad Sci USA 1993; 90:8763-8768.
121. Forterre P, Grosjean H. The emergence of DNA, an hypermodified RNA molecule from the RNA world. In: Grosjean H, ed. DNA and RNA Modification Enzymes: Comparative Structure, Mechanism, Function, Cellular Interactions and Evolution. Georgetown, TX: Landes Bioscience, 2009:this volume.
122. Schimmel P, Ribas de Pouplana L. Footprints of aminoacyl-tRNA synthetases are everywhere. Trends Biochem Sci 2000; 25:207-209.
123. Ibba M, Söll D. Aminoacyl-tRNAs: setting the limits of the genetic code. Genes and Development 2004; 18:731-738.
124. Francklyn C. tRNA synthetase-like proteins. In: Ibba M, Francklyn C, Cusack S, eds. The Aminoacyl-tRNA Synthetases. Georgetown, TX: Landes Bioscience, 2005:285-297.
125. Campanacci V, Dubois DY, Becker HD et al. The Escherichia coli YadB gene product reveals a novel aminoacyl-tRNA synthetase like activity. J Mol Biol 2004; 337:273-283.
126. Dubois DY, Blaise M, Becker HD et al. An aminoacyl-tRNA synthetase-like protein encoded by the Escherichia coli yadB gene glutamylates specifically tRNAAsp. Proc Natl Acad Sci USA 2004; 101:7530-7535.
127. Blaise M, Becker HD, Keith G et al. A minimalist glutamyl-tRNA synthetase dedicated to aminoacylation of the tRNAAsp QUC anticodon. Nucleic Acids Res 2004; 32:2768-2775.
128. Salazar JC, Ambrogelly A, Crain PF et al. A truncated aminoacyl-tRNA synthetase modifies RNA. Proc Natl Acad Sci USA 2004; 101:7536-7541.
129. Desjardins M, Garneau S, Desgagnés J et al. Glutamyl adenylate analogues are inhibitors of glutamyl-tRNA synthetase. Bioorg Chem 1998; 26:1-13.
130. Sekine S, Nureki O, Dubois DY et al. ATP binding by glutamyl-tRNA synthetase is switched to the productive mode by tRNA binding. EMBO J 2003; 22:676-688.
131. Blaise M, Olieric V, Sauter C et al. Crystal structure of glutamyl-queuosine tRNAAsp synthetase complexed with L-glutamate: structural elements mediating tRNA-independent activation of glutamate and glutamylation of tRNAAsp anticodon. J Mol Biol 2008; 381:1224-1237.
132. Iwata-Reuyl et al. Enzymatic formation of the 7-deazaguanosine hypermodified nucleosides of tRNA. In: Grosjean H, ed. DNA and RNA Modification Enzymes: Structure, Mechanism, Function, and Evolution. Austin: Landes Bioscience, 2009:377-391.

Chapter 34

Crystallographic Studies of Decoding by Modified Bases: Correlation of Structure and Function

Albert Weixlbaumer* and Frank V. Murphy IV

Abstract

The ribosome is a large macromolecular machine that carries out template-directed protein synthesis by translating the sequence of triplet codons found in an mRNA into the sequence of amino acids forming a protein molecule. The ribosome decodes the genetic information with remarkably low error rates. Recent structural and kinetic studies have shed light on the mechanism of decoding, showing how the ribosome differentiates between cognate and near- or noncognate codon-anticodon interactions. Amongst the large number of modified bases that can be found in a tRNA, modified bases at position 34 (the wobble base) and position 37 (3ʹ to the anticodon) are particularly interesting as they have a direct influence on the decoding capacity of the ribosome. Since the codon-anticodon interaction can be studied by X-ray crystallography in the context of the ribosomal A-site, the role of these modifications is starting to be understood on a structural level. Here we summarize the results from several crystal structures that address wobble base modifications and modified bases at position 37. The results discussed are based on structural studies of inosine, mnm5U and cmo5U in position 34 and t6A and m6A in position 37.

Introduction

In order for a cell to synthesise proteins, a protein-coding gene has to be transcribed and translated. During translation the sequence of codons in an mRNA directs the synthesis of a polypeptide. The ribosome, a large ribonucleoprotein particle, interprets the information encoded in the nucleic acid sequence by directing the polymerization of amino acids from aminoacyl-tRNAs (aa-tRNAs), which are delivered to the ribosome in a ternary complex with elongation factor Tu (EF-Tu) and guanosine-5ʹ-triphosphate (GTP).

In all species the ribosome consists of two subunits. In eubacteria these are denoted the large or 50S subunit and the small or 30S subunit, which together form the 70S ribosome, whereas in eukaryotes 40S and 60S subunits form an 80S ribosome. Both subunits have three tRNA binding sites, denoted A (aminoacyl), which accepts the new incoming tRNA; P (peptidyl), which holds the tRNA with the nascent peptide; and E (exit), which holds the deacylated tRNA that is about to leave the ribosome. The 30S subunit binds mRNA and tRNA anticodon stem-loops (ASLs), ensuring translational fidelity by monitoring the codon-anticodon interaction. The 50S subunit binds the acceptor arms of the tRNAs, catalysing peptide bond formation between the nascent chain bound to the P-site tRNA and the incoming aminoacyl-tRNA in the A site. Despite the difference in molecular weight and composition amongst ribosomes from different species and organelles, it is believed that the underlying mechanism of protein synthesis is identical. Even after several decades of research, the exact mechanism of protein synthesis emerged only recently with the advent of more sophisticated biochemical techniques as well as high resolution crystal structures of the subunits and, more recently, of the entire 70S ribosome in complex with mRNA and tRNA ligands.1-5

The universal genetic code consists of 61 sense and 3 stop codons (Fig. 1). In almost all species a set of about 26 to 50 cytoplasmic tRNAs is sufficient to decode the complete set of protein-coding genes.6 However, this necessitates that certain tRNAs be capable of reading multiple codons. The Wobble Hypothesis, proposed in 1966 by Francis Crick, predicted that the base pairs between tRNA and mRNA in the first two codon positions are in strict compliance with Watson-Crick geometry.7 In contrast, based on the data available at that time, Crick predicted that the third base pair between codon and anticodon is somewhat less restricted, allowing the formation of non-Watson-Crick base pairs. It is thought that the primitive genetic code was a two-letter code with the third position being completely degenerate.8 However, the present genetic code has evolved so that only 8 out of the 16 codon boxes are fully degenerate in the third position, i.e., all four codons that begin with the same two bases encode the same amino acid (Fig. 1). Generally these fully degenerate or family codon boxes are decoded by multiple isoaccepting tRNAs.

*Corresponding Author: Albert Weixlbaumer, MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB1 0QH, UK. Email: [email protected].

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.

Figure 1. The genetic code can be divided into 16 codon boxes defined by the first two bases of the codon. 8 of the codon boxes code for more than one amino acid or code for amino acids and stop (grey background). The remaining 8 codon boxes are called family codon boxes in which only the first two bases determine the identity of the amino acid. A set of about 40 tRNAs in E. coli decodes the complete genome (each tRNA species is represented by the connected open circles indicating the spectrum of codons it can decode). However, many of them are modified at position 34 (the wobble base; the types of modifications are indicated).
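The division of the code into 16 codon boxes, 8 of them family boxes, can be checked directly from the standard codon table. The following is a minimal sketch, not part of the original chapter; the names `CODON_TABLE`, `codon_boxes` and `family_boxes` are illustrative, and the table is the standard (universal) genetic code written in the conventional UCAG order.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; codons run UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its one-letter amino acid (or "*").
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_boxes(table=CODON_TABLE):
    """Group codons into boxes keyed by their first two bases."""
    boxes = {}
    for codon, aa in table.items():
        boxes.setdefault(codon[:2], set()).add(aa)
    return boxes


def family_boxes(table=CODON_TABLE):
    """Return the boxes whose four codons all encode the same amino acid."""
    return sorted(
        prefix
        for prefix, residues in codon_boxes(table).items()
        if len(residues) == 1 and "*" not in residues
    )


if __name__ == "__main__":
    stops = sum(aa == "*" for aa in CODON_TABLE.values())
    print("sense codons:", len(CODON_TABLE) - stops, "stop codons:", stops)
    print("family boxes:", family_boxes())
```

Run as a script, this reports 61 sense and 3 stop codons and lists the eight family boxes (AC, CC, CG, CU, GC, GG, GU, UC), matching the count given in the text.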


Protein synthesis is finely balanced between speed and accuracy, with error frequencies in vivo estimated to be as low as 1 × 10⁻⁴ (ref. 9). Ultimately the base complementarity between codon and anticodon determines translational accuracy. However, it was recognized early on that the energetic difference between cognate (no mismatch) and near-cognate (one mismatch) base pairing interactions alone cannot account for the observed error rates.10 This led to the proposal that tRNA selection on the ribosome takes place in two distinct steps which are separated by an irreversible release of energy by GTP hydrolysis (kinetic proofreading).11,12 In contrast to the kinetic proofreading hypothesis, it was also proposed that the ribosome does not merely serve as a passive platform to bring the substrates together, but has an active role and selectively enhances the small energetic differences, leading to more stable interactions for cognate mRNA-tRNA complexes but not for near- or noncognate ones.13 Both of these hypotheses are strongly supported by mutations in ribosomes that enhance or reduce translational fidelity. It is currently accepted that both hypotheses are correct: the ribosome does interact directly with codon-anticodon pairs in a two-step selection process.

Crystallographic studies using the 30S ribosomal subunit in complex with mRNA and cognate or near-cognate tRNA mimics, in the presence or absence of an antibiotic that lowers translational fidelity, shed light on the structural basis for decoding.14,15 These structures in combination with kinetic data16,17 resulted in a detailed mechanistic model of the decoding process. Briefly, when an anticodon interacts with a cognate codon in the 30S A site, several universally conserved bases of 16S rRNA undergo a conformational change to form a network of hydrogen bonds that can only be formed when the first and the second base pairs adopt Watson-Crick geometry. Significantly, these contacts are specific for the geometry rather than the identity of the base pairs (Fig. 2). As a result the 30S subunit undergoes a global conformational change believed to be the signal leading to GTP hydrolysis by EF-Tu and consequent release of the incoming tRNA into the A site (accommodation). This is confirmed by kinetic studies which show that an antibiotic that lowers fidelity accelerates the rate of GTP hydrolysis in the case of near-cognate tRNAs bound to the A-site codon. Crystallographic studies showed that in the presence of near-cognate tRNAs this antibiotic induces a 30S conformation similar to the one adopted when a cognate anticodon is bound. Subsequent to tRNA binding and codon recognition a second selection step takes place in which near-cognate tRNAs have much higher dissociation rates than cognate tRNAs and are therefore less likely to undergo peptide bond formation.

In contrast to the first and second codon-anticodon base pairs, the third base pair is not as closely monitored because the ribosome forms a contact to the third position of only the codon, allowing the wobble base in the tRNA (i.e., the first base of the anticodon, in position 34) more conformational freedom (Fig. 2). The fact that the first and the second codon-anticodon base pairs but not the third are very closely monitored for Watson-Crick geometry is also reflected in the organization of the genetic code. A mismatch in the first or second codon position would almost always result in the incorporation of the wrong amino acid whereas a mismatch in the third position would frequently remain silent.

In addition to the four standard ribonucleotides, RNA molecules can contain a vast number of modified nucleotides.
Pseudouridine (Ψ) was amongst the first modified nucleotides to be discovered and identified.18 When the first sequences of tRNAs became available it was noted that modified nucleotides are also present in tRNA.19 Furthermore, modified nucleotides were also found in and close to the anticodon, suggesting a direct role in multiple codon recognition20 (Fig. 3). To date a large number of modified nucleotides present in tRNAs have been found. In Crick's wobble hypothesis it was predicted that, based on purely geometric considerations, uridine should in principle be able to pair with all 4 bases, and in mitochondria it is known that an unmodified U in the wobble position of the tRNA can decode all 4 bases in a codon.21 Position 34, the wobble base, as well as position 37, the base 3ʹ adjacent to the anticodon, are the two most frequently modified positions and also show the greatest variety in the chemistry of their modifications (Fig. 4). Modified bases in position 34 often enable unorthodox base pairing, and the wobble rules required revision because in many cases they do not agree with the rules originally formulated by Crick (see e.g., refs. 22 and 23 and references therein). Modified adenines have been shown to be almost always present in position 37 for codons starting with either A or U, indicating their importance for accurate protein synthesis.24-26 As the preceding example highlights, the extent to which modified bases affect decoding depends heavily on the context of the tRNA. It has been shown that while some tRNAs function well in un- or hypo-modified form, others absolutely require modifications to fulfill their function.

Figure 2. High resolution crystal structures shed light on the structural basis of decoding. A) Schematic representation of the interaction between ASLPhe (green) with its cognate codon (orange). The base in position 34 of the tRNA (G34, green) forms a wobble base pair to the third base of the codon (U3, orange). B) The universally conserved 16S rRNA base A1493 (grey) forms hydrogen bonds to the first codon-anticodon base pair in a sequence-independent manner. C) The two universally conserved bases G530 and A1492 of 16S rRNA form a network of contacts to the second codon-anticodon base pair. As for the first base pair, these contacts can be made to any base pair as long as it adopts Watson-Crick geometry. D) The third base pair is not as closely monitored and the ribosome only forms contacts to the ribose in the codon (U3). The wobble base in the tRNA (G34) is allowed more conformational freedom so that G•U wobble geometry can be tolerated. A color version of this image is available at www.landesbioscience.com/curie.

It was recognised early on that although C or G are frequently found as wobble nucleotides, unmodified A or U are very rare, suggesting they are possibly lethal for a cell.24 To date, only two examples of cytosolic tRNAs with an unmodified adenine in the wobble position are known: one in Mycoplasma spp. and one isolated from a mutant Salmonella typhimurium strain.27,28 Instead of adenine, inosine derived from adenine by deamination is usually present at position 34. Inosine can base pair to U, but also to C and A, and is therefore exclusively found in family codon boxes in eubacteria and is more prevalent in eukaryotes.23 The interaction of inosine with both adenine and cytosine in the decoding centre of the 30S ribosomal subunit was studied structurally and will be discussed in more detail later.29

Figure 3. Overview of the three-dimensional structure of a tRNA molecule. The terminal adenine of the CCA end forms the tip of the acceptor arm and carries the amino acid esterified to the 3ʹOH of the ribose. The anticodon stem-loop (ASL) is indicated and corresponds to the part of the tRNA that binds to the 30S ribosomal subunit. The uridine in position 33 (U33, blue) is universally conserved in tRNAs and forms a structural motif called the U-turn. The anticodon comprises the bases in position 34 (the wobble base, which is very often modified, purple), position 35 and position 36 (green). These three bases bind the third, second and first base of the codon respectively. In position 37, 3ʹ adjacent to the anticodon, is usually a purine, which is almost always modified (orange). A color version of this image is available at www.landesbioscience.com/curie.

Uridine in position 34 (U34) generally pairs with A or G. However, posttranscriptionally modified uridines are very common and are found both in fully degenerate codon boxes and in split codon boxes. The modifications of U34 have been shown to modulate the decoding capacity of the tRNAs they are found in. Uridines found in position 34 of tRNAs involved in the decoding of split codon boxes have modifications such as 5-methylaminomethyluridine (mnm5U) that allow efficient pairing to A and G but not to U and C (Fig. 1). The interaction of mnm5U base paired to G in the context of the 30S ribosomal subunit was studied structurally and will be discussed in more detail later.30 In contrast, the widespread 5-hydroxyuridine derivatives have an expanded repertoire of base recognition and are found in family codon boxes. One of the most prevalent modifications of this type is uridine 5-oxyacetic acid (cmo5U), which is present in 6 different family codon boxes, specific for leucine, valine, serine, proline, threonine and alanine in eubacteria (Fig. 1). A tRNA with this type of modified uridine is able to read A-, G- and U-ending codons and for at least three of the six codon boxes it has been shown that even the C-ending ones can be read. To gain more insight into the mechanism of cmo5U, crystal structures of the 30S subunit with an ASLValcmo5UAC in complex with the four valine codons were solved and will be discussed later.31

Figure 4. A small selection of RNA base modifications found in the wobble position 34 or position 37 (3ʹ adjacent to the anticodon) of tRNA. Some of these have been studied structurally in the context of the 30S decoding centre. The modifications are highlighted in red. A color version of this image is available at www.landesbioscience.com/curie.

In the following sections, results from crystal structures using the 30S ribosomal subunit in complex with mRNA as well as tRNA mimics will be discussed. These structures together with mutational, biochemical and kinetic data now allow us to draw a more detailed picture of the role of modifications in the decoding process in translation.
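The pairing repertoires quoted in this introduction can be turned into a small lookup that lists the codons a given anticodon is expected to read. This is only an illustrative sketch: `WOBBLE_PARTNERS` and `codons_read` are hypothetical names, the table simply transcribes the rules stated in the text (plus the standard Watson-Crick C•G pair), and it deliberately ignores the context dependence that the chapter stresses. In particular, reading of C-ending codons by cmo5U, reported for only some codon boxes, is left out.

```python
# Codon third-position bases read by selected position-34 (wobble) nucleosides,
# transcribed from the rules quoted in the text; a simplification, not a
# complete set of revised wobble rules.
WOBBLE_PARTNERS = {
    "G": "CU",       # Watson-Crick C plus G•U wobble
    "C": "G",        # Watson-Crick only
    "U": "AG",       # unmodified U generally pairs with A or G
    "I": "UCA",      # inosine pairs with U, C and A
    "mnm5U": "AG",   # efficient pairing to A and G, not U or C
    "cmo5U": "AGU",  # C-ending codons are read only in some boxes (omitted)
}

# Watson-Crick partners used for anticodon positions 35 and 36.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(wobble, pos35, pos36):
    """List codons (5' to 3') read by the anticodon wobble-pos35-pos36.

    Position 36 pairs with the first codon base, position 35 with the
    second, and the wobble nucleoside (position 34) with the third.
    """
    first = WATSON_CRICK[pos36]
    second = WATSON_CRICK[pos35]
    return [first + second + third for third in WOBBLE_PARTNERS[wobble]]


if __name__ == "__main__":
    print(codons_read("I", "C", "G"))      # arginine box: CGU, CGC, CGA
    print(codons_read("mnm5U", "U", "U"))  # lysine codons: AAA, AAG
    print(codons_read("cmo5U", "A", "C"))  # valine codons: GUA, GUG, GUU
```

The three example anticodons correspond to the ASLs whose structures are discussed in the following sections.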

Structural Studies on Inosine

Although inosine had been previously identified as a component of both E. coli32 and yeast33 ribosomes, the first identification of inosine in a ribonucleic acid sequence came in yeast alanine tRNA.19 Inosine, formed by the hydrolytic deamination of adenine in tRNA transcripts,34,35 is structurally most similar to guanine but, lacking the 2-amino group, is unable to form three hydrogen bonds when base paired. However, inosine has the added functionality of being able to form base pairs not only with cytosine, but also with uridine and adenosine. This enhanced repertoire of base pairs over adenosine (U) and guanosine (C and U) proves to be very advantageous in the decoding of family codon boxes, allowing an organism to synthesize fewer tRNAs to decode four codons.

X-ray crystallographic studies of an ASL derived from tRNAArgICG complexed with its cognate codons CGC and CGA and bound in the decoding center of the 30S ribosomal subunit were performed to examine inosine-containing wobble base pairs at atomic resolution.29 The I•C wobble base pair formed exactly as predicted, similar to a G-C pair, but missing the third hydrogen bond between the absent guanosine N2 and the O2 of cytosine. The structure of the I•A base pair showed that the bases were in the standard Ianti•Aanti conformation.

The two proposed conformations for the I•A base pair were Isyn•Aanti and Ianti•Aanti. The Isyn•Aanti conformation was proposed based on several lines of evidence: this geometry had been observed in several crystal structures of adenosine-inosine pairs,36-38 rotation of the inosine relative to its ribose is relatively unrestrained,39 and it was hypothesized that the decoding center would favour a minimization of geometric strain and stabilize rare base tautomers.40 It is important to note that proposals for the Isyn•Aanti geometry were made in the absence of any structural data for the decoding center, which would doubtlessly have affected the hypotheses. Crick proposed the Ianti•Aanti geometry based upon model building and pointed out that it would represent the widest possible base pair to occur at the wobble position.7 The crystal structure makes it very clear why the Ianti•Aanti geometry prevails: in the wobble position, only the codon base (A) is fixed in place. The I is relatively free to move and the wobble base pair is not monitored for Watson-Crick pairing as the first two positions of the codon-anticodon minihelix are. In addition, changes of the torsion angles in the sugar-phosphate backbone of the anticodon base allow accommodation of a purine-purine base pair with only small changes in the overall width.

N6-Threonylcarbamoyladenosine 37

N6-Threonylcarbamoyladenosine (t6A) is a member of a family of related modifications to purines 3ʹ to the anticodon (e.g., N6-methyl-N6-threonylcarbamoyladenosine (mt6A37) and 2-methylthio-N6-threonylcarbamoyladenosine (ms2t6A37)) (Fig. 4). These modifications are prevalent in all kingdoms of life in various tRNAs.41 Examples include all cytoplasmic tRNAMeti (initiator tRNA) in eukaryotes but not prokaryotes42 and, especially relevant to the current example, all lysine tRNAs.43

Effects of modified bases at position 37 were observed in temperature jump experiments which measured self-complementary anticodon-anticodon pairing in solution. The binding affinities of the tRNAs were approximately 6 orders of magnitude higher than expected for simple hydrogen bonding due to base pairing. One of the factors cited for this discrepancy was stacking of modified purines 3ʹ to the anticodon with the ASL-ASL minihelix.44 In fact, earlier model building provided for special stacking of alkylated purines in position 37.45

Biochemical effects of t6A37 were observed in assays of codon-anticodon binding on the ribosome: unmodified ASLLysUUU is unable to bind to its cognate codon AAA, but when the ASLLysUUU carries the t6A37 modification binding is restored.46 Importantly, this singly modified ASL is still not competent to bind to its cognate codon AAG or undergo A- to P-site translocation,47,48 which will be discussed in the following section.

Nuclear magnetic resonance (NMR) and thermodynamic studies have identified very precisely how the t6A37 modification affects the tRNA. It has been shown that the stem of the ASL is slightly destabilized relative to the unmodified form, but the loop is much more ordered.42,49 The primary effect of the modification on the free tRNA is the abrogation of a U33-A37 intraloop base pair, which allows the ASL to form a canonical U-turn (Figs. 3,5) and is presumably the source of ASL destabilization as well.49

The atomic structure of an ASLLysUUU-t6A37 bound to its cognate codon AAA in the A site of the 30S ribosomal subunit gives a clear understanding of how the t6A37 modification influences codon-anticodon binding.30 The abrogation of the intraloop U33-A37 base pair is implicit in the chemical modification of adenosine N6, the hydrogen bond donor for base pairing with U33, and the X-ray structure gives no new information on this point. What the crystal structures do show is that the t6A modification is so bulky that it positions the base differently from the unmodified base, changing its stacking. Additionally, the ureido portion of the modification forms a remarkably planar structure which is coplanar with the adenine rings, stabilized by a hydrogen bond from N11 of the modification to N1 of the adenine base (Fig. 5). This pseudo-tricyclic conformation of the modified base and concomitant charge delocalization in the ureido moiety was previously observed50 and has subsequently been observed by NMR.42 The combination of steric positioning and tricyclic base creates a cross-strand stack such that A38 of the ASL is strongly stacked with t6A37, which then stacks with A1 of the codon (the 1st position of the codon) (Fig. 5). Cross-strand stacks have been observed to compensate for loss of stacking due to shearing of bases or non-Watson-Crick base pairing,51 but in the case of tRNALysUUU it is the poor enthalpy of binding and poor stacking in the UUU ASL which is compensated for by the cross-strand stack. It is expected that this will be a common mode of action of the modified purines at position 37.

Figure 5. t6A37 is involved in cross-strand stacking with bases in the anticodon stem and the codon. A) t6A37 (black, foreground) stacks with A1 of the A1-U36 base pair (grey, background). The dashed lines indicate hydrogen bonding of the base pair and between the modification and the adenine. B) t6A37 (black, background) stacks with A38 of the anticodon stem (grey, foreground). Note the extensive use of the ureido moiety for stacking with A38 in (B) and the bulkiness of the threonyl moiety of the modification.

The importance of context to base modification cannot be overemphasized. t6A is needed for ASLLysUUU due to the poor stacking within its UUU anticodon. Modifications are used to ‘tweak’ the thermodynamic properties of the decoding system so that the ribosome can recognize a diverse set of codon-anticodon interactions as cognate (or not).
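To put the roughly six orders of magnitude reported for anticodon-anticodon pairing on an energy scale, the discrepancy can be converted into a free energy difference. The following worked estimate is a sketch under stated assumptions: "binding affinity" is read as an association constant, the symbols $K_{\mathrm{obs}}$ and $K_{\mathrm{exp}}$ (observed and expected constants) are introduced here for illustration, and a temperature of 298 K is assumed because the experimental temperature is not given in the text.

```latex
\begin{align*}
\Delta\Delta G^{\circ}
  &= -RT \ln\frac{K_{\mathrm{obs}}}{K_{\mathrm{exp}}}
   = -RT \ln 10^{6} \\
  &\approx -\left(1.987\times10^{-3}\ \mathrm{kcal\,mol^{-1}\,K^{-1}}\right)
           \left(298\ \mathrm{K}\right)\left(13.82\right) \\
  &\approx -8.2\ \mathrm{kcal\,mol^{-1}}
   \quad\left(\approx -34\ \mathrm{kJ\,mol^{-1}}\right)
\end{align*}
```

Under these assumptions the extra stabilization is on the order of 8 kcal/mol, which gives a sense of how much contributions other than base-pair hydrogen bonding, such as the stacking of modified purines at position 37, would need to supply.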

Structural Studies on 5-Methylaminomethyluridine 34

5-Methylaminomethyluridine is again a member of a much larger family of similar modifications (e.g., 5-methylaminomethyl-2-thiouridine (mnm5s2U) and 5-methylaminomethyl-2-selenouridine (mnm5se2U)), which spans all the kingdoms of life. These types of modifications are generally referred to as xm5U (Fig. 4). It is critical to understand that in eubacteria all tri-pyrimidine anticodons (lysine, arginine, glutamic acid, glycine) bear either mnm5U34 or mnm5s2U34. Parsing this even more finely, one observes that the mnm5s2 modification occurs only for the two tRNAs which contain a U35, while only the mnm5 modification occurs for the two tRNAs which contain a C35. As has been previously noted, anticodon sequence context is central to understanding the function of tRNA modifications.

The case discussed here is tRNALysUUU-mnm5s2U34-t6A37, the only tRNA in the eubacterial cell decoding the two codons AAA and AAG. Ribosome binding assays showed that the t6A37 modification is sufficient and necessary for binding of the AAA codon, but both t6A37 and mnm5s2U34 are necessary for binding of the AAG codon52 and both are necessary for proper translocation from the A to the P site.47,48 The crystal structure of an ASLLysUUU-mnm5U34-t6A37 (doubly-modified ASL) bound to its cognate codon AAG in the A site of the 30S ribosomal subunit permits a direct comparison with the structure lacking mnm5U34 bound to AAA (the singly-modified ASL discussed in the previous section), allowing fine details to be observed.30 It is important to note that while the biochemical studies were carried out on the native mnm5s2U, structural studies were completed using the nonthiolated mnm5U. Although the effects of this change should be quite small, they could be unpredictable. Although the modification itself is not observable in the electron density, the effects of the modification are readily apparent (and its presence was confirmed by mass spectrometry).

While the A-U pair is in normal Watson-Crick geometry, the G•mnm5U pair is not in the expected ‘G•U wobble’ geometry7,23 or other predicted alternatives,53 but instead takes up a structure that appears to balance the effects of stacking and hydrogen bonding, both of which are affected by the mnm5U modification (Fig. 6). Interestingly, the unusual position of mnm5U34 changes the hydrogen bonding of the base pair, with a bifurcated hydrogen bond observed between O2 of U34 and N1 and N2 of G3 instead of the normal two hydrogen bonds. The energy loss in hydrogen bonding seems to be made up in extra stacking energy, as the mnm5U34 is positioned such that the O2 carbonyl lies directly over the ring of U35, the optimal placement for stacking stabilization.54 This arrangement fully explains the unusual geometry of the base pair: normal G•U wobble geometry would totally unstack U34 from U35, which is unacceptable in the context of the codon-anticodon minihelix. An alternate geometry must be arrived at and this is accomplished by tweaking the electronic properties of the ring through modification by mnm5 and s2. A similar observation has been made more recently for another xm5U-type modified uridine, 5-taurinomethyluridine (τm5U), which is present in mitochondrial tRNAs and is responsible for decoding leucine and tryptophan.55 For a more extensive discussion on the stacking of G•U base pairs, please see the later section.

Figure 6. The mnm5U34•G3 base pair (black, foreground) takes a conformation which maintains some stacking on the neighbouring U35-A2 base pair (grey, background). Hydrogen bonds for both base pairs are marked as dashed lines. Were the mnm5U34•G3 base pair a standard G•U wobble geometry, the U would move toward the bottom of the figure, totally unstacking from U35. In its observed conformation, O2 (which is involved in 2 hydrogen bonds) is positioned directly over the ring of U35, the optimal position for stacking interactions.

This model also fits well with several other observed modifications for the same codons. mnm5U is replaced by mcm5U (5-methylcarboxymethyluridine) in eukaryotes and the lack of direct hydrogen bonding in the observed structure explains how such dissimilar modifications can accomplish the same task. The presence of s2U for U35-containing anticodons and its absence in C35-containing anticodons fits well with the positioning of this moiety for stacking: sulphur should stack slightly more strongly and U is well known to be poor in stacking, so the better-stacking C35 anticodon does not require the sulphur.
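The position-35 rule described in this section (mnm5s2U34 with U35, mnm5U34 with C35) is simple enough to state as a lookup. The sketch below is purely illustrative: the function name `wobble_modification` is hypothetical, and the assignment of the four tri-pyrimidine anticodons to amino acids is derived here from the standard genetic code rather than taken from the chapter.

```python
# Eubacterial tri-pyrimidine anticodons (positions 34-35-36, written 5' to 3'
# with the unmodified parent base U at position 34) and the amino acids they
# serve, derived from the standard genetic code.
TRI_PYRIMIDINE_ANTICODONS = {
    "UUU": "Lys",  # reads AAA, AAG
    "UUC": "Glu",  # reads GAA, GAG
    "UCU": "Arg",  # reads AGA, AGG
    "UCC": "Gly",  # reads GGA, GGG
}


def wobble_modification(anticodon):
    """Return the xm5U-type modification expected at position 34.

    Encodes only the rule quoted in the text: U35 goes with the thiolated
    mnm5s2U34, C35 with the nonthiolated mnm5U34.
    """
    if anticodon not in TRI_PYRIMIDINE_ANTICODONS:
        raise ValueError(f"not a U34 tri-pyrimidine anticodon: {anticodon!r}")
    return "mnm5s2U" if anticodon[1] == "U" else "mnm5U"


if __name__ == "__main__":
    for anticodon, residue in TRI_PYRIMIDINE_ANTICODONS.items():
        print(residue, anticodon, wobble_modification(anticodon))
```

The split follows the stacking argument given above: the poorly stacking U35 anticodons carry the additional sulphur, whereas the C35 anticodons do not.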

Structural Studies on cmo5U and m6A

In 1970 cmo5U was discovered as a new constituent of tRNAVal in E. coli.56 This modification or its derivatives (mo5U or mcmo5U, Fig. 4) are exclusively found in the wobble position of tRNAs.43 Originally it was proposed that tRNAs with these types of modified uridines in the wobble position are able to read codons ending in A, G or U and are therefore exclusively found in family codon boxes.56 In accordance with this it is also important to note that, besides a tRNA that has cmo5U (or one of its derivatives) in the wobble position (34) and generally decodes NNA or NNG (where N can be any nucleotide), there is, with one known exception,57 always at least one other tRNA isoacceptor with a G in position 34 present to decode NNC or NNU. Given the presence of these two isoacceptors the role of the modified uridine is not apparent, because according to the wobble rules one would expect that an unmodified uridine is able to read A- as well as G-ending codons. However, early in vitro studies and more recently studies on mutant Salmonella enterica (S. enterica) strains indicated that a tRNA isoacceptor with cmo5U at the wobble position (cmo5U34) is sufficient for reading all four codons in certain family codon boxes and is always able to read the A-, G- and U-ending codons.58-61 Nevertheless, the presence of at least one other isoaccepting tRNA with G in the wobble position 34 suggests the main role of the modification is to read the codons ending with A and G (A3 and G3). The studies on mutant S. enterica strains seem to confirm this because, unexpectedly, the effect of hypomodified U34 was most drastic for the G-ending codons but was not as pronounced for the ones having a uridine in the third position (U3).61

Recently crystal structures were solved of ASLs of tRNAValcmo5UAC (ASLValcmo5UAC) from E. coli bound to the four cognate valine codons in the context of the 30S ribosomal subunit (Fig. 7).31 These structures together with mutational and recent kinetic data allow us to propose a mechanism for the action of 5-hydroxyuridine derivatives in general.61,62 As described before for inosine and mnm5U, the crystal structures made it possible to visualize the codon-anticodon interaction along with the modification in the decoding centre of the ribosome. An interesting feature that was observed in all 4 structures is a hydrogen bond between the 2ʹOH of U33 and the ether oxygen of the modification (O5). This contact seems to lock the modified uridine in its position and presumably allows only limited lateral freedom. It is plausible to assume that this contact also plays a role in solution prior to tRNA binding to the A site of the ribosome. Therefore, one would expect that this hydrogen bond pre-orders the anticodon loop and should have a favourable entropic effect on binding. It is important to note that this contact can be formed in all of the derivatives of cmo5U because all of them have an oxygen attached to the C5 position of the base.

Figure 7. Conformation of the 4 wobble base pairs as seen in the 4 crystal structures. A) The cmo5U-A base pair shows no obvious role for the modification. Nevertheless it is well ordered and points towards the mRNA. B) The cmo5U•G base pair surprisingly resembles a Watson-Crick base pair. This requires a shift in the keto-enol equilibrium that is presumably further stabilized by the inductive effect of G. C) The cmo5U•U base pair shows only one strong hydrogen bond (dashed line) and the low pKa of the carboxyl group suggests no additional hydrogen bond to the keto oxygen of the U. D) The cmo5U•C base pair also forms only one strong hydrogen bond. In addition the stacking is less favourable than in the cmo5U•U base pair, which could explain the lower efficiency in decoding C-ending codons by cmo5U.


Recently the influence of modified bases on pre-ordering the anticodon loop of ASLValcmo5UAC has also been verified experimentally.63 Perhaps unsurprisingly, no obvious role for the modification itself in decoding A-ending codons was observed. This agrees with A-site binding studies showing only moderate effects when comparing unmodified with modified tRNAs.64 The slightly lower efficiency of unmodified versus modified tRNAs in reading codons ending with adenine could reflect the influence of the aforementioned contact between U33 and the modification, which is absent in unmodified uridines.

Unexpectedly, the cmo5U34•G3 base pair also resembles Watson-Crick rather than wobble geometry. This requires that either G or the modified uridine adopt its rare enol form. NMR, IR and crystallographic studies suggested that only about one in 10^4 bases adopts its enol form at any given time under physiological conditions (see ref. 54 and references therein). This suggested a role for the modification in shifting the keto-enol equilibrium. High-resolution studies on 5-methoxyuridine and 5-hydroxyuridine (mo5U and ho5U, respectively) seem to confirm this idea.65 Both crystal structures of the isolated nucleosides showed a change in bond length that indicates a shift in the keto-enol equilibrium induced by the electron-donating capacity of the substituent at the 5-position. No experimental data are available to show the extent of this shift, but the high-resolution crystal structures indicated that it depends on the nature of the modification, because the differences in bond lengths relative to an unmodified uridine were more pronounced in mo5U than in ho5U.

This immediately leads to the question of why a base pair resembling Watson-Crick geometry at the end of the codon-anticodon helix is advantageous compared to one having wobble geometry. It is a particularly important question because the opposite, a G in the wobble position decoding U-ending codons, is possible and does not require the involvement of a modified base. In the decoding centre of the ribosome, the third position of the codon is held in place by hydrogen bonds to ribosomal RNA such that it is compliant with RNA A-form helix geometry. This in turn means that the wobble base in the tRNA has to change its conformation in order to allow base pair geometries that deviate from the canonical one, which is what has been observed in high-resolution crystal structures.14 However, if a U in the wobble position has to decode a G in the third position of the codon, the altered geometry of this base pair is expected to result in an unfavourable stacking interaction with the preceding base pair of the codon-anticodon helix (Fig. 8). Whereas Watson-Crick base

Figure 8. A) Stacking interaction between the cmo5U•G base pair as observed in the crystal structure (black, foreground) with the preceding base pair (A35-U2, grey, background). B) The position of an unmodified uridine for a modelled U•G wobble base pair is also shown (black, dashed lines). Due to contacts with 16S rRNA, the third base of the codon is restrained and the unmodified uridine would be required to move towards the major groove, losing its stacking overlap. Furthermore, the six codon boxes that are decoded by tRNAs with cmo5U or derivatives all have a pyrimidine-purine base pair with the pyrimidine being in the second position of the codon (U2, grey, background). This results in a particularly unfavourable stacking between G3 and the pyrimidine.


pairs are isosteric, G•U wobble base pairs are not. This nonisostericity of G•U pairs is also reflected in their asymmetric stacking interactions with their nearest neighbours in an RNA helix. Mizuno and Sundaralingam noticed early on that 5ʹ-G•U-3ʹ base pairs appear more frequently at the ends of RNA helices than 5ʹ-U•G-3ʹ pairs, which they attributed to the differences in stacking.66 Statistics on rRNA confirm the preference for stacked 5ʹ-G•U-3ʹ pairs at the ends of helices.67,68 Thermodynamically, a helix ending with this type of wobble pair also seems more stable than one ending with 5ʹ-U•G-3ʹ.69,70 Furthermore, in the six codon boxes where cmo5U or its derivatives are involved in decoding, the situation is particularly unfavourable for a terminal 5ʹ-U•G-3ʹ base pair because all these codons have a pyrimidine in position 2. This results in a particularly unfavourable stacking interaction between the guanine in the third position and the pyrimidine in position 2.67

The modified base cmo5U is not present in eukaryotes. In the yeast Saccharomyces cerevisiae, 5-carbamoylmethyluridine (ncm5U) is used as the wobble base in tRNAs with a distribution similar to that of cmo5U, except for the leucine codon box.71 As for cmo5U, this modification is required in particular to read G-ending codons and, depending on context, can in certain cases also read all four codons of a degenerate codon box.71 However, because the modified bases differ in nature, the molecular mechanism underlying the function of ncm5U must differ from that of cmo5U and remains to be shown.

To summarize this part, in eubacteria six of the eight family codon boxes require tRNAs with cmo5U or its derivatives in the wobble position 34. The current evidence suggests that these modified bases are mainly required to decode codons ending in G. The modification shifts the keto-enol equilibrium of the base and allows the enol form of the U to base pair with G. This results in the formation of a 5ʹ-U•G-3ʹ base pair at the end of the codon-anticodon helix that resembles a canonical Watson-Crick base pair, which overcomes the otherwise unfavourable stacking interaction with the preceding base pair.

Surprisingly, the crystal structures of ASLValcmo5UAC bound to the GUU and GUC codons showed that in both pyrimidine•pyrimidine base pairs the cmo5U forms only one strong hydrogen bond. The ribose does not adopt the C2ʹ-endo conformation seen in NMR studies of the isolated nucleotide and, owing to intramolecular contacts, also cannot move close enough to form stronger interactions as seen before in crystal structures of short RNA helices having U•U and C•U base pairs.72-74 Interestingly, in the crystal structures the stacking interaction of the cmo5U•U base pair with the preceding base pair is more favourable than that of the cmo5U•C base pair. This provides a possible explanation for the observation that C-ending codons are decoded less efficiently than U-ending ones by a tRNA with cmo5U in the wobble position.

As mentioned previously, like the wobble base, the universal purine in position 37, 3ʹ adjacent to the anticodon (Fig. 3), is very often modified. This, together with the correlation between the sequence 3ʹ to the anticodon and the anticodon itself, led Yarus and coworkers to propose the idea of an extended anticodon.75 The ASL of tRNAValcmo5UAC from E. coli used to study the role of cmo5U also contains the companion modification N6-methyladenosine (m6A) in position 37 (Fig. 4). Modified bases in position 37 have been implicated in preventing the formation of intra-loop base pairs, thereby keeping the anticodon loop in an open conformation.76 This suggestion seems to be confirmed by the high frequency of modified bases in anticodon loops that otherwise would have the potential to form one or two base pairs within the anticodon loop. However, as for t6A, an interesting alternative is that the modification of the purine in position 37 increases the area of stacking of this base with the first codon-anticodon base pair. The crystal structures of ASLValcmo5UAC showed that the modified adenine stacks on top of both bases in the codon and anticodon. An increased stacking interaction should have a positive effect on the stability of the codon-anticodon interaction.
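The statement that six of the eight family codon boxes share a pyrimidine in codon position 2 can be checked against the standard genetic code. The following Python sketch makes two assumptions that are not spelled out in the text: a "family box" is taken to mean four codons that share their first two bases and encode the same amino acid, and the standard genetic code is used. The function name `family_boxes` is hypothetical. The code only counts boxes by their second base; it does not establish which boxes are actually decoded by cmo5U-containing tRNAs.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa
        for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def family_boxes():
    """Return {first_two_bases: amino_acid} for boxes whose four codons agree."""
    boxes = {}
    for b1, b2 in product(BASES, repeat=2):
        residues = {CODE[b1 + b2 + b3] for b3 in BASES}
        if len(residues) == 1 and "*" not in residues:
            boxes[b1 + b2] = residues.pop()
    return boxes

boxes = family_boxes()
# Keep only boxes with a pyrimidine (U or C) in codon position 2.
pyrimidine_second = {box: aa for box, aa in boxes.items() if box[1] in "UC"}

print(len(boxes), len(pyrimidine_second))   # 8 6
print(sorted(pyrimidine_second.values()))   # ['A', 'L', 'P', 'S', 'T', 'V']
```

The two boxes that drop out, CGN (arginine) and GGN (glycine), both have a purine in position 2.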

Conclusions and Future Prospects

Recent crystallographic and kinetic data have provided a detailed description of decoding in protein synthesis. Selection of a tRNA can roughly be divided into two stages. During initial selection, the ribosome determines whether a given mRNA-tRNA pairing is cognate based on the formation of Watson-Crick base pairs in the first two positions of the codon-anticodon minihelix. At this point noncognate tRNAs are completely rejected, while near-cognate tRNAs are strongly discriminated against, with higher off rates and lower rates of GTPase activation compared to cognate tRNAs (approximately 10-fold). This results in at least a 10-fold increase in the GTP hydrolysis rate of EF-Tu for cognate compared to near-cognate tRNAs. The GTP hydrolysis rate for noncognate tRNAs is about 100,000-fold lower. Following GTP hydrolysis by EF-Tu, a second selection step takes place, in which the rate of tRNA dissociation is about 100-fold higher for near-cognate than for cognate tRNAs (for a more detailed discussion see ref. 5). During this second selection step, the overall energy of binding of the tRNA to the codon is a critical component.

If there is a non-Watson-Crick base pair in either the first or second position, the ribosome cannot make the transition to its closed conformation and the tRNA is rejected during the first selection step. If the binding energy of a cognate tRNA to the codon is insufficient, the dissociation rate is increased and, despite being cognate, the tRNA has a higher probability of being rejected during the second selection step. The converse case is also damaging: if the energy of binding is too great, translocation is impaired, the tRNA will not exit properly when its duty is completed and translation will slow. All three of these cases are severely detrimental to the viability of the host organism. RNA modifications are used to manipulate this delicate balance by tuning the energy of binding of the tRNA to a codon presented in the ribosomal A site.

Some modifications are employed in family codon boxes, in which tRNAs with modified bases like inosine or the 5-hydroxyuridine derivatives in their wobble position can read three and sometimes even four codons. Other modifications such as 5-methylaminomethyluridine appear in split codon boxes and have evolved to enable binding of anticodons to their cognate codons only. Although no modifications are observed in the first two positions of the anticodon (36 and 35), presumably because this would influence the ribosome’s monitoring of these positions, the base 3ʹ to the anticodon (37) is frequently modified. Modifications at this position are again used to tune the energy of binding, by abrogation of intra-loop hydrogen bonding or alteration of minihelix base stacking, as presented for N6-threonylcarbamoyladenosine.

So far the available evidence suggests that the primary role of RNA modifications that are involved in decoding is to promote efficient protein synthesis, often by facilitating the binding of tRNAs to their respective cognate codons. This can be achieved by directly influencing the energetics of base pairing at the wobble position, by influencing the structure of the anticodon stem loop prior to binding to the A site, or by a combination of the two.

All the structures of ASL-codon complexes discussed here are hypothesized to represent the fully accommodated state, after GTP hydrolysis has taken place and the tRNA in question has either been rejected or accepted to take part in the peptidyl transferase reaction. This limitation is inherent in crystal structures and constrains their interpretation to static states that have reached equilibrium in the decoding center. Interpretation of crystal structures, however, does not take place in vacuo, and great attention has been paid to the wealth of other information available about these RNA modifications and their functions. There do remain gaps in our knowledge and these deserve pointing out.

Herein tRNA base modifications have been discussed that act by pre-ordering the ASL, modifying the electronic properties of the bases and altering stacking potential. There are other potential modes of action; we will suggest a few. Direct participation in base pair hydrogen bonding is the most obvious role for base modifications that has not yet been observed. This would seem the simplest type of modification to arrive at evolutionarily, so it may be that such a dominant modification is difficult to accommodate in the multiple stages of decoding, preventing the widespread adoption of this strategy. Modifications could also function in ways counter to what


has been observed to date. Just as cmo5U pre-orders the ASL, a modification might instead increase loop disorder, changing the thermodynamics of ASL-codon binding substantially. Equally likely are modifications that reduce base stacking. Just as modifications improve the binding of tRNAs to their codons, they may also weaken the binding of strongly binding tRNAs to flatten out the variations in codon-anticodon binding affinities.

It is an attractive hypothesis that modified bases that play a direct role in decoding evolved to allow a cell to reduce the actual number of tRNAs. However, given the currently available data this seems somewhat less likely. Recently this was confirmed by analyzing the available genome sequences of mollicutes, which are unicellular parasitic eubacterial species. This work demonstrated that, despite substantial reduction of their genomes and of the number of tRNAs, only a few RNA modifications seem to be absolutely essential for protein synthesis to take place.77 It is clear that we are only beginning to understand the role of some of these RNA modifications. In order to understand decoding in full, many more studies of the types that have already proved so useful will be necessary.

Acknowledgements

The authors thank V. Ramakrishnan for help and advice and for critically reading the manuscript. The authors were funded by the Austrian Academy of Sciences and the Medical Research Council (AW) and the National Institutes of Health (FVM).

References

1. Ban N, Nissen P, Hansen J et al. The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 2000; 289:905-20.
2. Wimberly BT et al. Structure of the 30S ribosomal subunit. Nature 2000; 407:327-39.
3. Schluenzen F et al. Structure of functionally activated small ribosomal subunit at 3.3 angstroms resolution. Cell 2000; 102:615-23.
4. Selmer M et al. Structure of the 70S ribosome complexed with mRNA and tRNA. Science 2006; 313:1935-42.
5. Rodnina MV, Wintermeyer W. Fidelity of aminoacyl-tRNA selection on the ribosome: kinetic and structural mechanisms. Annu Rev Biochem 2001; 70:415-35.
6. Grosjean H, Marck C, de Crecy-Lagard V. The various strategies of codon decoding in organisms of the three domains of life: evolutionary implications. Nucleic Acids Symp Ser (Oxf) 2007; 15-6.
7. Crick FH. Codon-anticodon pairing: the wobble hypothesis. J Mol Biol 1966; 19:548-55.
8. Crick FH. The origin of the genetic code. J Mol Biol 1968; 38:367-79.
9. Edelmann P, Gallant J. Mistranslation in E. coli. Cell 1977; 10:131-7.
10. Eigen M, de Maeyer L. Chemical means of information storage and readout in biological systems. Naturwissenschaften 1966; 53:50-7.
11. Hopfield JJ. Kinetic proofreading: a new mechanism for reducing errors in biosynthetic processes requiring high specificity. Proc Natl Acad Sci USA 1974; 71:4135-9.
12. Ninio J. A semi-quantitative treatment of missense and nonsense suppression in the strA and ram ribosomal mutants of Escherichia coli. Evaluation of some molecular parameters of translation in vivo. J Mol Biol 1974; 84:297-313.
13. Potapov AP. A stereospecific mechanism for the aminoacyl-tRNA selection at the ribosome. FEBS Lett 1982; 146:5-8.
14. Ogle JM et al. Recognition of cognate transfer RNA by the 30S ribosomal subunit. Science 2001; 292:897-902.
15. Ogle JM, Murphy FV, Tarry MJ et al. Selection of tRNA by the ribosome requires a transition from an open to a closed form. Cell 2002; 111:721-32.
16. Pape T, Wintermeyer W, Rodnina M. Induced fit in initial selection and proofreading of aminoacyl-tRNA on the ribosome. EMBO J 1999; 18:3800-7.
17. Pape T, Wintermeyer W, Rodnina MV. Conformational switch in the decoding region of 16S rRNA during aminoacyl-tRNA selection on the ribosome. Nat Struct Biol 2000; 7:104-7.
18. Cohn WE. 5-Ribosyl uracil, a carbon-carbon ribofuranosyl nucleoside in ribonucleic acids. Biochim Biophys Acta 1959; 32:569-71.
19. Holley RW, Everett GA, Madison JT et al. Nucleotide Sequences in the Yeast Alanine Transfer Ribonucleic Acid. Journal of Biological Chemistry 1965; 240:2122-2128.
20. Soll D, RajBhandary UL. Studies on polynucleotides. LXXVI. Specificity of transfer RNA for codon recognition as studied by amino acid incorporation. J Mol Biol 1967; 29:113-24.


21. Barrell BG et al. Different pattern of codon recognition by mammalian mitochondrial tRNAs. Proc Natl Acad Sci USA 1980; 77:3164-6.
22. Agris PF, Vendeix FA, Graham WD. tRNA’s wobble decoding of the genome: 40 years of modification. J Mol Biol 2007; 366:1-13.
23. Yokoyama S, Nishimura S. Modified nucleosides and codon recognition. In: Söll D, RajBhandary UL, eds. tRNA: Structure, biosynthesis and function. Washington, DC: American Society for Microbiology Press, 1995; 207-23.
24. Nishimura S. Minor components in transfer RNA: their characterization, location and function. Prog Nucleic Acid Res Mol Biol 1972; 12:49-85.
25. Nishimura S, Yamada Y, Ishikura H. The presence of 2-methylthio-N6-(delta-2-isopentenyl)adenosine in serine and phenylalanine transfer RNA’s from Escherichia coli. Biochim Biophys Acta 1969; 179:517-20.
26. Yamada Y, Nishimura S, Ishikura H. The presence of 2-methylthio-N6-(delta-2-isopentenyl)adenosine in leucine, tryptophan and cysteine tRNA’s from Escherichia coli. Biochim Biophys Acta 1971; 247:170-4.
27. Andachi Y, Yamao F, Iwami M et al. Occurrence of unmodified adenine and uracil at the first position of anticodon in threonine tRNAs in Mycoplasma capricolum. Proc Natl Acad Sci USA 1987; 84:7398-402.
28. Chen P, Qian Q, Zhang S et al. A cytosolic tRNA with an unmodified adenosine in the wobble position reads a codon ending with the noncomplementary nucleoside cytidine. J Mol Biol 2002; 317:481-92.
29. Murphy FVt, Ramakrishnan V. Structure of a purine-purine wobble base pair in the decoding center of the ribosome. Nat Struct Mol Biol 2004; 11:1251-2.
30. Murphy FVt, Ramakrishnan V, Malkiewicz AJ et al. The role of modifications in codon discrimination by tRNALysUUU. Nature Structural and Molecular Biology 2004; 11:1186-1191.
31. Weixlbaumer A et al. Mechanism for expanding the decoding capacity of transfer RNAs by modification of uridines. Nat Struct Mol Biol 2007; 14:498-502.
32. Szafranski P, Lane BG. Biochimica et Biophysica Acta 1962; 61:141-.
33. Hall RH. Isolation of 1-methylinosine and inosine from yeast soluble ribonucleic acid. Biochemical and Biophysical Research Communications 1963; 13:394-398.
34. Auxilien S, Crain PF, Trewyn RW et al. Mechanism, specificity and general properties of the yeast enzyme catalysing the formation of inosine 34 in the anticodon of transfer RNA. J Mol Biol 1996; 262:437-58.
35. Gerber AP, Keller W. An adenosine deaminase that generates inosine at the wobble position of tRNAs. Science 1999; 286:1146-1149.
36. Carter RJ, Baeyens KJ, SantaLucia J et al. The crystal structure of an RNA oligomer incorporating tandem adenosine-inosine mismatches. Nucleic Acids Research 1997; 25:4117-4122.
37. Leonard GA, Booth ED, Hunter WN et al. The conformational variability of an adenosine.inosine base-pair in a synthetic DNA dodecamer. Nucleic Acids Research 1992; 20:4753-4759.
38. Subramanian E, Madden JJ, Bugg CE. A syn conformation for inosine, the wobble nucleoside in some tRNA’s. Biochem Biophys Res Commun 1973; 50:691-696.
39. Haschemeyer AE, Rich AJ. Nucleoside conformations: an analysis of steric barriers to rotation about the glycosidic bond. Journal of Molecular Biology 1967; 27:369-384.
40. Topal MD, Fresco JR. Base pairing and fidelity in codon-anticodon interaction. Nature 1976; 263:289-293.
41. Crain PF, Rozenski J, McCloskey JA. The RNA modification database. Salt Lake City, UT, 2008.
42. Lescrinier E et al. The naturally occurring N6-threonyl adenine in anticodon loop of Schizosaccharomyces pombe tRNAi causes formation of a unique U-turn motif. Nucleic Acids Research 2006; 34:2878-2886.
43. Sprinzl M, Horn C, Brown M et al. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res 1998; 26:148-53.
44. Grosjean H, Söll DG, Crothers DM. Studies of the complex between transfer RNAs with complementary anticodons. Journal of Molecular Biology 1976; 103:499-519.
45. Fuller W, Hodgson A. Conformation of the anticodon loop in tRNA. Nature 1967; 215:817-821.
46. Yarian C et al. Modified nucleoside dependent Watson-Crick and wobble codon binding by tRNALysUUU species. Biochemistry 2000; 39:13390-13395.
47. Phelps SS, Jerenic O, Joseph S. Universally conserved interactions between the ribosome and the anticodon stem-loop of A site tRNA important for translocation. Molecular Cell 2002; 10:799-807.
48. Phelps SS, Malkiewicz AJ, Agris PF et al. Modified nucleotides in tRNA(Lys) and tRNA(Val) are important for translocation. Journal of Molecular Biology 2004; 338:439-444.
49. Stuart JW et al. Functional anticodon architecture of human tRNALys3 includes disruption of intraloop hydrogen bonding by the naturally occurring amino acid modification, t6A. Biochemistry 2000; 39:13396-13404.


50. Parthasarathy R, Ohrt JM, Chheda GB. Modified nucleosides and conformation of anticodon loops: crystal structure of t6A and g6A. Biochemistry 1977; 16:4999-5008.
51. Chou SH, Tseng YY. Cross-strand purine-pyrimidine stack and sheared purine.pyrimidine pairing in the human HIV-1 reverse transcriptase inhibitors. Journal of Molecular Biology 1999; 285:41-48.
52. Yarian C et al. Accurate translation of the genetic code depends on tRNA modified nucleosides. Journal of Biological Chemistry 2002; 277:16391-16395.
53. Takai K, Yokoyama S. Roles of 5-substituents of tRNA wobble uridines in the recognition of purine-ending codons. Nucleic Acids Research 2003; 31:6383-6391.
54. Saenger W. Principles of Nucleic Acid Structure. New York: Springer, 1984.
55. Kurata S et al. Modified uridines with C5-methylene substituents at the first position of the tRNA anticodon stabilize U·G wobble pairing during decoding. J Biol Chem 2008; 283:18801-11.
56. Murao K, Saneyoshi M, Harada F et al. Uridin-5-oxy acetic acid: a new minor constituent from E. coli valine transfer RNA I. Biochem Biophys Res Commun 1970; 38:657-62.
57. Yamada Y, Matsugi J, Ishikura H et al. Bacillus subtilis tRNA(Pro) with the anticodon mo5UGG can recognize the codon CCC. Biochim Biophys Acta 2005; 1728:143-9.
58. Mitra SK et al. Relative efficiency of anticodons in reading the valine codons during protein synthesis in vitro. J Biol Chem 1979; 254:6397-401.
59. Mitra SK, Lustig F, Akesson B et al. Codon-acticodon recognition in the valine codon family. J Biol Chem 1977; 252:471-8.
60. Nasvall SJ, Chen P, Bjork GR. The modified wobble nucleoside uridine-5-oxyacetic acid in tRNAPro(cmo5UGG) promotes reading of all four proline codons in vivo. RNA 2004; 10:1662-73.
61. Nasvall SJ, Chen P, Bjork GR. The wobble hypothesis revisited: Uridine-5-oxyacetic acid is critical for reading of G-ending codons. Submitted 2007.
62. Kothe U, Rodnina MV. Codon reading by tRNAAla with modified uridine in the wobble position. Mol Cell 2007; 25:167-74.
63. Vendeix FA et al. Anticodon domain modifications contribute order to tRNA for ribosome-mediated codon binding. Biochemistry 2008; 47:6117-29.
64. Takai K, Takaku H, Yokoyama S. Codon-reading specificity of an unmodified form of Escherichia coli tRNA1Ser in cell-free protein synthesis. Nucleic Acids Res 1996; 24:2894-9.
65. Hillen W, Egert E, Lindner HJ et al. 5-Methoxyuridine: The influence of 5-substituents on the keto-enol tautomerism of the 4-carbonyl group. J Carbohydrates-Nucleosides-Nucleotides 1978; 5:23-32.
66. Mizuno H, Sundaralingam M. Stacking of crick wobble pair and watson-crick pair: stability rules of G-U pairs at ends of helical stems in tRNAs and the relation to codon-anticodon wobble interaction. Nucleic Acids Res 1978; 5:4451-61.
67. Gautheret D, Konings D, Gutell RR. G.U base pairing motifs in ribosomal RNA. RNA 1995; 1:807-14.
68. van Knippenberg PH, Formenoy LJ, Heus HA. Is there a special function for U.G basepairs in ribosomal RNA? Biochim Biophys Acta 1990; 1050:14-7.
69. He L, Kierzek R, SantaLucia J Jr et al. Nearest-neighbor parameters for G.U mismatches: [formula; see text] is destabilizing in the contexts [formula; see text] and [formula; see text] but stabilizing in [formula; see text]. Biochemistry 1991; 30:11124-32.
70. Wu XQ, Iyengar P, RajBhandary UL. Ribosome-initiator tRNA complex as an intermediate in translation initiation in Escherichia coli revealed by use of mutant initiator tRNAs and specialized ribosomes. EMBO J 1996; 15:4734-9.
71. Johansson MJ, Esberg A, Huang B et al. Eukaryotic wobble uridine modifications promote a functionally redundant decoding system. Mol Cell Biol 2008; 28:3301-12.
72. Yokoyama S et al. Molecular mechanism of codon recognition by tRNA species with modified uridine in the first position of the anticodon. Proc Natl Acad Sci USA 1985; 82:4905-9.
73. Baeyens KJ, De Bondt HL, Holbrook SR. Structure of an RNA double helix including uracil-uracil base pairs in an internal loop. Nat Struct Biol 1995; 2:56-62.
74. Holbrook SR, Cheong C, Tinoco I Jr et al. Crystal structure of an RNA double helix incorporating a track of nonWatson-Crick base pairs. Nature 1991; 353:579-81.
75. Yarus M. Translational efficiency of transfer RNA’s: uses of an extended anticodon. Science 1982; 218:646-52.
76. Dao V et al. Ribosome binding of DNA analogs of tRNA requires base modifications and supports the “extended anticodon”. Proc Natl Acad Sci USA 1994; 91:2125-9.
77. de Crecy-Lagard V, Marck C, Brochier-Armanet C et al. Comparative RNomics and modomics in Mollicutes: prediction of gene function and evolutionary implications. IUBMB Life 2007; 59:634-58.

Chapter 35

Roles of the Ultra-Conserved Ribosomal RNA Methyltransferase KsgA in Ribosome Biogenesis

Jason P. Rife*

Abstract

KsgA is a ribosomal RNA methyltransferase that modifies two adjacent adenosines in the small subunit. It was originally identified because its absence in several bacteria provides resistance to the antibiotic kasugamycin. The ksgA gene appears to be ubiquitously spread throughout all phylogenies, suggesting that it was part of the last universal common ancestor. Despite common origins, many present-day orthologs of KsgA perform unrelated secondary functions. Examples include KsgA orthologs in eukaryotic organisms, termed Dim1, which play an essential role in the processome complex in ribosome biogenesis, and mt-TFB, a nuclear-encoded enzyme that functions in mitochondria as a transcription factor and a KsgA-like methyltransferase. KsgA itself plays a larger role in ribosome biogenesis in Escherichia coli beyond that of a methyltransferase; it is a critical factor in the late stages of 30S assembly. A strong evolutionary relationship is seen between KsgA and the antibiotic resistance enzyme Erm. While Erm and KsgA act on separate substrates, they show remarkable structural similarity and catalyze essentially the same reaction. Despite nearly 40 years of investigation, recent reports on KsgA describe fundamental aspects of substrate binding and function and illustrate that many questions remain to be answered.

Introduction

Ribosomes across all phylogeny share numerous common attributes of design and function, but appear to diverge greatly in their biogenesis. Despite the observed differences between eukaryotic and bacterial ribosome biogenesis, all ribosomes are made in three general, concurrent steps: (1) long pre-rRNA transcripts are processed to mature lengths; (2) ribosomal proteins bind and integrate into the maturing subunits; (3) specific nucleotides and amino acids are chemically modified.1,2 These steps are accomplished with the aid of trans-acting factors, of which only one is common to all life. This sole universally conserved factor, which was present in the last universal common ancestor, is an adenosine dimethyltransferase termed KsgA (or RsmA) in bacteria and Dim1 in other phylogenetic domains.3 Nominally, this methyltransferase is responsible for converting two adjacent adenosines in small subunit ribosomal RNA into N6,N6-dimethyladenosine (Fig. 1), but depending on the organism or organelle it can have varied additional functions as well. In eukaryotes, Dim1 is an essential component of the small ribosomal subunit processome,4,5 a vast complex that forms during rRNA transcription and is critical for biogenesis of the small ribosomal

*Jason P. Rife, Department of Medicinal Chemistry and the Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, Virginia 23298, USA. Email: [email protected]

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.




Figure 1. The nucleotide modification catalyzed by KsgA. A) The chemical reaction catalyzed by KsgA. The enzyme transfers a total of four methyl groups, two each to two adjacent adenosines. SAM is the methyl donor S-adenosylmethionine, while SAH is the product S-adenosylhomocysteine. The nucleotide numbers 1518 and 1519 refer to the numbering of the small subunit ribosomal RNA from E. coli. B) The secondary structure of 16S rRNA from E. coli is shown with helix 45 in expanded form. The nucleotides 1518 and 1519 are circled. The 16S rRNA secondary structure is from the Comparative RNA Web Site and Project (http://www.rna.ccbb.utexas.edu/) and is slightly modified for presentation here. Please refer to this site for an explanation of the 16S rRNA secondary structure and annotations.
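The stoichiometry stated in the legend (four methyl groups in total, two on each of the two adjacent adenosines) can be written as an overall reaction. This is a simplified sketch only: protons released on methyl transfer are omitted, the subscripts follow the E. coli numbering used in the legend, and the order in which the four methyl groups are added is not implied.

```latex
\[
\mathrm{A}_{1518} + \mathrm{A}_{1519} + 4\,\mathrm{SAM}
\;\longrightarrow\;
\mathrm{m^{6}_{2}A}_{1518} + \mathrm{m^{6}_{2}A}_{1519} + 4\,\mathrm{SAH}
\]
```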

subunit.1 In bacteria, KsgA does double duty as a methyltransferase and, in a newly proposed role, as a monitor of correct small subunit assembly.5b Surprisingly, in some eukaryotic organisms the imported mitochondrial factor mtTFB, orthologous to KsgA, is both a methyltransferase and a transcription factor.6 Finally, at some point in the past the ksgA gene was duplicated, with one copy evolving to become the critical antibiotic resistance gene termed erm.7 This review summarizes biochemical and evolutionary aspects of the KsgA/Dim1 orthologous group described since the last and only review of KsgA and the conserved dimethyladenosines, which was published in 1986.8

Biology, Chemistry, and Evolution of KsgA

KsgA and Kasugamycin Resistance

Despite KsgA’s ubiquitous presence in life, its discovery was not made by scanning the genomes of the hundreds of sequenced organisms. Rather the ksgA gene was identiied and named decades ago on the basis of being the irst reported gene involved in resistance to the antibiotic kasugamycin.9 Kasugamycin is an atypical aminoglycoside antibiotic isolated from a soil bacterium from the grounds of the Kasuga Shrine in Japan.10 It has good activity against a variety of microorganisms and has found commercial use in the agricultural sector as a fungicide.10-13 Bacterial resistance to kasugamycin readily occurs with mutation of the ksgA gene.9 Biochemical and chemical analysis of ribosomes from wild-type and kasugamycin-resistant strains of Escherichia coli reported that ribosomes in the strain lack two dimethylated adenosines near the 3ʹ end of 16S rRNA foretelling the approximate binding site of kasugamycin.14,15 he determinant of kasugamycin resistance was further narrowed down to m26A1519; absence of methylation at A1518 was found to provide no resistance to kasugamycin.16 he kasugamycin binding site has been unequivocally established as the messenger RNA channel of the 30S subunit between G926 and A794, but surprisingly with no direct interactions with m26A1518 and m26A1519;17,18 this indicates that the lack of methylation of A1519 provides resistance via an indirect mechanism, which is consistent with the observation that kasugamycin-resistant ribosomes still bind kasugamycin.17 In addition to kasugamycin resistance there are other consequences for ribosomes that lack adenosine dimethylation. A recently described role for the methyl groups is that they, along with other methylated nucleotides, are critical for selectively recruiting initiator tRNA over elongator tRNAs19 (also this monograph). Also, the methylation of A1518 and A1519 might lead to the release of KsgA from the pre-30S subunit with the efect of permitting downstream biogenesis efects (see below). 
Finally, older evidence points to a role in translation fidelity, but here the methyl groups appear to be only weak effectors.20 Although it has been known for some time that small subunit rRNA from many divergent organisms contains dimethyladenosine, only when Lafontaine et al. used complementation experiments to identify a KsgA ortholog in the eukaryotic organism Saccharomyces cerevisiae did it become clear that the distribution of KsgA might also be phylogenetically widespread.4 Genome searches reveal that the enzyme is ubiquitously present in life, including archaea and bacteria with minimal genomes,21 and its presence was confirmed biochemically in the archaeon Methanocaldococcus jannaschii.22 Remarkably, conservation of this modification pathway also extends to mitochondrial and plastid organelles.6,23-25


From multiple studies it is clear that KsgA orthologous enzymes from diverse organisms can methylate 16S rRNA from E. coli in vivo and in vitro, which strongly implies that the methyltransferase mechanism is the same for all organisms.4,6,22 Therefore, the study of Dim1 and KsgA methyltransferases from different organisms will to some degree inform us about all members of the KsgA/Dim1 family.22b However, since disparate multiple functions exist for KsgA/Dim1 in some organisms, study of one KsgA/Dim1 family member is not sufficient to describe the entire class. Therefore, scrutiny of the dissimilar functions requires multiple, independent studies. Additionally, detailed study of multiple KsgA/Dim1 members provides a welcome opportunity to view how a universally conserved protein can be co-opted to perform additional biological roles within the cell.

Biochemistry and Evolutionary Divergence of KsgA

At one level, the KsgA/Dim1 orthologs are ribosome biogenesis factors in that they act to chemically modify small subunit rRNA in the course of ribosome maturation. However, in some instances secondary roles are carried out by this highly adaptable enzyme. Each characterized KsgA/Dim1 family member will be discussed in terms of its role as a methyltransferase and its other cellular functions, beginning with KsgA in bacteria. Bacterial dependence on KsgA is somewhat of a paradox. It has been known for some time that many species of bacteria, including E. coli, can survive without functional KsgA with only modest ill effects,8,16,26-28 yet it remains an evolutionary constant that even extends to bacteria with extremely small genomes.21 The view that KsgA is of only marginal importance is too simplistic, as evidenced by recent data and fundamental arguments of conservation. First, the observation that all sequenced bacterial genomes include a copy of ksgA supports the notion that retention of KsgA is under strong evolutionary pressure and that doing without it comes at too great a cost. All of the cases where KsgA has been shown to be dispensable for robust survival are laboratory strains grown under controlled conditions, which presumably do a poor job of replicating natural pressures. For example, a ΔksgA strain of Yersinia pseudotuberculosis is no longer virulent to exposed mice, demonstrating reliance on the presence of KsgA.29 Second, other ribosome biogenesis factors in E. coli are associated with KsgA function, suggesting a deep integration of KsgA function in overall ribosome biogenesis.30,31 Finally, recent growth studies and polysome profiles report that lack of KsgA leads to longer doubling times and the accumulation of immature 30S subunits, phenomena that are readily apparent at cold temperatures.5b
Adding catalytically inactive KsgA produces a dominant negative effect with more profound slow growth and altered 30S assembly phenotypes when compared to the ΔksgA strain.5b Taken together, it is clear that KsgA is an important element in cell fitness that is largely centered on ribosome biogenesis. Inouye’s group recently suggested another role for KsgA in E. coli as a transcription factor involved in the acid shock response.26 While the authors did observe that KsgA is able to directly bind double stranded DNA, a complete picture of this function of KsgA is yet to emerge. Nevertheless, it is tempting to speculate that the transcription factor activity reported by Inouye and the known mitochondrial transcription factor activity of mtTFB (see below) are somehow related. Whatever the final complement of functions exhibited by KsgA in E. coli, it is certain that expression levels of KsgA are highly regulated at both the transcriptional and translational stages of expression. Translation of KsgA was reported to be inhibited by autogenous regulation of the ksgA mRNA by KsgA protein.32 On a second level, transcriptional expression occurs via two promoter sites, both of which are tightly controlled to match overall growth rate.33,34 Tight regulation presumably protects the cell against the deleterious nature of over-expression of KsgA in E. coli during log phase growth, a time when ribosomes are being made.5b

Structure and Mechanism of KsgA

Figure 2. The structure of KsgA. A) The three dimensional structure of KsgA is shown in ribbon format. The N-terminal or catalytic domain is separated from the C-terminal domain by a solid line. B) The same structure is shown in the same orientation as in (A) but represented as solid surface to highlight the deep catalytic pockets present. The SAM binding pocket and the target adenosine (A1518 and A1519) binding pocket are identified. The molecular graphic used in this figure was generated using the software package Pymol using coordinates generated from published work36 (PDB: 1QYR).

KsgA is a canonical S-adenosylmethionine (SAM) dependent methyltransferase composed of two sequential domains, where the N-terminal domain or catalytic domain is composed of a modified Rossmann fold followed by the largely α-helical, C-terminal domain of undetermined function35 (Fig. 2). The catalytic domain has two well-formed pockets at the catalytic site, one to accept the methyl-donating SAM molecule and one to accept, at different times, A1518 and A1519 of 16S rRNA for methyl transfer. Significant interdomain motion is allowed by virtue of the two domains being connected by a flexible linker. The partially understood mechanism by which KsgA modifies 30S subunits requires the transfer of a total of four methyl groups to two adjacent adenosines within the apical loop of helix 45 of 16S rRNA. The overall complexity is enhanced by the fact that KsgA requires a mostly formed 30S subunit to be assembled before efficient catalysis can occur.36 Although full 30S subunits isolated from a KsgA deficient strain of E. coli can be used as a substrate for in vitro methylation experiments, a minimal in vitro substrate was described as 16S rRNA plus a core set of body and platform proteins (S4, S6, S8, S11, S15, S16, S17 and S18). Interestingly, the substrate must exist in the translationally inactive conformation that comes from lowering the concentration of Mg++.37 Although two different nucleotides must gain access to the catalytic site, there is no obligate order of methylation.38 However, when the reaction temperature and SAM concentrations are simultaneously reduced, only A1519 is methylated, suggesting a preferred order of methylation.39 When the question of whether KsgA functions processively (multiple methylation events per ribosome binding event) or distributively (the enzyme must rebind prior to each methylation event) was addressed, a mixed answer was obtained, suggesting that KsgA functions processively, but can also rebind and methylate released intermediates.22
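The bookkeeping behind "four methyl groups to two adjacent adenosines" with "no obligate order" can be made concrete with a short enumeration. The following is only an illustrative sketch: the function name and the representation of each methyl transfer as a label are assumptions made for this example, and the code says nothing about which orders are actually favored in vitro or in vivo.

```python
from itertools import permutations

# Each target adenosine accepts two methyl groups (N6,N6-dimethylation),
# so four transfer events are needed in total.
TARGETS = ("A1518", "A1519")
METHYLS_PER_TARGET = 2

def methylation_orders():
    """Return every distinct order in which the four transfers could occur.

    Hypothetical helper for illustration: transfers to the same adenosine
    are treated as indistinguishable, so duplicates are collapsed.
    """
    events = [t for t in TARGETS for _ in range(METHYLS_PER_TARGET)]
    return sorted(set(permutations(events)))

orders = methylation_orders()
print(f"{len(orders)} possible orders of the four methyl transfers:")
for order in orders:
    print("  " + " -> ".join(order))
```

If there were a strict order (for example, A1519 fully methylated before A1518), only one of these sequences would be allowed; the reported preference for A1519 at low temperature and low SAM would correspond to a bias among the sequences rather than a single permitted path.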

514

DNA and RNA Modii cation Enzymes

Figure 3. An experimentally derived complex between KsgA and the E. coli 30S ribosomal subunit. A) Two views of the enzyme substrate complex. KsgA is rendered in dark gray, while the majority of the 30S subunit is in light gray. Ribosomal proteins that are required for efficient methylation by KsgA are labeled. B) A close-up and stripped down representation of (A). The portion of RNA in light gray is helix 44, while helix 45 is in dark gray. The two catalytic pockets of KsgA are annotated. The molecular graphic used in this figure was generated using the software package Pymol using coordinates derived from published work.40

It was suggested that KsgA could bind a 16S rRNA fragment containing the 3ʹ-most 49 nucleotides, which includes helix 45 and the target adenosines.32 However, recent directed hydroxyl radical probing experiments clearly defined KsgA’s principal binding site as a portion of helix 44, along with regions of the 790 loop of 16S rRNA40 (Fig. 3). KsgA is expected to make a set of extensive interactions along helix 44 that are likely to involve extensive shape and charge complementarity. Although interaction with the 790 loop is believed to occur, prediction of specific interactions is not presently possible. Poor understanding of some KsgA/30S interactions stems in part from the fact that KsgA recognizes a structure of 30S that is conformationally distinct from what has been reported in published crystal structures.37 Interestingly, none of the ribosomal proteins required for efficient methylation (listed above) are within contact distance of bound KsgA, indicating that these proteins contribute to KsgA activity indirectly, presumably by ordering the 16S rRNA into a productive conformation. The question remains: if KsgA predominantly binds to helix 44, yet the target adenosines are located on helix 45, then how does methylation take place? Important insight comes from the crystal structures of 30S ribosomal subunits,41-43 which show that the apical loop of helix 45, including A1518 and A1519, is nestled into the minor groove of helix 44, somewhat near the catalytic pocket of KsgA when it is bound to the substrate. However, the two target adenosines are not close enough to enter the active site of KsgA and are in fact deeply buried in a tertiary interaction with helix 44. Reduction of the Mg++ concentration disrupts the tertiary interaction between the loop of helix 45 and helix 44.44 Critically, there is no high-resolution structure of the low Mg++ conformation of the 30S subunit.
Therefore, it must be assumed that in the low Mg++ conformation, the loop nucleotides of helix 45, including A1518 and A1519, adopt a position more proximal to the active site of bound KsgA. A hypothetical mechanism of substrate binding, consistent with all available data, involves KsgA binding to helix 44 and the 790 loop of 16S rRNA to await the acceptance of A1518 and A1519, in turn, into the active pocket (Fig. 4). In this manner, it is possible for KsgA to bind once to the pre-30S subunit and methylate multiple times as the two adenosines exchange access into the catalytic pocket and the product S-adenosylhomocysteine exchanges for fresh S-adenosylmethionine.


Figure 4. Cartoon of a hypothetical model of KsgA’s functional cycle. Helix 44 and helix 45 of 16S rRNA are illustrated with the two target adenosines noted. The methyl donor, SAM, and product, SAH, are shown. Modified nucleotides contain asterisks. The model proceeds in three post-KsgA binding stages: (1) At a certain level of pre-30S maturation helix 45 is positioned into close proximity to helix 44, thereby orienting A1518 and A1519 near the catalytic pocket of KsgA; (2) Methylation of the two adenosines proceeds processively with A1518 and A1519 exchanging positions within the catalytic pocket. While KsgA remains bound to the pre-30S particle, the product SAH is exchanged for fresh SAM; (3) After four methylation steps the pre-30S product is released.

This model could also explain the observation that the body and platform of 30S must be formed before catalysis can take place, because prior to this point the loop of helix 45 may not be in close enough proximity to helix 44 to allow ready access of A1518 and A1519 to the catalytic pocket of KsgA. Finally, this model provides an explanation of how KsgA can methylate two adjacent adenosines, something that otherwise would seem to require a sliding or rebinding mechanism to gain access to both adenosines. Although consistent with a wide range of data, this model has not been directly tested.
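The practical difference between a processive and a distributive cycle is the number of binding events needed to complete the four transfers. The toy walk-through below is a sketch only, not a kinetic model: the function `run_cycle`, its return values and the fixed transfer order are hypothetical choices made for illustration, and one SAM is assumed to be consumed (and one SAH released) per methyl transfer.

```python
def run_cycle(processive=True):
    """Step through the hypothetical KsgA cycle.

    Returns (binding_events, sam_consumed, methyl_counts).
    Illustrative assumption: in the distributive case the enzyme
    dissociates after every single transfer and must rebind.
    """
    methyls = {"A1518": 0, "A1519": 0}
    binding_events = 0
    sam_consumed = 0
    bound = False
    # The order is arbitrary here; the text reports no obligate order.
    for target in ("A1519", "A1519", "A1518", "A1518"):
        if not bound:
            binding_events += 1   # KsgA binds the pre-30S particle
            bound = True
        methyls[target] += 1      # one methyl transferred to the target
        sam_consumed += 1         # SAM used; SAH swapped for fresh SAM
        if not processive:
            bound = False         # distributive: release after each step
    return binding_events, sam_consumed, methyls

for mode in (True, False):
    events, sam, state = run_cycle(processive=mode)
    label = "processive" if mode else "distributive"
    print(f"{label}: {events} binding event(s), {sam} SAM, final state {state}")
```

In both modes the same four SAM molecules are consumed and both adenosines end up dimethylated; only the number of binding events differs (one versus four), which is the quantity the processivity experiments cited above were designed to probe.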


KsgA as a Ribosome Biogenesis Factor

Nearly all of the mechanistic work on KsgA has been done in vitro, which when combined with a handful of in vivo observations is beginning to clarify KsgA’s role as an E. coli ribosome biogenesis factor. Methylation of A1518 and A1519 is known to be a late event in ribosome maturation,45,46 in agreement with the in vitro observations that KsgA cannot methylate until a substantial part of the 30S has been assembled.36 This late methylation step supports a broad gate-keeping function for KsgA: the tight-binding KsgA sequesters immature 30S subunits until it has completed methylation of A1518 and A1519. This regulatory mechanism suggests that KsgA can sense the maturation level of the forming 30S subunit and, by extension, its fidelity. In essence, pre-30S subunits that are defective in some way and cannot be methylated are prevented by bound KsgA from entering the translation cycle. Additional support for this model comes from the observation that an E. coli strain lacking ribosomal protein S20 accumulates two types of 30S-like particles.47 The first, Type 1 (the majority of particles), lacks dimethylation at A1518 and A1519 and never enters the translation cycle. The second, Type 2, contains the canonical methylations at A1518 and A1519 and does enter the translation cycle. Thus, KsgA can withhold catalytic function (albeit imperfectly) to prevent defective 30S subunits from engaging in translation. Based on the observation that KsgA and IF-3 share a binding site on 30S, it has been suggested that KsgA may be able to sequester pre-30S from the initiation step of translation.40 Another possibility is that KsgA may hold pre-30S subunits in a conformation refractory to downstream processing events.5b During growth in cold temperatures the ΔksgA strain of E. coli accumulates immature 30S particles that contain 17S pre-rRNA. The 17S pre-rRNA requires processing at the 5ʹ end by RNase E and RNase G and at the 3ʹ end by a yet to be identified nuclease.
Expression of a catalytically inactive KsgA mutant leads to the accumulation of the same pre-30S particle to an even greater degree. A model was proposed based on the above observation in which KsgA recognizes a nearly mature 30S particle that undergoes a conformational change upon methylation. This conformational change subsequently allows for the release of KsgA and downstream processing of 16S rRNA. Interestingly, deletion or mutation of other ribosome biogenesis factors, such as Era, RbfA and RsgA/YjeQ, likewise leads to pre-30S particles with the same or similar rRNA precursor,31,48-50 suggesting that these factors operate at approximately the same point as KsgA in the 30S biogenesis pathway. An interrelationship between KsgA and other ribosome biogenesis factors has been demonstrated. Overexpression of KsgA can suppress the cold-sensitive phenotype of the E200K Era mutant.30 Further, the slow growth phenotype of the ΔyjeQ strain is enhanced when the ksgA gene is also knocked out.31 A total of 10 methyltransferases, including KsgA, are thought to be responsible for the 10 methylated nucleotides known to exist in the 16S rRNA of E. coli.51 While KsgA acts on a mostly mature 30S subunit intermediate, other methyltransferases, such as RsmB, are capable of methylating much simpler substrates.52 At the other end of the spectrum, the rRNA methyltransferases RsmE and RsmF efficiently act on completely assembled 30S subunits in in vitro assays.51,53 Therefore, despite the fact that the modified nucleotides cluster within a small region of the 30S subunit, methylation likely occurs throughout all stages of 30S subunit biogenesis. Such scheduling might explain why ‘methyltransferase crowding’ does not appear to be a complication during the process of ribosome biogenesis.

KsgA Orthologs

Eukaryotic Dim1 and Ribosome Biogenesis

Ribosome biogenesis in eukaryotic organisms is more complex than in bacteria, requiring more steps, more trans-acting factors and transport from the nucleolus into the cytoplasm.1 Part of this increased complexity comes from the greater number of rRNA modifications and the fact that multiple modification systems are present. The greatest number of modified nucleotides in yeast are pseudouridines (ψ) and 2ʹ-O-methylnucleotides (2ʹ-OMe), each requiring unique H/ACA (ψ) and box C/D (2ʹ-OMe) guide snoRNAs, helper proteins and the enzymes Cbf5p and Nop1p, respectively. The second class of modification systems uses individual enzymes to chemically modify one or a small number of nucleotides without the use of guide RNAs. Typically these modification enzymes act on nucleobases, but in limited cases 2ʹ-O methylation occurs. In at least two cases, the methyltransferase itself is more important to ribosome biogenesis than the modifications it catalyzes. Bud23 is a recently described small subunit methyltransferase in S. cerevisiae whose deletion leads to a variety of defects in cell growth and small subunit biogenesis in addition to loss of the G1575 modification.54 However, a catalytically inactive mutant is able to restore wild-type growth levels, indicating that the methyl group itself is not overly important for growth. Similarly, the eukaryotic KsgA ortholog Dim1 is required for ribosome biogenesis, but its methyltransferase activity is also dispensable.55 In S. cerevisiae, Dim1 is required for at least two independent functions.4,5 First, it converts A1779 and A1780 (equivalent to A1518 and A1519 in E. coli) into dimethyladenosines, a function that can be eliminated without obvious consequences to yeast fitness in vivo. Second, Dim1 is an integral component of the processome, a highly complex, multi-factor RNP responsible for multiple processing steps in pre-rRNA maturation (Fig. 5). Depletion of Dim1 abolishes pre-rRNA cleavage at the A1 and A2 cleavage sites, which leads to the accumulation of the dead-end 22S rRNA product and the ultimately lethal inability of cells to produce functional 40S subunits.5 The essential ribosome biogenesis factor Dim2 is believed to interact, perhaps directly, with Dim1.56 However, the nature and molecular details of this interaction remain unknown.
Nevertheless, it is clear that there is a yet to be described regulatory mechanism that permits Dim1 to bind at the earliest identified stage of ribosome biogenesis, but not carry out its enzymatic function until the penultimate step of 40S subunit maturation. The structures of two Dim1 proteins were determined and both are very similar to the structure of KsgA from E. coli (2H1R;57 1ZQ9 Structural Genomics Consortium, unpublished). Several reports showed that eukaryotic Dim1 can complement for E. coli KsgA in vivo and in vitro,4,22,58 which correlates well with the similar functions and protein structures of KsgA and Dim1. These experiments demonstrated that detailed interactions between Dim1 and eukaryotic pre-40S subunits and KsgA and bacterial pre-30S subunits are conserved.22b On the other hand, eukaryotic Dim1 enzymes contain a conserved insert of unknown function, which is not predicted to interact directly with the ribosome.22b It is tempting to speculate that this domain is involved with the upstream function of Dim1 as a member of the processome, but this remains to be demonstrated. While eukaryotic and archaeal Dim1 proteins can complement for KsgA in E. coli, KsgA and archaeal Dim1 failed to reciprocally complement for Dim1 in S. cerevisiae, supporting the notion that Dim1 performs a unique role in ribosome biogenesis (Pulicherla et al, unpublished results).

Biological Roles of Archaeal Dim1

Relative to the bacterial and eukaryotic phylogenetic domains we know very little about ribosome biogenesis in the archaeal domain of life and even less about the requirements of Dim1 in this process. In general, ribosome biogenesis in archaeal organisms is more like that found in eukaryotes than that found in bacteria.59 Adenosine dimethylation of small subunit rRNA is present in archaeal organisms,60 as are KsgA/Dim1 orthologs, which appear to be ubiquitous (Pulicherla et al, unpublished results). To date the only archaeal Dim1 protein to be characterized is from Methanocaldococcus jannaschii and, like eukaryotic Dim1, it can complement for E. coli KsgA in vitro and in vivo.22 As expected from the complementation study, there is a strong similarity in the structures of the bacterial KsgA from E. coli, eukaryotic Dim1 proteins and the archaeal Dim1 from M. jannaschii (Pulicherla et al, unpublished results). However, nothing is known about whether or not archaeal Dim1 proteins have any function in ribosome biogenesis beyond that of a methyltransferase.

Biological Roles of mtTFB and Pfc1 in Organelles

Eukaryotic organisms contain nuclear encoded KsgA/Dim1 orthologs that are transported into mitochondria and serve as mitochondrial transcription factors and, in most cases, as methyltransferases analogous to KsgA.6,24,61-63 There is strong divergence in sequence and function among members of the mtTFB family. Metazoans usually contain two nuclear genes that encode distinct copies of mtTFB, termed mtTFB1 and mtTFB2,62,64,65 while fungi and protists usually have a single member, termed mtTFB.24,63 All three classes originally descended from the endogenous KsgA of the endosymbiont that gave rise to the present day mitochondria.63 Interestingly, many phylogenetically diverse eukaryotic organisms lack any identifiable mtTFB, with no known candidate for the mitochondrial rRNA modification and transcription functions.63 The three mtTFB classes (mtTFB, mtTFB1 and mtTFB2) demonstrate differing abilities to function as a methyltransferase and transcription factor.24,65 mtTFB1 is a strong methyltransferase, but among the group it is the poorest transcription factor.24,63 Conversely, both mtTFB2 and mtTFB excel as transcription factors; however, mtTFB2 has relatively poor methyltransferase activity and mtTFB lacks methyltransferase activity completely.24,35,22b In Drosophila melanogaster, mtTFB1 and mtTFB2 are not entirely redundant.66 Therefore, mtTFB in fungi has evolved to support transcription at the expense of rRNA methylation and mtTFB2 proteins in some eukaryotes have evolved along similar lines, but to a lesser degree. Upon examination of the structure of mtTFB from S. cerevisiae,67 it is easy to understand how mtTFB has lost the ability to perform as a methyltransferase. Relative to structurally related methyltransferases, mtTFB contains numerous active-site residues that have mutated, with the effect of occluding the ligand binding pockets.35,22b What features of the mtTFBs allow them to serve as transcription factors is largely unknown, but it was noted that all three classes contain two inserts in the N-terminal domain relative to KsgA, and that mtTFB and mtTFB2 contain a third N-terminal insert.22b One of these inserts has been shown to be important for interaction with the mitochondrial RNA polymerase.68 Interestingly, polymorphisms near the mtTFB1 gene have been shown to modify the deafness phenotype of the well-described A1555G mutation in the small subunit mitochondrial rRNA.69 The close spatial proximity of the dimethylated adenosines of helix 45 and the A1555G locus in helix 44 has led to the speculation that this is a direct effect of the level of methylation.70 The dimethyladenosines located in the small subunit rRNA in chloroplasts from Arabidopsis thaliana are formed by the KsgA ortholog paleface 1 (Pfc1), which is critical to chloroplast development for this plant when grown under chilling conditions.25 Under nonpermissive conditions for the Δpfc1 strain, accumulation of mis-processed rRNA occurs, suggesting that the effect might be related to ribosome biogenesis (J. Tokuhisa, personal communication).

Figure 5. Role of Dim1 in eukaryotic ribosome biogenesis. The preribosomal intermediate 90S is the first identifiable particle in S. cerevisiae ribosomal biogenesis. It contains components that will eventually lead to the mature 40S and 60S subunits. Also contained within this particle are scores of ribosome biogenesis factors, including Dim1. Dim1 is a member of the multicomponent processome required for processing at both the A1 and A2 sites. Upon cleavage at the pre-rRNA site A2 the two subunit assembly pathways diverge. Pre-40S is shuttled into the cytoplasm where Dim1 dimethylates A1779 and A1780 and site D of the pre-rRNA is processed.

KsgA’s Relationship to ERM Methyltransferases

In 1971 the existence of both Erm and KsgA was uncovered,9,71 constituting the first two known rRNA methyltransferases. Remarkably, these two proteins are both adenosine dimethyltransferases and are close ancestral relatives. It was recognized some time ago that the two proteins share significant sequence similarity,72 a relationship marked by equally close 3D structural similarity.35,67 Erm enzymes (about 30 related families are known) usually dimethylate and sometimes monomethylate the N6 position of A2058 (E. coli numbering) of the 23S rRNA contained in the 50S ribosomal subunit. Once modified, ribosomes show increased resistance to the effects of the MLSB grouping of antibiotics that include the clinically useful drugs erythromycin (macrolide), clindamycin (lincosamide) and streptogramin B (this monograph). Direct resistance occurs because members of this grouping of drugs can no longer efficiently bind the rRNA when A2058 is methylated. Critically, erm resistance genes are frequently transmitted from one bacterial pathogen to another via resistance plasmids. Therefore, erm resistance genes have made deep penetration into a wide range of human pathogens and constitute a real health concern, which has prompted numerous drug discovery efforts to either modify members of the MLSB class of drugs or to therapeutically inactivate the Erm methyltransferase. Despite obvious similarities between KsgA and Erm, they have diverged to the point where there is complete separation of substrate specificity. As mentioned above, KsgA requires that a mostly intact 30S subunit be formed before it can methylate A1518 and A1519.
While Erm cannot methylate a fully formed 50S subunit, it can methylate 23S rRNA stripped of ribosomal proteins and even synthetic RNA hairpins as small as 32 nucleotides.73,74 Interestingly, erythromycin can induce the stalling of an assembly intermediate of the 50S subunit in Staphylococcus aureus that contains 23 of 36 large subunit ribosomal proteins; this assembly intermediate is a substrate for an Erm protein.75 Both KsgA and Erm bind to RNA helices, but probably do so in fundamentally different ways. KsgA binds to part of the helical region of helix 44 of 16S rRNA, with the interaction extending along KsgA’s long axis, resulting in alternating minor groove/major groove/minor groove interactions involving both protein domains.40 In contrast, it is believed that helix 23 of 23S rRNA binds across the long axis of Erm near the interface of the N-terminal and C-terminal domains, but without critical contributions from the C-terminal domain.76 Until the respective RNA binding interactions are understood at atomic detail, there will be no real understanding of the evolutionary path required to go from ancestral KsgA to present day Erm.


Conclusions and Future Prospects

KsgA has had a remarkable path of discovery and study over the past 40 years, yet fundamental questions remain. One of the deepest, and therefore most vexing, questions is why this protein has been universally conserved. Given that the dimethyladenosines of small subunit rRNA have broad phylogenetic penetration, we can assume that the last universal common ancestor (LUCA) of all present-day life had a KsgA-like protein that performed analogous chemistry. However, despite the conservation of the tandem dimethyladenosines, neither E. coli nor S. cerevisiae suffers significantly when the methyl groups are absent. If LUCA were likewise indifferent to the presence of the dimethyladenosines, then it is hard to imagine that the methylated adenosines themselves provided the overwhelming selective advantage to preserve KsgA/Dim1. In the two organisms where Dim1 and KsgA function has been studied, the eukaryote S. cerevisiae and the bacterium E. coli, these enzymes are involved in broader roles in ribosome biogenesis: eukaryotic Dim1 as an essential member of the processome and the bacterial KsgA as a gate-keeper to assure biogenic fidelity. Yet here there appears to be little overlap between the nonmethylation functions of KsgA and Dim1, making common ancestry from LUCA based on a nonmethyltransferase function difficult to envision. It is possible that the observed adaptability of the KsgA/Dim1 protein to evolve new functions will ultimately obscure the role that the LUCA KsgA/Dim1 played in ribosomal biogenesis. Perhaps the best understanding of a unified nonmethyltransferase function, if any, will come when ribosome biogenesis in phylogenetically distant organisms is understood at the molecular level. In this way we can let an understanding of ribosome biogenesis inform us of the function of Dim1/KsgA, rather than the other way around. The chemical details of KsgA methylation function remain to be established.
For example, site directed mutagenesis could help to understand why KsgA dimethylates the adenosine, while a structurally similar DNA methyltransferase, M.TaqI, stops after a single addition when methylating its DNA target at the analogous position. Interestingly, similar questions were posed for the Trm1 enzyme, a tRNA N2,N2-guanosine dimethyltransferase.77-79 Here again, detailed mutagenic studies are probably required to understand the mechanism of dimethylation. A deeper understanding of the interactions between KsgA and 16S rRNA is likely to come from study of 16S rRNA mutants, 3D structural analysis of KsgA in complex with helix 44 from 16S rRNA and possibly crystallographic analysis of the entire KsgA/30S subunit complex.

Acknowledgements

I would like to thank Dr. Heather C. O’Farrell for providing help on some aspects of this work, Dr. Gloria M. Culver and her group for numerous helpful discussions and numerous direct contributions to the understanding of KsgA and finally the members of my research group who are the engine that keeps this work moving. The NIH (GM66900) provides research support to my laboratory for the study of KsgA.

References

1. Henras AK, Soudet J, Gerus M et al. The post-transcriptional steps of eukaryotic ribosome biogenesis. Cell Mol Life Sci 2008; 65:2334-2359.
2. Kaczanowska M, Ryden-Aulin M. Ribosome biogenesis and the translation process in Escherichia coli. Microbiol Mol Biol Rev 2007; 71:477-494.
3. Harris JK, Kelley ST, Spiegelman GB et al. The genetic core of the universal ancestor. Genome Res 2003; 13:407-412.
4. Lafontaine D, Delcour J, Glasser AL et al. The DIM1 gene responsible for the conserved m6(2)Am6(2)A dimethylation in the 3ʹ-terminal loop of 18S rRNA is essential in yeast. J Mol Biol 1994; 241:492-497.
5. Lafontaine D, Vandenhaute J, Tollervey D. The 18S rRNA dimethylase Dim1p is required for preribosomal RNA processing in yeast. Genes Dev 1995; 9:2470-2481.
5b. Connolly K, Rife JP, Culver G. Mechanistic insight into the ribosome biogenesis functions of the ancient protein KsgA. Mol Microbiol 2008; 70:1062-1075.
6. Seidel-Rogol BL, McCulloch V, Shadel GS. Human mitochondrial transcription factor B1 methylates ribosomal RNA at a conserved stem-loop. Nat Genet 2003; 33:23-24.

7. van Buul CP, van Knippenberg PH. Nucleotide sequence of the ksgA gene of Escherichia coli: Comparison of methyltransferases effecting dimethylation of adenosine in ribosomal RNA. Gene 1985; 38:65-72.
8. van Knippenberg PH. Structural and functional aspects of the N6,N6 dimethyladenosines in 16S ribosomal RNA. In: Hardesty B, Kramer G, eds. Structure, Function and Genetics of Ribosomes. New York: Springer-Verlag, 1986:412-424.
9. Sparling PF. Kasugamycin resistance: 30S ribosomal mutation with an unusual location on the Escherichia coli chromosome. Science 1970; 167:56-58.
10. Umezawa H, Hamada M, Suhara Y et al. Kasugamycin, a new antibiotic. Antimicrob Agents Chemother (Bethesda) 1965; 5:753-757.
11. Takeuchi T, Ishizuka M, Takayama H et al. Pharmacology of kasugamycin and the effect on Pseudomonas infection. J Antibiot (Tokyo) 1965; 18:107-110.
12. Ishiyama T, Hara I, Matsuoka M et al. Studies on the preventive effect of kasugamycin on rice blast. J Antibiot (Tokyo) 1965; 18:115-119.
13. Hamada M, Hashimoto T, Takahashi T et al. Antimicrobial activity of kasugamycin. J Antibiot (Tokyo) 1965; 18:104-106.
14. Helser TL, Davies JE, Dahlberg JE. Change in methylation of 16S ribosomal RNA associated with mutation to kasugamycin resistance in Escherichia coli. Nat New Biol 1971; 233:12-14.
15. Helser TL, Davies JE, Dahlberg JE. Mechanism of kasugamycin resistance in Escherichia coli. Nat New Biol 1972; 235:6-9.
16. Vila-Sanjurjo A, Squires CL, Dahlberg AE. Isolation of kasugamycin resistant mutants in the 16S ribosomal RNA of Escherichia coli. J Mol Biol 1999; 293:1-8.
17. Schuwirth BS, Day JM, Hau CW et al. Structural analysis of kasugamycin inhibition of translation. Nat Struct Mol Biol 2006; 13:879-886.
18. Schluenzen F, Takemoto C, Wilson DN et al. The antibiotic kasugamycin mimics mRNA nucleotides to destabilize tRNA binding and inhibit canonical translation initiation. Nat Struct Mol Biol 2006; 13:871-878.
19. Das G, Thotala DK, Kapoor S et al. Role of 16S ribosomal RNA methylations in translation initiation in Escherichia coli. EMBO J 2008; 27:840-851.
20. van Buul CP, Visser W, van Knippenberg PH. Increased translational fidelity caused by the antibiotic kasugamycin and ribosomal ambiguity in mutants harbouring the ksgA gene. FEBS Lett 1984; 177:119-124.
21. de Crecy-Lagard V, Marck C, Brochier-Armanet C et al. Comparative RNomics and modomics in mollicutes: prediction of gene function and evolutionary implications. IUBMB Life 2007; 59:634-658.
22. O’Farrell HC, Pulicherla N, Desai PM et al. Recognition of a complex substrate by the KsgA/Dim1 family of enzymes has been conserved throughout evolution. RNA 2006; 12:725-733.
22b. O’Farrell HC, Xu Z, Culver GM et al. Sequence and structural evolution of the KsgA/Dim1 methyltransferase family. BMC Res Notes 2008; 1:108.
23. McCulloch V, Seidel-Rogol BL, Shadel GS. A human mitochondrial transcription factor is related to RNA adenine methyltransferases and binds S-adenosylmethionine. Mol Cell Biol 2002; 22:1116-1125.
24. Cotney J, Shadel GS. Evidence for an early gene duplication event in the evolution of the mitochondrial transcription factor B family and maintenance of rRNA methyltransferase activity in human mtTFB1 and mtTFB2. J Mol Evol 2006; 63:707-717.
25. Tokuhisa JG, Vijayan P, Feldmann KA et al. Chloroplast development at low temperatures requires a homolog of DIM1, a yeast gene encoding the 18S rRNA dimethylase. Plant Cell 1998; 10:699-711.
26. Inoue K, Basu S, Inouye M. Dissection of 16S rRNA methyltransferase (KsgA) function in Escherichia coli. J Bacteriol 2007; 189:8510-8518.
27. Leveque F, Blanchin-Roland S, Fayat G et al. Design and characterization of Escherichia coli mutants devoid of Ap4N-hydrolase activity. J Mol Biol 1990; 212:319-329.
28. Igarashi K, Kishida K, Kashiwagi K et al. Relationship between methylation of adenine near the 3ʹ end of 16-S ribosomal RNA and the activity of 30-S ribosomal subunits. Eur J Biochem 1981; 113:587-593.
29. Mecsas J, Bilis I, Falkow S. Identification of attenuated Yersinia pseudotuberculosis strains and characterization of an orogastric infection in BALB/c mice on day 5 postinfection by signature-tagged mutagenesis. Infect Immun 2001; 69:2779-2787.
30. Lu Q, Inouye M. The gene for 16S rRNA methyltransferase (ksgA) functions as a multicopy suppressor for a cold-sensitive mutant of era, an essential RAS-like GTP-binding protein in Escherichia coli. J Bacteriol 1998; 180:5243-5246.
31. Campbell TL, Brown ED. Genetic interaction screens with ordered overexpression and deletion clone sets implicate the Escherichia coli GTPase YjeQ in late ribosome biogenesis. J Bacteriol 2008; 190:2537-2545.

32. van Gemen B, Twisk J, van Knippenberg PH. Autogenous regulation of the Escherichia coli ksgA gene at the level of translation. J Bacteriol 1989; 171:4002-4008.
33. Roa BB, Connolly DM, Winkler ME. Overlap between pdxA and ksgA in the complex pdxA-ksgA-apaG-apaH operon of Escherichia coli K-12. J Bacteriol 1989; 171:4767-4777.
34. Pease AJ, Roa BR, Luo W et al. Positive growth rate-dependent regulation of the pdxA, ksgA and pdxB genes of Escherichia coli K-12. J Bacteriol 2002; 184:1359-1369.
35. O’Farrell HC, Scarsdale JN, Rife JP. Crystal structure of KsgA, a universally conserved rRNA adenine dimethyltransferase in Escherichia coli. J Mol Biol 2004; 339:337-353.
36. Thammana P, Held WA. Methylation of 16S RNA during ribosome assembly in vitro. Nature 1974; 251:682-686.
37. Desai PM, Rife JP. The adenosine dimethyltransferase KsgA recognizes a specific conformational state of the 30S ribosomal subunit. Arch Biochem Biophys 2006; 449:57-63.
38. Cunningham PR, Weitzmann CJ, Nurse K et al. Site-specific mutation of the conserved m6(2)Am6(2)A residues of E. coli 16S ribosomal RNA. Effects on ribosome function and activity of the ksgA methyltransferase. Biochim Biophys Acta 1990; 1050:18-26.
39. Van Buul CP, Hamersma M, Visser W et al. Partial methylation of two adjacent adenosines in ribosomes from Euglena gracilis chloroplasts suggests evolutionary loss of an intermediate stage in the methyl-transfer reaction. Nucleic Acids Res 1984; 12:9205-9208.
40. Xu Z, O’Farrell HC, Rife JP et al. A conserved rRNA methyltransferase regulates ribosome biogenesis. Nat Struct Mol Biol 2008; 15:534-536.
41. Wimberly BT, Brodersen DE, Clemons WM Jr et al. Structure of the 30S ribosomal subunit. Nature 2000; 407:327-339.
42. Schluenzen F, Tocilj A, Zarivach R et al. Structure of functionally activated small ribosomal subunit at 3.3 angstroms resolution. Cell 2000; 102:615-623.
43. Schuwirth BS, Borovinskaya MA, Hau CW et al. Structures of the bacterial ribosome at 3.5 Å resolution. Science 2005; 310:827-834.
44. Politz SM, Glitz DG. Magnesium-dependent interaction of 30S ribosomal subunits with antibodies to N6,N6-dimethyladenosine. Biochemistry 1980; 19:3786-3791.
45. Lowry CV, Dahlberg JE. Structural differences between the 16S ribosomal RNA of E. coli and its precursor. Nat New Biol 1971; 232:52-54.
46. Hayes F, Hayes D, Fellner P et al. Additional nucleotide sequences in precursor 16S ribosomal RNA from Escherichia coli. Nat New Biol 1971; 232:54-55.
47. Ryden-Aulin M, Shaoping Z, Kylsten P et al. Ribosome activity and modification of 16S RNA are influenced by deletion of ribosomal protein S20. Mol Microbiol 1993; 7:983-992.
48. Bylund GO, Wipemo LC, Lundberg LA et al. RimM and RbfA are essential for efficient processing of 16S rRNA in Escherichia coli. J Bacteriol 1998; 180:73-82.
49. Himeno H, Hanawa-Suetsugu K, Kimura T et al. A novel GTPase activated by the small subunit of ribosome. Nucleic Acids Res 2004; 32:5303-5309.
50. Inoue K, Alsina J, Chen J et al. Suppression of defective ribosome assembly in a rbfA deletion mutant by overexpression of era, an essential GTPase in Escherichia coli. Mol Microbiol 2003; 48:1005-1016.
51. Andersen NM, Douthwaite S. YebU is a m5C methyltransferase specific for 16S rRNA nucleotide 1407. J Mol Biol 2006; 359:777-786.
52. Gu XR, Gustafsson C, Ku J et al. Identification of the 16S rRNA m5C967 methyltransferase from Escherichia coli. Biochemistry 1999; 38:4053-4057.
53. Basturea GN, Deutscher MP. Substrate specificity and properties of the Escherichia coli 16S rRNA methyltransferase, RsmE. RNA 2007; 13:1969-1976.
54. White J, Li Z, Sardana R et al. Bud23 methylates G1575 of 18S rRNA and is required for efficient nuclear export of pre-40S subunits. Mol Cell Biol 2008; 28:3151-3161.
55. Lafontaine DL, Preiss T, Tollervey D. Yeast 18S rRNA dimethylase Dim1p: A quality control mechanism in ribosome synthesis? Mol Cell Biol 1998; 18:2360-2370.
56. Vanrobays E, Gelugne JP, Caizergues-Ferrer M et al. Dim2p, a KH-domain protein required for small ribosomal subunit synthesis. RNA 2004; 10:645-656.
57. Vedadi M, Lew J, Artz J et al. Genome-scale protein expression and structural biology of Plasmodium falciparum and related apicomplexan organisms. Mol Biochem Parasitol 2007; 151:100-110.
58. Housen I, Demonte D, Lafontaine D et al. Cloning and characterization of the KlDIM1 gene from Kluyveromyces lactis encoding the m2(6)A dimethylase of the 18S rRNA. Yeast 1997; 13:777-781.
59. Dennis PP, Omer A. Small noncoding RNAs in archaea. Curr Opin Microbiol 2005; 8:685-694.
60. Kowalak JA, Bruenger E, Crain PF et al. Identities and phylogenetic comparisons of post-transcriptional modifications in 16S ribosomal RNA from Haloferax volcanii. J Biol Chem 2000; 275:24484-24489.

61. McCulloch V, Shadel GS. Human mitochondrial transcription factor B1 interacts with the C-terminal activation region of h-mtTFA and stimulates transcription independently of its RNA methyltransferase activity. Mol Cell Biol 2003; 23:5816-5824.
62. Matsushima Y, Adan C, Garesse R et al. Drosophila mitochondrial transcription factor B1 modulates mitochondrial translation but not transcription or DNA copy number in Schneider cells. J Biol Chem 2005; 280:16815-16820.
63. Shutt TE, Gray MW. Homologs of mitochondrial transcription factor B, sparsely distributed within the eukaryotic radiation, are likely derived from the dimethyladenosine methyltransferase of the mitochondrial endosymbiont. Mol Biol Evol 2006; 23:1169-1179.
64. Falkenberg M, Gaspari M, Rantanen A et al. Mitochondrial transcription factors B1 and B2 activate transcription of human mtDNA. Nat Genet 2002; 31:289-294.
65. Rantanen A, Gaspari M, Falkenberg M et al. Characterization of the mouse genes for mitochondrial transcription factors B1 and B2. Mamm Genome 2003; 14:1-6.
66. Adan C, Matsushima Y, Hernandez-Sierra R et al. Mitochondrial transcription factor B2 is essential for metabolic function in Drosophila melanogaster development. J Biol Chem 2008; 283:12333-12342.
67. Schubot FD, Chen CJ, Rose JP et al. Crystal structure of the transcription factor sc-mtTFB offers insights into mitochondrial transcription. Protein Sci 2001; 10:1980-1988.
68. Cliften PF, Park JY, Davis BP et al. Identification of three regions essential for interaction between a sigma-like factor and core RNA polymerase. Genes Dev 1997; 11:2897-2909.
69. Bykhovskaya Y, Mengesha E, Wang D et al. Human mitochondrial transcription factor B1 as a modifier gene for hearing loss associated with the mitochondrial A1555G mutation. Mol Genet Metab 2004; 82:27-32.
70. Shadel GS. A dual-function mitochondrial transcription factor tunes out deafness. Mol Genet Metab 2004; 82:1-3.
71. Lai CJ, Weisblum B. Altered methylation of ribosomal RNA in an erythromycin-resistant strain of Staphylococcus aureus. Proc Natl Acad Sci USA 1971; 68:856-860.
72. Suvorov AN, van Gemen B, van Knippenberg PH. Increased kasugamycin sensitivity in Escherichia coli caused by the presence of an inducible erythromycin resistance (erm) gene of Streptococcus pyogenes. Mol Gen Genet 1988; 215:152-155.
73. Schluckebier G, Zhong P, Stewart KD et al. The 2.2 Å structure of the rRNA methyltransferase ErmC’ and its complexes with cofactor and cofactor analogs: Implications for the reaction mechanism. J Mol Biol 1999; 289:277-291.
74. Zhong P, Pratt SD, Edalji RP et al. Substrate requirements for ErmC’ methyltransferase activity. J Bacteriol 1995; 177:4327-4332.
75. Pokkunuri I, Champney WS. Characteristics of a 50S ribosomal subunit precursor particle as a substrate for ermE methyltransferase activity and erythromycin binding in Staphylococcus aureus. RNA Biol 2007; 4:147-153.
76. Maravic G, Bujnicki JM, Feder M et al. Alanine-scanning mutagenesis of the predicted rRNA-binding domain of ErmC’ redefines the substrate-binding site and suggests a model for protein-RNA interactions. Nucleic Acids Res 2003; 31:4941-4949.
77. Constantinesco F, Motorin Y, Grosjean H. Characterization and enzymatic properties of tRNA(guanine 26, N2,N2)-dimethyltransferase Trm1p from P. furiosus. J Mol Biol 1999; 291:375-392.
78. Urbonavicius J, Armengaud J, Grosjean H. Identity elements required for enzymatic formation of N2,N2-dimethylguanosine from N2-monomethylated derivative and its possible role in avoiding alternative conformations in archaeal tRNA. J Mol Biol 2006; 357:387-399.
79. Ihsanawati H, Nishimoto M, Higashijima K et al. Crystal structure of tRNA N2,N2-guanosine dimethyltransferase Trm1 from Pyrococcus. J Mol Biol 2008; 383:871-884.

Chapter 36

Antibiotic Resistance in Bacteria through Modification of Nucleosides in 16S Ribosomal RNA

Graeme L. Conn,* Miloje Savic and Rachel Macmaster

Abstract

Methylation of the 30S ribosomal subunit RNA (16S rRNA) is a significant mechanism of resistance to ribosome-targeting antibiotics in both producer and pathogenic bacteria. Antibiotic resistance phenotypes may arise either through loss of intrinsic methylation or through site-specific modification by bona fide resistance methyltransferase enzymes. In the latter group, modifications at three 16S rRNA nucleotides on the small ribosomal subunit have so far been revealed as antibiotic resistance determinants: A964 (to pactamycin) and G1405/A1408 (to different classes of aminoglycosides). These 16S rRNA resistance methyltransferases act at nucleotides in close proximity to their respective antibiotic binding sites, and methyl group addition thus sterically blocks antibiotic binding. Mechanisms of action for resistance through loss of intrinsic methylations are less clear, but these must also serve to significantly modify the antibiotic binding site in some way. Currently, no structure of a 16S rRNA resistance methyltransferase has been solved, but recent studies have provided some initial insights using sequence conservation and homology modelling. 16S rRNA resistance methyltransferases modify only intact 30S subunits, but very little is known about the molecular details of their target recognition mechanisms. Such studies are becoming all the more necessary with the increasing identification of 16S rRNA resistance methyltransferases on mobile genetic elements from pathogens isolated in clinical environments. A key issue will be to determine whether specific features of recognition can be exploited to combat the rise of resistance to clinically useful 16S rRNA-binding aminoglycoside antibiotics.

Introduction

Protein synthesis in all living cells is catalyzed by the ribosome, a massive macromolecular complex that in bacteria is composed of three ribosomal RNAs (rRNAs) and over 50 proteins. Certain regions of rRNA, particularly those associated with critical functions such as mRNA decoding and peptidyl transfer, exhibit extreme sequence and structural conservation. The ubiquity of these specific sites across bacterial species makes the ribosome and its rRNA an excellent target for antibiotics. Accordingly, ribosome function and thus cell viability is known to be impaired by a structurally diverse array of such compounds. Most interestingly, these are matched by a range of different strategies for resistance to their effects in both antibiotic producing strains and pathogens. One of these is modification (methylation) of the rRNA binding site itself. As part of a volume on nucleic acid modification, the focus of this and the accompanying chapter by Vester and Long will necessarily be on mechanisms of bacterial resistance to antibiotics that arise through RNA modification and will concern modifications to 16S and 23S rRNA respectively. In this chapter we will begin with a brief discussion of the antibiotics that target 16S rRNA, the nucleoside modifications that confer resistance and the enzymes that catalyze their incorporation. We will also describe the recent increase in identification of such resistance methyltransferase enzymes on mobile genetic elements in pathogenic bacteria isolated from clinical environments. Next, we describe the current state of knowledge and what remains to be learned regarding the origin, structure and function of these resistance methyltransferase enzymes. Finally, we will consider the contribution that future studies on rRNA methyltransferases can make to strategies for controlling the increasing threat of resistance to clinically important antibiotics.

*Corresponding Author: Graeme L. Conn, Department of Biochemistry, Emory University School of Medicine, 1510 Clifton Road, NE, Atlanta, Georgia 30322, USA. Email: [email protected].

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean. ©2009 Landes Bioscience.

Antibiotics That Target the 30S Subunits and Resistance Mechanisms

Antibiotics that target the small ribosomal subunit (30S subunit) fall into three main groups: tetracyclines, cyclic peptides (e.g., viomycin and capreomycin) and aminoglycosides; along with a small number of other compounds such as edeine, pactamycin and kasugamycin. The binding sites for the majority of these are well characterized by structural studies of antibiotic complexes with ribosome subunits or smaller domains (reviewed in ref. 1) and these have revealed many mechanistic details of their action. For example, the aminoglycoside paromomycin binds to the ribosome A-site and induces conformational changes in critical residues monitoring the codon:anticodon interaction that mimic the binding of a correct transfer RNA.2 The antibiotic thus lowers the barrier for misincorporation, leading to large-scale aberrant protein synthesis and ultimately cell death. A number of excellent general reviews on these topics are available elsewhere (for example see refs. 1,3).

Resistance to antibiotics that target the ribosome 30S subunit can arise by a number of mechanisms that are common to most known antibiotics:4,5
1. Decreased intracellular antibiotic concentration by alteration of cell wall permeability;
2. Decreased antibiotic transport across the inner cell membrane or active efflux;
3. Chemical modification of the antibiotic molecule;
4. Antibiotic target site alteration through protein or nucleotide mutation;
5. Antibiotic target site alteration through chemical modification.

Here, we will consider only the last mechanism of resistance as it applies to antibiotics that target the ribosome small subunit, i.e., methylation of antibiotic target sites within 16S rRNA. To date, this mechanism has not been observed for the tetracyclines or for certain aminoglycosides (e.g., spectinomycin, streptomycin and hygromycin B). However, 16S rRNA methylation can confer resistance to most clinically useful aminoglycosides, making this process of increasing clinical relevance.
The enzymes that catalyze the RNA modification are also of significant interest in other ways, for example, in understanding how they are able to recognize their unique rRNA target site with exquisite selectivity. The antibiotic producer strains of the actinomycetes usually have a specific genomically encoded methyltransferase enzyme that confers resistance to their own antimicrobial compound(s) in addition to any other mechanisms they may exploit to avoid self-intoxication.4 Production of the antibiotic and resistance methyltransferase are often tightly coordinated and may involve complex feedback mechanisms to regulate gene expression that are only now beginning to be explored.6 Antibiotic target site modification may also soon contribute significantly to the increasing problem of resistance to clinically important antibiotics. These issues will be covered in detail in the following sections. First, however, we will briefly consider examples where loss of 16S rRNA methylation(s) gives rise to bacterial antibiotic resistant phenotypes.

Resistance to Antibiotics via Loss of Methylation of 16S rRNA

Bacterial rRNAs contain a significant but variable number of base and 2ʹ-O-ribose methylations that are incorporated by genomically encoded ‘house-keeping’ methyltransferases. Such modifications tend to cluster around functional regions of the ribosome.7 While these typically improve ribosome function under most growth conditions, when challenged with antibiotics the loss of certain specific modifications can confer low to moderate levels of resistance (Table 1). The first identified and best characterized example is ksgA (also named rsmA), which encodes the house-keeping methyltransferase KsgA responsible for dimethylation on N6 of A1518 and A1519 at the 3ʹ-terminal hairpin loop of 16S rRNA (see also chapter by Jason Rife in this book).8,9 The kasugamycin binding pocket is located in the mRNA-binding cleft of the 30S subunit between the head and the platform (i.e., in the channel between the E and P sites) and the antibiotic binds through interaction with A794 and G926.10,11 Biochemical studies showed that the identity of the mRNA residues between –2 and +1 (where –1 is the last position of the E site codon and +1 is the first position of the P site codon) has a large effect on the extent of inhibition by kasugamycin. The antibiotic therefore acts as a selective inhibitor of a subset of mRNAs by interfering with the path of the mRNA immediately upstream of the start codon within the ribosome.10,11 It appears that the enzymatic action of KsgA is tightly controlled during ribosome biogenesis and that the modifications it introduces are the hallmark of properly assembled and translationally competent 30S subunits12 (see also chapter by Rife). Mutations that give rise to kasugamycin resistance were mapped either to nucleotides A794, G926 and A1519 in 16S rRNA or to changes that result in loss of KsgA methyltransferase function.13 In the latter case, the loss of methylation at A1519 directly results in resistance to kasugamycin. The precise cause of resistance is not known, but loss of methylation at A1519 must sufficiently perturb the kasugamycin binding site to lower its affinity for the 30S subunit.

Table 1. Summary of 16S rRNA methyltransferases associated with resistance to antibiotics in bacteria

| Resistance MTase Gene(b) | Enzyme Family(b) | Site | Position | Methylation(c) | Resistance Phenotype | Origin(d) | Refs |
|---|---|---|---|---|---|---|---|
| rsmG | | G527 | N7 | − | Streptomycin | H, P | 14 |
| tlyA(e) | | C1409 | 2ʹ-O-ribose | − | Capreomycin, viomycin | H | 15 |
| ksgA | | A1519 | N6,N6 | − | Kasugamycin | H | 8,9,13,55 |
| pct | | A964 | N1 | + | Pactamycin | D | 17 |
| kgmB | Kgm | G1405 | N7 | + | Kanamycin, gentamicin | D | 28,30 |
| armA | Arm | G1405 | N7 | + | Kanamycin, gentamicin | P | 38,39 |
| kamA | Kam | A1408 | N1 | + | Kanamycin, neomycin, apramycin | D | 33 |
| npmA | Pam | A1408 | N1 | + | Kanamycin, neomycin, apramycin | P | 50 |
| cmnU | | ? | ? | + | Capreomycin, kanamycin (?) | D | 16 |

b) Synonyms or former names: ksgA (rsmA); kamA (imrA, fmrT). Representative members of the G1405 and A1408 aminoglycoside resistance methyltransferase families (Fig. 2C) are shown for which the modification site has been experimentally verified. Other members include: Kgm, fmrO, grmA, grmB, grmO, kmr, nbrB, sgm and srm;22-26 Arm, rmtA, rmtB, rmtC and rmtD;40,46 and Kam, kamB and kamC.22,25,28,30-32 The proposed Pam ‘group’ currently contains only npmA. c) Resistance conferred by the absence (–) or presence (+) of methylation. d) Abbreviations: H, the gene encodes a house-keeping methyltransferase; D, the gene is harbored by a drug producer organism; P, the gene has been found in pathogenic bacteria. e) TlyA also methylates nucleotide C1920 in Helix 69 of 23S rRNA.15

Two further examples where loss of house-keeping methylations confers a resistance phenotype are currently known. The rsmG gene encodes a 16S rRNA methyltransferase that methylates N7 of G527 within the ribosome 530 loop. Streptomycin interacts with the rRNA in this region and loss of methylation correlates with a low level of resistance to the antibiotic.14 The final example involves the 2ʹ-O-ribose methyltransferase TlyA from mycobacteria, which modifies C1409 in 16S rRNA and C1920 in 23S rRNA (see also chapter by Vester and Long). Loss of these methylations results in resistance to capreomycin and viomycin, two antibiotics that bind at the ribosome subunit interface, and this observation was used to help define their binding site.15 TlyA is found in mycobacteria such as the tuberculosis (TB) agent Mycobacterium tuberculosis but not in many other bacteria, such as Escherichia coli (E. coli), which are thus innately less susceptible to the effects of these antibiotics. Despite the overlap of the capreomycin/viomycin and aminoglycoside binding sites and the adjacent modification sites that confer resistance to them, there is no cross-resistance from loss of TlyA methylation at C1409. In contrast, as described in the next section, a newly identified capreomycin resistance methyltransferase, CmnU, found in the capreomycin biosynthesis gene cluster does appear to provide some resistance to kanamycin and apramycin.16
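The three loss-of-methylation cases described above can be collected into a small lookup structure. The following Python sketch is only a reading aid: the record layout and the function name `genes_conferring_resistance_on_loss` are hypothetical, and the values are transcribed from Table 1 and the surrounding text rather than from any database.

```python
# Illustrative sketch only. The class, field and function names are
# hypothetical; the values are transcribed from Table 1 and the text.
from typing import List, NamedTuple, Tuple


class HousekeepingMTase(NamedTuple):
    gene: str                            # methyltransferase gene
    site: str                            # 16S rRNA nucleotide (E. coli numbering)
    position: str                        # modified atom or group
    resistance_on_loss: Tuple[str, ...]  # antibiotics resisted when methylation is lost


LOSS_OF_METHYLATION = (
    HousekeepingMTase("ksgA", "A1519", "N6,N6", ("kasugamycin",)),
    HousekeepingMTase("rsmG", "G527", "N7", ("streptomycin",)),
    HousekeepingMTase("tlyA", "C1409", "2'-O-ribose", ("capreomycin", "viomycin")),
)


def genes_conferring_resistance_on_loss(antibiotic: str) -> List[str]:
    """Return house-keeping genes whose loss is associated with resistance."""
    name = antibiotic.lower()
    return [m.gene for m in LOSS_OF_METHYLATION if name in m.resistance_on_loss]
```

Queried with "viomycin" the function returns only tlyA, and with "kanamycin" it returns an empty list, which mirrors the statement above that loss of TlyA methylation gives no cross-resistance to aminoglycosides.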

Resistance to Antibiotics Through Methylation of 16S rRNA

Resistance to antibiotics that target the 30S ribosomal subunit can arise from modifications at several distinct sites in 16S rRNA (Fig. 1A). The only modifications currently known to be catalyzed by bona fide resistance methyltransferases are adenine N1 (m1A) and guanosine N7 (m7G) (Fig. 1B). As specific resistance determinants, these are distinct from the examples discussed so far, where loss of methylation confers a resistance phenotype. The gene, the modification site and type, and the known occurrence of each resistance modification incorporated by a 16S rRNA methyltransferase identified to date are summarized in Table 1.

Pactamycin

The 30S subunit binding antibiotic pactamycin has a broad spectrum of activity, suggestive of a highly conserved binding site within 16S rRNA. In the pactamycin producing organism, Streptomyces pactum, resistance is conferred by the methyltransferase Pct, which modifies the N1 position of A964 (m1A964).17 In initial studies of the ribosome-drug interaction, pactamycin was found to protect G693 and C795 in E. coli 16S rRNA from chemical modification.18,19 Confirmation of the location of the unique pactamycin binding site came a decade later from the crystal structure of a 30S ribosomal subunit-pactamycin complex.20 In the structure, the two distal rings of pactamycin stack upon each other and G693 at the tip of helix 23b of 16S rRNA, while the central ring interacts with C795 and C796 in helix 24a. Despite the availability of this structure, the exact mechanism of resistance is not clear. Only limited interactions are made by the drug with A964,20 but addition of the methyl group must modify the binding site sufficiently to block interaction. In footprinting experiments, A964 was inaccessible to modification with dimethylsulfate in both E. coli and Halobacterium halobium ribosomes, and pactamycin resistance mutations in hairpins 23 and 25 did not change the nucleotide modification pattern in the vicinity of A964.21 Therefore, it was not possible to conclude whether A964 interacts directly with pactamycin or whether its methylation in S. pactum renders ribosomes resistant via an allosteric effect.

Figure 1. Methylations in 16S rRNA associated with antibiotic resistance in bacteria. A) 30S subunit with indicated modification sites in 16S rRNA that confer antibiotic resistance through methyl group addition or methyl group loss (outline font). B) The two base modifications catalyzed by authentic 16S rRNA resistance methyltransferases.

Capreomycin

A capreomycin resistance methyltransferase gene, cmnU, was recently identified in a study of the capreomycin biosynthesis gene cluster from the producer bacterium Saccharothrix mutabilis subsp. capreolus.16 CmnU is a homolog of the Kam 16S rRNA methyltransferases that confer resistance to the aminoglycosides kanamycin and apramycin (see below). Resistance to these aminoglycosides is accomplished by methylation of the N1 position of nucleotide A1408, adjacent to the site of TlyA methylation at C1409 where loss of modification confers resistance to capreomycin and viomycin. Expression of cmnU in E. coli and Streptomyces lividans 1326 resulted in increased resistance to both capreomycin and kanamycin.16 It is possible that the CmnU target nucleotide is also A1408, but this remains to be confirmed, particularly since the level of resistance observed in E. coli was considerably lower than might be expected for an authentic A1408 aminoglycoside-resistance methyltransferase. Since capreomycin is an important second line antibiotic used against multidrug-resistant TB, further detailed investigation of this resistance mechanism is urgently required.

Aminoglycosides—Kgm and Kam Family Methyltransferases

Two distinct groups of 16S rRNA aminoglycoside resistance methyltransferases have been distinguished based upon their target nucleotides (G1405 or A1408) and these enzyme families can be further divided into those found in producer and pathogenic strains (Table 1). Here, we consider methyltransferases from the two sets of bacteria separately as the evolutionary links between these enzyme families are not entirely clear (see next section). A number of terms have been used in the literature to describe these groups and additional confusion may arise from some inconsistency in their use. For example, the names Agr (aminoglycoside resistance) and Arm (aminoglycoside resistance methyltransferase) have been used to name the G1405 methyltransferase from pathogenic bacteria, while the latter was also used as a collective term for these enzymes from both producers and pathogens. As shown in Table 1 and Figure 2A, we suggest and will use a set of unique names to describe these four families of 16S rRNA methyltransferases. Together, these families constitute a unique superfamily of 16S rRNA methyltransferases from aminoglycoside producers and pathogenic bacteria, for which we will use the name Rma (for resistance methyltransferases for aminoglycosides). In aminoglycoside-producing actinomycetes, resistance is conferred by two major mechanisms: enzymatic antibiotic inactivation and methylation of residues G1405 and A1408 within 16S rRNA, which alters the antibiotic binding site and efficiently protects the cell.4 Several Kgm family (G1405) methyltransferase genes have been cloned22-27 and a small number of the encoded enzymes, including Kgm and more recently Sgm, have been partially characterized.
he KgmB methyltransferase from Streptoalloteichus tenebrarius (formerly Streptomyces tenebrarius) was shown to modify the N7 position of G1405 and confers resistance to kanamycin and gentamicin.28 he closely related Sgm, GrmA and other Kgm family methyltransferases are likely to function in the

Antibiotic Resistance in Bacteria through Modiication of Nucleosides in 16S Ribosomal RNA

529

Figure 2. Aminoglycoside antibiotic structure and resistance. A) Groups of aminoglycoside resistance methyltransferase genes from drug producer and pathogenic strains of bacteria. B) The aminoglycoside deoxystreptamine core. Ring I is attached at the 4 position and Rings II and III (if present) at either the 5 or 6 position. C) Gentamicin bound to helix 44 of 16S rRNA. The m7G modification at 1405 is shown (black sphere) causing a steric clash that blocks antibiotic binding.

same way since 30S subunits protected by KgmB cannot be further methylated by any of these enzymes.29 A second aminoglycoside resistance modification, adenine-N1 methylation at A1408 (m1A1408), is catalysed by the Kam methyltransferase family.22,25,28,30-32 Modification of the N1 position at this site by KamA from Streptomyces tenjimariensis results in resistance to kanamycin and apramycin.33

The aminoglycosides are products of actinomycete secondary metabolism that bind to the decoding region of the 30S ribosomal subunit to induce codon misreading or inhibit translocation. These highly potent, broad-spectrum bactericidal agents are a structurally diverse group of compounds that are mostly based upon a 2-deoxystreptamine core (Fig. 2B and Table 2). However, numerous examples of atypical aminoglycosides are also known, including streptomycin, which has a core streptidine ring instead of 2-deoxystreptamine, and hygromycin B, which contains two additional


Table 2. Aminoglycoside antibiotics

| Ring Substitution | Group | Example(s)a |
|---|---|---|
| 4,6-disubstituted deoxystreptamines | Kanamycin | kanamycin, arbekacin, amikacin, dibekacin, tobramycin |
| 4,6-disubstituted deoxystreptamines | Gentamicin | gentamicin, sisomicin, isepamicin, netilmicin |
| 4,5-disubstituted deoxystreptamines | Paromomycin | neomycin, paromomycin, lividomycin A, ribostamycin |
| 4-substituted deoxystreptamines | | apramycin, neamine |
| 4-substituted streptidine | – | streptomycin |
| Others | | hygromycin B, kasugamycin, spectinomycin |

a) The majority of aminoglycosides shown are approved for specific clinical applications; those shown in bold represent examples of drugs approved in UK and US for parenteral use as a second line of defence against various serious infections.

rings fused by ether linkages. When present, the 2-deoxystreptamine core can be mono- or disubstituted with amino sugars at the 4 position only, the 4 and 5 positions, or the 4 and 6 positions (Table 2). Such differences may be directly correlated with the phenotypes that result from the two known methylation resistance determinants at G1405 and A1408. While the m7G1405 and m1A1408 modifications result in high-level resistance to specific combinations of aminoglycoside antibiotics,33 the methyltransferase action spectra do not overlap entirely. The structures of several aminoglycosides bound to the 30S subunit or to A-site model RNAs have been determined, and these provide detailed molecular insights into both the mechanism of antibiotic action and how modifications within their binding pockets confer resistance. In the high-resolution structures of the 4,6-disubstituted aminoglycosides gentamicin C1a34 and tobramycin35 in complex with A-site model RNAs, both drugs make direct contacts to G1405 via their Ring III substituents. Methylation of this nucleotide would thus interfere with antibiotic binding by inducing a steric clash between the modified base and the antibiotic Ring III substituent (Fig. 2C). In contrast, 4,5-disubstituted aminoglycosides, such as paromomycin and neomycin, project their substituent at position 5 at a different angle, directing it away from G1405, so that methylation at this site does not interfere with their binding. The m7G1405 modification is thus only effective against 4,6-disubstituted 2-deoxystreptamines but does confer high-level resistance to both the kanamycin and gentamicin groups.4,23,32 Both groups of disubstituted deoxystreptamines bind so that their Ring I substituents are placed in close proximity to A1408. The methylated nucleotide (m1A1408) is positively charged at neutral pH and can therefore affect drug binding not only by steric hindrance but also by charge repulsion. The m1A1408 modification confers resistance to the kanamycin group and apramycin, but not gentamicin. Curiously, despite the observation of two direct hydrogen bonds made by paromomycin to A1408 in the crystal structure of the antibiotic-30S complex,2 the m1A1408 modification confers no significant resistance to neomycin or paromomycin.36 Presumably, for these drugs additional contacts made by other parts of the molecule must sufficiently compensate for those lost near A1408.
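The correlation between ring substitution and methylation phenotype described above can be written down as a small lookup. The following Python sketch is illustrative only: the dictionary and function names are hypothetical, the drug-to-group assignments follow Table 2, apramycin is given its own label as an assumption (Table 2 lists no group for it), and atypical aminoglycosides are left out. It encodes only the spectra stated in the text and should not be read as a predictive model.

```python
# Illustrative encoding of the resistance spectra described in the text.
# Group labels follow Table 2; "apramycin" as its own label is an assumption.
DRUG_GROUP = {
    "kanamycin": "kanamycin",
    "tobramycin": "kanamycin",
    "amikacin": "kanamycin",
    "gentamicin": "gentamicin",
    "sisomicin": "gentamicin",
    "neomycin": "paromomycin",
    "paromomycin": "paromomycin",
    "apramycin": "apramycin",
}

# Drug groups against which each 16S rRNA modification confers resistance.
PROTECTED_GROUPS = {
    "m7G1405": {"kanamycin", "gentamicin"},  # 4,6-disubstituted only
    "m1A1408": {"kanamycin", "apramycin"},   # not gentamicin, not 4,5-disubstituted
}


def confers_resistance(modification: str, drug: str) -> bool:
    """True if the modification protects against the drug in this simplified table."""
    return DRUG_GROUP[drug] in PROTECTED_GROUPS[modification]


def consistent_modifications(resistant_to, sensitive_to):
    """List the modifications whose spectrum matches an observed phenotype."""
    return [
        mod
        for mod in PROTECTED_GROUPS
        if all(confers_resistance(mod, d) for d in resistant_to)
        and not any(confers_resistance(mod, d) for d in sensitive_to)
    ]


if __name__ == "__main__":
    for drug in ("kanamycin", "gentamicin", "apramycin", "paromomycin"):
        for mod in PROTECTED_GROUPS:
            print(f"{mod} vs {drug}: {confers_resistance(mod, drug)}")
    # A phenotype of gentamicin resistance with apramycin sensitivity
    # is consistent only with methylation at G1405 in this table.
    print(consistent_modifications(["gentamicin"], ["apramycin"]))
```

Run in reverse, the same table shows why a resistance pattern can hint at the methylation site: resistance to gentamicin without resistance to apramycin matches only the m7G1405 spectrum, whereas resistance to kanamycin alone does not discriminate between the two sites.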

Aminoglycoside Resistance 16S rRNA Methyltransferases in Pathogenic Bacteria

The introduction of aminoglycosides into clinical practice and the emergence of antibiotic resistant Gram-negative and Gram-positive bacteria were undoubtedly tightly coupled events.37 The last decade has seen a surge in the identification of 16S rRNA methyltransferases that confer


high-level resistance against a broad spectrum of aminoglycosides isolated from clinical environments. The gene sequence encoding the aminoglycoside resistance methyltransferase (ArmA; GenBank AF550415) was first reported from Citrobacter freundii isolated in a hospital in Poland and subsequently fully characterised in France in 2003 using a second isolate from a different organism, Klebsiella pneumoniae.38 The encoded methyltransferase was later shown to modify N7 of G1405.39 Subsequently, a further four related methyltransferase genes (rmtA, rmtB, rmtC and rmtD) were identified from clinical isolates around the globe. With armA, we group these enzymes together as the Arm family (Table 1 and Fig. 2A) of 16S rRNA methyltransferases.40,41

Activity and Origin of the Arm Family of Methyltransferases

The ArmA methyltransferase confers resistance to all 4,6-disubstituted deoxystreptamines and fortimicin, a resistance phenotype consistent with the identified m7G1405 modification in 16S rRNA. The rmtA and rmtB gene products, found in clinical isolates of Pseudomonas aeruginosa and Serratia marcescens,42,43 share 82% sequence identity and confer high-level resistance to almost all clinically useful aminoglycosides except streptomycin. The methylation site for RmtB was shown to be N7 of G140544 and is almost certainly the same for RmtA based on their sequence similarity and overlap of resistance phenotype. Similarly, Proteus mirabilis and E. coli expressing recombinant RmtC showed high-level resistance to all 4,6-disubstituted but not 4,5-disubstituted deoxystreptamine aminoglycosides nor to streptomycin,45 a resistance profile again consistent with G1405 methylation by RmtC. The methylation site for the final member of the Arm family, RmtD, has also not been formally identified, but the aminoglycoside resistance pattern (resistance to gentamicin but not apramycin) is again suggestive of the 16S rRNA residue G1405 being the target site for methylation. Where precisely these pathogenic methyltransferases originated is an interesting and important question. It is possible that one or more enzymes were transferred from a producer to a pathogen, though direct evidence for this is limited. For example, armA has a very low G+C content (30%), suggesting that its origin is not the high G+C actinomycetes (typically 64-72%). Further, the relatively low sequence identity between some resistance methyltransferases from pathogenic bacteria and those found in aminoglycoside-producing actinomycetes supports the possibility that the two groups of enzymes evolved independently (Fig. 3). 
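The G+C argument rests on a simple calculation, sketched here in Python. The function names are hypothetical and the default range is simply the "typically 64-72%" quoted above for actinomycetes. The sketch shows only how such a percentage is obtained and compared; it is not the analysis used in the cited studies.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def in_actinomycete_range(gc_percent: float, low: float = 64.0, high: float = 72.0) -> bool:
    """True if a G+C percentage falls within the range quoted in the text."""
    return low <= gc_percent <= high


if __name__ == "__main__":
    # Toy sequence invented for illustration; it is not a real gene.
    toy = "ATGAAATTAGCGTTTAAAGATCCATTA"
    value = gc_content(toy)
    print(f"G+C = {value:.1f}%, within actinomycete range: {in_actinomycete_range(value)}")
```

On these numbers a gene with 30% G+C lies far outside the quoted actinomycete range. A value that falls just below the range is harder to interpret on composition alone.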
Since all methyltransferases identified to date in pathogens are associated with transferable elements, it is tempting to speculate that these resistance methyltransferases originated from yet unidentified bacteria through recombination events mediated by transposition. In some cases, however, a closer link to enzymes from producer strains is inferred from sequence similarities. Both rmtA and rmtB have a comparable G+C content (55%) to the actinomycetes and there is an amino acid identity

Figure 3. Phylogenetic analysis of producer and pathogen 16S rRNA resistance methyltransferases. Phylogenetic trees of (A) species harboring 16S rRNA methyltransferases based upon their 16S rDNA sequences and (B) the G1405 16S rRNA methyltransferases (Kgm and Arm families) based upon their amino acid sequence. Pathogenic bacteria (Arm family) are shown in the shaded regions. Maximum likelihood phylogenetic trees were calculated with the program fastDNAml.58 The bar indicates evolutionary change per position.


of around 30% between RmtA and the 16S rRNA methyltransferases GrmB and Sgm from Micromonospora rosea and Micromonospora zionensis respectively.43 This is somewhat greater than the 26% identity with ArmA, leading to speculation that both RmtA and RmtB were transferred to P. aeruginosa and S. marcescens independently from unidentified aminoglycoside-producing species.42 RmtD, isolated from two independent sources, appears to be derived from a more recent common ancestor with RmtA and RmtB, with which it has 40% and 42% identity respectively.46 In contrast, RmtC has similarly low sequence identity with both the other Arm family methyltransferases (27, 29 and 27% identity with RmtA, RmtB and ArmA, respectively) and those from the actinomycetes (