English Pages 359 Year 2022
Methods in Molecular Biology 2397
Francesca Magnani Chiara Marabelli Francesca Paradisi Editors
Enzyme Engineering Methods and Protocols
METHODS IN MOLECULAR BIOLOGY
Series Editor John M. Walker School of Life and Medical Sciences University of Hertfordshire Hatfield, Hertfordshire, UK
For further volumes: http://www.springer.com/series/7651
For over 35 years, biological scientists have come to rely on the research protocols and methodologies in the critically acclaimed Methods in Molecular Biology series. The series was the first to introduce the step-by-step protocols approach that has become the standard in all biomedical protocol publishing. Each protocol is provided in readily reproducible step-by-step fashion, opening with an introductory overview and a list of the materials and reagents needed to complete the experiment, followed by a detailed procedure that is supported with a helpful notes section offering tips and tricks of the trade as well as troubleshooting advice. These hallmark features were introduced by series editor Dr. John Walker and constitute the key ingredient in each and every volume of the Methods in Molecular Biology series. Tested and trusted, comprehensive and reliable, all protocols from the series are indexed in PubMed.
Enzyme Engineering Methods and Protocols
Edited by
Francesca Magnani Department of Biology and Biotechnology, University of Pavia, Pavia, Italy
Chiara Marabelli Department of Biology and Biotechnology, University of Pavia, Pavia, Italy
Francesca Paradisi Department of Chemistry & Biochemistry, University of Bern, Bern, Switzerland
ISSN 1064-3745 ISSN 1940-6029 (electronic) Methods in Molecular Biology ISBN 978-1-0716-1825-7 ISBN 978-1-0716-1826-4 (eBook) https://doi.org/10.1007/978-1-0716-1826-4 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature. The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
Preface
Our times call for the synthesis of new materials together with the development of sustainable industrial processes, and enzymes are central to this mission. Compared to conventional catalysts, enzymes have superior chemo- and stereo-selectivity and have an enormous potential to make new (bio)chemical reactions, reduce the number of processing steps, and allow new strategies for the valorization of otherwise chemically recalcitrant substrates, such as cellulose, lignin, and plastics. Moreover, their milder reaction conditions and the absence of side products enable the development of cost-effective and environmentally friendly industrial processes. Enzymes are such versatile catalysts that they can also be designed and optimized to overcome the limits of their natural isoform (e.g., loss of activity in harsh conditions, expensive materials, and low production yields) and/or to direct the enzyme activity toward any desired unconventional reaction. In the last decade, the power of enzyme engineering has come to the fore, and technology development in the field has boomed. These achievements were celebrated in 2018 with the award of the Nobel Prize in Chemistry to Prof. Frances Arnold for “the directed evolution of enzymes,” and therefore this book is very timely. Enzyme engineering strategies are used to design de novo biocatalysts or to change an enzyme amino acid sequence in order to chisel it into its novel function. This book provides comprehensive methods and protocols about enzyme design for both the novice and the veteran researcher interested in biocatalysis. The chapters are grouped by main topic, starting from methodologies describing library preparation and screening, state-of-the-art techniques in directed evolution and rational design, to examples of immobilization of enzymes on sustainable polymers, and biocatalytic conversions mediated by homogeneous enzymatic preparations or whole cells.
Most chapters follow the classic format of Methods in Molecular Biology: a general introduction is followed by a materials section and then a very clear methodology which will enable any reader to follow it step-by-step through a set of sequential instructions. As is also typical of this book series, the authors always provide helpful tips which will help troubleshoot possible hitches. The editors would like to acknowledge all the authors who have contributed to this book. It is well known that in the year of the SARS-CoV-2 pandemic, academics have been overwhelmed with requests to write reviews and book chapters. Their contribution to this book has been greatly appreciated.
Francesca Magnani, Pavia, Italy
Chiara Marabelli, Pavia, Italy
Francesca Paradisi, Bern, Switzerland
Contents
Preface
Contributors
PART I ENZYME LIBRARIES PREPARATION AND SCREENING
1 Preparation of Soil Metagenome Libraries and Screening for Gene-Specific Amplicons
Luke J. Stevenson, David F. Ackerley, and Jeremy G. Owen
2 Ultrahigh-Throughput Screening of Metagenomic Libraries Using Droplet Microfluidics
Davide Agostino Cecchini, Mercedes Sánchez-Costa, Alejandro H. Orrego, Jesús Fernández-Lucas, and Aurelio Hidalgo
3 Synthetic DNA Libraries for Protein Engineering Toward Process Improvement in Drug Synthesis
Michele Tavanti
PART II DIRECTED EVOLUTION
4 In Silico Prediction Methods for Site-Saturation Mutagenesis
Ge Qu and Zhoutong Sun
5 Recombination of Compatible Substitutions by 2GenReP and InSiReP
Haiyang Cui, Mehdi D. Davari, and Ulrich Schwaneberg
PART III SEMI-RATIONAL AND DE NOVO DESIGN
6 Using the Evolutionary History of Proteins to Engineer Insertion-Deletion Mutants from Robust, Ancestral Templates Using Graphical Representation of Ancestral Sequence Predictions (GRASP)
Connie M. Ross, Gabriel Foley, Mikael Boden, and Elizabeth M. J. Gillam
7 Resurrecting Enzymes by Ancestral Sequence Reconstruction
Maria Laura Mascotti
8 Expression and In Vivo Loading of De Novo Proteins with Tetrapyrrole Cofactors
Paul Curnow and J. L. Ross Anderson
PART IV RATIONAL DESIGN
9 Rational-Design Engineering to Improve Enzyme Thermostability
Vinutsada Pongsupasa, Piyanuch Anuwan, Somchart Maenpuen, and Thanyaporn Wongnate
10 Using Molecular Simulation to Guide Protein Engineering for Biocatalysis in Organic Solvents
Haiyang Cui, Markus Vedder, Ulrich Schwaneberg, and Mehdi D. Davari
11 In Silico Engineering of Enzyme Access Tunnels
Alfonso Gautieri, Federica Rigoldi, Archimede Torretta, Alberto Redaelli, and Emilio Parisini
12 Quantum-Mechanical/Molecular-Mechanical (QM/MM) Simulations for Understanding Enzyme Dynamics
Rimsha Mehmood and Heather J. Kulik
13 Computational Enzyme Design at Zymvol
Emanuele Monza, Victor Gil, and Maria Fatima Lucas
PART V BIOCATALYTIC PROCESS DEVELOPMENT
14 Advanced Enzyme Immobilization Technologies: An Eco-friendly Support, a Polymer-Stabilizing Immobilization Strategy, and an Improved Cofactor Co-immobilization Technique
Ana I. Benítez-Mateos and Francesca Paradisi
15 Chemical Reaction Engineering to Understand Applied Kinetics in Free Enzyme Homogeneous Reactors
Alvaro Lorente-Arevalo, Alberto Garcia-Martin, Miguel Ladero, and Juan M. Bolivar
16 Enzymatic Oxidative Cascade for Oxofunctionalization of Fatty Acids in One-Pot
Somayyeh Gandomkar and Mélanie Hall
17 CRISPR-Cas9 Editing of the Synthesis of Biodegradable Polyesters Polyhydroxyalkanoates (PHA) in Pseudomonas putida KT2440
Si Liu, Tanja Narancic, Chris Davis, and Kevin E. O’Connor
Index
Contributors
DAVID F. ACKERLEY • School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand; Centre for Biodiscovery and Maurice Wilkins Centre for Biodiscovery, Victoria University of Wellington, Wellington, New Zealand
J. L. ROSS ANDERSON • School of Biochemistry, University of Bristol, University Walk, Bristol, UK; BrisSynBio Synthetic Biology Research Centre, Life Sciences Building, University of Bristol, Bristol, UK
PIYANUCH ANUWAN • School of Biomolecular Science and Engineering, Vidyasirimedhi Institute of Science and Technology (VISTEC), Wangchan Valley, Rayong, Thailand
ANA I. BENÍTEZ-MATEOS • Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Bern, Switzerland
MIKAEL BODEN • School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, QLD, Australia
JUAN M. BOLIVAR • Chemical and Materials Engineering Department, Faculty of Chemical Sciences, Complutense University of Madrid, Madrid, Spain
DAVIDE AGOSTINO CECCHINI • Department of Molecular Biology, Center for Molecular Biology “Severo Ochoa” (UAM-CSIC), Universidad Autonoma de Madrid, Madrid, Spain
HAIYANG CUI • Institute of Biotechnology, RWTH Aachen University, Aachen, Germany; DWI-Leibniz Institute for Interactive Materials, Aachen, Germany
PAUL CURNOW • School of Biochemistry, University of Bristol, University Walk, Bristol, UK; BrisSynBio Synthetic Biology Research Centre, Life Sciences Building, University of Bristol, Bristol, UK
MEHDI D. DAVARI • Institute of Biotechnology, RWTH Aachen University, Aachen, Germany
CHRIS DAVIS • UCD Earth Institute and School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin, Ireland; BiOrbic Bioeconomy Research Centre, University College Dublin, Belfield, Dublin, Ireland
JESÚS FERNÁNDEZ-LUCAS • Applied Biotechnology Group, Universidad Europea de Madrid, Madrid, Spain
GABRIEL FOLEY • School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, QLD, Australia
SOMAYYEH GANDOMKAR • Institute of Chemistry, University of Graz, Graz, Austria
ALBERTO GARCIA-MARTIN • Chemical and Materials Engineering Department, Faculty of Chemical Sciences, Complutense University of Madrid, Madrid, Spain
ALFONSO GAUTIERI • Biomolecular Engineering Lab, Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
VICTOR GIL • Zymvol Biomodeling SL, Carrer Roc Boronat 117, Barcelona, Spain
ELIZABETH M. J. GILLAM • School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, QLD, Australia
MÉLANIE HALL • Institute of Chemistry, University of Graz, Graz, Austria; Field of Excellence BioHealth, University of Graz, Graz, Austria
AURELIO HIDALGO • Department of Molecular Biology, Center for Molecular Biology “Severo Ochoa” (UAM-CSIC), Universidad Autonoma de Madrid, Madrid, Spain
HEATHER J. KULIK • Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
MIGUEL LADERO • Chemical and Materials Engineering Department, Faculty of Chemical Sciences, Complutense University of Madrid, Madrid, Spain
SI LIU • UCD Earth Institute and School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin, Ireland; BiOrbic Bioeconomy Research Centre, University College Dublin, Belfield, Dublin, Ireland
ALVARO LORENTE-AREVALO • Chemical and Materials Engineering Department, Faculty of Chemical Sciences, Complutense University of Madrid, Madrid, Spain
MARIA FATIMA LUCAS • Zymvol Biomodeling SL, Carrer Roc Boronat 117, Barcelona, Spain
SOMCHART MAENPUEN • Department of Biochemistry, Faculty of Science, Burapha University, Chonburi, Thailand
MARIA LAURA MASCOTTI • Molecular Enzymology Group, University of Groningen, Groningen, The Netherlands; IMIBIO-SL CONICET, Facultad de Química Bioquímica y Farmacia, Universidad Nacional de San Luis, San Luis, Argentina
RIMSHA MEHMOOD • Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
EMANUELE MONZA • Zymvol Biomodeling SL, Carrer Roc Boronat 117, Barcelona, Spain
TANJA NARANCIC • UCD Earth Institute and School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin, Ireland; BiOrbic Bioeconomy Research Centre, University College Dublin, Belfield, Dublin, Ireland
KEVIN E. O’CONNOR • UCD Earth Institute and School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin, Ireland; BiOrbic Bioeconomy Research Centre, University College Dublin, Belfield, Dublin, Ireland
ALEJANDRO H. ORREGO • Department of Molecular Biology, Center for Molecular Biology “Severo Ochoa” (UAM-CSIC), Universidad Autonoma de Madrid, Madrid, Spain
JEREMY G. OWEN • School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand; Centre for Biodiscovery and Maurice Wilkins Centre for Biodiscovery, Victoria University of Wellington, Wellington, New Zealand
FRANCESCA PARADISI • Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Bern, Switzerland
EMILIO PARISINI • Center for Nano Science and Technology @Polimi, Istituto Italiano di Tecnologia, Milan, Italy; Latvian Institute of Organic Synthesis, Riga, Latvia
VINUTSADA PONGSUPASA • School of Biomolecular Science and Engineering, Vidyasirimedhi Institute of Science and Technology (VISTEC), Wangchan Valley, Rayong, Thailand
GE QU • Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China; National Technology Innovation Center of Synthetic Biology, Tianjin, China
ALBERTO REDAELLI • Biomolecular Engineering Lab, Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
FEDERICA RIGOLDI • Biomolecular Engineering Lab, Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
CONNIE M. ROSS • School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, QLD, Australia
MERCEDES SÁNCHEZ-COSTA • Department of Molecular Biology, Center for Molecular Biology “Severo Ochoa” (UAM-CSIC), Universidad Autonoma de Madrid, Madrid, Spain
ULRICH SCHWANEBERG • Institute of Biotechnology, RWTH Aachen University, Aachen, Germany; DWI Leibniz-Institute for Interactive Materials, Aachen, Germany
LUKE J. STEVENSON • School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand; Centre for Biodiscovery and Maurice Wilkins Centre for Biodiscovery, Victoria University of Wellington, Wellington, New Zealand
ZHOUTONG SUN • Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China; National Technology Innovation Center of Synthetic Biology, Tianjin, China
MICHELE TAVANTI • Early Chemical development Pharmaceutical Sciences, R&D AstraZeneca, Cambridge Biomedical Campus, Cambridge, UK; Synthetic Biochemistry, Medicinal Science and Technology, Pharma R&D GlaxoSmithKline Medicines Research Centre, Stevenage, UK
ARCHIMEDE TORRETTA • Center for Nano Science and Technology @Polimi, Istituto Italiano di Tecnologia, Milan, Italy
MARKUS VEDDER • Lehrstuhl für Biotechnologie, RWTH Aachen University, Aachen, Germany
THANYAPORN WONGNATE • School of Biomolecular Science and Engineering, Vidyasirimedhi Institute of Science and Technology (VISTEC), Wangchan Valley, Rayong, Thailand
Part I Enzyme Libraries Preparation and Screening
Chapter 1
Preparation of Soil Metagenome Libraries and Screening for Gene-Specific Amplicons
Luke J. Stevenson, David F. Ackerley, and Jeremy G. Owen
Abstract
Cosmid libraries constructed from environmental metagenome samples are powerful tools for capturing the genomic diversity of complex microbial communities. The large insert size (35 kb) of such libraries means they are compatible with downstream expression of large biosynthetic gene clusters (BGCs). This allows the discovery of previously undescribed natural products that would be inaccessible using traditional culture-based discovery pipelines. Here we describe methods for the construction of a cosmid metagenome library from a soil sample, and the process of screening that library for individual cosmid clones containing aromatic polyketide BGCs using degenerate primers that target the ketosynthase alpha (KSα) gene.
Key words eDNA, Metagenome library, Natural product gene clusters, Polyketide synthase, Sequence-based screening, Synthetic biology
1 Introduction
The microbial diversity of a soil sample is estimated to be two orders of magnitude greater than can presently be accessed using standard laboratory cultivation techniques [1, 2]. Increasingly, this “uncultivated majority” of microbes is being interrogated by genomic and metagenomic analysis for the discovery of novel enzymes and natural products [3–5]. Cosmid metagenome libraries are an efficient means of preparing environmental DNA (eDNA) for storage and interrogation, as the relatively large insert sizes and high cloning efficiency of cosmid vectors give potential for effective coverage of the soil metagenome. Libraries can be arrayed and pooled into multiple levels of complexity, which can assist in the recovery of sequence types that may be rare within the library [6, 7]. Additionally, the storage of DNA in a replicating library creates a resource for continued inspection and screening by different techniques and for a variety of purposes.
Francesca Magnani, Chiara Marabelli and Francesca Paradisi (eds.), Enzyme Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 2397, https://doi.org/10.1007/978-1-0716-1826-4_1, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022
Type II polyketides originate from iterative biosynthetic systems, wherein acyl/malonyl-CoA monomers are added to a growing polyketide chain, increasing the carbon chain length by 2 carbon units with each cycle [8, 9]. Three core proteins are required for the minimal polyketide biosynthesis: ketosynthase alpha (KSα), ketosynthase beta (KSβ), and an acyl carrier protein (ACP), with additional enzymes acting to decorate the polyketide core [8, 9]. Due to the small number of core genes required for compound production, Type II polyketide biosynthetic gene clusters are attractive targets for screening from metagenome libraries, which are limited in the size of genetic sequence that can be captured on a single cosmid [3, 10, 11]. In this chapter, we describe methods for extracting and preparing clean high molecular weight DNA from a soil sample, and for cloning this purified eDNA into an arrayed cosmid metagenome library. We then describe a PCR amplicon-based screening protocol for the recovery of individual cosmids containing Type II polyketide biosynthetic gene clusters, using degenerate primers that target KSα genes. This same process is, in theory, applicable to any conserved marker gene for which degenerate primers can be designed. Thus, the protocol we describe here can easily be applied to other BGC types, as well as individual enzymes of interest.
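The chain-length arithmetic of iterative Type II polyketide assembly can be illustrated with a short calculation. This is a minimal sketch only: the function name is hypothetical, and the 2-carbon starter unit and seven extension cycles in the example are illustrative assumptions rather than values taken from this chapter.

```python
def polyketide_backbone_carbons(starter_carbons: int, extension_cycles: int) -> int:
    """Backbone carbon count after iterative extension.

    Each extension cycle adds 2 carbon units to the growing chain,
    as described for Type II polyketide biosynthesis above.
    """
    if starter_carbons < 0 or extension_cycles < 0:
        raise ValueError("carbon and cycle counts must be non-negative")
    return starter_carbons + 2 * extension_cycles


# Illustrative assumption: a 2-carbon starter unit extended through 7 cycles.
example_chain_length = polyketide_backbone_carbons(starter_carbons=2, extension_cycles=7)
print(f"Backbone length: {example_chain_length} carbons")
```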
2 Materials
Prepare all buffers and culture media in double distilled water.
2.1 Soil DNA Extraction and Preparation
1. Lysis buffer: 100 mM Tris–HCl pH 8.0, 100 mM Na2EDTA, 1.5 M NaCl, 1% (w/v) cetyl trimethyl ammonium bromide.
2. Sodium dodecyl sulfate: 20% (w/v) solution in water.
3. Water bath preheated to 70 °C.
4. Centrifuge, ideally with rotor and bottles capable of holding 1 L samples and achieving 4500 rcf. Smaller tubes can be used in replicate; however, this will increase the handling required to process each sample. Precool to 4 °C.
5. Isopropanol: 100%.
6. Ethanol: 70% (v/v) in water.
7. TE buffer: 10 mM Tris–HCl, 1 mM Na2EDTA, pH 8.0.
8. TAE buffer: 40 mM Tris, 20 mM acetic acid, 1 mM Na2EDTA, pH 8.3.
9. Agarose.
10. SYBR safe DNA gel stain (Invitrogen).
11. Electroelution unit: e.g., CBS Scientific Electro-eluter/concentrator ECU-040-20.
12. 30 kDa molecular weight cut-off column centrifugal concentrator: e.g., Amicon® Ultra-15 Centrifugal Filter Unit.
13. Method for accurate DNA quantification: e.g., Qubit (Invitrogen).
14. DNA end repair enzyme mix: e.g., NEBNext End Repair Module (NEB), End-It DNA End-Repair Kit (Lucigen).
15. Sodium acetate: 3 M in water, pH 5.2.
2.2 Cosmid Library Construction
1. Cosmid vector: e.g., pWEB, pWEB::tnc.
2. SmaI or other blunt-cutting restriction enzyme that linearizes vector at an appropriate cut site.
3. Shrimp alkaline phosphatase enzyme, e.g., rSAP (NEB).
4. T4 DNA ligase.
5. Phage packaging extracts: either produced in-house using a protocol (see Note 1) or commercially prepared, e.g., Lucigen MaxPlax, Agilent SuperCos.
6. Phage dilution buffer: 10 mM Tris–HCl pH 8.3, 100 mM NaCl, 10 mM MgCl2, filter sterilized.
7. Chloroform.
8. Lysogeny broth (LB), autoclaved.
9. LB agar, autoclaved.
10. 1 M magnesium sulfate: dissolved in water, autoclaved.
11. Antibiotics appropriate for vector selection: e.g., ampicillin/kanamycin for pWEB, ampicillin/chloramphenicol for pWEB::tnc, filter sterilized as appropriate.
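Buffers such as the phage dilution buffer above are often assembled from concentrated stocks using C1V1 = C2V2. The following is a minimal sketch of that calculation; the stock concentrations (1 M Tris–HCl, 5 M NaCl, 1 M MgCl2), the 100 mL batch size, and the function name are assumptions chosen for illustration and are not specified in this chapter.

```python
def stock_volumes_ml(final_volume_ml, recipe_mM, stocks_mM):
    """Volume of each stock (mL) needed to reach the final concentrations (C1V1 = C2V2)."""
    volumes = {
        name: final_volume_ml * final_mM / stocks_mM[name]
        for name, final_mM in recipe_mM.items()
    }
    water = final_volume_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for the requested final volume")
    volumes["water"] = water
    return volumes


# Final concentrations from the Materials list (phage dilution buffer).
PHAGE_DILUTION_BUFFER_mM = {"Tris-HCl pH 8.3": 10, "NaCl": 100, "MgCl2": 10}
# Assumed stock concentrations (hypothetical; adjust to the stocks on hand).
ASSUMED_STOCKS_mM = {"Tris-HCl pH 8.3": 1000, "NaCl": 5000, "MgCl2": 1000}

volumes_ml = stock_volumes_ml(100, PHAGE_DILUTION_BUFFER_mM, ASSUMED_STOCKS_mM)
for component, volume in volumes_ml.items():
    print(f"{component}: {volume:.1f} mL")
```

The buffer would still need to be filter sterilized after mixing, as stated in the Materials list.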
2.3 Cosmid Library Amplicon Screening
1. DNA polymerase mix suitable for colony PCR.
2. Degenerate DNA primers targeting conserved motifs within genes of interest: For KSα screening, KSα_fwd 5′-TSGCSTGCTTGGAYGCSATC-3′, KSα_rev 5′-TGGAANCCGCCGAABCCGCT-3′ [12].
3. 384-well culture plates, sterile.
4. LB, autoclaved.
5. LB agar, autoclaved.
6. Antibiotics appropriate for vector selection: e.g., ampicillin/kanamycin for pWEB, ampicillin/chloramphenicol for pWEB::tnc, filter sterilized as appropriate.
7. DNA agarose gel extraction kit: e.g., QIAquick (Qiagen) or Monarch (NEB).
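To see what the degenerate KSα primers listed above represent, each ambiguity code can be expanded into the bases it stands for. The sketch below uses the standard IUPAC nucleotide ambiguity codes; the function and constant names are illustrative, and the primer strings are the sequences given above with spacing removed.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list:
    """List every non-degenerate sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[base] for base in primer))]


KS_ALPHA_FWD = "TSGCSTGCTTGGAYGCSATC"
KS_ALPHA_REV = "TGGAANCCGCCGAABCCGCT"

for name, primer in (("KSα_fwd", KS_ALPHA_FWD), ("KSα_rev", KS_ALPHA_REV)):
    print(f"{name}: {len(primer)} nt, {degeneracy(primer)}-fold degenerate")
```

The same expansion can be applied to primers designed for any other conserved marker gene, which is useful when checking whether a primer pool is small enough to amplify efficiently.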
3 Methods
3.1 Extraction of Soil Environmental DNA
1. Collect soil samples of around 1 kg. DNA yields from very sandy samples are generally lower than those from samples that are rich in organic matter. Remove contaminating material (e.g., roots, sticks, stones) prior to further processing.
2. Preheat lysis buffer and SDS solution to 70 °C in a water bath.
3. Add 250 g of soil to a 1 L centrifuge bottle, followed by 270 mL of preheated lysis buffer. Mix briefly to ensure all soil is thoroughly wetted. Return to 70 °C water bath for a few minutes (see Note 2).
4. Add 30 mL of 20% SDS solution to the bottle, mix briefly by inversion and return to the water bath (see Note 3).
5. Incubate for 2 h with gentle inversion every 30 min (see Note 4).
6. Rapidly cool the sample by placing the bottle into an ice/water bucket for 30 min, inverting the bottle to mix once during this incubation. A pale precipitate may appear.
7. Centrifuge the sample at 4500 rcf for 35 min at 4 °C.
8. Recover the clarified supernatant from the soil/precipitate pellet. Measure the supernatant volume in a measuring cylinder and transfer to a clean 1 L centrifuge tube. Return the sample to room temperature by brief incubation in the warm water bath (see Note 5).
9. Add 0.7 vol. 100% isopropanol to the sample and mix by gentle inversion. Incubate at room temperature for 30 min.
10. Centrifuge the sample at 4500 rcf for 35 min at 4 °C.
11. Check for the presence of a DNA pellet and discard the supernatant (see Note 6).
12. Add 100 mL ice cold 70% ethanol to the bottle, and wash over the bottle walls and DNA pellet.
13. Centrifuge the sample at 4500 rcf for 10 min at 4 °C.
14. Discard the supernatant, briefly centrifuge to collect remaining ethanol, and remove this by pipette.
15. Briefly air-dry pellet for no more than 15 min (see Note 7).
16. Add a minimal volume of TE buffer to cover the DNA pellet (e.g., 5 mL is often sufficient). Incubate at room temperature overnight to resuspend.
17. The following morning, ensure the DNA is fully resuspended by gently swirling the sample around the centrifuge bottle. Some samples may require additional TE, or brief incubation at 50 °C to fully resuspend the eDNA pellet. Transfer the fully resuspended sample to a 15 mL centrifuge tube for storage at 4 °C.
Fig. 1 Illustrative agarose gel electrophoresis of thirteen soil eDNA extracts. Lane 1 contains the lambda-HindIII digest marker, with the top marker band at 23 kb. Lanes 2–14 contain eDNA extracts of soils 1–13, respectively. Soil 4 (lane 5) has a significant amount of high molecular weight eDNA above the 23 kb band of the lambda-HindIII marker, and is the optimal choice for metagenome library preparation. Of the remaining twelve samples, only soils 7 (lane 8) and 12 (lane 13) would be likely to yield a library of sufficient quality to be useful
18. To assess the concentration and quality of the eDNA sample, dilute 1 μL into 5 μL water and run alongside a lambda-HindIII marker on a 0.8% agarose gel at 100 V for 2 h. Post-electrophoresis, stain the gel overnight in TAE solution amended with 1× SYBR safe prior to visualization in a UV trans-illuminator. Good quality eDNA extracts will yield a bright band above the 23 kb band of the lambda-HindIII marker, with minimal shearing below (Fig. 1).
3.2 Preparation of Environmental DNA
1. Prepare a 0.8% agarose TAE gel in a cast at least 10 cm long, and with a comb sufficiently large to accommodate 1.5 mL volume of sample (see Note 8).
2. Gently mix 1.2 mL of crude eDNA with an appropriate volume of loading dye, and load into the large well of the agarose gel. Load lambda-HindIII marker alongside, and run the gel for 2 h at 80 V.
3. Rinse any dirt out of the wells of the gel, and change the tank run buffer to fresh TAE. Run the gel overnight at 18 V.
4. Discard the run buffer, and post-electrophoresis stain the agarose gel using 1× SYBR safe DNA gel stain in TAE buffer for 30 min.
5. Visualize the agarose gel under blue light (UV will damage the DNA) and excise the high molecular weight band above the 23 kb band of the lambda-HindIII marker (see Note 9).
6. Collect the agarose gel slice containing the high molecular weight eDNA, and load into an electroelution unit in TAE buffer. Run the electroelution overnight at 100 V (see Note 10).
7. Collect the buffer from the elution chamber, and concentrate using a 30 kDa molecular weight cut-off column centrifugal concentrator to a volume of 500 μL.
8. Add 15 mL TE buffer to the concentrated sample to wash the DNA and replace the buffer.
9. Concentrate the sample and repeat the wash step once more, before finally concentrating to a volume of 150 μL.
10. Determine the DNA concentration of the sample using a fluorometric method (e.g., Qubit) and run a 2 μL sample on a 0.8% agarose gel alongside the lambda-HindIII marker to check size distribution and quality (see Note 11).
11. End repair 5 μg of eDNA to achieve blunt and phosphorylated DNA using a commercial kit such as NEBNext End Repair Module (NEB) or End-It DNA End-Repair Kit (Lucigen) according to the manufacturer’s instructions.
12. Recover DNA from the end-repair reaction by isopropanol precipitation, adding sodium acetate to 0.3 M and 0.7 volume isopropanol. Incubate for 30 min at room temperature, then centrifuge at 17,000 rcf for 30 min at 4 °C.
13. Discard supernatant, and wash DNA pellet with 1 mL ice cold 70% ethanol, then centrifuge at 17,000 rcf for 10 min at 4 °C.
14. Discard supernatant, and briefly centrifuge to collect the remaining ethanol, then remove this using a pipette. Briefly air-dry the pellet at room temperature (20 min), then add 25 μL TE. Resuspend the DNA at room temperature overnight.
15. Determine the DNA concentration of the sample as in step 10.
3.3 Cosmid Library Construction
1. To prepare the cosmid vector (e.g., pWEB or pWEB::tnc), digest with a single-site blunt-cutting restriction enzyme such as SmaI and dephosphorylate using shrimp alkaline phosphatase. Heat inactivate all enzymes, then purify the DNA vector by spin column or isopropanol precipitation (see Note 12).
2. Ligate 125 ng of the end repaired eDNA with 250 ng of prepared cosmid vector using Blunt T/A Ligase Master Mix (NEB) or Quick Ligation Kit (NEB) in a final volume of 10 μL. Ligation reactions can be run overnight and should not be heat inactivated. Ligation reactions and all downstream stages of library construction can be scaled up in linear fashion as required.
3. Inoculate 10 mL LB amended with 10 mM MgSO4 with the desired E. coli host strain and culture overnight at 37 °C, 200 rpm.
4. The next day, inoculate 10 mL fresh LB amended with 10 mM MgSO4 with 0.01 vol of the overnight culture and incubate at 37 °C, 200 rpm until the OD600 reaches 0.6. Place the culture on ice.
5. Quickly thaw an aliquot of cosmid phage packaging extract. Add a minimum of 12.5 μL of packaging extract to each ligation reaction and incubate at 30 °C in a water bath for 90 min (see Note 13).
6. Repeat step 5, adding the same volume of packaging extract to each reaction again, and incubate at 30 °C in a water bath for an additional 90 min.
7. Add 250 μL of phage dilution buffer to each packaging reaction, mixing gently, then add 7 μL of chloroform to each tube. Invert the tube several times, then centrifuge briefly at [...]
[...] 200 μL, 96 sterile 15 mL centrifuge tubes can be used. Incubate at 37 °C, 200 rpm for 75 min.
10. Add LB supplemented with cosmid selection antibiotics (e.g., ampicillin/kanamycin for pWEB, ampicillin/chloramphenicol for pWEB::tnc) to each library well, at a volume appropriate for the culture vessel. For each of three randomly selected library wells, plate 100 μL of undiluted, and 100 μL of a 1:100 dilution, onto distinct plates of LB agar containing appropriate selection antibiotics. Incubate the liquid cultures at 37 °C, 200 rpm, and the agar plates at 37 °C overnight.
11. In the morning, count the colonies on each of the agar plates and use these values to determine the number of unique clones per library well (see Note 14).
12. Prepare glycerol stocks for each library well, as well as row and plate pools for each library plate constructed (Fig. 2). The remaining cultures should be used to isolate cosmid library pools (well, row, and plate pools) via miniprep as an additional archive (see Note 15).
Fig. 2 Metagenome library arrayed across 96 wells. After overnight culture, samples of library wells are pooled by row. Samples of row pools are then pooled to give an overall plate pool. These three levels of library complexity greatly assist in cataloging and downstream screening efforts
13. Repeat steps 1–11 to generate additional metagenome library “plates” until sufficient diversity is achieved to represent the desired coverage of the soil metagenome. Cosmid cloning efficiency should be constant with the same eDNA, vector, and phage extract preparations; the steps in this section can therefore be scaled as appropriate to achieve the desired number of unique clones per library well/plate (see Note 16).

3.4 Amplicon Screening of Metagenome Library
The full workflow is summarized in Fig. 3.
1. Using the appropriate screening PCR primers, set up one reaction for each metagenome library row pool, using the isolated miniprep cosmid samples as template. For positive control reactions, use either isolated genomic DNA from a species with known Type II PKS biosynthetic gene clusters (e.g., Streptomyces coelicolor), or a plasmid/cosmid containing the target gene type (see Note 17).
2. Assess the products of each PCR screen via agarose gel electrophoresis, recording each row pool that returns a positive band at the appropriate size (around 600 bp for the KSα primer pair recommended here).
3. For each row pool sample that returned a positive result, repeat the PCR screen on each individual well miniprep that contributed to that row pool.
4. Gel extract the target bands from each positive PCR screen result. Sanger sequencing of these amplicons can confirm the presence of target KSα gene(s) and permit preliminary bioinformatic analyses to infer the novelty of each sequence as well as potential phylogenetic relationships to known biosynthetic gene clusters [13, 14].
Fig. 3 Methodology for PCR amplicon screening of array metagenome library. First, row pools of each metagenome library plate are screened with preferred degenerate PCR primer pairs. Positive result pools are then rescreened at the individual well level. Library well glycerol stocks are then cultured overnight, and diluted according to the level of library complexity achieved for that library plate, and titered across a 384-well culture plate. Post culture, row and column pools are generated for the 384-well dilution plate, and PCR screened for the well(s) containing the hit cosmid. This process is repeated until a culture containing 10 unique cosmid clones is recovered, which can then be streaked on selective agar. Individual colonies are then PCR screened, to find the library isolate containing the target cosmid
5. Set up an overnight LB culture for each library well that returned a positive result.
6. The next morning, check that each overnight culture contains a target clone by PCR, using 1 μL of overnight culture as template and following the same PCR setup as previously described.
7. From each overnight culture, subculture 1 mL into 10 mL fresh LB and incubate for 2 h at 37 °C, 200 rpm to generate a fresh day culture. Measure the OD600 of this day culture and from this determine the concentration of cells/mL (see Note 18).
8. Dilute the day culture and aliquot 50 μL into each well of a sterile 384-well culture plate, such that each well will contain 1/100 of the diversity of the starting culture; e.g., if the original library well diversity was 10,000 unique cosmid clones, then each well of the new 384-well plate should contain 100 unique cosmid clones. In this example, the culture should be diluted to 2000 cells/mL to achieve 100 cells in each 50 μL aliquot.
9. Cover the plate (or tubes) and incubate at 37 °C overnight, 200 rpm.
10. The next morning, construct both column pools and row pools for each 384-well plate by taking 5 μL samples from each well in a column or row and pooling them together. Ensure the pooled samples are well mixed.
11. Use 1 μL of the row/column pools as template for PCR screening. After the PCR reactions are complete, assess the products of each PCR screen by agarose gel electrophoresis, recording each row/column pool that returns a positive band of the appropriate size.
12. Where positive results for a row and column pool intersect on the 384-well plate, use the corresponding well to seed a fresh overnight culture. In cases where multiple positive hits are found in the row/column pools, repeat the PCR screen on the intersecting wells to determine which wells are the true positive hits.
13. Repeat steps 6–11 as many times as required until a culture with a diversity of 10 unique cosmid clones is found to return a positive result. Once a positive result is found in a well of sufficiently low diversity, transfer 5 μL to an agar plate and streak to enable isolation of single colonies.
14. The following morning, screen individual colonies via colony PCR to identify individual cosmid clones containing the gene of interest.
15. Culture positive result colonies overnight, archive as glycerol stocks, and isolate cosmids via miniprep.
16. Isolated cosmids can then be fully sequenced using the preferred NGS platform for further bioinformatic analysis (see Note 19).
17. Sequencing may reveal that the isolated cosmid contains only a partial biosynthetic gene cluster. By designing new screening primers based on the recovered cosmid insert edge sequences, steps 1–15 can be repeated to recover additional cosmids from the metagenome library that contain the remainder of the target gene cluster.
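The dilution arithmetic in step 8 and the row/column deconvolution in step 12 can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the published protocol: the function names, the A–P/1–24 plate labeling, the day-culture density, and the example gel results are all assumptions made for the example.

```python
from itertools import product
from string import ascii_uppercase

ROWS = tuple(ascii_uppercase[:16])   # A-P: 16 rows of a 384-well plate (assumed labeling)
COLS = tuple(range(1, 25))           # 1-24: 24 columns


def target_cells_per_ml(clones_per_well: float, aliquot_ul: float = 50.0) -> float:
    """Cell density at which one aliquot carries the desired number of clones."""
    return clones_per_well * 1000.0 / aliquot_ul


def dilution_fold(measured_cells_per_ml: float, target: float) -> float:
    """Fold dilution that brings the day culture down to the target density."""
    return measured_cells_per_ml / target


def candidate_wells(positive_rows, positive_cols):
    """Wells lying at the intersection of positive row pools and column pools."""
    rows, cols = sorted(positive_rows), sorted(positive_cols)
    if any(r not in ROWS for r in rows) or any(c not in COLS for c in cols):
        raise ValueError("row or column label outside a 384-well plate")
    return [f"{r}{c}" for r, c in product(rows, cols)]


# Step 8 example: 10,000 clones in the library well, 1/100 of that per new well
clones_per_well = 10_000 / 100
density = target_cells_per_ml(clones_per_well)    # cells/mL needed in the diluted culture
fold = dilution_fold(2.0e8, density)              # hypothetical day culture at 2e8 cells/mL

# Step 12 example (hypothetical gel results): two positive rows, two positive columns
wells = candidate_wells({"C", "K"}, {7, 19})
print(density, fold, wells)
```

With two positive rows and two positive columns, four wells are candidates but as few as two may actually hold the target, which is why step 12 asks for the intersecting wells to be rescreened individually.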
4 Notes
1. Phage packaging extracts can be purchased from a commercial supplier or produced in-house for significant cost saving. We use the method described by Winn and Norris [15] to produce phage packaging extracts that have similar or greater cosmid cloning efficiency than those commercially supplied. Note that the two phage packaging extracts generated in the Winn and Norris protocol are stored at −80 °C in separate aliquots, to be thawed and combined immediately prior to use.
2. Very dry soil can sometimes absorb a large amount of buffer. If this happens, add additional lysis buffer to ensure the soil sample is submerged. Make sure to note this additional volume when balancing tubes in later centrifuge steps.
3. If the soil sample/lysis buffer mixture and SDS solution are not heated to the 70 °C water bath temperature prior to addition, SDS will precipitate out upon combination. This can sometimes be resolved by heating and gentle mixing; however, vigorous mixing should be avoided to minimize the risk of shearing DNA.
4. High molecular weight DNA is required for later steps in this protocol, so care must be taken throughout to avoid shearing DNA. Gentle inversion/rocking of the tubes is sufficient for mixing and avoids the shearing forces that can occur with vigorous shaking.
5. The white precipitate can be difficult to avoid in some samples if it has not fully pelleted, or is particularly soft. In instances where a small amount of this carries over into the fresh tube, it will likely dissolve upon addition of isopropanol and should not have a major effect on downstream processing. If the difference in temperature between the clarified soil extract and isopropanol is too great, the two solvents will not mix easily, and the DNA in the extract may not precipitate.
6. The DNA pellet often appears dark brown, depending on the clarified soil extract color, and may have collected other material from the extract as it precipitated. If a DNA pellet does not appear, incubate on ice for 30 min and repeat the centrifugation step. If there is still no DNA pellet after this, it is unlikely that the sample contains sufficient DNA to construct a metagenome library, and it should be discarded.
7. Overdried DNA can be difficult to resuspend and may not go into solution at all. The DNA pellet should still be slightly wet, but no longer smell of ethanol.
8. Larger wells can be prepared by taping two or more teeth of a comb together, or by affixing a band of plastic to the back of
the comb to create a wider well. Care should be taken when removing the comb, as low percentage agarose gels are soft and fragile. The amount of high molecular weight eDNA that can be purified is limited by the volume of crude eDNA run into each gel, so maximizing this volume is important.
9. The cosmid cloning process is size limited to inserts >30 kb, so extraction of eDNA of lower molecular weight will inhibit the success of downstream cloning and packaging reactions.
10. We use a specialized electroelution device for recovery of high molecular weight DNA from agarose gel slices; however, similar results can be obtained by electroelution in dialysis tubing, or other DNA gel purification techniques. With any electroelution method, it is important to check the dialysis membrane setup for leaks at any connection/opening points prior to adding the agarose gel slice or running the electrical current. Any leaks will result in loss of DNA, so care should be taken with equipment setup. Likewise, care should be taken if using other gel extraction protocols, as some techniques (including spin columns) can cause shearing of high molecular weight DNA, rendering it unsuitable for downstream cosmid cloning.
11. We recommend the use of fluorometric DNA quantification such as a Qubit (Invitrogen). An absorbance-based method such as NanoDrop can also be used; however, in our experience, such methods produce less accurate results.
12. We routinely prepare large batches of cosmid vector by digesting 40 μg pWEB::tnc with 100 U SmaI in 500 μL volume at 25 °C overnight, followed by a further 60 U of SmaI in the morning for 2 h. To this batch we then add 8 U of rSAP, and incubate at 37 °C for 2 h, followed by heat inactivation at 65 °C for 20 min. Over-digestion is recommended to avoid empty vector background in libraries. The success of this can be determined prior to library construction by transforming competent E. coli with a sample of the prepared vector and plating on LB agar with appropriate antibiotic selection. Colonies that emerge are the result of uncut vector persisting in the preparation, and may indicate that further digestion is required.
13. Packaging extracts are best used fresh after thawing, so thaw only as much as required. The amount of packaging extract added per ligation reaction can be increased to improve the packaging efficiency, giving greater numbers of resulting cosmid clones per reaction; however, this will significantly increase the cost of the library construction procedure (when using commercial packaging extracts). Note that if using packaging extracts made in-house, the two phage packaging extracts generated using the Winn and Norris protocol are stored at −80 °C in separate aliquots, to be thawed and combined immediately prior to use.
14. Use the number of colonies on each agar plate to calculate the number of unique cosmid clones per volume of culture plated, and from this the number of unique clones per well. From these values, a representative measure of cosmid diversity across all library wells/library plate can be calculated. These values are important for any subsequent analysis of the metagenome libraries to ensure sufficient screening and/or sequencing depth is achieved. These values also determine the number of dilution levels/screening rounds for amplicon screening in this chapter (Subheading 3.4).
15. Isolating the cosmid pools from each individual library well is a labor-intensive exercise, but is very useful both as a backup form of metagenome library storage and as a resource for screening and/or sequencing of library pools. Effective PCR amplicon-based screening (Subheading 3.4) can be difficult in very diverse cultures, hence diluted miniprep templates of individual library wells are a preferable PCR template for initial screening.
16. Generally, we have found that a reasonable coverage of a soil metagenome can be achieved with cosmid metagenome libraries of >10 million unique cosmid clones. However, different environmental samples vary greatly in microbial diversity. We recommend storing additional eDNA extracts for any important samples, to permit further cosmid library generation at a later date, should the metagenome library already generated be found to have insufficient coverage. This is particularly important if overlapping cosmids are required to complete a large biosynthetic gene cluster, as metagenome libraries with insufficient coverage may not contain redundant representation of each genome sequence.
17. To achieve consistent results, we recommend setting up all PCRs in a 96-well ice block, from a master stock containing all reaction components except template. Always include both positive controls and no-template negative control reactions. PCRs should be placed directly into a preheated thermocycler (95 °C) to initiate the PCR protocol.
18. Correlating OD600 absorbance values to viable cell concentration is only accurate with fresh cultures at similar growth points; it is important to be consistent in the subculture volume and the culture length of the day culture. An OD600 of 1.0 in a fresh culture is roughly 1.0 × 10^8 E. coli cells/mL; however, this should be determined empirically for each laboratory environment by making serial dilutions of fresh cultures in triplicate, plating 100 μL volumes onto agar plates, counting the colonies that emerge, and correlating these counts to the viable cell numbers in culture and their absorbance value. For
example, if an average of 200 colonies are found after plating 100 μL of a 1/100,000 culture dilution for OD600 = 0.800, then for subsequent cultures you could assume an OD600 of 1.0 is 2.50 × 10^8 E. coli cells/mL.
19. Pools of at least 50 cosmid clones can be sequenced in a single library; however, assembly artifacts may occur if highly similar gene clusters are sequenced in the same pool. We have found that PE150 bp Illumina sequencing to a depth of 50-fold coverage using TruSeq PCR libraries works well for this process. Assembly of reads can be achieved using a de Bruijn graph-based assembler (e.g., SPAdes) [16]. Prior to assembly, sequencing reads should be quality filtered and adapter trimmed. Reads mapping to the cosmid vector sequence (e.g., pWEB, pWEB::tnc) should also be removed.
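The plate-count arithmetic behind Notes 14 and 18 can be written out as a short Python sketch. It is offered only as an illustration under stated assumptions: the function names are invented for the example, the OD600-to-cell relationship is taken to be linear over the range used, and the 1 mL well volume and the 150-colony count in the Note 14 example are hypothetical values, not figures from the protocol.

```python
def cfu_per_ml(colonies: float, plated_ul: float, dilution: float) -> float:
    """Viable cells per mL of undiluted culture, from a single plate count."""
    return colonies * dilution * 1000.0 / plated_ul


def clones_per_well(colonies, plated_ul, dilution, well_volume_ml):
    """Note 14: unique clones in a library well, assuming each colony plated
    directly after antibiotic addition represents one independent clone."""
    return cfu_per_ml(colonies, plated_ul, dilution) * well_volume_ml


def cells_per_ml_at_od1(colonies, plated_ul, dilution, od600):
    """Note 18: cells/mL corresponding to OD600 = 1.0 (linearity assumed)."""
    return cfu_per_ml(colonies, plated_ul, dilution) / od600


# Note 18 worked example: 200 colonies from 100 uL of a 1/100,000 dilution at OD600 0.800
calibration = cells_per_ml_at_od1(200, 100.0, 100_000, 0.800)
print(f"OD600 1.0 ~ {calibration:.2e} cells/mL")

# Note 14, hypothetical numbers: 150 colonies from 100 uL of a 1:100 dilution, 1 mL well
diversity = clones_per_well(150, 100.0, 100, well_volume_ml=1.0)
print(f"~{diversity:,.0f} unique clones in the well")
```

The first example reproduces the 2.50 × 10^8 cells/mL figure given in Note 18; averaging the three randomly chosen wells from step 10 would give the per-plate estimate described in Note 14.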
Acknowledgments
This work was supported by the Health Research Council of New Zealand (contract 16/172 to D.F.A. and J.G.O.) and the New Zealand Ministry of Business, Innovation and Employment (Smart Ideas contract RTVU1908 to J.G.O.). J.G.O. was additionally supported by a Rutherford Discovery Fellowship awarded by the Royal Society of New Zealand, and L.J.S. by a Maurice Wilkins Centre Doctoral Scholarship. We thank Drs Magnani, Marabelli, and Paradisi for all their hard work editing this volume and their kind invitation to contribute a chapter.

References
1. Torsvik V, Daae FL, Sandaa RA, Øvreås L (1998) Novel techniques for analysing microbial diversity in natural and perturbed environments. J Biotechnol 64:53–62
2. Rappé MS, Giovannoni SJ (2003) The uncultured microbial majority. Annu Rev Microbiol 57:369–394
3. Katz M, Hover BM, Brady SF (2016) Culture-independent discovery of natural products from soil metagenomes. J Ind Microbiol Biotechnol 43:129–141
4. Hover BM, Kim SH, Katz M, Charlop-Powers Z, Owen JG, Ternei MA, Maniko J, Estrela AB, Molina H, Park S, Perlin DS, Brady SF (2018) Culture-independent discovery of the malacidins as calcium-dependent antibiotics with activity against multidrug-resistant gram-positive pathogens. Nat Microbiol 3:415–422
5. Stevenson LJ, Owen JG, Ackerley DF (2019) Metagenome driven discovery of nonribosomal peptides. ACS Chem Biol 14:2115–2126
6. Owen JG, Reddy BVB, Ternei MA, Charlop-Powers Z, Calle PY, Kim JH, Brady SF (2013) Mapping gene clusters within arrayed metagenomic libraries to expand the structural diversity of biomedically relevant natural products. Proc Natl Acad Sci U S A 110:11797–11802
7. Owen JG, Charlop-Powers Z, Smith AG, Ternei MA, Calle PY, Reddy BVB, Montiel D, Brady SF (2015) Multiplexed metagenome mining using short DNA sequence tags facilitates targeted discovery of epoxyketone proteasome inhibitors. Proc Natl Acad Sci U S A 112:4221–4226
8. Hertweck C, Luzhetskyy A, Rebets Y, Bechthold A (2007) Type II polyketide synthases: gaining a deeper insight into enzymatic teamwork. Nat Prod Rep 24:162–190
9. Hertweck C (2009) The biosynthetic logic of polyketide diversity. Angew Chem Int Ed 48:4688–4716
10. King RW, Bauer JD, Brady SF (2009) An environmental DNA-derived type II polyketide biosynthetic pathway encodes the biosynthesis of the pentacyclic polyketide erdacin. Angew Chem Int Ed 48:6257–6261
11. Feng Z, Kallifidas D, Brady SF (2011) Functional analysis of environmental DNA-derived type II polyketide synthases reveals structurally diverse secondary metabolites. Proc Natl Acad Sci U S A 108:12629–12634
12. Metsä-Ketelä M, Salo V, Halo L, Hautala A, Hakala J, Mäntsälä P, Ylihonko K (1999) An efficient approach for screening minimal PKS genes from Streptomyces. FEMS Microbiol Lett 180:1–6
13. Kang HS, Brady SF (2014) Mining soil metagenomes to better understand the evolution of natural product structural diversity: pentangular polyphenols as a case study. J Am Chem Soc 136:18111–18119
14. Kang HS (2017) Phylogeny-guided (meta)genome mining approach for the targeted discovery of new microbial natural products. J Ind Microbiol Biotechnol 44:285–293
15. Winn R, Norris M (2010) Analysis of mutations in λ transgenic medaka using the cII mutation assay. In: Techniques in aquatic toxicology, vol 2. CRC Press, Boca Raton, Florida
16. Nurk S, Bankevich A, Antipov D, Gurevich AA, Korobeynikov A, Lapidus A, Prjibelski AD, Pyshkin A, Sirotkin A, Sirotkin Y, Stepanauskas R, Clingenpeel SR, Woyke T, Mclean JS, Lasken R, Tesler G, Alekseyev MA, Pevzner PA (2013) Assembling single-cell genomes and mini-metagenomes from chimeric MDA products. J Comput Biol 20:714–737
Chapter 2
Ultrahigh-Throughput Screening of Metagenomic Libraries Using Droplet Microfluidics
Davide Agostino Cecchini, Mercedes Sánchez-Costa, Alejandro H. Orrego, Jesús Fernández-Lucas, and Aurelio Hidalgo

Abstract
Droplet microfluidics enables the ultrahigh-throughput screening of natural or man-made genetic diversity for industrial enzymes, with reduced reagent consumption and lower costs than conventional robotic alternatives. Here we describe an example of metagenomic screening for nucleoside 2′-deoxyribosyl transferases using FACS as a more widespread and accessible alternative to microfluidic on-chip sorters. This protocol can be easily adapted to directed evolution libraries by replacing the library construction steps, and to other enzyme activities, e.g., oxidases, by replacing the proposed coupled assay.

Key words Functional metagenomics, Microfluidics, Nucleoside 2′-deoxyribosyl transferases, Enzyme screening
Davide Agostino Cecchini, Mercedes Sánchez-Costa and Alejandro H. Orrego contributed equally to this work.
Francesca Magnani, Chiara Marabelli and Francesca Paradisi (eds.), Enzyme Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 2397, https://doi.org/10.1007/978-1-0716-1826-4_2, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022

1 Introduction
Enzymes are efficient and exquisite catalysts that manufacture and modify molecules in nature, perfected over billions of years of natural evolution. In contrast to the relatively “harsher” conditions of “classic” chemistry and catalysis, enzymes operate with exquisite selectivity, unparalleled rate acceleration, and under mild reaction conditions. Therefore, the application of enzymes to catalyze industrial reactions, known as “biocatalysis,” has the potential to make chemical processes less polluting and more sustainable, in compliance with the concept and principles of Green Chemistry [1], and has been exploited commercially in consumer products and manufacturing processes for over a century [2]. Their application to develop “greener” industrial processes is increasingly important to achieve the UN Sustainable Development Goals, strengthening and circularizing the developing bioeconomy.

However, despite their advantages, enzymes face two major hurdles towards their industrial application, namely the low success rates of discovery and engineering campaigns together with tedious and expensive methods to explore both the natural and man-made genetic diversity.

The natural diversity represents an unfathomable source of enzymes for industrial application, most of which are of microbial origin. However, depending on the environment sampled, up to 99.9% of the microbial diversity present therein cannot be cultured in the laboratory and remains inaccessible through classic cultivation and extraction techniques. In the past decades, advances in cultivation-independent methodologies have enabled the study of the complete microbial diversity and the mining of microbes for their coveted enzymes. Metagenomics allows the study of the genetic material (metagenome) recovered directly from environmental samples, including DNA from unculturable organisms. Although shotgun metagenomics, consisting of NGS of the recovered DNA, can provide insight into function through elaborate bioinformatic searches, functional metagenomics involves the activity-based screening of massive libraries of environmental DNA (eDNA), thus allowing access to truly novel catalysts, including promiscuous enzymes, selected only through their function and independently of sequence [3]. Given the large sequence space available in the natural diversity and the rather variable odds that a gene will be successfully transcribed, translated, and folded in a recombinant host in sufficient amount to be detected, increasing the throughput of the screening methods is paramount for a successful discovery campaign. Functional metagenomics is only the first step in enzyme discovery and development.
With the notable exception of enzymes from extremophiles, enzymes in their natural context generally operate in aqueous media, under low substrate concentrations and mild temperatures, whereas in industry high substrate loads, cosolvents, and higher temperatures are preferred. Hence, protein engineering, e.g., by directed evolution, is often required to turn the discovered enzymes into industrially relevant biocatalysts. Similarly to functional metagenomics, the directed evolution of enzymes involves creating large libraries of genetic diversity that need to be assayed for the desired function, although in this instance the genetic diversity is created artificially. Once again, increasing the throughput of the screening methods used is a key factor towards a successful enzyme engineering campaign. Furthermore, a major bottleneck of industrial biocatalysis is to discover and optimize a given enzyme’s properties suited to a particular industrial bioprocess or product formulation in a timeframe compatible with industrial needs [4]. These short development times, together with the diverse activities and properties of
potential industrial enzymes, necessarily require screening methods that are fast, versatile, and affordable to find candidate enzymes in metagenomic libraries of natural genetic diversity or directed evolution libraries of man-made genetic diversity.

The manipulation of fluids at the micrometric scale, i.e., microfluidics, allows the creation and use of picoliter water-in-oil (w/o) droplets as miniature compartments for enzyme assays. Droplet-based microfluidics has been a disruptive advance that enables the encapsulation of library individuals inside surfactant-stabilized, monodisperse, picoliter w/o or water-in-oil-in-water (w/o/w) droplets at single occupancy for their assay with, e.g., chromogenic or fluorogenic substrates and subsequent recovery of the encoding DNA of selected hits. The increase in throughput and miniaturization results in a concomitant reduction of screening time (10^7/h), reagent consumption (50 μL per library), and costs (10 €/library) [5]. Besides complying with the speed and cost requirements mentioned above, droplet microfluidics also provides versatility to accommodate screening for diverse activities and properties. Most operations in the “macroscopic lab,” e.g., heating, cooling, mixing contents, diluting, and adding reagents, have a “microfluidic equivalent” achieved through bespoke chip design and instrumentation, e.g., picoinjection, gradient preparation, mixing, and splitting, thus enabling modular and versatile microfluidic workflows composed of successive operations [6].

The current toolbox for functional droplet-based assays is dominated by fluorescence readouts, followed by absorbance and occasional alternative readouts, such as imaging, among others [7]. This limits enzyme screening to those activities with available fluorogenic or chromogenic substrates compatible with w/o droplets, typically hydrolytic activities, with increasing examples of other activities, such as aldolases [8], diverse oxidoreductases [9, 10], and polymerases [11].

In this chapter, we present an example of an ultrahigh-throughput screening of nucleoside 2′-deoxyribosyl transferases (NDTs) in metagenomes as a coupled transferase/oxidase/peroxidase assay. This workflow can be customized by replacing the metagenomic library construction steps with a gene diversification step, e.g., error-prone PCR, and the proposed reaction mix with another standard fluorogenic assay compatible with droplets and FACS. Although fluorescence, absorbance, and image-based on-chip sorting equipment is readily available in some laboratories, we have opted to describe a FACS-based sorting of water-in-oil-in-water emulsions, as FACS is more widespread in research institutions and thus a more accessible option.
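The phrase “single occupancy” can be made concrete with a small calculation. The chapter does not state a loading model here; the sketch below assumes the commonly used Poisson description of random cell encapsulation, and the mean of 0.1 cells per droplet is a hypothetical value chosen only for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet holds exactly k cells when the mean is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def occupancy_summary(lam: float) -> dict:
    """Fractions of empty, singly and multiply occupied droplets (Poisson assumption)."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # share of cell-containing droplets that hold exactly one cell
        "single_among_occupied": single / (1.0 - empty),
    }


summary = occupancy_summary(0.1)   # hypothetical mean of 0.1 cells per droplet
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Under this assumption, diluting the cell suspension trades throughput for purity: most droplets are empty, but nearly all occupied droplets contain a single library member.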
2 Materials
2.1 Biological Agents, Chemical Materials, and Labware
1. Luria Bertani lysogeny broth (LB): 10 g/L tryptone, 5 g/L yeast extract, 10 g/L NaCl. Adjust the pH to 7.0 using 2 M NaOH.
2. SOC broth: 0.5% yeast extract, 2% tryptone, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4, 20 mM glucose.
3. LB agar plates: LB media, 15 g/L agar, 100 μg/mL ampicillin, 1 mM isopropyl-β-D-1-thiogalactopyranoside (IPTG), 40 μg/mL 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-Gal).
4. UltraPure LMP Agarose (Life Technologies) gels, stained with SYBR Safe (Thermo Scientific).
5. XL10-Gold® Ultracompetent Cells (Stratagene).
6. pBluescript II SK(+) vector (Stratagene).
7. Enzymes:
(a) Sau3A.
(b) FastDigest SmaI.
(c) FastAP Alkaline Phosphatase.
(d) T4 DNA ligase.
(e) Xanthine oxidase from bovine milk.
(f) Peroxidase from horseradish.
(g) Lysozyme from hen egg white.
8. miniTUBE Blue 3.0 kb (Covaris).
9. Gel and PCR purification kit: Wizard SV Gel and PCR Clean-Up System (Promega).
10. Blunting and phosphorylation DNA ends kit: Fast DNA End Repair Kit (Thermo Scientific).
11. DNA Clean and Concentrator 5 Kit (Zymo Research).
12. Polydimethylsiloxane (PDMS) elastomer kit: SYLGARD™ 184 Silicone Elastomer Kit (Dow Corning).
13. SU-8 master mold (Tekniker Foundation).
14. Master mold silanizing agent: trichloro(1H,1H,2H,2H-perfluorooctyl)silane.
15. Fluorophilic treatment solution: 1% (v/v) trichloro(1H,1H,2H,2H-perfluorooctyl)silane in HFE-7500 3M fluorinated oil.
16. Hydrophilic treatment solutions:
(a) Poly(diallyldimethylammonium chloride) (PDADMAC) solution: dilute 20% PDADMAC solution (average Mw 200,000–300,000) to 0.2% in 0.5 M NaCl.
(b) Poly(sodium 4-styrenesulfonate) (PSS) solution: 0.2% (w/v) PSS (average Mw ~70,000, powder) in water.
(c) MilliQ water.
17. Emulsion breaking compound: 1H,1H,2H,2H-perfluoro-1-octanol.
18. Fluorinated surfactant: neat FluoSurf (Emulseo).
19. Fluorinated oils: HFE-7500 3M (Novec 7500; Fluorochem), Fluorinert FC-40.
20. PBS-Tween solution: 1% (v/v) Tween® 80 in PBS buffer.
21. Percoll (Sigma-Aldrich).
22. Substrate solution (2×): 2 mM 2′-deoxyinosine, 2 mM thymine, 0.4 U/mL xanthine oxidase, 0.4 U/mL horseradish peroxidase, 0.4% (w/v) Pluronic 127 (Sigma-Aldrich), 200 μM Amplex UltraRed (Invitrogen), and 4 mg/mL of lysozyme dissolved in 100 mM Bis-Tris buffer pH 6.
23. 25 mm syringe filter, 0.2 μm polyethersulfone membrane.
24. Fine Bore Polyethylene Tubing (REF 800/100/120; Smiths Medicals International Ltd.).
25. Millex®-GS 0.22 μm (sterile filter unit with MF-Millipore MCE membrane; Merck Millipore Ltd.).
26. Biopsy punch with an external diameter of 1.0 mm.
27. 1 mL plastic syringe.
28. 1 mL gastight syringe.
29. 100 μL gastight syringe.
30. 1.5 mL Eppendorf tubes.

2.2 Equipment
1. E220 Focused-ultrasonicator (Covaris).
2. FACSVantage (BD Biosciences).
3. Microfluidic platform components:
(a) XDS-1R microscope (Optika Srl).
(b) UI-3360CP Rev.2 (iDS Imaging).
(c) Low Pressure Syringe Pump neMESYS 290N (Cetoni GmbH).
4. LabView 2015 Software (National Instruments).
5. FlowJo v10 Software (BD Biosciences).
6. 600 Plasma Etcher (Tepla).
3 Methods
3.1 Microfluidic Device Fabrication
For the preparation of PDMS microfluidic chips, an SU-8 master mold with the desired microchannel pattern is needed (Fig. 1). Fabrication of the SU-8 master requires photolithography and can be ordered from companies that provide photolithography services (see Note 1). Before use, the SU-8 master mold must be treated with trichloro(1H,1H,2H,2H-perfluorooctyl)silane (see Note 2).

Fig. 1 Design of the microfluidic device for single emulsion production. (a) The device consists of one inlet for the continuous phase of 2% (w/w) surfactant in FC-40 fluorinated oil, two inlets for the cells and reaction mixture, respectively, and one collection outlet. (b) Picture showing the formation of w/o droplets at the flow-focusing junction

1. Weigh 30 g of PDMS elastomer base and curing agent in a mass ratio of 10:1 (w:w) in a plastic cup and mix them. This quantity is indicative for 3-in. masks.
2. Place the cup containing the mixture in a vacuum desiccator for about 30 min or until all air bubbles are removed.
3. Clean the surface of the SU-8 master wafer with compressed air and transfer it into a Petri dish. Pour the PDMS mixture onto the SU-8 master mold to a thickness of ca. 5 mm and place the Petri dish in a vacuum desiccator for about 1 h or until all air bubbles are removed (see Note 3).
4. Cure the PDMS in an oven at 65 °C for at least 3 h (see Note 4).
5. Cut the PDMS slab with a scalpel (see Note 5), peel the PDMS from the SU-8 master mold, and punch holes at each inlet and outlet using a biopsy punch with a width of 1 mm. Clean the inlets and outlets with compressed air and both sides of the PDMS slab with frosted Scotch tape.
6. Plasma treat the PDMS slab, with the channel side facing upward, together with a 1 mm thick glass slide using the oxygen plasma cleaner under the following conditions: O2 flow at 150 mL/min, power at 200 W for 30 s, with the frequency of the magnetic field at 13.56 MHz.
7. Immediately bond the plasma-treated face of the PDMS chip to the glass slide and place the device onto a heating plate at 95 °C for 5 min.

3.2 Functionalization of the PDMS Chips
3.2.1 Fluorophilic Treatment
1. Fill the syringe with a freshly prepared solution of 1% (v/v) trichloro(1H,1H,2H,2H-perfluorooctyl)silane in fluorocarbon oil HFE-7500 (see Note 6), connect the syringe to polyethylene microtubing through a needle, and treat the channels by inserting the microtube into the continuous phase inlet and flushing the treatment solution through.
2. Place the treated device in an oven at 65 °C overnight.
3.2.2 Hydrophilic Treatment
1. Fill four different syringes with 2 mg/mL PDADMAC in 0.5 M NaCl, 150 mM NaCl, 2 mg/mL PSS, and MQ water, respectively. Connect each syringe to polyethylene microtubing through a needle and treat the channels by inserting the microtube into the continuous phase inlet and flushing in the following order (see Note 6):
(a) Flush the 2 mg/mL PDADMAC solution through the chip and incubate it for 10 min at room temperature.
(b) Wash the channels with the 150 mM NaCl solution.
(c) Flow the 2 mg/mL PSS solution through the chip and incubate it for 10 min at room temperature.
(d) Wash the channels with MQ water.
2. Store the microfluidic chips in a container with a damp atmosphere to prevent dehydration of the chip surface.
3.3 Metagenomic Library Construction
1. Shear at least 2 μg of environmental DNA with an E220 Focused-ultrasonicator using a miniTUBE Blue 3.0 kb to obtain fragments with an average size of 3 kb (see Notes 7 and 8). Process the sample according to the settings provided by the supplier.
2. Digest the pBluescript II vector (see Note 9) with FastDigest SmaI restriction enzyme (see Note 10) and dephosphorylate the digested vector with FastAP Alkaline Phosphatase.
3. Run the fragmented DNA and the digested vector on an UltraPure LMP agarose gel (0.8% w/v). Excise the eDNA band of the desired size (e.g., between 3 and 5 kb) and the digested plasmid from the gel and purify both using the Wizard SV Gel and PCR Clean-Up System or a similar kit.
4. Blunt and phosphorylate the eDNA ends with the Fast DNA End Repair Kit.
5. Purify the repaired blunt-end eDNA sample using the Wizard SV Gel and PCR Clean-Up System or a similar kit.
6. Ligate the purified eDNA and the vector with T4 DNA ligase. A typical reaction contains 50–100 ng of vector, insert DNA at a vector-to-insert molar ratio of 1:3, and up to 5 U of ligase in 1× T4 DNA ligase buffer. Adjust the final volume to 10 μL with nuclease-free water and incubate at 16 °C overnight.
7. Use up to 5 μL of the ligation mixture to transform 50 μL of chemically competent E. coli XL10-Gold Ultracompetent Cells. Heat-pulse the tubes at 42 °C for 30 s and incubate them on ice for 2 min. Add 450 μL of SOC and incubate the tubes at 37 °C for 1 h with shaking at 220 rpm. Plate the transformation on LB agar plates containing 100 μg/mL of ampicillin as the selective agent, 1 mM IPTG, and 40 μg/mL of X-gal to estimate the percentage of clones bearing an insert by blue-white color screening. Incubate overnight at 37 °C.
8. Recover the colonies from the agar plates with a scraper in LB medium supplemented with 100 μg/mL of ampicillin and continue as described below.
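The 1:3 vector-to-insert molar ratio in step 6 translates into a mass of insert that depends on fragment length. The following is a minimal sketch in Python, not part of the original protocol; the function name is hypothetical and the vector size of 2961 bp is an assumption for pBluescript II SK(+) that should be checked against the map of the vector actually used.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_per_vector=3.0):
    """Mass of insert (ng) that gives the requested insert:vector molar ratio.

    For double-stranded DNA the number of moles scales with mass / length,
    so the average mass per base pair cancels out of the ratio.
    """
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector


# Assumed vector size (bp); hypothetical value, verify for your own vector.
VECTOR_BP = 2961

for insert_bp in (3000, 4000, 5000):  # size range excised from the gel
    ng = insert_mass_ng(50, VECTOR_BP, insert_bp)
    print(f"{insert_bp} bp insert with 50 ng vector: {ng:.0f} ng insert")
```

Because the eDNA band spans a range of sizes, the result is only an estimate; using the average fragment length is usually sufficient for setting up the ligation.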
3.4 Generation of Water-in-Oil Picoliter Droplets (W/O; Single Emulsion)
Single emulsions (water-in-oil droplets, ~4 pL, ϕ: 20 μm) are generated using a fluorophilic flow-focusing chip (Fig. 1; channel dimensions 20 μm width × 20 μm height) bearing three inlets, two for the aqueous phases and one for the oil phase.
1. Filter the substrate solution containing 2′-deoxyinosine, thymine, xanthine oxidase, horseradish peroxidase, Pluronic 127, Amplex UltraRed, and lysozyme through a 0.22 μm syringe filter (see Notes 6 and 11).
2. Prepare a cell solution by centrifuging an appropriate amount of the resuspended colonies (1 or 2 mL) at 4000 rpm and 4 °C. Wash the cells three times with 1 mL of a prefiltered solution of 100 mM Bis-Tris pH 6. Adjust the cell density to a final OD600 of 0.1 in 100 mM Bis-Tris pH 6 containing Percoll (25% v/v) for a deterministic encapsulation (see Notes 6 and 12).
3. Prepare a surfactant solution of 2% w/v Fluosurf in FC-40 fluorinated oil. Filter the oil mixture through a 0.22 μm syringe filter (see Note 6).
4. Using a 1 mL gastight syringe, take 1 mL of oil phase and connect it to the continuous phase inlet of the PDMS chip through a piece of polyethylene tubing. Start flowing this phase through the chip at a low flow rate (~200 μL/h).
5. Fill two 100 μL gastight syringes, one with each aqueous solution, and connect them to the discontinuous phase inlets of the chip via polyethylene tubing.
6. To generate 4 pL droplets at ~6.9 kHz, set the flow at 900 μL/h for the oil phase and 50 μL/h for each aqueous phase. Once the chip is primed and the flows are stable, connect the outlet tube and collect the required amount of w/o emulsion in a 1.5 mL Eppendorf tube (see Notes 13 and 14).
7. Once the single emulsion is prepared, disconnect the outlet tube from the chip and stop flowing the aqueous and oil phases.
8. Take the Eppendorf tube containing the single emulsion and incubate it for 3 h at room temperature.
3.5 Generation of Water-in-Oil-in-Water Picoliter Droplets (W/O/W; Double Emulsion)
Double emulsions (water-in-oil-in-water, ~4 pL, ϕ: 20 μm) are generated using a hydrophilic flow-focusing chip (Fig. 2; channel dimensions 20 μm width × 20 μm height) bearing two inlets.
1. Remove the excess oil from the w/o emulsion using a syringe or pipette. A quick centrifugation step at maximum speed gives a better separation of the emulsion from the oil bed.
2. Connect the polyethylene tubing to a 1 mL gastight syringe, both preloaded with PBS. Leaving a 3–5 cm air cushion, very carefully aspirate the single emulsion and let it pack inside the tubing by holding it vertically for at least 10 min (see Note 15). The packed emulsion is used as the oil phase.
3. Using a 1 mL gastight syringe, take 1 mL of filtered PBS-Tween solution, connect it to the continuous phase inlet of the hydrophilic chip via polyethylene tubing, and start flowing it at 200 μL/h.
4. Connect the tube loaded with emulsion to the discontinuous phase inlet and start flowing it at 20 μL/h.
5. Once the droplets start to flow inside the chip, adjust the flows to obtain the w/o/w droplets. We suggest 20–30 μL/h for the oil phase (w/o emulsion) and 400–500 μL/h for the aqueous phase (see Note 16).
6. Once the flows are stable, start collecting the double emulsion in a 1.5 mL Eppendorf tube (see Notes 13 and 14).
7. After the double emulsion is prepared, disconnect the outlet tube from the chip and stop flowing both phases.
Fig. 2 Design of the microfluidic device for double emulsion production. (a) The device consists of one inlet for the continuous phase of 1% (v/v) Tween 80 in 100 mM Bis-Tris pH 6, one inlet for the discontinuous phase of re-injected emulsion, and one collection outlet. (b) Picture showing the formation of w/o/w droplets at the flow-focusing junction
3.6 Droplet Sorting and DNA Recovery
1. To obtain an appropriate event frequency, dilute the w/o/w emulsion approximately 20-fold (in the same buffer) before the sorting experiment.
2. Sort the w/o/w emulsion in a BD FACSVantage cell sorter fitted with a 100 μm diameter nozzle, using 488 nm as the excitation wavelength and 582 nm as the emission wavelength. Collect the positive droplets (Fig. 3) in 100 μL of 10 mM Tris–HCl, 1 mM EDTA pH 8 in a low-binding Eppendorf tube (see Note 17).
3. Break the sorted emulsion by adding 100 μL of 1H,1H,2H,2H-perfluorooctanol.
4. Vortex the mix for 5–10 s and centrifuge at 1000 × g for 1 min (see Note 18).
5. Recover the aqueous phase (upper phase) with a pipette using low-binding tips and collect it in a new low-binding Eppendorf tube.
6. Recover the plasmid using the DNA Clean and Concentrator 5 Kit.
7. Transform E. coli XL10-Gold Ultracompetent Cells directly with the recovered plasmid, as previously described (see Subheading 3.3, step 7).
Fig. 3 2D density plots of the sorted droplets. (a) Density plot representing the size (FSC-H) vs. the complexity (SSC-H) of the population of w/o/w droplets analyzed. (b) Density plot representing the fluorescence (FL2-H) vs. the size (FSC-H) of the droplets analyzed. Three populations are indicated: empty droplets (red rectangle), droplets with cells bearing a background activity (green rectangle), and the sorted droplets (orange rectangle)
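The droplet size and generation frequency quoted in Subheading 3.4 (step 6) can be cross-checked against the flow rates with simple arithmetic. The sketch below is illustrative only; the function names are hypothetical, and it assumes spherical droplets and that all of the aqueous flow ends up inside droplets.

```python
import math


def droplet_volume_pl(diameter_um):
    """Volume of a spherical droplet in picoliters (1 pL = 1000 cubic um)."""
    return math.pi * diameter_um ** 3 / 6.0 / 1000.0


def droplet_rate_hz(total_aqueous_ul_per_h, droplet_pl):
    """Droplets formed per second for a given total aqueous flow rate."""
    pl_per_s = total_aqueous_ul_per_h * 1e6 / 3600.0  # 1 uL = 1e6 pL
    return pl_per_s / droplet_pl


volume = droplet_volume_pl(20)           # 20 um diameter droplets
rate = droplet_rate_hz(50 + 50, 4.0)     # two aqueous streams at 50 uL/h each
print(f"droplet volume: {volume:.2f} pL")
print(f"generation rate: {rate / 1000:.1f} kHz")
```

Under these assumptions a 20 μm droplet holds about 4 pL, and 100 μL/h of total aqueous flow corresponds to roughly 6.9 kHz, in line with the values given in the protocol.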
4 Notes
1. The height of the pattern in the SU-8 mold determines the depth of the channels of the microfluidic device. In this case, the molds of both devices were fabricated with a height of 20 μm in order to have an aspect ratio (width:height) of 1 at the flow-focusing junction.
2. PDMS adheres strongly to the silicon surface of the SU-8 master mold, which makes peeling the PDMS difficult. It is therefore recommended to silanize the SU-8 master mold before use in order to make it more hydrophobic. This can be achieved by placing the SU-8 master mold in a desiccator, together with a vial filled with 50 μL of trichloro(1H,1H,2H,2H-perfluorooctyl)silane, under vacuum, to allow the silanizing agent to evaporate and form a monolayer on the surface of the master mold.
3. It is important to remove all the air bubbles from the PDMS to avoid their presence in the chip.
4. The curing step can be done overnight.
5. Cut the PDMS very gently without exerting too much pressure on the SU-8 master. The SU-8 wafers are fragile and can be easily scratched or cracked.
6. The solutions must be filtered (0.22 μm filters) to avoid clogging the microfluidic channels.
7. With a Focused-ultrasonicator it is possible to shear low volumes of genomic DNA into reproducible fragments ranging from 2 to 5 kb, simply by using the appropriate miniTUBE. Three different miniTUBEs are available: miniTUBE Clear 1.5–2.5 kb, miniTUBE Blue 3.0 kb, and miniTUBE Red 5.0 kb for 2 kb, 3 kb, and 5 kb fragments, respectively.
8. Alternatively, it is possible to obtain fragmented DNA of the desired size by partial digestion with a restriction enzyme. In this case, preliminary experiments are necessary to optimize the partial digestion protocol according to the restriction enzyme used and the final desired size of the fragments. A frequently used enzyme is Sau3A, whose 4-base-pair recognition site would theoretically occur once every 4^4, or 256, bp. To optimize the protocol, start with 100 ng of sample and partially digest it at 37 °C using different dilutions of the enzyme and different incubation times (from 15 to 30 min). Analyze the digestions on an agarose gel. Once a condition is established in which 70–90% of the fragmented DNA has the desired size, scale up the amount of restriction enzyme to digest at least 1 μg of eDNA to prepare the library.
9. Other high-copy-number vectors (500 to 700 copies per cell), such as pZero-2, could be used. It is important to use vectors with a high copy number, especially in assays performed with single cells encapsulated in droplets. High-copy-number plasmids make plasmid recovery from droplets easier.
10. If eDNA fragments are prepared by partial digestion with Sau3A, digest the vector with BamHI to create "sticky" ends for the ligation step.
11. Aqueous solutions are prepared at twice the concentrations desired inside the droplets. During encapsulation, both solutions are diluted two-fold, provided that they are injected at the same flow rate.
12. The number of cells inside each droplet can be estimated using the Poisson distribution [12, 13]. Here, we used a final OD600 of 0.05 for E. coli and droplets of 4 pL to obtain approximately 10% of droplets with a single cell inside and 90% of empty droplets.
13. The amount of required emulsion depends on the size of the library. We suggest generating 30 to 50 times more droplets than library members to ensure a 3–5× coverage, taking into account that 90% of the droplets in the emulsion are empty.
14. Collect the single emulsion in an Eppendorf tube protected from light to avoid photobleaching of the fluorophore.
15. This is a key step. It is important to avoid the presence of oil and/or air inside the packed emulsion.
16. It is possible to reduce the flows by 50%, always trying to maintain a ratio of discontinuous:continuous phase flows between 1:10 and 1:20, to avoid splitting the w/o droplets during their re-encapsulation as w/o/w droplets. However, optimal flow rates may vary from lab to lab depending on many factors, including chip design, and should be determined empirically.
17. Before sorting experiments, it is highly recommended to measure a positive control emulsion, a negative control emulsion, and a known mixture of both to optimize the acquisition parameters, such as detector voltage, flow rates, and position of the sorting gates.
18. Repeat until two clear phases appear in the tube.
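The figures in Notes 12 and 13 follow from Poisson statistics and can be reproduced with a short calculation. The following is a minimal sketch, not the authors' code: the function names and the example library size of 100,000 clones are hypothetical, and the mean occupancy is derived from the stated 90% of empty droplets rather than from a cells-per-OD conversion, which is not given in the text.

```python
import math


def poisson_pmf(k, lam):
    """Probability that a droplet contains exactly k cells at mean occupancy lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def mean_occupancy_from_empty(empty_fraction):
    """Mean cells per droplet implied by the fraction of empty droplets."""
    return -math.log(empty_fraction)


def droplets_for_coverage(library_size, coverage, occupied_fraction):
    """Droplets needed so that occupied droplets cover the library `coverage` times."""
    return library_size * coverage / occupied_fraction


lam = mean_occupancy_from_empty(0.90)
single = poisson_pmf(1, lam)
multiple = 1.0 - poisson_pmf(0, lam) - single
print(f"mean occupancy: {lam:.3f} cells/droplet")
print(f"single cell: {single:.1%}, more than one cell: {multiple:.1%}")

# Hypothetical library of 100,000 clones, 3x coverage, ~10% occupied droplets.
needed = droplets_for_coverage(100_000, 3, 0.10)
print(f"droplets needed: {needed:.2e} ({needed / 100_000:.0f}x library size)")
```

With 90% empty droplets, about 9.5% of droplets carry a single cell and only about 0.5% carry more than one, which is why this dilution is a reasonable compromise between throughput and clonality.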
Acknowledgments This work has been funded through the European Union’s Research and Innovation program Horizon 2020 through grant agreement no. 685474 (MetaFluidics) and through a grant from the Spanish Ministry of Science and Innovation, ref. PID2020117025RB-I00 (UltraNDTs). Alejandro H. Orrego is grateful to the Department of Education and Research of the Region of Madrid and the European Social Fund for the Postdoctoral contract (PEJD-2018-POST/BIO-8798).
References
1. Sheldon RA, Woodley JM (2018) Role of biocatalysis in sustainable chemistry. Chem Rev 118:801–838
2. May O (2019) Industrial enzyme applications—overview and historic perspective. In: Industrial enzyme applications. Wiley-VCH, Weinheim, pp 1–24. https://doi.org/10.1002/9783527813780.ch1_1
3. Ufarté L, Potocki-Veronese G, Laville É (2015) Discovery of new protein families and functions: new challenges in functional metagenomics for biotechnologies and microbial ecology. Front Microbiol. https://doi.org/10.3389/fmicb.2015.00563
4. Truppo MD (2017) Biocatalysis in the pharmaceutical industry: the need for speed. ACS Med Chem Lett 8:476–480
5. Agresti JJ, Antipov E, Abate AR et al (2010) Ultrahigh-throughput screening in drop-based microfluidics for directed evolution. Proc Natl Acad Sci U S A 107:4004–4009
6. Kintses B, Hein C, Mohamed MF et al (2012) Picoliter cell lysate assays in microfluidic droplet compartments for directed enzyme evolution. Chem Biol 19:1001–1009
7. Mair P, Gielen F, Hollfelder F (2017) Exploring sequence space in search of functional enzymes using microfluidic droplets. Curr Opin Chem Biol 37:137–144
8. Obexer R, Godina A, Garrabou X et al (2017) Emergence of a catalytic tetrad during evolution of a highly active artificial aldolase. Nat Chem 9:50–56
9. Gielen F, Hours R, Emond S et al (2016) Ultrahigh-throughput–directed enzyme evolution by absorbance-activated droplet sorting (AADS). Proc Natl Acad Sci 113:E7383–E7389
10. Debon A, Pott M, Obexer R et al (2019) Ultrahigh-throughput screening enables efficient single-round oxidase remodelling. Nat Catal 2:740–747
11. Nikoomanzar A, Vallejo D, Chaput JC (2019) Elucidating the determinants of polymerase specificity by microfluidic-based deep mutational scanning. ACS Synth Biol 8:1421–1429
12. Mazutis L, Gilbert J, Ung WL et al (2013) Single-cell analysis and sorting using droplet-based microfluidics. Nat Protoc 8:870–891
13. Shapiro HM (2005) Practical flow cytometry. John Wiley & Sons, Hoboken, New Jersey
Chapter 3
Synthetic DNA Libraries for Protein Engineering Toward Process Improvement in Drug Synthesis
Michele Tavanti
Abstract
Speeding up enzyme engineering by directed evolution is a primary target to be achieved for a wider uptake of biocatalysis in pharmaceutical process development. The capability to rapidly generate the designed sequence diversity has profound implications for the overall optimization of protein function. Drawbacks associated with traditional PCR methods for sequence diversification interfere with the generation of all the variants that have been designed. In contrast, the enhanced quality of synthetic DNA libraries makes the exploration of sequence space more efficient. Here, methods for the effective utilization of synthetic DNA libraries are described. The overall procedure allows the generation of ready-to-screen libraries within two weeks from synthetic DNA acquisition.
Key words Synthetic DNA, Site-saturation mutagenesis, Combinatorial libraries, Seamless cloning, Directed evolution
1 Introduction
The adoption of biocatalysis in the pharmaceutical industry depends on our capacity to find enzymes that can meet process specifications. Naturally occurring enzymes are not always fit for purpose, as they can exhibit low activity against non-cognate substrates, poor operational stability, and substrate/product inhibition. As such, protein engineering through directed evolution (DE) has become an established procedure to enable the application of biocatalysts in chemical manufacturing. The quest for the ideal enzyme entails a number of steps, including the selection of a protein backbone with desirable traits and iterative rounds of sequence diversification and screening under conditions converging toward the targeted process [1, 2]. The application of this workflow has enabled the implementation of enzymes in different processes for API manufacturing [3–8] and, more generally, has allowed the generation of enzymes that can outperform synthetic catalysts [9–11].
Francesca Magnani, Chiara Marabelli and Francesca Paradisi (eds.), Enzyme Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 2397, https://doi.org/10.1007/978-1-0716-1826-4_3, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022
The major limitation of the DE algorithm is the speed at which we can deliver improved variants [12, 13]. Efforts are being made to optimize the exploration of the vast sequence diversity available in databases [14, 15], improve library design [16, 17], and increase screening throughput [18]. A significant amount of work has also been carried out not only to accelerate the diversification step but also to improve the quality of its output. Typically, PCR methods have been used to build libraries of variants to be screened. While random mutagenesis methods were employed in the early days of DE [19–21], subsequent research focused on generating smarter libraries by mutating predetermined positions in the enzyme to all of the other amino acids or to a limited set of amino acids (site-saturation mutagenesis, SSM) [22, 23]. However, PCR-based methods rarely deliver all the variants that have been designed by the protein engineer, they can carry over the backbone template, and they can show a certain degree of redundancy [24]. This means that additional resources need to be allocated for thorough quality control of variant libraries and for an increased screening effort. As the cost of DNA synthesis has decreased significantly, the use of synthetic DNA libraries has been explored. The Reetz group reported several studies that highlighted the benefits of using chemical gene synthesis for the production of site-saturation libraries [25–27]. When compared to libraries produced with traditional PCR methods, synthetic DNA libraries displayed higher genetic diversity and less backbone template carryover, which can ultimately lead to a reduction in the screening effort needed to identify improved variants [28].
As the generation of diversity using synthetic DNA libraries can transform DE workflows, this chapter describes methods for the seamless cloning (Subheadings 3.1 and 3.2), evaluation of library diversity and purification of pooled plasmids encoding the desired variants (Subheadings 3.3 and 3.4, respectively), as well as production of variant libraries obtained as synthetic genes (Subheading 3.5). Nowadays, it is also possible to order cloned DNA libraries (thus reducing the risk of introducing the empty backbone plasmid in the variant library), but we considered the additional cost of commercially cloned variant libraries to be prohibitive and as such in-house cloned libraries were produced.
2 Materials
Stock solutions should be prepared using nuclease-free molecular biology-grade water or ultrapure water (18.2 MΩ-cm at 25 °C and a total organic carbon value below 5 ppb). Store enzymes and stock solutions as suggested by the manufacturer.
2.1 Backbone Plasmid Preparation
1. PCR plates and/or thin-walled PCR tubes.
2. 2× Phusion Hot Start PCR Master Mix or an alternative thermostable DNA polymerase (see Note 1).
3. pET28a (see Note 2).
4. 10 μM forward (GA primer_fw: 5′-AGATCCGGCTGCTAACAAAGCC-3′) and 10 μM reverse (GA primer_rv: 5′-GCTAGCCATATGGCTGCCG-3′) primers (see Note 3).
5. DpnI (20,000 units/mL).
6. Precast 1% agarose gels, 1 kb DNA Ladder, and DNA gel loading dye (see Note 4).
7. Kit for DNA extraction from agarose gels.
8. PCR purification kit.
2.2 In-Fusion Cloning
1. Synthetic DNA libraries comprising a 5′-prefix (5′-AGCCATATGGCTAGC-3′) right before the first ATG codon and a 3′-suffix (5′-AGATCCGGCTGCTAA-3′) right after the stop codon (see Note 3).
2. In-Fusion® HD cloning kit including Stellar competent cells (see Note 5).
3. LB agar plates supplemented with 50 μg/mL kanamycin (see Note 6).
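The prefix and suffix on the synthetic libraries must match the ends of the PCR-linearized backbone for seamless cloning to work. The following Python sketch, which is an illustration rather than part of the original protocol, checks the 15-nucleotide overlaps using the primer and overhang sequences listed in Subheadings 2.1 and 2.2; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Sequences as listed in Subheadings 2.1 and 2.2 (5' to 3').
GA_PRIMER_FW = "AGATCCGGCTGCTAACAAAGCC"
GA_PRIMER_RV = "GCTAGCCATATGGCTGCCG"
PREFIX = "AGCCATATGGCTAGC"   # placed before the first ATG codon
SUFFIX = "AGATCCGGCTGCTAA"   # placed after the stop codon


def check_overlaps(overlap=15):
    """Return (prefix_ok, suffix_ok) for the insert/backbone junctions."""
    # The top strand of the linearized backbone ends with the reverse
    # complement of the reverse primer and starts with the forward primer.
    backbone_end = reverse_complement(GA_PRIMER_RV)[-overlap:]
    backbone_start = GA_PRIMER_FW[:overlap]
    return backbone_end == PREFIX, backbone_start == SUFFIX


print(check_overlaps())
```

The same check can be repeated with other primer pairs when the cloning strategy is changed (see Note 3), provided the overlap length matches the requirement of the cloning kit.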
2.3 Quality Control
1. 5 μM forward (Seq_fw: 5′-CTCGATCCCGCGAAATTAATACG-3′) and 5 μM reverse (Seq_rv: 5′-CGCCAATCCGGATATAGTTCCTCC-3′) primers (see Note 7).
2. 2× Taq polymerase Hot Start Master Mix.
3. ExoSAP-IT™ PCR Product Cleanup Reagent.
2.4 Plasmid Library Purification
1. Plasmid Miniprep Kit.
2.5 Variant Library Production
1. Chemically competent BL21(DE3).
2. Liquid Luria-Bertani medium (LB): Dissolve 10 g of Tryptone, 5 g of Yeast Extract, and 10 g of NaCl in 1 L H2O. Adjust the pH to 7.0 with 5 M NaOH and sterilize by autoclaving.
3. LB agar plates supplemented with 50 μg/mL kanamycin: Prepare LB agar by adding 20 g of agar to the reagents indicated for liquid LB. Sterilize by autoclaving and cool down in a 50 °C water bath for 30 min before adding antibiotics. Rapidly pour into media trays (see Note 8).
4. Liquid LB supplemented with 50 μg/mL kanamycin.
5. Sterile 96-well plates (shallow- and deep-well, see Note 9).
6. Breathable sealing tape for microplates.
7. Terrific broth (TB) supplemented with 50 μg/mL kanamycin: Prepare TB by dissolving 24 g of Yeast Extract, 20 g of Tryptone, and 4 mL of glycerol in 900 mL H2O. Sterilize by autoclaving. Allow the solution to cool down. Filter-sterilize a solution containing 0.17 M KH2PO4 and 0.72 M K2HPO4. Add 100 mL of the latter solution to the autoclaved medium.
8. 50% sterile glycerol in water.
9. 6 mM filter-sterilized IPTG (isopropyl β-D-1-thiogalactopyranoside).
3 Methods
Unless stated otherwise, all procedures can be carried out at room temperature. Automated liquid handling/colony picking should be used where possible. While discouraged, steps typically carried out under a laminar flow hood can simply be performed in environments designed to minimize contamination. Reaction conditions are given for the specific set of enzymes employed in this work: if alternative enzymes are employed, reaction conditions should be adjusted accordingly. The preferred media for this protocol are LB and TB, but different growth media can be used.
3.1 Backbone Plasmid Preparation
1. Prepare a master mix for 40 × 50 μL PCR reactions, each containing 25 μL of 2× Phusion Hot Start PCR Master Mix, 0.5 μM (2.5 μL from a 10 μM stock) of each forward (GA primer_fw) and reverse (GA primer_rv) primer, and 1 ng of backbone DNA (see Note 10).
2. Aliquot the PCR mixture in a PCR plate (or thin-walled PCR tubes) and run reactions in a thermal cycler as follows:
(a) Initial denaturation at 98 °C for 30 s.
(b) 30 cycles at 98 °C for 10 s, 60 °C for 30 s, and 72 °C for 4 min.
(c) Final extension at 72 °C for 10 min.
(d) Storage at 4 °C until next step (see Note 11).
3. Visualize PCR products (5 μL) by agarose gel electrophoresis (Fig. 1).
4. Add 2 μL DpnI (40 U) to each well of the PCR plate and run parental DNA digestion in a thermal cycler as follows:
(a) Digestion at 37 °C for 16 h.
(b) Heat denaturation at 80 °C for 20 min.
(c) Storage at 4 °C (see Note 12).
5. Load 20 μL of DpnI-treated DNA into each well of a 48-well 1% agarose gel so as to load the entire content of the PCR plate employed for vector backbone preparation. Leftover DpnI-treated DNA can optionally be loaded into an additional 48-well agarose gel, depending on the desired yield.
6. Purify DNA fragments using a gel extraction kit following the manufacturer's instructions (see Note 13). Quantify the obtained DNA by measuring OD260. This procedure typically yields 10–20 μg DNA.
7. Clean up the obtained DNA using a PCR purification kit (see Note 14). Quantify the obtained DNA by measuring OD260 and check relative purity by measuring OD260/280 and OD260/230 (expect ~2 for both metrics). This procedure typically yields 10–15 μg DNA.
Fig. 1 Typical agarose gel electrophoresis analysis results of backbone plasmid amplification. DNA ladder was loaded in lane 1 and the size of relevant bands is indicated
3.2 In-Fusion Cloning
1. Set up In-Fusion cloning reactions in thin-walled PCR tubes by adding 2 μL of 5× In-Fusion HD Enzyme Premix, backbone plasmid prepared as described above and synthetic DNA libraries in a 1:2 molar ratio, and water to 10 μL (see Note 15).
2. Run the cloning reaction in a thermal cycler for 1 h at 50 °C (see Note 16).
3. Transform Stellar Competent Cells as follows:
(a) Place SOC medium aliquots (provided with the In-Fusion kit) at 37 °C.
(b) Thaw one aliquot of competent cells per cloning reaction.
(c) Add 2.5 μL of the In-Fusion cloning reaction to the competent cells.
(d) Incubate for 30 min on ice.
(e) Heat-shock for 45 s at 42 °C using a water bath.
(f) Place tubes back on ice for 1–2 min.
(g) Add 450 μL pre-warmed SOC medium.
(h) Incubate for 1 h at 37 °C either at 800 rpm shaking using a ThermoMixer or with 200 rpm orbital shaking.
(i) Spin at 4 °C, 6000 rpm, for 5 min using a microcentrifuge.
(j) Remove 300 μL supernatant and resuspend the cell pellet.
(k) Plate the entire transformation volume on LB agar plates supplemented with 50 μg/mL kanamycin and incubate at 37 °C for at least 16–18 h. Store agar plates at 4 °C until Subheading 3.4 (see Notes 17 and 18).
3.3 Quality Control
1. For quality control purposes by colony PCR (cPCR), prepare a master mix containing, per reaction, 7.5 μL of 2× Taq polymerase Hot Start Master Mix, 1 μM (3 μL from a 5 μM stock) of each forward (Seq_fw) and reverse (Seq_rv) primer, and water to 15 μL (see Note 19).
2. Aliquot the master mix into a PCR plate. Pick single colonies with a sterile pipette tip and swirl inside the destination well.
3. Run the cPCR setting the thermal cycler as follows:
(a) Initial denaturation at 95 °C for 2 min.
(b) 30 cycles at 95 °C for 30 s, 55 °C for 30 s, and 72 °C for 2.5 min.
(c) Final extension at 72 °C for 7 min.
(d) Storage at 4–12 °C until next step.
4. Visualize cPCR products by agarose gel electrophoresis as indicated above. Depending on the number of reactions performed, it might be acceptable to analyze just a selection of samples. In order to save material for later sequencing, 3 μL of PCR mixture can be analyzed by agarose gel electrophoresis (Fig. 2).
5. Transfer 5 μL of each cPCR reaction into a new PCR plate.
6. Add 2 μL ExoSAP-IT™ PCR Product Cleanup Reagent to each well.
7. Incubate using a thermal cycler as follows:
(a) Cleanup reaction at 37 °C for 15 min.
(b) Heat denaturation at 80 °C for 15 min.
(c) Storage at 4 °C as needed.
Fig. 2 Typical agarose gel electrophoresis analysis results of a successful cPCR (~1 kb insert). DNA ladder was loaded on lane 1 and the size of the 1500 bp band is indicated
8. Dilute by adding 63 μL water.
9. For Sanger sequencing, add 4 μL of ExoSAP-treated DNA to a 12 μL solution containing the desired sequencing primer (Seq_fw and Seq_rv for this work, see Note 20).
10. Analyze sequencing results.
3.4 Plasmid Library Purification
1. Add 5–8 mL of sterile LB on the agar plate(s) containing the desired library and gently scrape bacteria using a hockey stick spreader (see Note 21).
2. Purify the pooled library using a plasmid miniprep kit following the manufacturer's instructions.
3. Quantify the obtained DNA by measuring OD260.
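The OD260 reading in step 3 can be converted to a DNA concentration with the widely used approximation that an absorbance of 1.0 at 260 nm (1 cm path length) corresponds to about 50 ng/μL of double-stranded DNA. This conversion factor is not stated in the chapter and is used here as an assumption; the function names in this Python sketch are hypothetical.

```python
# Assumed conversion: A260 of 1.0 ~ 50 ng/uL dsDNA at a 1 cm path length.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_ng_per_ul(a260, dilution_factor=1.0, path_cm=1.0):
    """Estimate dsDNA concentration (ng/uL) from an OD260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor / path_cm


def total_yield_ug(conc_ng_per_ul, volume_ul):
    """Total DNA (ug) in an eluate of the given volume."""
    return conc_ng_per_ul * volume_ul / 1000.0


conc = dsdna_ng_per_ul(0.5, dilution_factor=10)  # sample read after 1:10 dilution
print(f"{conc:.0f} ng/uL, {total_yield_ug(conc, 50):.1f} ug in 50 uL")
```

Microvolume spectrophotometers normally report path-length-normalized values, in which case `path_cm` can be left at its default.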
3.5 Variant Library Production
1. Transform chemically competent BL21(DE3) with the plasmid library obtained in Subheading 3.4 following the manufacturer's instructions and incubate agar plates supplemented with 50 μg/mL kanamycin for 16–20 h (see Note 22).
2. To prepare overnight cultures, aliquot 200 μL of LB supplemented with 50 μg/mL kanamycin into sterile 96-well plates (shallow-well).
3. Pick single colonies into 96-well plates pre-aliquoted with growth medium using sterile pipette tips or automated colony picking.
4. Seal plates with breathable sealing tape and incubate at 30 °C, 200 rpm orbital shaking, 85% humidity for 16–20 h (see Note 23).
5. To prepare expression cultures, aliquot 380 μL of TB supplemented with 50 μg/mL kanamycin into 96-well plates (deep-well).
6. Inoculate expression cultures with 20 μL overnight culture.
7. Seal plates with breathable sealing tape and incubate at 30 °C, 200 rpm orbital shaking, 85% humidity (see Note 23).
8. To prepare glycerol stocks, add 50–100 μL overnight culture to 50–100 μL of 50% sterile glycerol (1:1 ratio overnight culture:50% glycerol) and mix thoroughly either by pipette aspiration or by vigorous shaking at room temperature (10 min, 800 rpm, 3 mm throw diameter).
9. Store glycerol stocks at −80 °C.
10. Prepare a 6 mM IPTG solution to induce protein expression.
11. After 1 h incubation, use a plate reader to measure OD600 by transferring 100 μL expression culture into a transparent 96-well plate (shallow-well, see Note 24).
12. Transfer back the content of the plate employed for the OD600 measurement and continue incubation if needed (see Note 25).
13. Induce protein expression at OD600 0.6–0.8 by adding 40 μL of 6 mM IPTG (final concentration ~0.5 mM, see Note 26).
14. Continue incubation for 20–24 h (see Note 27).
15. After overnight incubation, measure OD600 as indicated above, diluting the expression culture 10- to 20-fold in 200 μL TB (see Note 28).
16. Harvest cells by centrifugation (4000 × g, 10 min, 4 °C).
17. Discard supernatant and gently tap the 96-well plate on tissue to remove residual liquid.
18. Enzyme variants can either be tested directly as whole-cell biocatalysts or stored at −80 °C before lysis and screening (see Note 29).
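The final concentrations quoted in steps 8 and 13 follow from a simple dilution calculation. The sketch below is a minimal Python illustration with a hypothetical function name; it assumes an expression culture volume of 400 μL (380 μL TB plus 20 μL inoculum) and ignores evaporation during incubation.

```python
def final_concentration(stock_conc, added_ul, starting_ul):
    """Concentration after adding `added_ul` of stock to `starting_ul` of liquid.

    The result is in the same unit as `stock_conc` (e.g., mM or % v/v).
    """
    return stock_conc * added_ul / (starting_ul + added_ul)


# Step 13: 40 uL of 6 mM IPTG into an assumed 400 uL expression culture.
iptg_mm = final_concentration(6.0, 40, 380 + 20)
print(f"IPTG: {iptg_mm:.2f} mM")

# Step 8: 50% glycerol mixed 1:1 with overnight culture.
glycerol_pct = final_concentration(50.0, 100, 100)
print(f"glycerol: {glycerol_pct:.0f}% (v/v)")
```

The IPTG value comes out slightly above 0.5 mM, consistent with the approximate final concentration given in step 13.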
4
Notes 1. Hot Start master mixes are designed to minimize the number of pipetting steps and to reduce the time spent preparing stocks of deoxynucleotide triphosphates (dNTPs) and buffer components. Only template and primers are needed for reaction setup. Hot Start technology allows to set up reactions at room temperature. 2. The vector backbone can be either acquired or purified from bacterial cultures using plasmid Miniprep kits. DNA quantification should be carried out by measuring OD260. 3. Primers were designed for cloning the gene of interest (GOI) in a pET28a expression vector and allow for protein production of N-terminal 6xHis-Tag proteins. Modification of the primers employed for plasmid preparation and adjustment of the 50 -, 30 -overhangs on the GOI allow for different cloning strategies. Primer design for vector and target fragment preparation is well explained for In-Fusion cloning (adopted in this protocol) in
Synthetic DNA Libraries for Protein Engineering Toward Process Improvement. . .
the corresponding user manual (http://www.takara.co.kr/file/manual/pdf/In-Fusion_HD_Cloning_Kit_121416.pdf). Successful cloning design can be verified in silico using SnapGene or online tools such as the NEBuilder Assembly Tool (after adjusting the minimum overlap to 15 nucleotides, http://nebuilder.neb.com/#!/).
4. The use of precast agarose gels is recommended as they allow efficient and safe DNA analysis. 1% agarose gels can also be prepared and run using standard molecular biology techniques.
5. Alternative seamless cloning methods such as NEBuilder or Gibson Assembly [29] can be used without major modifications to the protocols indicated in this chapter.
6. Media recipes are freely available at http://cshprotocols.cshlp.org/.
7. Primers should be designed according to the selected plasmid backbone.
8. Plate formats larger than standard Petri dishes (such as Qtrays) should be considered for this step depending on library size.
9. While discouraged, non-sterile plates can also be employed.
10. Minimizing the backbone template concentration helps to reduce backbone carryover.
11. The specific PCR parameters should be adjusted if a different polymerase, primers, and/or template backbone are employed.
12. Suppliers typically suggest performing DpnI treatment for 15–60 min with 5–20 U of enzyme. While this procedure can be followed for routine cloning, it is suggested to follow the protocol indicated here to minimize backbone template background.
13. Employ multiple spin columns so as not to exceed the maximum amount of agarose gel per column. A maximum of 350 mg agarose per column was loaded for this work. Elute DNA from each column using 20 μL pre-warmed water (65 °C).
14. For routine cloning, either gel extraction kits or PCR purification kits are used for vector DNA purification. It is suggested to use both kits to obtain the highest quality DNA for subsequent cloning steps.
The number of spin columns employed for this cleanup depends on the yield of the first DNA extraction step and on the spin column binding capacity. Elute using 30 μL pre-warmed water (65 °C).
15. Reactions can also be prepared using different tube formats, but in that case the subsequent 50 °C incubation might require a water bath. Considering a ~5 kb backbone vector and a ~1 kb insert, DNA concentrations of about 50 and 25 ng/μL,
Michele Tavanti
respectively, are acceptable for this reaction. As such, it is advised to acquire at least 250 ng synthetic DNA so as not to complicate subsequent cloning steps. The vector:insert molar ratio can be adjusted as needed for this reaction to succeed, but generally a 1:2 ratio is a good starting point. Additional information is included in the In-Fusion® HD Cloning Kit User Manual, and an online tool is available for molar ratio calculations (https://www.takarabio.com/learning-centers/cloning/primer-design-and-other-tools/in-fusion-molar-ratio-calculator).
16. Although the user manual indicates running reactions for 15 min, it is advised to run reactions for 1 h. Reactions can be stored at 4 °C overnight or at −20 °C until needed.
17. This procedure typically yields hundreds of colonies. If fewer colonies are needed, several dilutions of the transformation mixture should be tested before plating. If more colonies are needed, replicate reactions can be performed until the target number of colonies is reached. Optimizing the cloning strategy/reaction conditions should also be considered.
18. Considering a library size of N variants, an oversampling factor of about 3 is required for 95% library coverage, meaning that N × 3 colonies need to be screened to cover that fraction of variant space. The CASTER tool developed by the Reetz group can be used to calculate the number of transformants to be screened given a certain library size (associated with the selected degenerate codons) and library coverage [30]. Alternatively, the screening effort can be significantly reduced if the probability of finding an outstanding variant, which is not necessarily the best one, is chosen as the condition to determine library size [27, 31].
19. Typically, ten colonies per library are screened by colony PCR to estimate library diversity. However, hundreds of samples can potentially be screened using this protocol.
Depending on library design, the “Quick Quality Control” developed by the Reetz group can also be used [22]. Alternatively, next generation sequencing can significantly increase sequencing throughput.
20. Different sequencing companies have slightly different requirements in terms of sample volume and primer concentration.
21. Avoid splitting the agar. The obtained solution should have a maximum OD600 of ~6. Additional LB can be added for dilution purposes. Maxiprep kits can also be considered. Excessive cell density can lead to inefficient lysis/spin column overloading.
22. Plating tests should be performed beforehand using the backbone plasmid containing the wild-type sequence to determine
the optimal amount of DNA to add to competent cells and the dilution of the transformation mixture needed to obtain discrete colonies. Optionally, agar plates can be incubated over the weekend at 23 °C. For this step, plate formats larger than standard Petri dishes should be considered depending on library size.
23. High relative humidity is needed to reduce evaporation of the medium, which ultimately can translate into reduced screening variability. The incubation temperature can also be set to 37 °C. A throw diameter of 50 mm was employed for protocol development.
24. Measure OD600 for a plate containing 100 μL sterile medium (blank) beforehand. If a plate reader is not available, the OD600 of a representative sample can be measured using a standard cuvette.
25. Induction of protein expression is typically carried out at OD 0.6–0.8, although IPTG induction can also be started at OD 2 when using rich media such as TB. For this protocol, the doubling time is ~30 min and induction is typically performed 1 h 50 min after inoculation.
26. The final IPTG concentration typically ranges from 0.1 to 1 mM.
27. Protein expression can be performed at lower temperature to aid soluble protein expression.
28. The fold dilution largely depends on the medium employed and the protein produced.
29. Freeze plates to aid the subsequent cell lysis step.
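The oversampling factor of about 3 quoted in Note 18 can be reproduced with a short calculation. The sketch below is not the CASTER tool; it uses the common approximation T = −N ln(1 − F), which assumes that all N variants are equally represented in the library. The function name and the example library size of 100 are hypothetical.

```python
import math


def transformants_needed(n_variants, coverage=0.95):
    """Approximate number of colonies to screen for a given library coverage.

    Uses T = -N * ln(1 - F), assuming every variant is equally likely.
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be between 0 and 1 (exclusive)")
    return math.ceil(-n_variants * math.log(1.0 - coverage))


# Oversampling factor for 95% coverage, i.e. -ln(0.05).
oversampling_95 = -math.log(1.0 - 0.95)

# Hypothetical library of 100 variants.
colonies = transformants_needed(100, coverage=0.95)
print(f"oversampling factor: {oversampling_95:.2f}, colonies to screen: {colonies}")
```

Because coverage enters through a logarithm, raising the target from 95% to 99% increases the factor from about 3 to about 4.6, which is why relaxed success criteria such as those in refs. 27 and 31 can reduce the screening effort considerably.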
Acknowledgments
This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No [722361]. The author is grateful to Dr. Murray J. B. Brown and Dr. Gheorghe-Doru Roiban for critically reviewing this work.
References
1. Romero PA, Arnold FH (2009) Exploring protein fitness landscapes by directed evolution. Nat Rev Mol Cell Biol 10(12):866–876. https://doi.org/10.1038/nrm2805
2. Fox RJ, Davis SC, Mundorff EC, Newman LM, Gavrilovic V, Ma SK, Chung LM, Ching C, Tam S, Muley S, Grate J, Gruber J, Whitman JC, Sheldon RA, Huisman GW (2007) Improving catalytic function by ProSAR-driven enzyme evolution. Nat Biotechnol 25(3):338–344. https://doi.org/10.1038/nbt1286
3. Savile CK, Janey JM, Mundorff EC, Moore JC, Tam S, Jarvis WR, Colbeck JC, Krebber A, Fleitz FJ, Brands J, Devine PN, Huisman GW, Hughes GJ (2010) Biocatalytic asymmetric synthesis of chiral amines from ketones applied to sitagliptin manufacture. Science 329
(5989):305–309. https://doi.org/10.1126/science.1188934
4. Wu S, Snajdrova R, Moore JC, Baldenius K, Bornscheuer UT (2020) Biocatalysis: enzymatic synthesis for industrial applications. Angew Chem Int Ed Engl 60:88–119. https://doi.org/10.1002/anie.202006648
5. Fryszkowska A, Devine PN (2020) Biocatalysis in drug discovery and development. Curr Opin Chem Biol 55:151–160. https://doi.org/10.1016/j.cbpa.2020.01.012
6. Huffman MA, Fryszkowska A, Alvizo O, Borra-Garske M, Campos KR, Canada KA, Devine PN, Duan D, Forstater JH, Grosser ST, Halsey HM, Hughes GJ, Jo J, Joyce LA, Kolev JN, Liang J, Maloney KM, Mann BF, Marshall NM, McLaughlin M, Moore JC, Murphy GS, Nawrat CC, Nazor J, Novick S, Patel NR, Rodriguez-Granillo A, Robaire SA, Sherer EC, Truppo MD, Whittaker AM, Verma D, Xiao L, Xu Y, Yang H (2019) Design of an in vitro biocatalytic cascade for the manufacture of islatravir. Science 366(6470):1255–1259. https://doi.org/10.1126/science.aay8484
7. Schober M, MacDermaid C, Ollis AA, Chang S, Khan D, Hosford J, Latham J, Ihnken LAF, Brown MJB, Fuerst D, Sanganee MJ, Roiban G-D (2019) Chiral synthesis of LSD1 inhibitor GSK2879552 enabled by directed evolution of an imine reductase. Nat Catal 2(10):909–915. https://doi.org/10.1038/s41929-019-0341-4
8. Latham J, Ollis AA, MacDermaid C, Honicker K, Fuerst D, Roiban G-D (2020) Directed evolution of enzymes driving innovation in API manufacturing at GSK. In: Whittall J, Sutton PW (eds) Applied biocatalysis: the chemist’s enzyme toolbox. Wiley, Hoboken, New Jersey
9. Kan SBJ, Huang X, Gumulya Y, Chen K, Arnold FH (2017) Genetically programmed chiral organoborane synthesis. Nature 552(7683):132–136. https://doi.org/10.1038/nature24996
10. Hammer SC, Kubik G, Watkins E, Huang S, Minges H, Arnold FH (2017) Anti-Markovnikov alkene oxidation by metal-oxo-mediated enzyme catalysis. Science 358(6360):215–218. https://doi.org/10.1126/science.aao1482
11. Kan SB, Lewis RD, Chen K, Arnold FH (2016) Directed evolution of cytochrome c for carbon-silicon bond formation: bringing silicon to life. Science 354(6315):1048–1051. https://doi.org/10.1126/science.aah6219
12. Truppo MD (2017) Biocatalysis in the pharmaceutical industry: the need for speed. ACS Med Chem Lett 8(5):476–480. https://doi.org/10.1021/acsmedchemlett.7b00114
13. Goodwin NC, Morrison JP, Fuerst DE, Hadi T (2019) Biocatalysis in medicinal chemistry: challenges to access and drivers for adoption. ACS Med Chem Lett 10(10):1363–1366. https://doi.org/10.1021/acsmedchemlett.9b00410
14. Hon J, Borko S, Stourac J, Prokop Z, Zendulka J, Bednar D, Martinek T, Damborsky J (2020) EnzymeMiner: automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities. Nucleic Acids Res 48(W1):W104–W109. https://doi.org/10.1093/nar/gkaa372
15. Trudeau DL, Tawfik DS (2019) Protein engineers turned evolutionists-the quest for the optimal starting point. Curr Opin Biotechnol 60:46–52. https://doi.org/10.1016/j.copbio.2018.12.002
16. Romero-Rivera A, Garcia-Borras M, Osuna S (2016) Computational tools for the evaluation of laboratory-engineered biocatalysts. Chem Commun (Camb) 53(2):284–297. https://doi.org/10.1039/c6cc06055b
17. Yang KK, Wu Z, Arnold FH (2019) Machine-learning-guided directed evolution for protein engineering. Nat Methods 16(8):687–694. https://doi.org/10.1038/s41592-019-0496-6
18. Holland-Moritz DA, Wismer MK, Mann BF, Farasat I, Devine P, Guetschow ED, Mangion I, Welch CJ, Moore JC, Sun S, Kennedy RT (2020) Mass activated droplet sorting (MADS) enables high-throughput screening of enzymatic reactions at nanoliter scale. Angew Chem Int Ed Engl 59(11):4470–4477. https://doi.org/10.1002/anie.201913203
19. Turner NJ (2009) Directed evolution drives the next generation of biocatalysts. Nat Chem Biol 5(8):567–573. https://doi.org/10.1038/nchembio.203
20. Stemmer WP (1994) Rapid evolution of a protein in vitro by DNA shuffling. Nature 370(6488):389–391. https://doi.org/10.1038/370389a0
21. Chen K, Arnold FH (1993) Tuning the activity of an enzyme for unusual environments: sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide. Proc Natl Acad Sci U S A 90(12):5618–5622. https://doi.org/10.1073/pnas.90.12.5618
22. Kille S, Acevedo-Rocha CG, Parra LP, Zhang ZG, Opperman DJ, Reetz MT, Acevedo JP (2013) Reducing codon redundancy and screening effort of combinatorial protein libraries created by saturation mutagenesis.
ACS Synth Biol 2(2):83–92. https://doi.org/10.1021/sb300037w
23. Reetz MT (2011) Laboratory evolution of stereoselective enzymes: a prolific source of catalysts for asymmetric reactions. Angew Chem Int Ed Engl 50(1):138–174. https://doi.org/10.1002/anie.201000826
24. Sayous V, Lubrano P, Li Y, Acevedo-Rocha CG (2020) Unbiased libraries in protein directed evolution. Biochim Biophys Acta Proteins Proteom 1868(2):140321. https://doi.org/10.1016/j.bbapap.2019.140321
25. Qu G, Li A, Acevedo-Rocha CG, Sun Z, Reetz MT (2020) The crucial role of methodology development in directed evolution of selective enzymes. Angew Chem Int Ed Engl 59(32):13204–13231. https://doi.org/10.1002/anie.201901491
26. Li A, Acevedo-Rocha CG, Sun Z, Cox T, Xu JL, Reetz MT (2018) Beating bias in the directed evolution of proteins: combining high-fidelity on-chip solid-phase gene synthesis with efficient gene assembly for combinatorial library construction. Chembiochem 19(3):221–228. https://doi.org/10.1002/cbic.201700540
27. Hoebenreich S, Zilly FE, Acevedo-Rocha CG, Zilly M, Reetz MT (2015) Speeding up
directed evolution: combining the advantages of solid-phase combinatorial gene synthesis with statistically guided reduction of screening effort. ACS Synth Biol 4(3):317–331. https://doi.org/10.1021/sb5002399
28. Li A, Sun Z, Reetz MT (2018) Solid-phase gene synthesis for mutant library construction: the future of directed evolution? Chembiochem 19(19):2023–2032. https://doi.org/10.1002/cbic.201800339
29. Gibson DG, Young L, Chuang RY, Venter JC, Hutchison CA 3rd, Smith HO (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods 6(5):343–345. https://doi.org/10.1038/nmeth.1318
30. Reetz MT, Carballeira JD (2007) Iterative saturation mutagenesis (ISM) for rapid directed evolution of functional enzymes. Nat Protoc 2(4):891–903. https://doi.org/10.1038/nprot.2007.72
31. Nov Y (2012) When second best is good enough: another probabilistic look at saturation mutagenesis. Appl Environ Microbiol 78(1):258–262. https://doi.org/10.1128/AEM.06265-11
Part II Directed Evolution
Chapter 4
In Silico Prediction Methods for Site-Saturation Mutagenesis
Ge Qu and Zhoutong Sun
Abstract
Directed enzyme evolution has proven to be a powerful means to endow biocatalysts with novel catalytic repertoires. Apart from completely random gene mutagenesis, site-directed or site-saturation mutagenesis requires a semi-rational selection of the amino acid positions or the substituted residues, which can dramatically reduce the screening efforts in protein engineering. To this end, in silico prediction methods play a pivotal role in targeting site-saturation mutagenesis. In this chapter, we provide two distinct computational methods, (a) conformational dynamics-guided design and (b) protein-ligand interaction fingerprinting analysis, to identify specific positions for site-saturation mutagenesis toward manipulating substrate specificity/stereoselectivity of an alcohol dehydrogenase, and improving activity of a carboxylic acid reductase, respectively.
Key words Enzyme engineering, Rational design, In silico, Site-specific saturation mutagenesis, Conformational dynamics, Protein–ligand interaction, Alcohol dehydrogenase, Carboxylic acid reductase
1 Introduction
Enzyme engineering has become a foundational technique for providing a plethora of practical biocatalysts, which benefit the fields of biocatalysis and industrial biomanufacturing [1–3]. As a powerful and widely used method, directed evolution enables the improvement of enzyme function through random mutation of the protein’s amino acid sequence [4]. However, the screening step constitutes the bottleneck of directed evolution due to the vast size of sequence space [5, 6]. As such, methods and strategies for evolving “small but smart” mutant libraries requiring a minimum of screening are desired [7, 8]. In this crucial endeavor, more sophisticated methodologies that reduce the randomness have been developed to unravel a subset of well-chosen amino acid residues (often called “hotspots”), at which mutagenesis is likely to improve activity, stability, or selectivity [8–11]. Thus, focused
Francesca Magnani, Chiara Marabelli and Francesca Paradisi (eds.), Enzyme Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 2397, https://doi.org/10.1007/978-1-0716-1826-4_4, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022
mutagenesis methods can reduce library size and screening burden, thereby accelerating the rate of finding improved mutants. Such focused mutagenesis methods have generally relied on a site-saturation mutagenesis approach to incorporate random diversity at the hotspot positions, followed by screening of the respective mutant libraries to determine the superior substitution. On the one hand, a number of degenerate codon sets corresponding to reduced amino acid alphabets have been developed to limit bias and to control redundancy in the genetic code for keeping the screening effort at a minimum, which has been extensively elucidated [12, 13]. On the other hand, in such efforts structure-guided methods and in silico tools can help significantly by raising the probability of success [14–16]. An increasing number of computational methodologies and tools have been developed to assist in the faster identification of suitable hotspots and the design of “small but smart” mutant libraries [17–21].
In this chapter, we provide the protocols of two in silico strategies for identifying hotspot positions for site-saturation mutagenesis to manipulate enzymatic repertoires (Fig. 1): (a) conformational dynamics-guided design of substrate specificity and stereoselectivity of an alcohol dehydrogenase from Thermoanaerobacter brockii (TbSADH) [22] and (b) engineering of a carboxylic acid reductase from Segniliparus rugosus (SrCAR) with the aim of increasing activity by protein–ligand interaction fingerprinting analysis [23, 24].
The first case study uses information on conformational dynamics changes in enzyme structures to identify the critical residues. In recent years, there has been growing interest in the analysis of conformational dynamics of proteins [25–28] as a guide in enzyme engineering for expanding substrate scope [29], increasing enantioselectivity [30], relieving product inhibition [31], improving thermostability [32], and more.
The motion of amino acid residues in loops that shape the binding pocket has been recognized to influence the catalytic properties of many wild-type (WT) enzymes in their native states [25–28]. It is therefore of great interest to identify beneficial positions that manipulate enzymatic reactions via conformational dynamics analysis. An example was given by the conformational dynamics-guided design of substrate specificity and stereoselectivity of TbSADH [22, 33]. As a result of such an analysis, two key residues were found to increase the fluctuation of a loop region, thereby yielding a larger volume of the binding pocket to accommodate non-natural bulky substrates, enabling acceptable activity and high enantioselectivity [22, 33].
The second approach uses protein–ligand interaction fingerprinting analysis for mining information within the enzyme active site, which is likewise beneficial for the identification of hotspots. It is a well-established approach in protein engineering, structural bioinformatics, and drug discovery [23, 34, 35]. Moreover, since
Fig. 1 Schematic representation of two in silico prediction methods described in this chapter
molecular dynamics (MD) simulation is a dominant method for analyzing conformational changes of protein–ligand complexes, it can be exploited advantageously to quantify the frequency of protein–ligand interactions. On this basis, the second case study is illustrated here using the carboxylic acid reductase SrCAR, in which hotspot residues at the substrate binding pockets were identified via protein–ligand interaction fingerprinting analysis based on the generated MD simulation trajectories. The resultant mutants showed highly enhanced activities toward the model substrate benzoic acid [23, 24].
2 Materials
1. A Unix-based workstation (e.g., running Ubuntu 16.04) that can run MD simulations and other computational programs was used in this protocol.
2. A GPU version of the AMBER (Assisted Model Building and Energy Refinement) molecular dynamics package 2016 [36], including AmberTools 16, was adopted for MD computations. The GPU driver is version 375.20 and the CUDA toolkit is version 8.0. AMBER is distributed under a license from the University of California. After purchasing, AMBER can be downloaded and installed using Fortran 95, C, or C++ compilers following the instructions in the Amber Reference Manual at https://ambermd.org/Manuals.php.
3. Schrödinger Maestro (release 2015-2) was adopted to prepare ligand structures [37]. Maestro, a powerful molecular visualization program, is free for academic users; it can be downloaded from https://www.schrodinger.com/academiclicensing.
4. VMD (Visual Molecular Dynamics) software (http://www.ks.uiuc.edu/Research/vmd) [38] was used for the visualization of MD trajectories and structures. In addition, PyMOL [39] and UCSF Chimera [40] were used as alternatives for structure visualization and preparation.
5. CPPTRAJ [41] was used to analyze MD trajectories.
6. The PLIP (Protein–Ligand Interaction Profiler) program was employed for the detection and visualization of relevant noncovalent protein–ligand contacts. It is available for download from https://github.com/pharmai/plip/releases. Unpack the compressed file and install it.
7. The H++ webserver (http://biophysics.cs.vt.edu/) was used to compute pK values of ionizable groups in the apoprotein and to add missing hydrogen atoms according to the specified pH of the environment [42].
8. X-ray structures used for the initial system setup were fetched from the Protein Data Bank (http://www.rcsb.org). PDB code 1YKF was used for TbSADH model creation, while 5MSS and 5MSV were used for thiolation and reduction modeling in regard to SrCAR model creation. If no PDB structure is available, homology modeling is recommended (see Note 1).
3 Methods
3.1 Structure Preparation
In the case study of conformational dynamics-guided engineering of TbSADH:
1. From the PDB website, download the X-ray structure 1YKF of wild-type TbSADH (1ykf.pdb), which is used as the basis for model creation.
2. Open the 1ykf.pdb file with any text editor (e.g., Sublime Text, gedit), delete the lines of the cofactor NADP+ and the zinc ion, and save the file as apo.pdb.
3. Protonate the apoprotein using the H++ webserver [42]. Its website links to the query interface (click “PROCESS A STRUCTURE”), and detailed explanations of the server usage with examples are also provided (click “EXAMPLES”). After uploading the apoprotein structure file (apo.pdb), set the protonation parameters; here the pH value is set to 7.4 to mimic the experimental conditions. After a few minutes of processing (1–2 min in this case), the resultant PDB structure in the predicted protonation state is available; download the file and save it as wt_dry.pdb. Alternatively, fetch the .top and .crd formatted files 0.15_80_10_pH7.4_apo.top and 0.15_80_10_pH7.4_apo.crd, and run the following command to generate the structure file wt_dry.pdb. The latter way is recommended.
ambpdb -p 0.15_80_10_pH7.4_apo.top -c 0.15_80_10_pH7.4_apo.crd > wt_dry.pdb
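Step 2 above (deleting the cofactor and zinc records by hand) can also be scripted. The following is a minimal sketch, not part of the original protocol: the residue names NAP (NADP+) and ZN are assumptions that should be checked against the HETATM records of the downloaded file, and both helper functions are hypothetical.

```python
def strip_residues(pdb_lines, resnames):
    """Return the lines without coordinate records of the given residue names.

    Only ATOM/HETATM/ANISOU records are filtered; the residue name is read
    from PDB columns 18-20 (Python slice 17:20).
    """
    kept = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record in {"ATOM", "HETATM", "ANISOU"} and line[17:20].strip() in resnames:
            continue
        kept.append(line)
    return kept


def pdb_line(record, serial, atom, resname, resseq):
    # Toy fixed-width record: only the columns used above are meaningful.
    return f"{record:<6}{serial:>5} {atom:<4} {resname:>3} A{resseq:>4}"


# Made-up records for illustration only.
example = [
    pdb_line("ATOM", 1, "N", "MET", 1),
    pdb_line("ATOM", 2, "CA", "MET", 1),
    pdb_line("HETATM", 2700, "ZN", "ZN", 400),
    pdb_line("HETATM", 2701, "PA", "NAP", 401),
]
# ASSUMED residue names for NADP+ and zinc.
apo = strip_residues(example, {"NAP", "ZN"})
print("\n".join(apo))
```

For a real file, the lines would be read from 1ykf.pdb and the filtered list written to apo.pdb; crystallographic waters are left untouched here because the step only mentions the cofactor and the zinc ion.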
3.2 MD Simulations
1. Building topology and coordinate files of the solvated TbSADH system. Create a file containing the following lines, and save it as a new file tleap.in.
source leaprc.ff14SB
source leaprc.gaff
#load pdb file
mol = loadpdb wt_dry.pdb
#put 10 Å-buffer of TIP3P water around the system
solvatebox mol TIP3PBOX 10.0
#Neutralize the system
addions mol Na+ 0
addions mol Cl- 0
#save topology and coordinate files
savepdb mol wt_solv.pdb
saveamberparm mol wt_solv.prmtop wt_solv.inpcrd
quit
Once it is prepared, run the following command:
tleap -s -f tleap.in
As a result, the topology (wt_solv.prmtop) and coordinate (wt_solv.inpcrd) files are saved in the prmtop and inpcrd AMBER format, respectively.
2. Energy minimization. Before the molecular dynamics simulation, the system needs to be relaxed to remove bad contacts created by solvation. This involves four sequential steps to relax the protons, the solvent, the side chains, and the entire system, respectively. First, prepare the files min1.in, min2.in, min3.in, and min4.in shown below.
min1.in
#Minimization 1 - protons
&cntrl
  imin=1, ntx=1,
  maxcyc=2000, ncyc=1000,
  cut=10.0, ntb=1,
  ntpr=100,
  ntr=1, restraintmask=':!@H=',
  restraint_wt=10,
/
min2.in
#Minimization 2 - solvent
&cntrl
  imin=1, ntx=1,
  maxcyc=2000, ncyc=1000,
  cut=10.0, ntb=1,
  ntpr=100,
  ntr=1, restraintmask=':1-352',
  restraint_wt=10,
/
min3.in
#Minimization 3 - side chains
&cntrl
  imin=1, ntx=1,
  maxcyc=2000, ncyc=1000,
  cut=10.0, ntb=1,
  ntpr=100,
  ntr=1, restraintmask=':1-352@CA,N,C,O',
  restraint_wt=10,
/
min4.in
#Minimization 4 - all atoms
&cntrl
  imin=1,
  maxcyc=10000, ncyc=5000,
  cut=10.0, ntb=1,
/
Next, run the following command:
pmemd.cuda -O -i min1.in -p wt_solv.prmtop -c wt_solv.inpcrd -o min1.out -r min1.rst -ref wt_solv.inpcrd && pmemd.cuda -O -i min2.in -p wt_solv.prmtop -c min1.rst -o min2.out -r min2.rst -ref min1.rst && pmemd.cuda -O -i min3.in -p wt_solv.prmtop -c min2.rst -o min3.out -r min3.rst -ref min2.rst && pmemd.cuda -O -i min4.in -p wt_solv.prmtop -c min3.rst -o min4.out -r min4.rst -ref min3.rst
As a result, a series of output files will be generated, including the file min4.rst.
3. Heating the system up under NVT conditions. The minimized system needs to be incrementally heated from 0 to 303 K (30 °C, the optimal reaction temperature of TbSADH) over 50 ps. A weak restraint of 10 kcal mol⁻¹ Å⁻² is applied to the protein residues. The reference temperature is set through temp0 and through value2 in the &wt block, e.g., 303 for 30 °C or 333 for 60 °C. Create a new file named heat.in.
heat.in
&cntrl
  imin=0, irest=0, ntx=1,
  nstlim=50000, dt=0.001,
  ntc=2, ntf=2,
  cut=10.0, iwrap=1,
  ntb=1, ntp=0,
  ntpr=500, ntwx=500,
  ntt=3, gamma_ln=2.0,
  tempi=0, ig=-1,
  temp0=303,
  ntr=1, restraintmask=':1-352',
  restraint_wt=10,
  nmropt=1,
/
&wt TYPE='TEMP0', istep1=0, istep2=50000, value1=0.0, value2=303.0 /
&wt TYPE='END' /
Once the heat.in file is prepared, run the following command; the generated file heat.rst will be used in the next step.
pmemd.cuda -O -i heat.in -p wt_solv.prmtop -c min4.rst -o heat.out -r heat.rst -x heat.mdcrd -ref min4.rst
4. Density equilibration (NPT ensemble). Next, maintain the heated system for 50 ps of density equilibration under NPT conditions at a constant temperature of 300 K and a pressure of 1.0 atm, using a Langevin thermostat with a collision frequency of 2 ps⁻¹ and a pressure relaxation time of 1 ps. A weak restraint of 10 kcal mol⁻¹ Å⁻² is applied to the protein residues. Create a file containing the following lines, and save it as density.in.
density.in
&cntrl
  imin=0, irest=0, ntx=5,
  nstlim=25000, dt=0.001,
  ntc=2, ntf=2,
  cut=10.0, iwrap=1,
  ntb=2, ntp=1, taup=2.0,
  ntpr=500, ntwx=500, ntwr=5000,
  ntt=3, gamma_ln=2.0,
  temp0=303, ig=-1,
  ntr=1, restraintmask=':1-352',
  restraint_wt=10,
/
Run the following command, and use the generated file density.rst in the next step.
pmemd.cuda -O -i density.in -p wt_solv.prmtop -c heat.rst -o density.out -r density.rst -x density.mdcrd -ref heat.rst
5. Unrestrained equilibration. After removal of all restraints, equilibrate the system for a further 10 ns to obtain well-settled pressure and temperature. Create a file containing the following lines, and save it as equil.in.
equil.in
&cntrl
  imin=0, irest=1, ntx=5,
  nstlim=1000000, dt=0.001,
  ntc=2, ntf=2,
  cut=10.0, iwrap=1,
  ntb=2, ntp=1, taup=5.0,
  ntpr=5000, ntwx=5000, ntwr=500000,
  ntt=3, gamma_ln=2.0,
  temp0=300, ig=-1,
/
Once the equil.in file is prepared, run the following command, and use the output file equil.rst in the next step.
pmemd.cuda -O -i equil.in -p wt_solv.prmtop -c density.rst -o equil.out -r equil.rst -x equil.mdcrd -ref density.rst
6. MD production. Run a production MD simulation of 100 ns with random initial velocities (see Note 2). Create a file containing the following lines, and save it as prod.in.
prod.in
&cntrl
  imin=0, irest=1, ntx=1,
  nstlim=100000000, dt=0.001,
  ntc=2, ntf=2,
  cut=10.0, iwrap=1,
  ntb=2, ntp=1,
  ntpr=1000, ntwx=1000, ntwr=500000,
  ntt=3, gamma_ln=2.0,
  temp0=303, ig=-1,
/
Afterwards, run the following command, and a 100 ns trajectory (prod.mdcrd) will be generated.
pmemd.cuda -O -i prod.in -p wt_solv.prmtop -c equil.rst -o prod.out -r prod.rst -x prod.mdcrd -ref equil.rst
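Steps 2–6 all follow one pattern: each stage reads the restart file written by the previous stage and uses it as the restraint reference. The sketch below only assembles the command strings with the flags shown above (it does not launch AMBER) and adds a helper that converts nstlim and dt into simulated time. The function names and the stage table are illustrative assumptions.

```python
def pmemd_command(stage, start_coords, prmtop="wt_solv.prmtop", write_traj=True):
    """Assemble one pmemd.cuda call using the flags that appear in this protocol."""
    cmd = ["pmemd.cuda", "-O", "-i", f"{stage}.in", "-p", prmtop,
           "-c", start_coords, "-o", f"{stage}.out", "-r", f"{stage}.rst"]
    if write_traj:
        cmd += ["-x", f"{stage}.mdcrd"]
    # The reference structure is the one the stage starts from.
    cmd += ["-ref", start_coords]
    return " ".join(cmd)


def build_chain(stages, first_coords="wt_solv.inpcrd"):
    """Chain the stages so that each restart file feeds the next stage."""
    commands, coords = [], first_coords
    for stage, write_traj in stages:
        commands.append(pmemd_command(stage, coords, write_traj=write_traj))
        coords = f"{stage}.rst"
    return commands


def simulated_ns(nstlim, dt_ps):
    """Simulated time in ns for nstlim steps of dt_ps picoseconds each."""
    return nstlim * dt_ps / 1000.0


# (stage name, whether a trajectory is written); minimizations write none.
STAGES = [("min1", False), ("min2", False), ("min3", False), ("min4", False),
          ("heat", True), ("density", True), ("equil", True), ("prod", True)]

chain = build_chain(STAGES)
print(" && \\\n".join(chain))
print("production length (ns):", simulated_ns(100_000_000, 0.001))
```

Checking nstlim × dt for each input file before submission is a cheap way to confirm that a stage runs for the intended length.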
3.3 Trajectory Analysis
The CPPTRAJ program from AmberTools 16 is used to analyze the trajectory generated in the last step. First, strip all solvent and neutralizing ions to keep the trajectory file slim by running the following commands in the CPPTRAJ program.
>parm wt_solv.prmtop
>trajin prod.mdcrd
>strip :WAT
>strip @Na+
>trajout prod_dry.mdcrd
The file prod_dry.mdcrd will be generated; it is much smaller than prod.mdcrd because it contains only the coordinate information of the protein atoms. Next, perform the coordinate RMSF (root mean square fluctuation) analysis, which measures the variation of each atom or residue over the whole trajectory. The atomic fluctuation calculation averaged by residue allows the identification of flexible regions, and it can be executed as follows:
>parm wt_dry.prmtop
>trajin prod_dry.mdcrd
>rms first
>average crdset MyAvg
>rms ref MyAvg
>atomicfluct out rmsf.agr @C,CA,N byres
>atomicfluct out rmsf.dat @C,CA,N byres
This will provide a graphic file rmsf.agr and a text-type file rmsf.dat. If you are not pleased with the default graphic representation, you can customize and visualize it in your own way using the latter text-type file.
3.4 Replica Analysis
In order to avoid false positive conclusions caused by a single MD run [43], it is necessary to perform multiple simulations derived from identical atomic coordinates. As such, six independent replicas are carried out in this case study by repeatedly running step 6 of Subheading 3.2. For each replica, perform the same trajectory analysis as in Subheading 3.3 (Fig. 2a). In addition, repeat the work at 60 °C by setting the parameter temp0 = 333 as described in step 3 of Subheading 3.2. Run 6 × 100 ns trajectories at 60 °C as well, and perform the same RMSF analysis (Fig. 2b). After averaging the six replicas of MD runs at 30 °C and 60 °C, respectively, the result shown in Fig. 3 is obtained. Among the eight residues (S39, A85, I86, W110, Y267, M285, L294, and C295) lining the respective binding pockets, the RMSF values of sites A85, I86, L294, and C295 are less than 1 Å, indicating that they are relatively rigid and may impose restrictions on substrate recognition [22]. Therefore, residues A85, I86, L294, and C295 are identified as hotspots.
3.5 Structure Preparation
In the case study of protein–ligand interaction analysis-guided engineering of SrCAR:
1. Download the SrCAR X-ray structure files 5MSS.pdb and 5MSV.pdb from the PDB database, which correspond to thiolation and reduction modeling, respectively. Download other X-ray structures for model construction, including 5MSD, which is co-crystallized with the model substrate benzoic acid, 3NYQ containing methylmalonyl-CoA in the N-terminal adenylation domain, as well as 1W6U soaked with hexanoyl-CoA and NADPH in the C-terminal reductase domain.
2. Completing missing regions. As shown in Fig. 4, the originally downloaded structures 5MSS and 5MSV lack some residues. Therefore, the missing residues need to be completed in the next step. Taking 5MSS.pdb as an example, open it in UCSF Chimera, click on Tools → Structure Editing → Model/
Fig. 2 Six replicas of 100 ns MD simulations (with random initial velocities) for the apo WT enzyme TbSADH at (a) 30 °C and (b) 60 °C. (Adapted from Ref. 22)
[Fig. 3 plot: RMSF (Å) versus residue number at 30 °C and 60 °C; labeled features: loop A, loop B, loop C, and residues S39, A85, I86, W110, Y267, M285, L294, and C295]
Fig. 3 Average RMSF values of all residues calculated from MD simulations for the apo WT enzyme TbSADH at 30 °C and 60 °C. Key residues lining the substrate binding pocket are indicated by black arrows. (Adapted from Ref. 22)
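The averaging described in Subheading 3.4 can be sketched in a few lines of Python. This is a minimal sketch rather than the authors' script: it assumes that each replica's rmsf.dat holds two whitespace-separated columns (residue number, RMSF in Å) with optional comment lines starting with "#", and the function names and toy values are hypothetical.

```python
from statistics import mean

def parse_rmsf(text):
    """Parse two-column RMSF text (residue number, RMSF in Å); skip '#' lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        res, rmsf = line.split()[:2]
        values[int(float(res))] = float(rmsf)
    return values

def average_rmsf(replicas):
    """Average the RMSF of each residue over all replicas (dicts with equal keys)."""
    return {r: mean(rep[r] for rep in replicas) for r in sorted(replicas[0])}

def rigid_residues(avg, pocket, cutoff=1.0):
    """Return the pocket residues whose averaged RMSF is below the cutoff (Å)."""
    return [r for r in pocket if avg[r] < cutoff]

# Toy data for two replicas and three residues (all values are invented).
rep1 = parse_rmsf("#Res RMSF\n85 0.80\n110 1.60\n294 0.70\n")
rep2 = parse_rmsf("#Res RMSF\n85 0.90\n110 1.80\n294 0.90\n")
avg = average_rmsf([rep1, rep2])
print(rigid_residues(avg, pocket=[85, 110, 294]))
```

In practice the six replica files at each temperature would be read from disk and passed to average_rmsf together, with the eight pocket residues as the pocket argument.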
Fig. 4 Missing SrCAR residues in the PDB IDs 5MSS and 5MSV. (a) 5MSS missing residues 296–300, 664–665, and 685–688; (b) 5MSV missing residues 685–688 and 746–747, indicated by red arrows. (Adapted from Ref. 24)
Refine Loops. In the Model Loops/Refine Structure window, click all missing structure and one, then input the Modeller license key (see Note 3) and click OK to start completing. Save the generated structure as 5mss_complete.pdb. Follow the same procedure for the 5MSV structure to obtain 5msv_complete.pdb.
3. Open the 5mss_complete.pdb file with any text editor (e.g., Sublime Text, gedit), remove the cofactor lines, and save it as 5mss_apo.pdb. Do the same to obtain the file 5msv_apo.pdb.
4. Protonation of the apoprotein. Follow the procedure described in Subheading 3.1, step 3, to obtain the predicted protonation states of 5MSS and 5MSV, respectively. Save them as 5mss_dry.pdb and 5msv_dry.pdb.
5. Ligand preparation. The thiolation model consists of three components: 5MSS apoenzyme, benzoic-5′-AMP, and phosphopantetheinyl group. As the apoenzyme has been prepared in the above step, the benzoic-5′-AMP and phosphopantetheinyl group need to be built. Run PyMOL, click Tools → File → Open 5MSS.pdb and 5MSD.pdb, align 5MSD to 5MSS by typing “align 5MSD, 5MSS,” select the benzoic acid moiety in 5MSD and AMP in 5MSS, click File → Export Molecule → Selection Sele, and save it as ASU.pdb. Close PyMOL and run Schrödinger Maestro, click Import, choose the ASU.pdb file and open it (Fig. 5a), and add the missing H atoms manually on the benzoic-5′-AMP complex by clicking Edit → Add H. Click Tasks → Minimization → Force-Field, and click Run with the default parameters. After 2 min, the energy-minimized benzoic-5′-AMP complex will be obtained (Fig. 5b); save it as
Fig. 5 Preparation of ligands. (a) AMP and benzoic acid fetched from PDB codes 5MSS and 5MSD; (b) optimized benzoic-5′-AMP complex; (c) methylmalonyl-CoA moiety in 3NYQ; (d) the generated PPT moiety
ASU-mae.mol2. In the same way, generate the phosphopantetheinyl group as follows: run PyMOL, click Tools → File → Open 5MSS.pdb and 3NYQ.pdb, align 3NYQ to 5MSS by typing “align 3NYQ, 5MSS,” select the methylmalonyl-CoA moiety in 3NYQ (Fig. 5c), and save it as PPT.pdb. Close PyMOL and run Schrödinger Maestro, click Import, open PPT.pdb, delete the oxygen atom of the phosphate linked to the Ser residue, and also delete the methylmalonyl group by clicking Build → Delete. After the Add H operation, minimize the structure and save it as PPT-mae.mol2 (Fig. 5d). The reduction system also includes three components: 5MSV apoenzyme, NADPH, and benzoic-CoA group. The latter is prepared in Subheading 3.5, step 4. For NADPH, open a terminal and type
grep NAP 5MSV.pdb > NAP.pdb
then download the NADPH molecule parameters from the AMBER parameter database (http://www.pharmacy.manchester.ac.uk/bryce/amber/). Search the row NADPH in the cofactors table, and download the PREP and FRCMOD files as NAP.prep and NAP.frcmod, respectively.
6. Construction of protein-ligand complex. In order to generate the structure of 5MSS apoenzyme complexed with benzoic-5′-AMP and phosphopantetheinyl group, open a terminal and type
cat 5mss_apo.pdb ASU-mae.pdb PPT-mae.pdb > 5mss_complex.pdb
to obtain the PDB file 5mss_complex.pdb.
In the same way, obtain the PDB file 5msv_complex.pdb for the 5MSV complex.
7. Preparing other files for ligands and non-standard residue. First, the AM1-BCC charge method is used to generate .mol2 files for ligands that can be recognized by AMBER. In the case of benzoic-5′-AMP, run the following command:
antechamber -fi mol2 -fo mol2 -i ASU-mae.mol2 -o ASU.mol2 -c bcc -pf y -nc -4
Thereafter, run the following command to obtain the frcmod file for this ligand:
parmchk2 -i ASU.mol2 -o ASU.frcmod -f mol2
In the same way, obtain .mol2 and .frcmod files for the other ligands. Because the phosphopantetheinyl group is attached to the side chain of a conserved nucleophilic serine in CAR (e.g., Ser702 in SrCAR), the hydrogen atom in the side chain of Ser702 should be deleted. For AMBER to recognize the modified serine residue, a new lib file for a serine residue lacking this side-chain hydrogen atom needs to be prepared. A simple way to do so is to modify the ordinary serine amino acid library information (stored in the folder /../amber16/dat/reslib/leap/) by removing the lines related to this hydrogen atom (named “HG”), and then save the new file as SOR.lib.

3.6 MD Simulations
1. Building topology and coordinates files of SrCAR thiolation solvated systems. Create a file containing the following lines, and save it as thio_tleap.in.

thio_tleap.in
source leaprc.ff14SB
source leaprc.gaff
ASU = loadmol2 ASU.mol2
PPT = loadmol2 PPT.mol2
loadoff SOR.lib
loadamberparams ASU.frcmod
loadamberparams PPT.frcmod
loadamberparams S-P.frcmod
mol = loadpdb 5mss_complex.pdb
#create a bond between the O atom of Ser and the P atom of cofactor PPT. It is of note that Ser702 in 5MSS system is renumbered as 686.
bond mol.686.OG mol.730.P1
savepdb mol thio_dry.pdb
saveamberparm mol thio_dry.prmtop thio_dry.inpcrd
solvatebox mol TIP3PBOX 10.0
addions mol Na+ 0
addions mol Cl- 0
savepdb mol thio_solv.pdb
saveamberparm mol thio_solv.prmtop thio_solv.inpcrd
quit

Once the thio_tleap.in file is prepared, run the following command:
tleap -s -f thio_tleap.in
Building topology and coordinates files of SrCAR reduction solvated systems. Create a file containing the following lines, and save it as red_tleap.in.

red_tleap.in
source leaprc.ff14SB
source leaprc.gaff
PSU = loadmol2 PSU.mol2
loadoff SOR.lib
loadamberprep NAP.prep
loadamberparams NAP.frcmod
loadamberparams PSU.frcmod
loadamberparams S-P.frcmod
mol = loadpdb 5msv_complex.pdb
#create a bond between the O atom of Ser702 and the P atom of cofactor PPT. It is of note that Ser702 in 5MSV system is renumbered as 35.
bond mol.35.OG mol.522.P1
savepdb mol red_dry.pdb
saveamberparm mol red_dry.prmtop red_dry.inpcrd
solvatebox mol TIP3PBOX 10.0
addions mol Na+ 0
addions mol Cl- 0
savepdb mol red_solv.pdb
saveamberparm mol red_solv.prmtop red_solv.inpcrd
quit

Once the red_tleap.in file is prepared, run the following command:
tleap -s -f red_tleap.in
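Because the two tleap inputs differ only in file names, loaded units, and the two residue numbers of the Ser-PPT bond, they can be written from one template. The following is a minimal sketch under that assumption; the function name render_tleap and its arguments are hypothetical, and every tleap line it emits is copied from the two inputs above.

```python
def render_tleap(prefix, mol2_units, frcmods, complex_pdb, ser_res, ppt_res, prep_files=()):
    """Return the text of a tleap input laid out like thio_tleap.in and red_tleap.in."""
    lines = ["source leaprc.ff14SB", "source leaprc.gaff"]
    lines += [f"{unit} = loadmol2 {unit}.mol2" for unit in mol2_units]
    lines.append("loadoff SOR.lib")
    lines += [f"loadamberprep {prep}" for prep in prep_files]
    lines += [f"loadamberparams {frcmod}" for frcmod in frcmods]
    lines += [
        f"mol = loadpdb {complex_pdb}",
        # Bond between the serine OG atom and the P1 atom of the cofactor.
        f"bond mol.{ser_res}.OG mol.{ppt_res}.P1",
        f"savepdb mol {prefix}_dry.pdb",
        f"saveamberparm mol {prefix}_dry.prmtop {prefix}_dry.inpcrd",
        "solvatebox mol TIP3PBOX 10.0",
        "addions mol Na+ 0",
        "addions mol Cl- 0",
        f"savepdb mol {prefix}_solv.pdb",
        f"saveamberparm mol {prefix}_solv.prmtop {prefix}_solv.inpcrd",
        "quit",
    ]
    return "\n".join(lines) + "\n"

thio = render_tleap("thio", ["ASU", "PPT"],
                    ["ASU.frcmod", "PPT.frcmod", "S-P.frcmod"],
                    "5mss_complex.pdb", ser_res=686, ppt_res=730)
red = render_tleap("red", ["PSU"],
                   ["NAP.frcmod", "PSU.frcmod", "S-P.frcmod"],
                   "5msv_complex.pdb", ser_res=35, ppt_res=522,
                   prep_files=["NAP.prep"])
print(thio)
```

Writing each returned string to thio_tleap.in or red_tleap.in reproduces the inputs shown above, which keeps the renumbered serine and cofactor residues in a single, easily checked place.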
2. Energy minimization. It includes five sequential steps to relax protons, solvent, cofactors, side chains, and the entire system, respectively, by using the files min1.in, min2.in, min3.in, min4.in, and min5.in below.

min1.in
#Minimization 1 - protons
&cntrl
imin=1, ntx=1, maxcyc=2000, ncyc=1000,
cut=10.0, ntb=1, ntpr=100,
ntr=1, restraintmask=':!@H=', restraint_wt=10,
/

min2.in
#Minimization 2 - solvent
&cntrl
imin=1, ntx=1, maxcyc=2000, ncyc=1000,
cut=10.0, ntb=1, ntpr=100,
ntr=1, restraintmask=':1-730', restraint_wt=10,
/

min3.in
#Minimization 3 - cofactor
&cntrl
imin=1, ntx=1, maxcyc=2000, ncyc=1000,
cut=10.0, ntb=1, ntpr=100,
ntr=1, restraintmask=':1-728', restraint_wt=10,
/

min4.in
#Minimization 4 - sidechains
&cntrl
imin=1, ntx=1, maxcyc=2000, ncyc=1000,
cut=10.0, ntb=1, ntpr=100,
ntr=1, restraintmask=':1-728@CA,N,C,O', restraint_wt=10,
/
min5.in
#Minimization 5 - all atoms
&cntrl
imin=1, maxcyc=10000, ncyc=5000,
cut=10.0, ntb=1,
/
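The four restrained minimization inputs differ only in their title and restraint mask, so they can be generated from two residue counts. The sketch below assumes, from the thiolation masks, that 728 is the number of protein residues and 730 the number of protein plus ligand residues; the helper names are hypothetical, and the mask is written as a quoted string, as namelist character values normally are. For another system, both counts must be read from that system's own topology.

```python
def staged_masks(n_protein, n_solute):
    """Restraint masks for min1 to min4, following the pattern of the thiolation inputs."""
    return [
        ":!@H=",                     # min1: restrain non-hydrogen atoms, relax protons
        f":1-{n_solute}",            # min2: restrain protein and ligands, relax solvent
        f":1-{n_protein}",           # min3: restrain protein, relax cofactors
        f":1-{n_protein}@CA,N,C,O",  # min4: restrain backbone, relax side chains
    ]

def render_min(title, mask):
    """Render one restrained minimization input as text."""
    return (
        f"#{title}\n&cntrl\n"
        "imin=1, ntx=1, maxcyc=2000, ncyc=1000,\n"
        "cut=10.0, ntb=1, ntpr=100,\n"
        f"ntr=1, restraintmask='{mask}', restraint_wt=10,\n/\n"
    )

titles = ["Minimization 1 - protons", "Minimization 2 - solvent",
          "Minimization 3 - cofactor", "Minimization 4 - sidechains"]
masks = staged_masks(n_protein=728, n_solute=730)
inputs = {f"min{i}.in": render_min(title, mask)
          for i, (title, mask) in enumerate(zip(titles, masks), start=1)}
print(inputs["min2.in"])
```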
Once the min1.in, min2.in, min3.in, min4.in, and min5.in files are prepared, run the following command:
pmemd.cuda -O -i min1.in -p thio_solv.prmtop -c thio_solv.inpcrd -o min1.out -r min1.rst -ref thio_solv.inpcrd &&
pmemd.cuda -O -i min2.in -p thio_solv.prmtop -c min1.rst -o min2.out -r min2.rst -ref min1.rst &&
pmemd.cuda -O -i min3.in -p thio_solv.prmtop -c min2.rst -o min3.out -r min3.rst -ref min2.rst &&
pmemd.cuda -O -i min4.in -p thio_solv.prmtop -c min3.rst -o min4.out -r min4.rst -ref min3.rst &&
pmemd.cuda -O -i min5.in -p thio_solv.prmtop -c min4.rst -o min5.out -r min5.rst -ref min4.rst

Similarly, by changing the restraintmask setting (the amino acid sequence length of 5msv is 522), obtain the min1.in to min5.in files for the reduction system, and run the same command line, replacing thio_solv.prmtop and thio_solv.inpcrd with red_solv.prmtop and red_solv.inpcrd.
3. NVT ensemble, NPT ensemble, unrestrained equilibration and MD production. The heat.in, density.in, equil.in, and prod.in files are the same as shown in the TbSADH case study, except for changing the restraintmask value to the length of the thiolation and reduction systems, and changing the nstlim value to 300,000,000 in order to generate 300 ns of productive MD runs.

3.7 Trajectory Analysis
Following the procedures in Subheading 3.3, obtain the dried trajectory files of the thiolation and reduction systems, and name them thio_prod_dry.mdcrd and red_prod_dry.mdcrd, respectively. Next, select 300 frames at 1 ns intervals from each of the generated .mdcrd files by running the following command lines in the CPPTRAJ program. This will generate 300 PDB files (named from thio-stru.pdb.1 to thio-stru.pdb.300).
>parm thio_dry.prmtop
>trajin thio_prod_dry.mdcrd 1 300000 1000
>trajout thio-stru.pdb pdb multi
>run
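The three numbers after the trajectory name in the trajin line are the start frame, the stop frame, and the offset. A short check of the arithmetic is sketched below; it assumes that the 300 ns production run is stored as 300,000 evenly spaced frames (1 ps per frame), which is implied by the trajin arguments but not stated explicitly, and the function name is hypothetical.

```python
def selected_frames(start, stop, offset):
    """Frames read by 'trajin <file> <start> <stop> <offset>' (1-based, stop inclusive)."""
    return list(range(start, stop + 1, offset))

frames = selected_frames(1, 300000, 1000)

# Assumption: 300 ns are spread evenly over 300,000 stored frames.
ps_per_frame = 300 * 1000 / 300000
spacing_ns = 1000 * ps_per_frame / 1000  # time between two selected frames

print(len(frames), frames[0], frames[-1], spacing_ns)
```

Under this assumption the selection yields 300 snapshots spaced 1 ns apart, the last one being frame 299001, which matches the 300 PDB files named in the text.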
Thus, the generated PDB files can be used to analyze noncovalent protein-substrate interactions with the PLIP program. To do so, either upload each PDB file to the PLIP web page (https://plip.biotec.tu-dresden.de/plip-web/plip/index), or use the local
[Fig. 6 panels (a) and (b): labels S702, PPT, PPT-Sub, AMP-Sub, and NADPH; residues S408, G430, Y431, G432, T434, T505, D507, Y519, R522, K528, T935, V936, A937, M999, and Q1015; axis ticks from 20 to 100]
Fig. 6 Hotspots identified in thiolation stage (a) and in reduction system (b) using SrCAR. (Adapted from [23, 24])
version. After analyzing the PDB files, the residues that are most frequently in contact with the substrate can be obtained statistically, as shown in Fig. 6. The predicted hotspots can then be targeted by site-saturation mutagenesis with the aim of improving catalytic activity [23]. In summary, hotspot positions can be efficiently predicted both by conformational dynamics in the TbSADH case study (Fig. 3) and by protein-ligand interaction fingerprinting analysis in the SrCAR case study (Fig. 6). There are also many other in silico methods to predict hotspots for site-directed mutagenesis [17], which are not discussed here.
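The statistical step, counting how often each residue contacts the substrate over the 300 snapshots, can be sketched as follows. This sketch does not call PLIP; it assumes that the interacting residues of each snapshot have already been extracted from the PLIP reports into one set per frame. The function name and the toy contact sets are hypothetical, and the frequencies are invented for illustration only.

```python
from collections import Counter

def contact_frequency(frames):
    """Return (residue, percent of frames in contact) pairs, most frequent first.

    frames: one collection of residue labels per snapshot.
    """
    counts = Counter(res for frame in frames for res in set(frame))
    n_frames = len(frames)
    table = [(res, 100.0 * hits / n_frames) for res, hits in counts.items()]
    return sorted(table, key=lambda item: (-item[1], item[0]))

# Toy input: four snapshots with invented contacts.
toy_frames = [
    {"R522", "Y431"},
    {"R522"},
    {"R522", "K528"},
    {"Y431", "R522"},
]
for residue, percent in contact_frequency(toy_frames):
    print(residue, percent)
```

Residues at the top of the resulting list are the candidates for site-saturation mutagenesis; the frequency threshold used to call a hotspot is a choice left to the user.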
4 Notes
1. If the crystallized protein structure is not available, homology modeling can also give a reliable starting structure, which can be achieved by using MODELLER, I-TASSER [44], or other tools.
2. Random initial velocities are recommended when preparing multiple independent replicas.
3. The MODELLER license key can be requested at the web page https://salilab.org/modeller/registration.html after filling in the license agreement, and it is free for academic users.
Acknowledgments
This work was supported by the National Key Research and Development Program of China (2019YFA0905100, 2018YFA0901900), Tianjin Synthetic Biotechnology Innovation Capacity Improvement Project (TSBICIP-CXRC-009, TSBICIP-KJGG-003), and Youth Innovation Promotion Association of CAS (2021175).

References
1. Clomburg JM, Crumbley AM, Gonzalez R (2017) Industrial biomanufacturing: the future of chemical production. Science 355:aag0804
2. Qu G, Li A, Acevedo-Rocha CG, Sun Z, Reetz MT (2020) The crucial role of methodology development in directed evolution of selective enzymes. Angew Chem Int Ed 59(32):13204–13231
3. Sheldon RA, Pereira PC (2017) Biocatalysis engineering: the big picture. Chem Soc Rev 46:2678–2691
4. Arnold FH (2018) Directed evolution: bringing new chemistry to life. Angew Chem Int Ed 57(16):4143–4148
5. Romero PA, Arnold FH (2009) Exploring protein fitness landscapes by directed evolution. Nat Rev Mol Cell Biol 10(12):866–876
6. Xiao H, Bao Z, Zhao H (2015) High throughput screening and selection methods for directed enzyme evolution. Ind Eng Chem Res 54(16):4011–4020
7. Hauer B (2020) Embracing Nature’s catalysts: a viewpoint on the future of biocatalysis. ACS Catal 10(15):8418–8427
8. Qu G, Lonsdale R, Yao P, Li G, Liu B, Reetz MT, Sun Z (2018) Methodology development in directed evolution: exploring options when applying triple-code saturation mutagenesis. Chembiochem 19(3):239–246
9. Ebert MC, Pelletier JN (2017) Computational tools for enzyme improvement: why everyone can – and should – use them. Curr Opin Chem Biol 37:89–96
10. Zeymer C, Hilvert D (2018) Directed evolution of protein catalysts. Annu Rev Biochem 87(1):131–157
11. Sun Z, Liu Q, Qu G, Feng Y, Reetz MT (2019) Utility of B-factors in protein science: interpreting rigidity, flexibility, and internal motion and engineering thermostability. Chem Rev 119(3):1626–1665
12. Li A, Qu G, Sun Z, Reetz MT (2019) Statistical analysis of the benefits of focused saturation mutagenesis in directed evolution based on reduced amino acid alphabets. ACS Catal 9(9):7769–7778
13. Acevedo-Rocha CG, Reetz MT (2016) Handling the numbers problem in directed evolution. In: Understanding Enzymes: Function, Design, Engineering and Analysis. Pan Stanford Publishing Pte. Ltd., Singapore
14. Zaugg J, Gumulya Y, Gillam EM, Bodén M (2014) Computational tools for directed evolution: a comparison of prospective and retrospective strategies. Methods Mol Biol 1179:315–333
15. Damborsky J, Brezovsky J (2014) Computational tools for designing and engineering enzymes. Curr Opin Chem Biol 19:8–16
16. Mazurenko S, Prokop Z, Damborsky J (2020) Machine learning in enzyme engineering. ACS Catal 10(2):1210–1223
17. Sebestova E, Bendl J, Brezovsky J, Damborsky J (2014) Computational tools for designing smart libraries. In: Gillam EMJ, Copp JN, Ackerley D (eds) Directed evolution library creation: methods and protocols. Springer New York, New York, NY, pp 291–314
18. Sun Z, Lonsdale R, Wu L, Li G, Li A, Wang J, Zhou J, Reetz MT (2016) Structure-guided triple-code saturation mutagenesis: efficient tuning of the stereoselectivity of an epoxide hydrolase. ACS Catal 6(3):1590–1597
19. Sun Z, Lonsdale R, Ilie A, Li G, Zhou J, Reetz MT (2016) Catalytic asymmetric reduction of difficult-to-reduce ketones: triple-code saturation mutagenesis of an alcohol dehydrogenase. ACS Catal 6(3):1598–1605
20. Xu J, Cen Y, Singh W, Fan J, Wu L, Lin X, Zhou J, Huang M, Reetz MT, Wu Q (2019) Stereodivergent protein engineering of a lipase to access all possible stereoisomers of chiral esters with two stereocenters. J Am Chem Soc 141(19):7934–7945
21. Moore JC, Rodriguez-Granillo A, Crespo A, Govindarajan S, Welch M, Hiraga K, Lexa K, Marshall N, Truppo MD (2018) “Site and mutation”-specific predictions enable minimal directed evolution libraries. ACS Synth Biol 7(7):1730–1741
22. Liu B, Qu G, Li J-K, Fan W, Ma J-A, Xu Y, Nie Y, Sun Z (2019) Conformational dynamics-guided loop engineering of an alcohol dehydrogenase: capture, turnover and enantioselective transformation of difficult-to-reduce ketones. Adv Synth Catal 361(13):3182–3190
23. Qu G, Liu B, Zhang K, Jiang Y, Guo J, Wang R, Miao Y, Zhai C, Sun Z (2019) Computer-assisted engineering of the catalytic activity of a carboxylic acid reductase. J Biotechnol 306:97–104
24. Qu G, Fu M, Zhao L, Liu B, Liu P, Fan W, Ma JA, Sun Z (2019) Computational insights into the catalytic mechanism of bacterial carboxylic acid reductase. J Chem Inf Model 59(2):832–841
25. Hanoian P, Liu CT, Hammes-Schiffer S, Benkovic S (2015) Perspectives on electrostatics and conformational motions in enzyme catalysis. Acc Chem Res 48(2):482–489
26. Kreß N, Halder JM, Rapp LR, Hauer B (2018) Unlocked potential of dynamic elements in protein structures: channels and loops. Curr Opin Chem Biol 47:109–116
27. Singh P, Francis K, Kohen A (2015) Network of remote and local protein dynamics in dihydrofolate reductase catalysis. ACS Catal 5(5):3067–3073
28. Wang Z, Abeysinghe T, Finer-Moore JS, Stroud RM, Kohen A (2012) A remote mutation affects the hydride transfer by disrupting concerted protein motions in thymidylate synthase. J Am Chem Soc 134(42):17722–17730
29. Ouedraogo D, Souffrant M, Vasquez S, Hamelberg D, Gadda G (2017) Importance of loop L1 dynamics for substrate capture and catalysis in Pseudomonas aeruginosa d-arginine dehydrogenase. Biochemistry 56(19):2477–2487
30. Yang B, Wang H, Song W, Chen X, Liu J, Luo Q, Liu L (2017) Engineering of the conformational dynamics of lipase to increase enantioselectivity. ACS Catal 7:7593–7599
31. Han S-S, Kyeong H-H, Choi JM, Sohn Y-K, Lee J-H, Kim H-S (2016) Engineering of the conformational dynamics of an enzyme for relieving the product inhibition. ACS Catal 6:8440–8445
32. Parra-Cruz R, Jager CM, Lau PL, Gomes RL, Pordea A (2018) Rational design of thermostable carbonic anhydrase mutants using molecular dynamics simulations. J Phys Chem B 122(36):8526–8536
33. Qu G, Liu B, Jiang Y, Nie Y, Yu H, Sun Z (2019) Laboratory evolution of an alcohol dehydrogenase towards enantioselective reduction of difficult-to-reduce ketones. Bioresour Bioprocess 6(1):18
34. Chan HCS, Li Y, Dahoun T, Vogel H, Yuan S (2019) New binding sites, new opportunities for GPCR drug discovery. Trends Biochem Sci 44(4):312–330
35. Salentin S, Schreiber S, Haupt VJ, Adasme MF, Schroeder M (2015) PLIP: fully automated protein-ligand interaction profiler. Nucleic Acids Res 43(W1):W443–W447
36. Case DA, Betz RM, Cerutti DS, Cheatham TE III, Darden TA, Duke RE, Giese TJ, Gohlke H, Goetz AW, Homeyer N, Izadi S, Janowski P, Kaus J, Kovalenko A, Lee TS, LeGrand S, Li P, Lin C, Luchko T, Luo R, Madej B, Mermelstein D, Merz KM, Monard G, Nguyen HT, Nguyen HT, Omelyan I, Onufriev A, Roe DR, Roitberg A, Sagui C, Simmerling CL, Botello-Smith WM, Swails J, Walker RC, Wang J, Wolf RM, Wu X, Xiao L, Kollman PA (2016) AMBER 2016. University of California, San Francisco
37. Schrödinger Release 2015-2: Maestro, Schrödinger, LLC, New York, NY, 2015
38. Humphrey W, Dalke A, Schulten K (1996) VMD: visual molecular dynamics. J Mol Graph 14(1):33–38; 27–38
39. The PyMOL Molecular Graphics System, Version 174 Schrödinger, LLC
40. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE (2004) UCSF chimera--a visualization system for exploratory research and analysis. J Comput Chem 25(13):1605–1612
41. Roe DR, Cheatham TE III (2013) PTRAJ and CPPTRAJ: software for processing and analysis of molecular dynamics trajectory data. J Chem Theory Comput 9(7):3084–3095
42. Anandakrishnan R, Aguilar B, Onufriev AV (2012) H++ 3.0: automating pK prediction and the preparation of biomolecular structures for atomistic molecular modeling and simulations. Nucleic Acids Res 40(Web Server issue):537–541
43. Knapp B, Ospina L, Deane CM (2018) Avoiding false positive conclusions in molecular simulation: the importance of replicas. J Chem Theory Comput 14(12):6127–6138
44. Zhang Y (2008) I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 9(1):40
Chapter 5
Recombination of Compatible Substitutions by 2GenReP and InSiReP
Haiyang Cui, Mehdi D. Davari, and Ulrich Schwaneberg

Abstract
The CompassR rule makes it possible to identify beneficial substitutions that can be recombined in directed evolution to gradually improve enzymatic properties. However, the question of how to efficiently explore the protein sequence space when ten or more beneficial substitutions are identified has not yet been addressed. Two recombination strategies employing CompassR, 2GenReP and InSiReP, were systematically investigated to minimize experimental efforts and maximize possible improvements. Here we describe the details of the 2GenReP and InSiReP procedures with an example of recombining 15 substitutions and discuss some important practical issues that should be considered when applying 2GenReP and InSiReP, such as placing the substitutions into subsets. The core part of the protocol (Step 1 to Step 5) is transferable to other enzymes and to any recombination of potential substitutions.

Key words Recombination strategy, Protein engineering, Directed evolution, Substitution, CompassR
1 Introduction
Directed evolution is a powerful tool to tailor enzymes for a wide range of industrial applications, as documented by the Nobel Prize in Chemistry in 2018 [1–4]. Besides the high-throughput screening challenge [3], the main limitation that researchers face in directed enzyme evolution is how to recombine the many beneficial positions/substitutions obtained in directed evolution and (semi-)rational campaigns. CompassR (Computer-assisted Recombination) was developed as a filter tool to identify compatible beneficial substitutions that can likely be combined in a beneficial manner [5–7]. CompassR was postulated by analysis of the stability-function tradeoff between the relative free energy of folding (ΔΔGfold) and enzymatic activity [5]. However, the question of how to efficiently explore the protein sequence space when
Francesca Magnani, Chiara Marabelli and Francesca Paradisi (eds.), Enzyme Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 2397, https://doi.org/10.1007/978-1-0716-1826-4_5, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022
numerous beneficial substitutions (generally >10) are identified has not yet been addressed. Considering the vastness of the potential protein sequence space, recombining all the theoretical possibilities is a numbers game beyond current screening possibilities. Knowledge about “compatible” substitutions that can be recombined is also still needed [6, 8]. Many directed enzyme evolution campaigns (e.g., lipase [9], sortase [10], monooxygenases [11]) show that only a fraction of potentially beneficial substitutions (0.5–17% [6]) is recombined. To achieve the goal of minimizing experimental efforts and maximizing property improvements, two recombination strategies, 2GenReP (two-gene recombination process) and InSiReP (in silico guided recombination process), were introduced to recombine beneficial substitutions selected by CompassR analysis [6]. 2GenReP is a purely experimental strategy, whereas InSiReP, after initial experimental gene recombinations, uses a computationally guided strategy to recombine the beneficial recombinants based on the experimental results obtained. The workflow of the 2GenReP and InSiReP recombination strategies can be divided into five steps, from identifying compatible and beneficial substitutions to significantly improved variants (see Fig. 1). Step 1 describes the identification of beneficial positions from directed evolution and/or (semi-)rational design studies. Step 2 applies the computational filter based on the CompassR rule, followed by grouping/clustering the beneficial positions on genes considering limitations of PCR-based recombination methods (Step 3). Steps 4A and 4B describe the two recombination strategies: (4A) the two-gene recombination process (2GenReP) and (4B) the in silico guided recombination process (InSiReP).
In this chapter, fifteen CompassR-guided beneficial substitutions, which were previously identified from the “BSLA-SSM” library (Bacillus subtilis Lipase A, denoted as BSLA) and had improved 1,4-dioxane cosolvent (DOX, 22% (v/v)) resistance [12, 13], were chosen to demonstrate the 2GenReP and InSiReP recombination practices. After screening of fewer than 300 clones, a BSLA variant with six substitutions, I12R/Y49R/E65H/N98R/K122E/L124K, which is remarkably resistant to multiple organic solvents, was obtained through the recombination rounds in both strategies. The BSLA variant shows a 14.6-fold improvement in resistance to 50% (v/v) DOX, a 6.0-fold improvement in 60% (v/v) acetone, a 2.1-fold improvement in 30% (v/v) ethanol, and a 2.4-fold improvement in 60% (v/v) methanol. In summary, 2GenReP (purely experimental) and InSiReP (in silico guided) are highly suitable strategies to explore the recombination potential of amino acid positions/substitutions that were identified in directed evolution campaigns and/or rational design.
Fig. 1 Overview of 2GenReP and InSiReP recombination strategies. (Step 1) Identification of beneficial substitutions by directed evolution and/or (semi-)rational design; (Step 2) Pool of beneficial substitutions filtered by the CompassR rule which ensures compatibility in recombination; (Step 3) Grouping of genes with beneficial substitutions depending on position feature of substitutions on the amino acid sequence. (Step 4A) In 2GenReP, the “best” recombinants, identified from the two-gene recombination experiments, were selected as the parent for the next two-gene recombination until highly improved recombinants were obtained. (Step 4B) In the in silico guided recombination process (InSiReP), three two-gene recombinations experiments were performed and screened. Identified improved recombinants with on average < 4 substitutions were subsequently recombined in silico and ranked according to their thermodynamic stability (ΔΔGfold). The top 10% ranked recombinations were genetically constructed, expressed, and experimentally validated. In (Step 5), highly improved recombinants of both approaches were produced, purified, and compared with respect to organic solvent resistance and activity improvements. (Reprinted from [6] with permission)
2 Materials
2.1 Substitution List
A list of beneficial substitutions located in CompassR category A (ΔΔGfold < +0.36 kcal/mol) or B (+0.36 kcal/mol ≤ ΔΔGfold ≤ +7.52 kcal/mol) should be prepared for recombination experiments (e.g., I12R, Y49R, E65H, N98R, T180R, K122D, K122E, L124K, L124R, Y129R, M137H, M137K, N138R, N138H, N140H) (see Note 1).
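Once ΔΔGfold values are available, assigning the CompassR category is a simple threshold test. The sketch below is a minimal illustration of the two ranges given above; the function name is hypothetical, the handling of values lying exactly on a boundary is an assumption, and the example values are invented rather than taken from the BSLA data.

```python
def compassr_category(ddg_fold):
    """Return 'A' or 'B' for a ΔΔGfold value in kcal/mol, or None outside both ranges."""
    if ddg_fold < 0.36:
        return "A"
    if ddg_fold <= 7.52:
        return "B"
    return None  # above the category B range; not a recombination candidate here

# Invented ΔΔGfold values (kcal/mol) for hypothetical substitutions.
examples = {"sub1": -1.00, "sub2": 0.36, "sub3": 3.00, "sub4": 8.00}
for name, ddg in examples.items():
    print(name, compassr_category(ddg))
```

Only substitutions returning "A" or "B" would be carried forward into the substitution list.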
2.2 Recombination Library Generation
The primers or genes harboring multiple substitutions applied for mutagenesis were designed according to the substitutions in the classified subsets (see Note 2). The gene harboring certain substitutions (e.g., the bsla gene with five mutations) was chemically synthesized. The PLICing method was used to clone the recombination variant DNA of BSLA into the pET22b(+) vector as in our previous work [5, 14, 15] (see Note 3).
2.3 The Initial Enzyme Structure
To calculate the value of ΔΔGfold required for the application of CompassR and stability analysis, one wild-type protein structure (e.g., PDB ID: 1i6w, Chain A, resolution 1.5 Å) and all software (i.e., FoldX and YASARA Structure) under operating systems (e.g., Microsoft Windows 10) should be obtained (see Note 4).
3 Methods
Step 1. Identification of Beneficial Substitutions by Directed Evolution and/or (Semi-)Rational Design
In our previous study [16], in total, 159 single beneficial BSLA variants (at 75 amino acid positions) with improved DOX resistance were identified in the “BSLA-SSM” library (see Note 5).
Step 2. The Pool of Beneficial Substitutions Filtered by the CompassR Rule, Which Ensures Compatibility in Recombination
Out of 159 variants, 15 compatible beneficial substitutions at eleven positions were identified by CompassR analysis and employed based on the following criteria: (1) substitutions should be located in category A (ΔΔGfold < +0.36 kcal/mol) or B (+0.36 < ΔΔGfold < +7.52 kcal/mol) based on the CompassR rule (see Note 6); (2) substitutions are distributed across the gene sequence, and multiple substitutions per position are possible. The ΔΔGfold values of the 15 substitutions ranged from −4.43 to +2.50 kcal/mol. According to the CompassR rule, eleven substitutions were in category A, which are surely stabilizing and very likely yield improved recombinants [6]. Substitutions at four positions were located in category B, with destabilizing values that are likely tolerated by the BSLA fold [6].
Step 3. Grouping/Clustering Substitutions on Different Genes
Fifteen compatible beneficial substitutions are placed in two types of subsets as follows: (1) Clustered substitutions: substitutions within a distance of ≤9 amino acids on the sequence were grouped into the same subset; and (2) Isolated substitutions: substitutions at a distance of >9 amino acids were grouped. Following the above-described classification, the 15 substitutions were placed into three subsets as follows: Subset 1 comprised five isolated substitutions (I12R, Y49R, E65H, N98R, T180R); subset 2 and subset 3 each comprised five clustered substitutions (K122D, K122E, L124K, L124R, Y129R; subset 2) and (M137H, M137K, N138R, N138H, N140H; subset 3), respectively. The multi-site-directed mutagenesis method [17] and the StEP method [18] were applied to recombine isolated and clustered substitutions, respectively. The two-step StEP PCR protocol is shown in Table 1. Multi-site-directed mutagenesis (MSDM) libraries were constructed stepwise by PCR according to the QuikChange site-directed mutagenesis (SDM) method [17].
Step 4A. Two-Gene Recombination Process (2GenReP).
In the 2GenReP, the “best” recombinant from a two-gene recombination experiment is recombined with the next subset iteratively and purely experimentally (see Fig. 1; 4A). Three recombination libraries including one StEP library (WT + subset 1) and two MSDM libraries (“best” + subset 2; “best” + subset 3) were transformed and expressed in Escherichia coli BL21-Gold (DE3) using standard methods. Table 2 and Fig. 2 provide an overview of the performance of the 2GenReP strategy. DOX resistance of recombinants increased gradually by applying the 2GenReP strategy, e.g., I12R/Y49R/E65H/N98R (2.7-fold improvement compared to BSLA WT; Fig. 2), I12R/Y49R/E65H/N98R/K122E/L124K (4.3-fold, Fig. 2).
Step 4B. In Silico Guided Recombination Process (InSiReP).
Table 1 Two-step StEP PCR program
Program  Step  Temperature (°C)  Time     Cycles
First    1     96                2 min    1
         2     94                30 s     99
               55                5 s
         3     4–10              Storage  2
Second   1     98                2 min    1
         2     94                30 s     30
               55                10 s
         3     94                30 s     99
               55                5 s
         4     72                5 min    2
         5     4–10              Storage  1
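The subset assignment of Step 3 can be reproduced with a short script. This is a sketch under one stated assumption: distance is measured from the first position of each group, because a simple neighbor-to-neighbor reading of the ≤9 rule would merge positions 129 and 137 (8 residues apart), whereas the published subsets keep them apart. The function names are hypothetical.

```python
import re

SUBSTITUTIONS = ["I12R", "Y49R", "E65H", "N98R", "T180R", "K122D", "K122E",
                 "L124K", "L124R", "Y129R", "M137H", "M137K", "N138R", "N138H", "N140H"]

def position(substitution):
    """Extract the residue number from a label such as 'K122E'."""
    return int(re.fullmatch(r"[A-Z](\d+)[A-Z]", substitution).group(1))

def group_by_window(substitutions, window=9):
    """Greedy grouping: a substitution joins the current group if its position
    lies within `window` residues of the first position in that group."""
    groups = []
    for sub in sorted(substitutions, key=position):
        if groups and position(sub) - position(groups[-1][0]) <= window:
            groups[-1].append(sub)
        else:
            groups.append([sub])
    return groups

def split_subsets(groups):
    """Separate isolated substitutions (one position per group) from clusters."""
    isolated = [s for g in groups if len({position(x) for x in g}) == 1 for s in g]
    clustered = [g for g in groups if len({position(x) for x in g}) > 1]
    return isolated, clustered

isolated, clustered = split_subsets(group_by_window(SUBSTITUTIONS))
print("Subset 1 (isolated):", isolated)
for number, cluster in enumerate(clustered, start=2):
    print(f"Subset {number} (clustered):", cluster)
```

With the fifteen BSLA substitutions, this grouping returns the five isolated substitutions of subset 1 and the two clusters around positions 122–129 and 137–140.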
Table 2 Performances of 2GenReP and InSiReP strategy
          Recombination           Parent                           Subset used for  Number of substitutions  Theoretical  Number of         Improvement likelihood         Active
Strategy  round          Library  template                         recombination    (number of positions)    diversity    screening clones  (number of improved variants)  ratio (a)
2GenReP   1              A        WT                               Subset 1         5 (5)                    32           90                12.5% (4)                      85.7%
          2              B        I12R/Y49R/E65H/N98R              Subset 2         5 (3)                    18           90                11.1% (2)                      73.3%
          3              C        I12R/Y49R/E65H/N98R/K122E/L124K  Subset 3         5 (3)                    18           90                5.5% (1)                       63.0%
InSiReP   1              A        WT                               Subset 1         5 (5)                    32           90                12.5% (4)                      85.7%
          1              D        WT                               Subset 2         5 (3)                    18           90                22.2% (4)                      85.1%
          1              E        WT                               Subset 3         5 (3)                    18           90                16.7% (3)                      89.8%
          2              F        –                                –                5–8 (5–8)                9            9                 22.2% (2)                      100%
(a) The ratio of active clones was determined by counting the active clones (specific activity > wt + 3σ) on 96-well MTPs
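The theoretical diversities listed in Table 2 for the three subsets (32, 18, and 18) are consistent with counting, at every position, the wild-type residue plus each listed substitution and multiplying over positions. This is an inference from the numbers rather than a formula stated in the chapter; the sketch below, with hypothetical function names, shows the calculation.

```python
import re
from collections import Counter
from math import prod

def position(substitution):
    """Extract the residue number from a label such as 'K122E'."""
    return int(re.fullmatch(r"[A-Z](\d+)[A-Z]", substitution).group(1))

def theoretical_diversity(substitutions):
    """Number of distinct sequences: (substitutions + wild type) multiplied over positions."""
    per_position = Counter(position(s) for s in substitutions)
    return prod(count + 1 for count in per_position.values())

subsets = {
    "Subset 1": ["I12R", "Y49R", "E65H", "N98R", "T180R"],
    "Subset 2": ["K122D", "K122E", "L124K", "L124R", "Y129R"],
    "Subset 3": ["M137H", "M137K", "N138R", "N138H", "N140H"],
}
for name, subs in subsets.items():
    print(name, theoretical_diversity(subs))
```

Under this counting, 90 screened clones oversample each library by roughly three- to five-fold.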
Fig. 2 Recombination path and DOX resistance of improved BSLA variants relative to WT in 2GenReP (left) and InSiReP (right) strategies. 2GenReP: the variants identified from library A (round I, WT recombines with subset 1), B (round II, I12R/Y49R/E65H/N98R recombines with subset 2), and C (round III, I12R/Y49R/E65H/N98R/K122E/L124K (M4) recombines with subset 3) are shown with green, red, and purple arrow lines, respectively. InSiReP: the improved variants identified from library A (round I, WT recombines with subset 1), D (round I, WT recombines with subset 2), E (round I, WT recombines with subset 3), and F (round II, recombination of improved variants from library A, D, and E) are shown with green, red, purple, and orange arrow lines, respectively. All data shown were average values from measurements in triplicate or more. The concentration of DOX was 22% (v/v). (Reprinted from [6] with permission)
In the InSiReP strategy (Fig. 1; 4B), the recombination process can be divided into an experimental step and a computationally guided in silico recombination with experimental validation:
1. The two-gene recombination experiments with the subset substitutions and the WT gene were performed and screened. The screening effort was similar to that of the 2GenReP strategy (~279 clones vs. 270 clones for 2GenReP). In total, eleven identified beneficial recombinants were in silico recombined (Fig. 2).
2. The improved variants identified in the first experimental step (on average <4 recombinants per subset) were recombined in silico, yielding numerous recombinants (e.g., eleven improved BSLA recombinants yielded 88 recombinants in total). The thermodynamic stability (assessed by ΔΔGfold) of all theoretical recombinants was then calculated with FoldX [19] (see Note 7).
3. The top 10% most stabilized recombinants were experimentally generated and validated to identify highly improved recombinant(s). Out of 88 recombinants, nine variants (top 10%) were generated by the SDM method [17] in a stepwise manner.
Step 5. Generation of Highly Improved Recombinants. In 2GenReP, the “best” variant obtained was I12R/Y49R/E65H/N98R/K122E/L124K/M137K, showing a 6.4-fold improved DOX resistance compared to BSLA WT (Fig. 2), after screening only ~270 clones. In InSiReP, the “best” variant obtained was I12R/Y49R/E65H/N98R/K122E/L124K (4.3-fold, Fig. 2). In terms of the fraction of improved variants, the InSiReP strategy was slightly better (12.5–22.2% vs. 5.5–12.5% in 2GenReP). InSiReP is on average also expected to be slightly less time-consuming (by about 1 week), although it requires two computational stability analyses (FoldX calculations). In summary, the combination of CompassR with two recombination strategies (purely experimental, 2GenReP; in silico guided, InSiReP) proved to be a highly efficient approach to identify improved recombinants with limited experimental effort (see Note 8).
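The in silico recombination and the top-10% selection of the InSiReP strategy can be sketched as follows. The counting scheme is an assumption: combining one improved variant from each of two or more libraries (four from A, four from D, three from E) is one scheme that reproduces the 88 recombinants reported above. The variant labels and the ΔΔGfold numbers are placeholders, not measured or calculated data, and a lower ΔΔGfold is assumed to mean a more stabilized recombinant.

```python
from itertools import combinations, product
from math import ceil


def in_silico_recombine(libraries):
    """Combine one improved variant from each of two or more libraries."""
    recombinants = []
    names = sorted(libraries)
    for k in range(2, len(names) + 1):
        for chosen in combinations(names, k):
            for combo in product(*(libraries[name] for name in chosen)):
                recombinants.append(combo)
    return recombinants


def top_fraction(recombinants, ddg, fraction=0.10):
    """Return the most stabilized recombinants (lowest ddG_fold first)."""
    n_keep = ceil(len(recombinants) * fraction)
    return sorted(recombinants, key=lambda r: ddg[r])[:n_keep]


# Placeholder labels for the improved variants found in libraries A, D, and E.
libraries = {
    "A": ["A1", "A2", "A3", "A4"],
    "D": ["D1", "D2", "D3", "D4"],
    "E": ["E1", "E2", "E3"],
}
recombinants = in_silico_recombine(libraries)

# Dummy ddG_fold values standing in for FoldX output (illustration only).
ddg = {r: float(i) for i, r in enumerate(recombinants)}
selected = top_fraction(recombinants, ddg)
print(len(recombinants), len(selected))
```

In practice the `ddg` dictionary would be filled from the FoldX calculations described in Note 7, averaged over the repeated runs.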
4 Notes
1. Beneficial substitutions can be obtained from directed evolution and/or (semi-)rational design campaigns.
2. The AA-Calculator program (http://guinevere.otago.ac.nz/cgibin/aef/AA-Calculator.pl) can assist the researcher in finding ABC codons that encode all of the selected substitutions (where A, B, and C are standard nucleotide codes). AA-Calculator can sort suitable ABC codons by the fraction of ABC-specified codons that code for the selected substitutions and returns the top 50 ABC codons if there are more than 50 suitable ones. If the calculated ABC codon also encodes unwanted amino acids, reducing the number of selected substitutions may be a solution. However, this usually means that more primers have to be ordered.
3. PLICing (phosphorothioate-based ligase-independent gene cloning) is an enzyme-free and sequence-independent cloning method. PLICing starts with amplification of the target gene and the vector by PCR using primers with complementary phosphorothioated (PTO) nucleotides at the 5′-end. The PCR products are cleaved in an alkaline iodine/ethanol solution, producing single-stranded overhangs. Subsequently, these overhangs hybridize at room temperature, and the resulting DNA constructs can be transformed directly into competent host cells (e.g., chemically competent E. coli DH5α cells). PLICing is mostly background free and has also been optimized to minimize time consumption and preparative effort because often no purification of cleaved fragments is required.
4. The initial structure of the wild-type enzyme can be taken from the Protein Data Bank (www.rcsb.org), and the water molecules and other ligands should be removed with a text editor (e.g., Notepad++, WordPad, BBEdit) or PDB visualization software (e.g., YASARA, PyMOL, Discovery Studio Visualizer). If several X-ray or NMR structures are available, or the protein of interest is a dimer rather than a monomer, the structure with the highest resolution is preferred. As FoldX is sensitive to protein structure, higher-resolution crystal structures (better than 3.3 Å) improve the performance of FoldX in predicting stability trends and its quantitative accuracy. If no X-ray or NMR structure is available, a homology model has to be built using homology modeling tools such as YASARA [20], I-TASSER [21], Phyre2 [22], or Rosetta [23]; however, inaccuracies in the homology model will negatively affect the reliability of the ΔΔGfold calculation and even of the CompassR prediction.
5. The “BSLA-SSM” library covers all the natural diversity with a single amino acid exchange at each position of BSLA (in total 181 positions; 3440 variants; “site-saturation mutagenesis” denoted as “SSM”). The 2GenReP and InSiReP strategies can also be applied to other target properties of an enzyme (e.g., activity, ionic liquid resistance, thermostability).
6. A detailed protocol for the recombination of single beneficial substitutions by CompassR was reported previously [24]. In addition, some important practical issues to consider when applying the CompassR rule (such as the selection of protein structures, the number of FoldX runs, and the evaluation of calculations) were discussed there.
7. The ΔΔGfold (ΔΔGfold = ΔGfold,sub − ΔGfold,wt) of the recombinants was computed using FoldX version 4 [19] employing the YASARA plugin [25] in YASARA Structure version 17.4.17 [26] as previously reported [5]. The BSLA crystal structure (PDB ID: 1i6w [27], chain A, resolution 1.5 Å) was used for computational analysis. Default FoldX parameters (temperature 298 K; ionic strength 0.05 M; pH 7) were used to generate the substitutions. The structure of the BSLA wild type was rotamerized and energy minimized using the “RepairObject” command (which optimizes residues by removing van der Waals clashes and bad contacts) [25]. “Mutate multiple residues” commands were applied to calculate the ΔΔGfold of the recombinants. Five FoldX runs were performed for each recombinant to ensure that the minimum energy conformation of even large residues, which possess many rotamers, was identified.
8. Interestingly, the highly improved organic solvent resistance of BSLA variant I12R/Y49R/E65H/N98R/K122E/L124K (14.6-fold in 50% (v/v) DOX, 6.0-fold in 60% (v/v) acetone, 2.1-fold in 30% (v/v) ethanol, and 2.4-fold in 60% (v/v) methanol) was obtained by both strategies. The
latter indicated that both strategies are well suited to identify enhanced BSLA variants and deliver with minimal screening efforts ( 0.5 are assigned as gaps. (continued)
Using the Evolutionary History of Proteins to Engineer Insertion-Deletion. . .
Table 1 (continued)

How are gaps inferred (continued from the previous entry): The gap probability for each ancestral node is then recalculated by incorporating the root’s ancestor’s gap probability into the mean calculation, again assigning any node with probability > 0.5 as a gap.

Software: FireProtASR [32]
How are gaps encoded: A binary vector is created for each sequence of gap presence/amino acid.
How are gaps inferred: FireProtASR independently considers each column in the alignment and works upwards from each leaf (i.e., extant sequence) to the root (i.e., ancestor), assigning a gap probability based on the mean score of gap presence of the immediate descendants and the evolutionary distance between nodes. FireProtASR takes evolutionary distance into account in order to limit the effect of rogue clades and single sequences. Finally, positions that are deemed within an inconclusive score range are determined with regard to gap frequency in the original alignment.
Type of ASR: Marginal (FireProtASR is an automated workflow tool which uses PAML via the Lazarus interface).

Software: PhyML [33]
How are gaps encoded: Gap characters are treated as missing data.
How are gaps inferred: Every position in an ancestor will be assigned an amino acid, even if, at a particular branch of the tree, all of the descendants are missing content at that position.
Type of ASR: Marginal/minimum posterior expected error
Connie M. Ross et al.
Table 1 (continued)

Software: IQ-TREE 2 [34]
How are gaps encoded: Gap characters are treated as missing data.
How are gaps inferred: Every position in an ancestor will be assigned an amino acid, even if, at a particular branch of the tree, all of the descendants are missing content at that position.
Type of ASR: Marginal

Software: RAxML [35], RAxML-ng [36]
How are gaps encoded: Gap characters are treated as missing data.
How are gaps inferred: RAxML-ng: Every position will have a character state. RAxML: Every position will have a character state unless, at a particular branch of the tree, all of the descendants are missing content at that position.
Type of ASR: Marginal

Software: Lazarus [37]
How are gaps encoded: A binary vector for each sequence of gap presence/amino acid is created.
How are gaps inferred: Uses Fitch parsimony to infer gaps [38].
Type of ASR: Marginal (Lazarus is a tool to run batched PAML jobs, and can be used to infer marginal ancestors at different tree nodes across multiple PAML runs.)
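The binary gap encoding and the Fitch parsimony step listed in Table 1 can be illustrated with a minimal sketch. It is not the Lazarus or FireProtASR implementation: the toy alignment, the four-leaf tree, and the function names are assumptions, and only the bottom-up pass of Fitch parsimony is shown, so an ambiguous root set such as {0, 1} is left unresolved.

```python
def encode_gaps(aligned_seq):
    """Binary vector: 1 where a residue is present, 0 where there is a gap."""
    return [0 if ch == "-" else 1 for ch in aligned_seq]


def fitch(tree, states):
    """Bottom-up Fitch pass for one alignment column.

    tree: a leaf name (str) or a (left, right) tuple of subtrees.
    Returns (candidate state set at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# Toy alignment and tree (hypothetical data).
alignment = {"s1": "MKV-A", "s2": "MKV-A", "s3": "MKVLA", "s4": "MK-LA"}
tree = (("s1", "s2"), ("s3", "s4"))

encoded = {name: encode_gaps(seq) for name, seq in alignment.items()}
root_sets = []
for col in range(5):
    column_states = {name: vector[col] for name, vector in encoded.items()}
    root_sets.append(fitch(tree, column_states))
print(root_sets)
```

For the third column the root is inferred to carry a residue with one deletion event below it, whereas for the fourth column presence and absence are equally parsimonious at the root.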
Alignments are represented on POGs with the vertices of the graphs representing a particular column in an alignment and the edges of the graph representing the order between columns. A gap is represented by an edge that skips over one or more vertices, which means that many alternative gap patterns can be represented on a single graph (Fig. 2). Sequences can be read in the forward direction (progressing from the N-terminal to C-terminal) by choosing edges to follow. The thickness or color intensity of each edge reflects the number of sequences showing that particular arrangement of residues (“path”). On a POG, two gaps with the same start but different end positions are visualized as two separate edges starting at the same vertex, but ending at two different vertices. Because gaps can be staggered in this way, between two vertices, a and b, the set of edges leaving a is not necessarily the same set of edges entering
Fig. 2 Partial order graphs in GRASP. (a) Overview of a partial order graph. (b) An ancestral inference showing the potential ancestors that could be derived from GRASP. Extant sequences (orange) are at the tips of the tree, and for each ancestor there is an identified preferred path (pink), as well as either a bidirectionally parsimonious but non-preferred path (blue) or unidirectionally parsimonious and non-preferred path (green). From position zero (V) there are three possible transitions, as seen in node N3 (to position one (VA), to position two (V-S), and to position three (V--N)). At N3, all three options are parsimonious in at least one direction, but only one is bidirectionally parsimonious (V--N). At N2, only two options are parsimonious in at least one direction, but only one is bidirectionally parsimonious (V-S). At positions N1 and N0, two options are bidirectionally parsimonious (VA, V-S). All options are able to be explored by GRASP with the preferred path automatically choosing bidirectional edges. In the case of N1 and N0, when multiple paths contain bidirectional edges, the preferred path is chosen by the number of sequences containing that path in the original sequences (V-S)
b (Fig. 2a). To account for this variability, GRASP performs a parsimony analysis on each edge in both the forward (N–C) and backward (C–N) directions (Fig. 2b). The parsimony analysis determines for each edge the parsimony score based on all of its descendants. There is no cost for the edge being present and a cost of 1 to add the edge. Parsimony can result in equally optimal parsimonious scenarios, both between an ancestral parent and child, and between positions within a particular ancestor. In GRASP this is used as an advantage—the graph structure allows for the visualization of every edge that is optimally parsimonious in at least one of the forward or backward directions, as well as identifying when an edge is optimally parsimonious in both directions (Fig. 2b). This allows alternative indel histories to be visualized for both an individual ancestor and across ancestral nodes. GRASP also identifies a single, preferred ancestral sequence (i.e., path through the POG), defined by a preference for bidirectional edges over unidirectional edges. The need for bidirectionality is ignored at the first and last nodes of the graph, which often contain regions of less accurate alignment (i.e., greater sequence divergence or incomplete sequences). In instances when multiple bidirectionally parsimonious edges exist, the proportional weighting of sequences underneath the node that contains each edge is considered. This preferred sequence is automatically generated and available to download, but the value of the POG is that ambiguity can be represented and the full set of edges that are parsimonious in at least one direction can be displayed, which enables the user to select variants for libraries with relative ease. 
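The way an alignment maps onto a partial order graph can be sketched in a few lines. This is a simplified illustration rather than the GRASP data structure: vertices are plain column indices, the toy alignment is hypothetical, and start/end vertices and the parsimony analysis are omitted.

```python
from collections import Counter


def pog_edges(alignment):
    """Count edges between consecutive occupied columns of each aligned sequence.

    Vertices are column indices; an edge (i, j) with j > i + 1 skips the
    columns in between and therefore represents a gap.
    """
    edges = Counter()
    for seq in alignment:
        occupied = [i for i, ch in enumerate(seq) if ch != "-"]
        for a, b in zip(occupied, occupied[1:]):
            edges[(a, b)] += 1
    return edges


# Toy alignment (hypothetical): three alternative transitions leave column 0.
alignment = ["VASN", "VASN", "V-SN", "V--N"]
edges = pog_edges(alignment)
gap_edges = {edge: n for edge, n in edges.items() if edge[1] - edge[0] > 1}
print(edges, gap_edges)
```

The edge counts correspond to the edge thickness described above, and the two gap edges starting at column 0 but ending at different columns are the staggered gaps that a POG keeps as separate edges.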
Indel positions which are more likely to be able to accommodate an insertion or deletion can be used to vary a sequence, either by addition of a selection of extant-specific indels, a selection of inferred ancestral variant indels (generated from all the ancestors with an indel at that position) or a selection of randomly generated indel sequences. To further increase diversity in a library, the collection of inferred ancestral indels can also be varied by exploring alternative combinations of indels in a given inferred ancestor by varying which pathways are taken. As well as identifying regions more tolerant of indel insertion, GRASP also identifies the amino acid content of ancestors and insertions, so that ancestral insertions from one lineage can be placed into ancestral sequences that have similar amino acid composition, even if the insertion was not predicted to have occurred in that particular ancestor.

1.9 Using GRASP to Engineer Indel Variants
Indels are useful, functional segments of the variable sequence space that have been underutilized for enzyme engineering. A major barrier for the use of indels in enzyme engineering is the generation of large numbers of non-functional variants, typically due to the poor tolerance of randomly positioned indels. ASR and investigation of naturally occurring indels provides an alternative
approach, where the choice of where to position the indels in variants is informed by evolution. Furthermore, ASR provides a robust template(s) on which to engineer indels. Many ASR programs perform only incomplete treatment of indels and even where full treatment of indels is done (e.g., FastML), only the most likely indels are inferred and possible variants are limited. GRASP uses a parsimony-based treatment of indels, combined with a graphical interface to enable the user to explore alternative less-parsimonious indel pathways that are nonetheless still represented in the extant sequence pool. This enables a user to vary indels in the same ancestor and to create a library of “indel-alternative” ancestors.
2 Materials
1. Computer hardware requirements: standard workstation (up to 10,000 sequences if the online server is to be used, but GRASP should be downloaded for analyses involving more than 10,000 sequences).
2. Sequence databases: UniProt, NCBI, and other protein- or organism-specific collections.
3. Software for sequence collection and curation:
(a) BLAST (available via https://blast.ncbi.nlm.nih.gov/Blast.cgi or through UniProt).
(b) SeqScrub (http://bioinf.scmb.uq.edu.au/seqscrub).
(c) CD-HIT (http://weizhongli-lab.org/cd-hit/).
4. Software for phylogenetic analysis, ASR, visualization, assessment, etc.:
(a) Alignments: MAFFT, MUSCLE, Clustal Omega.
(b) Tree: MEGA, IQ-TREE 2, FastTree 2, RAxML.
(c) Online portals to automate workflows: PhyloBot, FireProtASR, NGPhylogeny.fr.
(d) GRASP (online or downloaded).
(e) Visualization: Jalview, AliView, FigTree, Archaeopteryx.
(f) Post-analysis: PyMOL, Chimera.
3 Methods
3.1 Selecting an Appropriate Protein or Enzyme Family
The choice of enzyme or protein to reconstruct will depend on the desired application. Published ancestral reconstructions have produced proteins with improvements in solvent tolerance, thermostability, and catalytic turnover and have yielded novel or promiscuous activities (reviewed in [24]). Good candidates for ASR are families of extant proteins with well-characterized activities, for which representative sequences are available from a variety of taxonomic lineages, and which display significant conservation at both the sequence and structural levels. Good quality sequence annotations and structural information are also beneficial. Reconstructions can be performed with a single set of orthologues or with subfamilies of paralogues. Since indels occur with lower frequency than substitutions, a more evolutionarily diverse group may provide greater opportunities for identifying indel variants. The next step is to select a set of appropriate outgroup sequences. This is typically the next most closely related family or subfamily to the protein of interest. Where possible, consult published alignments or phylogenies of the protein family of interest.

3.2 Fetching Sequences and Sequence Curation
Ideally, include as many sequences as possible and ensure representative sequences are included for specific phylogenetic lineages. However, if limits on sequence numbers apply, avoid oversampling any particular lineage (e.g., mammals, for which sequence data is often more plentiful than for other animal classes) at the cost of other lineages. When aligning and curating sequences, take note of the full sequence lengths, and the number and prevalence of gaps in particular lineages. When working with multidomain proteins, assess the level of domain conservation within a specific family before performing the alignment. If domain composition or order is poorly conserved, the alignment strategy should be altered so that large non-aligning regions do not impact the quality of the alignment (e.g., consider separate domains independently). In addition to the methods described here, there may be protein family-specific tools developed for assessing sequence alignments and trees (see Note 1). Optional tools for assessing the gaps and sequence quality are as follows:
1. Alignment consistency and confidence checking: M-COFFEE [25], GUIDANCE2 [26].
2. Outlier detection: OD-seq [27], EvalMSA [28].
3.3 Searching for Homologous Sequences
1. Select multiple (see Note 2) query amino acid sequences (see Note 3) of each orthologue from well-curated model genomes covering the breadth of phylogeny to be mined. Download sequences from UniProt or another curated sequence database and save as FASTA files with appropriate annotation.
2. Perform BLASTp searches with each “query” protein sequence to build a collection of potentially orthologous (or paralogous, if you are including multiple protein families) protein sequences. If necessary, exclude sequences from species that are not relevant to the project at hand but may have been collected in the BLAST search because they show some sequence similarity (e.g., exclude prokaryotes if reconstructing a plant protein).
3. Select sequences with appropriate E values, sequence identity or similarity specific to the protein family of interest. This will vary according to the degree of sequence conservation and classification system for each protein family, e.g., by convention, all sequences in the same cytochrome P450 family share at least ~40% amino acid identity, and at least ~55% if in the same subfamily, so these are useful thresholds for use in ASR of P450 enzymes.
4. Examine the sequences and exclude anomalous sequences based on exceptional length (whether too short or too long), the absence of essential, conserved protein family-specific motifs, or the presence of unidentified residues (denoted by X).
5. Save sequences to a FASTA file with informative and consistent labels (e.g., >ProteinName_SpeciesName_AccessionNumber). Remove any characters that are incompatible with the alignment and tree-building programs. This can be done using SeqScrub (http://bioinf.scmb.uq.edu.au/seqscrub). Keep a separate index file to record sequences collected, sequences removed, why they were removed, and any other useful metadata.
6. Process the sequence collection through CD-HIT (http://weizhong-lab.ucsd.edu/cdhit_suite/cgi-bin/index.cgi) with a cut-off percentage identity of (n − 1)/n × 100 (where n is the theoretical length of the protein to be resurrected) to remove duplicated sequences while retaining variants that differ in at least one residue. Other cut-off values can also be used if only representative sequences are required.

3.4 Generating Alignments
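Before an alignment is generated, the curation rules from steps 4–6 of Subheading 3.3 can be checked programmatically. The following is a minimal sketch under stated assumptions: the length tolerance, the optional motif argument, and the function names are illustrative and do not come from SeqScrub or CD-HIT, and the example label is only a demonstration of the naming pattern.

```python
import re


def cd_hit_cutoff(n):
    """Percentage identity (n - 1)/n x 100 for a protein of theoretical length n."""
    return (n - 1) / n * 100


def keep_sequence(seq, expected_length, tolerance=0.2, motif=None):
    """Return False for sequences that step 4 would exclude.

    `tolerance` (allowed fractional length deviation) and `motif` are
    illustrative parameters, not values taken from the protocol.
    """
    if "X" in seq.upper():
        return False
    if abs(len(seq) - expected_length) > tolerance * expected_length:
        return False
    if motif is not None and motif not in seq:
        return False
    return True


def make_label(protein, species, accession):
    """Build a ProteinName_SpeciesName_AccessionNumber label from plain characters."""
    parts = (protein, species, accession)
    return "_".join(re.sub(r"[^A-Za-z0-9]", "", part) for part in parts)


cutoff = cd_hit_cutoff(500)
label = make_label("CYP3A4", "Homo sapiens", "P08684")
print(round(cutoff, 2), label)
```

For a 500-residue protein the cut-off is 99.8%, so two sequences that differ at a single position fall just below the threshold and are both retained.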
1. Jalview (https://www.jalview.org/) is a free software package for the visualization of FASTA files and alignments [29]. Alignments can be made in a separate program or web server and loaded into Jalview, but many common alignment programs are available through Jalview itself. To perform an initial alignment in Jalview, select the Web Service menu and then Alignment. This will run an alignment with the default parameters. To alter the alignment parameters, access the alignment programs using the web server or download them locally onto your workstation. We routinely use MAFFT (http://www.ebi.ac.uk/Tools/msa/mafft/), but a number of other alignment programs, including MUSCLE and Clustal Omega, are available which may be better for different types of proteins, e.g., more conserved sequences.
2. Most alignments use a set of user-defined parameters. Particular alignment programs may have additional parameters or alignment refinement methods. Details of the most common programs and parameters are listed on the EMBL-EBI website (https://www.ebi.ac.uk/seqdb/confluence/display/JDSAT/Multiple+Sequence+Alignment) and summarized in Table 2, with a focus on the commonly used program, MAFFT.
3. In the absence of published information about existing alignments, start with the default parameters. If the sequences align poorly, identify the issue (e.g., too many gaps) and alter the related parameters (e.g., increase gap opening penalty).

Table 2 Parameters for sequence alignments
Parameter name: Description and default (MAFFT)
Scoring matrix: The default scoring matrix is BLOSUM62 for amino acid sequences, but it may be necessary to adjust this parameter if sequences are closely related, highly divergent or have unusual family-specific amino acid frequencies (e.g., transmembrane proteins), as reviewed in [39].
Gap opening penalty: Penalties of 1–3 are typically used in MAFFT; the default is 1.53. Higher values will result in fewer predicted gaps. This may need to be empirically determined, depending on the sequences used.
Gap extension penalty: Values of 0–2 are typically used with a default of 0.123. Higher values will result in a preference for shorter gaps.
Tree rebuilding number: The default is 2. Options include 0, 1, 2, 5, 10, 20, 50, 80, and 100. This indicates the number of tree building cycles the algorithm proceeds through.
Refinement methods: Refinement methods trade off “accuracy” and computation time. For example, L-INS-i in MAFFT uses an iterative refinement method to improve accuracy. L-INS-i is usually more accurate, but may not always be appropriate for a particular alignment [40].

3.5 Alignment Output and Refinement
1. Alignment programs produce an alignment file that consists of the input sequences with gap characters added. In most tools, it is also possible to designate the sequence order in the alignment. Typically, the “aligned” option is most informative, as this will group similar sequences.
2. After the first alignment, examine the conserved sequence motifs to ensure they align correctly. Examine the alignment of orthologous sequences within taxonomic groups to ensure they are similar. The alignment should be consistent with the known properties of the protein family. If the alignment is of poor quality, realign using a different approach, program, or parameters. If there are no major problems with the alignment, check individual sequences.
3. Identify and remove any sequences that align particularly poorly or appear to contain significant anomalies (see Note 4). For example, sequences with apparent frameshifts or miscalled intron-exon boundaries will be obvious after alignment, and may include sequences with large internal insertions or gaps (>20 residues) that do not appear in other sequences. Apparent frameshifts are also commonly seen at the N- and C-termini of protein sequences. However, it is important to be aware of the species from which an apparently anomalous sequence is derived: if it is the only sequence from that lineage, then the differences may reflect genuine sequence divergence.
4. Also remove sequences that lack key structural motifs. For examples of the above, see Fig. 3. If available, compare the alignment with published reports of previous attempts to align members of the same protein family.
5. Record all removed sequences and the reasons for their removal in the index file.
6. Repeat the alignment. When all poorly aligning sequences have been removed, proceed with inferring the phylogenetic tree.

3.6 Generating Trees, File Type Requirements, and Benchmarking of Trees
Fig. 3 Examples of alignments where a sequence is anomalous and would be investigated and potentially removed from the alignment. (a) Unknown sequence content. (b) Frameshifts possibly due to miscalled intron/exon boundaries. (c) Large indels in conserved regions of an alignment

1. Phylogenetic trees should be inferred using a probabilistic method: either a maximum-likelihood (ML) algorithm or a Bayesian approach is usually used. ML-based programs are usually faster and are capable of handling larger sequence numbers. Common programs include MEGA, IQ-TREE 2, FastTree 2, and RAxML. Other tree programs utilize existing phylogenetic information or explicit species-aware evolutionary models that may improve accuracy [15]. These are most effective for proteins from well-supported phylogenetic lineages, such as vertebrates. Importantly, simple neighbor-joining algorithms are not satisfactory here: the inference of the tree needs to be statistically robust.
2. Phylogenetic tree programs require the sequence alignment to be provided and the substitution model and any phylogenetic testing parameters to be specified (Table 3).
3. Newick format trees (.nwk) can be visualized easily using the program FigTree (http://tree.bio.ed.ac.uk/software/figtree/). Examine the tree branches to ensure the grouping of sequences is consistent with published reports. In general, orthologues will group based on taxonomic identity and paralogues will separate. If the protein is prokaryotic, consider the potential for horizontal gene transfer.
4. Bootstrapping should be performed on final trees. Bootstrapping creates new alignments by randomly sampling columns, with replacement, from the original alignment, so that each new alignment has the same length as the original but the same column may appear more than once. These new alignments are used to generate new trees, and the percentage of times a particular clade from the original tree appears in the set of trees is calculated as a confidence measure. A bootstrap value of 100% would mean that a particular clade was always present. Ultrafast bootstrap values are interpreted differently, with 95% support approximately corresponding to a 95% probability that the particular clade is accurate. It is common to overlay the bootstrap values for each node onto the final tree in order to allow others to interpret the reproducibility of the tree, and most tree programs automatically provide an option to do this.
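The column resampling and the support calculation behind the standard bootstrap in step 4 can be sketched as follows. Tree inference itself is left to the external tree program; the toy alignment and the list of replicate clade sets are hypothetical stand-ins for real data, and the function names are illustrative.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample columns with replacement to build a pseudo-alignment of equal length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def clade_support(clade, replicate_clades):
    """Percentage of replicate trees that contain `clade` (a frozenset of leaf names)."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


# Toy alignment (hypothetical data).
alignment = {"s1": "MKVLA", "s2": "MKVIA", "s3": "MRVLG"}
replicate = bootstrap_alignment(alignment, random.Random(0))

# Stand-in for the clades found in four replicate trees (hypothetical).
target = frozenset({"s1", "s2"})
replicate_clades = [{target}, {target}, {frozenset({"s2", "s3"})}, {target}]
support = clade_support(target, replicate_clades)
print(replicate, support)
```

Each replicate alignment would be passed to the same tree program and substitution model as the original alignment before the clade sets are compared.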
Table 3 Parameters for phylogenetic tree inference
Parameter name: Description and default (IQ-TREE 2)
Substitution model: By default, IQ-TREE 2 uses ModelFinder [41] to identify the best-fit substitution model. It is also possible to use ProtTest [42], or to specify a particular substitution model. For other tree programs, ProtTest can be used separately to determine the best substitution model, or a particular model can be specified.
Bootstrap analysis: The ultrafast bootstrap is the default in IQ-TREE 2 [43]. It is also possible to perform the standard bootstrap [44]. The number of bootstrapping rounds will depend on the number of sequences, but 100 rounds is routine for the standard bootstrap, and 1000 rounds is standard if using the ultrafast bootstrap from IQ-TREE 2.
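The parameters in Table 3 map onto a small number of IQ-TREE 2 command-line options. The sketch below only assembles the argument list; it assumes an executable named `iqtree2` on the path and a placeholder alignment file name, and it uses the `-s` (alignment), `-m MFP` (ModelFinder Plus), `-B` (ultrafast bootstrap), and `-b` (standard bootstrap) options. The helper function itself is hypothetical.

```python
def iqtree_command(alignment, model="MFP", bootstrap="ultrafast",
                   replicates=None, exe="iqtree2"):
    """Assemble an IQ-TREE 2 argument list from the Table 3 parameters."""
    if bootstrap == "ultrafast":
        flag, default = "-B", 1000
    elif bootstrap == "standard":
        flag, default = "-b", 100
    else:
        raise ValueError("bootstrap must be 'ultrafast' or 'standard'")
    n = default if replicates is None else replicates
    return [exe, "-s", alignment, "-m", model, flag, str(n)]


# "alignment.fasta" is a placeholder file name.
ultrafast = iqtree_command("alignment.fasta")
standard = iqtree_command("alignment.fasta", model="LG", bootstrap="standard")
print(" ".join(ultrafast))
print(" ".join(standard))
```

The resulting list can be passed to `subprocess.run` once the executable and input file exist; keeping the command in a script also records the exact settings used for each tree.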
3.7 Inferring Ancestors Using GRASP
1. GRASP is available through the server link (http://grasp.scmb.uq.edu.au) and is free to use. Create login details and follow the instructions to verify the associated email in order to save reconstructions. Upload the finalized alignment in FASTA or Clustal format and the tree in Newick format. Select an appropriate evolutionary model (JTT, Dayhoff, WAG, LG) for the protein family of interest and perform the reconstruction.
2. GRASP accepts up to 10,000 sequences through the web server. However, a longer time will be needed to compute alignments, trees, and ancestors from a greater number of sequences.
3. GRASP automatically produces the joint reconstruction with the option of inferring marginal ancestors at each selected node.
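Before uploading, it can be helpful to confirm that the alignment and the tree describe the same set of sequences and stay within the web server limit. The check below is a hypothetical pre-flight script, not part of GRASP; it assumes that sequence labels must match between the FASTA alignment and the Newick tree, and its simple regular expression does not handle quoted Newick labels.

```python
import re


def fasta_ids(fasta_text):
    """Sequence identifiers: the first word after '>' on each header line."""
    return [line[1:].split()[0] for line in fasta_text.splitlines()
            if line.startswith(">")]


def newick_leaves(newick):
    """Leaf labels: names that directly follow '(' or ',' in the Newick string."""
    return re.findall(r"[(,]\s*([^\s(),:;]+)", newick)


def check_inputs(fasta_text, newick, max_sequences=10000):
    """Return a list of human-readable problems (empty if none were found)."""
    ids, leaves = fasta_ids(fasta_text), newick_leaves(newick)
    problems = []
    if len(ids) > max_sequences:
        problems.append(f"{len(ids)} sequences exceed the limit of {max_sequences}")
    if len(set(ids)) != len(ids):
        problems.append("duplicate sequence labels in the alignment")
    mismatched = sorted(set(ids) ^ set(leaves))
    if mismatched:
        problems.append("labels not shared by alignment and tree: " + ", ".join(mismatched))
    return problems


# Toy inputs (hypothetical).
fasta_text = ">a\nMKV-A\n>b\nMKVLA\n>c\nMK-LA\n"
tree_ok = "((a:0.1,b:0.2)N1:0.3,c:0.4)N0;"
tree_bad = "((a:0.1,b:0.2)N1:0.3,d:0.4)N0;"
print(check_inputs(fasta_text, tree_ok), check_inputs(fasta_text, tree_bad))
```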
3.8 Interpreting the GRASP Output
GRASP provides a tree viewer and ancestor viewer that allows the interactive selection of ancestors via the nodes on the tree (Fig. 4). Multiple ancestors can be stacked on top of each other, so that differences in their character states and indel composition can be assessed. For example, stacking a parent and two children allows identification of indel events which may have occurred in one subtree but not the other. GRASP identifies unidirectional parsimonious edges as dotted lines, bidirectional edges as solid lines and the darkness of any edge is further used to indicate how many extant sequences the inference is based upon. This allows for the simultaneous viewing of multiple ambiguous paths, with a button to optionally toggle the single preferred path through the graph. Selections of ancestors or the full set of ancestors can be downloaded as a FASTA file, as well as phylogenetic trees labeled with the internal node labels consistent with the GRASP tree viewer
Fig. 4 Visualization of results in GRASP. (a) The tree viewer panel displays the ancestral nodes of the user-provided tree. (b) Right clicking on a node on the tree opens a menu which allows for collapsing the tree, calculating and displaying a marginal reconstruction centered at that node, or displaying or stacking a joint reconstruction. (c) The ancestor viewer panel shows POGs of the original MSA, as well as any selected ancestors. Here, the joint reconstructions of the N3 and N4 ancestors are stacked on top of each other to allow comparison
visualization, for use in external tree viewing programs. The output of the ancestor viewer can also be saved as a PNG, centered at the chosen location and including the MSA graph and the top-most ancestral graph in the stacked ancestral graphs. GRASP automatically generates the “preferred path” ancestor when downloading files, but by referring to the GRASP visualizations that display ambiguity, it is possible to define alternative paths through the alignment that represent alternative evolutionary scenarios (e.g., insertion or deletion events in an immediate ancestor or descendent; Fig. 5). To address these alternative scenarios, indels can be manually inserted by manipulating the downloaded FASTA files, either into the same ancestor that may have contained them, or into alternative, usually adjacent ancestors. This can create ancestors that address the ambiguity in deciding when indels occurred during evolution. However, this approach can also be used to design totally new permutations to explore the sequence space around specific ancestors, e.g., by making insertions into ancestors
Using the Evolutionary History of Proteins to Engineer Insertion-Deletion. . .
Fig. 5 GRASP can help to address ambiguity in deciding when indels occurred during evolution. (a and b) Two examples of related ancestors from different branches of a phylogenetic tree are depicted showing preferred and alternative paths that represent ambiguous evolutionary trajectories
of different lineages than those from which the insertions are derived, or by introducing artificial inserts into positions in the sequence identified as tolerating indels from an analysis of the natural evolutionary history.
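The manual editing of downloaded FASTA files described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record names (N3, N4), the sequences, the insert (GSG) and its position are invented for the example, and the helper functions are hypothetical rather than part of GRASP.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def insert_segment(seq, position, segment):
    """Insert `segment` after the first `position` residues of `seq`."""
    if not 0 <= position <= len(seq):
        raise ValueError("position lies outside the sequence")
    return seq[:position] + segment + seq[position:]


def delete_segment(seq, start, length):
    """Delete `length` residues beginning at 0-based index `start`."""
    if start < 0 or length < 0 or start + length > len(seq):
        raise ValueError("deletion lies outside the sequence")
    return seq[:start] + seq[start + length:]


def format_fasta(records, width=60):
    """Write a dict of sequences back to FASTA text."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Invented toy ancestors: N4 carries a three-residue insertion absent from N3.
fasta_text = """>N3
MKTAYIAKQR
>N4
MKTAYGSGIAKQR
"""
ancestors = parse_fasta(fasta_text)

# Alternative scenarios: move the insertion into N3, or remove it from N4.
variants = {
    "N3_plus_insert": insert_segment(ancestors["N3"], 5, "GSG"),
    "N4_minus_insert": delete_segment(ancestors["N4"], 5, 3),
}
print(format_fasta(variants))
```

Positions here count residues of the ungapped ancestor, so they should be checked against the alignment columns shown in the ancestor viewer before a variant is ordered.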
3.9 Designing an Indel Library
It is useful to evaluate the potential for indel variant success in silico, prior to synthesizing and resurrecting the proteins. If possible, visualize the structure of ancestors via homology modeling or by mapping indels on structures of extant sequences using a structure visualization program such as PyMOL or Chimera. When viewing the GRASP output, consider:
1. The frequency of indel events and how many variants are possible.
2. If the predicted indels appear in any extant proteins, compare the similarity in sequences flanking the indel between the extant proteins and ancestors.
3. Examine the length of the indels and whether they are at the termini or in the middle of the sequence.
4. If possible, predict the secondary structural elements that flank the predicted indel by comparison with the structure of a well-characterized extant form, and determine if the indel falls within a disordered region.
5. Assess how conserved the indel is within the lineage. Is the sequence or length of the indel conserved in a number of extant sequences, or does it vary in size?
6. Since soluble surface-localized indels are better tolerated than buried indels, it can be useful to determine whether the amino acids of the insertion or the flanking regions of the deletion are mostly small and hydrophilic or large and hydrophobic, indicating whether these regions of the protein are likely to be buried (abundant hydrophobic residues) or exposed (hydrophilic residues).
When deciding what ancestral variants to create, it is useful to examine the tree for ancestors close to where an indel arises. Variants that can be characterized include (Fig. 6):
1. The GRASP-predicted ancestor.
2. The alternatives with or without individual indels.
3. Ancestral variants with single-edge parsimony support.
4. Ancestors with indels from nearby nodes.
5. Combinations of the above if more than one indel is predicted in an ancestor.
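Criterion 6 in the first list above (hydrophilic versus hydrophobic character of an insertion or of the regions flanking a deletion) can be approximated with a mean hydropathy score. The sketch below uses the Kyte-Doolittle hydropathy scale; the cutoff of zero and the example segments are assumptions chosen for illustration, and a sequence-based score is only a rough proxy for burial that does not replace inspection of a structure or homology model.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment):
    """Average Kyte-Doolittle hydropathy of a peptide segment."""
    residues = segment.upper()
    if not residues:
        raise ValueError("segment is empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def likely_environment(segment, cutoff=0.0):
    """Crude guess: 'buried' above the cutoff, otherwise 'exposed'.

    The cutoff of 0.0 is an assumption made for this example only.
    """
    return "buried" if mean_hydropathy(segment) > cutoff else "exposed"


# Invented example segments.
for insert in ("GSGDK", "LIVFA"):
    print(insert, round(mean_hydropathy(insert), 2), likely_environment(insert))
```

The same function can be applied to the residues flanking a proposed deletion, for example five residues on either side, to judge whether the deletion site is likely to lie in a surface loop.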
Other options for increasing sequence space are the use of the alternative joint or marginal ancestor or (in the case of marginal reconstruction) substitution of amino acid residues with lower posterior probabilities in combination with indel variants. To probe more distant regions of sequence space, novel indels can be designed and used to replace indels in the inferred ancestor.
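For the marginal-reconstruction option mentioned above, an alternative ancestor can be derived by substituting the second most probable residue at ambiguous sites. The following sketch assumes that the per-site posterior probabilities have already been exported into a list of dictionaries; this data layout, the probability values and the 0.2 threshold are hypothetical and do not describe the GRASP output format.

```python
def ranked_states(site):
    """Return (residue, probability) pairs sorted from most to least probable."""
    return sorted(site.items(), key=lambda item: item[1], reverse=True)


def ml_sequence(posteriors):
    """Most probable residue at every site."""
    return "".join(ranked_states(site)[0][0] for site in posteriors)


def alternative_sequence(posteriors, min_probability=0.2):
    """Use the second-best residue wherever it reaches `min_probability`.

    Returns the alternative sequence and the 1-based positions that changed.
    """
    residues = []
    changed = []
    for position, site in enumerate(posteriors, start=1):
        ranked = ranked_states(site)
        if len(ranked) > 1 and ranked[1][1] >= min_probability:
            residues.append(ranked[1][0])
            changed.append(position)
        else:
            residues.append(ranked[0][0])
    return "".join(residues), changed


# Invented posterior probabilities for a four-residue ancestor.
posteriors = [
    {"M": 0.99, "L": 0.01},
    {"K": 0.55, "R": 0.40, "Q": 0.05},
    {"T": 0.90, "S": 0.10},
    {"A": 0.50, "G": 0.30, "S": 0.20},
]

ml = ml_sequence(posteriors)
alt, ambiguous = alternative_sequence(posteriors)
print(ml, alt, ambiguous)
```

Such an alternative sequence can then be combined with the indel variants listed earlier, keeping in mind that each added dimension multiplies the size of the library.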
Fig. 6 Examples of indel variants that can be generated from the GRASP output
3.10 Resurrection of Ancestral Forms and Creation of Libraries of Variants
Once the variants to be created have been designed, a number of methods can be used for construction of the library, most of which depend on gene synthesis of at least one ancestor as a template into which mutations can then be introduced by standard mutagenesis methods (see Note 5). The amino acid sequence of the chosen ancestor should be reverse translated into a DNA sequence that is codon-optimized for the host to be used for expression of the recombinant proteins.
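Reverse translation can be sketched as a lookup of one codon per residue. The table below is a placeholder with a single codon per amino acid, chosen for illustration only; it is an assumption, not a validated codon usage table, and in practice it should be replaced with the usage table of the actual expression host or with the optimization tool of the gene synthesis provider.

```python
# Placeholder one-codon-per-residue table (an assumption for this example).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"


def reverse_translate(protein, table=PREFERRED_CODON, add_stop=True):
    """Convert a protein sequence into DNA using one codon per residue."""
    codons = []
    for residue in protein.upper():
        if residue not in table:
            raise ValueError(f"no codon defined for residue {residue!r}")
        codons.append(table[residue])
    if add_stop:
        codons.append(STOP_CODON)
    return "".join(codons)


def translate_back(dna, table=PREFERRED_CODON):
    """Sanity check: translate DNA built from `table` back into protein."""
    codon_to_residue = {codon: aa for aa, codon in table.items()}
    residues = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if codon == STOP_CODON:
            break
        residues.append(codon_to_residue[codon])
    return "".join(residues)


ancestor_protein = "MKTAYIAKQR"  # invented ancestor sequence
ancestor_dna = reverse_translate(ancestor_protein)
assert translate_back(ancestor_dna) == ancestor_protein
print(ancestor_dna)
```

Always using the same codon for a residue is the simplest possible scheme. Real designs usually also consider repeats, GC content and restriction sites, which this sketch ignores.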
4 Notes
1. It may be necessary to review what is known about the evolutionary aspects of the protein family under investigation, as poor sequence or alignment quality can affect the structural and functional integrity of the resultant inferred ancestors.
2. The number of sequences chosen as probe sequences will depend on the diversity seen in the protein family under consideration. Probe sequences should be chosen from across the full phylogenetic tree to avoid bias in the reconstruction.
3. Protein sequences are preferred because they are more conserved than DNA sequences, due to codon redundancy.
4. The criteria used for sequence curation should depend on the characteristics of the protein but also take into consideration the likely extent of species-specific differences that may arise. For example, if a particular sequence comes from a species where there are no other closely related sequences, some large differences (such as insertions in loops) may be genuine.
5. The number of ancestors resurrected will depend on the purpose of the reconstruction. However, for biotechnological applications it is useful to select sequences from different lineages so as to facilitate the exploration of sequence space and reveal more novel properties than possible by assessing closely related variants, e.g., near a specific extant protein in the phylogenetic tree.
Chapter 7
Resurrecting Enzymes by Ancestral Sequence Reconstruction
Maria Laura Mascotti
Abstract
Ancestral Sequence Reconstruction (ASR) allows one to infer the sequences of extinct proteins using the phylogeny of extant proteins. It consists of uncovering the evolutionary history (i.e., the phylogeny) of a protein family of interest and then inferring the sequences of its ancestors (i.e., the nodes in the phylogeny). Assisted by gene synthesis, the selected ancestors can be resurrected in the lab and experimentally characterized. The crucial step to succeed with ASR is starting from a reliable phylogeny. At the same time, it is of the utmost importance to have a clear idea of the evolutionary history of the family under study and the events that influenced it. This allows us to implement ASR with well-defined hypotheses and to apply the appropriate experimental methods. In recent years, ASR has become a popular way to test hypotheses about the origin of functionalities and changes in activities, and to understand the physicochemical properties of proteins, among other aims. In this context, the aim of this chapter is to present the ASR approach applied to the reconstruction of enzymes (i.e., proteins with catalytic roles). The spirit of this contribution is to provide a basic, hands-on guide for biochemists and biologists who are unfamiliar with molecular phylogenetics.
Key words Ancestral sequence reconstruction, Enzyme function, Molecular evolution, Evolutionary biochemistry, Phylogeny
1 Introduction
Enzymes perform catalysis by the coordinated action of specific residues in an active site. Revealing exactly how they perform their functions allows us to understand in detail the cellular mechanisms that they are involved in [1]. The determinants of enzyme function are defined as the necessary and sufficient set of residues that when accommodated in a three-dimensional fold lead to a particular activity. To define this set of residues, most analyses compare the sequences of modern-day enzymes with distinct activities to find the causes of their different catalytic properties [2]. This approach is horizontal, because it compares sequences on the leaves of a phylogenetic tree to each other. This approach treats equally all
Francesca Magnani, Chiara Marabelli and Francesca Paradisi (eds.), Enzyme Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 2397, https://doi.org/10.1007/978-1-0716-1826-4_7, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022
substitutions that occurred since the last common ancestor of the two proteins in question. Many of these substitutions are entirely unrelated to the functional differences between the enzymes, and identifying a causal set can be very difficult. In reality, enzyme functions usually change over a single branch on the phylogeny, between an ancestral protein and its immediate descendant on the tree. Evolutionary biochemistry makes explicit use of this fact and employs phylogenetics to identify the historical and physical causes of protein function [3]. This vertical approach implements ancestral sequence reconstruction (ASR) to infer the sequences at internal nodes of the tree, which can then be experimentally characterized. Using these proteins to identify the exact branch along which a new function evolved allows biochemists to focus only on the substitutions that occurred on that specific branch, massively reducing the number of differences to consider. Modern-day enzymes can therefore be understood with respect to their own history: sites crucial for a protein function can be identified, while those related to variable features can be dissected as well [4]. The context and the evolutionary pressures that shaped the family history can also be identified by this approach [5, 6].
In 1963, Pauling and Zuckerkandl proposed the concept of “molecular restoration” for the first time [7]. Relying on the concept of site homology, they developed the idea of determining the sequences of ancestors from the sequences of extant proteins. They anticipated that in the future it would be possible to synthesize these ancestral components to study their physicochemical properties and to make inferences about their functions. However, limitations in the availability of protein sequences and computational power at the time meant that their vision only became reality more than 30 years later.
The growth in the number of sequenced genomes and the improvements in gene synthesis have made evolutionary biochemistry a flourishing field since the 2000s. The essence of the approach is to reconstruct the ancestors of a protein family in order to recover their phenotype, rather than their precise sequence [8]. Hence, it is important to highlight the functional character of the approach, which is discussed by Hochberg and Thornton [9]. They examine the conceptual basis of the approach and then present some of the most interesting cases where evolutionary analysis overcomes the limitations of traditional horizontal analyses. From a different perspective, Garcia and Kaçar describe the reconstruction of ancestral enzymes as proxies for ancient biochemistry [10]. They address in great detail the sources of uncertainty and how these should be accounted for.
This chapter presents the ASR approach in a comprehensive and practical manner for biochemists and biologists who are unfamiliar with molecular evolution. How to get started, the proposal of evolutionary hypotheses, and the initial setup of the
analysis will be presented and discussed in particular detail. All the steps involved in the approach will also be described in a manner that allows the reader to understand the key principles behind them.
2 ASR to Study Enzymes
To showcase how ASR can be implemented in an enzymology investigation, a few selected examples are briefly presented in this section. These illustrate different aspects of enzyme biochemistry, such as how a catalytic activity is encoded in the sequence of enzymes and changes over time [11], how catalysis emerges de novo from non-catalytic precursors [12], and what determines the physicochemical features of a family of enzymes [13].
Gene duplication plays a fundamental role in evolutionary innovation. The yeast α-glucosidases (MALS) form an enzyme family that metabolizes complex carbohydrates and has undergone several gene duplications along its history. Voordeckers et al. [11] analyzed this enzyme family by ASR to delineate the residues and structural features that determine substrate preference. They found that present-day enzymes derive from a promiscuous ancestor (ancMALS) capable of transforming maltose-like as well as isomaltose-like sugars. From this promiscuous ancestor, a series of duplication events led to substrate optimization for either maltose-like or isomaltose-like carbohydrates. The structural analysis revealed that these activities cannot coexist in any extant enzyme. Interestingly, the substitutions responsible for these changes were found to differ along the different post-duplication paths. Hence, modern-day maltases and isomaltases employ slightly different active sites to perform their activities. This case shows that a particular biochemical activity can emerge at different times through different causal substitutions. The ASR approach provided access to these paths and unveiled which substitutions were crucial for switching between functions.
Another interesting case is the emergence of a bona fide enzyme from a non-catalytic ancestor. This is the case reported by Clifton et al. [12], in which the evolution of the cyclohexadienyl dehydratases (CDT) was investigated by ASR. 
The evolutionary trajectory from the solute-binding proteins (SBP) to the CDT revealed the existence of a non-catalytic ancestral protein from which catalytic proteins descended. The oldest non-catalytic ancestor (AncCDT-1) was identified as an amino acid binding protein with affinity for positively charged amino acids, while its descendant (AncCDT-2) was a carboxylic acid binding protein. The biophysical analysis of these ancestors demonstrated that some substitutions had been accommodated already in the binding cavity that allowed a descendant of this protein, and the last common ancestor of catalytic
enzymes (AncCDT-3), to gain an active site relatively easily. However, not only the emergence of the catalytic activity was unveiled, but also the subsequent steps that delivered a proficient enzyme. Remarkably, this study shows how the transition from a non-catalytic to a catalytic protein occurred through natural evolution. This process provides valuable insights for computational design and directed evolution to improve enzyme activities.
The last example involves the reconstruction of the mammalian flavin-containing monooxygenases (FMOs). These enzymes, along with the cytochrome P450s, form the defense system of humans against toxic compounds. Despite their physiological significance, until recently their structures remained unknown because of their problematic expression and purification as recombinant enzymes. In a recent study by Nicoll et al. [13], the evolutionary history of tetrapod FMOs was investigated. It was shown that the last common ancestor of all mammals already encoded the five known FMO paralogs (FMO1-5). By ASR, the last common ancestors of three of these paralog clades were resurrected: AncFMO2, AncFMO3-6, and AncFMO5. The structures of all the resurrected ancestors were solved, providing central evidence on the catalytic mechanism and substrate acceptance of these enzymes. This work elegantly demonstrates the power of ASR to characterize elusive enzymes and to use ancestors as models of modern-day enzymes.
The “ancestral superiority” concept deserves a final mention in this section. Many articles document a trend whereby reconstructed ancestors display higher thermal stabilities and are catalytically more promiscuous [14, 15]. Although the main implication of such a trend concerns how protein evolution is conceived, it also affects the field of protein engineering [16]. 
However, given the lack of systematic analyses, the variability in experimental methodologies, and the possibility that inference methods are biased towards stabilizing mutations, these features cannot be guaranteed in every ASR application [17] (see Note 1). Implementing ASR with the sole aim of obtaining more robust and stable enzymes is therefore not well founded. Instead, gaining insight into the historical and physical determinants of enzyme functionality provides a more solid basis for research.
3 The Steps of ASR
As stated by Joseph Thornton, “an enzyme resurrection study, as with all good science, begins with a question” [18]: a question that can be answered by investigating the catalytic or physicochemical features of ancestral forms of modern-day enzymes. To present all the steps of an ASR-based investigation, a toy scenario will be discussed as an example. Two different monooxygenases named Enz2 and Enz6 have been biochemically
characterized (Fig. 1). Enz2 catalyzes the formation of esters from ketones and Enz6 transforms sulfides into sulfoxides. Although these two enzymes belong to the same protein family, they display different substrate specificities. Our aim is to understand which residues define the substrate preference in order to design mutants for further applications. Previously, single-point mutations inspired by the differences between the two sequences were tested. However, no clear hints were obtained about the identity of the residues defining the observed specificities. Therefore, the research is now going to be approached from an evolutionary biochemistry perspective.
The first step (Step I) consists of gathering a group of sequences homologous to our enzymes. To do this, homology searches will be conducted in protein databases using Enz2 and Enz6 as queries. The taxonomy of the mined organisms will be carefully considered to achieve good representation. Next, the collected sequences will be compared in a multiple sequence alignment, the best-fit evolutionary model will be determined, and the phylogeny inferred (Step II). Once the tree is available, the distribution of the known enzymes will be analyzed. In this case we observe that the group containing Enz2 is formed by enzymes displaying preference for ketones, while the clade including Enz6 shows preference for sulfides as substrates. This scenario could be interpreted as two groups with well-defined substrate preferences emerging from a common ancestor of unknown preference. Step III consists of inferring the sequences of the ancestors (i.e., the nodes in the phylogeny) at all branching points in the tree. Each ancestral sequence will be the string of the most probable amino acids (i.e., states) at each site. Three specific sequences will be investigated next. 
These are AncA, which is the ancestor of the clade with preference for ketones, and AncB, which is the ancestor of the clade with preference for sulfides. We expect the phenotype (i.e., the substrate preference) of these to be the same as that of their derived enzymes. The third sequence will be AncC, which is the common ancestor of AncA and AncB and whose phenotype is unknown. Later (Step IV), the targeted ancestors will be acquired as synthetic genes and the proteins expressed and assessed for the activity of interest. As expected, AncA and AncB display the same substrate specificity as their extant counterparts. In our example, AncC shows the same preference for ketones as AncA. Mapping these traits onto the tree (Step V) reveals that a series of substitutions along the branch connecting AncC to AncB caused the change in substrate preference. To unveil which of these substitutions has a crucial role in defining the trait, both sequences will be compared and single-point mutations designed. These mutations will be introduced into the AncC sequence (Step VI). The aim of this step will be to characterize these mutants in order to find the switch from the original substrate preference of AncC towards that of AncB. In this example, with a single amino acid swap the switch is
Fig. 1 Steps in the reconstruction of ancestral enzymes. The example described in the text is depicted. Step I: initially, the diversity in the substrate specificity (trait) is identified. Then, sequences of selected enzymes are used in homology searches to build the dataset. Step II: after the construction of an alignment a tree is inferred. This has to include a root (Enz8) to polarize the phylogeny. The trait investigated is mapped on the tree: in red, branches describing ketone specificity are shown, whereas in blue, the preference for sulfides. Step III: the ancestors in the phylogeny are inferred. Particularly, the sequences of those to be experimentally characterized (AncC, AncA, and AncB) are analyzed. Step IV: the ancestral proteins are resurrected in the lab and assessed for their substrate specificity. Step V: the changes along the branch connecting AncC to AncB (highlighted in orange) accounting for the switch in the substrate specificity are identified. Step VI: mutations are introduced into the ancestral scaffold of AncC to recover the phenotype of the derived specificity for sulfides as substrates. Finally, the set of residues accounting for the switching in the substrate specificity is identified
accomplished. Clearly this is an oversimplified example, but it highlights the power of the approach: the universe of possible substitutions accounting for a change in an enzyme phenotype is reduced to those occurring on a specific branch of a phylogeny. By simply comparing extant sequences, it would be much harder to find these substitutions, because the number of differences is larger. This is the virtue of the vertical approach compared to the classic horizontal strategy. Additionally, framing an enzyme functionality
investigation within molecular evolution can lead to better interpretations of experimental data. A few automated algorithms to perform ASR have been published [19]. Although it is not the intention of this chapter to criticize these, a step-by-step approach is more suitable than treating the evolutionary history of an enzyme family as a black box. In this way, the researcher will become familiar with the enzymes of interest and learn the details and peculiarities of their evolutionary history. Hypotheses on the origin and extent of enzyme function can then be formulated. In the next sections, the complete workflow to perform ASR will be dissected into three consecutive stages. For each of these stages, the tasks, computational tools, and analyses to carry out will be described in detail.
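The advantage of the vertical comparison in the toy scenario (Steps V and VI) can be made concrete with a short sketch that lists the differences between two aligned sequences. All sequences below are invented for illustration, the names simply follow the toy scenario, and the function is a hypothetical helper rather than part of any ASR program.

```python
def compare_aligned(seq_a, seq_b, gap="-"):
    """List substitutions and indel columns between two aligned sequences.

    Substitutions are written as <residue in seq_a><column><residue in seq_b>,
    using 1-based alignment columns.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    substitutions = []
    indel_columns = []
    for column, (a, b) in enumerate(zip(seq_a, seq_b), start=1):
        if a == b:
            continue
        if a == gap or b == gap:
            indel_columns.append(column)
        else:
            substitutions.append(f"{a}{column}{b}")
    return substitutions, indel_columns


# Invented aligned sequences for the toy scenario.
enz2 = "MKSAYLVG-QK"   # extant, ketone preference
enz6 = "MRTGYFAGSHR"   # extant, sulfide preference
anc_c = "MKTAYLAG-QR"  # ancestor, ketone preference
anc_b = "MRTAYFAGSQR"  # ancestor, sulfide preference

horizontal, _ = compare_aligned(enz2, enz6)
vertical, branch_indels = compare_aligned(anc_c, anc_b)

print("Extant comparison:", horizontal)
print("AncC to AncB branch:", vertical, "indel columns:", branch_indels)
```

In this invented case the extant pair differs at seven positions, whereas the AncC to AncB branch carries only two substitutions and one indel column, which gives a much shorter list of candidate mutations to introduce into the AncC scaffold.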
4 Commence with a Reliable Evolutionary History
4.1 Building the Dataset
The starting point of this workflow is an interesting biochemical feature to investigate in an enzyme family [20]. This could be substrate specificity [11, 21], catalytic mechanism [22], oligomerization state [23], a particular physicochemical property [24], and many others [25]. It is important to begin from sequences for which experimental evidence confirms the selected trait. Although this might seem obvious, automated functional annotations are often not accurate [26] (see Note 2). Once the query is defined, a homology search is conducted [27] (see Note 3). In the simplest case, there is a single copy of the sequence in each organism, so orthologs are collected with good confidence. However, enzymes often have multiple copies (paralogs) among the different taxa, and that number may also change from one species to another. In that case, the search has to be rigorously conducted to detect all possible paralogs (for definitions of these terms, see [28, 29]). A good strategy is to restrict the search to taxa with full genome representation to avoid false negatives. The taxonomy has to be considered when constructing a dataset. This will ensure a good representation of all taxa and will avoid later problems in the phylogeny, such as the presence of long branches [30, 31]. TimeTree is a knowledge database that provides the taxonomy for any lineage in the timescale of life [32]. It can be used in combination with the NCBI taxonomy database, which provides the taxonomy as well as information on genome availability [33]. However, an even better approach is to first conduct a thorough literature search for recent phylogenetic analyses of the taxonomic group in which the enzymes are found. This process will lead to a dataset fulfilling two central principles: representativity and robustness. The first denotes that the sampled enzymes should capture all the taxonomic and sequence
Maria Laura Mascotti
diversity among the protein family investigated. Hence, the dataset should not be biased toward any specific taxa or toward a particular group of highly similar sequences. Robustness refers to independence from the programs used for analysis: the evolutionary relationships among the sequences should always be consistent. These two concepts are graphically represented in the next section.
4.2 Alignment of the Sequences and Selection of the Best-Fit Model
Once protein sequences are collected, they have to be aligned in a multiple sequence alignment (MSA). An MSA is a hypothesis of the positional homology of sites among sequences [34]. Different software packages are available and the choice depends mostly on the user's preferences. MAFFT has been extensively used [13, 23], as large datasets can be processed to obtain reliable alignments [35, 36]. Based on the first MSA, a fast tree can be constructed to inspect the dataset and, based on this, to make decisions such as whether to reinforce some groups, remove redundant sequences, etc. This “guide tree” will be a Neighbor Joining (NJ) tree, which represents fairly well how the sequences are related to each other in terms of similarity [37]. This step might take the user back to the sequence searching stage, iterating the previous steps until reaching a dataset that is reliable, i.e., has good taxonomic and sequence representation and is not redundant (Fig. 2). When a robust dataset and an MSA are available, two steps follow to obtain a phylogeny. First, the single-sequence extensions are trimmed and poorly aligned regions are removed. This will significantly decrease the computational power required for the phylogeny construction (for a detailed discussion of this topic, see [38, 39]). Second, as amino acids are not chemically equivalent, some substitutions will be more frequent than others in a given dataset. Therefore, a substitution model has to be chosen [40]. Likewise, sequences do not evolve at the same rate across their entire length [41]. This is captured by the gamma distribution shape parameter α, which describes rate variation among sites [42]. These parameters will be included in the model of evolution. ProtTest [43] can be used to select which of a number of available models best fits the data, and whether a gamma distribution should be included. Alternatively, some phylogeny inference programs (such as PhyML and RAxML, see below) provide an option for automatic model choice [44].
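The removal of redundant sequences mentioned above can be illustrated with a minimal Python sketch. It is not part of the original protocol: the function names, the toy alignment, and the 95% identity cutoff are assumptions made only for illustration, and the greedy strategy shown here is just one of several possible ways to thin a dataset.

```python
def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical residues over columns where both sequences have a residue."""
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # columns with a gap in either sequence are ignored
        compared += 1
        identical += a == b
    return identical / compared if compared else 0.0


def drop_redundant(msa: dict[str, str], max_identity: float = 0.95) -> list[str]:
    """Greedily keep the sequences that stay below max_identity to every kept one."""
    kept: list[str] = []
    for name, seq in msa.items():
        if all(pairwise_identity(seq, msa[k]) < max_identity for k in kept):
            kept.append(name)
    return kept


# Hypothetical aligned sequences (same length, '-' marks a gap).
toy_msa = {
    "EnzA": "MKT-LLVAG",
    "EnzB": "MKT-LLVAG",  # identical to EnzA, hence redundant
    "EnzC": "MRSQLIVSG",
}
print(drop_redundant(toy_msa))
```

The order of the input decides which member of a redundant pair is kept, so experimentally characterized sequences should be listed first if they are to be retained.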
4.3 Inferring the Phylogeny
Once the MSA and model are known, a phylogeny can be constructed. It is possible that a good phylogeny including the same taxa as the protein dataset is already available; in that case, it can be used for the ASR. Nevertheless, the most common approach is to infer the phylogeny by using a probabilistic method, which can be either Maximum Likelihood (ML) or Bayesian inference. ML is our method of choice, and the one most often used for ancestral reconstructions (see Note 4). Among the software available to implement
Resurrecting Enzymes by Ancestral Sequence Reconstruction
Fig. 2 Building a representative and robust dataset. The construction of a dataset is schematically represented in three stages. Dataset (1) lacks good representation and this impacts the tree topology. To break the long branch of Enz2, more homologous sequences are added in Dataset (2). The obtained tree topology improves, as long branches are no longer observed, but there is still poor representation in each group. After n iterations of homology search/multiple sequence alignment/guide tree construction, Dataset (n) is obtained. This shows good representation and robustness
the ML method, RAxML [45] and PhyML [46] stand out (see Note 5). In order to assess the robustness of the obtained tree, support values for individual branches can be calculated. In ML inference, the preferred method is bootstrapping [47, 48]. Recently, this concept was updated in light of the massive amount of genetic data available. The result is the transfer bootstrap expectation, which addresses some of the problems of the classic bootstrap, such as a systematic tendency toward low support at deep nodes [49]. The other inference method applies Bayesian statistics to calculate phylogenies (see Note 6). One of the standard programs for this inference is MrBayes [50]. A biologist-friendly guide to performing Bayesian phylogenetic analysis has been outlined by Nascimento et al. [51].
4.3.1 Rooting the Phylogeny
A node in a phylogeny represents the common ancestor of two extant taxa or groups of taxa. The order of the nodes, and hence
Fig. 3 The steps at the first stage of an ASR project are graphically summarized. In the violet boxes, the characteristics of the dataset and the phylogeny are highlighted
ancestor-descendant relationships between them, depends on where the root is placed. So before the tree can be interpreted, it has to be rooted [52]. The root will polarize the order of evolutionary events. Rooting is usually done based on the outgroup criterion: a series of sequences known, from external information, to cluster outside the sequences of interest is included in the alignment. The root is then placed between the outgroup and the ingroup. A simple way to find an outgroup is to use the species phylogeny. If the evolutionary history of our protein family follows the species evolution, the protein tree has to show the same topology as the species tree. Therefore, sequences from taxa related to the ingroup, but not so closely related as to be part of it, will be included [53]. TimeTree is a great resource to help with this task, but again, a thorough literature search is preferable. Likewise, if the reconstruction of a whole protein family or superfamily is pursued, an outgroup formed by another structurally related family can be used (see [54, 55] for examples). Another option is paralog rooting, in case the investigated family has undergone duplication events. Then, it is possible
to use a clade of paralogs as an outgroup, and thus the root will be placed at the duplication event [56, 57]. In Fig. 3, all the steps described in this section are summarized. Recently, a valuable review on building phylogenies has been published by Kapli et al. [58]. They present the major steps of phylogenetic inference and also explain the sources of error and some strategies to mitigate them.
4.4 Propose Evolutionary Hypotheses: Defining the Target Ancestors
Once a robust phylogeny has been constructed, the next step is to decide which ancestors will be targeted. This means that the tree (i.e., the evolutionary history) has to be analyzed comprehensively in order to propose the research hypotheses. Initially, the biochemical features of the investigated enzymes should be mapped onto the phylogeny (Fig. 4a). At this stage, all the knowledge about the investigated enzymes available in structural protein databases should be included (see Note 7). Different distribution patterns could be observed in the tree, and two opposite situations will be described following the example presented in Fig. 1. By analyzing the activity of all the enzymes in the phylogeny, it is possible to discover that the trait (here the substrate specificity) is actually not a trend but a single case, i.e., an autapomorphy (Fig. 4b). In this case, performing a horizontal analysis would be more appropriate. By comparing extant sequences with each other, the substitutions causing the change in the substrate scope of Enz2 will become evident. A different scenario is depicted in Fig. 4c. In this case, a gene duplication event generating two derived clades (shown in red and blue branches) is observed. Within each group, the substrate scope is a trait shared by all the derived sequences and their common ancestor, i.e., a synapomorphic trait. This is a good case to be investigated by ASR. Different examples in the literature dealing with scenarios like this have been reported [11, 59, 60]. Returning to the example, three ancestors, one pre-duplication (AncC) and two post-duplication (AncA and AncB), would be selected for experimental characterization. Different hypotheses can be formulated envisioning possible outcomes. For example, it is conceivable that a neofunctionalization event occurred so that the pre-duplication ancestor shows the same substrate scope as one of the derived ancestors.
Therefore, the set of substitutions causing the switch in the substrate scope would be contained in the branch connecting AncC and AncB (Fig. 4c, right upper part). Alternatively, it could be hypothesized that the pre-duplication ancestor showed a mixed phenotype, and thus two subfunctionalization or functional optimization paths took place, yielding the red and blue substrate scopes. These are just two possible hypotheses for an enzyme distribution like the one observed in Fig. 4c, and others could be formulated. The choice of one or another will depend on the background of the investigated protein family.
Fig. 4 Defining the target ancestors. (a) The experimentally characterized Enzs are mapped onto the tree. The branch coloring is consistent with the example depicted in Fig. 1: red for the substrate preference for ketones and blue for sulfides. The outgroup is colored in green to emphasize its different substrate specificity. (b) A trait distribution in which the preference for ketones as substrates is an autapomorphy. Continuing with an experimental horizontal analysis is suggested. (c) A duplication scenario is depicted with two derived groups showing different substrate specificities. The different substrate preferences behave as synapomorphic traits. A further vertical analysis is suggested. On the right, two hypothetical scenarios explaining the observed distribution are proposed. These represent the possible outcomes of the experimental characterization. The thick orange branches show the paths along which functionality switches. The boxes over the branches represent the substitutions, colored as function-switching (red or blue according to the substrate specificity) or neutral (white)
Spending time analyzing in detail the possible outcomes is a fundamental exercise as it will ease the interpretation of the ancestors’ experimental characterization.
5 Ancestral Sequence Reconstruction by Maximum Likelihood
5.1 Brief Theoretical Background
To infer the ancestors of a protein family, an MSA containing the sequence data, the model, and the tree describing its evolutionary history are needed. The method proceeds on a site-by-site basis, and the likelihood (L) of each of the 20 possible amino acids (x) at each site is calculated given the assumed parameters. Consequently, the posterior probability (P) of the ancestral states at each site (n) can be determined. In simplified terms:
$$L_n(x \mid \text{data}, \text{tree}, \text{model}) = P_n(\text{data} \mid \text{tree}, \text{model}, x)$$
The best assignment at a site will be the amino acid that displays the highest posterior probability, which is called the maximum a posteriori (MAP) state. The ancestral sequence will be the string of MAP states at the reconstructed sites [61]. This strategy is called the empirical Bayes (EB) approach, because the parameters estimated by ML are used in the calculation of the posterior probabilities of the ancestors [62]. A distinction can be made between marginal and joint reconstructions. The marginal reconstruction assigns a state at a particular node, while the joint reconstruction assigns a set of states to all the nodes. Although the two approaches use slightly different criteria, they normally produce consistent results. The marginal reconstruction is more suitable for ASR studies [62]. The sources of uncertainty in a reconstruction can be diverse [10, 63]. The accuracy will largely depend on the ML estimates, namely the tree and all the parameters describing the evolutionary model. The reliability of the reconstruction diminishes as the extant sequences become more divergent and sites more highly variable. Hence, building the phylogeny is the crucial foundation (see Subheading 4). The accuracy of the reconstruction can be estimated by the mean posterior probability over all sites at a given node. As a reference, a mean P > 0.8 is considered a good estimate. However, there will be sites with higher reconstruction probabilities than this threshold, as well as others with lower values. The problem then lies in accepting only the MAP states while ignoring suboptimal but plausible reconstructions.
As the P values are calculated for all possible states at a particular site, these uncertainties can be easily spotted. The ambiguously reconstructed sites are those reconstructed with P < 0.8 and with the second-highest state showing P > 0.2. In that case, the second-best reconstruction is called the alternative state [64].
5.2 ASR in PAML Programs
The described method is implemented in the PAML (Phylogenetic Analysis by Maximum Likelihood) package under the programs baseml (for nucleotide sequences) and codeml (for protein-coding sequences as codons or amino acids) [65]. This package is by far the one preferred by the community of evolutionary biochemists (see [13, 23, 66] for examples). Furthermore, its accuracy has been experimentally validated by Randall et al. [67]. A review of the different ASR methods and a comprehensive list of all available software have been presented by Joy et al. [68].
In practice, the input will consist of the MSA, the tree, and the substitution model. Once the program is executed, the output files will be generated (see Note 8 for details). The sequences of all reconstructed ancestors will be obtained. In addition, the posterior probability distribution for all states at each site of any ancestor will be available to analyze the accuracy of the reconstruction.
5.3 Dealing with the Reconstructed Sequences
The ancestral sequences directly extracted from the codeml output are crude sequences. Before their experimental analysis, they need to be examined in order to define their length, ancestor/descendant polarization, and accuracy.
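As a first look at accuracy, the quantities defined in Subheading 5.1 (MAP states, mean posterior probability, and ambiguously reconstructed sites) can be computed with a few lines of Python. The sketch below is only illustrative: it assumes the per-site posterior probabilities of one node have already been extracted into a list of dictionaries (a hypothetical data structure, not the codeml output format), and the toy values are invented.

```python
AMBIGUITY_P = 0.8  # MAP states with P below this value may be ambiguous
ALT_STATE_P = 0.2  # second-best states with P above this value are alternative states


def summarize_node(site_posteriors: list[dict[str, float]]):
    """Return the MAP sequence, its mean posterior, and the ambiguous sites.

    Each element of site_posteriors maps amino acids to posterior
    probabilities at one site and must contain at least two states.
    """
    map_states, map_probs, ambiguous = [], [], {}
    for index, posteriors in enumerate(site_posteriors, start=1):
        ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
        (best, p_best), (second, p_second) = ranked[0], ranked[1]
        map_states.append(best)
        map_probs.append(p_best)
        if p_best < AMBIGUITY_P and p_second > ALT_STATE_P:
            ambiguous[index] = second  # site number -> alternative state
    mean_p = sum(map_probs) / len(map_probs)
    return "".join(map_states), mean_p, ambiguous


# Invented posteriors for a four-site ancestor.
toy_node = [
    {"M": 0.99, "L": 0.01},
    {"K": 0.60, "R": 0.35, "Q": 0.05},
    {"T": 0.75, "S": 0.15, "A": 0.10},
    {"G": 0.90, "A": 0.10},
]
map_seq, mean_p, ambiguous = summarize_node(toy_node)
print(map_seq, f"{mean_p:.2f}", ambiguous)
```

In this toy case only site 2 is flagged: site 3 has a MAP state below 0.8, but its second-best state does not exceed 0.2.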
5.3.1 The Length of the Reconstructed Ancestors
The program will export ancestors that have the same length as the MSA, i.e., if we submitted an MSA with 600 sites, all the reconstructed ancestors will be 600 amino acids in length. Clearly, this is an unrealistic result, because insertions and deletions generate gaps in the MSA. Hence, how do we confidently define the length of the ancestors, i.e., which insertions/deletions were present in the different ancestors? Various approaches have been proposed to deal with gaps [69, 70]. A straightforward and reliable way consists of coding the gaps by parsimony. This can be done by analyzing the length of the modern-day sequences derived from a particular ancestor along with the sequences in the sister clade or outgroup. By parsimony, we can examine whether an insertion causing a gap in the MSA was present in the sequence of the targeted ancestor or not. Parsimony sometimes cannot resolve whether a gap was present or absent in a particular ancestor, because either option would imply an equal number of changes along the tree. In these cases, the experimenter will have to use their biochemical intuition, or synthesize two versions of the gene, one with and one without the insertion in question. Once the length of the targeted ancestors is defined, all the incorrectly placed sites can be removed and the mean overall P of the reconstruction recalculated.
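The parsimony reasoning for a single alignment column can be sketched in Python as follows. This is a simplified illustration and not the procedure of any particular program: it applies only the bottom-up pass of Fitch parsimony to the descendants of the targeted ancestor and breaks ties with the sister clade, whereas a complete analysis would also include a top-down pass over the whole tree. The tree encoding (nested tuples of leaf names) and all names are assumptions.

```python
def fitch_set(node, has_residue):
    """Bottom-up Fitch state set for a leaf name or a (left, right) tuple."""
    if isinstance(node, str):
        return {has_residue[node]}
    left, right = (fitch_set(child, has_residue) for child in node)
    shared = left & right
    return shared if shared else left | right


def ancestor_state(descendants, sister, has_residue):
    """True (residue present), False (gap), or None if parsimony cannot decide."""
    own = fitch_set(descendants, has_residue)
    if len(own) > 1:  # tie among the descendants: consult the sister clade
        own = own & fitch_set(sister, has_residue)
    return next(iter(own)) if len(own) == 1 else None


# Hypothetical topology: the targeted ancestor leads to Enz1-3; Enz4-5 form the sister clade.
descendants = (("Enz1", "Enz2"), "Enz3")
sister = ("Enz4", "Enz5")

# One MSA column: True = residue present, False = gap.
insertion_in_subclade = {"Enz1": True, "Enz2": True, "Enz3": False, "Enz4": False, "Enz5": False}
unresolved = {"Enz1": True, "Enz2": True, "Enz3": False, "Enz4": True, "Enz5": False}

print(ancestor_state(descendants, sister, insertion_in_subclade))
print(ancestor_state(descendants, sister, unresolved))
```

In the first column the residue is found only in Enz1 and Enz2, so the site is coded as a gap in the targeted ancestor; in the second column both options remain possible and the function returns None, which corresponds to the case where two versions of the gene may have to be tested.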
5.3.2 The Problem of the “Root Reconstruction”
In PAML, a rooted tree has a bifurcation at the root, while an unrooted tree has a trifurcation or multifurcation at the root. As most of the evolutionary models are time reversible, probabilities will be independent of the position of the root [69, 71]. However, for calculation purposes the algorithm places the “root” in an arbitrary place at the root branch and treats it as a real node. Therefore, a reconstructed sequence will be obtained for that position. However, this should never be considered for experimental characterization as its position along the branch is underdetermined and arbitrarily assigned to the middle [66]. Characterizing unreal root ancestors is a frequent error in ASR studies [72]. The way to overcome this is simply by building a phylogeny with an outgroup to root the group of enzymes of interest (as described in
Subheading 4.3.1), and then performing the reconstruction. There are certain cases where an outgroup cannot be obtained. This is, for example, the case for proteins that existed in the last universal common ancestor of all cellular life (LUCA). ASR cannot be used to infer the sequences of ancestral proteins that existed in LUCA (certain exceptions exist when multiple paralogs were present in LUCA, but for most questions about LUCA’s biochemistry, ASR is not a suitable method).
5.3.3 The Accuracy of the Reconstructed Sequences
Once the targeted ancestors have been selected, they should be analyzed in terms of their mean overall probability as well as the posteriors at each site. As explained before, for the ambiguously reconstructed sites there will be suboptimal alternative states. One should recall that the aim of ASR is to recover the physico- or biochemical features of an ancestor and not its precise sequence. In this sense, the phenotype has been demonstrated to be robust to the uncertainties in the reconstruction [67]. Therefore, to prove that the reconstruction is reliable, these suboptimal states should be experimentally tested as well. The strategy is to characterize the ML ancestor as well as the AltAll ancestor, which carries the suboptimal states at all the ambiguously reconstructed sites in the same sequence [59]. This ancestor would be the “worst” possible combination of states. Thus, if the phenotype is recovered, the reconstruction is robust to uncertainties [64]. It is necessary to characterize these two versions of the targeted ancestor because ASR, being a historical approach, has no positive controls. In Fig. 5, all the steps described in this whole section are summarized.
6 Experimental Characterization of the Ancestors: The Evolution of Enzyme Function
As depicted in Figs. 1 and 4, when following a biochemical trait in a phylogeny, its emergence can be circumscribed to a branch connecting two nodes. That is, the trait can be encapsulated into a defined set of substitutions and consequently dissected. Initially, at least three ancestors will be screened for the activity [73]. Once the switch in function has been detected, the two ancestors (and their alternative sequences) at the nodes connected by the identified branch will be thoroughly characterized [59, 74]. Codon-optimized gene sequences of the ML ancestors can be ordered to express the recombinant proteins in any desired host. Regarding the Alt ancestors, if these are just a couple of mutations away from the ML version, they can be produced by site-directed mutagenesis [75]. Otherwise, they should be ordered as well and expressed along with the ML ancestors. Once the sequences have been cloned and expressed in a convenient system according to the biochemistry of the enzyme studied, they have to be characterized
Fig. 5 The steps to perform the reconstruction of the targeted ancestors and to analyze them are graphically summarized
[76]. The methodological strategy will also depend on the features of the enzyme investigated: in some cases it is necessary to purify the proteins [13, 60], while in others in vivo activity assays are more suitable [11, 25, 75]. When the ancestors have been characterized biochemically, we can deduce the order of evolutionary events and determine which function was ancestral. Next, we can examine the substitutions that occurred along the branch that contains the change in function. As stated before, the specific group of substitutions responsible for the change in function must be among the sequence differences between the two ancestral proteins flanking that branch. Therefore, a general strategy would be to introduce these substitutions by single-point mutations into the enzyme displaying the ancestral function in order to recover the derived one. If we go back to the example in Fig. 1, AncB shows the new, i.e., derived, phenotype. Therefore, we would introduce the mutations in AncC to capture the emergence of the substrate preference exhibited by AncB. This
means that a number of AncC* mutants will be expressed and screened for the substrate preference. A rational way to approach this mutational analysis is to create shells, from the proximity of the substrate binding pocket towards the surface [11, 12]. If the number of substitutions is too high, these can be divided into structural categories (e.g., forming the active site, defining the entrance for substrates) so that several changes can be tested at the same time. This approach would lead to a subgroup of substitutions necessary for the activity. Afterwards, testing specific mutations within this subgroup will reveal the changes that are sufficient to define the trait. Regardless of the details of the mutational scheme adopted, after a few rounds of experimental analysis the sites defining the trait will be revealed [21, 25, 54]. The occurrence of mutational epistasis is a phenomenon that deserves special attention. In the context of protein evolution, it is evidenced when a set of mutations defining a biochemical trait work together in a non-additive manner, deviating from the prediction based on their individual effects [77]. The occurrence of epistasis will directly impact the experimental scheme designed, as extra rounds of different substitution combinations will be required. Finally, it should be remarked that the trajectories leading to the observed biochemical activity might be numerous, and thus ASR provides a glimpse into only some of those [11, 12, 78]. Rationalizing them relies on conducting a thorough investigation.
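Two small calculations from this section can be sketched in Python: listing the substitutions that separate the two ancestors flanking a branch, and checking whether a double mutant deviates from additivity. The sequences and activity values are invented, the function names are hypothetical, and measuring effects on a log10 scale of relative activity is an assumption made here; the appropriate scale depends on the trait and the assay.

```python
import math


def substitutions(ancestral: str, derived: str) -> list[str]:
    """List the differences between two equal-length sequences, e.g. 'K2R'."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must have the same length")
    return [
        f"{a}{position}{d}"
        for position, (a, d) in enumerate(zip(ancestral, derived), start=1)
        if a != d
    ]


def epistasis(background: float, single_a: float, single_b: float, double: float) -> float:
    """Deviation of the double mutant from additivity (log10 of relative activity)."""
    effect_a = math.log10(single_a / background)
    effect_b = math.log10(single_b / background)
    effect_ab = math.log10(double / background)
    return effect_ab - (effect_a + effect_b)


# Invented five-residue stand-ins for AncC (ancestral) and AncB (derived).
print(substitutions("MKTLG", "MRTIG"))

# Invented activities: background, two single mutants, and the double mutant.
print(epistasis(1.0, 10.0, 10.0, 100.0))   # close to zero: additive
print(epistasis(1.0, 10.0, 10.0, 1000.0))  # positive: more than additive
```

A value near zero indicates that the two substitutions act independently on the chosen scale, whereas a clearly positive or negative value suggests that further combinations need to be tested.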
7 Final Considerations
ASR is a very powerful approach for enzyme characterization. It can unveil the causal mutations of almost any feature of a protein. Also, the impact of the molecular architecture of these traits on the close and distal operating environment of the enzymes can be disclosed. Especially relevant for protein engineering, understanding protein functions in such detail directly impacts how they may be fine-tuned and applied for specific purposes. Constructing a reliable phylogeny is the core of a robust and sound ASR-based investigation, and the approach still requires a large degree of manual intervention and knowledge of phylogenetics. This knowledge is not often taught to enzymologists, but the power of the approach should make learning it worthwhile.
8 Notes
1. There is a widely acknowledged notion that ancestral enzymes are more tolerant to high temperatures and more promiscuous in their activities than modern enzymes.
As a consequence of this, ASR is often applied as a means to obtain more robust proteins, regardless of the evolutionary context. These two notions can be dissected as follows. The higher thermal stability observed in many reconstructed enzymes might be a consequence of the accumulation of different stabilizing mutations along diverse lineages. ML inference will then be biased by the dataset towards the incorporation of these mutations in the reconstructed ancestor. Other possible methodological sources of inflated thermal stability have been analyzed in detail as well [17]. Therefore, there is not enough evidence to claim that ancestrally reconstructed enzymes will always be more thermostable than their extant counterparts. Regarding specificity, it has to be analyzed in the context of gene duplication as the major source of enzyme diversity. Once a gene is duplicated, the two daughter copies will undergo different fates [79]. De novo functions can emerge, and moonlighting activities can evolve to become main functions. Thus, there are no reasons to state that ancestral enzymes were more promiscuous than modern ones. Usually, extant enzymes are not experimentally assayed for promiscuity as extinct enzymes are. This topic is clearly presented by Siddiq et al. [80]. As a corollary, further systematic research is needed to clearly establish the occurrence of these traits in reconstructed enzymes.
2. Less than 0.3% of the sequences available in UniProtKB have been manually annotated and reviewed [81]. This means that this 0.3% of sequences has gone through a process of revision by an expert curator aided by computational methods and data from the literature [82]. That leaves a vast number of available protein sequences whose function is annotated automatically. Though useful, this kind of annotation is far from accurate, especially if there is little experimental evidence for the protein family [83].
Besides, contamination in massive sequencing projects obscures the record [84]. Therefore, when building the dataset, the function of unknown sequences (those obtained by homology searches and labeled as “predicted” or “automatic annotation”) should be treated with caution until a robust phylogeny is available. The topology of the tree will help to envision the probable biochemical activity of uncharacterized sequences by their clustering with known enzymes.
3. Performing homology searches is the first task when building a dataset. When the user is not familiar with this process, the best way to approach it is by reading the help or user guide of the widely acknowledged tools for this purpose. These are basically BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi) and HMMER (http://hmmer.org/). A detailed workflow to perform homology searches in BLAST specifically has been
described by Pearson [27]. UniProt (https://www.uniprot.org/) also provides the option of running the BLAST algorithm.
4. Briefly, the likelihood is defined as the probability of observing the data, i.e., the aligned sequences, when the parameters describing the tree (branch lengths, topology, and model of evolution) are given [85]. In simplified terms:
$$L(\text{data} \mid \text{tree})$$
The ML criterion consists of choosing the tree that maximizes this likelihood. If all sites (n) are assumed to evolve independently, the likelihood of the whole dataset ($L_{tot}$) is the product of the likelihoods at the individual sites. In the same way, the log of the likelihood is the sum of the logs of the site likelihoods [86]:
$$\ln L_{tot} = \sum_{n} \ln L_n$$
5. Often users learn to run programs to build phylogenies by relying on the experience of colleagues. This is probably the best starting point. However, one might reach a stage when this knowledge is not enough and more details on running the analysis are required. The user manuals of PhyML (http://www.atgc-montpellier.fr/phyml/usersguide.php) and RAxML (https://cme.h-its.org/exelixis/resource/download/NewManual.pdf) are sources of extremely valuable practical knowledge. They are written in great detail, providing explanations, fundamentals, and advice on how to run the programs in a suitable way according to the analysis and the user's expertise. This Note can be extended to any of the other phylogeny programs mentioned in this chapter, such as MrBayes (http://nbisweden.github.io/MrBayes/manual.html) and PAML (http://abacus.gene.ucl.ac.uk/software/pamlDOC.pdf). Additionally, other resources such as blogs and specific discussion forums/groups are a source of tricks and help for inexperienced users.
6. In Bayesian inference, instead of searching for the single best tree as ML does, a larger number of high-likelihood trees are sampled to account for uncertainty. Therefore, it calculates, among the universe of all possible trees for a particular dataset and a set of priors, the (posterior) probability of a given tree [62]. This can be expressed as:
$$P(\text{tree} \mid \text{data})$$
As sequence datasets are highly informative, the choice of other priors (e.g., topologies, branch lengths) is not crucial and it is possible to use uniform or non-informative priors. The support for the branches is given by the posterior
probability (P). This concept can be stated as the conditional probability of observing the inferred tree (or a clade) given the evidence, which is the dataset.
7. Three-dimensional structures are the last sign of the common evolutionary origin of proteins [87]. Consequently, including structural information in phylogenetic inferences is a powerful resource [88, 89]. This can help to find remote homologs and outgroups, understand the mechanisms of evolution, and define constraints, among others [90, 91]. For example, structure-based phylogenies can be constructed when sequence identity is low, as in the case of remote homologs. Although this kind of tree will not be directly employed in ASR, it will help to delve into the biochemistry and mechanisms of evolution of the protein family investigated. CATH/Gene3D [92, 93] is a consistently developed database that provides evolutionary information on protein domains. Similarly, Pfam [94] and InterPro [95] are databases that integrate functional information into an evolution-based structural classification of protein families. There are many resources providing valuable information; hence it is worth exploring them at the beginning of the research.
8. As stated in Notes 3 and 5, user guides provide the fundamentals on how to run the programs and must be consulted before running them. Codeml can be executed in any environment without problems. Here are just some minimal suggestions for the first-time user:
(a) In Windows-based systems, the most efficient way to run it is at the command line, even though an interface is provided.
(b) All input files (tree, msa, dat, and ctl), the program itself, and the specified outfile have to be in the same folder.
(c) Sequence names always have to be shorter than 20 characters. They should be exactly the same in the tree and the msa files.
(d) The tree has to be saved without branch supports.
Branch lengths can be included and will be used as starting values in the calculation.
(e) If the gap distribution in the MSA is complex, the input msa file will likely have a different length than the one used to build the phylogeny. It is convenient to use the full-length MSA, with only the unique N/C-terminal extensions trimmed, for the reconstruction. The length of the reconstructed ancestors can be determined later according to the phylogeny, as described in Subheading 5.3.1.
(f) In the dat file, the substitution model is described. In the substitution model, the amino acid frequencies can be
replaced by the empirical frequencies of the investigated dataset.
(g) The ctl file is self-explanatory. The gamma distribution value can be specified or re-inferred in the reconstruction. The number of rate categories can be adjusted according to the specific dataset to improve the reconstruction. Also, it is important to indicate in this file that the full posterior probability distribution from the marginal reconstruction should be printed in the output file. Otherwise, only the ML sequences of the reconstructed ancestors will be printed.
(h) The output (rst) is a plain-text file that summarizes all the information of the reconstruction. It provides a tree (called tree with node labels for Rod Page’s) with the reconstructed ancestors numbered at the nodes, so the targeted ones can be easily spotted. The sequences of the ML ancestors are provided, as well as the list of the posterior probabilities for all states at any site in the reconstruction. This file can be processed or parsed by the user as desired.
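As a numerical illustration of Note 4, the following Python sketch shows that the log of the product of the site likelihoods equals the sum of their logs, and why the sum is the practical choice for long alignments. The site likelihood values are invented and the function name is hypothetical; phylogeny programs compute these quantities internally.

```python
import math


def log_likelihood(site_likelihoods: list[float]) -> float:
    """Total log-likelihood as the sum of the per-site log-likelihoods."""
    return sum(math.log(value) for value in site_likelihoods)


# Invented likelihoods for four independent sites.
sites = [0.012, 0.30, 0.0045, 0.08]
print(math.isclose(math.log(math.prod(sites)), log_likelihood(sites)))

# With many sites the raw product underflows to zero, while the sum of logs does not.
many_sites = [1e-5] * 100
print(math.prod(many_sites), log_likelihood(many_sites))
```

This is the reason why the programs mentioned in this chapter report log-likelihood values rather than likelihoods.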
Acknowledgments
I thank Georg K. A. Hochberg for the enlightening discussion and for his suggestions about the chapter structure and contents. I also thank Callum R. Nicoll and Martín A. Palazzolo for their careful critical reading of the manuscript, and Marco W. Fraaije and Maximiliano Juri Ayub for their comments. This work was supported by the European Union's Horizon 2020 research and innovation program under grant agreement No 847675 and the ANPCyT (Argentina) PICT 2016-2839 to MLM. MLM is a member of the Researcher Career of CONICET, Argentina.
References
1. Fersht A (1999) Structure and mechanism in protein science: a guide to enzyme catalysis and protein folding. Macmillan, Basingstoke
2. Gerlt JA, Babbitt PC (2009) Enzyme (re)design: lessons from natural evolution and computation. Curr Opin Chem Biol 13(1):10–18. https://doi.org/10.1016/j.cbpa.2009.01.014
3. Harms MJ, Thornton JW (2013) Evolutionary biochemistry: revealing the historical and physical causes of protein properties. Nat Rev Genet 14(8):559–571. https://doi.org/10.1038/nrg3540
4. Yang G, Miton CM, Tokuriki N (2020) A mechanistic view of enzyme evolution. Protein Sci 29(8):1724–1747. https://doi.org/10.1002/pro.3901
5. Kaltenbach M, Tokuriki N (2014) Dynamics and constraints of enzyme evolution. J Exp Zool B Mol Dev Evol 322(7):468–487. https://doi.org/10.1002/jez.b.22562
6. Floudas D, Binder M, Riley R, Barry K, Blanchette RA, Henrissat B, Martínez AT, Otillar R, Spatafora JW, Yadav JS, Aerts A, Benoit I, Boyd A, Carlson A, Copeland A, Coutinho PM, de Vries RP, Ferreira P, Findley K, Foster B, Gaskell J, Glotzer D, Górecki P, Heitman J, Hesse C, Hori C, Igarashi K, Jurgens JA, Kallen N, Kersten P, Kohler A, Kües U, Kumar TKA, Kuo A, LaButti K, Larrondo LF, Lindquist E, Ling A, Lombard V, Lucas S, Lundell T, Martin R, McLaughlin DJ, Morgenstern I, Morin E, Murat C, Nagy LG, Nolan M, Ohm RA, Patyshakuliyeva A, Rokas A, Ruiz-Dueñas FJ, Sabat G, Salamov A, Samejima M, Schmutz J, Slot JC, St. John F, Stenlid J, Sun H, Sun S, Syed K, Tsang A, Wiebenga A, Young D, Pisabarro A, Eastwood DC, Martin F, Cullen D, Grigoriev IV, Hibbett DS (2012) The Paleozoic origin of enzymatic lignin decomposition reconstructed from 31 fungal genomes. Science 336(6089):1715. https://doi.org/10.1126/science.1221748
7. Pauling L Chemical paleogenetics. Acta Chem Scand 17:S9–S16
8. Gaucher EA (2007) Ancestral sequence reconstruction as a tool to understand natural history and guide synthetic biology: realizing and extending the vision of Zuckerkandl and Pauling. Liberles 83:20–33
9. Hochberg GKA, Thornton JW (2017) Reconstructing ancient proteins to understand the causes of structure and function. Annu Rev Biophys 46(1):247–269. https://doi.org/10.1146/annurev-biophys-070816-033631
10. Garcia AK, Kaçar B (2019) How to resurrect ancestral proteins as proxies for ancient biogeochemistry. Free Radic Biol Med 140:260–269. https://doi.org/10.1016/j.freeradbiomed.2019.03.033
11. Voordeckers K, Brown CA, Vanneste K, van der Zande E, Voet A, Maere S, Verstrepen KJ (2012) Reconstruction of ancestral metabolic enzymes reveals molecular mechanisms underlying evolutionary innovation through gene duplication. PLoS Biol 10(12):e1001446. https://doi.org/10.1371/journal.pbio.1001446
12. Clifton BE, Kaczmarski JA, Carr PD, Gerth ML, Tokuriki N, Jackson CJ (2018) Evolution of cyclohexadienyl dehydratase from an ancestral solute-binding protein. Nat Chem Biol 14(6):542–547. https://doi.org/10.1038/s41589-018-0043-2
13. Nicoll CR, Bailleul G, Fiorentini F, Mascotti ML, Fraaije MW, Mattevi A (2020) Ancestral-sequence reconstruction unveils the structural basis of function in mammalian FMOs. Nat Struct Mol Biol 27(1):14–24. https://doi.org/10.1038/s41594-019-0347-2
14. Cortés Cabrera Á, Sánchez-Murcia PA, Gago F (2017) Making sense of the past: hyperstability of ancestral thioredoxins explained by free energy simulations. Phys Chem Chem Phys 19(34):23239–23246. https://doi.org/10.1039/C7CP03659K
15. Risso VA, Gavira JA, Mejia-Carmona DF, Gaucher EA, Sanchez-Ruiz JM (2013) Hyperstability and substrate promiscuity in laboratory resurrections of Precambrian β-lactamases. J Am Chem Soc 135(8):2899–2902. https://doi.org/10.1021/ja311630a
16. Semba Y, Ishida M, S-i Y, Yamagishi A (2015) Ancestral amino acid substitution improves the thermal stability of recombinant lignin-peroxidase from white-rot fungi, Phanerochaete chrysosporium strain UAMH 3641. Protein Eng Des Sel 28(7):221–230. https://doi.org/10.1093/protein/gzv023
17. Wheeler LC, Lim SA, Marqusee S, Harms MJ (2016) The thermostability and specificity of ancient proteins. Curr Opin Struct Biol 38:37–43. https://doi.org/10.1016/j.sbi.2016.05.015
18. Thornton JW (2004) Resurrecting ancient genes: experimental analysis of extinct molecules. Nat Rev Genet 5(5):366–375. https://doi.org/10.1038/nrg1324
19. Hanson-Smith V, Johnson A (2016) PhyloBot: a web portal for automated Phylogenetics, ancestral sequence reconstruction, and exploration of mutational trajectories. PLoS Comput Biol 12(7):e1004976. https://doi.org/10.1371/journal.pcbi.1004976
20. Harms MJ, Thornton JW (2010) Analyzing protein structure and function using ancestral gene reconstruction. Curr Opin Struct Biol 20(3):360–366. https://doi.org/10.1016/j.sbi.2010.03.005
21. Kaltenbach M, Burke JR, Dindo M, Pabis A, Munsberg FS, Rabin A, Kamerlin SCL, Noel JP, Tawfik DS (2018) Evolution of chalcone isomerase from a noncatalytic ancestor. Nat Chem Biol 14(6):548–555. https://doi.org/10.1038/s41589-018-0042-3
22. Busch F, Rajendran C, Heyn K, Schlee S, Merkl R, Sterner R (2016) Ancestral tryptophan synthase reveals functional sophistication of primordial enzyme complexes. Cell Chem Biol 23(6):709–715. https://doi.org/10.1016/j.chembiol.2016.05.009
23. Pillai AS, Chandler SA, Liu Y, Signore AV, Cortez-Romero CR, Benesch JLP, Laganowsky A, Storz JF, Hochberg GKA, Thornton JW (2020) Origin of complexity in haemoglobin evolution. Nature 581(7809):480–485. https://doi.org/10.1038/s41586-020-2292-y
24. Lim SA, Bolin ER, Marqusee S (2018) Tracing a protein's folding pathway over evolutionary time using ancestral sequence reconstruction and hydrogen exchange. eLife 7:e38369. https://doi.org/10.7554/eLife.38369
25. Finnigan GC, Hanson-Smith V, Stevens TH, Thornton JW (2012) Evolution of increased complexity in a molecular machine. Nature 481(7381):360–364. https://doi.org/10.1038/nature10724
26. Mitchell JBO (2017) Enzyme function and its evolution. Curr Opin Struct Biol 47:151–156. https://doi.org/10.1016/j.sbi.2017.10.004
27. Pearson WR (2014) BLAST and FASTA similarity searching for multiple sequence alignment. Methods Mol Biol 1079:75–101. https://doi.org/10.1007/978-1-62703-646-7_5
28. Fitch WM (1970) Distinguishing homologous from analogous proteins. Syst Zool 19(2):99–113
29. Koonin EV (2005) Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet 39(1):309–338. https://doi.org/10.1146/annurev.genet.39.073003.114725
30. Heath TA, Hedtke SM, Hillis DM (2008) Taxon sampling and the accuracy of phylogenetic analyses. J Syst Evol 46(3):239–257
31. Hillis DM (1996) Inferring complex phylogenies. Nature 383(6596):130–131. https://doi.org/10.1038/383130a0
32. Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, Timetrees, and divergence times. Mol Biol Evol 34(7):1812–1819. https://doi.org/10.1093/molbev/msx116
33. Schoch C (2011) NCBI Taxonomy. National Center for Biotechnology Information (US). https://www.ncbi.nlm.nih.gov/books/NBK53758/
34. Carrillo H, Lipman D (1988) The multiple sequence alignment problem in biology. SIAM J Appl Math 48(5):1073–1082. https://doi.org/10.1137/0148063
35. Vialle RA, Tamuri AU, Goldman N (2018) Alignment modulates ancestral sequence reconstruction accuracy. Mol Biol Evol 35(7):1783–1797. https://doi.org/10.1093/molbev/msy055
36. Katoh K, Rozewicki J, Yamada KD (2017) MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Brief Bioinform 20(4):1160–1166. https://doi.org/10.1093/bib/bbx108
37. Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4(4):406–425. https://doi.org/10.1093/oxfordjournals.molbev.a040454
38. Talavera G, Castresana J (2007) Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 56(4):564–577. https://doi.org/10.1080/10635150701472164
39. Tan G, Muffato M, Ledergerber C, Herrero J, Goldman N, Gil M, Dessimoz C (2015) Current methods for automated filtering of multiple sequence alignments frequently worsen single-gene phylogenetic inference. Syst Biol 64(5):778–791. https://doi.org/10.1093/sysbio/syv033
40. Thorne JL, Goldman N (2004) Probabilistic models for the study of protein evolution. In: Handbook of statistical genetics. Wiley, Hoboken, New Jersey. https://doi.org/10.1002/0470022620.bbc05
41. Echave J, Wilke CO (2017) Biophysical models of protein evolution: understanding the patterns of evolutionary sequence divergence. Annu Rev Biophys 46(1):85–103. https://doi.org/10.1146/annurev-biophys-070816-033819
42. Yang Z (1996) Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol Evol 11(9):367–372. https://doi.org/10.1016/0169-5347(96)10041-0
43. Darriba D, Taboada GL, Doallo R, Posada D (2011) ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27(8):1164–1165. https://doi.org/10.1093/bioinformatics/btr088
44. Lefort V, Longueville J-E, Gascuel O (2017) SMS: Smart model selection in PhyML. Mol Biol Evol 34(9):2422–2424. https://doi.org/10.1093/molbev/msx149
45. Stamatakis A (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30(9):1312–1313. https://doi.org/10.1093/bioinformatics/btu033
46. Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59(3):307–321. https://doi.org/10.1093/sysbio/syq010
47. Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39(4):783–791
48. Efron B, Halloran E, Holmes S (1996) Bootstrap confidence levels for phylogenetic trees. Proc Natl Acad Sci U S A 93(23):13429–13429. https://doi.org/10.1073/pnas.93.23.13429
49. Lemoine F, Domelevo Entfellner JB, Wilkinson E, Correia D, Dávila Felipe M, De Oliveira T, Gascuel O (2018) Renewing Felsenstein's phylogenetic bootstrap in the era of big data. Nature 556(7702):452–456. https://doi.org/10.1038/s41586-018-0043-0
50. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP (2012) MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol 61(3):539–542. https://doi.org/10.1093/sysbio/sys029
51. Nascimento FF, Md R, Yang Z (2017) A biologist's guide to Bayesian phylogenetic analysis. Nat Ecol Evolution 1(10):1446–1454. https://doi.org/10.1038/s41559-017-0280-x
52. Felsenstein J (2004) Inferring phylogenies, vol 2. Sinauer Associates, Sunderland, MA
53. Nixon KC, Carpenter JM (1993) On outgroups. Cladistics 9(4):413–426. https://doi.org/10.1111/j.1096-0031.1993.tb00234.x
54. Mobbs JI, Di Paolo A, Metcalfe RD, Selig E, Stapleton DI, Griffin MDW, Gooley PR (2018) Unravelling the carbohydrate-binding preferences of the carbohydrate-binding modules of AMP-activated protein kinase. Chembiochem 19(3):229–238. https://doi.org/10.1002/cbic.201700589
55. Jones BJ, Bata Z, Kazlauskas RJ (2017) Identical active sites in Hydroxynitrile Lyases show opposite enantioselectivity and reveal possible ancestral mechanism. ACS Catal 7(6):4221–4229. https://doi.org/10.1021/acscatal.7b01108
56. Hashimoto T, Hasegawa M (1996) Origin and early evolution of eukaryotes inferred from the amino acid sequences of translation elongation factors 1α/Tu and 2/G. Adv Biophys 32:73–120. https://doi.org/10.1016/0065-227X(96)84742-3
57. Mathews S, Clements MD, Beilstein MA (2010) A duplicate gene rooting of seed plants and the phylogenetic position of flowering plants. Philos Trans R Soc Lond Ser B Biol Sci 365(1539):383–395. https://doi.org/10.1098/rstb.2009.0233
58. Kapli P, Yang Z, Telford MJ (2020) Phylogenetic tree building in the genomic age. Nat Rev Genet 21(7):428–444. https://doi.org/10.1038/s41576-020-0233-0
59. Bridgham JT, Keay J, Ortlund EA, Thornton JW (2014) Vestigialization of an allosteric switch: genetic and structural mechanisms for the evolution of constitutive activity in a steroid hormone receptor. PLoS Genet 10(1):e1004058. https://doi.org/10.1371/journal.pgen.1004058
60. Wheeler LC, Anderson JA, Morrison AJ, Wong CE, Harms MJ (2018) Conservation of specificity in two low-specificity proteins. Biochemistry 57(5):684–695. https://doi.org/10.1021/acs.biochem.7b01086
61. Yang Z, Kumar S, Nei M (1995) A new method of inference of ancestral nucleotide and amino acid sequences. Genetics 141(4):1641–1650
62. Yang Z (2014) Molecular evolution: a statistical approach. Oxford University Press, Oxford
63. Hanson-Smith V, Kolaczkowski B, Thornton JW (2010) Robustness of ancestral sequence reconstruction to phylogenetic uncertainty. Mol Biol Evol 27(9):1988–1999. https://doi.org/10.1093/molbev/msq081
64. Eick GN, Bridgham JT, Anderson DP, Harms MJ, Thornton JW (2016) Robustness of reconstructed ancestral protein functions to statistical uncertainty. Mol Biol Evol 34(2):247–261. https://doi.org/10.1093/molbev/msw223
65. Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24(8):1586–1591. https://doi.org/10.1093/molbev/msm088
66. Perez-Jimenez R, Inglés-Prieto A, Zhao Z-M, Sanchez-Romero I, Alegre-Cebollada J, Kosuri P, Garcia-Manyes S, Kappock TJ, Tanokura M, Holmgren A, Sanchez-Ruiz JM, Gaucher EA, Fernandez JM (2011) Single-molecule paleoenzymology probes the chemistry of resurrected enzymes. Nat Struct Mol Biol 18(5):592–596. https://doi.org/10.1038/nsmb.2020
67. Randall RN, Radford CE, Roof KA, Natarajan DK, Gaucher EA (2016) An experimental phylogeny to benchmark ancestral sequence reconstruction. Nat Commun 7(1):12847. https://doi.org/10.1038/ncomms12847
68. Joy JB, Liang RH, McCloskey RM, Nguyen T, Poon AFY (2016) Ancestral reconstruction. PLOS Comput Biol 12(7):e1004763. https://doi.org/10.1371/journal.pcbi.1004763
69. Pupko T, Doron-Faigenboim A, Liberles DA, Cannarozzi GM (2007) Probabilistic models and their impact on the accuracy of reconstructed ancestral protein sequences. In: Ancestral Sequence Reconstruction. Oxford Scholarship Online. https://doi.org/10.1093/acprof:oso/9780199299188.003.0004
70. Aadland K, Kolaczkowski B (2020) Alignment-integrated reconstruction of ancestral sequences improves accuracy. Genome Biol Evol 12:1549–1565. https://doi.org/10.1093/gbe/evaa164
71. Cannarozzi GM, Schneider A, Gonnet GH (2007) Probabilistic ancestral sequences based on the Markovian model of evolution–algorithms and applications. Ancestral Sequence Reconstruction 1(1):58
72. Gumulya Y, Baek J-M, Wun S-J, Thomson RES, Harris KL, Hunter DJB, Behrendorff JBYH, Kulig J, Zheng S, Wu X, Wu B, Stok JE, De Voss JJ, Schenk G, Jurva U, Andersson S, Isin EM, Bodén M, Guddat L, Gillam EMJ (2018) Engineering highly functional thermostable proteins using ancestral sequence reconstruction. Nat Catal 1(11):878–888. https://doi.org/10.1038/s41929-018-0159-5
73. Savory FR, Milner DS, Miles DC, Richards TA (2018) Ancestral function and diversification of a horizontally acquired oomycete carboxylic acid transporter. Mol Biol Evol 35(8):1887–1900. https://doi.org/10.1093/molbev/msy082
74. Ugalde JA, Chang BSW, Matz MV (2004) Evolution of coral pigments recreated. Science 305(5689):1433. https://doi.org/10.1126/science.1099597
75. Siddiq MA, Loehlin DW, Montooth KL, Thornton JW (2017) Experimental test and refutation of a classic case of molecular adaptation in Drosophila melanogaster. Nat Ecol Evolution 1(2):0025. https://doi.org/10.1038/s41559-016-0025
76. Gaucher EA (2007) Experimental resurrection of ancient biomolecules: gene synthesis, heterologous protein expression, and functional assays. In: Ancestral Sequence Reconstruction. Oxford Scholarship Online. https://doi.org/10.1093/acprof:oso/9780199299188.003.0014
77. Starr TN, Thornton JW (2016) Epistasis in protein evolution. Protein Sci 25(7):1204–1218. https://doi.org/10.1002/pro.2897
78. Starr TN, Picton LK, Thornton JW (2017) Alternative evolutionary histories in the sequence space of an ancient protein. Nature 549(7672):409–413. https://doi.org/10.1038/nature23902
79. Conant GC, Wolfe KH (2008) Turning a hobby into a job: how duplicated genes find new functions. Nat Rev Genet 9(12):938–950. https://doi.org/10.1038/nrg2482
80. Siddiq MA, Hochberg GK, Thornton JW (2017) Evolution of protein specificity: insights from ancestral protein reconstruction. Curr Opin Struct Biol 47:113–122. https://doi.org/10.1016/j.sbi.2017.07.003
81. Consortium TU (2020) UniProt. ELIXIR. https://ebi12.uniprot.org/. Accessed 01 09 2020
82. Consortium TU (2018) UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res 47(D1):D506–D515. https://doi.org/10.1093/nar/gky1049
83. Loewenstein Y, Raimondo D, Redfern OC, Watson J, Frishman D, Linial M, Orengo C, Thornton J, Tramontano A (2009) Protein function annotation by homology-based inference. Genome Biol 10(2):207. https://doi.org/10.1186/gb-2009-10-2-207
84. Pible O, Hartmann EM, Imbert G, Armengaud J (2014) The importance of recognizing and reporting sequence database contamination for proteomics. EuPA Open Proteom 3:246–249. https://doi.org/10.1016/j.euprot.2014.04.001
85. Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17(6):368–376. https://doi.org/10.1007/BF01734359
86. Kishino H, Hasegawa M (1989) Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in hominoidea. J Mol Evol 29(2):170–179. https://doi.org/10.1007/BF02100115
87. Orengo CA, Thornton JM (2005) Protein families and their evolution–a structural perspective. Annu Rev Biochem 74(1):867–900. https://doi.org/10.1146/annurev.biochem.74.082803.133029
88. Tóth-Petróczy Á, Tawfik DS (2014) The robustness and innovability of protein folds. Curr Opin Struct Biol 26:131–138. https://doi.org/10.1016/j.sbi.2014.06.007
89. Das S, Dawson NL, Orengo CA (2015) Diversity in protein domain superfamilies. Curr Opin Genet Dev 35:40–49. https://doi.org/10.1016/j.gde.2015.09.005
90. Mascotti ML, Juri Ayub M, Furnham N, Thornton JM, Laskowski RA (2016) Chopping and changing: the evolution of the Flavin-dependent monooxygenases. J Mol Biol 428(15):3131–3146. https://doi.org/10.1016/j.jmb.2016.07.003
91. Prakash A, Bateman A (2015) Domain atrophy creates rare cases of functional partial protein domains. Genome Biol 16(1):88–88. https://doi.org/10.1186/s13059-015-0655-8
92. Sillitoe I, Dawson N, Lewis TE, Das S, Lees JG, Ashford P, Tolulope A, Scholes HM, Senatorov I, Bujan A, Ceballos Rodriguez-Conde F, Dowling B, Thornton J, Orengo CA (2018) CATH: expanding the horizons of structure-based functional annotations for genome sequences. Nucleic Acids Res 47(D1):D280–D284. https://doi.org/10.1093/nar/gky1097
93. Lewis TE, Sillitoe I, Dawson N, Lam SD, Clarke T, Lee D, Orengo C, Lees J (2017) Gene3D: extensive prediction of globular domains in proteins. Nucleic Acids Res 46(D1):D435–D439. https://doi.org/10.1093/nar/gkx1069
94. El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ, Salazar GA, Smart A, Sonnhammer EL, Hirsh L, Paladin L, Piovesan D, Tosatto SCE, Finn RD (2018) The Pfam protein families database in 2019. Nucleic Acids Res 47(D1):D427–D432. https://doi.org/10.1093/nar/gky995
95. Mitchell AL, Attwood TK, Babbitt PC, Blum M, Bork P, Bridge A, Brown SD, Chang H-Y, El-Gebali S, Fraser MI, Gough J, Haft DR, Huang H, Letunic I, Lopez R, Luciani A, Madeira F, Marchler-Bauer A, Mi H, Natale DA, Necci M, Nuka G, Orengo C, Pandurangan AP, Paysan-Lafosse T, Pesseat S, Potter SC, Qureshi MA, Rawlings ND, Redaschi N, Richardson LJ, Rivoire C, Salazar GA, Sangrador-Vegas A, Sigrist CJA, Sillitoe I, Sutton GG, Thanki N, Thomas PD, Tosatto SCE, Yong S-Y, Finn RD (2018) InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res 47(D1):D351–D360. https://doi.org/10.1093/nar/gky1100
Chapter 8
Expression and In Vivo Loading of De Novo Proteins with Tetrapyrrole Cofactors
Paul Curnow and J. L. Ross Anderson
Abstract
Tetrapyrrole cofactors such as heme and chlorophyll imprint their intrinsic reactivity and properties on a multitude of natural proteins and enzymes, and there is much interest in exploiting their functional and catalytic capabilities within minimal, de novo designed protein scaffolds. Here we describe how, using only natural biosynthetic and post-translational modification pathways, de novo designed soluble and hydrophobic proteins can be equipped with tetrapyrrole cofactors within living Escherichia coli cells. We provide strategies to achieve covalent and non-covalent heme incorporation within the de novo proteins and describe how the heme biosynthetic pathway can be co-opted to produce the light-sensitive zinc protoporphyrin IX for loading into proteins in vivo. In addition, we describe the imaging of hydrophobic proteins and cofactor-rich protein droplets by electron and fluorescence microscopy, and how cofactors can be stripped from the de novo proteins to aid in vitro identification.
Key words De novo proteins, Tetrapyrroles, In vivo cofactor loading, Fluorescence imaging, Electron microscopy, Heme, Protein expression and purification
1 Introduction
Bottom-up, or de novo, protein design aims to provide minimal scaffolds that enable a deeper understanding of structure:function relationships while providing useful tools for bio- and nanotechnological applications [1]. It can also provide biocompatible yet artificial components with the potential to alter cellular function by imprinting their functional characteristics on the structure or metabolism of the host cell. To achieve this with cofactor-dependent de novo proteins and enzymes, it is necessary to express and assemble the holoproteins in vivo [2]. This remains a significant challenge in de novo design, and there are few examples of functional de novo proteins reliant on exogenous cofactors that are fully assembled and active in vivo.
Francesca Magnani, Chiara Marabelli and Francesca Paradisi (eds.), Enzyme Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 2397, https://doi.org/10.1007/978-1-0716-1826-4_8, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022
Most successful examples of cofactor-dependent de novo designed proteins assembled in vivo bind the heme cofactor, a versatile biomolecule that can impart functions such as oxygen binding and sensing, as well as a diverse array of chemistries on both natural and de novo designed protein scaffolds [3–5]. In vivo heme loading of de novo proteins can be achieved using two strategies, resulting in either covalent or non-covalent incorporation of the cofactor into the protein. We have demonstrated that the post-translational machinery of Escherichia (E.) coli can be hijacked to efficiently covalently incorporate c-type heme into de novo tetrahelical bundles, requiring only a consensus sequence for heme incorporation in the peptide sequence (CXXCH), periplasmic expression and overexpression of the E. coli cytochrome c maturation apparatus (Fig. 1) [6–8]. By virtue of the covalent thioether bonds between heme and protein, this confers effectively infinite affinity on the protein for heme. Alternatively, non-covalent b-type heme can be incorporated in vivo, so long as the de novo proteins have appropriately high affinity binding sites for heme, preferably on the nanomolar scale. For the latter approach, it is often necessary to upregulate heme biosynthesis through addition of a key heme precursor, δ-aminolevulinic acid [9]. Another strategy that supports cofactor loading in the cell is the production of intracellular protein droplets. An emerging interest lies in the specific design of recombinant protein sequences that can phase-separate within the cell and recruit cargo molecules to form proto-organelles [10–13]. In our own case, we have observed [14] that low-complexity de novo proteins can form small inclusions— or “droplets”—when expressed in E. coli (Fig. 2a). One particular feature of these protein condensates is that they are rather hydrophobic, and so have the potential to recruit hydrophobic small molecules. 
We found that cells producing such droplets accumulated the cofactor zinc protoporphyrin IX (ZnPPIX), which is rare in biology (Fig. 2b–e). Although this system remains to be fully characterized, our work suggests the potential for engineering tunable intracellular condensates with the capacity to recruit functional hydrophobic cofactors.
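As an illustration of the CXXCH consensus mentioned above, the following minimal sketch scans a protein sequence for the motif. The function name is a hypothetical helper, and the example string is the helix 4 fragment as printed in Fig. 1a (the dots in the figure are treated here as plain separators, which is an assumption).

```python
import re

# C, any two residues, C, H; the lookahead also reports overlapping matches.
CXXCH = re.compile(r"(?=(C[A-Z]{2}CH))")


def find_cxxch(sequence):
    """Return (1-based position, matched motif) for each CXXCH site."""
    cleaned = re.sub(r"[^A-Za-z]", "", sequence).upper()
    return [(m.start() + 1, m.group(1)) for m in CXXCH.finditer(cleaned)]


if __name__ == "__main__":
    helix4 = "ECIAC.HEDALQK.FEEALNQ.FEDLKQL"  # fragment as printed in Fig. 1a
    print(find_cxxch(helix4))
```

A sequence lacking the motif returns an empty list, which is a quick way to confirm that a construct intended for non-covalent loading carries no accidental c-type attachment site.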
2 Materials
All solutions should be prepared in ultrapure, deionized water (resistivity = 18.2 MΩ cm at 25 °C). All reagents should be analytical grade. Unless specified, all reagents and solutions should be prepared and stored at room temperature. It is important to follow the appropriate procedures when disposing of waste solutions, reagents, and biological materials. Do not add sodium azide to the buffers or reagents, as the concentrations employed for inhibiting microbial growth can strip non-covalently bound heme from the de novo proteins or bind tightly to a vacant coordination site on c-type heme.

[Figure 1 image not reproduced. Recoverable panel content: (a) sequences of C4, C46, and C45 as printed: C4, GMTPE QIWKQ.HEDALQK.FEEALNQ.FEDLKQL GGSGSGSGG EIWKQ.HEDALQK.FEEALNQ.FEDLKQL GGSGGSGGSGG EIWKQ.HEDALQK.FEEALNQ.FEDLKQL GGSGSGSGG ECIAC.HEDALQK.FEEALNQ.FEDLKQL; C46 (2 × His to Phe, removing the non-covalent tetrapyrrole-binding site), GMTPE QIWKQ.FEDALQK.FEEALNQ.FEDLKQL GGSGSGSGG EIWKQ.HEDALQK.FEEALNQ.FEDLKQL GGSGGSGGSGG EIWKQ.FEDALQK.FEEALNQ.FEDLKQL GGSGSGSGG ECIAC.HEDALQK.FEEALNQ.FEDLKQL; C45 (1 × His to Phe, removing the distal heme-iron coordinating ligand), GMTPE QIWKQ.FEDALQK.FEEALNQ.FEDLKQL GGSGSGSGG EIWKQ.FEDALQK.FEEALNQ.FEDLKQL GGSGGSGGSGG EIWKQ.FEDALQK.FEEALNQ.FEDLKQL GGSGSGSGG ECIAC.HEDALQK.FEEALNQ.FEDLKQL. (b) Four-helix schematics of C4, C46, and C45. (c) SDS-PAGE with markers from 10 to 100 kDa. (d) MRE × 10^-3 (deg cm^2 dmol^-1 res^-1) versus wavelength (nm) at 5, 25, 45, 65, and 85 °C. (e, f) Absorbance versus wavelength (nm) for ferric and ferrous heme iron, with Soret and Q bands labeled.]

Fig. 1 Design, purification, and characterization of a de novo c-type cytochrome. (a) Representative design process demonstrating the creation of a 5-coordinate monohistidine ligated c-type heme within a de novo protein, C45. C4, C46, and C45 are all expressed and fully assembled with heme in the E. coli periplasm. (b) Schematic of C4, C46, and C45 demonstrating covalent attachment of hemes to helix 4 and removal of a second tetrapyrrole-binding site. (c) SDS-PAGE gel showing the purity of C45 following affinity chromatography, purification tag cleavage and size exclusion chromatography. Lane 1 = crude lysate; lanes 2–14 = fractions from size exclusion chromatography. (d) Circular dichroism spectra of C45 at increasing temperature. These data indicate the thermostability of C45, with a transition melting temperature (Tm) in excess of 80 °C. (e) UV/visible spectra of the 5-coordinate monohistidine C45 and the 6-coordinate bis-histidine C46 both in the ferric forms. (f) UV/visible spectra of the 5-coordinate monohistidine C45 and the 6-coordinate bis-histidine C46 both in the ferrous forms. The ferrous spectrum of C45 bound to imidazole is also provided, highlighting the vacant heme coordination site. (All panels are adapted from the authors' own work originally published under CC-BY license [8])

2.1 Design and Construction of Synthetic Genes
1. Synthetic gene constructs.
2. Vectors suitable for periplasmic expression in E. coli, e.g., pMalp4x.
3. pEC86 vector encoding the entire cytochrome c maturation apparatus.
4. Expression vectors for cytosolic expression, e.g., pET28, pET45, pET151-TOPO.
5. Miniprep kit for plasmid purification, e.g., QIAGEN miniprep kit (27104).
6. T7 Express (High Efficiency) E. coli competent cells (NEB, C2566H), or similar BL21(DE3) competent cells.
7. Calcium-competent E. coli BL21-AI (Invitrogen, C607003).

[Figure 2 image not reproduced. Recoverable panel content: (a) micrographs of cells expressing REAMP2.0-GFP and REAMP2.0; (b) absorbance versus wavelength (nm) for REAMP2.0H, REAMP, empty vector, -IPTG, and -ALA samples, with peaks labeled at 422, 550, and 587 nm; (c) absorbance versus wavelength (nm) for REAMP2.0H cell extract and zinc PPIX; (d) fluorescence excitation and emission versus wavelength (nm) for REAMP2.0H cell extract and zinc PPIX, with emission peaks labeled at 590 and 642 nm; (e) ZnPPIX fluorescence emission and normalized REAMP2.0H expression versus IPTG (µM), n = 3.]

Fig. 2 E. coli cells can form intracellular droplets from hydrophobic de novo proteins and accumulate ZnPPIX. (a) Cell imaging by confocal fluorescence microscopy (top) and electron microscopy (bottom) show the formation of multiple small intracellular inclusions, or “droplets,” by a hydrophobic de novo protein called REAMP2.0 [14]. (b) Organic extracts from cells expressing REAMP2.0 exhibit an unusual absorption profile consistent with the accumulation of a novel pigment. This is absent from control strains. (c) Absorption spectroscopy and (d) fluorescence spectroscopy against commercial standards reveal that the pigment is ZnPPIX. This was confirmed by LC-MS (not shown). (e) The accumulation of ZnPPIX, determined by fluorescence, correlates with the expression levels of REAMP2.0, determined by Western blot. Panels (b–e) make use of a single histidine mutant of REAMP2.0, designated with superscript “H.” However, similar results were obtained with a non-histidine variant, suggesting a general mechanism of association with the condensed protein droplet rather than any specific coordination by His. (All panels are adapted from the authors' own work originally published under CC-BY license [14])

2.2 Soluble De Novo Protein Expression and Purification
1. Carbenicillin (50 mg/mL, 1000×): Dissolve 2.5 g of carbenicillin in 50 mL water and filter sterilize through 0.2 μm syringe filters. Divide into 1 mL aliquots in sterile microcentrifuge tubes. Store at −20 °C.
2. Chloramphenicol (34 mg/mL, 1000×): Dissolve 1.7 g of chloramphenicol in 50 mL ethanol and divide into 1 mL aliquots in sterile microcentrifuge tubes. Store at −20 °C.
3. LB broth: 25 g LB powder dissolved in 1 L water.
4. Isopropyl-β-D-thiogalactoside (1 M IPTG): Add 2.38 g of IPTG to 10 mL of water and filter sterilize through 0.2 μm syringe filters. Divide into 1 mL aliquots in sterile microcentrifuge tubes. Store at −20 °C.
5. δ-Aminolevulinic acid (0.3 M ALA, 1000×): Add 0.39 g ALA to 10 mL of water and filter sterilize through 0.2 μm syringe filters. Divide into 1 mL aliquots in sterile microcentrifuge tubes. Store at −20 °C.
6. Phenylmethanesulfonyl fluoride (500 mM PMSF, 500×): Add 0.87 g PMSF to 10 mL of ethanol. Divide into 1 mL aliquots in sterile microcentrifuge tubes. Store at −20 °C.
7. Lysis buffer: 50 mM NaH2PO4, 300 mM NaCl, 10 mM imidazole, pH 8.0. To make 1 L, add 5.99 g of NaH2PO4, 17.53 g of NaCl, and 0.68 g of imidazole to 900 mL of water, and adjust the pH to 8 using 10 M NaOH. Add water to 1 L.
8. Elution buffer: 50 mM NaH2PO4, 300 mM NaCl, 250 mM imidazole, pH 8.0. To make 1 L, add 5.99 g of NaH2PO4, 17.53 g of NaCl, and 17.02 g of imidazole to 900 mL of water, and adjust the pH to 8 using NaOH. Add water to 1 L.
9. TEV cleavage buffer: 20 mM Tris–HCl, 0.5 mM EDTA, pH 8.0. To make 1 L, add 3.15 g of Tris–HCl and 0.15 g of EDTA to 900 mL of water, and adjust the pH to 8 using NaOH. Add water to 1 L.
10. Redox buffer: 20 mM CHES, 100 mM KCl, pH 8.6. To make 1 L, add 4.15 g of CHES and 7.46 g of KCl to 900 mL of water, and adjust the pH to 8.6 using NaOH. Add water to 1 L.
11. 7000 MWCO SnakeSkin dialysis tubing (ThermoFisher, 68700).
12. 5 mL HisTrap HP nickel affinity column (Cytiva, 17524801).
13. 10k MWCO centrifugal concentrators (Cytiva, 28-9323-60).
14. HiLoad 16/600 Superdex 75 pg size exclusion column (Cytiva, 28-9893-33).

2.3 Pyridine Hemochrome Assay
1. 40% Pyridine in 0.1 M NaOH: Add 4 mL of pyridine to 6 mL of 0.1 M NaOH. 2. Solid sodium dithionite.
2.4 Extraction of Non-covalently Bound Tetrapyrroles with 2-Butanone
1. 2-Butanone. 2. Hydrochloric acid: 0.1 M solution of HCl in water. 3. Sodium bicarbonate: To make 4 L of a 10 mM solution of NaHCO3, add 3.36 g of NaHCO3 to 4 L water.
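The masses quoted in the stock and buffer recipes above follow from mass = molarity × volume × molecular weight. The short sketch below recomputes a few of them as a cross-check. The molecular weights are assumptions for the anhydrous or free forms of each reagent and should be replaced with the value printed on the reagent bottle (hydrates and salts differ).

```python
def grams_needed(molarity_mol_per_l: float, volume_l: float, mw_g_per_mol: float) -> float:
    """Mass of solute (g) required for a solution of the given molarity and volume."""
    return molarity_mol_per_l * volume_l * mw_g_per_mol


# Assumed molecular weights (g/mol); check against the supplier's label.
MW = {"IPTG": 238.30, "NaCl": 58.44, "imidazole": 68.08, "NaHCO3": 84.01}

iptg_g = grams_needed(1.0, 0.010, MW["IPTG"])            # 1 M IPTG, 10 mL
nacl_g = grams_needed(0.300, 1.0, MW["NaCl"])            # 300 mM NaCl, 1 L
imidazole_g = grams_needed(0.010, 1.0, MW["imidazole"])  # 10 mM imidazole, 1 L
bicarbonate_g = grams_needed(0.010, 4.0, MW["NaHCO3"])   # 10 mM NaHCO3, 4 L

for name, grams in [("IPTG", iptg_g), ("NaCl", nacl_g),
                    ("imidazole", imidazole_g), ("NaHCO3", bicarbonate_g)]:
    print(f"{name}: {grams:.2f} g")
```

The same function can be used to rescale any recipe to a different batch volume.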
2.5 Hydrophobic De Novo Protein Cell Culture and Expression
1. LB broth: 25 g LB powder dissolved in 1 L water.
2. 10% arabinose: 5 g arabinose in 50 mL water (see Note 1).
2.6 Cell Fractionation
1. Sodium phosphate, pH 7.4: To prepare 0.5 M stocks of the two components, separately dissolve 34.5 g NaH2PO4·H2O in 500 mL water and 35.5 g Na2HPO4 in 500 mL water. Mix 22.6 mL of 0.5 M NaH2PO4 with 77.4 mL of 0.5 M Na2HPO4 to give a 0.5 M sodium phosphate stock at pH 7.4.
2. 50% Glycerol: Mix 50 mL glycerol with 50 mL water.
3. 5 M NaCl: Dissolve 146.1 g NaCl in 400 mL water. Adjust to a final volume of 500 mL once dissolved.
4. For a solution of 50 mM sodium phosphate, 5% glycerol, 150 mM NaCl: dilute 5 mL of 0.5 M sodium phosphate, 5 mL of 50% glycerol, and 1.5 mL of 5 M NaCl into 38.5 mL water.
2.7 Western Blotting
1. Transfer buffer: 3 g Trizma base, 14.4 g glycine, 0.37 g SDS, 100 mL methanol. Adjust volume to 1 L with water.
2. 1× PBS, 0.05% v/v Tween-20: Dilute 50 mL of 10× PBS into 450 mL water. Add 0.25 mL Tween-20. (This is highly viscous, so careful pipetting is required.)
3. 25 mM Tris, 200 mM glycine, 1.3 mM (0.037% w/v) SDS, 10% v/v methanol in distilled water.
4. Blocking buffer: Weigh 1 g of low-fat powdered milk into a sterile 50 mL centrifuge tube. Add 20 mL PBS-Tween and vortex to a smooth suspension.
5. V5-HRP antibody (Invitrogen R961-25).
6. SuperSignal West Pico Plus (Thermo Fisher Scientific 34577).
2.8 Other Solutions and Materials
1. Phosphate-buffered saline (PBS) as a 10× stock: 80 g NaCl, 2 g KCl, 14.4 g Na2HPO4, 2.4 g KH2PO4. Dissolve in 800 mL water and adjust the pH to 7.4 with HCl. Add water to 1 L. Can be sterilized by autoclaving if desired. Dilute tenfold to 1× before use.
2. 2% w/v paraformaldehyde: Dissolve 2 g paraformaldehyde powder in 100 mL of 1× PBS. Heat to 70 °C until dissolved. Allow to cool and adjust the pH to 7.4 with 0.1 M NaOH or 0.1 M HCl if required.
3. 80:20:1 (v/v) DMSO:ethanol:acetic acid: Combine 8 mL DMSO, 2 mL ethanol, and 0.1 mL acetic acid.
4. ProLong Gold antifade mountant (ThermoFisher P36930).
5. BugBuster cell lysis reagent (Novagen 70584).
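Several of the working solutions listed in this Materials section are made by diluting concentrated stocks, which is the relation C1 × V1 = C2 × V2. As a minimal sketch, the snippet below reproduces the volumes given for the 50 mM sodium phosphate, 5% glycerol, 150 mM NaCl solution in Subheading 2.6. The function name is illustrative only.

```python
def stock_volume(final_conc: float, final_volume: float, stock_conc: float) -> float:
    """Volume of stock to add so that C1 * V1 = C2 * V2 (concentrations in the same units)."""
    return final_conc * final_volume / stock_conc


TOTAL_ML = 50.0  # final volume implied by the recipe in Subheading 2.6

phosphate_ml = stock_volume(50.0, TOTAL_ML, 500.0)   # 50 mM from a 0.5 M (500 mM) stock
glycerol_ml = stock_volume(5.0, TOTAL_ML, 50.0)      # 5% from a 50% stock
nacl_ml = stock_volume(150.0, TOTAL_ML, 5000.0)      # 150 mM from a 5 M (5000 mM) stock
water_ml = TOTAL_ML - (phosphate_ml + glycerol_ml + nacl_ml)

print(phosphate_ml, glycerol_ml, nacl_ml, water_ml)
```

The same relation gives the volume of a 1000× antibiotic stock for a culture: `stock_volume(1, 1000.0, 1000)` returns 1 mL of stock per 1 L of medium.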
3 Methods
3.1 Design and Production of Synthetic Genes
Synthetic genes corresponding to the designed de novo protein sequences can be obtained from a number of vendors. It is generally useful to request that the vendor optimize the gene sequence for expression in the intended recombinant host. For the work described here, the genes are optimized for expression in standard E. coli strains. Several additional sequences should be considered to facilitate cofactor loading, protein detection, and purification, and to direct spatial localization in the cell. For the incorporation of c-type heme into the protein sequence, the recognition sequence for the E. coli cytochrome c maturation apparatus (CX1X2CH) must be included at an appropriate location in the protein (Fig. 1a, b), typically on a helix with the heme oriented into the core [6, 7]. This location dictates the selection of amino acid identities at X1 and X2: residues should be selected with high helical propensity (e.g., Ala, Lys, Glu, Leu) and with properties appropriate to the solvent exposure of the site. Though the genomically encoded cytochrome c maturation apparatus can be upregulated during anaerobic expression, the levels of covalent heme incorporation into the CX1X2CH consensus sequence are generally low. To overcome this issue and achieve high levels of heme incorporation, co-transformation with the pEC86 plasmid, which encodes the entire E. coli maturation apparatus, is required. In addition, it is necessary to include a periplasmic signal sequence in the de novo c-type cytochrome expression vector, usually from the E. coli maltose binding protein (MKIKTGARILALSALTTMMFSASALAK), typically followed by a tobacco etch virus NIa (TEV) protease-cleavable hexahistidine tag (HHHHHHGSSGENLYFQG) at the protein N-terminus. The signal sequence directs the protein to the periplasm through the Sec translocon, enabling the periplasmic cytochrome c maturation apparatus to catalyze covalent heme incorporation.
These additional sequences can be synthesized as part of the synthetic gene construct or added by cloning the synthetic gene into a suitable plasmid. In contrast, the expression of b-type heme-containing de novo proteins does not require an additional plasmid, and cytoplasmic expression is typically favored, eliminating the requirement for additional sequence elements. For the hydrophobic protein designs discussed here, the linear V5 epitope (GKPIPNPLLGLDST) is incorporated at the C-terminus to enable specific detection, and we routinely use cleavable N-terminal purification tags such as hexa-, octa-, or decahistidine peptides.
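The construct layout described above for a periplasmic c-type cytochrome (signal sequence, then TEV-cleavable His tag, then the design carrying a CX1X2CH motif) can be sanity-checked in a few lines before ordering a gene. This is only a sketch: the signal and tag sequences are those quoted in the text with the line-break spaces removed, while the `design` sequence is a hypothetical placeholder and not one of the authors' proteins.

```python
import re

MBP_SIGNAL = "MKIKTGARILALSALTTMMFSASALAK"  # periplasmic signal sequence quoted in the text
HIS_TEV_TAG = "HHHHHHGSSGENLYFQG"           # TEV-cleavable hexahistidine tag quoted in the text
CXXCH = re.compile(r"C..CH")                # c-type heme attachment motif, CX1X2CH


def build_c_type_construct(design: str) -> str:
    """Concatenate signal sequence, purification tag, and the de novo design."""
    return MBP_SIGNAL + HIS_TEV_TAG + design


def heme_sites(sequence: str) -> list[int]:
    """Return 1-based positions of the first Cys of each CX1X2CH motif."""
    return [match.start() + 1 for match in CXXCH.finditer(sequence)]


# Hypothetical helical placeholder sequence, for illustration only.
design = "EELLKKAEELLKCAACHKLAEELLKK"
construct = build_c_type_construct(design)

print(heme_sites(design))     # motif position within the design
print(heme_sites(construct))  # motif position within the full construct
```

A design intended for b-type heme and cytoplasmic expression would omit the signal sequence, and `heme_sites` should then return an empty list.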
3.2 Expression and Purification of Soluble Proteins Incorporating b-Type or c-Type Heme
1. Clone the synthetic gene into a suitable E. coli vector. For these soluble c-type heme-containing proteins, a modified version of pMal-p4x in which the maltose binding protein sequence has been deleted with the exception of the periplasmic signal sequence [6–8] is recommended. For b-type heme-containing proteins, standard pET vectors are suitable, including pET45b or pET151-TOPO, incorporating a TEV-cleavable hexahistidine tag at the N-terminus of the construct [9]. For these examples, either a ligase-independent cloning method or restriction/ligation with suitable restriction enzymes was used to delete the majority of the multiple cloning site on the vector. TEV protease is required for this procedure and can be expressed following well-established protocols or purchased from a variety of sources (see Note 2).
2. Purify the recombinant plasmid to high concentration (~200 ng/μL) using a QIAGEN miniprep kit or similar.
3. For c-type protein constructs, co-transform the selected plasmid with pEC86, encoding the entire E. coli cytochrome c maturation apparatus in a vector for constitutive expression [15], into competent T7 Express (High Efficiency) E. coli cells. Plate out the transformed cells on 2% LB agar containing carbenicillin and chloramphenicol at 50 and 34 μg/mL, respectively. For b-type protein constructs, transform the selected plasmid into competent T7 Express (High Efficiency) E. coli cells. Plate out the transformed cells on 2% LB agar containing carbenicillin at 50 μg/mL. Incubate plates overnight at 37 °C (see Note 3).
4. Add 100 mL LB broth to a 250 mL baffled glass flask. Plug the neck of the flask with a foam bung. Cover the neck with aluminum foil, secure the foil with autoclave tape, and sterilize the culture vessel by autoclaving. Prepare as many of these starter culture flasks as required. To prepare for expression, add 1 L of LB to appropriately sized baffled glass or plastic flasks (typically 2.5 L), cover the necks with aluminum foil, secure the foil with autoclave tape, and sterilize the culture vessels by autoclaving. A total of 2 L of expression media is sufficient to produce reasonable quantities of protein, and the purification below corresponds to this scale (see Note 4).
5. Using good sterile practice, remove the aluminum foil and foam bung from the sterile starter culture flask prepared in step 4. Add carbenicillin and chloramphenicol at 50 and 34 μg/mL, respectively, for c-type protein expression, and only carbenicillin at 50 μg/mL for b-types. Pick a single colony from the agar plate using a sterile toothpick or pipette tip and inoculate the culture flask. Replace the bung and discard the foil. Grow at 37 °C with shaking at 250 rpm overnight (16 h). The absorbance at 600 nm (OD600nm) of these overnight cultures should be ≈4.
6. As described in step 5, remove the aluminum foil and foam bungs from the large sterile culture vessels prepared in step 4. Add carbenicillin and chloramphenicol at 50 and 34 μg/mL, respectively, for c-type protein expression, and only carbenicillin at 50 μg/mL for b-types. Inoculate the flasks containing 1 L LB with 50 mL of the overnight starter culture and grow at 37 °C, 200 rpm until OD600nm is 0.6–0.8. This typically takes approximately 3 h, depending on the starting temperature of the LB media in the large flask.
7. Induce recombinant protein expression by adding 0.1–1 mM IPTG. Culture cells for 3–5 h post-induction. For b-type heme-containing constructs, it is beneficial to add the heme precursor δ-aminolevulinic acid (ALA) to stimulate heme biosynthesis and ensure good heme incorporation in vivo. Exogenous ALA overrides one of the key regulatory steps in heme synthesis, enabling cells to produce high levels of heme and tetrapyrroles coordinated by a protein “sink.” If using ALA, add to 0.3 mM at the point of induction with 0.1–1 mM IPTG (see Notes 5 and 6).
8. Harvest induced cells at 4000 × g for 30 min. With successful protein expression and heme incorporation, the cell pellet should be red or brown.
9. Resuspend cells in lysis buffer (approximately 50 mL per 10 g wet cell paste) and add PMSF, a serine protease inhibitor, to 1 mM. Store resuspended cells on ice, or in a cold room at 4 °C.
10. Lyse the resuspended cells using a probe sonicator (Soniprep 150, MSE UK). Split the resuspended cells into 30–50 mL aliquots in small glass beakers and sonicate on ice at 100% amplitude with 30 s pulses separated by 30 s of mixing the lysed cells. Repeat sonication pulses until the viscosity of the lysed cell solution is reduced and resembles that of water. The color of the supernatant should darken on sonication (see Note 7).
11. Centrifuge the lysate at 40,000 × g for 30 min. Decant the supernatant into 50 mL centrifuge tubes and filter through 0.2 μm pore syringe filters.
12. Using an ÄKTA purification system, an ÄKTA Start, or a peristaltic pump, equilibrate a 5 mL HisTrap HP nickel affinity column with 5 column volumes (CV) of lysis buffer at 5 mL/min.
13. Apply the filtered lysate to the column at 5 mL/min. A red band should appear, indicating successful binding of the heme protein to the column matrix.
14. Wash the column with lysis buffer until OD280nm returns to the baseline level. Apply a gradient of 0–100% elution buffer over 5 CV and collect 2 mL fractions.
15. Pool the fractions containing the majority of the eluted heme protein, using either SDS-PAGE or a UV/visible spectrometer to assess protein purity and/or heme concentration.
16. Dialyze the pooled protein overnight into 5 L of TEV cleavage buffer using 7000 MWCO Snakeskin dialysis tubing.
17. Ideally, take the sample into an anaerobic glove box (…

Table 1 (continued)
Name | Type | Description | References

Description:
… 500) are calculated and compared with the experimental data, where significant correlations are found. The correlation coefficients vary from 0.5 to 0.8.
• Eris models backbone flexibility, which turns out to be crucial for ΔΔG estimation of small-to-large mutations.
• Available at https://dokhlab.med.psu.edu/eris
References: [12]

Description:
• An efficient tool for rational computer-aided design of single-site mutations in proteins and peptides.
• Based on statistical potentials for which the energies are derived from frequencies of residues or atom contacts reported in the datasets of experimentally characterized protein mutants.
• Allows the estimation of the changes in folding free energy for specific point mutations given by the user.
• All possible point mutations in a given protein or protein region are performed and the most stabilizing or destabilizing mutations, or the neutral mutations with respect to thermodynamic stability, are selected.
• For each sequence position or secondary structure, the deviation from the most stable sequence is also evaluated, which helps to identify the most suitable sites for the introduction of mutations.
• Available at http://babylone.ulb.ac.be/popmusic
References: [13]

Name: FRESCO
Type: Program
Description:
• Framework for Rapid Enzyme Stabilization by Computational libraries (FRESCO).
• Uses Rosetta with FoldX energy calculations and combines single-point mutations with disulfide predictions for drastic energy improvements of enzymes.
• The FRESCO strategy consists of the computational design of potentially stabilizing point mutations and disulfide bonds; point mutations are selected by computational tools that predict the resulting change in ΔG of folding (ΔΔGFold).
• The ΔΔGFold values are calculated with both Rosetta-ddG and FoldX since the underlying algorithms gave significantly different predictions, resulting in different selected mutations.
• All residues could mutate, except those inside or near the active site.
• Stabilizing mutations are generated with multiple algorithms.
• Variants are eliminated if they have properties known to typically decrease thermostability, such as increased hydrophobic surface exposure to the water phase or an increased number of unsatisfied H-bond donors and acceptors.
• Eliminates variants with increased flexibility. An experimental screening is used before combining the most stabilizing mutations.
References: [14]
constructed by two distinct protein engineering strategies based on the energy and evolution approaches. The multiple-point mutants are checked for potentially antagonistic effects in the designed protein structure. In addition, the time demands of the FireProt method are decreased by the use of knowledge-based filters, protocol optimization, and effective parallelization. The server is complemented with an interactive, easy-to-use interface that allows users to directly analyze the thermostable proteins.
The user is requested to specify the protein structure, either by providing its PDB ID or by uploading a PDB file. Sequence homologs are obtained by performing a BLAST search [15] against the UniRef90 database [16], using the target protein sequence as the input query. Identified homologs are then aligned with the query protein using USEARCH [17]. Sequences whose identity with the query is below 30% or above 90% are excluded from the list of homologs. The remaining sequences are clustered using UCLUST [17]. The cluster representatives are sorted based on the BLAST query coverage. The first 200 representatives are compiled to create a multiple sequence alignment using the Clustal Omega tool [18]. The multiple sequence alignment is used to: (1) estimate the conservation coefficient of each residue position in the protein [19]; (2) identify correlated positions employing a consensual decision; and (3) analyze amino acid frequencies at individual positions within the protein.
The energy-based approach relies on the FoldX and Rosetta tools. The FoldX protocol is used to fill in the missing atoms of the residues, and the patched structure is minimized with a Rosetta module. Conserved and correlated positions are excluded from further analysis. The remaining positions are subjected to saturation mutagenesis using the FoldX tool. Mutations with predicted ΔΔG over −1 kcal/mol are discarded and the rest are forwarded to Rosetta calculations. Finally, the mutations predicted by Rosetta as strongly stabilizing are tagged as potential candidates for the design of the multiple-point mutants.
The second approach is based on the information obtained from the multiple sequence alignment. The most common amino acid at each position of a protein sequence often has a non-negligible effect on protein stability [20–23]. Therefore, FireProt implements a majority and frequency ratio approach to identify mutations at positions where the wild-type amino acid differs from the most prevalent one. Selected mutations are evaluated by FoldX. The stabilizing variants are listed as candidate mutations for the engineering of multiple-point mutants.
To avoid antagonistic effects between individual mutations, FireProt minimizes these effects using Rosetta. All pairs of single-point mutations within a range of 10 Å are evaluated separately for the energy- and evolution-based approaches. Once the change in free energy is obtained for all residue pairs, FireProt introduces them into the multiple-point mutant in the order of their predicted stability.
Here we demonstrate a rational-design strategy based on site-directed mutagenesis, which mutates specific residues using a pair of complementary primers; primer design is a crucial parameter for successful mutagenesis. Furthermore, high-throughput screening assays that identify good candidate variants based on thermostability are described below.
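The evolution-based ("back-to-consensus") idea described above can be illustrated with a toy calculation. The sketch below is not FireProt's implementation: it simply scans an already-aligned set of homologs and reports positions where the wild-type residue differs from the most frequent one. The thresholds `min_fraction` and `min_ratio` are placeholder assumptions, and the sequences are invented for illustration.

```python
from collections import Counter


def consensus_candidates(wild_type, alignment, min_fraction=0.5, min_ratio=2.0):
    """List mutations (e.g. 'T3S') where the most common aligned residue differs
    from the wild type and is sufficiently dominant. Gaps ('-') are ignored."""
    candidates = []
    for i, wt_residue in enumerate(wild_type):
        column = [seq[i] for seq in alignment if seq[i] != "-"]
        if not column:
            continue
        counts = Counter(column)
        top_residue, top_count = counts.most_common(1)[0]
        if top_residue == wt_residue:
            continue
        fraction = top_count / len(column)
        wt_count = counts.get(wt_residue, 0)
        ratio = top_count / wt_count if wt_count else float("inf")
        if fraction >= min_fraction and ratio >= min_ratio:
            candidates.append(f"{wt_residue}{i + 1}{top_residue}")
    return candidates


# Invented toy alignment (all sequences pre-aligned to the wild type).
wild_type = "MKTAY"
homologs = ["MKSAY", "MRSAY", "MKSAF", "MKSAY", "MKTAY"]
print(consensus_candidates(wild_type, homologs))
```

In a real workflow each candidate would still be passed to an energy function such as FoldX, as described in the text, before being considered for combination.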
2 Materials
Prepare all solutions used for molecular biology experiments and protein expression with sterile distilled H2O. Clearly label all tubes to be used in all experiments. Plasticware and glassware, as well as the working area, must be clean. Media for bacterial cultures must be autoclaved (typically 121 °C for 20 min) and checked for contamination before use. Materials for thermal screening, as well as cultivation and assay conditions, are prepared according to the type of enzyme.
2.1 Protein Structure and Design Tools
1. Crystal structure of the protein of interest. Search for the PDB file on the website (http://www.rcsb.org/pdb/).
2. Crystal structure visualization software, for example, Avogadro, Jmol, PyMOL, or UCSF Chimera.
3. Primer design software. Various programs, including Gene Designer, OLIGO, OligoCalc, Primer3Plus, Primer-BLAST by NCBI, Primo Pro, and SnapGene Viewer, can be used to check the self-dimerization score, hairpin formation, and self-annealing of the 3′ and 5′ ends, and to calculate the length, %GC, and melting temperature (Tm) of a primer.
2.2 Site-Directed Mutagenesis
1. 0.8–1.0% (w/v) agarose gel: Dissolve 0.8–1 g of agarose in 100 mL of Tris-acetate-EDTA (TAE) buffer.
2. 10× buffer for PCR reaction (usually supplied with the DNA polymerase of choice).
3. 10× TAE buffer: Dissolve 48.5 g Tris base in 800 mL of deionized water. Add 11.4 mL of glacial acetic acid and 20 mL of 0.5 M EDTA, and then adjust to 1 L with deionized water. Dilute the stock solution tenfold to make the 1× working buffer (40 mM Tris base, 20 mM acetic acid, and 1 mM EDTA).
4. Agar plate.
5. Agarose gel electrophoresis device.
6. Deoxynucleotide triphosphates (dNTPs): dATP, dCTP, dGTP, dTTP.
7. 6× DNA gel loading dye.
8. DNA markers with appropriate ladder sizes.
9. DNA plasmid harboring the target gene of interest as a template.
10. DNA polymerase (e.g., Taq DNA polymerase, Pfu DNA polymerase, Q5 high-fidelity DNA polymerase, Phusion high-fidelity DNA polymerase).
11. DpnI and CutSmart buffer (New England Biolabs).
12. Heat block or water bath.
13. Incubator shaker.
14. Microwave.
15. PCR purification kit and plasmid extraction kit.
16. PCR thermocycler.
17. Primers.
18. RedSafe nucleic acid staining solution.
19. Sterile distilled H2O or molecular-grade H2O.
20. Sterile media (both agar and broth), e.g., Luria broth (LB): Add 10 g of bacto-tryptone, 5 g of yeast extract, and 10 g of NaCl to 1 L dH2O and sterilize by autoclaving.
2.3 Screening Conditions
1. 96-well clear microtiter plates, laboratory flasks.
2. Additional nutritional supplements for protein expression (e.g., 1 M isopropyl β-D-1-thiogalactopyranoside (IPTG): Weigh 2.383 g of IPTG and dissolve in 10 mL sterile H2O. Sterilize through a 0.22-μm sterile filter, then store in 1 mL aliquots at −20 °C).
3. Agar plates containing colonies from a transformed mutant library.
4. Antibiotics (e.g., 100 mg/mL ampicillin: Dissolve 1 g of sodium ampicillin in sufficient H2O to adjust the final volume to 10 mL. Sterilize through a 0.22-μm sterile filter).
5. Cell lysis reagents, 1 mL/g of cells (e.g., 10 mg/mL lysozyme: 10 mg lysozyme in 10 mM Tris–HCl, pH 8.0; 1 M dithiothreitol (DTT): Add 1.54 g DTT to 10 mL of dH2O, filter through a 0.22 μm syringe filter, aliquot into 2 mL tubes, and store at −20 °C; 100 mM phenylmethanesulfonyl fluoride (PMSF): Prepare 17.4 mg of PMSF per milliliter of isopropanol and store at −20 °C; 0.5 M ethylenediaminetetraacetic acid (EDTA): Add 186.1 g of disodium EDTA·2H2O to 800 mL of H2O. Adjust the pH to 8.0 with NaOH).
6. Materials for purification (e.g., ammonium sulfate, protein column chromatography, dialysis bag: typical pore sizes around 10–100 Å for 1–50 K MWCO membranes).
7. Media (e.g., LB, ZY-5052: ZY contains 10 g of peptone and 5 g of yeast extract in 1 L dH2O; add 5 mM Na2SO4, 2 mM MgSO4, 40% glucose, and 1× 5052 for autoinduction. 50× 5052 is prepared from 250 g of glycerol, 25 g of glucose, and 100 g of α-lactose in 730 mL dH2O. All media are sterilized by autoclaving).
8. Reaction assay reagents, depending on the reaction of the enzyme (e.g., 2 mM flavin mononucleotide (FMN): Add 9.13 mg of FMN to 10 mL H2O; 20 mM nicotinamide adenine dinucleotide (NADH): Add 0.13 g of NADH to Tris–HCl pH 8.5; 50 mM sodium phosphate buffer (NaH2PO4) pH 7.0: Prepare 8.20 g of NaH2PO4 in 1 L H2O).
2.4 Time-Dependent Thermal Inactivation Assays
1. 96-well clear microtiter plates. 2. Bradford reagent. 3. Reagents for reaction (e.g., buffer, substrate).
2.5 Differential Scanning Fluorimetry by Thermal Shift Assay (TSA)
1. Fluorescent dye.
2. Reaction buffer.
3. Real-time PCR machine.
4. Real-time PCR tubes with optically clear flat caps or 96-well PCR plates with optical sealing film.
2.6 Site Saturation Mutagenesis
1. 96-well microtiter sterile plates.
2. Aluminum sealing foil film for 96-well plate.
3. Antibiotics.
4. Cell lysis reagents.
5. Inducer reagent for protein expression.
6. Reagent for specific enzyme activity assay.
7. Sterile media (both agar and broth, e.g., Luria-Bertani (LB) broth, Terrific Broth (TB), and ZY medium).
3 Methods
Site-directed mutagenesis is a PCR-based technique that introduces specific nucleotide changes into double-stranded plasmid DNA to alter a targeted amino acid in the encoded protein, with the aim of improving activity or thermostability. A general protocol for thermal screening of thermostable enzymes is given below. The thermal shift assay (TSA) measures the change in thermal denaturation temperature, and hence the stability, of a protein under varying conditions or upon mutation. The most common method to measure protein thermal shifts is differential scanning fluorimetry (DSF), or thermofluor, which uses specialized fluorogenic dyes [24] to track protein unfolding. Site saturation mutagenesis is a PCR-based protein engineering technique that substitutes a single codon or a set of codons with all possible amino acids. The primer is designed with a randomized codon at the targeted position. A successful site saturation mutagenesis library should contain a diverse and adequate representation of amino acids at the targeted positions.
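To make the idea of a randomized codon concrete, the sketch below enumerates an NNK degenerate codon (N = any base, K = G or T). NNK is a commonly used randomization scheme and is an assumption here, since the text does not specify which degenerate codon is used. The screening estimate additionally assumes that every codon is equally likely in the library, which real libraries only approximate.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# NNK: any base at positions 1 and 2, G or T at position 3.
nnk_codons = ["".join(c) for c in product("ACGT", "ACGT", "GT")]
encoded = {CODON_TABLE[c] for c in nnk_codons}
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]


def clones_for_coverage(n_variants: int, coverage: float = 0.95) -> int:
    """Clones to screen so that a given variant is sampled with probability `coverage`,
    assuming all variants are equally represented."""
    return math.ceil(math.log(1.0 - coverage) / math.log(1.0 - 1.0 / n_variants))


print(len(nnk_codons), "codons;", len(encoded - {"*"}), "amino acids; stops:", stop_codons)
print("clones for 95% coverage of one NNK site:", clones_for_coverage(len(nnk_codons)))
```

Under these assumptions a single randomized position fits on roughly one 96-well plate, which matches the plate-based screening format used later in this chapter.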
3.1 In Silico Prediction
1. Search for a high-resolution crystal structure of the protein of interest in the Protein Data Bank (PDB; http://www.rcsb.org/pdb/) [25], a repository that provides structural data on biological macromolecules (see Note 1). Search for the nucleotide sequence at the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov). The Nucleotide database is a collection of sequences from several sources, including GenBank, RefSeq, TPA, and PDB [26].
2. Input an existing PDB file or the PDB code of the protein structure, and the nucleotide sequence, into either the program or the web server, depending on the in silico tool used (Table 1). PDB files can be loaded from a local machine or network drive. Choose the biological unit or specify the chains for which the calculation will be performed.
3. Adjust the computational parameters for predicting stable enzymes according to the algorithm used. Click on the run button to submit the job. Web servers mostly send the results by e-mail after the analysis finishes (see Note 2).
4. Evaluate and save the results. A list of candidate mutants is selected according to specific criteria for further experiments and characterization (see Note 3).
3.2 Primer Design
1. Select the desired mutant residue from the prediction program. Then, design a forward and a reverse primer that span that position (Fig. 2). Each primer consists of a non-overlapping sequence at its 3′ end and a primer–primer complementary (overlapping) sequence at its 5′ end (see Note 4).
2. The optimal primer length for PCR is 18–30 nucleotides (see Note 5). The non-overlapping sequence should be longer than the complementary sequence, and its melting temperature (Tm) should be about 5–10 °C higher than that of the complementary sequence (see Note 6).
3. The GC content of the primer should be 40–60% to ensure stable primer binding. The annealing temperature is calculated as the Tm of the primer − 5 °C (see Note 7).
4. Design the bases to be mutated in either the complementary region or the non-overlapping region [27].
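The primer criteria above (length, GC content, and an annealing temperature 5 °C below Tm) are easy to check programmatically before ordering oligonucleotides. The sketch below uses a common salt-independent Tm approximation for primers longer than about 13 nucleotides, Tm = 64.9 + 41 × (G + C − 16.4)/N. This formula is an assumption for illustration; the design programs listed in Subheading 2.1 use more refined models, and the example primer is hypothetical.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def basic_tm(primer: str) -> float:
    """Approximate Tm (deg C) for primers longer than ~13 nt (salt-independent estimate)."""
    p = primer.upper()
    gc = p.count("G") + p.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(p)


def check_primer(primer: str) -> dict:
    """Evaluate a primer against the criteria given in the protocol."""
    tm = basic_tm(primer)
    return {
        "length_ok": 18 <= len(primer) <= 30,
        "gc_ok": 40.0 <= gc_percent(primer) <= 60.0,
        "tm": tm,
        "annealing_temp": tm - 5.0,  # annealing at Tm - 5 deg C
    }


# Hypothetical 24-nt primer, for illustration only.
report = check_primer("GCTGAAGGTCTGGCAGATCTGAAC")
print(report)
```

Applying `basic_tm` separately to the non-overlapping and complementary parts of a primer gives a quick check that the former is 5–10 °C higher, as recommended in step 2.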
3.3 Site-Directed Mutagenesis
1. The PCR reaction for site-directed mutagenesis contains 1 μM of each primer, 200 μM of each dNTP, 1× PCR reaction buffer, 3–5 units of DNA polymerase, 1–100 ng of plasmid template, and sterilized water (see Note 8).
2. Run the PCR cycles as follows: initial denaturation at 95 °C for 30 s; denaturation at 95 °C for 10 s, annealing at the Tm of the primer − 5 °C for 30 s, and extension at 72 °C for 1 min/kb; final extension at 72 °C for 10 min.
3. The PCR product should be checked by gel electrophoresis and nucleotide sequencing to confirm the mutated location. Prepare a 0.8–1.0% (w/v) agarose gel in TAE buffer by melting in a microwave, then set the gel in an agarose gel electrophoresis tray. Load the PCR product mixed with 6× loading dye (5 parts PCR product:1 part dye) and compare its size with the DNA marker.
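The final concentrations in step 1 and the extension rate in step 2 translate into pipetting volumes and a cycling time once stock concentrations and the plasmid size are fixed. The following sketch assumes a 50 μL reaction, 10 μM primer stocks, a 10 mM (each) dNTP stock, a 10× buffer, and a 6.5 kb plasmid; all of these are illustrative assumptions and not values from the protocol.

```python
def volume_from_stock(final_conc: float, stock_conc: float, reaction_ul: float) -> float:
    """Volume of stock (uL) giving `final_conc` in a reaction of `reaction_ul` (same units)."""
    return final_conc / stock_conc * reaction_ul


REACTION_UL = 50.0  # assumed reaction volume
PLASMID_KB = 6.5    # hypothetical plasmid length (vector + insert)

primer_ul = volume_from_stock(1.0, 10.0, REACTION_UL)       # 1 uM from an assumed 10 uM stock
dntp_ul = volume_from_stock(200.0, 10_000.0, REACTION_UL)   # 200 uM from an assumed 10 mM stock
buffer_ul = volume_from_stock(1.0, 10.0, REACTION_UL)       # 1x from a 10x stock
extension_s = PLASMID_KB * 60.0                             # 1 min per kb at 72 deg C

print(f"each primer: {primer_ul:.1f} uL, dNTPs: {dntp_ul:.1f} uL, buffer: {buffer_ul:.1f} uL")
print(f"extension time: {extension_s:.0f} s per cycle")
```

Polymerase, template, and water make up the remaining volume. Because the whole plasmid is amplified, the extension time scales with the full plasmid length rather than the insert alone.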
3.4 Screening Conditions
1. The wild-type enzyme of interest should be assayed for initial activity using a protocol specific for the enzyme (see Note 9).
2. Constructs encoding the mutant enzymes predicted by the computational program are transformed into host cells for expression using a protocol specific for the enzyme of interest.
Fig. 2 Primer design for mutagenesis PCR amplification; primers contain non-overlapping and complementary regions to enhance PCR yield
3. After protein expression, cells are disrupted and the lysate is clarified to obtain crude enzymes. Each candidate crude enzyme is then heat-treated for 10 min at a temperature 5–10 °C above the melting temperature of the wild-type enzyme (see Note 10). After incubation, the precipitated proteins are removed by centrifugation.
4. Determine the protein concentration of the clear supernatant by Bradford assay; use the same protein concentration in each variant assay in order to normalize the enzyme activity (see Note 11). Measure the activity of the enzyme variants at increasing temperatures until the enzyme is inactivated (see Note 10).
5. Compare the specific activity of each variant with that of the wild-type (WT) enzyme. Variants with greater specific activity than the WT are selected for further characterization.
3.5 Time-Dependent Thermal Inactivation Assays
1. Incubate each highly thermotolerant variant in a water bath, at a temperature that it tolerates far better than the WT, for various incubation times (e.g., 0–180 min, depending on the thermotolerance of the enzyme) (see Note 12).
2. After harvesting and lysing the cells, centrifuge the cell lysates at 10,000 × g and 4 °C for 10 min to discard the cell pellets, and collect the supernatant to measure the enzyme activity.
3. Assay the enzyme activity by determining the decrease of substrate or the increase of product over time.
4. Plot the enzyme activity (initial rate) against incubation time to observe the decrease in activity during incubation at elevated temperature. The continuous line of each curve represents the fit to the data using a single exponential decay. The error bars represent measurements from three to five replicates. Then, compare the results of the enzyme variants with those of the WT to select mutants for further experiments. This information is beneficial for industrial applications [28].
3.6 Thermal Shift Assay (TSA) by Differential Scanning Fluorimetry
1. Take 50 μM candidate thermostable enzyme from the gene library to test in real-time PCR instrument reactions that contain 50 μL of buffer and the optimal concentration of fluorescent dye (see Note 13). The exact concentrations of protein and dye are defined by experimental assay development studies.
2. Incubate the enzymes in a real-time thermal cycler at 35 °C and increase the temperature by 1 °C/min up to 100 °C. Fluorescence readouts are taken at each interval. Derivative values are plotted and the peak minimum corresponding to the enzyme melting temperature is reported. Typical temperature ramp rates range from 0.1 to 10 °C/min but are generally around 1 °C/min. The fluorescence in each well is measured at regular intervals, 0.2–1 °C/image, over a temperature range spanning the typical protein unfolding temperatures of 25–95 °C.
3. The DSF data at the dye emission maximum are used for melt curve plotting. The Tm values are calculated by determining the inflection point(s) of each melt curve. The stability curve and its midpoint value (melting temperature, Tm, also known as the temperature of hydrophobic exposure) are obtained by gradually increasing the temperature to unfold the protein and measuring the fluorescence at each point. Curves are measured for protein only and for protein + ligand, and ΔTm is calculated.
4. The fluorescence response reports on enzyme unfolding during the thermal shift. The assay measures changes in the thermal denaturation temperature, and consequently the stability, of a protein under varying conditions, that is, the temperature at which a protein goes from the native to the denatured form. A fluorescent dye such as SYPRO Orange binds to hydrophobic surfaces, and water strongly quenches its fluorescence. When the protein unfolds, the exposed hydrophobic surfaces bind the dye, which excludes water and results in an increase in fluorescence.
5. See the examples provided in Subheadings 3.7 and 3.8.
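The derivative analysis in steps 2 and 3 can be sketched as follows: Tm is taken as the temperature at which the negative first derivative of fluorescence, −dF/dT, reaches its minimum, which is the inflection point of the melt curve. The code uses simple central differences and a synthetic sigmoidal melt curve. The curve parameters are invented for illustration, and real data usually need smoothing and trimming of the post-transition decay first.

```python
import math


def melting_temperature(temps, fluorescence):
    """Return the temperature at the minimum of -dF/dT (steepest fluorescence increase),
    using central differences on interior points."""
    neg_derivative = []
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        neg_derivative.append((-slope, temps[i]))
    return min(neg_derivative)[1]


# Synthetic two-state melt curve: 25-95 deg C in 0.5 deg C steps, true Tm = 55 deg C (invented).
temps = [25.0 + 0.5 * i for i in range(141)]
true_tm, width = 55.0, 2.0
signal = [1.0 / (1.0 + math.exp((true_tm - t) / width)) for t in temps]

tm_protein = melting_temperature(temps, signal)
print(f"estimated Tm = {tm_protein:.1f} deg C")
```

ΔTm is then the difference between the values returned for two curves, for example a variant and the wild type, or protein + ligand and protein only.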
3.7 Example 1: Screening Thermal Stability of Flavin Reductase (C1) [29]
1. The pET11a-C1 plasmid is used as a template for site-directed mutagenesis to generate the C1 library by PCR. Transform the C1 variants into Escherichia coli BL21(DE3) for protein expression. Culture at 37 °C in Luria-Bertani (LB) broth containing 50 μg/mL ampicillin until OD600 reaches ~1.5, then add 1 mM IPTG and culture at 23 °C until OD600 reaches ~4.0. Collect the cells by centrifugation at 15,000 rpm for 1 h.
2. Suspend the cell paste in 50 mM sodium phosphate buffer, pH 7.0, containing 1 mM DTT, 0.5 mM EDTA, and 100 μM PMSF, then extract the crude protein by ultrasonication and centrifuge to remove cell debris.
3. Screen the thermostability of the crude extract of each C1 variant by incubating in a water bath at 45 °C for 10 min. After incubation, centrifuge to sediment precipitated proteins. Determine the protein content by Bradford assay for enzyme activity normalization.
4. Measure C1 activity as NADH oxidase activity: C1 oxidizes NADH to reduce FMN, and the activity can be detected as a decrease in absorbance at 340 nm by spectrophotometry. The reaction typically contains C1 (4–16 nM), FMN (15 μM), and NADH (200 μM) in sodium phosphate buffer (pH 7.0, 50 mM).
5. Determine the time-dependent thermal inactivation of C1 by incubating C1 for time points from 0 to 180 min and assaying for NADH oxidation activity.
6. Plot the C1 activity against incubation time, fitting with the Michaelis–Menten equation (see Note 14); the Vmax, Km, and kcat values are used to compare the activity at high temperature (see Note 15).
7. For thermal denaturation of C1, mix 5 μM C1 with sodium phosphate buffer (pH 7.0, 50 mM) and adjust the total volume to 20 μL in a PCR tube. While the temperature increases from 25 °C to 90 °C, monitor the intrinsic fluorescence in the real-time PCR instrument.
3.8 Example 2: Screening Thermostability of HadA Variants [30]
1. Based on computational and rational analysis, express the HadA variant in E. coli BL21(DE3): grow at 37 °C in 650 mL of ZYP-5052 autoinducing medium containing 50 μg/mL ampicillin until OD600 reaches 1.0, then switch the temperature to 25 °C and shake for 16 h.
2. Harvest the cells by centrifugation and resuspend the cell pellets in lysis buffer (50 mM NaH2PO4 buffer pH 7.0 containing 5 mM EDTA, 100 μM PMSF, and 1 mM DTT), then lyse by ultrasonication.
3. Sediment the cell debris by centrifugation and transfer the supernatant to a fresh tube. Add ammonium sulfate at 20–40% to the supernatant to precipitate the desired protein, and resuspend the pellet in 30 mM NaH2PO4 buffer pH 6.5 containing 50 mM NaCl. Dialyze the crude protein in a 14 MWCO dialysis bag against NaH2PO4 for 16 h. After dialysis, purify HadA by anion-exchange chromatography using a DEAE-Sepharose column (GE Healthcare).
4. Determine the protein melting temperature (Tm) by using a real-time PCR machine to monitor fluorescence changes. The reaction contains 10 μM protein and 5× SYPRO Orange dye in 100 mM HEPES buffer, pH 8.0. Increase the temperature gradually from 25 °C to 95 °C at 1 °C/min.
5. To determine the thermostability of HadA variants, incubate HadA at 25, 35, 40, 45, 50, and 55 °C for various time periods, then cool on ice for 10 min to stop heat denaturation, and centrifuge to discard the denatured proteins.
6. Examine the remaining activity of HadA by measuring the rate of 4-nitrophenol (4-NP) depletion at 400 nm by spectrophotometry. Mix 50 μM 4-NP with 2 μM HadA, 1 μM C1, 20 μM NAD+, and 20 μM FAD in 1 mL of assay buffer.
7. Fit the remaining activity to a first-order decay and calculate the half-life from the equation t1/2 = (ln 2)/k. The half-life is the time at which 50% of the activity remains; use this value to compare the thermostability of the variants and the WT.
3.9 Site Saturation Mutagenesis
1. Generate a mutant library of the enzyme by site saturation mutagenesis at the residue important to thermotolerance (see Note 16).
2. Transform the mutant product into host cells by applying the protocol most suited for your target enzyme. Pick mutant colonies into 96-well plates containing 200 μL of culture medium supplemented with the appropriate antibiotic. Place lids over the plates and grow for a suitable time at a suitable temperature and shaker speed. Use 5% of the master-plate culture to inoculate the cultures for protein expression. Grow the culture to mid-exponential phase. When the cell density reaches OD600 ~0.2–0.4, add the protein expression inducer and monitor until OD600 reaches saturation. Add 50% glycerol to the master plate and store it at −80 °C as the library stock (Fig. 3).
3. After expression, harvest the cells by centrifugation (4000 rpm at 4 °C for 10 min) and extract the protein by using lysozyme in a suitable lysis buffer to break the cells (see Note 17). Then centrifuge (4000 rpm at 4 °C for 10 min) and transfer the cell lysate into a fresh plate. The supernatant is used for enzyme assays in further experiments.
4. Aliquot 1 μg of protein into a 96-well plate, heat the samples for 10 min at a temperature 5–10 °C above the melting temperature of the wild-type enzyme, then assay activity in the assay buffer suitable for the target enzyme (see Note 18).
Fig. 3 Overview of site saturation mutagenesis and random mutagenesis with high-throughput screening
5. Calculate the residual activity from the initial rate of the enzyme reaction, and plot residual activity against incubation temperature to determine the apparent Tm.
6. Select mutants based on improvements in thermostability and/or initial activity, and re-grow them from the master plates stored at −80 °C for sequencing.
7. The effect of amino acid changes on thermotolerance can be explained by using a molecular visualization program such as PyMOL to analyze the distances and conformational changes at the mutated location.
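The half-life analysis used in Subheading 3.8 (step 7) can be sketched as a log-linear least-squares fit: for first-order inactivation, ln(activity) falls linearly with time with slope −k, and t1/2 = (ln 2)/k. The function name `half_life`, the file `decay_example.txt`, and the data are hypothetical and serve only to illustrate the arithmetic.

```bash
#!/usr/bin/env bash
# Fit ln(activity) = ln(A0) - k*t by ordinary least squares and report k and t1/2.
# Input file: two columns, incubation time (min) and remaining activity (any unit).
half_life() {
  awk '
    $2 > 0 {                                   # log is undefined for activity <= 0
      x = $1; y = log($2)
      n++; sx += x; sy += y; sxx += x * x; sxy += x * y
    }
    END {
      slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
      k = -slope                               # first-order inactivation constant
      printf "k = %.4f per min, t1/2 = %.1f min\n", k, log(2) / k
    }
  ' "$1"
}

# Hypothetical data in which activity halves every 30 min.
cat > decay_example.txt <<'EOF'
0 100
30 50
60 25
90 12.5
EOF

half_life decay_example.txt
```

Comparing the fitted t1/2 of each variant with that of the wild type at the same incubation temperature gives the ranking described in the protocol.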
4 Notes
1. Provide the structure either as a PDB ID or as a user-defined PDB file. The user can then choose a predefined biological unit generated by the MakeMultimer tool or manually select the chains for which the calculation should be performed. Resolution is a measure of the quality of the data collected on the crystal containing the protein or nucleic acid. High-resolution structures, with resolution values of 1 Å or so, are highly ordered, and it is easy to see every atom in the electron density map. Lower-resolution structures, with a resolution of 3 Å or higher, show only the basic contours of the protein chain, and the atomic structure must be inferred [25].
2. Parameters can be adjusted to obtain reliable results; however, this is not necessary because the default settings have been optimized. The time required to complete a calculation depends on the size of the protein and the number of other queries running in parallel.
3. The "results browser" page mostly contains information about the job, the status of the calculation, the results, and a link for data download. Find candidates based on bond energy, B-factor, associated secondary structure, or 3-D considerations, depending on the tool. Mutations near the active site were avoided because the catalytic activity could be affected; surface positions are therefore suggested. ΔΔGfold values were considered for further analysis, as were residues capable of forming interactions with neighboring residues, whether by introducing salt bridges, disulfide bridges, or hydrogen bonds or by promoting hydrophobic interactions.
4. The non-overlapping region of the primers should be long enough to bind the newly synthesized DNA efficiently.
5. Shorter primers bind the template easily, but their specificity is inadequate. The primer should be long enough for selectivity and suitable for the annealing temperature.
6. The Tm can be calculated by different methods. In this study, Tm was calculated from a simple formula: Tm = 4(G + C) + 2(A + T).
7. To promote binding, Tm should be in the range of 52–58 °C. Above 65 °C, there may be a tendency toward secondary annealing. The Tm values of a pair of primers should be similar, and primers should be designed with a C or G at their 3′ end.
8. For the best result, use a good-quality plasmid (not degraded and not stored diluted for a long time), adjust the buffer to pH 8.5–9, and add an appropriate salt concentration.
9. Vary the conditions to find those suitable for the reaction assay. The substrate concentrations should be maintained at no less than 10 times the Km to ensure that the reaction rates are maximized. The concentration of substrate used or product formed must not inhibit the enzyme reaction.
10. Typically, 10–20 min is a good incubation time. To test different temperatures, use the thermocycler's programmable gradient feature if available; otherwise, try different temperatures in separate experiments.
11. The amount of crude protein determined by Bradford assay is used to normalize the enzyme activity. Alternatively, a GFP tag on the enzyme of interest might be advantageous for reporting the quantity of expressed enzyme in real time.
12. The time and temperature depend on the melting temperature obtained in the previous result.
13. Dyes possess a significant background in the presence of folded proteins. The most favored dye for DSF is SYPRO Orange, mainly owing to its high signal-to-noise ratio as well as its relatively long excitation wavelength (near 500 nm). This minimizes interference from most small molecules, as these typically have absorption maxima at shorter wavelengths.
14. Michaelis–Menten equation: v = Vmax[S]/(Km + [S]).
15. Vmax is the maximum rate, Km is the Michaelis constant for the substrate, and [S] is the NADH concentration.
16. Site saturation mutagenesis is a technique that substitutes the codon at the chosen position with codons for all possible amino acids. To predict the occurrence of each amino acid, the Qpool value is used (Qpool is calculated from the fluorescence intensity of each base in the sequencing trace to predict the percentage of each amino acid; for all possible amino acids to be represented, Qpool should be >0.7).
17. Lysozyme is the best choice for breaking cells on a small scale. For E. coli cell lysis, use a freshly prepared lysozyme solution (10 mg/mL) to achieve the highest lysis activity. In addition, add EDTA to capture the divalent metal ions in the active centers of metalloproteases and thereby deactivate them; EDTA can also help to disrupt the cell membrane more easily.
18. The assay reagents depend on the type of enzyme and the method used to determine enzyme activity (e.g., absorbance).
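The primer Tm rule given in Note 6, Tm = 4(G + C) + 2(A + T), is easy to check from the command line. The sketch below is only an illustration of that formula; the function name `wallace_tm` and the example primer sequence are made up, and the rule itself is a rough estimate intended for short oligonucleotides.

```bash
#!/usr/bin/env bash
# Primer melting temperature from the simple rule in Note 6:
#   Tm = 4 * (number of G + C) + 2 * (number of A + T)
wallace_tm() {
  local seq gc at
  seq=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')   # accept lower-case input
  gc=$(printf '%s' "$seq" | tr -cd 'GC' | wc -c)          # count G and C
  at=$(printf '%s' "$seq" | tr -cd 'AT' | wc -c)          # count A and T
  echo $(( 4 * gc + 2 * at ))
}

# Hypothetical 18-nt primer (9 G/C and 9 A/T).
wallace_tm ATGCGTACGTTAGCCTAG
```

For this example the result falls inside the 52–58 °C window recommended in Note 7; both primers of a pair can be passed through the same function to confirm that their Tm values are similar.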
Acknowledgments
This work was supported by The National Research Council of Thailand (NRCT) Grant NRCT5-RSA63025-02 (to T.W.) and NRCT5-RSA63012-01 (to S.M.). We also acknowledge funding support from Vidyasirimedhi Institute of Science and Technology (VISTEC), the Global Partnership Program from Program Management Unit-B, and the Royal Academy of Engineering (UK) for V.P., P.A., and T.W.
References
1. Modarres HP, Mofrad MR, Sanati-Nezhad A (2016) Protein thermostability engineering. RSC Adv 6(116):115252–115270. https://doi.org/10.1039/C6RA16992A
2. Musil M, Stourac J, Bendl J, Brezovsky J, Prokop Z, Zendulka J, Martinek T, Bednar D, Damborsky J (2017) FireProt: web server for automated design of thermostable proteins. Nucleic Acids Res 45(W1):W393–W399. https://doi.org/10.1093/nar/gkx285
3. Bednar D, Beerens K, Sebestova E, Bendl J, Khare S, Chaloupkova R, Prokop Z, Brezovsky J, Baker D, Damborsky J (2015) FireProt: energy- and evolution-based computational design of thermostable multiple-point mutants. PLoS Comput Biol 11(11):e1004556. https://doi.org/10.1371/journal.pcbi.1004556
4. Folkman L, Stantic B, Sattar A, Zhou Y (2016) EASE-MM: sequence-based prediction of mutation-induced stability changes with feature-based multiple models. J Mol Biol 428(6):1394–1405. https://doi.org/10.1016/j.jmb.2016.01.012
5. Capriotti E, Fariselli P, Casadio R (2005) I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res 33(Web Server issue):W306–W310. https://doi.org/10.1093/nar/gki375
6. Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, Serrano L (2005) The FoldX web server: an online force field. Nucleic Acids Res 33(Web Server issue):W382–W388. https://doi.org/10.1093/nar/gki387
7. Goldenzweig A, Goldsmith M, Hill SE, Gertman O, Laurino P, Ashani Y, Dym O, Unger T, Albeck S, Prilusky J, Lieberman RL, Aharoni A, Silman I, Sussman JL, Tawfik DS, Fleishman SJ (2016) Automated structure- and sequence-based design of proteins for high bacterial expression and stability. Mol Cell 63(2):337–346. https://doi.org/10.1016/j.molcel.2016.06.012
8. Craig DB, Dombkowski AA (2013) Disulfide by Design 2.0: a web-based tool for disulfide engineering in proteins. BMC Bioinformatics 14:346. https://doi.org/10.1186/1471-2105-14-346
9. Pires DE, Ascher DB, Blundell TL (2014) mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics 30(3):335–342. https://doi.org/10.1093/bioinformatics/btt691
10. Rohl CA, Strauss CE, Misura KM, Baker D (2004) Protein structure prediction using Rosetta. Methods Enzymol 383:66–93. https://doi.org/10.1016/S0076-6879(04)83004-0
11. Ó Conchúir S, Barlow KA, Pache RA, Ollikainen N, Kundert K, O'Meara MJ, Smith CA, Kortemme T (2015) A web resource for standardized benchmark datasets, metrics, and Rosetta protocols for macromolecular modeling and design. PLoS One 10(9):e0130433. https://doi.org/10.1371/journal.pone.0130433
12. Yin S, Ding F, Dokholyan NV (2007) Eris: an automated estimator of protein stability. Nat Methods 4(6):466–467. https://doi.org/10.1038/nmeth0607-466
13. Kwasigroch JM, Gilis D, Dehouck Y, Rooman M (2002) PoPMuSiC, rationally designing point mutations in protein structures. Bioinformatics 18(12):1701–1702. https://doi.org/10.1093/bioinformatics/18.12.1701
14. Wijma HJ, Floor RJ, Jekel PA, Baker D, Marrink SJ, Janssen DB (2014) Computationally designed libraries for rapid enzyme stabilization. Protein Eng Des Sel 27(2):49–58. https://doi.org/10.1093/protein/gzt061
15. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10(1):421
16. Suzek BE, Wang Y, Huang H, McGarvey PB, Wu CH, UniProt Consortium (2015) UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31(6):926–932
17. Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26(19):2460–2461
18. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7(1):539
19. Capra JA, Singh M (2007) Predicting functionally important residues from sequence conservation. Bioinformatics 23(15):1875–1882
20. Amin N, Liu A, Ramer S, Aehle W, Meijer D, Metin M, Wong S, Gualfetti P, Schellenberger V (2004) Construction of stabilized proteins by combinatorial consensus mutagenesis. Protein Eng Des Sel 17(11):787–793
21. Lehmann M, Loch C, Middendorf A, Studer D, Lassen SF, Pasamontes L, van Loon AP, Wyss M (2002) The consensus concept for thermostability engineering of proteins: further proof of concept. Protein Eng 15(5):403–411
22. Pey AL, Rodriguez-Larrea D, Bomke S, Dammers S, Godoy-Ruiz R, Garcia-Mira MM, Sanchez-Ruiz JM (2008) Engineering proteins with tunable thermodynamic and kinetic stabilities. Proteins 71(1):165–174
23. Sullivan BJ, Nguyen T, Durani V, Mathur D, Rojas S, Thomas M, Syu T, Magliery TJ (2012) Stabilizing proteins from sequence statistics: the interplay of conservation and correlation in triosephosphate isomerase stability. J Mol Biol 420(4–5):384–399
24. Heller RC, Chung S, Crissy K, Dumas K, Schuster D, Schoenfeld TW (2019) Engineering of a thermostable viral polymerase using metagenome-derived diversity for highly sensitive and specific RT-PCR. Nucleic Acids Res 47(7):3619–3630. https://doi.org/10.1093/nar/gkz104
25. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28(1):235–242. https://doi.org/10.1093/nar/28.1.235
26. NCBI Resource Coordinators (2014) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 42(Database issue):D7–D17. https://doi.org/10.1093/nar/gkt1146
27. Liu H, Naismith JH (2008) An efficient one-step site-directed deletion, insertion, single and multiple-site plasmid mutagenesis protocol. BMC Biotechnol 8:91. https://doi.org/10.1186/1472-6750-8-91
28. Peterson ME, Daniel RM, Danson MJ, Eisenthal R (2007) The dependence of enzyme activity on temperature: determination and validation of parameters. Biochem J 402(2):331–337. https://doi.org/10.1042/BJ20061143
29. Maenpuen S, Pongsupasa V, Pensook W, Anuwan P, Kraivisitkul N, Pinthong C, Phonbuppha J, Luanloet T, Wijma HJ, Fraaije MW, Lawan N, Chaiyen P, Wongnate T (2020) Creating flavin reductase variants with thermostable and solvent-tolerant properties by rational-design engineering. Chembiochem 21(10):1481–1491. https://doi.org/10.1002/cbic.201900737
30. Pongpamorn P, Watthaisong P, Pimviriyakul P, Jaruwat A, Lawan N, Chitnumsub P, Chaiyen P (2019) Identification of a hotspot residue for improving the thermostability of a flavin-dependent monooxygenase. Chembiochem 20(24):3020–3031
Chapter 10
Using Molecular Simulation to Guide Protein Engineering for Biocatalysis in Organic Solvents
Haiyang Cui, Markus Vedder, Ulrich Schwaneberg, and Mehdi D. Davari
Abstract
Biocatalysis in organic solvents (OSs) is very appealing to industry for producing bulk and/or fine chemicals, such as pharmaceuticals, biodiesel, and fragrances. However, the poor performance of enzymes in OSs (e.g., reduced activity, insufficient stability, and deactivation) negates the excellent solvent properties of OSs. Molecular dynamics (MD) simulations provide a complementary method to study the relationship between enzyme dynamics and stability in OSs. Here we describe a computational procedure for MD simulation of enzymes in OSs, using Bacillus subtilis lipase A (BSLA) in dimethyl sulfoxide (DMSO) cosolvent with the GROMACS software as an example. We discuss the main practical issues to be considered (such as choice of force field, parameterization, simulation setup, and trajectory analysis). The core part of this protocol (enzyme-OS system setup and analysis of structure-based and solvation-based observables) is transferable to other enzymes and any OS system. In combination with experimental studies, the molecular knowledge obtained can guide researchers toward rational protein engineering approaches to tailor OS-resistant enzymes and expand the scope of biocatalysis in OS media. Finally, we discuss potential solutions to the remaining challenges of computational biocatalysis in OSs and briefly outline future directions for further improvement in this field.
Key words Molecular dynamics simulation, Organic solvents, Biocatalysis, GROMACS, Protein engineering
Francesca Magnani, Chiara Marabelli and Francesca Paradisi (eds.), Enzyme Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 2397, https://doi.org/10.1007/978-1-0716-1826-4_10, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022
1 Introduction
Biocatalysts are widely applied in the chemical and pharmaceutical industries [1–3]. Numerous industrially relevant enzymatic reactions have been presented and optimized for organic (co-)solvents (OSs). There are several advantages to applying OSs in biocatalysis, including enhanced activity and stability, increased solubility of hydrophobic substrates/products, ease of product recovery, and shifting of the thermodynamic equilibrium towards new reactions [4–6]. Besides, enzymatic reactions conducted in OSs have great industrial potential [7–9] since they would efficiently combine the synthetic power of enzymes with chemical synthesis. However, the existing challenge is that the vast majority of enzymes show reduced or no catalytic activity in OSs [10, 11].
Many techniques have been applied to study different aspects of the enzyme-OS interaction. For example, conformational changes and structural mobility of enzymes can be obtained experimentally from protein X-ray crystallography [12, 13], CD [14], or NMR spectroscopy [15–17]. The dynamics of the related solvent shell can be verified adequately by ultrafast fluorescence [18, 19], NOE [20], and IR spectra [21]. In addition, molecular dynamics (MD) simulations provide a complementary method to study the connection between protein dynamics and the stability of enzymes in OSs, and they have been shown to be highly consistent with numerous experimental measurements [20, 22–26].
More recent evidence highlights that enzyme-OS interactions depend mostly on the molecular structure and properties of the OS and on the "type" of protein [27, 28]. Overall, enzymes and OSs interact primarily in five different ways: (a) conformational changes [27, 29, 30]; (b) loss of bound water [31–35]; (c) inhibition in the active site [36–39]; (d) interfacial inactivation [40]; and (e) thermodynamic stabilization of the substrate ground state [41, 42]. The enzyme structure in homogeneous OSs is sensitive to the functional groups of the OS [27, 29, 30]. Generally, polar OSs dramatically strip water molecules from the enzyme surface; these water molecules are critical for structure folding and stability [23, 27, 43–45], so their loss affects the proteins' function [32, 33, 35]. MD analysis of Candida antarctica lipase B (CALB) in tert-butyl alcohol, methanol, and hexane cosolvents underlined that the residence times of OSs decreased with increased water activity (aw). The aw is an indicator of the water content around enzymes [34, 39, 46], and a higher aw is usually accompanied by higher enzyme flexibility [45].
Compared with non-polar solvents, polar OSs tend to penetrate deeper into enzymes and induce conformational changes. Concerning structural integrity, enzymes are much more robust in non-polar OSs than in polar ones. Kamal et al. [47] reported that methanol and isopropanol made the structure of Bacillus subtilis lipase A (BSLA) less rigid and more prone to unfolding, which resulted in reduced stability of BSLA. Importantly, the effect of OSs on enzyme stability and flexibility is also associated with the "type" of the enzyme [27, 29, 30].
In this chapter, we provide a guide for researchers looking to start performing their own enzyme-OS simulations. We therefore focus on computational biocatalysis in OSs (best simulation practices, potential pitfalls and limitations, and promising analysis techniques). Notably, the analysis techniques help researchers to identify the significant structure-based and solvation-based observables that determine the catalytic function of proteins, providing a rational and robust basis for engineering enzyme stability in OSs. BSLA is used as a model, and its crystal structure (PDB ID: 1i6w [48], Chain A, resolution 1.5 Å) is used as the input. The GROMACS v5.1.2 simulation package [49] is used to perform the MD simulation of BSLA in dimethyl sulfoxide (DMSO) cosolvent. The MD trajectory is then used to calculate various structure-based and solvation-based observables.
2 Method for Simulating Proteins in Organic Solvents
2.1 Choice of the Force Field and Parameterization Process
The choice of force field determines how accurately the underlying mechanisms and properties of the interactions in the enzyme-OS-water system are reproduced. Several force fields have been applied to investigate enzyme behavior in OSs, such as AMBER [50, 51], OPLS-AA [52, 53], and GROMOS [54–57]. Most force fields are able to reproduce the stability of proteins in water well and show good agreement with experimental data [49, 58]. Generally, a set of modified parameters needs to be developed to yield more accurate properties of OSs [51, 59–61]. However, it is unlikely that any combination of parameters could yield very high accuracy for all key properties, such as density (ρ), dielectric constant (ε), viscosity (η), enthalpy of vaporization (ΔHvap), surface tension, heat capacity at constant volume and pressure, isothermal compressibility, and volumetric expansion coefficient [51, 62]. Moreover, the simulated properties of OS-water mixtures often deviate more from experimental values than those of the pure OSs. Considering computational cost and accuracy, a suitable water model (e.g., SPC, SPC/E, SPC/L, TIP3P, TIP4P, and TIP5P) should be selected for reproducing the physical properties of the OS cosolvent [63, 64]. Generally, SPC and SPC/E are applied with the GROMOS force field, whereas TIP3P and TIP4P/TIP5P show good performance with AMBER and OPLS, respectively. Water models developed for a specific force field are often adopted in other force fields (see Note 1).
In order to rapidly set up a reliable enzyme-OS-water MD system, the following aspects should be considered: (1) OS parameters established in previous studies can be applied directly to a new system with the matching force field, or to other force fields after modification; (2) several programs can be used to generate the topology file of the OS, such as the Automated force field Topology Builder (ATB) [65] and GAFF combined with RESP/Antechamber [66]; however, the properties of the pure OS and/or the OS cosolvent system must be checked against experimental values; (3) keeping or eliminating crystallographic water in enzyme simulations in OS cosolvent might influence the equilibration process and the enzyme catalytic mechanism; (4) the OS concentration in the cosolvent should be reasonably converted to the number of OS and water molecules; (5) a proper system size, especially box size, should be chosen to use computational resources efficiently. Although a common protocol for enzyme-OS simulations does not exist, MD simulations can provide credible and valuable predictions for biocatalysis in OSs as long as the model parameters agree well with experimental determinations.
2.2 Example GROMACS Run for Enzyme-OS Simulation
This section provides the protocols as we have used them to assess the protein-OS interaction and to inform potential protein engineering strategies. MD simulations can be carried out in software packages such as GROMACS [67], AMBER [68, 69], CHARMM [70], LAMMPS [71, 72], NAMD [73], YASARA [74], and CP2K [75]. GROMACS is free, open-source software and has consistently been one of the fastest (if not the fastest) MD codes available. Several detailed documents and tutorials are available online to help users learn and apply GROMACS, especially the websites http://www.gromacs.org/ [76] and http://www.mdtutorials.com/gmx/. Here, GROMACS v5.1.2 and the GROMOS96 (54a7) force field were used for the simulations of BSLA in DMSO. The GROMOS96 (54a7) force field has been reported to be reliable for simulations of OSs and proteins [52, 58, 77–79]. The BSLA wild type and 60% (v/v) DMSO are used as the models for the protein and the OS cosolvent, respectively (Fig. 1). The starting structure for the MD simulations was taken from the crystal structure of BSLA (PDB ID: 1i6w [48], Chain A, resolution 1.5 Å). The topology files of the OS molecule structures (e.g., DMSO_ATB.itp) were first taken from ATB (Automated force field Topology Builder) with the parameter set of the GROMOS96 (54a7) force field [65]. The parameters of each model were then modified according to the reported models [80–83]. Before the protein-OS simulations, 10 ns MD simulations were performed in triplicate for both the pure OS and the OS-water mixtures to validate that the OS force field parameters reproduce the experimental properties (see Note 2).
2.2.1 Specification of Force Field Path
Several force fields are available in the GROMACS software. In addition, GROMACS allows a modified force field to be called, which makes it possible to treat complex systems. Our modified force field was obtained directly from the ATB webserver (https://atb.uq.edu.au/). After execution of the following command, the new "customized" force field will be recognized by GROMACS:
export GMXLIB=/user/directonary1/directonary2/directonary3/force_field/
2.2.2 Preparation of the Topology Files for the Protein
The topology file can be built following the GROMACS specification for a molecular topology. The topology file lists the constant attributes of each atom to define the "rules" for the molecules in the simulation. Moreover, it contains internal coordinates that allow the automatic assignment of coordinates to hydrogens and other atoms missing from a crystal PDB file. The topology file with a (modified) force field can be generated with the following command:
echo 1 | gmx pdb2gmx -f BSLA_WT.pdb -o BSLA_WT.gro -p BSLA_DMSO.top -water spce -ignh
Fig. 1 Computational workflow for MD simulation of protein-OS system
The topology file defines that the BSLA structure will be solvated in a box of SPC/E water molecules [84] (see Note 3).
Select the Force Field:
From '/user/directonary1/directonary2/directonary3/force_field':
 1: GROMOS96 54a7 force field (Eur Biophys J (2011), 40, 843-856, DOI: 10.1007/s00249-011-0700-9)
From 'direction to the force field':
 2: AMBER03 protein, nucleic AMBER94 (Duan et al., J Comp Chem 24, 1999-2012, 2003)
 3: AMBER94 force field (Cornell et al., JACS 117, 5179-5197, 1995)
 4: AMBER96 protein, nucleic AMBER94 (Kollman et al., Acc Chem Res 29, 461-469, 1996)
 5: AMBER99 protein, nucleic AMBER94 (Wang et al., J Comp Chem 21, 1049-1074, 2000)
 6: AMBER99SB protein, nucleic AMBER94 (Hornak et al., Proteins 65, 712-725, 2006)
 7: AMBER99SB-ILDN protein, nucleic AMBER94 (Lindorff-Larsen et al., Proteins 78, 1950-58, 2010)
 8: AMBERGS force field (Garcia and Sanbonmatsu, PNAS 99, 2782-2787, 2002)
 9: CHARMM27 all-atom force field (CHARM22 plus CMAP for proteins)
10: GROMOS96 43a1 force field
11: GROMOS96 43a2 force field (improved alkane dihedrals)
12: GROMOS96 45a3 force field (Schuler JCC 2001 22 1205)
13: GROMOS96 53a5 force field (JCC 2004 vol 25 pag 1656)
14: GROMOS96 53a6 force field (JCC 2004 vol 25 pag 1656)
15: GROMOS96 54a7 force field (Eur Biophys J (2011), 40, 843-856, DOI: 10.1007/s00249-011-0700-9)
16: OPLS-AA/L all-atom force field (2001 aminoacid dihedrals)
This table shows the options for the force field selection
2.2.3 Preparation of the Protein Environment
A proper simulation box should be prepared to simulate the environment of the protein. The command to create the simulation box around the protein is as follows:
gmx editconf -f BSLA_WT.gro -o BSLA_WT_box.gro -c -d 1.2
The output file now contains the protein centered (-c) in the simulation box, with a distance of at least 1.2 nm between the protein and the box edge (-d 1.2), for use with periodic boundary conditions (see Note 4). Note that when no box type is given, editconf builds a triclinic (rectangular) box; add -bt cubic if a cubic box is required.
2.2.4 Energy Minimization of Protein in Vacuum
Energy minimization is commonly used to remove steric clashes or inappropriate geometry and to refine low-resolution experimental structures. Different minimization algorithms can be chosen, such as steepest descent and conjugate gradient. Minimizing the energy of the protein in vacuum first makes the later energy minimization of a more complex system (e.g., protein-water-ion) more efficient. For the energy minimization, we need a parameter file, em-vac-pme.mdp, which specifies the type of minimization to be carried out, the number of steps, etc. We then use grompp to assemble the simulation parameter (.mdp), structure (.gro), and topology (.top) files (see Note 5):
gmx grompp -f em-vac-pme.mdp -c BSLA_WT_box.gro -p BSLA_DMSO.top -o em-vac.tpr -maxwarn 1
After generating em-vac.tpr, the following command can be used to run the energy minimization:
gmx mdrun -v -deffnm em-vac
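As an illustration of what an energy-minimization parameter file of this kind may contain, the sketch below writes a small steepest-descent .mdp file from the shell. The file name `em-example.mdp` and every value in it are assumptions chosen for demonstration; they are not the authors' em-vac-pme.mdp settings and should be adapted to the force field in use.

```bash
#!/usr/bin/env bash
# Write an illustrative steepest-descent parameter file.
# All values below are assumed example settings, not the chapter's own file.
cat > em-example.mdp <<'EOF'
integrator    = steep     ; steepest-descent minimization
emtol         = 1000.0    ; stop when the maximum force is below 1000 kJ mol^-1 nm^-1
emstep        = 0.01      ; initial step size (nm)
nsteps        = 50000     ; upper limit on the number of minimization steps
cutoff-scheme = Verlet    ; neighbor-list scheme
coulombtype   = PME       ; particle-mesh Ewald electrostatics
rcoulomb      = 1.4       ; short-range electrostatic cutoff (nm), assumed value
rvdw          = 1.4       ; van der Waals cutoff (nm), assumed value
pbc           = xyz       ; periodic boundaries in all directions
EOF

# Show the number of settings written.
grep -c '=' em-example.mdp
```

A file like this would be passed to grompp with -f in place of em-vac-pme.mdp; replacing `steep` with `cg` would select the conjugate-gradient algorithm mentioned in the text.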
2.2.5 OS Cosolvent System Generation
To generate the DMSO cosolvent system, the .gro file for the DMSO molecule needs to be converted from DMSO.pdb with the following command (see Note 6):
gmx editconf -f DMSO.pdb -o DMSO.gro
To mimic the cosolvent environment of the protein, the protein in the box (em-vac.gro) needs to be solvated, in other words, filled with water and DMSO molecules. The OS molecules are inserted first; the remaining space is then filled with water molecules (see Note 7):
gmx insert-molecules -f em-vac.gro -ci DMSO.gro -nmol 1133 -o BSLA_WT_box_DMSO.gro
Since the DMSO molecule information is not updated automatically in the topology file, the topology file needs to be modified with the following commands (see Note 8):
sed -i 's+; Include water topology+#include "/user/directonary1/directonary2/directonary3/force_field/gromos54a7_atb.ff/DMSO_ATB.itp"+g' BSLA_DMSO.top
echo "DMSO 1133" >> BSLA_DMSO.top
The number of OS molecules is mainly determined by the OS concentration and the box size. The number of water molecules can be determined by testing the -maxsol option until no more molecules can be inserted; water molecules that do not fit are automatically excluded:
gmx solvate -cp BSLA_WT_box_DMSO.gro -cs spc216.gro -maxsol 3673 -p BSLA_DMSO.top -o BSLA_WT_box_DMSO_water.gro
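The conversion from a volume fraction to a number of cosolvent molecules can be estimated as n = φ · V · ρ · N_A / M. The sketch below applies this to DMSO under stated assumptions: ideal mixing, a density of about 1.10 g/cm³ and a molar mass of 78.13 g/mol for DMSO, and a hypothetical 6 nm cubic box (216 nm³) in which the volume occupied by the protein is ignored. The function name `cosolvent_count` is made up, and the result is a rough starting estimate, not the value used in this protocol.

```bash
#!/usr/bin/env bash
# Estimate how many cosolvent molecules correspond to a given volume fraction.
#   n = phi * V * rho * N_A / M,  with V in nm^3 (1 nm^3 = 1e-21 cm^3)
# Arguments: box volume (nm^3), volume fraction, density (g/cm^3), molar mass (g/mol)
cosolvent_count() {
  awk -v v="$1" -v phi="$2" -v rho="$3" -v m="$4" 'BEGIN {
    n = phi * v * 1e-21 * rho / m * 6.02214076e23
    printf "%d\n", n + 0.5          # round to the nearest whole molecule
  }'
}

# Hypothetical 6 x 6 x 6 nm box, 60% (v/v) DMSO; density and molar mass are assumed inputs.
cosolvent_count 216 0.60 1.10 78.13
```

The number obtained this way can be passed to `gmx insert-molecules -nmol` and then refined after checking the density of the equilibrated mixture against experimental values.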
2.2.6 Neutralization of the System
Before continuing with the dynamics, the net charge of the simulation system needs to be neutralized to prevent artifacts that would arise as a side effect caused by the periodic boundary conditions used in the simulation. A net charge would result in electrostatic repulsion between neighboring periodic images. Certain numbers of positive sodium ions or negatively chloride ions will be added to the system by replacing the water group (“SOL”) to achieve neutralization. To prepare the neutralization step, we use grompp to assemble the simulation parameter (.mdp), structure (.gro), and topology file (.top): gmx grompp -f em-sol-pme.mdp -c BSLA_WT_box_DMSO_water.gro -p BSLA_DMSO.top -o BSLA_WT_box_DMSO_water_ion.tpr -maxwarn 1
The system is then brought to a net charge of zero by randomly replacing solvent molecules (group 15, water) with monoatomic ions (see Note 9):
echo 15 | gmx genion -s BSLA_WT_box_DMSO_water_ion.tpr -o BSLA_WT_box_DMSO_water_ion.gro -neutral -pname NA -nname CL -p BSLA_DMSO.top
2.2.7 Energy Minimization of Protein in OS Cosolvent
The OS-solvated and electroneutral system is now assembled. To ensure that there are no steric clashes or inappropriate geometries in the system, an additional energy minimization step is necessary to relax the structure (see Note 5):
gmx grompp -f em-sol-pme.mdp -c BSLA_WT_box_DMSO_water_ion.gro -p BSLA_DMSO.top -o em-sol.tpr -maxwarn 1
gmx mdrun -v -deffnm em-sol
2.2.8 Position-Restraint Equilibration
Position-restraint equilibration is often conducted in two phases: NVT (constant number of particles, volume, and temperature) and NPT (constant number of particles, pressure, and temperature). Both short simulations are performed with harmonic position restraints on the heavy protein atoms, defined in the file posre.itp (generated by pdb2gmx in Subheading 2.2.2). This allows the DMSO cosolvent to equilibrate around BSLA without disturbing the protein structure. The NVT ensemble is also referred to as “isothermal-isochoric” or “canonical.” Although a time step of 1 fs or 2 fs is generally applied, the proper simulation length for such a procedure depends on the contents of the system. Typically, 50–100 ps is sufficient to reach a plateau at the desired temperature set in the .mdp file (e.g., T = 298 K) (see Note 5).
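The relation between simulation length, time step, and the number of integration steps requested in the .mdp file can be sketched as follows. This is an illustrative Python calculation and not part of the original protocol; the function names are hypothetical, and the sketch relies only on the fact that GROMACS expects the time step (dt) in ps and the run length as a number of steps (nsteps).

```python
def n_steps(duration_ps: float, dt_fs: float) -> int:
    """Number of integration steps for a run of duration_ps with a time step of dt_fs."""
    return round(duration_ps * 1000.0 / dt_fs)  # 1 ps = 1000 fs


def dt_in_ps(dt_fs: float) -> float:
    """Time step converted from fs to ps, the unit used for dt in the .mdp file."""
    return dt_fs / 1000.0


if __name__ == "__main__":
    # Equilibration lengths and time steps quoted in the text.
    for duration_ps, dt_fs in [(50, 1), (50, 2), (100, 1), (100, 2)]:
        print(f"{duration_ps} ps at {dt_fs} fs: dt = {dt_in_ps(dt_fs)} ps, "
              f"nsteps = {n_steps(duration_ps, dt_fs)}")
```

For the upper end of the quoted range (100 ps) and a 2 fs time step, the calculation gives the same number of steps as 50 ps at 1 fs, which is worth keeping in mind when comparing equilibration protocols.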
To perform the NVT equilibration, the following commands can be used:
gmx grompp -f nvt-pr-md.mdp -c em-sol.gro -p BSLA_DMSO.top -o nvt-pr.tpr -maxwarn 1 -r em-sol.gro
gmx mdrun -v -deffnm nvt-pr
Prior to the MD production run, the pressure of the system (e.g., 1 bar) must also be stabilized by NPT equilibration; the NPT ensemble is also called the “isothermal-isobaric” ensemble. The following commands can be used to run the NPT equilibration (see Note 5):
gmx grompp -f npt-pr-md.mdp -c nvt-pr.gro -r nvt-pr.gro -t nvt-pr.cpt -p BSLA_DMSO.top -o npt-pr.tpr -maxwarn 1
gmx mdrun -v -deffnm npt-pr
2.2.9 Production Simulation Run
After the two equilibration steps, the BSLA-DMSO solvent system is well equilibrated at the desired temperature and pressure. The position restraints can now be released, and the MD production run can be prepared and started with the following commands (see Notes 5 and 10):
gmx grompp -f npt-pr-mdrun.mdp -c npt-pr.gro -r npt-pr.gro -t npt-pr.cpt -p BSLA_DMSO.top -o npt-nopr.tpr -maxwarn 1
gmx mdrun -v -deffnm npt-nopr
2.3 Analysis of Key Observables and the Obtained Knowledge for Designing Better Biocatalysis in OSs
The observables obtained from the MD simulation trajectory describe the state of the protein in OSs in relation to its enzymatic function. Although the protein-OS interactions are pieces of a complex puzzle, each specific observable can still provide information to guide better biocatalysis in OSs. In total, 11 key observables are presented as fingerprints that characterize the dynamics of BSLA in DMSO from two different aspects (structure-based and solvation-based; Table 1). The successful BSLA case, in which the obtained molecular knowledge was integrated with protein engineering, is briefly discussed as an example.
2.3.1 MD Simulation Trajectory Preprocessing
To perform all trajectory analyses clearly and precisely, a new comprehensive index file is first generated as follows (see Note 11):
gmx make_ndx -f npt-pr.tpr -o index_file.ndx

99% ee) were subsequently oxidized by the action of FMN-dependent (S)-enantioselective α-hydroxyacid oxidase (α-HAO) from Aerococcus viridans bearing mutation A95G [4] at the expense of oxygen as electron acceptor, thereby releasing hydrogen peroxide and delivering α-ketoacids as final products in up to quantitative conversion (Scheme 1). Overall, the cascade is driven by the use of sub-stoichiometric (catalytic) hydrogen peroxide, which is consumed in the first step and regenerated in the second step. This atom-efficient cascade is characterized by the formation of water as the only by-product and the use of air as oxidant (see Note 1). Both enzymes were employed as purified protein fractions, and conversion could be conveniently monitored by gas chromatography.
2 Materials
2.1 Cloning
1. pDB-HisGST vector harboring an N-terminal His6-tag followed by a GST-tag: The plasmid used for cloning of P450SPα was obtained from the DNASU plasmid repository (Berkeley Structural Genomics Center) [5, 6].
2. pET28a(+).
3. Synthetic gene coding for P450SPα from Sphingomonas paucimobilis [3] flanked with NdeI and XhoI restriction sites and codon optimized for expression in E. coli. Uniprot accession number: O24782.
4. Synthetic gene coding for (S)-α-HAO [4] from Aerococcus viridans bearing mutation A95G flanked with NdeI and XhoI
Enzymatic Oxidative Cascade for Oxofunctionalization of Fatty Acids in One-Pot
restriction sites and codon optimized for expression in E. coli. Uniprot accession number for the wild-type enzyme: Q44467 (GenBank accession number: D50611.1).
5. E. coli NEB5α cells.
6. E. coli BL21 (DE3) cells.
7. Commercial S.O.C. medium.
8. Lysogeny Broth (LB)/Agar: Dissolve 10 g tryptone, 5 g NaCl, 5 g yeast extract, 15 g agar (in case of agar plates) in 1 L dH2O and autoclave for 15 min at 121 °C.
9. Plasmid QIAprep Spin Miniprep kit.
10. 50 mg/mL Kanamycin stock solution: Dissolve 0.5 g kanamycin in 10 mL autoclaved dH2O. Filter sterilize with a 0.22 μm syringe filter and store in 1 mL aliquots at −20 °C.
11. 60% (v/v) Sterile glycerol stock: Mix 120 mL of glycerol and 80 mL ddH2O, and autoclave the mixture.
2.2 Protein Over-Expression and Purification
1. Terrific Broth (TB) medium: Dissolve 12 g tryptone, 24 g yeast extract, and 4 g glycerol in 900 mL dH2O and autoclave for 15 min at 121 °C. Then prepare 10× TB salts by dissolving 2.31 g KH2PO4 and 12.54 g K2HPO4 in dH2O to a final volume of 100 mL and autoclave for 15 min at 121 °C. Let cool down to room temperature, then mix 900 mL of TB medium with 100 mL of 10× TB salts to obtain 1 L of TB.
2. Trace element solution [7]: Dissolve 0.5 g CaCl2·2H2O, 0.18 g ZnSO4·7H2O, 0.1 g MnSO4·H2O, 20.1 g Na-EDTA, 16.7 g FeCl3·6H2O, 0.16 g CuSO4·5H2O, 0.18 g CoCl2·6H2O in 1 L ddH2O. Filter sterilize with a 0.22 μm syringe filter and store at 4 °C.
3. 50 mg/mL Kanamycin stock solution: See item 10 in Subheading 2.1.
4. 0.5 M δ-Aminolevulinic acid (5-ALA) stock solution: Dissolve 655.7 mg of δ-aminolevulinic acid in 10 mL autoclaved dH2O. Filter sterilize with a 0.22 μm syringe filter and store in 1 mL aliquots at −20 °C.
5. 1 M Isopropyl β-D-1-thiogalactopyranoside (IPTG) stock solution: Dissolve 2.38 g of IPTG in 8 mL of dH2O, then bring to a final volume of 10 mL with autoclaved dH2O. Filter sterilize with a 0.22 μm syringe filter and store in 1 mL aliquots at −20 °C.
6. 100 mM Potassium phosphate buffer with pH 8.0: Dissolve 16.28 g of K2HPO4 and 888 mg of KH2PO4 into 800 mL dH2O, adjust the pH, if necessary, to 8.0, then bring to a final volume of 1 L with dH2O. Filter sterilize the buffer using a 0.2 μm filter paper, degas and store at 4 °C (see Note 2).
Somayyeh Gandomkar and Mélanie Hall
7. Resuspension buffer for P450SPα (100 mM potassium phosphate buffer with pH 8.0, containing 100 mM NaCl, 0.8% (w/v) cholate, 1 mM PMSF (phenylmethylsulfonyl fluoride), and 15% (v/v) glycerol): Dissolve 2.92 g NaCl, 4 g cholate, 87.1 mg PMSF, and 75 mL glycerol in 400 mL of 100 mM potassium phosphate buffer with pH 8.0, adjust the pH, then bring to a final volume of 500 mL with 100 mM potassium phosphate buffer with pH 8.0. Filter sterilize the buffer with a 0.2 μm filter paper, degas and store at 4 °C.
8. Lysis buffer for P450SPα (100 mM potassium phosphate buffer with pH 8.0, containing 100 mM NaCl, 0.8% (w/v) cholate, 1 mM PMSF, 10 mM imidazole, and 15% (v/v) glycerol): Dissolve 2.92 g NaCl, 4 g cholate, 87.1 mg PMSF, 340 mg imidazole, and 75 mL glycerol in 400 mL of 100 mM potassium phosphate buffer with pH 8.0, adjust the pH and bring to a final volume of 500 mL with 100 mM potassium phosphate buffer with pH 8.0. Filter sterilize the buffer with a 0.2 μm filter paper, degas and store at 4 °C.
9. Binding buffer for P450SPα (100 mM potassium phosphate buffer with pH 8.0, containing 100 mM NaCl, 0.8% (w/v) cholate, 1 mM PMSF, 30 mM imidazole, and 15% (v/v) glycerol): Dissolve 2.92 g NaCl, 4 g cholate, 87.1 mg PMSF, 1.02 g imidazole, and 75 mL glycerol in 400 mL of 100 mM potassium phosphate buffer with pH 8.0, adjust the pH and bring to a final volume of 500 mL with 100 mM potassium phosphate buffer with pH 8.0. Filter sterilize the buffer with a 0.2 μm filter paper, degas and store at 4 °C.
10. Elution buffer for P450SPα (100 mM potassium phosphate buffer with pH 8.0, containing 100 mM NaCl, 0.8% (w/v) cholate, 1 mM PMSF, 250 mM imidazole, and 15% (v/v) glycerol): Dissolve 2.92 g NaCl, 4 g cholate, 87.1 mg PMSF, 8.51 g imidazole, and 75 mL glycerol in 400 mL of 100 mM potassium phosphate buffer with pH 8.0, adjust the pH and bring to a final volume of 500 mL with 100 mM potassium phosphate buffer with pH 8.0. Filter sterilize the buffer with a 0.2 μm filter paper, degas and store at 4 °C.
11. Storage buffer for P450SPα (100 mM potassium phosphate buffer with pH 7.4 containing 15% glycerol): Dissolve 12.11 g of K2HPO4 and 4.14 g of KH2PO4 into 800 mL dH2O. Add 150 mL of glycerol, adjust the pH, if necessary, to 7.4 and bring to a final volume of 1 L using dH2O.
12. 0.22 μm and 0.45 μm syringe filters.
13. 0.2 μm, 47 mm membrane disc filters.
14. VIVASPIN tubes (MWCO 10 kDa).
15. PD-10 Desalting Columns.
16. 50 mM FMN stock solution: Dissolve 0.46 g FMN (riboflavin 5′-monophosphate sodium salt hydrate) in 20 mL autoclaved dH2O. Filter sterilize with a 0.22 μm syringe filter and store in 1 mL aliquots at −20 °C.
17. 50 mM Potassium phosphate buffer with pH 7.5: Dissolve 6.41 g of K2HPO4 and 1.80 g of KH2PO4 into 800 mL dH2O, adjust the pH, if necessary, to 7.5 and bring to a final volume of 1 L. Filter sterilize the buffer with a 0.2 μm filter paper, degas and store at 4 °C.
18. Lysozyme from chicken egg.
19. Lysis buffer for (S)-α-HAO (50 mM potassium phosphate buffer with pH 7.5, 50 mM imidazole, and 1 mg/mL lysozyme): Dissolve 1.70 g imidazole in 450 mL of 50 mM potassium phosphate buffer with pH 7.5, adjust the pH to 7.5, then bring to 500 mL using 50 mM potassium phosphate buffer with pH 7.5. Filter sterilize the buffer with a 0.2 μm filter paper, degas and store at 4 °C. Add the proper amount of lysozyme to the desired volume to obtain a final concentration of 1 mg/mL (as an example, dissolve 20 mg of lysozyme in 20 mL lysis buffer) and always prepare fresh before use.
20. Binding buffer for (S)-α-HAO (50 mM potassium phosphate buffer with pH 7.5, 20 mM imidazole): Dissolve 0.68 g imidazole in 450 mL of 50 mM potassium phosphate buffer with pH 7.5, adjust the pH to 7.5, then bring to 500 mL using 50 mM potassium phosphate buffer with pH 7.5. Filter sterilize the buffer with a 0.2 μm filter paper, degas and store at 4 °C.
21. Elution buffer for (S)-α-HAO (50 mM potassium phosphate buffer with pH 7.5, 400 mM imidazole): Dissolve 13.62 g imidazole in 450 mL of 50 mM potassium phosphate buffer with pH 7.5, adjust the pH to 7.5, then bring to 500 mL using 50 mM potassium phosphate buffer with pH 7.5. Filter sterilize the buffer with a 0.2 μm filter paper, degas and store at 4 °C.
22. Storage buffer for (S)-α-HAO (50 mM potassium phosphate buffer with pH 7.5, 50 mM KCl): Dissolve 1.86 g KCl in 500 mL of 50 mM potassium phosphate buffer with pH 7.5.
23. Commercial 10% SDS-PAGE with MOPS (3-(N-morpholino)propanesulfonic acid) running buffer.
24. Sodium dithionite.
25. Laminar flow cabinet.
26. 5 mL His-Trap™ FF column.
2.3 Activity Assay
1. 100 mM Potassium phosphate buffer with pH 7.0: Dissolve 9.34 g of K2HPO4 and 6.31 g of KH2PO4 into 800 mL dH2O, adjust the pH, if necessary, to 7.0 and bring to a final volume of 1 L using dH2O.
2. Mixture of 3,5-dichloro-2-hydroxybenzenesulfonic acid (1 mM DCHBS) and 4-aminoantipyrine (0.1 mM AAP) stock solution: Dissolve 2.7 mg DCHBS and 0.2 mg AAP in 10 mL of 100 mM potassium phosphate buffer with pH 7.0.
3. 100 mM rac-Lactic acid stock solution: Dissolve 90.1 mg rac-lactic acid in 10 mL of 100 mM potassium phosphate buffer with pH 7.0.
4. 5 mg/mL Horseradish peroxidase (HRP) stock solution: Dissolve 50 mg HRP in 10 mL of 100 mM potassium phosphate buffer with pH 7.0, store in 1 mL aliquots at −20 °C.
5. Catalase from bovine liver (commercial lyophilized powder, 1600 U/mg).
6. 50 mM FMN stock solution: See item 16 in Subheading 2.2.
7. 100 mM Potassium phosphate buffer with pH 7.4: See item 11 in Subheading 2.2, without glycerol.
8. 100 mM H2O2 stock solution in 100 mM potassium phosphate buffer with pH 7.4: Mix 20.4 μL H2O2 (commercial 30% (v/v) solution in H2O) with 1.9796 mL of 100 mM potassium phosphate buffer with pH 7.4 to obtain 2 mL stock solution (see Note 3). To reach a defined H2O2 concentration in a 1 mL reaction mixture, use, for example, 5 μL of the 100 mM H2O2 stock for 0.5 mM or 30 μL of the 100 mM H2O2 stock for 3 mM.
9. CO-titration equipment [8, 9].
2.4 Biotransformation in Cascade
1. 100 mM Fatty acid stock solution: Dissolve 79.3 μL of octanoic acid in 5 mL EtOH.
2. 100 mM H2O2 stock solution in 100 mM potassium phosphate buffer with pH 7.4: See item 8 in Subheading 2.3.
3. 100 mM Potassium phosphate buffer with pH 7.4: See item 11 in Subheading 2.2, without glycerol.
4. Protein solutions of P450SPα and (S)-α-HAO: Obtained according to Subheading 3.2.
5. 50 mM FMN stock solution: See item 16 in Subheading 2.2.
6. 4 mL screw cap glass vials.
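The volume of neat octanoic acid needed for the 100 mM stock (item 1) follows from its molar mass and density. The Python sketch below illustrates the arithmetic; the function names are hypothetical, and the molar mass (144.21 g/mol) and density (about 0.91 g/mL) are literature values assumed here, not taken from the protocol.

```python
def solute_mass_mg(conc_mm: float, final_volume_ml: float, mw_g_per_mol: float) -> float:
    """Mass of solute (mg) for a solution of conc_mm (mM) in final_volume_ml (mL)."""
    micromol = conc_mm * final_volume_ml      # mM x mL = micromol
    return micromol * mw_g_per_mol / 1000.0   # micromol x g/mol = microgram; /1000 -> mg


def neat_liquid_volume_ul(conc_mm: float, final_volume_ml: float,
                          mw_g_per_mol: float, density_g_per_ml: float) -> float:
    """Volume of a neat liquid (microliters) that contains the required mass."""
    # g/mL is numerically equal to mg/microliter, so mg / (g/mL) gives microliters.
    return solute_mass_mg(conc_mm, final_volume_ml, mw_g_per_mol) / density_g_per_ml


if __name__ == "__main__":
    # Assumed literature values for octanoic acid (not stated in the protocol).
    OCTANOIC_ACID_MW = 144.21       # g/mol
    OCTANOIC_ACID_DENSITY = 0.910   # g/mL
    volume = neat_liquid_volume_ul(100.0, 5.0, OCTANOIC_ACID_MW, OCTANOIC_ACID_DENSITY)
    print(f"octanoic acid for 5 mL of 100 mM stock: {volume:.1f} uL")
```

With these assumed constants the result lies close to the 79.3 μL given in item 1; small differences reflect the density value used.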
2.5 Analysis of Biotransformations: Compound Extraction and Derivatization
1. 100 mM Reference compound stock solutions: Dissolve 79.3 μL of octanoic acid, or 80.1 mg of 2-hydroxyoctanoic acid, or 79 mg of 2-oxooctanoic acid, in 5 mL EtOH.
2. 100 mM Potassium phosphate buffer with pH 7.4: See item 11 in Subheading 2.2, without glycerol.
3. Standard samples of the fatty acid, the α-hydroxyacid, and the α-ketoacid product at concentrations of 0.5 mM, 1 mM, 2 mM, 3 mM, 5 mM, 8 mM, and 10 mM in 100 mM potassium phosphate buffer with pH 7.4, with 10% (v/v) EtOH as co-solvent: For example, for a 0.5 mM standard sample, add 5 μL of the 100 mM stock solution in EtOH and 95 μL EtOH to 900 μL of 100 mM potassium phosphate buffer with pH 7.4; for a 5 mM standard sample, add 50 μL of the 100 mM stock solution in EtOH and 50 μL EtOH to 900 μL of 100 mM potassium phosphate buffer with pH 7.4.
4. BSTFA/TMCS 99:1 (N,O-bis(trimethylsilyl)trifluoroacetamide/trimethylsilyl chloride)/pyridine solution (1:1): Mix 2.5 mL of BSTFA/TMCS 99:1 (commercial solution) with 2.5 mL pyridine.
5. 5 mM Extracting solution spiked with dodecanoic acid as internal standard: Dissolve 1 g dodecanoic acid in 1 L ethyl acetate.
6. 2% Aqueous HCl solution: Add 200 μL conc. HCl into 9.8 mL H2O.
7. MeOH containing 5% DMAP (4-(dimethylamino)pyridine): Dissolve 5 g of DMAP in 100 mL of MeOH.
8. Ethyl chloroformate.
9. GC-MS: 7890A GC System (Agilent Technologies, Santa Clara, CA, USA), equipped with a 5975C mass selective detector and an HP-5MS column (5% phenylmethylsiloxane, 30 m × 320 μm × 0.25 μm, J&W Scientific, Agilent Technologies) using He as carrier gas. Injector temperature: 250 °C; Injection volume: 1 μL; Flow rate: 0.7 mL/min; Temperature program 1: 100 °C, hold time 0.5 min, 10 °C/min to 300 °C; EI mode, energy 70 eV, MS Source: 230 °C, MS Quadrupole: 150 °C.
10. Achiral GC: Agilent Technologies 7890A GC system equipped with an FID-detector and a 7693A Injector in combination with a 7693 Series Autosampler and using an HP-5 column (30 m × 320 μm × 0.25 μm, J&W Scientific, Agilent Technologies) using He as carrier gas. Injector temperature: 250 °C; Injection volume: 5 μL; Flow rate: 0.7 mL/min; Temperature program 1: 100 °C, hold time 0.5 min, 10 °C/min to 300 °C.
11. Chiral GC: Agilent Technologies 7890A GC system equipped with an FID-detector and a 7693A Injector in combination with a 7693 Series Autosampler and using a Chirasil ChiralDexCB column (25 m × 320 μm × 0.25 μm) using H2 as carrier gas. Injector temperature: 250 °C; Injection volume: 1 μL; Flow rate: 1.3 mL/min; Detector temperature: 250 °C; Temperature program 2: 100 °C, hold time 1 min, 10 °C/min to 130 °C, hold time 5 min, 10 °C/min to 180 °C, hold time 1 min.
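The pipetting scheme for the standard samples (item 3) keeps the ethanol content at 10% (v/v) by topping up the stock volume with neat EtOH. A minimal Python sketch of this scheme is given below; the function name is hypothetical, and the constants simply restate the 100 mM stock, the 1 mL sample volume, and the 100 μL of total EtOH from item 3.

```python
STOCK_MM = 100.0      # concentration of the reference stock in EtOH (mM)
TOTAL_UL = 1000.0     # final volume of each standard sample (microliters)
ETHANOL_UL = 100.0    # total EtOH per sample, i.e., 10% (v/v) of 1 mL


def standard_volumes_ul(target_mm: float) -> tuple:
    """Return (stock, extra EtOH, buffer) volumes in microliters for one standard."""
    stock_ul = target_mm * TOTAL_UL / STOCK_MM
    if stock_ul > ETHANOL_UL:
        raise ValueError("target concentration would exceed 10% (v/v) EtOH")
    return stock_ul, ETHANOL_UL - stock_ul, TOTAL_UL - ETHANOL_UL


if __name__ == "__main__":
    for conc in (0.5, 1, 2, 3, 5, 8, 10):
        stock, ethanol, buffer = standard_volumes_ul(conc)
        print(f"{conc:>4} mM: {stock:5.1f} uL stock, {ethanol:5.1f} uL EtOH, {buffer:5.1f} uL buffer")
```

The 0.5 mM and 5 mM rows reproduce the two examples given in item 3; the 10 mM standard needs no additional EtOH.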
3 Methods
3.1 Cloning
3.1.1 P450SPα
1. Use the pDB-HisGST vector as expression vector (see Note 4).
2. Insert the gene coding for P450SPα [3] into the pDB-HisGST vector using NdeI and XhoI restriction sites according to standard molecular biology protocol.
3. For amplification, transform the recombinant plasmid into E. coli NEB5α cell lines using a heat shock according to standard molecular biology protocol.
4. For transformation, mix 1 μL of plasmid DNA (100 ng/μL) gently with 50 μL of E. coli NEB5α cell suspension (see Note 5). Incubate the mixture on ice for 30 min. Afterwards, incubate the mixture at 42 °C for 30 s and next on ice for 5 min. Then add 250 μL of S.O.C. medium to the transformed cell suspension and incubate at 37 °C for 1 h. Plate aliquots of transformants on LB agar plates containing 50 μg/mL kanamycin and cultivate overnight at 37 °C.
5. Next day, use colonies for preparing overnight cultures (ONCs) and further plasmid isolation. Cultivate pre-cultures in 2 mL LB medium containing 50 μg/mL kanamycin at 37 °C (15 h, 120 rpm).
6. Isolate the plasmids using the QIAprep Spin Miniprep Kit.
7. Sequence the isolated plasmids to confirm formation of the desired construct.
8. Transform the plasmids into E. coli BL21 (DE3) cells. For transformation, mix 1 μL of plasmid DNA (100 ng/μL) gently with 100 μL of E. coli BL21 (DE3) cell suspension (see Note 6). Incubate the mixture on ice for 30 min. Afterwards, incubate the mixture at 42 °C for 10 s and next on ice for 5 min. Then add 250 μL of S.O.C. medium (pre-warmed) to the transformed cell suspension and incubate at 37 °C for 1 h. Plate aliquots of transformants on LB agar plates containing 50 μg/mL kanamycin and cultivate overnight at 37 °C.
9. Prepare the ONCs of the transformants by mixing 10 mL LB medium and 10 μL of 50 mg/mL kanamycin stock solution (50 μg/mL final concentration) in 50 mL tubes and incubate at 37 °C with shaking at 120 rpm overnight.
10. Prepare the glycerol stocks from the ONCs by mixing 500 μL of the ONC with 500 μL of 60% sterile glycerol stock and store at −20 °C or −80 °C. Use the ONCs for inoculating the main culture.
3.1.2 (S)-α-HAO A95G from Aerococcus viridans
1. Clone the gene coding for the protein bearing mutation A95G [4] in pET28a(+) using restriction sites NdeI and XhoI according to standard molecular biology protocol.
2. For amplification, transform the recombinant plasmid into E. coli NEB5α cell lines using heat shock according to standard molecular biology protocol.
3. Perform the transformation of E. coli NEB5α: See step 4 in Subheading 3.1.1.
4. Perform the transformation of E. coli BL21 (DE3), cultivation of transformants, and preparation of glycerol stocks: See steps 5–10 in Subheading 3.1.1.
3.2 Protein Over-Expression and Purification
3.2.1 P450SPα
1. For growing the cells, autoclave 1 L baffled shaking flasks and fill them with 330 mL of autoclaved TB medium.
2. Prepare an overnight culture from the glycerol stock. For that purpose, mix 10 mL TB medium, 10 μL of 50 mg/mL kanamycin stock solution (50 μg/mL final concentration), and 5 μL of glycerol stock of P450SPα in 50 mL tubes under laminar flow and incubate at 37 °C with shaking at 120 rpm overnight.
3. For the main culture, add 2 mL of overnight culture into each flask containing 330 mL of TB medium, 330 μL of 50 mg/mL kanamycin stock solution (50 μg/mL final concentration), and 330 μL of trace element solution (see Note 7). Incubate the flasks at 37 °C and 120 rpm.
4. At an OD600 of 0.6–0.8, cool down the cultures to room temperature, then add 330 μL of 0.5 M δ-aminolevulinic acid stock solution (0.5 mM final concentration) to each flask. Incubate the flasks at 20 °C until an OD600 of 1.0 is reached, then induce by adding 33 μL of 1 M IPTG stock (0.1 mM final concentration) and shake the culture overnight at 20 °C and 120 rpm.
5. The next day, harvest the cells by centrifugation (12,040 × g, 20 min, 4 °C). Resuspend the cell pellets in 100 mM potassium phosphate buffer with pH 7.4 (use 10 mL buffer per g pellet), centrifuge (12,040 × g, 20 min, 4 °C) and discard the supernatant. For determination of the over-expression level, follow steps 6–8 below; for protein purification and determination of protein activity and concentration, follow steps 9–17.
6. After this washing step, resuspend the pellets in 100 mM potassium phosphate buffer with pH 8.0, 100 mM NaCl, 0.8% (w/v) cholate, 1 mM PMSF, and 15% (v/v) glycerol (use 10 mL buffer per g pellet) (see Note 8).
7. Disrupt the cells by ultrasonicating the suspension on ice [(30% amplitude, 2 s pulse on, 4 s pulse off for 2 min) × 2], then centrifuge (18,800 × g, 20 min, 4 °C) to remove cell debris.
Fig. 1 SDS-PAGE of samples of N-terminal His-tagged GST-fused P450SPα using pDB-HisGST vector. (Std: PageRuler Prestained protein ladder, lane 1: pellet fraction, lane 2: supernatant fraction). MW ~ 73 kDa (P450SPα ~ 46 kDa + HisGST tag ~ 27 kDa)
8. Analyze the expression level of P450SPα from the supernatant and the pellet samples by SDS-PAGE (Fig. 1). The molecular weight of the resulting His-tagged GST-fused P450SPα is about 73 kDa.
9. In order to purify the N-terminal His-tagged GST-fused P450SPα, resuspend the pellets obtained from step 5 in lysis buffer (use 10 mL buffer per g pellet) and incubate for 2 h on ice.
10. Ultrasonicate the suspension [(30% amplitude, 2 s pulse on, 4 s pulse off for 2 min) × 2] and centrifuge again (38,800 × g, 20 min, 4 °C). Filter sterilize the sample using a 0.45 μm syringe filter.
11. Purify the enzyme by using a 5 mL His-Trap™ FF column. Wash the column first with 50 mL H2O, then with 50 mL binding buffer. Afterwards, load the sample onto the column and wash again with 50 mL binding buffer to remove protein impurities.
12. Elute the enzyme by using 10–15 mL of elution buffer.
13. Concentrate the eluted fractions by using VIVASPIN tubes (MWCO 10 kDa, 4688 × g at 4 °C).
14. After concentration, desalt the samples by using PD-10 desalting columns, elute with storage buffer and store at 4 °C (see Note 9).
15. Before applying the enzyme in the biotransformations, remove glycerol from the enzyme solution by centrifugation using VIVASPIN tubes (MWCO 10 kDa, 4688 × g at 4 °C) and addition of fresh reaction buffer (100 mM potassium phosphate buffer with pH 7.4) (see Note 10).
16. Measure both activity and concentration of P450SPα via analysis of reduced CO difference spectra [8, 9]. The titration relies on the high affinity for CO of the Fe(II) atom bound to the heme. Prepare a known dilution (1:5 or 1:10) of the P450SPα solution with the same buffer used for enzyme preparation, put the sample in a 1.5 mL Eppendorf tube and add sodium dithionite (tip of a spatula; blank sample). Shake the tube and, if the solution is not clear, centrifuge for a few seconds. Transfer the sample into a cuvette and measure the blank in a spectrophotometer, scanning from 360 nm to 800 nm. Take the cuvette out, bubble CO through the sample for about 30–50 s until the cuvette is filled with bubbles, and scan this sample again. Use the following equation (Lambert–Beer law) to determine the concentration of active enzyme (ε = 91 mM⁻¹ cm⁻¹):
$$C\,[\mu\mathrm{M}] = \frac{(\mathrm{Abs}_{448} - \mathrm{Abs}_{500}) \times \text{dilution factor} \times 1000}{\varepsilon}$$
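The Lambert–Beer expression in step 16 can be evaluated with a few lines of Python. This is a minimal sketch under stated assumptions: the function name and the example absorbance readings are hypothetical, and a path length of 1 cm is assumed (the protocol does not state the cuvette path length).

```python
def p450_concentration_um(abs_448: float, abs_500: float, dilution_factor: float,
                          epsilon_mm_cm: float = 91.0, path_cm: float = 1.0) -> float:
    """Concentration of active P450 (micromolar) from the reduced CO difference spectrum.

    abs_448, abs_500: absorbance of the CO-treated sample relative to the blank
    dilution_factor:  e.g., 10 for a 1:10 dilution
    epsilon_mm_cm:    extinction coefficient in mM^-1 cm^-1 (91 in this protocol)
    path_cm:          cuvette path length in cm (assumed to be 1 cm)
    """
    delta_abs = abs_448 - abs_500
    conc_mm = delta_abs / (epsilon_mm_cm * path_cm)   # Lambert-Beer law, result in mM
    return conc_mm * dilution_factor * 1000.0         # undo dilution, convert mM to micromolar


if __name__ == "__main__":
    # Hypothetical readings for a 1:10 dilution (illustration only).
    print(round(p450_concentration_um(0.4535, 0.1000, 10), 2))
```

The example readings were chosen only to show the order of magnitude of absorbance difference that corresponds to an enzyme stock in the tens of micromolar range.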
17. Typically, 4 mL of enzyme solution is obtained (38.85 μM).
3.2.2 (S)-α-HAO A95G from Aerococcus viridans
1. Prepare an overnight culture from the glycerol stock. For that purpose, mix 10 mL LB medium, 10 μL of 50 mg/mL kanamycin stock solution (50 μg/mL final concentration), and 10 μL glycerol stock under laminar flow in 50 mL tubes and incubate at 37 °C with shaking at 140 rpm overnight.
2. For growing the cells, autoclave 1 L baffled shaking flasks and fill them with 100 mL of autoclaved TB medium, 100 μL of 50 mg/mL kanamycin stock solution (50 μg/mL final concentration), and 1 mL of the pre-culture, and incubate at 37 °C and 140 rpm.
3. At an OD600 of 1, induce by adding 50 μL of 1 M IPTG stock solution (0.5 mM final concentration) and incubate the flasks for 20 h at 30 °C and 140 rpm.
4. The next day, harvest the cells by centrifugation (3000 × g, 15 min, 4 °C). Discard the supernatant, wash the pellets with 100 mM potassium phosphate buffer with pH 7.5, and freeze the pellets at −20 °C until further use.
5. Thaw the pellets on ice and resuspend in 20 mL lysis buffer, then incubate on ice for 2 h.
6. Before the sonication, add FMN (4 μL of 50 mM FMN stock solution to the pellets in 20 mL lysis buffer, 10 μM final concentration) and disrupt by sonication (amplitude 20%, 4 s on, 4 s off, total time 5 min).
7. Remove the cell debris by ultracentrifugation (14,000 × g, 20 min, 4 °C).
8. Eliminate residual particles by pressing the supernatant through a sterile 0.45 μm syringe filter.
9. For purification, use a 5 mL His-Trap™ FF column. Wash the column first with 50 mL autoclaved ddH2O, then with 50 mL binding buffer, and afterwards load the supernatant on the column.
10. Elute the protein impurities by washing the column with 50 mL binding buffer.
11. Elute the target enzyme with 20 mL elution buffer.
12. Use a PD-10 desalting column to exchange the buffer from elution buffer to storage buffer.
13. To confirm protein purity, load all samples on 10% SDS-PAGE (Fig. 2). The molecular weight of His-tagged (S)-α-HAO is about 43 kDa.
14. Typically, 4 mL of (S)-α-HAO enzyme solution is obtained (15.36 mg/mL).
Fig. 2 SDS-PAGE from purification of (S)-α-HAO. Std: PageRuler Prestained protein ladder, lane 1: cell free lysate, lanes 2–3: flow through, lane 4: washing fraction, lanes 5–11: elution fractions. MW ~ 43 kDa ((S)-α-HAO ~ 41 kDa + His-tag + amino acids from plasmid)
3.3 Activity Assay
3.3.1 P450SPα
1. To evaluate the hydroxylation activity of P450SPα, apply the tagged enzyme in the hydroxylation of octanoic acid (1) (Scheme 1, reaction 1 → (S)-2).
2. Reaction mixture composition for the single oxidation step: Mix 100 μL of 100 mM octanoic acid stock solution in EtOH (10 mM final concentration of octanoic acid and 10% (v/v) EtOH), 64 μL of a typical enzyme solution (see step 17 in Subheading 3.2.1, 2.5 μM final concentration), 30 μL of 100 mM H2O2 stock solution in buffer (3.0 mM final concentration), and 100 mM potassium phosphate buffer with pH 7.4 to complete to 1 mL.
3. Perform the reaction on a 1 mL scale in closed glass vials at 170 rpm for 24 h. Analyze the samples, after extraction and derivatization, by GC and GC-MS.
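The volumes in step 2 follow from the stock concentrations by simple dilution (C1·V1 = C2·V2). The Python sketch below recomputes them for the typical stocks named in the protocol (100 mM octanoic acid, 38.85 μM P450SPα, 100 mM H2O2); the function and dictionary key names are hypothetical, and the enzyme stock concentration has to be replaced by the value measured for each preparation.

```python
def stock_volume_ul(final_conc: float, stock_conc: float, total_ul: float = 1000.0) -> float:
    """Volume of stock (microliters) that gives final_conc in total_ul.

    final_conc and stock_conc must be given in the same unit.
    """
    if final_conc > stock_conc:
        raise ValueError("final concentration exceeds stock concentration")
    return final_conc * total_ul / stock_conc


def single_oxidation_mix(enzyme_stock_um: float = 38.85, total_ul: float = 1000.0) -> dict:
    """Pipetting scheme (microliters) for the single oxidation step with P450SPalpha."""
    volumes = {
        "octanoic acid (100 mM in EtOH)": stock_volume_ul(10.0, 100.0, total_ul),   # 10 mM final
        "P450SPalpha": stock_volume_ul(2.5, enzyme_stock_um, total_ul),             # 2.5 uM final
        "H2O2 (100 mM)": stock_volume_ul(3.0, 100.0, total_ul),                     # 3.0 mM final
    }
    # Buffer (100 mM potassium phosphate, pH 7.4) completes the mixture to the total volume.
    volumes["buffer (pH 7.4)"] = total_ul - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for component, volume in single_oxidation_mix().items():
        print(f"{component:32s} {volume:7.1f} uL")
```

Rounded to whole microliters, the computed enzyme, substrate, and H2O2 volumes agree with the 64 μL, 100 μL, and 30 μL given in step 2.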
3.3.2 (S)-α-HAO A95G from Aerococcus viridans
1. Measure the oxidase activity at room temperature using a horseradish peroxidase (HRP)-coupled assay containing 3,5-dichloro-2-hydroxybenzenesulfonic acid (DCHBS) and 4-aminoantipyrine (AAP) as chromogenic substrates (HRP-AAP/DCHBS assay) in combination with lactic acid as substrate [10]. The oxidation of lactic acid by (S)-α-HAO releases H2O2, the formation of which is monitored by the assay.
2. Assay mixture composition: Add 200 μL of the 0.1 mM AAP and 1 mM DCHBS stock solution, 20 μL of 5 mg/mL HRP, and 10 μL of 100 mM substrate stock solution (rac-lactic acid) to 680 μL of 100 mM potassium phosphate buffer with pH 7.0. Add 50 μL of purified (S)-α-HAO (see step 14 in Subheading 3.2.2) directly before measuring the absorption at 515 nm (ε515 = 26 mM⁻¹ cm⁻¹).
3. Test the oxidase activity of (S)-α-HAO in the oxidation of rac-2-hydroxyoctanoic acid (2) (Scheme 2) on a 1 mL scale in closed glass vials at 170 rpm for 24 h at room temperature.
4. Reaction mixture composition with (S)-α-HAO: Mix 100 μL of 100 mM rac-2-hydroxyoctanoic acid stock solution in EtOH (10 mM final concentration and 10% (v/v) EtOH), 33 μL or 65 μL of a solution of (S)-α-HAO (see step 14 in Subheading 3.2.2; 0.5 mg/mL or 1.0 mg/mL final concentration, respectively), 80 U/mL catalase, and 2 μL of 50 mM FMN stock solution (0.1 mM final concentration) in a total volume of 1 mL with 100 mM potassium phosphate buffer with pH 7.4 (see Note 11). Incubate the samples for 24 h at room temperature and 170 rpm.
5. Analyze the samples, after extraction and derivatization, by GC and GC-MS.
Scheme 2 Oxidation of rac-2-hydroxyoctanoic acid (2) with (S)-α-HAO (stoichiometry not indicated for clarity: 2 H2O2 → O2 + 2 H2O)
3.4 Biotransformation
3.4.1 Cascade
1. Perform the enzymatic oxidation of octanoic acid to 2-oxooctanoic acid by employing P450SPα and (S)-α-HAO in cascade according to the following procedure (Scheme 1, reaction 1 → 3).
2. Perform a typical reaction in 1 mL (final volume) of 100 mM potassium phosphate buffer with pH 7.4, containing 129 μL of a solution of P450SPα (HisGST-tagged enzyme, see step 17 in Subheading 3.2.1, 5 μM final concentration), 100 μL of 100 mM octanoic acid stock solution (10 mM final concentration, 10% (v/v) EtOH), 2 μL of 50 mM FMN stock solution (0.1 mM final concentration), 65 μL of a solution of (S)-α-HAO (see step 14 in Subheading 3.2.2, 1 mg/mL final concentration), and H2O2 at varying concentrations (0.5 mM to 3 mM) using 5–30 μL of 100 mM H2O2 stock solution.
3. Perform the cascade reactions in duplicate in 4 mL closed glass vials at room temperature and shake vertically for 24 h at 170 rpm. Analyze the samples, after extraction and derivatization, by GC and GC-MS.
3.4.2 Scale-up Experiments
1. Scale up the cascade by performing the reaction in 35 × 1 mL of 100 mM potassium phosphate buffer with pH 7.4: See step 2 in Subheading 3.4.1.
2. After 5 h reaction time, add an additional 1.0 mM H2O2 (using 10 μL of 100 mM H2O2 stock solution) to the reaction mixture and shake for 17 h at room temperature, vertical shaking, at 170 rpm (see Note 12).
3. Work up the reactions by combining all samples in one vessel. Acidify using 2% aq. HCl solution (3 mL), then extract with ethyl acetate (3 × 20 mL) through centrifugation (3 × 10 min at 4688 × g, 20 °C) for phase separation.
4. Dry the combined organic phases over anhydrous Na2SO4 and evaporate the solvent under reduced pressure using a rotavapor.
5. No further purification is needed. 50 mg (91% isolated yield) of 2-oxooctanoic acid is obtained as an oil with a purity of 93%, as determined by 1H-NMR.
Fig. 3 Silylated compounds obtained from derivatization procedure of octanoic acid 1, 2-hydroxyoctanoic acid 2, and 2-oxooctanoic acid 3 (keto and enol forms)
3.5 General Procedure for Analysis of Biotransformations
3.5.1 Achiral GC Measurements for Determination of Conversion
1. Following biotransformations, extract the samples, derivatize and measure on GC and GC-MS.
2. Silylation was found to be the most suitable derivatization method before achiral GC and GC-MS analysis, giving the corresponding silylated substrate, hydroxy- and oxo-products in quantitative yield (Fig. 3) (see Note 13).
3. To this end, add 100 μL of 2% aq. HCl solution to 1 mL aqueous buffer solution (blank, enzymatic reaction mixture, or standard) before extraction.
4. Perform the extraction with ethyl acetate (2 × 0.5 mL) containing 5 mM dodecanoic acid as internal standard. Dry the combined organic phases over anhydrous Na2SO4.
5. Afterwards, mix 100 μL of the dried organic phase with 200 μL of 1:1 (BSTFA/TMCS 99:1)/pyridine solution and incubate for 2 h at room temperature. Analyze the samples directly on GC (Figs. 4, 5, and 6) and GC-MS. Retention times on GC are as follows: derivatized octanoic acid 6.99 min, derivatized 2-hydroxyoctanoic acid 9.22 min, derivatized 2-oxooctanoic acid 8.06 min and 9.52 min (keto and enol form), and derivatized internal standard (dodecanoic acid) 11.38 min.
6. Prepare the calibration curves using standard samples of the substrate, the hydroxyacid intermediate, and the product. For that purpose, prepare standard samples with concentrations of 0.5 mM, 1 mM, 2 mM, 3 mM, 5 mM, 8 mM, and 10 mM in 100 mM potassium phosphate buffer with pH 7.4. Then extract the samples, derivatize and analyze by GC (see steps 3, 4 and 5). Generate the calibration curves by using the area of the analyte peak normalized with the area of the internal standard peak.
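The calibration described in step 6 amounts to a linear fit of the normalized peak area (analyte area divided by internal standard area) against the standard concentration, which is then inverted for unknown samples. The following Python sketch illustrates this with an ordinary least-squares fit; it is not the authors' evaluation script, the function names are hypothetical, and the area ratios in the example are invented for illustration.

```python
def fit_line(x: list, y: list) -> tuple:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_mm(area_analyte: float, area_internal_std: float,
                     slope: float, intercept: float) -> float:
    """Concentration (mM) of an unknown from its normalized peak area."""
    ratio = area_analyte / area_internal_std
    return (ratio - intercept) / slope


if __name__ == "__main__":
    standards_mm = [0.5, 1, 2, 3, 5, 8, 10]                     # concentrations from step 6
    area_ratios = [0.11, 0.21, 0.41, 0.61, 1.01, 1.61, 2.01]    # invented example data
    slope, intercept = fit_line(standards_mm, area_ratios)
    print("slope:", slope, "intercept:", intercept)
    print("unknown sample (mM):", concentration_mm(9100.0, 10000.0, slope, intercept))
```

For 2-oxooctanoic acid, which gives two peaks (keto and enol form), one possible choice is to sum both peak areas before normalization; whether this was done in the original work is not stated here.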
Somayyeh Gandomkar and Mélanie Hall
Fig. 4 Achiral GC trace of derivatized octanoic acid 1 (internal standard dodecanoic acid with retention time 11.38 min)
Fig. 5 Achiral GC trace of derivatized 2-hydroxyoctanoic acid 2 (internal standard dodecanoic acid with retention time 11.38 min)
3.5.2 Chiral Measurements for Determination of ee Value of 2-Hydroxyoctanoic Acid
1. Shock-freeze the reaction samples (see step 4 in Subheading 3.3.2, 1 mL) in liquid nitrogen and lyophilize.
2. Take up the resulting residue in 700 μL MeOH containing 5% DMAP, then add 150 μL ethyl chloroformate and incubate for 1 h at 50 °C, 700 rpm.
Fig. 6 Achiral GC trace of derivatized 2-oxooctanoic acid 3 with keto and enol forms (internal standard dodecanoic acid with retention time 11.38 min)
3. Remove the solvent under air flow, then add 700 μL of 2% aq. HCl solution and extract with ethyl acetate (2 × 500 μL).
4. Dry the combined organic phases over anhydrous Na2SO4 and analyze on GC equipped with a ChiralDexCB column. Retention times are as follows: derivatized (R)-2-hydroxyoctanoic acid 7.37 min, derivatized (S)-2-hydroxyoctanoic acid 7.64 min.
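Once the two enantiomer peaks from step 4 are integrated, the ee value follows from the standard definition, ee = (A_S − A_R)/(A_S + A_R) × 100. The short Python sketch below shows the arithmetic. The peak areas are hypothetical, the function name is invented for this example, and the sign convention (positive values meaning an excess of the (S)-enantiomer) is an assumption made here, not something prescribed by the protocol.

```python
def ee_percent(area_s, area_r):
    """Enantiomeric excess in %, positive when the (S)-enantiomer dominates."""
    total = area_s + area_r
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * (area_s - area_r) / total


# Hypothetical integrated areas for the peaks at 7.64 min (S) and 7.37 min (R).
print(f"ee = {ee_percent(area_s=9900.0, area_r=100.0):.1f}% (S)")
```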
4 Notes
1. An alternative three-enzyme cascade was developed for the regioselective oxofunctionalization of saturated fatty acids, which relies on a non-enantioselective P450 monooxygenase (P450CLA from Clostridium acetobutylicum) in the first step. Two stereocomplementary oxidases are therefore required for the oxidation of both enantiomers of the hydroxyacid intermediates: (R)-α-HAO from Gluconobacter oxydans 621H (‘GO-LOX’) and (S)-α-HAO described here. Hydrogen peroxide can be employed in sub-stoichiometric amounts in that case too. Although overall reaction conditions and substrate scope are comparable to the procedure applied to the two-enzyme cascade, the production of a third enzyme is necessary. Details can be found in the original publication [2].
2. It is recommended to use potassium phosphate buffer with pH 8.0 rather than pH 7.5 initially reported in the original publication [2] throughout steps 6–12 in Subheading 3.2.1.
3. Prepare the H2O2 stock solution freshly each time before use in the biotransformation.
4. The GST-tag is used to enhance soluble expression. The presence of the GST-tag affects neither the activity of P450SPα nor the yield of oxo-acid obtained as final product in the biotransformation; the tag is therefore kept on the protein throughout. If desired, the whole N-terminal HisGST-tag can be cleaved from the purified protein by using TEV protease [2, 6].
5. For a better transformation result, use the commercial NEB5α cells.
6. For transformation in E. coli BL21 (DE3) cells, self-made chemically competent cells can be used.
7. Do not autoclave the trace element solution. Use autoclaved dH2O for preparing the solution, then filter sterilize the solution under laminar flow.
8. For storage of pellets at −20 °C, wash the pellets with 100 mM phosphate buffer with pH 7.4. For His-tag purification, resuspend the cells in lysis buffer (see step 9 in Subheading 3.2.1).
9. For storing the purified enzyme, 15% (v/v) glycerol is necessary, and the enzyme can be stored at 4 °C. Purified enzyme should not be frozen. The enzyme loses activity over time during storage at 4 °C; therefore, protein activity and concentration should be determined by CO-titration each time before setting up the biotransformation, for accurate dosing of active enzyme [8, 9].
10. Glycerol should be removed from the enzyme preparation before use in biotransformations since glycerol interferes with GC analysis by co-eluting with analytes of the reaction.
11. Catalase is needed to remove H2O2 produced through reduction of molecular oxygen during the oxidation reaction. Accumulation of H2O2 is detrimental to protein integrity and stability of the oxo-acid product. FMN is added to ensure saturation of (S)-α-HAO with the flavin coenzyme.
12. The second addition of H2O2 is beneficial for the overall conversion, likely because H2O2 is lost through spontaneous disproportionation despite internal recycling, as noticed by monitoring the progress of the reaction (although conversion was incomplete, the reaction stopped after a few hours) [2].
13. Silylation of the final oxo-product delivers a mixture of derivatized oxo-acid and derivatized enol form of the oxo-acid. The combined area of both peaks is taken into account for the analysis on GC.
Acknowledgments
The authors are supported by a grant from the Austrian Science Fund (FWF) through P32815-N. Dr. Alexander Dennig, Dr. Andela Dordic, Lucas Hammerer, and Dr. Mathias Pickl are thanked for their valuable technical expertise during this project.
References
1. Schrittwieser JH, Velikogne S, Hall M et al (2018) Artificial biocatalytic linear cascades for preparation of organic molecules. Chem Rev 118:270–348
2. Gandomkar S, Dennig A, Dordic A et al (2018) Biocatalytic oxidative cascade for the conversion of fatty acids into α-ketoacids via internal H2O2 recycling. Angew Chem Int Edit 57:427–430
3. Matsunaga I, Yamada M, Kusunose E et al (1996) Direct involvement of hydrogen peroxide in bacterial alpha-hydroxylation of fatty acid. FEBS Lett 386:252–254
4. Yorita K, Aki K, Ohkumasoyejima T et al (1996) Conversion of L-lactate oxidase to a long chain alpha-hydroxyacid oxidase by site-directed mutagenesis of alanine 95 to glycine. J Biol Chem 271:28300–28305
5. Seiler CY, Park JG, Sharma A et al (2014) DNASU plasmid and PSI: biology-materials repositories: resources to accelerate biological research. Nucleic Acids Res 42:D1253–D1260
6. Amaya JA, Rutland CD, Makris TM (2016) Mixed regiospecificity compromises alkene synthesis by a cytochrome P450 peroxygenase from Methylobacterium populi. J Inorg Biochem 158:11–16
7. Nazor J, Dannenmann S, Adjei RO et al (2008) Laboratory evolution of P450 BM3 for mediated electron transfer yielding an activity-improved and reductase-independent variant. Protein Eng Des Sel 21:29–35
8. Omura T, Sato R (1964) Carbon monoxide-binding pigment of liver microsomes. I. Evidence for its hemoprotein nature. J Biol Chem 239:2370–2378
9. Omura T, Sato R (1964) Carbon monoxide-binding pigment of liver microsomes. II. Solubilization, purification and properties. J Biol Chem 239:2379–2385
10. van Hellemond EW, van Dijk M, Heuts DPHM et al (2008) Discovery and characterization of a putrescine oxidase from Rhodococcus erythropolis NCIMB 11540. Appl Microbiol Biotechnol 78:455–463
Chapter 17
CRISPR-Cas9 Editing of the Synthesis of Biodegradable Polyesters Polyhydroxyalkanoates (PHA) in Pseudomonas putida KT2440
Si Liu, Tanja Narancic, Chris Davis, and Kevin E. O’Connor
Abstract
Genome editing technologies allow us to study the metabolic pathways of cells and the contribution of each associated enzyme to various processes, including polyhydroxyalkanoate (PHA) synthesis. These polyesters, accumulated by a range of bacteria, are thermoplastic, elastomeric, and biodegradable, and thus have great application potential. However, several challenges are associated with PHA production, mainly the cost and shortcomings in their physical properties. Advances in synthetic biology and metabolic engineering provide tools to improve the production process and allow the synthesis of tailor-made PHAs. CRISPR/Cas9 technology represents a new generation of genome editing tools capable of application in nearly all organisms. However, off-target activity is a crucial issue for CRISPR/Cas9 technology, as it can cause genomic instability and disrupt the function of otherwise normal genes. Here, we provide a detailed protocol for scarless deletion of the genes implicated in PHA metabolism of Pseudomonas putida KT2440 using modified CRISPR/Cas9 systems and methodology.
Key words Pseudomonas putida KT2440, Polyhydroxyalkanoate (PHA), CRISPR/Cas9, Metabolic engineering
1 Introduction
Polyhydroxyalkanoates (PHAs), a family of biodegradable polyesters, were first discovered in Bacillus megaterium by Lemoigne in 1925 [1]. PHAs are accumulated as intracellular granules, called carbonosomes [2], and their primary role is described as a carbon and/or energy storage material in a wide variety of Gram-positive and Gram-negative bacteria [3–5]. PHAs are composed of repeating (R)-hydroxyalkanoyl monomer units. Each monomer unit is covalently linked via an ester bond to a neighboring monomer (Fig. 1). About 150 different types of PHA monomers have been reported [7]. Generally, based on the total number of carbon atoms in the monomer unit, PHAs are
Francesca Magnani, Chiara Marabelli and Francesca Paradisi (eds.), Enzyme Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 2397, https://doi.org/10.1007/978-1-0716-1826-4_17, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022
Fig. 1 Chemical structure of the PHA polymer. The chemical structure of PHA was adapted from Kootstra [6]. The monomers, (R)-3-hydroxyalkanoic acids, are connected by ester bonds. The value of n varies from approximately 600 to 35,000
classified into two groups: short-chain-length PHAs (scl-PHAs), which consist of monomer units with 3–5 carbon atoms, and medium-chain-length PHAs (mcl-PHAs), which are made of monomer units containing 6–14 carbon atoms [8]. In vivo, PHAs behave like a mobile amorphous polymer [9]. The molecular weight of PHA polymers is usually in the range of 2 × 10^5 to 3 × 10^6 Da and depends on growth conditions, carbon source, downstream processing techniques, and microorganism species [10, 11]. After extraction from bacterial cells, PHAs display a variety of mechanical and physical properties such as crystallinity, flexibility, and elasticity, depending on their monomer constituents [12, 13]. The physical properties are greatly influenced by the length of the side chain and its functional groups, such as branched alkyl, aromatic, phenyl, olefin, halogen, benzoyl, and cyano groups [4, 7, 11, 14, 15]. Generally, upon extraction, scl-PHAs are crystalline, stiff, and brittle, while mcl-PHAs are less crystalline and more flexible [7, 10, 15].
Due to their biodegradability, PHAs are considered an environmentally friendly alternative to petrochemical plastics. Moreover, they can be produced by microorganisms from renewable sources, such as biomass, and also from various waste residues used as feedstock [16–21]. PHB and the copolymer poly(3HB-co-3HV) have been applied in some commercial goods including food packaging, bottles, bags, containers, paper coatings, cups, and agricultural films and nets [13, 22]. Furthermore, as biodegradable and biocompatible biomaterials, PHB and the copolymer poly(3HB-co-3HV) also have a wide range of pharmaceutical and biomedical applications, such as bone graft replacements, screws, scaffolds, vessel alternates, heart valves, stents, sutures, wound dressings, nerve repair, haemostats, and drug delivery [11, 23–25]. However, the application of PHAs is still limited by their high production cost and the shortcomings in their physical properties (for example, PHB is too brittle and stiff). Blending PHAs with other natural and synthetic polymers is a simple and effective way to create materials with suitable properties, extending the potential uses of these polymers. Another means to improve the physical properties of PHAs would be to control their monomer composition. Applying the tools of synthetic biology would allow
us to engineer the enzymes involved in PHA metabolism, and therefore control the type of monomer incorporated into this polyester. In addition to their use as a material, the fact that over 150 (R)-hydroxyalkanoic acid monomers have been reported in PHAs [26] makes PHAs a rich source of fine chemicals. Pure chiral (R)-3-hydroxyalkanoic acids and their derivatives can be effectively generated by chemical or enzymatic hydrolysis (depolymerization) of PHAs [27, 28]. These monomers can be used as building blocks in the production of active chemicals such as pharmaceuticals, vitamins, antibiotics, and pheromones [26, 29].
Mcl-PHAs are accumulated in many bacteria, mainly fluorescent pseudomonads [30]. To date, Pseudomonas putida KT2440 is one of the most widely studied mcl-PHA producers [31]. Three main metabolic pathways providing precursor molecules for mcl-PHA synthesis have been described in bacteria: (1) β-oxidation, which is mainly responsible for fatty acid degradation, is the main route during growth on fatty acids; (2) de novo fatty acid synthesis is the main pathway employed when PHA non-related substrates such as glucose and glycerol are used as carbon and energy source; (3) chain elongation, in which fatty acids are extended with acetyl-CoA [15, 32, 33]. β-oxidation is a cyclic pathway in which the fatty acid chain is shortened, with the release of acetyl-CoA and concomitant energy production in each cycle (Fig. 2). (R)-specific enoyl-CoA hydratase (PhaJ) catalyzes the stereospecific hydration of trans-2-enoyl-CoA to form an (R)-3-hydroxyacyl-CoA. This enzyme was identified as a key supplier of (R)-3-hydroxyacyl-CoA monomers for mcl-PHA synthesis from fatty acids in P. aeruginosa [34–36], P. chlororaphis [37], and P. putida [38, 39]. Additionally, an (R)-specific 3-ketoacyl-CoA reductase (FabG) from P. aeruginosa [40] has been reported to show activity toward 3-ketoacyl-CoA. The last enzyme putatively involved in supplying mcl-PHA monomers is an epimerase that converts (S)-3-hydroxyacyl-CoA substrates to (R)-3-hydroxyacyl-CoA [41].
Four PhaJ homologues (named PhaJ1Pa, PhaJ2Pa, PhaJ3Pa, and PhaJ4Pa) have been identified as the monomer suppliers for PHA synthesis from fatty acids in P. aeruginosa [34, 35]. These enzymes catalyze the stereospecific hydration of trans-2-enoyl coenzyme A (enoyl-CoA), an intermediate of β-oxidation. It was demonstrated that these four PhaJ proteins prefer enoyl-CoAs of different chain lengths and therefore provide different monomers for PHA synthesis. PhaJ1Pa shows high specific activity toward shorter chain-length enoyl-CoA (C4–C6). PhaJ2Pa and PhaJ4Pa exhibit similar substrate specificities for medium-chain-length enoyl-CoA (C6–C12), while PhaJ3Pa has its highest activity with medium (C8) and slightly lower but similar activity with longer (C10 and C12) chain-length enoyl-CoA substrates.
Fig. 2 Biosynthetic pathway of mcl-PHA from fatty acids incorporating β-oxidation. (1) acyl-CoA ligase (FadD); (2) acyl-CoA dehydrogenase (FadE); (3) enoyl-CoA hydratase (FadB); (4) NAD-dependent (S)-3-hydroxyacyl-CoA dehydrogenase (FadB); (5) 3-ketoacyl-CoA thiolase (FadA); (6) (R)-specific enoyl-CoA hydratase (PhaJ); (7) NADPH-dependent 3-ketoacyl-CoA reductase (FabG); (8) 3-hydroxyacyl-CoA epimerase; (9) PHA polymerase (PhaC); (10) PHA depolymerase (PhaZ)
Pseudomonas putida KT2440, whose complete genome is available (http://www.ncbi.nlm.nih.gov), is one of the best-known Gram-negative organisms for medium-chain-length PHA (mcl-PHA) production [42]. P. putida KT2440 exhibits a very high level of genome conservation with P. aeruginosa [43, 44]. It is known that mcl-PHA is produced in P. putida KT2440 mainly through the β-oxidation pathway when fatty acids are used
as a carbon and energy source [45]. Two PhaJ homologues encoded by PP_4552 (PhaJ1) and PP_4817 (PhaJ4) were identified in P. putida KT2440 and their role in PHA accumulation was confirmed [39]. Both P. putida PhaJ homologues show a high preference for medium-chain-length monomers, although PhaJ4 seems to incorporate more 3-hydroxydecanoate (3HD) and 3-hydroxydodecanoate (3HDD) monomers into PHA than PhaJ1 does [39]. In addition, MaoC encoded by PP_0580 shows low amino acid identity with the PhaJ3 homologue from P. aeruginosa [39].
Genome editing technology provides a powerful approach to study the metabolic pathways of cells and the contribution of each associated enzyme to PHA synthesis. The nascent CRISPR (clustered regularly interspaced short palindromic repeats)-associated nuclease Cas9 (CRISPR/Cas9) technology represents a new generation of genome editing tools capable of application in nearly all organisms [46]. The Cas9 protein derived from Streptococcus pyogenes is an RNA-guided nuclease that recognizes and cleaves a specific DNA sequence, and it can easily be programmed to target any desired site of a genome by altering the guide RNA sequence [47–50]. Targeting a specific DNA sequence with a single guide RNA (sgRNA) also requires recognition of a protospacer adjacent motif (PAM) by the Cas9 nuclease [51, 52]. Compared to the traditional genome editing approach based on I-SceI nuclease and homologous recombination [53], CRISPR/Cas9 technology is easier, faster, more efficient, and more accurate [54]. Gene deletion, gene insertion, and gene replacement using CRISPR/Cas9 methods established in P. putida KT2440 can be achieved within 5 days, with a mutation efficiency of >70% [55]. Furthermore, the λRed/Cas9 recombineering method for gene knockout in P. putida KT2440 developed by Cook reached efficiencies of around 85–100% [47].
Despite being a powerful genome editing tool, CRISPR/Cas9 does have some drawbacks. Off-target activity is a crucial issue for CRISPR/Cas9 technology, as it can cause genomic instability and disrupt the function of otherwise normal genes [56]. sgRNAs with minimal off-target effects can be designed using the CRISPR design tool (https://www.synthego.com/products/bioinformatics/crispr-design-tool) provided by Synthego. The genes of interest, phaJ1 (PP_4552), phaJ4 (PP_4817), and maoC (PP_0580), in P. putida KT2440 were scarlessly deleted using modified CRISPR/Cas9 systems and methodology [47] (Fig. 3). All strains, plasmids, primers, and DNA sequences of single guide RNA (sgRNA) used for generating the knockout mutants are listed in Tables 1, 2, 3, and 4, respectively. A single guide RNA (sgRNA) is an RNA molecule that can direct the Cas9 nuclease to bind and cleave a particular DNA sequence for
Fig. 3 CRISPR/Cas9 system for P. putida KT2440 gene knocking out [47]
Table 1 Strains used for the generation of P. putida KT2440 mutants
Strains | Description/genotype | Source or reference
P. putida KT2440 | Wild type | Lab strain
E. coli DH5α | General cloning strain, endA1, recA1, φ80dlacZΔM15 | Novagen
E. coli DH5α λpir | supE44, ΔlacU169 (ΦlacZΔM15), recA1, endA1, hsdR17, thi-1, gyrA96, relA1, λpir phage lysogen | Novagen
E. coli HB101/pRK600 | Helper strain in a tripartite conjugation, ChlR | Novagen
genome editing. The sgRNAs were designed using Synthego CRISPR Design Tool (https://design.synthego.com/#/) to target the sequence of a specific P. putida KT2440 gene to be deleted. The CRISPR/Cas9 systems used in this study consist of three plasmids (Fig. 3 and Table 2): pCas9/λRed vector was used to provide constitutively expressed Cas9 nuclease and L-arabinose inducible λRed recombinases; pKnock suicide vector was used to
Table 2 Plasmids used for the generation of P. putida KT2440 mutants
Plasmids | Description/genotype | Source or reference
pCas9/λRed | Recombineering plasmid with constitutively expressed cas9 and the araBAD promoter expressing αβγ, GentR | Addgene
pKnock | Suicide plasmid for gene knockout, KanR | Addgene
pgRNAtet | Guide RNA plasmid, TetR | Addgene
Table 3 The list of primers used for generation of P. putida KT2440 mutants
Primers | Sequence (5′→3′)
(a)
pKnock_For | CTGCAGGAATTCGATATCAAGC
pKnock_Rev | GGATCCACTAGTTCTAGAGCGG
pKnock Flank Insert_For | GCGGCCGCTCTAGAACTAGTGGAT
pKnock Flank Insert_Rev | CGACGGTATCGATAAGCTTGATATCGAATTCCTGC
pgRNA(3.1 kb)_For | AAGTGGCACCGAGTCGGTGCTTTTTTT
pgRNA(3.1 kb)_Rev | GCACTAGTATTATACCTAGGACTGAGCTAGCTGTC
pgRNA(500 bp)_For | AAGTGGCACCGAGTCGGTGCTTTTTTT
pgRNA(500 bp)_Rev | GCCGTATTACCGCCTTTGAGTGAGC
(b)
phaJ1_US_For | GCTCTAGAACTAGTGGATCCACTGCCGAACTGGGCAAC
phaJ1_US_Rev | TCACACTTCACGAGGCTTCCTTAAGCGTTGGG
phaJ1_DS_For | AGCCTCGATGTGAAGTGTGATGCAGCCAGC
phaJ1_DS_Rev | TTGATATCGAATTCCTGCAGTGATCTACCTGGTGTTCAACGAGG
phaJ4_US_For | GCTCTAGAACTAGTGGATCCGCCGCAGGTTAAGATAGAGCG
phaJ4_US_Rev | GCATGTATCACATCGCGGACTCTCCGGG
phaJ4_DS_For | GTCCGCGATGTGATACATGCCCGGGGAGC
phaJ4_DS_Rev | TTGATATCGAATTCCTGCAGTGCATCGCCGGCATGTGC
maoC_US_For | GCTCTAGAACTAGTGGATCCTTGTAGAAGGCCAGCGATAACC
maoC_US_Rev | CTTGAACATGTGGGAGAGCCTGTAGGGC
maoC_DS_For | GGCTCTCCCACATGTTCAAGCCCCCATCAG
maoC_DS_Rev | TTGATATCGAATTCCTGCAGAGAGTGCCTTCATCTCTGGC
(c)
phaJ1_Flank_For | TCTGAGCGAGGCCGGCTTT
phaJ1_Flank_Rev | AGGTACCGGAGCTGAGTGAACTGC
phaJ1_Internal_For | ATGTCCCAGGTCACCAACACGCCTTA
phaJ1_Internal_Rev | TCAGCTCGCCACAAAGTTCGGC
phaJ4_Flank_For | GTAGCTCTGTAACCGGTACATGGT
phaJ4_Flank_Rev | CAGCCAAGCTGGAGACCGTTT
phaJ4_Internal_For | AAGATCGACCAGCAGCGCATCAACC
phaJ4_Internal_Rev | AGCTTTACCTTCAGCCGAACCCGG
maoC_Flank_For | GCGGCCGCTCTAGAACTAGTGGAT
maoC_Flank_Rev | CGACGGTATCGATAAGCTTGATATCGAATTCCTGC
maoC_Internal_For | AAGGCTTTGCCACTTTCCCGATGA
maoC_Internal_Rev | TCAAAGGCATAGCCACTGTGCGG
(d)
pgRNA_phaJ1_For | GTCCTAGGTATAATACTAGTGAGGGCTTCGTAAGGCGTGT
pgRNA_phaJ4_For | GTCCTAGGTATAATACTAGTAGCTCTGTAACCGGTACATG
pgRNA_maoC_For | GTCCTAGGTATAATACTAGTGCTTGAACATGAGCCGACAA
carry the repair template that integrates into the genome of P. putida KT2440 to generate the knockout; pgRNA vector was used to constitutively express the sgRNA.
2 Materials
1. Gas Chromatograph/Mass Selective Detector equipped with an HP-5MS column (30 m × 250 μm, 0.25 μm thick film phase, Agilent Technologies, USA).
2. 100 mL of 300 mM sucrose.
3. Alkaline polyethylene glycol (PEG) reagent: 20 mM KOH and 60% PEG (w/v), pH 13.
4. 5 mL of 20% (w/v) L-arabinose stock solution.
5. 100 mL of 33.7 g/L (19.5 gC/L) sodium octanoate stock solution.
6. 50 mg/mL kanamycin stock solution.
Table 4 The DNA sequence of sgRNA used for the generation of P. putida KT2440 mutants
sgRNA | Sequence (5′→3′)
sgRNA_phaJ1 | GTCCTAGGTATAATACTAGTGAGGGCTTCGTAAGGCGTGTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTT
sgRNA_phaJ4 | GTCCTAGGTATAATACTAGTAGCTCTGTAACCGGTACATGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTT
sgRNA_maoC | GTCCTAGGTATAATACTAGTGCTTGAACATGAGCCGACAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTT
Note: the sequence for each gene in the list is composed of the DNA sequence of the sgRNA (underlined in the original) and of the pgRNA vector sequence. The oligonucleotides were synthesized by Merck
7. 35 mg/mL gentamycin stock solution.
8. 10 mg/mL tetracycline stock solution.
9. 25 mg/mL tetracycline stock solution.
10. 50 mg/mL chloramphenicol stock solution.
11. Acidified methanol (85% (v/v) methanol with 15% (v/v) sulfuric acid).
12. Chloroform with 6 mg/L benzoate methyl ester required for GC analysis.
2.1 Medium and Strains Used for Generation of P. putida KT2440 Mutants
For each specific section, the following materials are needed.
1. Strains used in this section: P. putida KT2440, E. coli DH5α, E. coli DH5α λpir, E. coli HB101/pRK600 (Table 1).
2. Plasmids used in this section: pCas9/λRed, pKnock, and pgRNAtet (Table 2).
3. High fidelity DNA polymerase.
4. NEBuilder HiFi DNA Assembly Master Mix Kit (New England Biolabs Inc., UK).
5. DNA gel extraction kit.
6. Plasmid DNA miniprep kit.
7. Luria-Bertani (LB) liquid media containing 5 g/L yeast extract, 10 g/L tryptone, and 5 g/L NaCl.
8. LB agar media containing 5 g/L yeast extract, 10 g/L tryptone, 5 g/L NaCl, and 15 g/L agar.
9. Pseudomonas isolation agar media containing 45.03 g/L Pseudomonas isolation agar powder and 20 mL/L glycerol.
10. After autoclaving at 121 °C for 15 min, the medium was supplemented with kanamycin to a final concentration of 50 μg/mL (Km 50), carbenicillin (50 μg/mL, Carb 50), gentamicin (35 μg/mL, Gent 35), or tetracycline (25 μg/mL, Tet 25 for P. putida; 10 μg/mL, Tet 10 for E. coli) if required.
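The working concentrations in item 10 are 1000-fold dilutions of the stock solutions listed earlier in the Materials. A minimal Python sketch of the dilution arithmetic follows. The function name and the culture volumes are hypothetical, and pairing each stock with the working concentration of the same numerical value (for example, the 25 mg/mL tetracycline stock for Tet 25) is an assumption made here for illustration.

```python
def stock_volume_uL(final_ug_per_mL, stock_mg_per_mL, medium_mL):
    """Volume of antibiotic stock (uL) needed for a given volume of medium.

    ug/mL * mL gives ug of antibiotic; dividing by mg/mL yields 1e-3 mL,
    which is numerically the volume in uL.
    """
    return final_ug_per_mL * medium_mL / stock_mg_per_mL


# (final ug/mL, stock mg/mL), assuming like-numbered stocks are paired.
ANTIBIOTICS = {
    "Km 50": (50.0, 50.0),
    "Gent 35": (35.0, 35.0),
    "Tet 25 (P. putida)": (25.0, 25.0),
    "Tet 10 (E. coli)": (10.0, 10.0),
}

medium_mL = 100.0  # hypothetical batch of medium
for label, (final, stock) in ANTIBIOTICS.items():
    print(f"{label}: add {stock_volume_uL(final, stock, medium_mL):.0f} uL stock to {medium_mL:.0f} mL")
```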
2.2 Medium Used for PHA Accumulation
1. Strains used in this section: P. putida KT2440 and gene deletion mutants of P. putida KT2440.
2. Minimal Salts Medium (MSM) containing 9 g/L Na3PO4∙12H2O, 1.5 g/L KH2PO4, 1 g/L NH4Cl (non-limited conditions; MSMfull) or 0.25 g/L NH4Cl (nitrogen (N)-limitation; MSMlim), 1 mL MgSO4∙7H2O (1 M stock solution; added after autoclaving), and 1 mL trace elements (4 g/L ZnSO4∙7H2O, 1 g/L MnCl2∙4H2O, 0.2 g/L Na2B4O7∙10H2O, 0.3 g/L NiCl2∙6H2O, 1 g/L Na2MoO4∙2H2O, 1 g/L CuCl2∙2H2O, 7.6 g/L FeSO4∙7H2O; added after autoclaving).
3 Methods
3.1 DNA Techniques
3.1.1 Genomic DNA (gDNA) Isolation
1. Grow the wild-type P. putida KT2440 in LB medium at 30 °C with shaking at 200 rpm overnight (around 16 h).
2. Harvest the cells in a 2 mL microcentrifuge tube by centrifugation for 1 min at 12,000 × g. Discard the supernatant.
3. Use the cell pellets to isolate genomic DNA (gDNA) with a genomic DNA purification kit.
4. Visualize the extracted gDNA by running it on a 1% ethidium bromide-stained agarose gel with an appropriate DNA ladder (i.e., showing bands of 20 kbp) and inspect the DNA quality (i.e., no smear).
3.1.2 Polymerase Chain Reaction (PCR)
1. Amplify the genes of interest by polymerase chain reaction (PCR) in a total volume of 25 μL; the reaction mix contains 12.5 μL of a high-fidelity polymerase 2× Master Mix, 0.5 μM of each primer (stock concentration 50 pmol/μL), and 1–10 ng DNA template (gDNA or plasmid).
2. The PCR program consists of one cycle of initial denaturation at 98 °C for 30 s, then 30 cycles of denaturation at 98 °C for 10 s, annealing at 56–65 °C for 20 s, and elongation at 72 °C for 15 s to 2 min (20 s/kb), followed by a final extension at 72 °C for 2 min. Adapt the annealing temperature and elongation time according to the specific primer melting temperature (Tm) and DNA product size.
3. Visualize the PCR product on a 1% (w/v) agarose gel, stained with 1 μg/mL ethidium bromide, with an appropriate DNA ladder to estimate the size of the DNA. If non-specific PCR products are obtained, excise the band of the correct size and purify using a DNA gel extraction kit.
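The pipetting volumes and elongation time implied by steps 1 and 2 can be worked out with a short Python sketch. It is illustrative only: the function names and the 1 μL template volume are hypothetical, and treating 15 s as the minimum elongation time is one reading of the stated "15 s to 2 min (20 s/kb)" range.

```python
def primer_volume_uL(final_uM, stock_uM, reaction_uL):
    """Primer stock volume for a target final concentration (C1*V1 = C2*V2)."""
    return final_uM * reaction_uL / stock_uM


def elongation_time_s(product_bp, seconds_per_kb=20.0, minimum_s=15.0):
    """Elongation time at the stated rate, never shorter than the minimum."""
    return max(minimum_s, seconds_per_kb * product_bp / 1000.0)


REACTION_UL = 25.0
MASTER_MIX_UL = 12.5     # 2x master mix, half of the reaction volume
PRIMER_STOCK_UM = 50.0   # 50 pmol/uL is the same as 50 uM

primer_uL = primer_volume_uL(0.5, PRIMER_STOCK_UM, REACTION_UL)
template_uL = 1.0        # hypothetical volume carrying 1-10 ng of template
water_uL = REACTION_UL - MASTER_MIX_UL - 2 * primer_uL - template_uL

print(f"Each primer: {primer_uL} uL, water: {water_uL} uL")
print(f"Elongation for an ~800 bp flank: {elongation_time_s(800)} s")
```

Because 0.25 μL is difficult to pipette accurately, a working dilution of the primer stock or a master mix for several reactions may be more practical.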
3.1.3 DNA Assembly
1. Assemble DNA fragments in a 10 μL total volume reaction containing 5 μL of DNA Assembly Master Mix, 50 ng linearized vector (obtained by PCR amplification of the vector backbone and appropriate primers), and 100 ng DNA fragment of interest.
2. Incubate the reaction at 50 °C for 15 min, followed by cooling on ice or at −20 °C for subsequent transformation.
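The masses in step 1 can be converted to molar amounts to check the insert:vector ratio, using the common approximation of 650 Da per base pair of double-stranded DNA. The Python sketch below is illustrative: the insert length of about 1600 bp assumes two joined flanking regions of about 800 bp each, and the vector length is a hypothetical placeholder that should be replaced with the actual size of the linearized backbone.

```python
AVG_DA_PER_BP = 650.0  # approximate average mass of one base pair of dsDNA


def dsdna_pmol(mass_ng, length_bp):
    """Convert a mass of double-stranded DNA (ng) into picomoles."""
    return mass_ng * 1000.0 / (length_bp * AVG_DA_PER_BP)


insert_bp = 1600   # approximate: upstream + downstream flanks of ~800 bp each
vector_bp = 2000   # hypothetical backbone length, for illustration only

insert_pmol = dsdna_pmol(100.0, insert_bp)  # 100 ng fragment of interest
vector_pmol = dsdna_pmol(50.0, vector_bp)   # 50 ng linearized vector
print(f"insert: {insert_pmol:.3f} pmol, vector: {vector_pmol:.3f} pmol")
print(f"insert:vector molar ratio = {insert_pmol / vector_pmol:.1f}:1")
```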
3.1.4 Transformation
1. Prepare the chemically competent E. coli cells by washing the prechilled culture of E. coli collected at exponential phase (OD600nm of 0.3–0.5) via resuspending pellets in 10 mL 0.1 M calcium chloride solution, followed by 30 min incubation on ice and 15 min centrifugation at 2377 × g at 4 °C.
2. Discard the supernatant and resuspend the cells in 3 mL 0.1 M calcium chloride and 1 mL sterile 50% glycerol. These can be aliquoted into 100 μL stocks and stored at −80 °C.
3. Thaw the competent cells on ice and then gently mix with 10 μL of chilled HiFi assembly product or extracted plasmids (< 2 ng).
4. Incubate the mixture on ice for 30 min.
5. Transfer the tubes with transformation mixtures to a water bath for heat shock at 42 °C for 90 s, followed by 2 min incubation on ice.
6. Transfer the mixture to fresh tubes with 200 μL LB medium and incubate at 37 °C with shaking at 200 rpm to recover the cells.
7. Spread 100 μL on an appropriate selective plate, and incubate at 37 °C for 16–24 h, or until colonies appear.
3.1.5 Colony PCR
1. Pick the colonies from solid medium and lyse in 50 μL of alkaline polyethylene glycol (PEG) reagent via 15 min incubation at room temperature.
2. Add 1 μL of lysate to 10 μL of PCR mixture with PCR Master mix and 0.5 μM of each primer (50 pmol/μL stock concentration).
3. PCR conditions: one cycle of initial denaturation at 95 °C for 5 min, then 30 cycles of denaturation at 95 °C for 40 s, annealing at 56–65 °C for 30 s, elongation at 68 °C for 30 s to 4 min (60 s/kb) and finally an incubation at 68 °C for 10 min. Adapt annealing temperature and elongation time appropriately according to specific primer melting temperature (Tm) and DNA product size.
4. Separate and visualize the PCR product on 1% (w/v) agarose gel, stained with 1 μg/mL ethidium bromide. Use an appropriate size DNA Ladder to estimate the size of DNA.
3.2 The Construction of Recombinant pKnock_USDS Vectors
1. Primers required for this work (Table 3a–d) can be designed using the NEBuilder Assembly Tool (https://nebuilderv1.neb.com/).
2. Amplify the ~800 bp upstream and downstream flanking regions of the target genes by PCR (see Subheading 3.1.2) using the specific primer pairs gene_US_For/gene_US_Rev and gene_DS_For/gene_DS_Rev, respectively (Table 3b).
3. Excise the PCR products of the correct size from a 1% agarose gel and purify using a gel extraction kit.
4. Integrate the upstream and downstream fragments into one single DNA fragment by performing PCR (see Subheading
3.1.2) using the specific primer pair gene_US_For/gene_DS_Rev (Table 3b) and with 10 ng of each purified upstream and downstream DNA fragment as template.
5. Assemble the generated US_DS fragment with the pKnock suicide vector, previously linearized by PCR using the primer pair pKnock_For and pKnock_Rev (Table 3a), using DNA assembly (see Subheading 3.1.3).
6. Transform the resulting recombinant plasmids into competent cells of E. coli DH5α λpir (see Note 1). Use LB Km50 plates incubated at 30 °C to select the transformants and screen by colony PCR (see Subheading 3.1.5) using the specific primer pair gene_US_For and gene_DS_Rev (Table 3b).
7. Extract the plasmids from positive E. coli DH5α λpir clones using a plasmid miniprep kit and verify by sequencing.
3.3 Synthesis of sgRNA
1. Design an sgRNA for each target gene using the CRISPR design tool (https://www.synthego.com/products/bioinformatics/crispr-design-tool) provided by Synthego.
2. Synthesize the corresponding DNA fragment (20 bp) as a DNA oligonucleotide and clone into pgRNA (see Note 2).
3. Transform E. coli DH5α with the pgRNA construct and select on an LB Tet10 plate at 37 °C.
4. Verify the correct DNA insertions in the pgRNA vector by colony PCR (see Subheading 3.1.5) using the specific primer pair pgRNA_gene_For/pgRNA(500 bp)_Rev (Table 3a and d).
5. Isolate the constructs from positive E. coli DH5α clones using a plasmid miniprep kit and verify by sequencing.
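As a consistency check on the oligonucleotides, the sequences in Table 3d and Table 4 can be compared programmatically: each pgRNA_gene_For primer consists of a shared 20-nt 5′ segment followed by a 20-nt gene-specific sequence, and the Table 4 entries extend this with a shared 3′ segment. The Python sketch below rebuilds the Table 4 sequences from the primers. The names VECTOR_5PRIME, VECTOR_3PRIME, and both functions are labels invented for this example, and the split into segments is inferred from comparing the two tables rather than stated in the protocol.

```python
# Shared segments inferred by aligning Table 3d primers with Table 4 entries.
VECTOR_5PRIME = "GTCCTAGGTATAATACTAGT"
VECTOR_3PRIME = (
    "GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTT"
    "GAAAAAGTGGCACCGAGTCGGTGCTT"
)

# Forward primers copied from Table 3d.
PRIMERS = {
    "phaJ1": "GTCCTAGGTATAATACTAGTGAGGGCTTCGTAAGGCGTGT",
    "phaJ4": "GTCCTAGGTATAATACTAGTAGCTCTGTAACCGGTACATG",
    "maoC": "GTCCTAGGTATAATACTAGTGCTTGAACATGAGCCGACAA",
}


def target_from_primer(primer):
    """Return the 20-nt gene-specific part of a pgRNA_gene_For primer."""
    if not primer.startswith(VECTOR_5PRIME):
        raise ValueError("primer lacks the shared 5' segment")
    target = primer[len(VECTOR_5PRIME):]
    if len(target) != 20 or set(target) - set("ACGT"):
        raise ValueError("expected a 20-nt target made of A, C, G, T")
    return target


def sgrna_fragment(target):
    """Assemble the full sequence as laid out in Table 4."""
    return VECTOR_5PRIME + target + VECTOR_3PRIME


for gene, primer in PRIMERS.items():
    target = target_from_primer(primer)
    print(gene, target, len(sgrna_fragment(target)))
```

A check of this kind catches copy-and-paste errors before ordering oligonucleotides, but it does not assess off-target activity or PAM placement, which still require the design tool and the genome sequence.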
3.4 Conjugation of pKnock_USDS_gene and Transformation of pgRNA_Gene into P. putida KT2440
1. Prepare overnight cultures of the helper strain E. coli HB101/pRK600 (expressing mobilization and transfer genes involved in transferring DNA into the recipient strain), the donor strain E. coli DH5α λpir (harboring the recombinant pKnock vector of interest), and the recipient strain P. putida KT2440 by growing cells in 3 mL LB medium supplemented with 50 μg/mL chloramphenicol, 50 μg/mL kanamycin, and 50 μg/mL carbenicillin, respectively. Cultivate E. coli and P. putida KT2440 at 37 °C and 30 °C, respectively, with shaking at 200 rpm.
2. Combine the overnight cultures of donor strain (50 μL), helper strain (50 μL), and recipient strain (200 μL) in a sterile 1.5 mL Eppendorf tube. Mix the culture by pipetting.
3. Spread the entire mixture on a non-selective LB agar plate.
4. Incubate overnight at 30 °C, and wash the conjugation mixture off the plate with 2 mL of LB medium.
5. Prepare a dilution series (10-, 100-, and 1000-fold dilutions) with LB and spread 200 μL of each dilution on a Pseudomonas
354
Si Liu et al.
isolation agar Km50 plate. Incubate the plate at 30 C until the colonies appear. 6. Verify the correct integration of pKnock_USDS_gene plasmid to P. putidaKT2440 chromosome by colony PCR (see Subheading 3.1.4) using two different pairs of primers: one genome- and one vector-specific primer (gene_Flank_For/ pKnock Flank Insert_ Rev. and pKnock Flank Insert_ For/ gene_Flank_For) (Table 3c). 7. Grow the positive colonies in 3 mL LB medium supplemented with 50 μg/mL of kanamycin at 30 C with 200 rpm shaking for overnight. 8. Harvest the cells from 3 mL overnight culture via centrifuging at 5752 g, 4 C for 3 min. 9. Wash the cell pellets twice with 5 mL of sterile 300 mM sucrose at room temperature and finally resuspend in 100 μL of 300 mM sucrose. 10. Transfer the mixture of 100 μL cells and 100 ng pCas9/λRed vector into a prechilled 2 mm gap width electroporation cuvette, followed by electroporation at a voltage of 2.5 kV. 11. Transfer the cells into 1 mL LB in a sterile 13 mL tube and incubate at 30 C, 200 rpm for 2 h. 12. Harvest the cells by centrifuging at 5752 g, at room temperature, for 1 min. Discard 700 μL of supernatant, resuspend cells with residual medium. 13. Spread 100 μL cell resuspension on LB Km50/Gent35 agar plate and incubate at 30 C until colonies appear. 14. Select a colony and cultivate overnight in 3 mL LB medium supplemented with 35 μg/mL gentamycin and 50 μg/mL kanamycin at 30 C, 200 rpm. 15. Induce λRed gene on pCas9/λRed vector by adding 2% (w/v) L-arabinose at 30 C, 200 rpm for 2 h. 16. Transform the cells with 50 ng of pgRNA_gene by electroporation as described in steps 7–13. 17. Select cells on LB Gent35/Tet25 plate at 30 C until colonies appear. 3.5 Selection of the Deletion Mutants
1. Screen the colonies for the loss of kanamycin resistance by streaking a single colony on both LB Km50 and LB plates (see Note 3).
2. Verify the loss of kanamycin resistance by colony PCR (see Subheading 3.1.4) using primers flanking the gene of interest (Table 3c).
3. Re-confirm the positive colonies by a second colony PCR (see Subheading 3.1.4) with specific primers internal to the sequence of the target gene (Table 3c).
4. Grow the cells overnight in LB without any antibiotics and streak on a non-selective LB plate to cure the pCas9 and pgRNA vectors.
5. Verify the loss of gentamycin and tetracycline resistance by screening single colonies of the generated mutants on LB Gent35 and LB Tet25 plates.
3.6 Growth and PHA Accumulation by the Deletion Mutants
1. Inoculate a single colony of P. putida KT2440 wild type or a P. putida KT2440 deletion mutant in 3 mL MSMfull supplemented with 1.95 gC/L sodium octanoate and cultivate for 16 h at 30 °C with shaking at 200 rpm.
2. Dilute the overnight cultures with MSMfull to an OD600nm of 1 and transfer into 250 mL Erlenmeyer flasks containing 50 mL MSMfull or MSMlim medium supplemented with 1.95 gC/L sodium octanoate.
3. Incubate the flasks at 30 °C with shaking at 200 rpm for 48 h.
4. Harvest the cells by centrifugation at 1878 × g at room temperature for 10 min, and discard the supernatant.
5. Lyophilize the cells and weigh them to determine the cell dry weight (CDW).
6. Transfer approximately 5–10 mg of dried cellular material into a 15 mL Pyrex® test tube for acidic methanolysis [57].
7. Resuspend the cells in 2 mL of acidified methanol (containing 15% H2SO4, v/v) and 2 mL of chloroform containing 91 mg/mL of methyl benzoate as an internal standard.
8. Incubate the mixture for 3 h at 100 °C in an oil bath. Remove the tubes from the oil bath and wipe them with tissue. Leave the tubes at room temperature to cool.
9. Incubate the tubes on ice for at least 2 min.
10. Add 1 mL of deionized water to each sample and vortex vigorously for 30 s.
11. Centrifuge at 1878 × g for 5 min to allow phase separation.
12. Remove the lower phase and pass it through cotton wool to dry.
13. Assay the (R)-3-hydroxyalkanoic acid (R3HA) methyl esters using a Gas Chromatograph/Mass Selective Detector equipped with an HP-5MS column (30 m × 250 μm, 0.25 μm film thickness) with an oven method of 50 °C for 3 min, increasing by 10 °C/min to 250 °C and holding at this temperature for 1 min. Use commercially available R3HA methyl esters (Bioplastech Ltd., Ireland) as standards for the PHA samples.
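Two small calculations are implicit in the steps above: the dilution of the overnight culture to an OD600nm of 1 (step 2) and the length of one GC run under the oven method in step 13. The sketch below works through both. The function names, the measured OD of 4.0, and the 2 mL final volume are hypothetical placeholders, and the dilution assumes that OD scales linearly with cell density, which generally holds only after the culture has been diluted into the linear range of the spectrophotometer.

```python
def dilution_volumes(od_measured, od_target, final_volume_ml):
    """Return (culture_ml, medium_ml) needed to reach od_target, using C1*V1 = C2*V2."""
    if od_measured < od_target:
        raise ValueError("culture is already below the target OD")
    culture_ml = od_target * final_volume_ml / od_measured
    return culture_ml, final_volume_ml - culture_ml


def oven_program_minutes(start_c, hold_start_min, ramp_c_per_min, end_c, hold_end_min):
    """Total run time of a single-ramp GC oven program, in minutes."""
    ramp_min = (end_c - start_c) / ramp_c_per_min
    return hold_start_min + ramp_min + hold_end_min


if __name__ == "__main__":
    # Hypothetical overnight culture at OD600 = 4.0, diluted to OD600 = 1 in 2 mL.
    culture, medium = dilution_volumes(4.0, 1.0, 2.0)
    print(f"{culture:.2f} mL culture + {medium:.2f} mL MSMfull")
    # Oven method from step 13: 50 C for 3 min, 10 C/min to 250 C, hold 1 min.
    print(oven_program_minutes(50, 3, 10, 250, 1), "min per injection")
```

With the oven method given in step 13, the ramp from 50 °C to 250 °C takes 20 min, so each injection occupies 24 min of oven time before any cool-down.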
4 Notes
1. Replication of the pKnock suicide vector requires the π protein encoded by the pir gene. Thus, E. coli DH5α λpir was used to obtain a high copy number of the recombinant pKnock vector.
2. The sgRNA sequence is very short, which makes it difficult to insert into the pgRNA vector. This can be overcome by synthesizing a longer DNA sequence: we synthesized a 118 bp DNA fragment composed of the 20 bp sequence of the designed sgRNA and 98 bp of sequence homologous to the pgRNA vector (Table 4). To increase the efficiency of the assembly with the pgRNA vector, this 118 bp sequence was overlapped and integrated with a 500 bp fragment of the pgRNA sequence, which was obtained by PCR (see Subheading 3.1.2) using the primer pair pgRNA(500 bp)_For/pgRNA(500 bp)_Rev (Table 3a). The overlap PCR integrating sgRNA(118 bp) with pgRNA(500 bp) was performed (see Subheading 3.1.2) using the primer pair pgRNA_gene_For/pgRNA(500 bp)_Rev (Table 3a and d) with 2 ng of sgRNA(118 bp) and 10 ng of pgRNA(500 bp) as template. The resulting DNA fragment sgRNA(118 bp)-pgRNA(500 bp) was purified from a 1% agarose gel and assembled (see Subheading 3.1.3) with a 3.1 kb pgRNA fragment obtained by amplification with the specific primer pair pgRNA(3.1 kb)_For/pgRNA(3.1 kb)_Rev (Table 3a). The designed pgRNA construct was then introduced into E. coli as described in the protocol.
3. In this specific procedure, it was relatively easy to cure the plasmids by simply cultivating the cells without selective pressure, i.e., without antibiotic. However, curing sometimes requires agents such as ethidium bromide, acridine orange, or sodium dodecyl sulfate (SDS), an elevated growth temperature, or other treatments.
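The fragment layout described in Note 2 can be checked before ordering the oligonucleotide. The sketch below joins a 20-nt spacer to its pgRNA homology and confirms the 118 bp total. How the 98 bp of homology is split between the two sides of the spacer is given in Table 4 and is not reproduced here, so the two-arm arrangement, the function name, and the placeholder sequences are all assumptions made for illustration.

```python
def build_sgrna_fragment(spacer, upstream_homology, downstream_homology,
                         spacer_len=20, homology_len=98):
    """Join homology arms around the spacer and check the lengths given in Note 2.

    The upstream/downstream split of the homology is an assumption of this
    sketch; only the totals (20 nt spacer, 98 nt homology) come from the text.
    """
    parts = [upstream_homology.upper(), spacer.upper(), downstream_homology.upper()]
    for part in parts:
        if set(part) - set("ACGT"):
            raise ValueError("sequences may contain only A, C, G and T")
    if len(parts[1]) != spacer_len:
        raise ValueError(f"spacer must be {spacer_len} nt, got {len(parts[1])}")
    total_homology = len(parts[0]) + len(parts[2])
    if total_homology != homology_len:
        raise ValueError(f"homology must total {homology_len} nt, got {total_homology}")
    return "".join(parts)


if __name__ == "__main__":
    # Placeholder sequences only; real arms must be taken from the pgRNA vector.
    fragment = build_sgrna_fragment("ACGT" * 5, "A" * 49, "C" * 49)
    print(len(fragment), "bp")
```

Failing early on a wrong length is cheaper than discovering a truncated homology arm after a failed assembly.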
References
1. Lemoigne M (1925) Etudes sur l’autolyse microbienne acidification par formation d’acide β-oxybutyrique. Ann Inst Pasteur 39:144–173
2. Jendrossek D (2007) Peculiarities of PHA granules preparation and PHA depolymerase activity determination. Appl Microbiol Biotechnol 74(6):1186–1196
3. Braunegg G, Genser K, Bona R et al (1999) Production of PHAs from agricultural waste material. In: Macromolecular symposia, vol 1. Wiley Online Library, pp 375–383
4. Ojumu T, Yu J, Solomon B (2004) Production of polyhydroxyalkanoates, a bacterial biodegradable polymers. Afr J Biotechnol 3(1):18–24
5. Rehm BH (2003) Polyester synthases: natural catalysts for plastics. Biochem J 376(1):15–33
6. Kootstra M, Elissen H, Huurman S (2017) PHA’s (polyhydroxyalkanoates): general information on structure and raw materials for their production. A running document for “Kleinschalige Bioraffinage WP9: PHA”. Task 5. Wageningen Plant Research report 727. Wageningen UR, PPO/Acrres
7. Ishii-Hyakutake M, Mizuno S, Tsuge T (2018) Biosynthesis and characteristics of aromatic polyhydroxyalkanoates. Polymers 10(11):1267
8. Ong SY, Zainab LI, Pyary S et al (2018) A novel biological recovery approach for PHA employing selective digestion of bacterial biomass in animals. Appl Microbiol Biotechnol 102(5):2117–2127
9. Barnard GN, Sanders J (1989) The poly-beta-hydroxybutyrate granule in vivo. A new insight based on NMR spectroscopy of whole cells. J Biol Chem 264(6):3286–3291
10. Amache R, Sukan A, Safari M et al (2013) Advances in PHAs production. Chem Eng Trans 32:931–936
11. Leong YK, Show PL, Ooi CW et al (2014) Current trends in polyhydroxyalkanoates (PHAs) biosynthesis: insights from the recombinant Escherichia coli. J Biotechnol 180:52–65
12. Wang YJ, Hua FL, Tsang YF et al (2007) Synthesis of PHAs from waster under various C:N ratios. Bioresour Technol 98(8):1690–1693
13. Muhammadi S, Afzal M et al (2015) Bacterial polyhydroxyalkanoates-eco-friendly next generation plastic: production, biocompatibility, biodegradation, physical properties and applications. Green Chem Lett Rev 8(3–4):56–77
14. Foster LJ (2007) Biosynthesis, properties and potential of natural-synthetic hybrids of polyhydroxyalkanoates and polyethylene glycols. Appl Microbiol Biotechnol 75(6):1241–1247
15. Kim DY, Kim HW, Chung MG et al (2007) Biosynthesis, modification, and biodegradation of bacterial medium-chain-length polyhydroxyalkanoates. J Microbiol 45(2):87–97
16. Blank LM, Narancic T, Mampel J et al (2020) Biotechnological upcycling of plastic waste and other non-conventional feedstocks in a circular economy. Curr Opin Biotechnol 62:212–219
17. Kenny ST, Runic JN, Kaminsky W et al (2008) Up-cycling of PET (polyethylene terephthalate) to the biodegradable plastic PHA (polyhydroxyalkanoate). Environ Sci Technol 42(20):7696–7701
18. Guzik MW, Kenny ST, Duane GF et al (2014) Conversion of post consumer polyethylene to the biodegradable polymer polyhydroxyalkanoate. Appl Microbiol Biotechnol 98(9):4223–4232
19. Ward PG, Goff M, Donner M et al (2006) A two step chemo-biotechnological conversion of polystyrene to a biodegradable thermoplastic. Environ Sci Technol 40(7):2433–2437
20. Ruiz C, Kenny ST, Babu PR et al (2019) High cell density conversion of hydrolysed waste cooking oil fatty acids into medium chain length polyhydroxyalkanoate using Pseudomonas putida KT2440. Catalysts 9(5)
21. Ruiz C, Kenny ST, Narancic T et al (2019) Conversion of waste cooking oil into medium chain polyhydroxyalkanoates in a high cell density fermentation. J Biotechnol 306:9–15
22. Amelia TSM, Govindasamy S, Tamothran AM et al (2019) Applications of PHA in agriculture. In: Biotechnological applications of polyhydroxyalkanoates. Springer, pp 347–361
23. Luckachan GE, Pillai C (2011) Biodegradable polymers—a review on recent trends and emerging perspectives. J Polym Environ 19(3):637–676
24. Tan G-YA, Chen C-L, Li L et al (2014) Start a research on biopolymer polyhydroxyalkanoate (PHA): a review. Polymers 6(3):706–754
25. Manavitehrani I, Fathi A, Badr H et al (2016) Biomedical applications of biodegradable polyesters. Polymers 8(1):20
26. Gao X, Chen J-C, Wu Q et al (2011) Polyhydroxyalkanoates as a source of chemicals, polymers, and biofuels. Curr Opin Biotechnol 22(6):768–774
27. de Roo G, Kellerhals MB, Ren Q et al (2002) Production of chiral R-3-hydroxyalkanoic acids and R-3-hydroxyalkanoic acid methylesters via hydrolytic degradation of polyhydroxyalkanoate synthesized by pseudomonads. Biotechnol Bioeng 77(6):717–722
28. Chen G-Q, Wu Q (2005) Microbial production and applications of chiral hydroxyalkanoates. Appl Microbiol Biotechnol 67(5):592–599
29. De Roo G (2002) Physiological basis of polyhydroxyalkanoate metabolism in Pseudomonas putida. ETH Zurich
30. Ward PG (2004) Polyhydroxyalkanoate accumulation by Pseudomonas putida CA-3. University College Dublin
31. Prieto A, Escapa IF, Martínez V et al (2016) A holistic view of polyhydroxyalkanoate metabolism in Pseudomonas putida. Environ Microbiol 18(2):341–357
32. Rehm BH, Kruger N, Steinbuchel A (1998) A new metabolic link between fatty acid de novo synthesis and polyhydroxyalkanoic acid synthesis. The PHAG gene from Pseudomonas putida KT2440 encodes a 3-hydroxyacyl-acyl carrier protein-coenzyme A transferase. J Biol Chem 273(37):24044–24051
33. Witholt B, Kessler B (1999) Perspectives of medium chain length poly(hydroxyalkanoates), a versatile set of bacterial bioplastics. Curr Opin Biotechnol 10(3):279–285
34. Tsuge T, Fukui T, Matsusaki H et al (2000) Molecular cloning of two (R)-specific enoyl-CoA hydratase genes from Pseudomonas aeruginosa and their use for polyhydroxyalkanoate synthesis. FEMS Microbiol Lett 184(2):193–198
35. Tsuge T, Taguchi K, Doi Y (2003) Molecular characterization and properties of (R)-specific enoyl-CoA hydratases from Pseudomonas aeruginosa: metabolic tools for synthesis of polyhydroxyalkanoates via fatty acid β-oxidation. Int J Biol Macromol 31(4–5):195–205
36. Davis R, Chandrashekar A, Shamala TR (2008) Role of (R)-specific enoyl coenzyme A hydratases of Pseudomonas sp in the production of polyhydroxyalkanoates. Antonie Van Leeuwenhoek 93(3):285–296
37. Chung MG, Rhee YH (2012) Overexpression of the (R)-specific enoyl-CoA hydratase gene from Pseudomonas chlororaphis HS21 in Pseudomonas strains for the biosynthesis of polyhydroxyalkanoates of altered monomer composition. Biosci Biotechnol Biochem 76(3):613–616
38. Fiedler S, Steinbuchel A, Rehm BH (2002) The role of the fatty acid beta-oxidation multienzyme complex from Pseudomonas oleovorans in polyhydroxyalkanoate biosynthesis: molecular characterization of the fadBA operon from P. oleovorans and of the enoyl-CoA hydratase genes phaJ from P. oleovorans and Pseudomonas putida. Arch Microbiol 178(2):149–160
39. Sato S, Kanazawa H, Tsuge T (2011) Expression and characterization of (R)-specific enoyl coenzyme A hydratases making a channeling route to polyhydroxyalkanoate biosynthesis in Pseudomonas putida. Appl Microbiol Biotechnol 90(3):951–959
40. Ren Q, Sierro N, Witholt B et al (2000) FabG, an NADPH-dependent 3-ketoacyl reductase of Pseudomonas aeruginosa, provides precursors for medium-chain-length poly-3-hydroxyalkanoate biosynthesis in Escherichia coli. J Bacteriol 182(10):2978–2981
41. Steinbüchel A, Lütke-Eversloh T (2003) Metabolic engineering and pathway construction for biotechnological production of relevant polyhydroxyalkanoates in microorganisms. Biochem Eng J 16(2):81–96
42. Le Meur S, Zinn M, Egli T et al (2012) Production of medium-chain-length polyhydroxyalkanoates by sequential feeding of xylose and octanoic acid in engineered Pseudomonas putida KT2440. BMC Biotechnol 12:53
43. Nelson KE, Weinel C, Paulsen IT et al (2002) Complete genome sequence and comparative analysis of the metabolically versatile Pseudomonas putida KT2440. Environ Microbiol 4(12):799–808
44. Hori K, Marsudi S, Unno H (2002) Simultaneous production of polyhydroxyalkanoates and rhamnolipids by Pseudomonas aeruginosa. Biotechnol Bioeng 78(6):699–707
45. Madison LL, Huisman GW (1999) Metabolic engineering of poly(3-hydroxyalkanoates): from DNA to plastic. Microbiol Mol Biol Rev 63(1):21–53
46. Song G, Jia M, Chen K et al (2016) CRISPR/Cas9: a powerful tool for crop genome editing. Crop J 4(2):75–82
47. Cook TB, Rand JM, Nurani W et al (2018) Genetic tools for reliable gene expression and recombineering in Pseudomonas putida. J Ind Microbiol Biotechnol 45(7):517–527
48. Liang X, Potter J, Kumar S et al (2015) Rapid and highly efficient mammalian cell engineering via Cas9 protein transfection. J Biotechnol 208:44–53
49. Wang H, La Russa M, Qi LS (2016) CRISPR/Cas9 in genome editing and beyond. Annu Rev Biochem 85:227–264
50. Wu D, Guan X, Zhu Y et al (2017) Structural basis of stringent PAM recognition by CRISPR-C2c1 in complex with sgRNA. Cell Res 27(5):705–708
51. Shah SA, Erdmann S, Mojica FJ et al (2013) Protospacer recognition motifs: mixed identities and functional diversity. RNA Biol 10(5):891–899
52. Kleinstiver BP, Prew MS, Tsai SQ et al (2015) Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition. Nat Biotechnol 33(12):1293–1298
53. Martínez-García E, de Lorenzo V (2011) Engineering multiple genomic deletions in Gram-negative bacteria: analysis of the multiresistant antibiotic profile of Pseudomonas putida KT2440. Environ Microbiol 13(10):2702–2716
54. Manghwar H, Lindsey K, Zhang X et al (2019) CRISPR/Cas system: recent advances and future prospects for genome editing. Trends Plant Sci 24(12):1102–1125
55. Sun J, Wang Q, Jiang Y et al (2018) Genome editing and transcriptional repression in Pseudomonas putida KT2440 via the type II CRISPR system. Microb Cell Factories 17(1):41
56. Zheng T, Hou Y, Zhang P et al (2017) Profiling single-guide RNA specificity reveals a mismatch sensitive core sequence. Sci Rep 7:40638
57. Lageveen RG, Huisman GW, Preusting H et al (1988) Formation of polyesters by Pseudomonas oleovorans: effect of substrates on formation and composition of poly-(R)-3-hydroxyalkanoates and poly-(R)-3-hydroxyalkenoates. Appl Environ Microbiol 54(12):2924–2932
Francesca Magnani, Chiara Marabelli and Francesca Paradisi (eds.), Enzyme Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 2397, https://doi.org/10.1007/978-1-0716-1826-4, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022